I'm working in image processing (mainly OCR) and I wonder how I should integrate unit tests in my development.

I'm already using unit tests for more "common" types of code, but I'm not sure how to approach image processing code. This kind of code always needs image data as input and output, and mocking that is not obvious. For now I'm mostly writing integration tests, but they take a while to run, and I would like some ideas on how to break this kind of code down into unit tests that I can run more quickly.

Edit: Analyzing a character can go through many steps involving multiple rotation, scaling and morphological operations. These steps change often as the algorithm is being developed, so the input and expected output can evolve a lot while testing. Each character can be 100x100 pixels, so hardcoding them in the code or working with generated data is out of the question.

  • Can you sketch an example of a function where you have trouble creating a unit test?
    – Doc Brown
    Commented Sep 28, 2012 at 7:43
  • Too short for a real answer, and not really unit testing: we hand-process data (that is, we go through a large number of samples; I usually go beyond 1000 for such classification tasks, but it depends on your overall sample size) and then automatically compare the final results to the hand-processed data. I've set up a small framework to do this. It'll go open source in a few weeks, but this is the description, so you could clone the process: birgitplays.wordpress.com/2012/09/15/…
    – Birgit P.
    Commented Oct 1, 2012 at 5:01
  • For your example, you could easily test rotation, scaling etc. as small units. Rotating a given image 45 degrees should not change much. This also goes for scaling and morphological operations. Testing something where the expected output evolves during implementation is, however, hard. You could try to define a quality measure and assert quality >= some_quality to make sure your quality is not degrading, but this might also be hard. Other than that, all you can do is have tests which prove that your underlying parts (scale/rotate/etc.) are not broken.
    – martiert
    Commented Oct 1, 2012 at 8:20
  • @martiert: I'm not testing rotation, scaling, etc. as I am calling these from a third-party library which I believe is well tested. The OCR algorithm is composed of many of these operations. But as you say, testing something where the output evolves is hard. Maybe it's a good warning that we have no choice but to depend on integration tests...
    – rold2007
    Commented Oct 1, 2012 at 23:04
  • @Birgit P.: Interesting solution. As you say, it is still integration testing. Having a framework like yours would help set up these tests faster, but they won't run faster...
    – rold2007
    Commented Oct 1, 2012 at 23:06
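
The "quality >= some_quality" idea from the comments above could look roughly like the sketch below. Everything in it is a made-up stand-in: `classify`, the tiny hand-labelled sample set and the `MIN_ACCURACY` threshold are assumptions for illustration, not anything from an actual OCR code base. It is still an integration-style check, just one with a tolerance instead of an exact expected output.

```python
# Hypothetical minimum acceptable accuracy; tune it to your own baseline.
MIN_ACCURACY = 0.7


def classify(pixels):
    """Toy stand-in for the real recognizer: labels an image by mean brightness."""
    return "dark" if sum(pixels) / len(pixels) < 128 else "light"


def accuracy(classifier, labelled_samples):
    """Fraction of hand-labelled samples the classifier gets right."""
    correct = sum(1 for pixels, label in labelled_samples if classifier(pixels) == label)
    return correct / len(labelled_samples)


# Hand-labelled samples (tiny fake "images"); the last label deliberately
# disagrees with the toy classifier to show that a few misses are tolerated.
samples = [
    ((0, 10, 20), "dark"),
    ((250, 240, 255), "light"),
    ((100, 120, 90), "dark"),
    ((130, 140, 120), "dark"),
]

measured = accuracy(classify, samples)
assert measured >= MIN_ACCURACY, f"accuracy dropped to {measured:.2f}"
```

The test only fails when the measured accuracy falls below the threshold, so the algorithm can keep evolving without every change invalidating an exact expected output.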

1 Answer

I work with video recording/analytics/streaming software and we faced a very similar problem. Below is our solution; I'm not sure how it'll work out long-term, but for now it seems to work.

Save input/output images as resources in your unit test project. Then have each unit test verify that when a specific input is given, that specific output is produced.

Nine times out of ten, when you refactor the code or add other functionality, you expect the behavior of your image handling routines not to change, so if unit tests suddenly start failing, it's likely due to an error.

On the other hand, if you make changes in the actual algorithm, that will also result in unit test failure. In this case, you would have to manually/visually verify that the results are correct and if they look good, then update the image resources to make the unit test pass again.
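
As a rough illustration of the stored input/output approach described above, here is a minimal sketch. It is not our actual code: `invert` is a hypothetical stand-in for one processing step, the PGM helpers and file names are made up, and the temporary directory stands in for a resources folder that would normally be checked in with the test project.

```python
import tempfile
from pathlib import Path


def write_pgm(path, width, height, pixels):
    """Write 8-bit grayscale pixels as a binary (P5) PGM file."""
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    Path(path).write_bytes(header + bytes(pixels))


def read_pgm(path):
    """Read a PGM written by write_pgm; returns (width, height, pixels)."""
    with open(path, "rb") as f:
        assert f.readline().strip() == b"P5"
        width, height = map(int, f.readline().split())
        f.readline()  # max value line, always 255 in this sketch
        return width, height, f.read()


def invert(width, height, pixels):
    """Hypothetical unit under test: inverts an 8-bit grayscale image."""
    return width, height, bytes(255 - p for p in pixels)


def check_against_golden(step, input_path, golden_path, update=False):
    """Run one step on a stored input and compare with the stored output.

    With update=True the stored output is overwritten instead; do this only
    after verifying the new result by eye.
    """
    actual = step(*read_pgm(input_path))
    if update:
        write_pgm(golden_path, *actual)
        return True
    return actual == read_pgm(golden_path)


# Demo setup: a temp directory plays the role of the test project's resources.
resources = Path(tempfile.mkdtemp())
in_path = resources / "glyph_in.pgm"
golden_path = resources / "glyph_out.pgm"
write_pgm(in_path, 2, 2, [0, 10, 32, 255])

# First run after a verified algorithm change: record the expected output.
check_against_golden(invert, in_path, golden_path, update=True)

# Every later run: the step must reproduce the stored output exactly.
golden_ok = check_against_golden(invert, in_path, golden_path)
assert golden_ok
```

The `update` flag covers the "algorithm changed on purpose" case: you look at the new output, and if it is right, you regenerate the stored file rather than editing expected values by hand.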

In our project, we ended up developing "fake" (or mock if you will) video sources, that can feed us data both for input and output. But the data itself is not fake, it was actually captured using helper data recording classes from a running system when we ran manual tests and verified that everything was working.
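
To give an idea of the record/replay setup described above, here is a simplified sketch. The class names, the `read()` interface that returns `None` at the end of the stream, and the use of `pickle` as the storage format are all assumptions made for the example; `CameraStandIn` only exists so that the sketch runs without real hardware.

```python
import pickle
import tempfile
from pathlib import Path


class RecordingSource:
    """Wraps a real source and keeps a copy of every frame it hands out."""

    def __init__(self, source, path):
        self._source = source
        self._path = Path(path)
        self._frames = []

    def read(self):
        frame = self._source.read()
        if frame is not None:
            self._frames.append(frame)
        return frame

    def save(self):
        self._path.write_bytes(pickle.dumps(self._frames))


class ReplaySource:
    """Fake source: replays frames captured earlier by RecordingSource."""

    def __init__(self, path):
        self._frames = iter(pickle.loads(Path(path).read_bytes()))

    def read(self):
        return next(self._frames, None)


class CameraStandIn:
    """Plays the role of the real device in this sketch."""

    def __init__(self, frames):
        self._frames = iter(frames)

    def read(self):
        return next(self._frames, None)


def mean_brightness(source):
    """Unit under test: average pixel value over all frames of a source."""
    total = count = 0
    while (frame := source.read()) is not None:
        total += sum(frame)
        count += len(frame)
    return total / count


# "Manual" run against the live source, recorded while someone checks the result.
capture_path = Path(tempfile.mkdtemp()) / "capture.pkl"
recorder = RecordingSource(CameraStandIn([bytes([10, 20]), bytes([30, 40])]), capture_path)
verified_result = mean_brightness(recorder)
recorder.save()

# Later, in a unit test: no device needed, the captured data is replayed.
replayed_result = mean_brightness(ReplaySource(capture_path))
assert replayed_result == verified_result
```

Because the code under test only depends on `read()`, it cannot tell the replayed data from the live source, which is what lets the test run quickly and without the running system.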

  • Agreed, it's OK to rely on some concrete files in your tests when you are testing routines that work with files (you see it more often with integration tests).
    – Kemoda
    Commented Sep 28, 2012 at 6:43
  • If you run some input through the whole processing chain and then check the output, you're not unit-testing but integration-testing.
    – tdammers
    Commented Sep 28, 2012 at 9:25
  • @tdammers: I never said to run it through the entire chain. Run some input through one "unit", not the whole chain. And sure if the output of that happens to be something other than images, then you only need to have input saved as image resources.
    – DXM
    Commented Sep 28, 2012 at 14:51
  • @DXM: I understand your solution, but I think we might not have the same constraints. My input/output data changes a lot while the algorithm is being developed. How do you cope with these regular changes? In OCR I can have over 99% accuracy, so testing on only a couple of images can give me a false feeling of success, while the integration tests might tell me later that I actually made the algorithm worse...
    – rold2007
    Commented Sep 30, 2012 at 23:38
