In my previous blog post we covered the top 3 mistakes in managing image labeling teams for computer vision projects. Suppose you do everything correctly: you set up your labeling team(s) with clear labeling instructions, provide enough labeled examples, and assess their performance periodically. Is there still a chance something could go wrong? Absolutely.
If you don’t take this into account from the beginning, debugging will be much more difficult, and it can take a while before you find out where the problem is. Again, the quantity and quality of the data used to train machine learning models are the biggest determining factors for performance.
Here are the main things to watch out for in your labeled image data before training computer vision models:
1) Check For Systematic Errors
There are two kinds of errors: random errors and systematic errors.
Random errors are common and may happen occasionally due to a variety of factors. For example, a labeler may have misclicked the object category within the labeling tool, resulting in an incorrect label. A labeler could also have come across a difficult one-of-a-kind edge case and made one or more mistakes labeling it.
Mistakes like these can happen in several ways, but they are usually random. Given enough data, random errors make up a small fraction of the examples and tend to have little effect on the trained model, so they can usually be tolerated (still, try to have as pristine a dataset as possible).
Systematic errors, on the other hand, are consistent labeling mistakes. Again, this could happen in a variety of ways. For example, suppose you have a labeling team of 7 people each labeling 1,000 images for a total of 7,000 images. Each image has a high object density, meaning there are several individual objects in any given image. One of the labelers may have misunderstood or misread the labeling instructions. This could result in bounding boxes that are too large or too small, incorrect object classifications, labels on objects that should not be labeled, etc.
If 1,000 out of 7,000 images are consistently labeled incorrectly, that’s ~14% of your total data. Will this impact your model’s performance? Yes. Why? Because the model will learn to match this data during training. Inconsistent training examples will “confuse” the model and most likely result in undesired behavior or suboptimal performance.
Before training models, always have a labeling QA team verify that your labeling standards are met. This will save a lot of headaches and time.
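One way to support the QA team is to screen for labelers whose work looks statistically different from everyone else's. The snippet below is a minimal sketch, not a complete QA process: it assumes each annotation records which labeler produced it, and the field names (`labeler`, `box_w`, `box_h`, `img_w`, `img_h`) and the `tolerance` threshold are hypothetical. It only looks at box size, so it would not catch misclassifications or objects that should not have been labeled.

```python
from collections import defaultdict
from statistics import median


def flag_suspect_labelers(annotations, tolerance=0.5):
    """Return labelers whose typical box size differs from the team's.

    `annotations` is an iterable of dicts with hypothetical keys:
    labeler, box_w, box_h, img_w, img_h (all sizes in pixels).
    A labeler is flagged when their median relative box area deviates
    from the median across labelers by more than `tolerance` (a fraction).
    """
    areas_by_labeler = defaultdict(list)
    for ann in annotations:
        # Normalize by image area so different image sizes are comparable.
        rel_area = (ann["box_w"] * ann["box_h"]) / (ann["img_w"] * ann["img_h"])
        areas_by_labeler[ann["labeler"]].append(rel_area)

    labeler_medians = {name: median(a) for name, a in areas_by_labeler.items()}
    team_median = median(labeler_medians.values())

    return sorted(
        name
        for name, m in labeler_medians.items()
        if abs(m - team_median) > tolerance * team_median
    )
```

A flagged labeler is only a candidate for review. They may simply have received a batch of images with unusually large or small objects, so a human should still look at a sample of their work before anything is relabeled.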
2) Check For Out-Of-Bounds Values
Your labeling team(s) may be using an image labeling tool that does not allow drawing boxes outside of the bounds of an image. However, you cannot assume this is always the case, especially when using third-party labeling vendors.
We’ve had cases with clients where the model seemed to perform well during training and evaluation, but exhibited strange behavior at inference time. We had a hard time finding out what the issue was. We asked ourselves questions such as: “Is it the hyperparameter configuration file?”, “Is it the TensorFlow Lite version not converting custom ops correctly?”, etc.
However, the issue had nothing to do with any of that. After close examination of the label files, particularly the spatial coordinates and sizes of the bounding boxes, we found many negative values, values outside the bounds of the images, and even values orders of magnitude larger than the image dimensions! (This one is crazy.) Imagine feeding this kind of data into your model for training. Something is most likely going to go wrong, and it did.
We did not catch this at first because going through the labeled images with the labeling tool visually showed that everything was as we expected. The labeling tool, from a UI perspective, was designed to only show boxes within image bounds. The problem was in how the values were written into the label files behind the scenes. Once we fixed these out-of-bounds values in the label files, including discarding some files altogether, our model’s performance went through the roof and produced the desired results.
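A repair step of this kind could look roughly like the sketch below. It is an illustration under stated assumptions, not the exact procedure we used: boxes are assumed to be `(x_min, y_min, x_max, y_max)` in pixels, and the `sanitize_box` name and the `max_overshoot` threshold are hypothetical. Boxes that overshoot the image slightly are clipped to the image, while boxes that are far out of bounds are discarded because there is no way to know what the labeler intended.

```python
def sanitize_box(box, img_w, img_h, max_overshoot=0.05):
    """Clip a slightly out-of-bounds box, or return None to discard it.

    `box` is (x_min, y_min, x_max, y_max) in pixels (assumed format).
    `max_overshoot` is the fraction of the image size a box may extend
    past the border and still be considered repairable.
    """
    x_min, y_min, x_max, y_max = box
    slack_x = max_overshoot * img_w
    slack_y = max_overshoot * img_h

    # Far outside the image: the values cannot be trusted, so discard.
    if (x_min < -slack_x or y_min < -slack_y
            or x_max > img_w + slack_x or y_max > img_h + slack_y):
        return None

    # Small overshoot: clip the box to the image borders.
    clipped = (
        max(0.0, x_min),
        max(0.0, y_min),
        min(float(img_w), x_max),
        min(float(img_h), y_max),
    )

    # Discard boxes that end up empty or inverted after clipping.
    if clipped[2] <= clipped[0] or clipped[3] <= clipped[1]:
        return None
    return clipped
```

For example, on a 100x100 image `sanitize_box((-2, 10, 50, 60), 100, 100)` keeps the box with its left edge moved to 0, while `sanitize_box((10, 10, 5000, 60), 100, 100)` returns `None`. Whether to clip or discard is a judgment call that depends on your data, so it is worth logging every change for later review.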
We made the mistake of trusting the labeled data because it came from vendors with a history of delivering great quality data for other use cases. But even they cannot fully control the individual human decisions of their labelers, so extra care is required. Also, they may be using a custom-built labeling tool, which may have its own bugs and other issues.
Make sure your computer vision pipeline contains code to check for these things after you receive the labeled data, and before training your models. A certain level of ‘paranoia’ about the quality of the labeled data is important to minimize the occurrence of these situations.
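As a minimal sketch of such a check, the code below scans every bounding box and reports anything suspicious before training starts. The record layout (`file`, `img_w`, `img_h`, `boxes`) and the `(x_min, y_min, x_max, y_max)` pixel format are assumptions made for illustration, so adapt them to whatever your label files actually contain.

```python
import math


def find_box_problems(box, img_w, img_h):
    """Return a list of problems for one (x_min, y_min, x_max, y_max) box."""
    if not all(math.isfinite(v) for v in box):
        return ["non-finite value"]

    x_min, y_min, x_max, y_max = box
    problems = []
    if min(box) < 0:
        problems.append("negative coordinate")
    if x_max <= x_min or y_max <= y_min:
        problems.append("empty or inverted box")
    if x_max > img_w or y_max > img_h:
        problems.append("outside image bounds")
    return problems


def validate_labels(records):
    """Collect (file, box, problems) for every problematic box.

    `records` is an iterable of dicts with hypothetical keys:
    file, img_w, img_h, boxes.
    """
    report = []
    for rec in records:
        for box in rec["boxes"]:
            problems = find_box_problems(box, rec["img_w"], rec["img_h"])
            if problems:
                report.append((rec["file"], box, problems))
    return report
```

In a pipeline, you could run `validate_labels` right after loading the vendor's files and stop the run, or route the affected files to review, whenever the report is not empty. Note that this reads the raw values in the label files, which is exactly what a visual pass through the labeling tool can hide.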
Once you have these ‘safety’ checks in place, your computer vision pipeline will run much more smoothly, it will be much more reliable, and you will have higher predictability for model performance.
If you need help to accelerate your company’s machine learning efforts, or if you need help getting started with enterprise AI adoption, send me a LinkedIn message or email me at email@example.com and I will be happy to help you.
Client case studies and testimonials: https://gradientgroup.ai/enterprise-case-studies/
Follow me on LinkedIn for more content: linkedin.com/in/CarlosLaraAI