If you consider image and video annotation to be “routine” labeling, you’ve probably never done it yourself. This work entails categorization of the highest order, requiring art, design, and communication skills, and is generally a thankless effort. That said, such annotation is foundationally necessary for the interpretation, search, and retrieval of images and videos in industries like transportation, healthcare, and agriculture.
Grand View Research predicts that the data annotation market (currently valued at approx. $500 million) will grow at an average rate of over 27% per year for the next seven years. Considering the many use cases of image and video annotation, this isn’t surprising.
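As a rough back-of-the-envelope check on those figures, the growth can be compounded year over year. The sketch below assumes the rate is exactly 27%, applied annually to a $500 million base for seven years; the function name is illustrative, and the result is an arithmetic illustration rather than a figure taken from the report.

```python
def project_market_size(base_millions: float, annual_rate: float, years: int) -> float:
    """Compound a starting value at a fixed annual growth rate."""
    return base_millions * (1 + annual_rate) ** years


# Assumed inputs: $500M base, 27% per year, 7 years.
projected = project_market_size(500.0, 0.27, 7)
print(f"Projected market size: ${projected / 1000:.2f} billion")
```

Under these assumptions the market would grow a little more than fivefold, to roughly $2.7 billion. Since the prediction is "over 27%", this is best read as a lower bound.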
The very nature of this growth highlights the need to reform the annotation process. To begin improving the process, it’s helpful to understand the core challenges of effective image and video annotation. We’ll start with image annotation.
Image Annotation Challenges
Image annotation involves labeling images with keys, tags, classifications, names, and a plethora of other information. This labeling is done to facilitate the understanding of the image by viewers. The process is complex both because of the resources it demands and because of the nature of the task itself.
- So Much to Consider About the Annotation Input
What is the scale of interest? If only using a local feature, how large does it need to be for the algorithm to learn from it? How much of the image is used as the annotation input? In other words, which aspects are considered important and which are not?
What are the boundaries of each object to be labeled (the bounding box)? Are there lots of objects or just one or two? How accurate do the boxes have to be? And how should they overlap with neighboring objects? It is imperative to answer all such questions to properly scope and execute the annotation process, which, even at its simplest, has a fundamental level of complexity.
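To make the questions about accuracy and overlap concrete, one widely used measure is intersection over union (IoU), which compares two bounding boxes. The following is a minimal sketch, not a method prescribed by this article; the `Box` type, the corner-coordinate convention, and the example values are assumptions chosen for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box given by its corner coordinates (assumed convention)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(box: Box) -> float:
    # An inverted box (no overlap) has a negative side, which is clamped to zero.
    return max(0.0, box.x_max - box.x_min) * max(0.0, box.y_max - box.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union: 1.0 for identical boxes, 0.0 for disjoint ones."""
    intersection = Box(
        max(a.x_min, b.x_min),
        max(a.y_min, b.y_min),
        min(a.x_max, b.x_max),
        min(a.y_max, b.y_max),
    )
    inter_area = area(intersection)
    union_area = area(a) + area(b) - inter_area
    return inter_area / union_area if union_area > 0 else 0.0


# Hypothetical example: an annotator's box versus a reference box.
annotated = Box(0, 0, 10, 10)
reference = Box(5, 5, 15, 15)
print(f"IoU: {iou(annotated, reference):.3f}")
```

A project might, for example, accept a box only if its IoU with a reference box exceeds an agreed threshold. The threshold itself is a scoping decision of the kind described above.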
- Poorly Supervised Learning Can Ruin the Day
Image annotation is challenging because it’s difficult to give a computer program the close supervision it needs during training. Perhaps the best way to explain this challenge is with the concept of AI drift. Poorly supervised learning can cause AI drift, where the system’s output degrades until it may end up with nothing but a few blurry dots. The input data might contain noise or other factors that the system hasn’t been trained to account for, or that it mistakenly identifies as a pattern, leading to misclassification.
- Image Annotation: an AI-Complete/AI-Hard Problem
Image annotation is considered to be an AI-complete/AI-hard problem. This means that a single homogeneous system can’t solve it, and that the multiple levels within the problem make it complex. For instance, when we talk about image annotation, we refer to a process where the most straightforward goal is to identify what exists within an image and how much of it is there.
This would include labeling each object in the image with neighborhood size, label quantization (whether or not something exists), identifying the edges and shapes of objects within the foreground scene, etc. There are multiple levels of detection within this process, which makes it difficult for machines, as they are typically limited to working toward one task at a time.
- Automated Image Annotation is Still Far from Perfect
Although automation is the key to progress in this field, it’s still not a stand-alone or complete approach. Through supervised learning, automated annotation can detect the label quantization with a certain (somewhat reliable) level of accuracy. However, it’s extremely difficult to scale the underlying algorithms and train them for dynamic labeling.
Video Annotation Challenges
- Again, an AI-Complete/AI-Hard Problem
A video, unlike a single image, is both spatial and temporal: it is a sequence of frames over time. In order to analyze a video properly, one would need to identify the objects in the scene at specific frames and label their appearance and behavior over time.
Furthermore, one can’t teach the algorithm everything about the real world by only providing it with observation data (no supervision). To help the algorithm learn from what it didn’t directly experience, one must also encode a lot of knowledge into the model, which is a sophisticated and tedious process.
- Frame-by-Frame Annotation of a Video is Complicated
Even with a human annotating videos frame by frame under full monitoring and supervision, the task remains difficult. A person has limited capacity to handle all the information they need to represent in the task at hand.
Watching the video over and over, and going back and forth between the video and the annotation file to write down what appears at each step, is tedious as well as complex. As for automation, it’s currently far from being a viable stand-alone option.
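To illustrate why frame-by-frame work is so laborious, and how it is sometimes lightened, consider a track in which a person labels only a few keyframes and the boxes in between are estimated by linear interpolation. This is a hedged sketch of one common approach, not a technique described in this article; the function name, the `(x_min, y_min, x_max, y_max)` tuple layout, and the sample keyframes are all assumptions.

```python
from bisect import bisect_right
from typing import Dict, Tuple

# Assumed layout: (x_min, y_min, x_max, y_max) in pixels.
BoxTuple = Tuple[float, ...]


def interpolate_box(keyframes: Dict[int, BoxTuple], frame: int) -> BoxTuple:
    """Estimate a box at `frame` from human-labeled keyframes by linear interpolation."""
    if not keyframes:
        raise ValueError("at least one keyframe is required")
    frames = sorted(keyframes)
    # Outside the labeled range, reuse the nearest keyframe unchanged.
    if frame <= frames[0]:
        return keyframes[frames[0]]
    if frame >= frames[-1]:
        return keyframes[frames[-1]]
    # Find the labeled frames immediately before and after the requested frame.
    idx = bisect_right(frames, frame)
    start, end = frames[idx - 1], frames[idx]
    t = (frame - start) / (end - start)
    return tuple(
        p0 + t * (p1 - p0)
        for p0, p1 in zip(keyframes[start], keyframes[end])
    )


# Hypothetical track: an object moving right between frame 0 and frame 10.
track = {0: (0.0, 0.0, 10.0, 10.0), 10: (10.0, 0.0, 20.0, 10.0)}
print(interpolate_box(track, 5))
```

Interpolated boxes are only estimates. An object that changes speed, shape, or direction between keyframes still needs a person to check and correct the in-between frames.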
- Still Knocking at the Doors of Perfection
Perfection is the standard to aim for in the annotation world because the consequences of an incompetent process are manifold. As for video annotation, one can make use of references, but this is still considered a weak method because it offers only one instance of an object along with its label.
This, in truth, is still enough to train a machine, but it’s not enough to be implemented in real-world scenarios. Perhaps if this information can be forwarded to other systems for further analysis, greater possibilities exist to make this practice increasingly useful.
- Workforce Issues
The workforce required to generate large amounts of training data can be massive in scale. In addition, the usual difficulty of adding annotations to a video and the associated opportunities for mistakes make it a risky business altogether, which is a further burden on employees.
- AI Drift Problem and the Vitality of Human Insight
When an automated system is used to annotate videos, it often drifts into misclassification, labeling an object as correct when it isn’t or as incorrect when it is, which leads to a large loss of accuracy through false positives and false negatives. However, if experienced human modelers supervise the process, the system can learn and improve.
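One simple way to picture human supervision of an automated annotator is to send its less certain labels to a person for review. The sketch below assumes the automated system reports a confidence score with each label; the `Prediction` fields, the `split_for_review` helper, and the 0.8 threshold are hypothetical choices for illustration, not part of any particular tool.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    """One automated label for one video frame (hypothetical record layout)."""
    frame: int
    label: str
    confidence: float


def split_for_review(
    predictions: List[Prediction], threshold: float = 0.8
) -> Tuple[List[Prediction], List[Prediction]]:
    """Separate predictions into auto-accepted ones and ones needing human review."""
    accepted = [p for p in predictions if p.confidence >= threshold]
    needs_review = [p for p in predictions if p.confidence < threshold]
    return accepted, needs_review


# Hypothetical output from an automated annotator.
predictions = [
    Prediction(frame=1, label="car", confidence=0.97),
    Prediction(frame=2, label="car", confidence=0.55),
    Prediction(frame=3, label="truck", confidence=0.81),
]
accepted, needs_review = split_for_review(predictions)
print(f"{len(accepted)} accepted, {len(needs_review)} sent for human review")
```

A confidence score is not a guarantee of correctness, so high-confidence labels can still be wrong. Periodic human spot checks of the accepted set remain useful for catching drift.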
Annotations are a prerequisite for any AI system that wants to use images or videos to make intelligent decisions and take resulting actions. The complexity of the procedure and the difficulty of annotating without supervision are some of the most fundamental issues that are still present in this field. Reach out to our annotation experts to ensure the success of your AI/ML initiatives.