IT leader boosts OCR AI accuracy with 3.5M transcriptions and 30k image annotations in 32 languages
A large IT company needed to improve the accuracy of its OCR AI engine and add support for new languages.
The client had an unstructured image dataset that needed to be transcribed and annotated in 32 languages with a high level of accuracy.
The dataset included images of text with letters and characters arranged in different shapes and alignments, some of which were partially obstructed, blurry, or otherwise hard to read.
- The unstructured image dataset included round and vertical text, partially blocked characters, and unclear letters, among other issues
- Data annotation, including drawing bounding boxes and transcribing text in 32 languages with a high degree of accuracy, was required
- 500+ AI data specialists from our TrainAI community completed:
  - 3.5 million transcriptions
  - 30,000 image annotations
  - In 32 languages
  - With greater than 99% data accuracy
- Based on these results, the project scope was expanded