IT leader boosts OCR AI accuracy with 3.5M transcriptions and 30k image annotations in 32 languages

IT leader turns to TrainAI by RWS to help enhance the accuracy and language capabilities of its OCR engine.

A large IT company needed to improve the accuracy of its OCR AI engine and add support for new languages.

The client had an unstructured image dataset that needed to be meticulously transcribed and annotated in 32 languages with a high level of accuracy.

The unstructured image dataset included images of text with letters and characters arranged in different shapes and alignments, some of it partially obstructed, blurry, or otherwise hard to read.


  • The unstructured image dataset included round and vertical text, partially blocked characters, and unclear letters, among other issues
  • Data annotation, including boxing and transcription, was required in 32 languages with a high degree of accuracy


  • Data annotation and labelling
  • TrainAI developed a custom annotation program with rigorous annotator training and certification
  • Instead of crowdsourcing and hoping for the best, we SmartSourced skilled annotators for the project from our TrainAI community of AI data specialists


  • 500+ AI data specialists from our TrainAI community completed:
    • 3.5 million transcriptions
    • 30,000 image annotations
    • In 32 languages
    • With greater than 99% data accuracy
  • Based on these results, the project scope was expanded