Overcoming the challenges of machine translation for long-tail languages

Andrea Stevens, 17 Mar 2021
SDL RWS Machine translation long tail
Even in the middle of a pandemic with serious effects on the economic climate, globalization shows no sign of letting up. On the contrary, international collaboration and joint partnerships are more important than ever. Just take the development and roll-out of successful vaccines as an example – none of this could happen without businesses and people working together across borders and language divides.
Global collaboration and communication matter for all businesses, and machine translation is playing an increasingly important role in enabling businesses to speak to customers in the customers' own language.

Of course, this also means communicating in languages that are considered long-tail or niche languages for machine translation. Long-tail languages are languages that are less frequently localized and have therefore not been the main focus for machine translation or post-editing.

Why is that? There are several reasons. Long-tail languages are a very diverse group, some with just a few thousand speakers and others with millions. But they also share characteristics that can be blockers for a successful MT strategy.

Low data resources
Leading commercial languages such as English, FIGS (French, Italian, German and Spanish), Dutch or Portuguese are frequently localized and have huge data resources that can be used for building or enhancing machine translation capabilities. Long-tail languages often have little data, or data of lower quality, which can have a negative effect on machine translation quality.

Little Post-Editing experience
Throwing Post-Editing into the mix adds another layer of complexity. Freelance markets for these languages are often small and very conservative, with no previous exposure to MT or Post-Editing.

Lack of market awareness in the translation industry
Many of our global customers require fast and effective localization solutions for a large number of language pairs, including long-tail languages, but very few Language Service Providers have the know-how to penetrate these markets and successfully introduce machine translation and Post-Editing.

How do we address these challenges?

Understanding and acknowledging the challenges that long-tail languages pose for MTPE (Machine Translation Post-Editing) adoption in terms of technology, translation resources and local market know-how is the first step in building a successful strategy.

Deep MT experience is the second. Neural machine translation (NMT) technology has already proven to be a game changer for languages that were previously very challenging for MT, such as Japanese or Russian, and is now key to paving the way for long-tail languages. It has transformed our approach to developing direct models for a number of long-tail languages, producing tangible quality improvements confirmed by translators. And thanks to powerful enhancements such as any-to-any translation, where English is used as a pivot language behind the scenes, we can continue to expand our footprint for these languages. This is especially relevant given regional commercial developments that are increasing the need for more specialist MT. A good example is the 2020 Regional Comprehensive Economic Partnership (RCEP), a free trade agreement between 15 Asia-Pacific nations that together account for about 30% of the world's population.
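To make the pivot idea concrete, here is a minimal Python sketch of any-to-any translation routed through English. It is an illustration under stated assumptions and not a description of SDL's implementation: the function translate_any_to_any, the toy_direct stand-in and the two-entry TOY_MODELS lookup table are all hypothetical names invented for this example, and a real system would call trained NMT models where the sketch does a dictionary lookup.

```python
from typing import Callable

# Hypothetical type: a direct model maps (text, source_lang, target_lang) -> text.
DirectTranslator = Callable[[str, str, str], str]


def translate_any_to_any(
    text: str,
    source: str,
    target: str,
    direct: DirectTranslator,
    pivot: str = "en",
) -> str:
    """Translate source -> target, routing through the pivot language if needed."""
    if source == target:
        return text
    # If either side already is the pivot, one direct hop is enough.
    if pivot in (source, target):
        return direct(text, source, target)
    # Otherwise chain two hops: source -> pivot, then pivot -> target.
    intermediate = direct(text, source, pivot)
    return direct(intermediate, pivot, target)


# Toy stand-in for real models (an assumption for illustration only):
# one-word lookup tables for Icelandic -> English and English -> Maltese.
TOY_MODELS = {
    ("is", "en"): {"takk": "thanks"},
    ("en", "mt"): {"thanks": "grazzi"},
}


def toy_direct(text: str, source: str, target: str) -> str:
    """Look the text up in the toy table for this language pair."""
    return TOY_MODELS[(source, target)][text]


if __name__ == "__main__":
    # Icelandic -> Maltese has no direct toy model, so English is the pivot.
    print(translate_any_to_any("takk", "is", "mt", toy_direct))
```

The design trade-off is coverage versus quality: with a pivot, N languages need only models to and from English rather than a model for every pair, but errors in the first hop carry over into the second, which is one reason direct models remain worthwhile where enough data exists.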

Our third building block is our Post-Editing expertise. SDL have been committed to sharing knowledge and spreading the word about machine translation right from the start, not only within our organization but also with our freelance community through training, collateral and our very own Post-Editing certification course. With NMT technology moving very quickly, our popular course is currently being updated to include the latest developments as well as real-life examples from a wide range of languages.

And last but not least, we have a dedicated Global Language Office handling all external – primarily long-tail – languages on a 24/7 basis, offering specialist support to the rest of the organization and managing the all-important vendor relationships with a view to introducing MT in a sustainable fashion.

Rodolfo Lima, Operations Manager for the Global Language Office, describes our holistic approach:

“Thanks to the Global Language Office, we now have a centralised and dedicated team to connect with our vendors for long-tail languages. This helps us to develop stable, trust-based relationships which create the right conditions for Post-Editing. Working closely with our vendor community and our MT team, we are finally able to pave the way for deploying MT across all the languages our customers are interested in.”

We have all the building blocks in place to meet the challenges of deploying Machine Translation and Post-Editing for long-tail languages. Our MT technology, experience and expertise are the cornerstones on which our customers can build a truly global strategy where developing markets and long-tail languages are a core element rather than an afterthought.    
Andrea Stevens

Principal Linguistic AI Consultant
Andrea has over 20 years of experience in the localization industry, starting her career as a translator. She is now part of SDL’s Linguistic AI team, working closely with the rest of the organization on MT and Post-Editing adoption and training.