Understanding machine translation quality: What really matters?


- Lowest cost
- “Best quality” assessments based on metrics like BLEU, LEPOR or TER, usually done by a third party
| English to French | English to Chinese | English to Dutch |
| --- | --- | --- |
| Vendor A – 46.5 | Vendor C – 36.9 | Vendor B – 39.5 |
| Vendor B – 43.2 | Vendor A – 34.5 | Vendor C – 37.7 |
| Vendor C – 42.5 | Vendor B – 32.7 | Vendor A – 32.5 |
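To make the comparison above easier to read, here is a minimal Python sketch that ranks the vendors per language pair using only the scores listed. The text does not say which metric produced these numbers, so the code treats them as generic "higher is better" scores; the helper names `top_vendor` and `margin` are illustrative and not part of any tool.

```python
# Scores copied from the comparison above, keyed by language pair.
# The metric behind them is not stated; higher is assumed to be better.
scores = {
    "English to French": {"Vendor A": 46.5, "Vendor B": 43.2, "Vendor C": 42.5},
    "English to Chinese": {"Vendor C": 36.9, "Vendor A": 34.5, "Vendor B": 32.7},
    "English to Dutch": {"Vendor B": 39.5, "Vendor C": 37.7, "Vendor A": 32.5},
}


def top_vendor(pair_scores):
    """Return the vendor with the highest score for one language pair."""
    return max(pair_scores, key=pair_scores.get)


def margin(pair_scores):
    """Return the gap between the best and second-best score."""
    best, second = sorted(pair_scores.values(), reverse=True)[:2]
    return round(best - second, 1)


leaders = {pair: top_vendor(s) for pair, s in scores.items()}
margins = {pair: margin(s) for pair, s in scores.items()}

for pair in scores:
    print(f"{pair}: {leaders[pair]} leads by {margins[pair]} points")
```

On these numbers a different vendor leads each language pair, and the lead is only a few points each time, so a single score per pair gives little basis for choosing one vendor overall.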
Assessing business value and impact
The first post in this blog series exposes many of the fallacies of automated metrics that use string-matching algorithms (like BLEU and LEPOR). These metrics are not reliable techniques for assessing machine translation quality, because they only reflect the calculated precision and recall of text matches in a single test set, on material that is usually unrelated to the enterprise domain of interest. The issues discussed there challenge the notion that single-point scores can tell you enough about the long-term implications of MT quality.
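To illustrate what "string matching" means in practice, here is a minimal Python sketch of clipped n-gram precision, the kind of overlap count that BLEU-style metrics build on. It is a simplification under stated assumptions: one reference, whitespace tokenization, and no brevity penalty or averaging across n-gram orders, so it is not the full BLEU or LEPOR formula. The function names and example sentences are invented for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams in a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n=1):
    """Share of candidate n-grams that also occur in the reference.

    Each n-gram is credited at most as many times as it appears in the
    reference ("clipping"). Simplified: single reference, lowercased
    whitespace tokens, no brevity penalty.
    """
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical sentences, invented for this example.
reference = "please restart the device to apply the update"
literal = "please restart the device to apply the update"
paraphrase = "reboot your device so the update takes effect"

for label, text in [("literal", literal), ("paraphrase", paraphrase)]:
    p1 = clipped_precision(text, reference, n=1)
    p2 = clipped_precision(text, reference, n=2)
    print(f"{label}: unigram={p1:.3f} bigram={p2:.3f}")
```

The paraphrase carries the same instruction as the reference but shares few exact word sequences with it, so its overlap score drops sharply. The score measures surface matches against one reference, not whether the translation is fit for purpose.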
The enterprise value equation is much more complex and goes far beyond linguistic quality and Natural Language Processing (NLP) scores. To truly reflect business value and impact, evaluation of MT technology must factor in non-linguistic attributes including:
- Adaptability to business use cases
- Manageability
- Integration into enterprise infrastructure
- Deployment flexibility
A more meaningful evaluation framework
- Adaptability: Range of options and controls available to tune the MT system performance for very specific use cases. For example, optimization techniques applied to eCommerce catalogue content should be very different from those applied to technical support chatbot content or multilingual corporate email systems.
- Data privacy and security: If an MT system will be used to translate confidential emails or business strategy and tactics documents, human evaluation requirements will differ greatly from those of a system that only handles product documentation. Some systems harvest data for machine learning purposes, and it is important to understand this upfront.
- Deployment flexibility: Some MT systems need to be deployed on premises to meet legal requirements, for example in litigation scenarios or when handling high-security data.
- Expert services: Highly qualified experts who assist with MT system tuning and customization can be critical for certain customers in developing ideal systems.
- IT integration: Increasingly, MT systems are embedded in larger business workflows to enable greater multilingual capabilities, for example in communication and collaboration infrastructure such as email, chat and content management systems (CMS).
- Overall flexibility: Together, all these elements provide flexibility to tune the MT technology to specific use cases and develop successful solutions.
True expressions of successful business outcomes for different use cases
- Increased volume in cross-language internal communication and knowledge sharing with safeguarded security and privacy
- Better monitoring and understanding of global customers
- Rapid resolution of global customer problems, measured by volume and degree of engagement
- More active customer and partner communications and information sharing
- Higher volume of successful self-service across the globe
- Easy and quick access to multilingual support content
- Increased customer satisfaction across the globe
- Ability of monolingual live agents to service global customers regardless of the originating customer’s language
- Measurably increased traffic drawn by new language content
- Successful conversions in all markets
- Transactions driven by new translated content
- Stickiness of new visitors in new language geographies
- Ability to identify key brand impressions
- Easy identification of key themes and issues
- Clear understanding of key positive and negative reactions
- Faster turnaround for all MT-based projects
- Lower production cost as a reflection of lower cost per word
- Better MTPE experience based on post-editor ratings
- Adaptability and continuous improvement of the MT system