Issue #45 - Improving Robustness in Real-World Neural Machine Translation

Dr. John Tinsley 11 Jul 2019

The topic of this blog post is robustness.

Next month, the 17th Machine Translation Summit will take place in Dublin, Ireland, and the Iconic team will be in attendance. Not only that, we will be presenting our own work - Gupta et al. (2019) - on some of the steps we take to improve the robustness, stability, and quality of the Neural MT engines that we run in production for our clients. In this week's post, we are staying a little closer to home and reviewing some of the key topics covered in that work.

Getting "Real-World" Ready

In the previous 44 issues of this series, we have reviewed many cutting-edge approaches in Neural MT research and development. However, we are not just paying lip service: we practice what we preach and look to incorporate as many of these approaches as possible into our own software. That said, many of them are still prototypes or at an early research stage. As a commercial provider of machine translation, we need to test our implementations rigorously to make sure they stand up to the challenges and variables of production deployment. Let's look at some of the steps we implement, and why.

Data Preparation

We've all heard the old adage "garbage in, garbage out" when it comes to MT. This topic is so important for Neural MT that we covered it in our second-ever post! In our MT Summit paper, we describe the steps we take not only to clean and filter noisy data, but also to process it further so that certain entities and formatting are retained correctly in the translation output.

On the cleaning side, this includes normalising character encodings, removing duplicate or overly long entries and, perhaps less obviously, filtering out segments that are actually in the wrong language, which can be disturbingly common. In addition, we take steps to retain or process numbers and punctuation correctly for each language pair. Finally, we apply some proprietary steps to ensure that certain terms are translated in a specific way and/or not translated at all (so-called 'Do Not Translates').
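As a rough illustration of the generic cleaning steps above (not the proprietary ones), the Python sketch below normalises encodings with Unicode NFKC and drops empty, overly long, duplicate, and wrong-language pairs. It is not the pipeline described in Gupta et al. (2019): the function names and the 100-token length limit are assumptions for illustration, and `looks_like_script` is a crude, hypothetical stand-in for a real language identifier. It can catch English text on the Russian side of an English-Russian corpus, for example, but it cannot tell English from French.

```python
import unicodedata

def looks_like_script(text, expected_script, threshold=0.5):
    """Hypothetical stand-in for language ID: checks that most letters
    belong to the expected Unicode script (e.g. "LATIN", "CYRILLIC")."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    hits = sum(expected_script in unicodedata.name(ch, "") for ch in letters)
    return hits / len(letters) >= threshold

def clean_corpus(pairs, src_script, tgt_script, max_tokens=100):
    """Filter (source, target) pairs; max_tokens is an assumed limit."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        # 1. Normalise encodings/compatibility forms (e.g. fullwidth letters).
        src = unicodedata.normalize("NFKC", src).strip()
        tgt = unicodedata.normalize("NFKC", tgt).strip()
        # 2. Drop empty or overly long segments.
        if not src or not tgt:
            continue
        if len(src.split()) > max_tokens or len(tgt.split()) > max_tokens:
            continue
        # 3. Drop exact duplicates of pairs already kept.
        if (src, tgt) in seen:
            continue
        # 4. Drop pairs where either side looks like the wrong language.
        if not (looks_like_script(src, src_script)
                and looks_like_script(tgt, tgt_script)):
            continue
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept

pairs = [
    ("Hello world", "Привет мир"),
    ("Hello world", "Привет мир"),      # duplicate
    ("Good morning", "Good morning"),   # target left untranslated
    ("ｆｕｌｌｗｉｄｔｈ", "Привет"),     # normalised to "fullwidth"
    ("", "Пусто"),                      # empty source
]
print(clean_corpus(pairs, "LATIN", "CYRILLIC"))
# [('Hello world', 'Привет мир'), ('fullwidth', 'Привет')]
```

Real pipelines would swap the script heuristic for a trained language identifier and add language-pair-specific rules for numbers and punctuation.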

While it is difficult to fully quantify the positive impact of these steps - especially when it comes to preserving terminology in certain applications - we broadly see an improvement of 3 BLEU points across generic data sets, as well as a significant reduction in common errors like over- and under-generation.
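For context on what a gain of "3 BLEU points" means, this is the standard corpus-level BLEU definition. Toolkits differ in tokenisation and smoothing details, so treat it as a reference formula rather than the exact scorer used in our evaluation:

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left( \sum_{n=1}^{N} w_n \log p_n \right),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r, \\
e^{\,1 - r/c} & \text{if } c \le r,
\end{cases}
```

Here $p_n$ is the modified (clipped) $n$-gram precision of the MT output against the references, $w_n = 1/N$ with $N = 4$ in the usual setting, $c$ is the total length of the MT output and $r$ the effective reference length. Scores are normally reported on a 0-100 scale, so a 3-point improvement corresponds to 0.03 on the underlying 0-1 scale.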

Tokenisation and Subword Encoding

Statistical MT was quite robust to non-standard inputs that may not have appeared in the training data, but Neural MT is much more unpredictable. Entities such as email addresses, URLs, tags, and lists, which are clearly important in the documents where they appear, need to be handled explicitly to ensure that they are processed or translated correctly. This involves tokenisation that is specific not only to the language, but also to the preservation of these entities. To improve vocabulary coverage, particularly for Asian languages, we also employed a form of subword encoding, a topic we covered in Issues #3 and #12.
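One common way to protect such entities is to replace them with placeholders before translation and restore them afterwards. The sketch below shows this idea, not the approach in Gupta et al. (2019). The regular expressions, the `__LABELn__` placeholder format, and the function names are all hypothetical and far simpler than production rules would need to be.

```python
import re

# Hypothetical, deliberately simple patterns, applied in this order.
ENTITY_PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("URL", re.compile(r'https?://[^\s<>"]+|www\.[^\s<>"]+')),
    ("TAG", re.compile(r"</?[A-Za-z][^<>]*>")),
]

def mask_entities(text):
    """Replace entities with numbered placeholders.

    Returns the masked text and a {placeholder: original} mapping.
    """
    mapping = {}
    for label, pattern in ENTITY_PATTERNS:
        def to_placeholder(match, label=label):
            key = f"__{label}{len(mapping)}__"
            mapping[key] = match.group(0)
            return key
        text = pattern.sub(to_placeholder, text)
    return text, mapping

def unmask_entities(text, mapping):
    """Put the original entities back into (translated) text."""
    for key, value in mapping.items():
        text = text.replace(key, value)
    return text

source = "Write to info@example.com or click <a>here</a> at https://example.com/help today"
masked, mapping = mask_entities(source)
print(masked)
# Write to __EMAIL0__ or click __TAG2__here__TAG3__ at __URL1__ today
```

In a real system, the MT engine sits between the two calls. It must be trained or constrained to copy placeholders through unchanged, and unmasking then restores the entities in whatever position the translation placed them.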

From a practical perspective, we found that automatic quality metrics were not sensitive enough to pick up the subtle differences in the output. Extensive manual evaluation, however, showed that such entities were preserved in 100% of cases. We also saw significant decreases in out-of-vocabulary terms, with the size of the reduction varying somewhat depending on the nature of the test sets.
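The post does not say which subword method was used, so here is a generic illustration: the basic byte-pair encoding (BPE) merge-learning loop of Sennrich et al. (2016), one widely used option. It shows how a word never seen in training can still be built from learned subword units instead of becoming out-of-vocabulary. The function names and the toy word counts are illustrative assumptions.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Merge every adjacent occurrence of `pair` in a tuple of symbols."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)

def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge operations from a {word: frequency} dict."""
    # Start from characters, with "</w>" marking the end of each word.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # Most frequent adjacent pair; ties go to the pair counted first.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges

def segment(word, merges):
    """Apply the learned merges, in order, to split a (possibly unseen) word."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(counts, num_merges=5)
print(segment("lowest", merges))  # ['low', 'est</w>']
```

"lowest" never occurs in the toy data, yet after five merges it is covered entirely by the learned units `low` and `est</w>`. At scale, this is the mechanism that shrinks the out-of-vocabulary rate.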

Domain Adaptation

Again, unlike Statistical MT, Neural MT suffers a drastic drop in quality when translating content outside the domain of its training data. In our paper, we describe some early approaches to domain adaptation, as covered in Issue #9, which had a moderate positive effect on performance. Since publication, the Iconic team has developed and implemented dynamic adaptation, which has achieved very strong performance, particularly when adapting between similar content types.
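The adaptation methods from our paper and Iconic's dynamic adaptation are not reproduced here. As a separate, simpler illustration of one classic idea in this space, the sketch below ranks general-domain sentences by cross-entropy difference (Moore and Lewis, 2010). The most in-domain-looking sentences can then be selected for training or fine-tuning. The unigram language models with add-one smoothing, the toy data, and all function names are simplifying assumptions; real systems use much stronger language models.

```python
import math
from collections import Counter

def train_unigram(sentences):
    """Add-one smoothed unigram LM; returns a function giving log P(token)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen tokens
    def logprob(token):
        return math.log((counts[token] + 1) / (total + vocab_size))
    return logprob

def cross_entropy(sentence, logprob):
    """Per-token cross-entropy (in nats) of a sentence under a unigram LM."""
    tokens = sentence.split()
    return -sum(logprob(t) for t in tokens) / max(len(tokens), 1)

def select_in_domain(candidates, in_domain, general, top_k):
    """Keep the top_k candidates with the lowest H_in(s) - H_general(s)."""
    lm_in = train_unigram(in_domain)
    lm_gen = train_unigram(general)
    def score(sentence):
        return cross_entropy(sentence, lm_in) - cross_entropy(sentence, lm_gen)
    return sorted(candidates, key=score)[:top_k]

in_domain = ["the patient received a dose", "the dose was increased for the patient"]
general = ["the match ended in a draw", "the team won the match"]
candidates = ["the team lost the match", "the patient needs a higher dose"]
print(select_in_domain(candidates, in_domain, general, top_k=1))
# ['the patient needs a higher dose']
```

Scoring by the difference, rather than by in-domain cross-entropy alone, down-weights sentences that are simply generic or short and favours those that look specifically like the target domain.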

In summary

In real-world MT scenarios, it is often the finer details around the edges that matter most. Our paper covers just a few of the approaches we employ at Iconic to improve the quality of our software for clients around the world. If you would like to learn more about this work, or to meet the scientists behind the paper and all the posts in this series, we'll be present at the MT Summit itself on Wednesday, August 21st, from 16:00 to 17:00. We hope to see you there!

Dr. John Tinsley

CEO
