Issue #67 - Unsupervised Adaptation of Neural MT with Iterative Back-Translation

Dr. Patrik Lambert, Senior Machine Translation Scientist | 30 Jan 2020
The topic of this blog post is domain adaptation.

Introduction

The most popular domain adaptation approach, when some in-domain parallel data are available, is to fine-tune the generic model on the in-domain corpus. When no in-domain parallel data are available, the most popular approach is back-translation, which consists of translating monolingual target-language in-domain data into the source language and using the result as a training corpus. In this post we look at a refinement of back-translation, inspired by advances in unsupervised neural MT, which yields large BLEU score improvements.
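
As a concrete illustration of the basic recipe, here is a minimal, hypothetical Python sketch of standard back-translation for domain adaptation. The `Translator` class and its `translate`/`fine_tune` methods are placeholders standing in for whatever NMT toolkit is actually used; only the data flow of the technique matters here.

```python
# Minimal, hypothetical sketch of standard back-translation for domain
# adaptation. The Translator class is a placeholder for any NMT toolkit.

class Translator:
    """Placeholder for a trained NMT model."""

    def __init__(self, src_lang, tgt_lang):
        self.src_lang, self.tgt_lang = src_lang, tgt_lang

    def translate(self, sentences):
        # A real system would run beam search; here we just tag the input.
        return [f"<{self.src_lang}-{self.tgt_lang}> {s}" for s in sentences]

    def fine_tune(self, parallel_pairs):
        # A real system would continue training on these sentence pairs.
        print(f"fine-tuning on {len(parallel_pairs)} synthetic pairs")


def adapt_with_back_translation(tgt_in_domain, tgt2src_model, src2tgt_model):
    """Adapt a src->tgt model using only target-side in-domain monolingual data."""
    # 1. Translate the in-domain target sentences back into the source language.
    synthetic_src = tgt2src_model.translate(tgt_in_domain)
    # 2. Pair each synthetic source sentence with its original target sentence.
    synthetic_parallel = list(zip(synthetic_src, tgt_in_domain))
    # 3. Fine-tune the forward (src->tgt) model on the synthetic corpus.
    src2tgt_model.fine_tune(synthetic_parallel)
    return src2tgt_model


# Example: adapt a German->English engine to the medical domain using only
# English (target-side) medical sentences. The en->de model plays the role
# of the target-to-source back-translation model.
medical_en = ["The patient received 5 mg of the active substance.",
              "The trial was conducted double-blind."]
adapted = adapt_with_back_translation(medical_en,
                                      tgt2src_model=Translator("en", "de"),
                                      src2tgt_model=Translator("de", "en"))
```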

Adaptation with Iterative Back-Translation

The method is presented in a paper by Jin et al. (2020). It assumes access to an out-of-domain parallel training corpus and in-domain monolingual data (in both the source and the target languages). In this approach, training jointly optimises three objectives (combined as in the sketch after the list):
  • Source and target bidirectional language models. In these language models, masked words are predicted given the whole context surrounding them.
  • Source-to-target and target-to-source unsupervised translation models. Source monolingual sentences are translated by the current source-to-target model, and target monolingual sentences by the current target-to-source model. The objective is to minimise the training loss of these translation models on the resulting synthetic data.
  • A supervised neural MT model, trained with out-of-domain data.
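
To make the interaction of these objectives concrete, here is a schematic Python sketch of a single training step under this approach. It is an illustration under our own simplifying assumptions, not the authors' implementation: all model and loss functions are placeholders, and any weighting between the terms is omitted.

```python
# Schematic sketch (not the authors' code) of one training step combining the
# three objectives above. Models and losses are placeholders; in Jin et al.
# (2020) they are transformer-based components.

import random


def masked_lm_loss(model, sentences):
    """Placeholder for a bidirectional (masked) language-model loss."""
    return random.random()


def translation_loss(model, src_sentences, tgt_sentences):
    """Placeholder for a standard sequence-to-sequence NMT training loss."""
    return random.random()


def translate(model, sentences):
    """Placeholder for decoding with the *current* translation model."""
    return [f"<{model}> {s}" for s in sentences]


def training_step(src_mono, tgt_mono, out_of_domain_pairs):
    # 1. Masked language models on in-domain monolingual data (both sides).
    lm_loss = (masked_lm_loss("src_lm", src_mono)
               + masked_lm_loss("tgt_lm", tgt_mono))

    # 2. Unsupervised translation via on-the-fly back-translation: each
    #    direction is trained on synthetic data produced by the other.
    synthetic_tgt = translate("src2tgt", src_mono)
    synthetic_src = translate("tgt2src", tgt_mono)
    bt_loss = (translation_loss("tgt2src", synthetic_tgt, src_mono)
               + translation_loss("src2tgt", synthetic_src, tgt_mono))

    # 3. Supervised translation loss on the out-of-domain parallel corpus.
    src_ood, tgt_ood = zip(*out_of_domain_pairs)
    sup_loss = translation_loss("src2tgt", list(src_ood), list(tgt_ood))

    # The overall objective is the (possibly weighted) sum of the three terms.
    return lm_loss + bt_loss + sup_loss
```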

Results

Adaptation with Iterative Back-Translation (IBT) is compared with baseline adaptation methods. The best baselines are back-translation and DAFE (DAFE performs multi-task learning with a translation model on out-of-domain parallel data and a language model on in-domain target-side monolingual data, while inserting domain and task embedding learners into the transformer-based model; see the sketch below). IBT works much better than the baselines when adapting between two specific domains, but since that does not seem to be a common real-world scenario, we will focus on the results of adaptation from a more general domain (such as news) to a specific domain (such as law or medical). In this setting, DAFE works slightly better than back-translation.
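
For readers unfamiliar with DAFE, the following minimal PyTorch sketch illustrates the general idea of domain and task embeddings added to the token embeddings, so that one shared model can serve both the translation task and the language-modelling task. It is our own simplified illustration of the concept, not the DAFE architecture itself.

```python
# Illustration only: the idea behind domain and task embeddings (DAFE-style),
# not the paper's implementation. Learned domain and task vectors are added to
# the token embeddings feeding a shared transformer.

import torch
import torch.nn as nn


class EmbeddingWithDomainAndTask(nn.Module):
    def __init__(self, vocab_size, d_model, n_domains=2, n_tasks=2):
        super().__init__()
        self.token = nn.Embedding(vocab_size, d_model)
        self.domain = nn.Embedding(n_domains, d_model)  # e.g. 0 = news, 1 = medical
        self.task = nn.Embedding(n_tasks, d_model)      # e.g. 0 = MT, 1 = LM

    def forward(self, token_ids, domain_id, task_id):
        x = self.token(token_ids)                    # (batch, length, d_model)
        x = x + self.domain(domain_id)[:, None, :]   # broadcast over positions
        x = x + self.task(task_id)[:, None, :]
        return x


# Example: embed a batch of 2 sentences of length 5 for the in-domain LM task.
emb = EmbeddingWithDomainAndTask(vocab_size=1000, d_model=16)
tokens = torch.randint(0, 1000, (2, 5))
out = emb(tokens, domain_id=torch.tensor([1, 1]), task_id=torch.tensor([1, 1]))
print(out.shape)  # torch.Size([2, 5, 16])
```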

For the WMT14 de-en task, IBT with the out-of-domain data as the parallel corpus yields an improvement of 1.5 to 2.5 BLEU points over the best baseline. With back-translated data as the parallel corpus, the improvement grows to 2-3 BLEU points or more. Adding extra monolingual in-domain data gives further improvements. For the smaller WMT16 ro-en task, the improvement is larger still.

Discussion

An ablation study reveals that all components are important: pre-training, IBT, language models, and supervised translation models with back-translated data. However, the language models are the component with the least impact. 

Interestingly, pre-training also has a large positive impact on the baselines. However, this result is reported for adaptation between specific domains. What is missing is a comparison of IBT with pre-trained back-translation and DAFE on generic-to-specific domain adaptation.

In Summary

Iterative Back-Translation with pre-training, source and target language models, and back-translated parallel data is the best adaptation approach to date when no in-domain parallel data are available. However, it ideally requires monolingual in-domain data in both the source and the target languages. The paper also highlights the positive impact of pre-training on all the domain adaptation baselines considered.
Author

Dr. Patrik Lambert, Senior Machine Translation Scientist
Patrik conducts research on and builds high-quality customized machine translation engines, proposes and develops improved approaches to the company's machine translation software, and provides support to other team members.
He received a master's degree in Physics from McGill University and then worked for several years as a technical translator and software developer. In 2008 he completed a PhD in Artificial Intelligence at the Polytechnic University of Catalonia (UPC, Spain), after which he worked as a research associate on machine translation and cross-lingual sentiment analysis.