Issue #37 - Zero-shot Neural MT as Domain Adaptation

The topic of this blog post is multilingual machine translation.

Introduction

Zero-shot machine translation - a topic we first covered in Issue #6 - is the idea that a single MT engine can translate between multiple languages. Such multilingual Neural MT systems can be built by simply concatenating parallel sentence pairs in several language directions and adding a token on the source side indicating the language into which the sentence should be translated. The system learns how to encode the input into a vector representation for several different languages, and how to generate the output conditioned on the encoder representation. This configuration enables zero-shot translation, that is, translation in a language direction not seen during training.
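To make this concrete, here is a rough sketch (our own illustration, not any particular system's actual preprocessing) of how such multilingual training data can be prepared: corpora for several language directions are concatenated, and a target-language token is prepended to each source sentence.

```python
def tag_source(src_sentence: str, tgt_lang: str) -> str:
    """Prepend a target-language token (e.g. <2fr>) to the source sentence."""
    return f"<2{tgt_lang}> {src_sentence}"

# Hypothetical toy corpora: lists of (source, target) sentence pairs.
en_fr = [("The cat sleeps.", "Le chat dort.")]
en_de = [("The cat sleeps.", "Die Katze schläft.")]

# Concatenate both directions into a single training set; the only change to
# the data is the target-language token on the source side.
training_data = (
    [(tag_source(src, "fr"), tgt) for src, tgt in en_fr]
    + [(tag_source(src, "de"), tgt) for src, tgt in en_de]
)

for src, tgt in training_data:
    print(src, "=>", tgt)
# <2fr> The cat sleeps. => Le chat dort.
# <2de> The cat sleeps. => Die Katze schläft.
```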

However, so far zero-shot translation has been much worse than translation via a pivot language. In this post, we take a look at a paper which analyses why zero-shot translation falls short and proposes effective solutions by treating a new source language as a new domain.

The missing ingredient

Arivazhagan et al. (2019) build a multilingual neural MT system between English and French and between English and German, in both directions, using English-French and English-German parallel data. They then use this system to translate from French into German and vice versa, although no training data was available between these two languages. The idea behind this is that the multilingual engine learns how to encode French, German or English input into a vector representation, and how to decode this representation to generate text in the target language. In theory, it can thus decode into a target language seen in training even from a source language it was never paired with. However, this does not work very well, because the decoder learns to generate the target text conditioned on the encoder representations seen in training, and these representations differ from one source language to another. Thus, for zero-shot translation to work well, the encoder representation should be language-independent. In other words, it should be like an interlingua.
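One way to picture what "language-independent" means here: if the encoder behaved like an interlingua, the pooled encoder states of a sentence and of its translation should be almost identical. The snippet below is a small diagnostic sketch of that idea (our own illustration; the encoder states are random placeholders standing in for the output of a trained multilingual encoder).

```python
import numpy as np

def pooled(states: np.ndarray) -> np.ndarray:
    """Mean-pool encoder states over the sequence (time) dimension."""
    return states.mean(axis=0)

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Placeholders of shape (sentence_length, hidden_dim); in a real check these
# would be the encoder states of a trained multilingual model for a sentence
# and its translation. With a language-independent encoder, the similarity
# between the pooled vectors should be close to 1.
encoder_states_en = np.random.randn(7, 512)   # e.g. "The cat sleeps."
encoder_states_fr = np.random.randn(6, 512)   # e.g. "Le chat dort."
print(cosine(pooled(encoder_states_en), pooled(encoder_states_fr)))
```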

Aligning Representations

To make the encoder representations of different languages more similar to each other, and thus more language-independent, Arivazhagan et al. use domain adaptation techniques. They treat different source languages as different domains, and different target languages as different tasks. Taking English as the source domain, the aim is to adapt the other domains (languages) to English. To this end, they introduce a regularisation term, optimised during training, which minimises the discrepancy between the feature distributions of the source and target domains. This forces the model to make the representations of sentences in all non-English languages similar to those of their English counterparts.
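Schematically, the training objective then becomes the usual translation loss plus a weighted alignment penalty. The sketch below is our own simplified illustration of that idea, assuming a PyTorch setup; the weight lambda_align and the generic alignment term are illustrative assumptions, not values or formulations taken from the paper.

```python
import torch

def training_loss(translation_loss: torch.Tensor,
                  alignment_term: torch.Tensor,
                  lambda_align: float = 1.0) -> torch.Tensor:
    """Usual translation (cross-entropy) loss plus a weighted regularisation
    term penalising the discrepancy between non-English and English encoder
    representations. `alignment_term` stands in for either of the paper's
    regularisers (see the sketches further below)."""
    return translation_loss + lambda_align * alignment_term
```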

In the paper, two regularisers are tested. The first minimises the discrepancy between the feature distributions of the source and target domains by explicitly optimising an adversarial loss (in Issue #11 we had a look at adversarial training). It thus aims at aligning distributions and does not require parallel training data. The second regulariser takes advantage of the available parallel training data: it maximises the similarity between the encoder representations of the English side and the non-English side of a segment pair.
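The sketches below are our own rough approximations of these two regularisers, again assuming a PyTorch setup; the hidden sizes and the exact loss formulations are illustrative assumptions rather than the paper's definitions. The first is an adversarial term in which a discriminator tries to tell English representations from non-English ones while the encoder learns to fool it; the second directly pulls the two sides of a parallel sentence pair together.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_dim = 512  # assumed encoder hidden size, for illustration only

# 1) Adversarial alignment: a discriminator is trained (with the opposite
#    objective, not shown here) to tell English from non-English
#    representations; the loss below is the encoder's part, which pushes
#    non-English representations to look "English". No parallel data is
#    needed for this term.
discriminator = nn.Sequential(
    nn.Linear(hidden_dim, 256), nn.ReLU(), nn.Linear(256, 1)
)

def adversarial_encoder_loss(non_en_repr: torch.Tensor) -> torch.Tensor:
    logits = discriminator(non_en_repr)
    return F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

# 2) Similarity-based alignment: with parallel data available, directly pull
#    the pooled representations of the English side and of the other side of
#    a sentence pair together by maximising their cosine similarity.
def similarity_loss(en_repr: torch.Tensor, other_repr: torch.Tensor) -> torch.Tensor:
    return 1.0 - F.cosine_similarity(en_repr, other_repr, dim=-1).mean()
```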

Both regularisers yield a large BLEU score improvement in zero-shot translation, bringing it up to the level of pivot translation. The BLEU score increases from 17 to 26 for French to German, and from 12 to 20 for German to French.

In summary

For zero-shot translation to work well, one must be able to encode the source text into a language-independent representation, and to decode from this common representation to the target language. The present paper shows a large improvement in this direction. The concept of interlingua comes back with deep learning! As we saw in Issue #28, aligning source and target representations is actually also of crucial importance to unsupervised NMT.
Author

Dr. Patrik Lambert

Senior Machine Translation Scientist
Patrik conducts research on and builds high-quality customized machine translation engines, proposes and develops improved approaches to the company's machine translation software, and provides support to other team members.
He received a master's degree in Physics from McGill University and then worked for several years as a technical translator and software developer. In 2008 he completed a PhD in Artificial Intelligence at the Polytechnic University of Catalonia (UPC, Spain), and subsequently worked as a research associate on machine translation and cross-lingual sentiment analysis.