Issue #65 - Forward vs Back Translation for Neural MT

Raj Patel 19 Dec 2019
The topic of this blog post is data creation.


The quality of Neural MT can be improved by using additional monolingual resources to create synthetic training data. In general, source-side monolingual data is (forward) translated into the target language, target-side monolingual data is back-translated into the source language, and the resulting synthetic data is added to the true bilingual data. It is widely reported that back-translation improves translation quality significantly more than forward translation. In this post, we discuss the paper by Bogoychev and Sennrich (2019), in which they explain when and why forward and back-translation are effective for Neural MT.

Forward and Back-Translation

Given the translation task L1⟶L2, where large-scale monolingual data is available in L2, back-translation refers to training a translation model L2⟶L1 and using it to translate the L2 data, creating synthetic parallel data that can be added to the true bilingual data to train an L1⟶L2 model. Back-translation was first explored for statistical machine translation (SMT), but was found to be much more effective for Neural MT, particularly in low-resource scenarios.

To use forward translation, the monolingual data must be available in L1. It is translated using an L1⟶L2 model, and the result is added to the true bilingual corpus for retraining the L1⟶L2 model (also known as self-training). Self-training with forward translation was also pioneered in SMT, but it has been shown that NMT can benefit from it as well. Compared to back-translation, errors and biases in the synthetic data are intuitively more problematic with forward translation, as they sit on the target side and therefore directly affect the output the model is trained to produce.
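
The two procedures described above differ only in which side of the sentence pair is synthetic. A minimal Python sketch follows; it is not taken from the paper. The function names are hypothetical, and the two toy "models" (upper-casing and lower-casing) only stand in for trained L1⟶L2 and L2⟶L1 systems.

```python
from typing import Callable, List, Tuple

# A training pair is (source sentence in L1, target sentence in L2).
Pair = Tuple[str, str]


def back_translate(mono_l2: List[str], l2_to_l1: Callable[[str], str]) -> List[Pair]:
    """Pair a synthetic L1 source with each authentic L2 sentence."""
    return [(l2_to_l1(target), target) for target in mono_l2]


def forward_translate(mono_l1: List[str], l1_to_l2: Callable[[str], str]) -> List[Pair]:
    """Pair each authentic L1 sentence with a synthetic L2 target."""
    return [(source, l1_to_l2(source)) for source in mono_l1]


# Toy stand-ins for trained translation models (assumption, illustration only).
def toy_l1_to_l2(sentence: str) -> str:
    return sentence.upper()


def toy_l2_to_l1(sentence: str) -> str:
    return sentence.lower()


# True bilingual data, extended with synthetic pairs in each setting.
bitext: List[Pair] = [("hello", "HELLO")]
bt_data = bitext + back_translate(["GOOD MORNING"], toy_l2_to_l1)
ft_data = bitext + forward_translate(["good night"], toy_l1_to_l2)
```

In both cases the combined data would then be used to train the L1⟶L2 model. Note that with back_translate the target side always remains authentic text, while with forward_translate the target side is model output.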

Domain and Translationese

It has previously been shown that back-translation is effective for domain adaptation, and the effectiveness of both back-translation and forward translation depends heavily on the availability of related, in-domain monolingual data. Even if we have monolingual data from the same general domain for both the source and the target side, there can be subtle differences. For example, newspaper articles in different languages cover different topics: Hindi news articles in India will cover more local news, whereas English-language news will cover more general international topics. Back-translation, which is based on target-side data, therefore implicitly adapts the system to the target-side news domain, while forward translation adapts it to the source-side news domain.

Human translations differ systematically from natural text; this phenomenon is termed translationese. Text produced via translation has a different word distribution from naturally produced text, due to interference from the source language. The effect of translationese has already been studied in the context of machine translation, and it was found that systems reach higher BLEU on test sets if the direction of the test set is the same as the direction of the training set (here direction refers to natural text vs human translation).

Effect of synthetic data

Bogoychev and Sennrich (2019) used all the available test data in the news domain for French (FR) - English (EN), and split it based on the original language of the text (natural text vs human translation). In the FR⟶EN experiments, they reported that back-translation gave a relative gain of 6.8 BLEU points on the portion of the test sets with reverse translation, i.e. where the source side is itself a human translation, whereas forward translation improved it by only 1.00 BLEU. However, on the portion that was originally written in the source language, forward translation brought an improvement of 2.00 BLEU points, whereas back-translation suffered an average loss of 1.00 BLEU.
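
The split described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' code: it assumes each test example carries a tag for the language it was originally written in, and the Example type, the field names and the toy sentences are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class Example(NamedTuple):
    source: str      # FR side, the input to the FR->EN system
    reference: str   # EN side, used as the reference
    orig_lang: str   # language the sentence was originally written in


def split_by_origin(examples: List[Example], src_lang: str) -> Dict[str, List[Example]]:
    """Separate examples whose source is native text from those whose
    source is itself a human translation."""
    portions: Dict[str, List[Example]] = {"native_source": [], "translated_source": []}
    for example in examples:
        key = "native_source" if example.orig_lang == src_lang else "translated_source"
        portions[key].append(example)
    return portions


# Toy FR->EN test set (made-up sentences, for illustration only).
test_set = [
    Example("Bonjour le monde.", "Hello world.", orig_lang="fr"),
    Example("Il pleut.", "It is raining.", orig_lang="en"),
    Example("Merci beaucoup.", "Thank you very much.", orig_lang="fr"),
]
portions = split_by_origin(test_set, src_lang="fr")
```

Each portion would then be scored separately with BLEU for the baseline, forward-translation and back-translation systems, so that the gains can be compared per portion rather than on the mixed test set.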

In summary

With the above results, we can conclude that back-translation is more effective than forward translation in the somewhat artificial setting where the input to the translation system is itself a human translation, and the original text is used as reference. In the more natural setting where the input is native text, and the reference is a human translation, forward translation can perform better in terms of BLEU.

Raj Patel

Machine Translation Scientist