Neural machine translation engines produce systematic errors which are not always easy to detect and correct in an end-to-end framework with millions of hidden parameters. One potential way to resolve these issues is to correct the errors after the fact, by post-processing the output with an automatic post-editing (APE) step. This week we take a look at a neural approach to the APE problem.
An APE engine is usually trained on machine-translated data paired with a reference, which is ideally the corresponding manually post-edited text. Since manual post-editing is costly, a parallel corpus can be used instead: its source side is machine translated, and each translation is paired with the corresponding target-side sentence. Freitag et al. (2019) propose a neural APE engine trained on monolingual data via round-trip translation, provided that MT models in both directions (source-to-target and target-to-source) are available. The monolingual data, which is in the language the APE engine will correct, is translated into the other language and then back into its original language. A neural MT engine is then trained on the parallel corpus formed by the round-trip translated text (as input) and the original text (as output). This engine can be used to correct typical MT errors in this language, whatever the source language. It can be applied in a pivoting architecture: an MT system translates a text from language A into language B, and the APE engine then translates the MT output into corrected text in language B.
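The data-generation step and the cascade described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Freitag et al.'s implementation: `build_ape_corpus`, `translate_with_ape`, the `Translator` type and the toy translators are all hypothetical, the toys merely stand in for trained NMT models, and training the APE model on the resulting pairs is left out.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a translator maps a batch of sentences to their
# translations. In practice this would wrap a trained NMT model.
Translator = Callable[[List[str]], List[str]]


def build_ape_corpus(
    monolingual_b: List[str], b_to_a: Translator, a_to_b: Translator
) -> List[Tuple[str, str]]:
    """Create (APE input, APE output) pairs by round-trip translation B -> A -> B."""
    round_trip_b = a_to_b(b_to_a(monolingual_b))
    # The round-trip text mimics MT output; the original sentence is the "corrected" text.
    return list(zip(round_trip_b, monolingual_b))


def translate_with_ape(
    sentences_a: List[str], a_to_b: Translator, ape_b: Translator
) -> List[str]:
    """Cascade: translate A -> B with any MT system, then post-edit the output in B."""
    return ape_b(a_to_b(sentences_a))


# Toy stand-ins for real NMT models (hypothetical, for illustration only).
def toy_b_to_a(batch: List[str]) -> List[str]:
    return [s.upper() for s in batch]


def toy_a_to_b(batch: List[str]) -> List[str]:
    # Simulates a systematic MT error: the article "the" is dropped.
    return [s.lower().replace("the ", "") for s in batch]


if __name__ == "__main__":
    print(build_ape_corpus(["The cat sleeps"], toy_b_to_a, toy_a_to_b))
    # [('cat sleeps', 'The cat sleeps')]
```

An APE model trained on such pairs sees the kind of errors the MT models introduce on its input side and natural text on its output side, which is why it can be placed behind an A-to-B system regardless of language A.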
Freitag et al. use large parallel corpora in the news domain to train neural MT engines in both directions (source to target and target to source), and News Crawl data as the monolingual data for round-trip translation. They observe no further improvement when training the APE engine on more than 24 million monolingual sentences.
They report results on the test sets released at the Third Conference on Machine Translation (WMT18). At WMT, the same test sets are used to evaluate translation from language A to B and from B to A. These test sets were built by manually translating half of the segments from A to B and the other half from B to A. Interestingly, the proposed APE engine improves the BLEU score significantly over the NMT output, but only on the target-language-original half of the test set: the half in which the NMT input is a human translation and the reference against which the NMT and APE outputs are compared is original target-language text. On the source-language-original half, the BLEU score actually decreases. In contrast, a manual evaluation shows that the APE engine improves adequacy and fluency over the NMT output on both halves of the test set. On the source-language-original half, the reference is itself a human translation, and this indicates that it is a less natural instance of the target language than original text. As a consequence, correcting unnatural aspects of the NMT output with the APE engine actually moves it away from this reference. This also calls for more care in the way test sets are built.
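Because the effect only shows up when the two halves are scored separately, it helps to see that split spelled out. The sketch below assumes a hypothetical per-segment record with an `origlang` field, the system outputs (`nmt`, `ape`) and the reference (`ref`). `corpus_bleu` is a simplified BLEU written for illustration (whitespace tokens, a single reference, no smoothing), not the scorer used in the paper.

```python
import math
from collections import Counter
from typing import Dict, List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Simplified corpus-level BLEU on a 0-100 scale (one reference, no smoothing)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clipped matches: a hypothesis n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


def bleu_by_original_language(test_set: List[Dict[str, str]], system: str) -> Dict[str, float]:
    """Score each original-language half of a WMT-style test set separately."""
    scores = {}
    for lang in sorted({item["origlang"] for item in test_set}):
        half = [item for item in test_set if item["origlang"] == lang]
        scores[lang] = corpus_bleu([item[system] for item in half], [item["ref"] for item in half])
    return scores


if __name__ == "__main__":
    # Toy data (hypothetical), only to show the per-half bookkeeping.
    test_set = [
        {"origlang": "de", "ref": "the cat sat on the mat",
         "nmt": "the cat sat on the mat", "ape": "a cat sat on a mat"},
        {"origlang": "en", "ref": "he reads a good book today",
         "nmt": "he reads a good book today", "ape": "he reads a good book today"},
    ]
    print(bleu_by_original_language(test_set, "ape"))  # {'de': 0.0, 'en': 100.0}
```

For reported numbers, a standard tool such as sacreBLEU is preferable, since tokenization and smoothing choices change the scores.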
The proposed text repair model, trained on round-trip translated text, is effective in improving translation quality according to a manual evaluation. It mainly improves fluency, since it is trained only on data in a single language. An advantage of this model is that it can be used for any language pair with the same target language. The main drawback is that it adds another step to the translation pipeline, which will have some impact on the overall speed of the end-to-end process.