Issue #51 - Takeaways from the 2019 Machine Translation Summit

Introduction

As you have probably seen across our website and channels, the Machine Translation Summit took place a few weeks ago on our doorstep in Dublin, Ireland. In addition to sponsoring, hosting the MT social, and presenting our own paper, we attended many talks and had a lot of great conversations. Since the conference, we have also spent some time reading up on some of the work we didn't get to catch. As expected, we came across many interesting pieces of research. In this post, we summarise three papers that we found particularly interesting!

Controlling the Reading Level of Machine Translation Output (Marchisio et al. (2019))

Depending on the target audience, we may want to change the complexity of the translation output. Marchisio et al. (2019) explore two approaches for controlling it.

In the first approach, they change the training data. They add a text token at the end of each source sentence to indicate the level of complexity required. They use only two levels of complexity: “simple” and “complex”.
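The tagging step can be sketched in a few lines. This is only an illustration of the idea described above: the tag strings `<simple>` and `<complex>` and the function name `tag_source` are our own assumptions, not the tokens or code used in the paper.

```python
# Hypothetical tag strings; the actual tokens used in the paper are not given here.
LEVEL_TAGS = {"simple": "<simple>", "complex": "<complex>"}


def tag_source(sentence: str, level: str) -> str:
    """Append the complexity tag for `level` to the end of a source sentence."""
    if level not in LEVEL_TAGS:
        raise ValueError(f"unknown complexity level: {level!r}")
    return f"{sentence.strip()} {LEVEL_TAGS[level]}"


if __name__ == "__main__":
    source = "Pero mis provocaciones están dirigidas a que se inicie una conversación."
    # The same source sentence is tagged differently depending on the desired output.
    print(tag_source(source, "simple"))
    print(tag_source(source, "complex"))
```

The same function would be applied to the training data (using the known level of each sentence pair) and at inference time (using the level the user asks for), so the model itself needs no architectural change.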

In the second approach, they change the underlying model. They train an encoder-decoder model with a shared encoder and two decoders—one for “complex” decoding, and the other for “simple” decoding. At inference time, they pass a flag indicating the desired complexity to select the right decoder.
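The control flow of this second approach might look roughly like the sketch below. The encoder and decoders here are toy string functions standing in for neural components, and the class name `MultiDecoderTranslator` is hypothetical; the point is only to show one shared encoder feeding a decoder chosen by the complexity flag.

```python
from typing import Callable, Dict, List

Encoding = List[str]  # stand-in for the encoder's hidden states


def shared_encoder(source: str) -> Encoding:
    """Toy encoder: a real system would return neural hidden states."""
    return source.lower().split()


def simple_decoder(encoding: Encoding) -> str:
    """Toy decoder standing in for the decoder trained on simple data."""
    return "[simple] " + " ".join(encoding)


def complex_decoder(encoding: Encoding) -> str:
    """Toy decoder standing in for the decoder trained on complex data."""
    return "[complex] " + " ".join(encoding)


class MultiDecoderTranslator:
    """One shared encoder, one decoder per complexity level."""

    def __init__(self, encoder: Callable[[str], Encoding],
                 decoders: Dict[str, Callable[[Encoding], str]]) -> None:
        self.encoder = encoder
        self.decoders = decoders

    def translate(self, source: str, level: str) -> str:
        if level not in self.decoders:
            raise ValueError(f"unknown complexity level: {level!r}")
        encoding = self.encoder(source)         # shared across levels
        return self.decoders[level](encoding)   # flag selects the decoder


translator = MultiDecoderTranslator(
    shared_encoder, {"simple": simple_decoder, "complex": complex_decoder}
)

if __name__ == "__main__":
    print(translator.translate("Hola mundo", "simple"))
    print(translator.translate("Hola mundo", "complex"))
```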

The results show that the complexity of the output can be controlled to some extent: with both approaches, the system was able to generate relatively simple and relatively complex outputs.

To illustrate, in this example from the paper the complex model chooses “initiate” where the simple model chooses “start”.

Source: Pero mis provocaciones están dirigidas a que se inicie una conversación.
Simple Translation: But my provocations are meant to start a conversation.
Complex Translation: But my provocations are directed to initiate a conversation.

They experimented with various ways of selecting simple and complex data and found that both approaches, in various settings, result in low overall BLEU scores. They also observed a tradeoff between the level of complexity we would like to model and the overall quality: BLEU decreases significantly when we try to model either extreme, whether too simple or too complex. Finally, they conducted human evaluations and concluded that BLEU penalizes such translations more heavily and may not reflect the true quality of the translations produced by such models.

Improving Neural Machine Translation Using Noisy Parallel Data through Distillation (Dakwale and Monz (2019))

Dakwale and Monz (2019) propose an approach for making use of noisy parallel data. In general, we filter noisy data so that only clean sentences are used for training, which gives better results than using all of the noisy data. However, Dakwale and Monz (2019) show that we can train a model on clean data and then use this clean model as a teacher to train our final model on “clean + all noisy data”. In this way, we can better utilize the noisy data, and the resulting model will be better than any of the following models:

1) trained on only clean parallel data

2) trained on clean data and all noisy data

3) trained on clean and filtered noisy data, where the filtering is performed by the dual conditional cross-entropy filtering method (Marcin Junczys-Dowmunt 2019)

4) trained on clean and back-translated noisy data
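To give a feel for how a teacher model can guide training on noisy data, here is a minimal sketch of a generic knowledge-distillation loss for a single target token. It is not taken from the paper: the interpolation weight `lam`, the function names, and the exact form of the loss are our own assumptions, based on the standard formulation in which the student is trained against both the (possibly noisy) reference token and the teacher's output distribution.

```python
import math
from typing import Sequence


def cross_entropy(target: Sequence[float], predicted: Sequence[float]) -> float:
    """H(target, predicted) = -sum_i target_i * log(predicted_i)."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


def distillation_loss(student: Sequence[float], teacher: Sequence[float],
                      reference_index: int, lam: float = 0.5) -> float:
    """Interpolate the usual loss on the reference token with a teacher term.

    student, teacher: probability distributions over the vocabulary.
    reference_index:  index of the (possibly noisy) reference token.
    lam:              assumed weight on the teacher term, between 0 and 1.
    """
    one_hot = [1.0 if i == reference_index else 0.0 for i in range(len(student))]
    reference_term = cross_entropy(one_hot, student)  # trust the data
    teacher_term = cross_entropy(teacher, student)    # trust the clean model
    return (1.0 - lam) * reference_term + lam * teacher_term


if __name__ == "__main__":
    student = [0.25, 0.75]
    teacher = [0.1, 0.9]  # the clean teacher prefers token 1
    # The noisy reference says token 0; a larger lam leans on the teacher instead.
    for lam in (0.0, 0.5, 1.0):
        print(lam, distillation_loss(student, teacher, reference_index=0, lam=lam))
```

With `lam = 0` this reduces to ordinary training on the noisy reference, and with `lam = 1` the student only imitates the teacher, so the weight controls how much a noisy target sentence is allowed to influence the model.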

Selecting Informative Context Sentence by Forced Back-Translation (Kimura et al. (2019))

In general, when we want to use a small amount of context, we take the previous and/or subsequent sentence. In their study, Kimura et al. (2019) conclude that even if we use only one sentence as context, the preceding sentence is not always the one that is most helpful for translating the current sentence. They find that selecting the most informative sentence from the preceding five and following five sentences brings improvements over baselines that use no context or only the immediately preceding or subsequent sentence. Although their proposed approach for informative sentence selection brings improvements over the baseline, it may not be feasible in practice, as it requires back-translation of the sentences in the context. A lightweight approach, e.g. based on some heuristics, may be a better alternative for informative context selection.
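The selection step itself is simple to sketch once a scoring function is available. In the example below, the window of five sentences on each side follows the description above, but the scoring function `word_overlap` is a stand-in heuristic of our own, not the forced back-translation score used in the paper; any function that scores a candidate context against the current sentence could be plugged in.

```python
from typing import Callable, List, Optional


def candidate_contexts(doc: List[str], i: int, window: int = 5) -> List[int]:
    """Indices of up to `window` sentences before and after sentence i."""
    lo = max(0, i - window)
    hi = min(len(doc), i + window + 1)
    return [j for j in range(lo, hi) if j != i]


def select_context(doc: List[str], i: int,
                   score: Callable[[str, str], float],
                   window: int = 5) -> Optional[int]:
    """Return the index of the highest-scoring context sentence, if any."""
    candidates = candidate_contexts(doc, i, window)
    if not candidates:
        return None
    return max(candidates, key=lambda j: score(doc[j], doc[i]))


def word_overlap(context: str, current: str) -> float:
    """Stand-in heuristic: Jaccard overlap between the two word sets."""
    a, b = set(context.lower().split()), set(current.lower().split())
    return len(a & b) / max(len(a | b), 1)


if __name__ == "__main__":
    doc = ["the cat sat", "dogs bark loudly", "the cat slept", "birds sing"]
    best = select_context(doc, 2, word_overlap)
    print(best, doc[best])
```

Here the most useful context for the third sentence is the first one rather than the immediately preceding one, which is the kind of situation the paper's findings point to.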

What's next?

To avoid repetition, in this post we deliberately skipped the four publications we covered in our previous post “Previewing the 2019 Machine Translation Summit”. We suggest you go through that post too, if you missed it. Even with these two posts, it is hard to detail all of the in-depth work covered in these papers and in the other interesting papers published at MT Summit 2019. Therefore, we highly recommend going through the published MT Summit 2019 proceedings!

Stay tuned for next week's post where Dr. Patrik Lambert will review interesting papers from another important research conference - ACL 2019.

Tags:

Language Weaver

Author

Dr. Rohit Gupta

Sr. Machine Translation Scientist
