The latest on MT with researcher Dimitar Shterionov
22 Oct 2020
At RWS Moravia, we love keeping up with cutting-edge language research, especially innovations related to machine translation. Maribel Rodríguez Molina, RWS Moravia Language Technology Development and Deployment Manager, sat down with Dimitar Shterionov, Assistant Professor at Tilburg University, formerly Assistant Professor at Dublin City University and a collaborator with the ADAPT Centre, to discuss neural post-editing, back-translation for improving MT performance and the hottest machine translation topics right now.
Can you tell us a bit about yourself and the focus of your research?
Until January of this year, I held a postdoctoral position at the ADAPT Centre at Dublin City University. I worked there as a Spokes post-doctoral researcher, which means that I collaborated on industry projects that came to the ADAPT Centre. So, if there was a company that wanted to conduct some cutting-edge research that would aid their services or bring new services to the market, I was one of the people at ADAPT involved in implementing the project, including conducting the necessary research, development and/or deployment. Between January 2020 and June 2020, I held an Assistant Professor position at DCU and was a collaborator with the ADAPT Centre. In August, I started a new position at Tilburg University.
My focus is on machine translation. One of the main areas I am looking into is the dependency between the data and the translation quality of an MT system: I want to investigate what kind of data we use for training the MT system and how this data affects its performance. I’m also looking into quality estimation and back-translation, and I have been interested in and worked on discourse in machine translation.
What do you mean by discourse in machine translation?
Basically, it’s about the consistency and coherence of documents that are being translated. Nowadays, machine translation systems operate on a sentence level, i.e. translating one sentence at a time. As such, you can lose information that is hidden in the context, and the coherence of the document can break down. We are looking into incorporating context into a (sentence-level) MT system; we are trying to figure out how to keep the coherence of the document without providing the complete document at translation time.
Currently, there seem to be many different research trends related to neural machine translation. Which trends do you think are the most significant, and why are they more significant than others?
One of the emerging trends in the MT world is synthetic data. Typically, a machine translation system is trained on a parallel corpus, which means that you have source data in, say, English and target data in German, the latter being the translation of the former. However, for some language pairs, there is not enough parallel data, making it impossible to train a new translation system.
Many languages are used in the same or neighbouring regions, like Amharic and Tigrinya in Ethiopia or the many languages of India, but there is not enough parallel data of sufficient quality available, and sometimes none at all, to train neural machine translation systems. To address this issue, researchers have been using available monolingual data to generate synthetic parallel corpora.
One such approach is back-translation, which I have been working on for some time now. It’s called back-translation because we take monolingual data that is available in the target language and (try to) translate it back into the source language via an existing, even if low-quality, MT system. The MT output is then used as the pseudo-source of a (synthetic) parallel corpus, and the monolingual data that we started with is used as the target.
Research has shown that a system trained on a combination of authentic and back-translated data—even if the back-translated data is of low quality—can have much higher translation performance than a system only trained on authentic data.
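To make the process concrete, here is a minimal sketch of how a back-translated corpus is assembled, assuming a generic translate_target_to_source function that stands in for whatever target-to-source MT system is available:

```python
# Minimal sketch of building a synthetic parallel corpus via back-translation.
# `translate_target_to_source` is a hypothetical stand-in for whatever
# target-to-source MT system is available (its quality may be modest).

def back_translate(monolingual_target_sentences, translate_target_to_source):
    """Return (pseudo_source, target) pairs usable as synthetic training data."""
    synthetic_corpus = []
    for target_sentence in monolingual_target_sentences:
        # The MT output becomes the pseudo-source side of the pair...
        pseudo_source = translate_target_to_source(target_sentence)
        # ...while the original monolingual sentence stays as the target.
        synthetic_corpus.append((pseudo_source, target_sentence))
    return synthetic_corpus

def build_training_corpus(authentic_pairs, monolingual_target, translate_target_to_source):
    # Mix authentic and synthetic pairs before training a new source-to-target system.
    return authentic_pairs + back_translate(monolingual_target, translate_target_to_source)
```

In practice the authentic and synthetic portions are often combined in different proportions; the sketch simply concatenates them.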
In our latest work, Selecting Backtranslated Data from Multiple Sources for Improved Neural Machine Translation, a collaboration between Xabier Soto (UPV/EHU), Alberto Poncelas, Andy Way (DCU/ADAPT) and myself, we investigated different systems for generating the back-translated data: rule-based, phrase-based statistical and neural (both RNN and Transformer) MT. Then, we invoked a state-of-the-art data selection algorithm, which we further optimized, to select a subset of the back-translated data from the different systems. We then used this optimized, reduced set of synthetic parallel data together with some authentic data to train new Transformer systems with reduced training effort and high translation performance. This article was presented at the ACL conference this year and is available at https://www.aclweb.org/anthology/2020.acl-main.359.pdf.
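The selection algorithm used in the paper is more sophisticated than anything shown here; purely as an illustration of the overall idea, the following sketch pools back-translated pairs produced by several MT systems and keeps only the highest-scoring ones under some scoring heuristic (all names are hypothetical):

```python
# Illustration only: pool back-translated pairs from several MT systems and
# keep a fixed-size subset according to a scoring heuristic. This is not the
# data selection algorithm used in the paper.

def select_backtranslated_data(bt_corpora_by_system, score_pair, budget):
    """bt_corpora_by_system: dict mapping a system name ('rbmt', 'pbsmt',
    'rnn', 'transformer') to its list of (pseudo_source, target) pairs.
    score_pair: heuristic estimating how useful a pair is for training.
    budget: total number of synthetic pairs to keep."""
    pooled = [
        (pair, system)
        for system, pairs in bt_corpora_by_system.items()
        for pair in pairs
    ]
    # Rank all candidates, regardless of which system produced them,
    # and keep only the top-scoring pairs up to the budget.
    pooled.sort(key=lambda item: score_pair(item[0]), reverse=True)
    return [pair for pair, _system in pooled[:budget]]
```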
We’ve been hearing about automatic post-editing (APE) for some time. Can you give us more details about APE?
You have sentences in language 1 and you machine-translate them into language 2. There will always be errors on the translation side, and automatic post-editing (APE) aims to reduce these errors without the intervention of a human post-editor. Broadly speaking, APE translates from language 2, which contains some errors, into language 2 again: it takes text containing errors and maps the incorrect sentences to correct sentences in the same language.
APE, most recently based on neural techniques (often referred to as neural post-editing, or NPE), aims to reduce systematic errors and relieve post-editors of having to deal with the same errors over and over again, allowing them to focus on the more important and more creative aspects of translation.
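To give a concrete picture of what an APE system learns from: training data is typically triples of source sentence, raw MT output and its human post-edited version. A minimal sketch of that layout (the example sentences and the train_ape_model helper are purely illustrative):

```python
# Purely illustrative: the shape of the data an APE system is trained on.
# Each example is a triple of source text, raw MT output and the human
# post-edited (corrected) version of that output.
ape_training_triples = [
    {
        "source": "Press the red button.",                 # language 1
        "mt_output": "Drücken Sie den roten Schalter.",    # language 2, with an error
        "post_edited": "Drücken Sie die rote Taste.",      # language 2, corrected
    },
    # ... many more triples, usually collected from past post-editing projects
]

def train_ape_model(triples):
    """Hypothetical helper: fit a sequence-to-sequence model that maps
    mt_output (optionally together with source) to post_edited text."""
    training_pairs = [(t["mt_output"], t["post_edited"]) for t in triples]
    # Model fitting with any seq2seq toolkit would go here; the sketch just
    # returns the extracted pairs so it runs end to end.
    return training_pairs
```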
How does automatic post-editing of statistical machine translation compare to using neural machine translation for both the translation and post-editing?
Interestingly, both SMT and NMT have advantages and disadvantages, but by combining these two technologies, we get the best of both worlds. Post-editing SMT output via a neural machine translation system to fix issues related to word order, fluency and so on has been shown to be very effective.
In one set of experiments we did in 2019, we saw a 40% improvement in terms of fluency. Of course, that doesn’t mean that we reached a level of human quality, and there are still problems to be fixed, but we actually corrected some annoying errors.
If you have SMT as your main system and NMT as the post-editing system, that leads to improvements over the initial SMT output. But if you combine two NMT systems, so that one NMT system generates the translation and an automatic post-editing system based on the same neural technology corrects it, the results are not as impressive, simply because both systems work in the same way. However, there are still a lot of companies using statistical machine translation that can employ some of these NMT approaches in their pipelines and improve their overall machine translation output without having to discard their SMT systems and start over from scratch.
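As a sketch of how such a combination looks at translation time, assuming placeholder smt_system and neural_ape objects (not a real toolkit API), the existing SMT engine stays in place and the neural model only post-edits its output:

```python
# Sketch of keeping an existing SMT system and adding a neural APE step on top,
# instead of discarding the SMT system and starting over. `smt_system` and
# `neural_ape` are placeholders, not references to a real toolkit API.

def translate_with_neural_post_editing(source_text, smt_system, neural_ape):
    draft = smt_system.translate(source_text)  # SMT draft: often adequate but disfluent
    return neural_ape.post_edit(draft)         # NPE pass fixes word order, fluency, etc.
```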
In fact, we wrote a paper on the topic of APE that was published in the Machine Translation Journal just last month. It came from a collaboration with the Microsoft team in Dublin in 2018-2019 and presents interesting insights on APE drawn from real-world use cases, including the experiments I mentioned earlier. The title is "A roadmap to neural automatic post-editing: an empirical approach" and you can find it here. Stay tuned, because another paper on APE is in press and we expect it to appear in the MT Journal any time now.
How do you see the neural machine translation landscape evolving in the next five years?
One direction is multilingual MT; that is, combining multiple languages in one neural machine translation system, so that a single NMT system is able to provide translations in any language the user may request. That will probably happen, because we want to incorporate knowledge from similar languages to improve translation into other languages.
This is a very interesting direction that has potential, especially around providing high-quality translation systems for low-resource language pairs or for emergency situations where you don’t have time or data.
Something else I discussed with a former colleague from the ADAPT Centre some months ago is highly personalized MT systems that can translate on a very personal level between two people. For example, if you and I were talking in our native languages, we would use a very specialized system that learns from these conversations and gains knowledge that is specific to each interlocutor, and it would translate very well while we are speaking.
Is there anything else that you would like to add?
In terms of machine translation, there are so many things going on nowadays: quality estimation, automatic post-editing, new technology trends, advanced pre-trained translation and language models (BERT, XLM, GPT-3) and so on. One challenging problem that is still very difficult to address is (machine) translation of terminology, because you have to make a system correctly translate terms, such as brand names, that are specific to each company. For example, an automotive company will use one term for something while another company uses a completely different term for the same thing, so we need to make sure that the MT systems they use recognize these differences and handle them correctly.
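One common practical approach to this problem, offered here only as an illustration and not as something Dimitar prescribes, is to protect client-specific terms with placeholders before translation and restore the approved target terms afterwards. A minimal sketch with an invented termbase and a generic mt_translate function:

```python
# Generic placeholder-based terminology handling. The termbase entries are
# invented for illustration and `mt_translate` stands in for any MT system.

termbase = {
    "PowerDrive": "PowerDrive",               # brand name: must stay untranslated
    "torque wrench": "Drehmomentschlüssel",   # client-approved target term
}

def translate_with_terminology(source_text, mt_translate, termbase):
    placeholders = {}
    protected = source_text
    # Replace each known term with a placeholder token the MT system should copy through.
    for i, (src_term, tgt_term) in enumerate(termbase.items()):
        token = f"__TERM{i}__"
        if src_term in protected:
            protected = protected.replace(src_term, token)
            placeholders[token] = tgt_term
    translated = mt_translate(protected)
    # Restore the approved target-language terms in the MT output.
    for token, tgt_term in placeholders.items():
        translated = translated.replace(token, tgt_term)
    return translated
```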
Thanks for your time!
You’re very welcome!