Issue #40 - Consistency by Agreement in Zero-shot Neural MT
Introduction
In two of our earlier posts (Issues #6 and #37), we discussed the zero-shot approach to Neural MT - learning to translate from source to target without seeing a single example of the language pair directly. In Neural MT, zero-shot training is achieved using a multilingual architecture (Johnson et al., 2017) - a single NMT engine that can translate between multiple languages. The multilingual neural model is trained on several language directions at once by concatenating the parallel corpora of the various language pairs.
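To make this concrete, here is a minimal sketch of that data preparation in Python, assuming the target-language-token convention of Johnson et al. (2017); the corpora, sentences, and function names are toy placeholders rather than the actual training setup:

```python
# Minimal sketch (not the authors' code): multilingual training data is built
# by prepending a token naming the target language to each source sentence
# and concatenating all language pairs into a single corpus.

def tag_source(src_sentence: str, tgt_lang: str) -> str:
    """Prepend a target-language token, e.g. '<2es> Hello world'."""
    return f"<2{tgt_lang}> {src_sentence}"

# Toy placeholder corpora for the supervised directions.
parallel_corpora = {
    ("en", "es"): [("Hello world", "Hola mundo")],
    ("en", "fr"): [("Hello world", "Bonjour le monde")],
    ("en", "ru"): [("Hello world", "Привет, мир")],
}

multilingual_corpus = []
for (src_lang, tgt_lang), pairs in parallel_corpora.items():
    for src, tgt in pairs:
        # Both directions are typically added, so the model also learns X->En.
        multilingual_corpus.append((tag_source(src, tgt_lang), tgt))
        multilingual_corpus.append((tag_source(tgt, src_lang), src))

for tagged_src, tgt in multilingual_corpus:
    print(tagged_src, "->", tgt)
```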
In this post, we focus on the generalisation issue of zero-shot neural MT and discuss a new training method proposed by Al-Shedivat and Parikh (2019).
Zero-shot consistency
A neural MT engine is said to be ‘zero-shot consistent’ if low error on supervised tasks implies low error on zero-shot tasks, i.e. the system generalises. In general, it is preferable to have a translation system that exhibits zero-shot generalisation, as access to parallel data is always limited and training is computationally expensive.
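One way to make this notion precise -- the notation below is ours, in the spirit of the paper's formal definition, not a quote from it -- is to require that the zero-shot error be bounded by a function of the supervised error that vanishes as the latter does:

```latex
% eps_sup / eps_zs: expected errors on the supervised and zero-shot
% directions for a model with parameters theta.
\epsilon_{\mathrm{sup}}(\theta) \le \varepsilon
  \;\Longrightarrow\;
\epsilon_{\mathrm{zs}}(\theta) \le \kappa(\varepsilon),
\qquad \text{where } \kappa(\varepsilon) \to 0 \text{ as } \varepsilon \to 0 .
```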
To achieve zero-shot consistency in Neural MT, Al-Shedivat and Parikh proposed a new training objective for multilingual NMT called ‘agreement-based likelihood’ that avoids the limitations of pure composite likelihoods. The idea of agreement-based learning was originally proposed for learning consistent word alignments (Liang et al., 2006) in phrase-based statistical machine translation (SMT).
Agreement-based likelihood
Rather than jumping into the full details of the objective function, for simplicity, let’s consider a multilingual NMT model of 4 languages -- English (En), Spanish (Es), French (Fr), and Russian (Ru) -- where we have parallel corpora available for En-Es, En-Fr, and En-Ru. Intuitively, the objective is the likelihood of observing the parallel sentences (XEn, XFr) while having the sub-models that translate into the auxiliary languages -- P(XEs|XEn) and P(XEs|XFr), and likewise for Ru -- agree, i.e. assign high probability to each other’s translations on the zero-shot directions.
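In slightly simplified notation (ours; the paper's exact formulation also involves a Jensen lower bound and approximates the expectations by decoding from the model), the objective for a single parallel pair and the auxiliary language Es can be sketched as:

```latex
% Simplified sketch of the agreement-based likelihood for (x_En, x_Fr) with
% auxiliary language Es; an analogous pair of agreement terms is added for Ru.
\mathcal{L}_{\mathrm{agree}}(\theta) =
    \underbrace{\log P_\theta(x_{Fr} \mid x_{En})
      + \log P_\theta(x_{En} \mid x_{Fr})}_{\text{supervised likelihood}}
  + \underbrace{\mathbb{E}_{z_{Es} \sim P_\theta(\cdot \mid x_{En})}
      \big[\log P_\theta(z_{Es} \mid x_{Fr})\big]
    + \mathbb{E}_{z_{Es} \sim P_\theta(\cdot \mid x_{Fr})}
      \big[\log P_\theta(z_{Es} \mid x_{En})\big]}_{\text{agreement on zero-shot directions}}
```

Since the expectations range over translations sampled from the model itself, maximising the agreement terms pushes the En-Es and Fr-Es sub-models to assign high probability to each other's translations - precisely the zero-shot behaviour we want.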
Does it work?
Al-Shedivat and Parikh experimented using the UN corpus for En, Es, Fr, and Ru; Europarl v7 for German (De), En, Es, and Fr; and IWSLT17 for Italian (It), Dutch (Nl), Romanian (Ro), De, and En. They focus their evaluation mainly on the zero-shot performance of the following methods:
- Basic, which stands for directly evaluating a multilingual model after standard training (Johnson et al., 2017).
- Pivot, which performs pivoting-based inference using a multilingual model (after standard training); pivoting is often regarded as the gold standard. A sketch contrasting pivoting with direct zero-shot decoding follows this list.
- Agree, which applies a multilingual model trained with the agreement objective directly to the zero-shot directions.
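As a concrete illustration, here is a minimal Python sketch contrasting the two inference strategies for the zero-shot direction Es-Fr; the StubModel class, its decode method, and the <2xx> tokens are placeholder assumptions, not the paper's actual API:

```python
# Sketch of direct zero-shot decoding ('Basic'/'Agree') versus pivoting-based
# inference ('Pivot') with a single multilingual model.

class StubModel:
    """Stand-in for a trained multilingual NMT model (placeholder only)."""

    def decode(self, tagged_source: str) -> str:
        return f"[translation of: {tagged_source}]"

def translate(model: StubModel, sentence: str, tgt_lang: str) -> str:
    # Prepend the target-language token and run one decoding pass.
    return model.decode(f"<2{tgt_lang}> {sentence}")

def direct_zero_shot(model: StubModel, sentence: str) -> str:
    # Basic / Agree: decode the unseen pair Es->Fr in a single pass.
    return translate(model, sentence, tgt_lang="fr")

def pivot_inference(model: StubModel, sentence: str) -> str:
    # Pivot: route through a supervised language (En): Es->En, then En->Fr.
    # Two decoding passes, so it is slower and first-pass errors propagate.
    english = translate(model, sentence, tgt_lang="en")
    return translate(model, english, tgt_lang="fr")

model = StubModel()
print(direct_zero_shot(model, "Hola mundo"))
print(pivot_inference(model, "Hola mundo"))
```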