Issue #74 - Transfer Learning for Neural Machine Translation
Building machine translation (MT) for low-resource languages is a challenging task. This is especially true for neural MT (NMT) methods, which require a comparatively large corpus of parallel data. In this post, we review the work done by Zoph et al. (2016) on training NMT systems for low-resource languages using transfer learning.
The idea of transfer learning is that we can take advantage of an already trained model to improve performance on a related task. Knowledge is transferred from a pre-trained model to a new model. The two models are not necessarily built for the same machine learning task, although in most applications they are, and the main goal is then to reduce the amount of training data needed to train a well-performing model. In NMT, transfer learning means training a model (the child) on a small parallel corpus, e.g. Uyghur-to-English, using the parameters of a pre-trained model (the parent), e.g. Turkish-to-English, as the starting point instead of a random initialization. The idea is that the knowledge captured in the Turkish-to-English model can be transferred to the newly trained Uyghur-to-English model. The main benefit of this learning strategy is that the new model can be trained on much less data than the pre-trained model, and training is relatively faster. It should also be noted that there must be some connection between the two models. In NMT, the connection is that all the languages involved, i.e. Uyghur, Turkish and English, share the same embedding space.
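The parent-to-child initialization described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the implementation used by Zoph et al. (2016): the parameter names (`source_embedding`, `encoder`, `decoder`, `target_embedding`), the helper functions, the sizes, and the placeholder gradient are all hypothetical, and the "parent" here is just randomly initialized numbers standing in for a trained Turkish-to-English model.

```python
import random


def init_params(shapes, seed):
    """Randomly initialise a {name: flat list of floats} parameter dict."""
    rng = random.Random(seed)
    return {
        name: [rng.uniform(-0.1, 0.1) for _ in range(size)]
        for name, size in shapes.items()
    }


def init_child_from_parent(parent, child):
    """Overwrite child parameters with copies of matching parent parameters.

    A parameter is transferred only if the child has one with the same
    name and the same size. Returns the names that were transferred.
    """
    transferred = []
    for name, values in parent.items():
        if name in child and len(child[name]) == len(values):
            child[name] = list(values)  # copy, so the parent stays untouched
            transferred.append(name)
    return transferred


def sgd_step(params, grads, lr=0.1):
    """Apply one plain gradient-descent update to the listed parameters."""
    for name, grad in grads.items():
        params[name] = [p - lr * g for p, g in zip(params[name], grad)]


# Hypothetical parameter groups and sizes, chosen only for illustration.
shapes = {"source_embedding": 8, "encoder": 6, "decoder": 6, "target_embedding": 8}

parent = init_params(shapes, seed=0)  # stands in for a trained parent model
child = init_params(shapes, seed=1)   # a freshly initialised child model

# Transfer: the child starts from the parent's parameters, not from random ones.
transferred = init_child_from_parent(parent, child)

# Fine-tuning: updates computed on the small child corpus change only the child.
# The gradient below is a placeholder, not a real training signal.
sgd_step(child, {"encoder": [1.0] * 6})
```

After the transfer step the child holds the parent's values, and each fine-tuning step on the child's data then moves it away from that starting point while leaving the parent unchanged.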
Experiments and Results
The first question we might ask is: “Does transfer learning really work in NMT?” Zoph et al. (2016) used French-to-English as the parent model, with Hausa, Turkish, Uzbek, and Urdu as the child source languages. The results show that transfer learning improves BLEU scores by around 6 to 9 points compared to training the models without it. The effect of using different parents was also tested. In this experiment, French and German (to-English) were used as parent languages, and Spanish was the child language. The results show that the Spanish-to-English model trained from the French parent performed better than the one trained from the German parent. This suggests that using a language closer to the child language as the parent is more beneficial to the resulting child model.
In this post we briefly reviewed the idea proposed by Zoph et al. (2016) of applying transfer learning to NMT. This was the first work to use transfer learning in NMT, and it demonstrated that the idea works. There are several other good sources if you are interested in this topic. Nguyen and Chiang (2017) reported training NMT models using transfer learning between related languages, namely the Turkic languages Turkish, Uzbek and Uyghur. Neubig and Hu (2018) proposed a creative approach for rapidly adapting “massively multilingual seed models” to new languages, which is suitable for fast training of NMT models for low-resource language pairs. Their seed models are trained on a 58-language-to-English TED corpus. We'll be closely watching the development of this line of research, as it enables the training of well-performing NMT models for low-resource languages.