The Future of Translation Quality Assessment: An Interview with Dr. Joss Moorkens

Lee Densmer 12 Jun 2020

In this world of growing online consumer engagement with products, there are greater volumes of content to be translated than ever before. Consumers are demanding more and more product descriptions, reviews, online help and even social media content, and corporations would be wise to provide it all in multiple languages. To handle this much content at the required speed, machine translation (MT) often comes into play.

But is the quality good enough? Raw MT might be acceptable, or post-editing might be needed to meet translation quality requirements. Given the variations in quality that result from these different processes, quality is awfully hard to define and assess.

Maribel Rodríguez, Language Technology Deployment Manager at RWS Moravia, talked to Dr. Joss Moorkens to find out how his research is helping address these pressing questions. Joss is an Assistant Professor at the School of Applied Language and Intercultural Studies at Dublin City University and a researcher at the ADAPT Centre and the Centre for Translation and Textual Studies. He has authored over 40 journal articles and book chapters on translation technology, post-editing of machine translation, user evaluation of machine translation and translation technology standards.

The nature of quality and quality evaluation in this era of MT is one of the topics we discussed with him.

MARIBEL: Could you explain the focus of your research?

JOSS: My focus is translation technology and how humans work with machine translation. We’ve worked on creating different types of interfaces: a mobile interface, one that uses touch and voice, and ones focused on accessibility. Other areas of focus are post-editing processes, translation process research and translation quality assessment.

MARIBEL: What are the big trends you’re seeing in translation quality assessment?

JOSS: Over the last few years, there’s been a change in how translation quality is assessed in that it’s now calibrated per customer and per project. There seems to be an increasing trend towards “good enough” quality for certain purposes and for more perishable content. Metrics such as the Multidimensional Quality Metrics (MQM) and the Dynamic Quality Framework (DQF) have become more widely available; they allow a certain amount of tailoring and can be used for a wide variety of purposes. In the past, research explored many different annotation metrics for machine translation; now, a subset of these larger metrics (MQM and DQF) can be chosen for machine or human translation.
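
To make that concrete, here is a minimal sketch of how an MQM-style score might be computed from error annotations: errors are weighted by severity and normalized per 100 words. The categories, severity weights and pass threshold below are illustrative assumptions, not part of either framework; in practice, the error typology and weights are agreed per customer or project.

```python
from dataclasses import dataclass

# Illustrative severity multipliers (assumed; real weights are agreed per project)
SEVERITY_WEIGHTS = {"minor": 1.0, "major": 5.0, "critical": 10.0}

@dataclass
class ErrorAnnotation:
    category: str   # e.g. "accuracy/mistranslation", "fluency/grammar"
    severity: str   # "minor", "major" or "critical"

def mqm_style_score(errors: list[ErrorAnnotation], word_count: int) -> float:
    """Return weighted errors per 100 words (lower is better)."""
    penalty = sum(SEVERITY_WEIGHTS[e.severity] for e in errors)
    return 100.0 * penalty / word_count

# Example: two minor fluency errors and one major accuracy error in a 250-word job
errors = [
    ErrorAnnotation("fluency/grammar", "minor"),
    ErrorAnnotation("fluency/spelling", "minor"),
    ErrorAnnotation("accuracy/mistranslation", "major"),
]
score = mqm_style_score(errors, word_count=250)
print(f"Weighted error score: {score:.2f} per 100 words")
# A project-specific threshold (assumed here) decides pass/fail:
print("PASS" if score <= 5.0 else "FAIL")
```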

JOSS: And how about you, Maribel? What is your experience regarding these changes in translation quality?

MARIBEL: I’ve been working in localization for nearly 17 years now, and when I started, there was this sort of “one size fits all” approach to linguistic quality using the LISA standards. It didn’t matter if there was a very demanding customer or if someone just needed something perishable for gisting purposes, for example. The metrics we used and the approach to human review were always the same, whereas now, everything is customized to the customer and, in many cases, at the project level. We have many different quality approaches depending on the nature of the content, with different sets of parameters.

JOSS: The other thing that’s changed is that the combination of improved machine translation quality with financial imperatives has pushed machine translation into more use cases than we’ve seen previously. For example, raw machine translation is being tested for user interfaces. In some cases, a light review might be permissible rather than full post-editing. I am not sure this is a good thing, but this seems to be a trend at present.

MARIBEL: Is there an agreed-upon industry or academic definition as to what translation quality actually means?

JOSS: No! Translation quality is usually agreed upon at the project level. We try to look for ways to measure quality that might parallel human judgement, but the answer to so many things in translation is: it depends. The expected quality for a literary novel is not going to be the same as for a TripAdvisor review. There are lots of variables that will change the expectation of quality and the translation process. Many things will be tailored to the value placed on the content and the amount of money available, so I don’t think it’s possible to have a single definition of what quality is.

MARIBEL: What are the key challenges of assessing translation quality in the current environment, where the lines between human translation and machine translation are getting more blurred?

JOSS: The main challenge is trying to have predictive quality measurements, or confidence estimations, for machine translation. It’s a real problem when machine translation is priced based on a previous job, and then something is back-translated and the quality is massively different. Often, it’s not possible to price the amount of post-editing effort accurately. So, best practice would be to price retroactively for time, but a lot of language service providers are not comfortable doing that.
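
As a rough illustration of the retrospective measurement Joss describes, here is a small sketch (not from the interview) that estimates post-editing effort as a token-level edit distance between the raw MT output and the post-edited text, in the spirit of HTER. A pricing model could draw on a measure like this, or on logged editing time, rather than on a rate fixed before the MT quality is known.

```python
def token_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over tokens (insertions, deletions, substitutions)."""
    dp = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        prev, dp[0] = dp[0], i
        for j, tok_b in enumerate(b, start=1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                 # delete a token from the MT output
                        dp[j - 1] + 1,             # insert a token from the post-edit
                        prev + (tok_a != tok_b))   # substitute (free if tokens match)
            prev = cur
    return dp[-1]

def hter_like_effort(mt_output: str, post_edited: str) -> float:
    """Edits per post-edited token, in the spirit of HTER (0.0 = no editing)."""
    mt, pe = mt_output.split(), post_edited.split()
    return token_edit_distance(mt, pe) / max(len(pe), 1)

mt = "the contract are signed by both part"
pe = "the contract is signed by both parties"
print(f"Estimated post-editing effort: {hter_like_effort(mt, pe):.2f}")
# -> 0.29: roughly a third of the tokens needed editing
```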

MARIBEL: I would be interested in hearing about your book. Who is it for and what is it about?

JOSS: Four of us co-edited a book: myself; Sheila Castilho; Federico Gaspari, a post-doc here at the ADAPT Centre and a lecturer in Reggio Calabria in the south of Italy; and Stephen Doherty, a Dublin City University and ADAPT Centre alumnus now at the University of New South Wales.

It’s called Translation Quality Assessment: From Principles to Practice. It’s for practitioners within the industry as well as researchers. We review current approaches to human and machine translation quality assessment.

There’s a chapter about quality management principles and practices within the European Union institutions, written by Joanna Drugan from the University of East Anglia together with lawyer-linguists from the European Commission. It’s a detailed description of the gold standard of translation quality assessment within probably the largest translation agency in the world.

There are sections about education and training, crowdsourcing and translation quality, and applications of translation quality assessment, including the MQM metrics for standardized error typologies. Andy Way wrote a chapter about quality expectations and machine translation and the increasingly diverse uses for MT. We have a chapter about MT post-editing for academic writing support: academics whose first language is not English are at a disadvantage because so much scientific material must be published in English, so the chapter tests MT plus self-post-editing for those sorts of academic articles. Finally, there is a chapter by Antonio Toral from the University of Groningen about the level of quality that Neural Machine Translation (NMT) can attain on literary texts.

MARIBEL: If you had to put on your futuristic hat, where do you think translation quality assessment is heading?

JOSS: Confidence estimation will be key. Machine translation is going to be involved in more and more translation workflows as time goes on, so it’s important to think about how we introduce it. It could be that post-editing is not the best method. A couple of translators in Ireland whom I spoke to said that they prefer to use MT as a starting point, to give them ideas for how they might translate a segment. They said that it increased their speed, but how an employer would price that is difficult.

With the interactive MT method used by some localization tools, the move from statistical MT to neural MT hasn’t brought the increase in throughput we expected based on the reduced number of keystrokes required and the gains in other quality measures, particularly fluency. So, figuring out the best ways to introduce MT into workflows, measure quality and make sure that NMT errors don’t reach the final output will become a major focus over the next five years or so. In addition, we’ll be trying to encourage a sustainable balance between long-term benefits for all translation stakeholders and short-term aims to eliminate waste and excess cost within the production process.
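
As an illustration of the kind of process measures Joss refers to, the sketch below computes session throughput and keystroke effort from segment-level editing logs. The log format and field names are hypothetical, invented for this example; real translation process research tools capture much richer data.

```python
from dataclasses import dataclass

@dataclass
class SegmentLog:
    """One post-edited segment from a (hypothetical) editor session log."""
    source_words: int
    keystrokes: int
    edit_seconds: float

def throughput_wph(logs: list[SegmentLog]) -> float:
    """Source words processed per hour across the session."""
    words = sum(s.source_words for s in logs)
    hours = sum(s.edit_seconds for s in logs) / 3600.0
    return words / hours if hours else 0.0

def keystrokes_per_word(logs: list[SegmentLog]) -> float:
    """Average typing effort: keystrokes per source word."""
    words = sum(s.source_words for s in logs)
    return sum(s.keystrokes for s in logs) / words if words else 0.0

# Hypothetical session: fewer keystrokes need not mean higher throughput,
# since reading and evaluating MT suggestions also takes time.
session = [SegmentLog(12, 40, 55.0), SegmentLog(8, 5, 30.0), SegmentLog(15, 0, 48.0)]
print(f"{throughput_wph(session):.0f} words/hour, "
      f"{keystrokes_per_word(session):.2f} keystrokes/word")
```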

MARIBEL: Thanks Joss!

JOSS: My pleasure.

If you are interested in understanding how machine translation output can be measured and how human processes fit in, don’t hesitate to reach out to us. Our dedicated MT team loves to talk about this stuff.

Lee Densmer

Lee Densmer has been in the localization industry since 2001, starting as a project manager and moving up into solutions architecture and marketing management. Like many localization professionals, she entered the field through an interest and education in languages. She holds a master’s in linguistics from the University of Colorado. Lee lives in Idaho and enjoys foreign travel and exploring the mountains of the region.