Imagine translating a story about a dog. You’re working in a translation management system on a source sentence: “The black dog was almost hit while running across the street.” Now suppose your translation memory (TM) has a stored translation for a similar sentence: “The brown dog was hit while running across the street.” This is what is called a fuzzy match. Close enough: you just have to change “brown” to “black” and add the (very important) “almost.”
But what if a machine translation (MT) engine offers you a perfect translation for the sentence, save for slightly awkward phrasing: “While running across the street, the black dog was almost struck”?
Do you use the translation from the TM, which is human-generated but requires editing, or the translation that’s almost perfect but came from a machine?
What seems like six of one, half a dozen of the other isn’t an easy call. Once upon a time, the choice between TM and MT was simple: take the path requiring the least human interference, which generally was TM because MT quality was so poor. But it’s getting tougher for translators to decide. With advances in machine learning, we’ve finally reached a point where MT-generated translations are competing with low fuzzy matches.
So, does that make TMs obsolete?
Not at all. Not yet, at least. But it is time for us to rethink the industry’s long-established norms around fuzzy matching that have persisted for over two decades.
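To make the opening example concrete, here is a minimal sketch of how a fuzzy match score could be computed for the two dog sentences. It assumes a simple word-level comparison using `difflib.SequenceMatcher` from Python's standard library. Commercial TM tools use their own matching algorithms, so the helper names `tokenize` and `fuzzy_score` are hypothetical and the number this produces will not necessarily agree with any particular tool.

```python
import re
from difflib import SequenceMatcher


def tokenize(sentence: str) -> list[str]:
    """Lowercase the sentence and split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", sentence.lower())


def fuzzy_score(source: str, tm_source: str) -> float:
    """Return a 0..1 word-level similarity between a new source and a TM source.

    Illustrative only: this is SequenceMatcher's ratio over word tokens,
    not the algorithm of any specific TM product.
    """
    return SequenceMatcher(None, tokenize(source), tokenize(tm_source)).ratio()


new_source = "The black dog was almost hit while running across the street."
stored_source = "The brown dog was hit while running across the street."

score = fuzzy_score(new_source, stored_source)
print(f"Fuzzy match: {score:.0%}")
```

Comparing words rather than characters is a design choice here: it treats swapping “brown” for “black” as one edit, which is closer to how a translator would count the work involved.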
The translator’s choice: fuzzy match or MT result?
Traditionally, the low-end threshold for a fuzzy match was 70-75%. Up until recently, we had little reason to question it: a fuzzy match of 70-75% was the clear winner before MT made the leap to neural. And even now that MT results have generally improved, we’ve yet to see an academic paper or large body of research across major languages that proves MT has surpassed TM.
But we do have anecdotal evidence that this could be true. Earlier this year, TAUS released a paper showing (based on their own data) that in Romance languages, at least, anything below an 85% match is potentially better handled by machine translation than by translation memory.
You might say it makes sense, then, to raise the bar to at least 85%. But there are a few problems with this:
- Language is so flexible and language/content type permutations are so endless that nobody can say for certain that an 85% threshold would win in every possible use case. There will be different standards for legal content in French, different rules for technical content in Russian, and so on.
- On top of that, you have more variables: different MT engines are good for different use cases, and different algorithms behind TM matching (there is no one “standard” algorithm) can all result in different levels of match. Again, the possibilities are infinite.
- Even if we did figure out that 85% represents a better threshold for fuzzy matching across the board, nobody is going to come out and say that anything below 85% is always best translated by MT. It’s too risky to apply a catch-all rule.
And so, we remain at an interesting point of experimentation. When fuzzies get down to low-ball matches of around 70%, translators are faced with using their best judgment to decide whether it’s worth accepting the fuzzy match (and editing it to reflect the full source meaning) or accepting an MT-generated translation of the full source meaning (that might need edits, albeit of a different type, for accuracy and/or fluency).
What’s the best use of their time? There is no right answer.
The fact that we even have to wonder which is better marks a new crossroads for the translation industry. It’s not unlike the industrial age. The Wright brothers, for example, had to crash dozens of prototypes of their plane before they finally got one off the ground. Similarly, because it’s not possible for us to account for every permutation or variable, we try things, learn from our mistakes, and as our experience (and MT) evolves, we’ll figure it out.
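The choice described in this section can be sketched as a routing rule with a threshold that varies by language and content type rather than one catch-all number. This is only an illustration under stated assumptions: the names `THRESHOLDS`, `Suggestion` and `pick_suggestion` are hypothetical, and the threshold values are made up for the example (the 0.85 default simply echoes the figure discussed above). In practice the translator's judgment still decides.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values for illustration only, not recommendations.
DEFAULT_THRESHOLD = 0.85
THRESHOLDS = {
    ("fr", "legal"): 0.90,
    ("ru", "technical"): 0.80,
}


@dataclass
class Suggestion:
    origin: str  # "TM" or "MT"
    text: str


def pick_suggestion(
    fuzzy_score: float,
    tm_target: Optional[str],
    mt_target: str,
    language: str,
    content_type: str,
) -> Suggestion:
    """Offer the TM match if it clears the threshold, otherwise the MT output."""
    threshold = THRESHOLDS.get((language, content_type), DEFAULT_THRESHOLD)
    if tm_target is not None and fuzzy_score >= threshold:
        return Suggestion("TM", tm_target)
    return Suggestion("MT", mt_target)


# A 72% match on French legal content falls below the 0.90 set for that pair,
# so the MT output is offered first.
choice = pick_suggestion(0.72, "<stored TM translation>", "<MT output>", "fr", "legal")
print(choice.origin)
```

Keeping the thresholds in a lookup table makes it cheap to experiment: each language and content type pair can be adjusted as post-editing data comes in, without touching the routing logic.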
The tipping point
But if we can’t account for every possibility, how will we know at what point machine translation overtakes TM?
The shift will be gradual, but for now it will be a question of how “quality” is defined.
Today, whether machine translation (with or without post-editing) is of better or worse quality than a fuzzy match of 85% (with or without revisions by the translator) is entirely in the eye of the beholder. But one day, the technology itself will be able to guide us.
Here we get into something called quality estimation (QE), where neural MT can begin to evaluate the quality of its own output. Instead of just giving you a machine-generated translation to take or leave, the machine will get intelligent enough to tell you: no, this isn’t a perfect translation, but I can point out the location of potential errors for you and provide options to fix them. It will not only self-assess but self-diagnose. As time goes on, with enough experience, it will even fix some errors for you based on your previous choices. We already see this now to a degree with adaptive MT.
Once MT can reliably recognize its own mistakes and offer the human editor viable ways to fix them, that will be the point at which it overtakes TM in certain use cases. And we’re not looking too far into the future, either. With so much investment from big tech, MT could reach that point sooner than we think.
But don’t get us wrong—we can still say with a high degree of certainty that humans won’t be replaced by machines. It’s just that we’ll move from what we used to call computer-assisted translation to human-assisted translation. The machine will take the first pass—at least for content with lower emotional weight, but maybe eventually for higher-weighted content as well—before the human cleans it up and the MT engine’s performance is judged.
Then, of course, we have to think about what this shift in dynamics means for translators. If the machine takes the first pass, then the human becomes more editor than translator. Does this make translators worse off financially because less work is required? What about the hyper-specialized translators who translate highly branded marketing content, to whom using MT or TM is just a hassle that takes away from the creative process of translation? Can we force MT on those guys?
It all leads back to the reason we use fuzzy matching in the first place: it’s still human. When MT can get to the deeper meaning and nuance behind text, when it can understand sentence flow and figure out different styles of writing across languages…who knows where it will take us?
What do you think? Should TM thresholds go up to offset advancements in MT? Drop your thoughts in the comments here or contact us to geek out about it.
Thanks to Jon Ritzdorf, Solution Architect at RWS Moravia and professor at Middlebury Institute of International Studies (MIIS) and University of Maryland (UMD), for informing this post. This year, he’ll be running experiments with his UMD students on the MT-fuzzy match matchup to compare data on post-editing efforts. Stay tuned for his results!