Machine Translation Pre-Editing To Boost Output Quality

If you use Machine Translation (MT), you know how human intervention can make all the difference in achieving the right quality level, and chances are, you’ve used either light- or full post-editing as described here.

Yet post-editing is not the only human intervention that you can implement. There is a new kid in town: pre-editing.

What Is Pre-Editing?

The pre-editing process revises technical documentation before it goes through MT. The idea is simple: improve the source to improve the raw output quality. Good pre-editing will reduce or even eliminate the post-editing workload.

As with post-editing, the resource is ideally a specialized editor who can analyze a text block from the perspective of an MT engine and anticipate the potential output errors. The pre-editor will edit to facilitate MT by reducing sentence length, avoiding complex or ambiguous syntactic structures, ensuring term consistency, and using articles.

The editor should also run automated revision tools, such as a spell-checker that validates the source text against a project-specific glossary, and advanced grammar checkers. In addition, he or she can tag elements in the source document that are not to be translated.
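One common way to tag do-not-translate elements is to swap them for placeholders before MT and put them back afterwards. Here is a minimal sketch of that idea. The term list, the `__DNT0__` placeholder format, and the function names are all made up for illustration, and it assumes your MT engine passes the placeholders through untouched, which you would need to verify (many engines offer their own tagging mechanism instead).

```python
# Hypothetical do-not-translate list, e.g. product names and UI labels.
DNT_TERMS = ["AcmeSync", "Control Panel"]

def protect(text, terms):
    """Replace each do-not-translate term with a numbered placeholder."""
    mapping = {}
    for i, term in enumerate(terms):
        token = f"__DNT{i}__"  # hypothetical placeholder format
        if term in text:
            text = text.replace(term, token)
            mapping[token] = term
    return text, mapping

def restore(text, mapping):
    """Put the original terms back into the MT output."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text

source = "Open AcmeSync from the Control Panel."
masked, mapping = protect(source, DNT_TERMS)
print(masked)                    # text to send to MT
print(restore(masked, mapping))  # what restoring looks like after MT
```

In practice you would call `restore` on the translated text, not on the masked source; the last line only shows that the round trip is lossless.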

These pre-editing techniques are valuable for human translation projects as well. Many organizations that develop vast mono- and multilingual material include similar processes in their localization best practices. Writing this way from the get-go offers many positive downstream effects on overall quality and productivity. MultiLingual magazine does a very thorough job of describing these writing practices here.

How Much Pre-Editing Is Enough?

As with everything in machine translation, it depends on source quality and required output quality.

As with post-editing, you need to measure the change between source and target in order to calibrate the level of pre-editing to the output quality that you need. Although they have existed in one form or another for over 60 years, the tools used to measure textual changes are actively evolving in our industry to this day.

Often based on Levenshtein’s famous “edit distance” algorithm, the most advanced tools use clever algorithms to gauge the actual editing effort. How much effort is required to achieve a certain percent change in target quality, and what is the cost of that effort? Measuring effort on top of change could help with further ROI calculation.
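For readers who have not seen it, here is a small sketch of the classic edit distance calculation, plus a simple "percent change" built on top of it. The `percent_change` normalization (distance divided by the longer text) is my own assumption for illustration; commercial tools weigh edits in their own ways.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]

def percent_change(before, after):
    """Edit distance as a percentage of the longer text (an assumed normalization)."""
    longest = max(len(before), len(after))
    return 0.0 if longest == 0 else 100.0 * levenshtein(before, after) / longest

print(levenshtein("kitten", "sitting"))  # 3
```

The function works on any sequence, so passing `text.split()` instead of a string counts edited words rather than edited characters.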

Meanwhile, to get a basic measurement, you can have an editor test light and full pre-editing on a source text, run the result of each editing level through MT, and have a professional linguist review those outputs to determine the quality differential. You can also use automatic scoring (e.g., BLEU, GTM, METEOR, or TER) to measure the similarity of MT outputs to reference translations.
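To make the comparison concrete, here is a rough sketch that scores the MT output from each pre-editing level against a reference translation. It uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity score; it is not BLEU or TER, and the sentences are toy examples I made up, not real MT output.

```python
from difflib import SequenceMatcher

def similarity(hypothesis, reference):
    """Word-level similarity in [0, 1]; a rough stand-in, not BLEU or TER."""
    return SequenceMatcher(None, hypothesis.split(), reference.split()).ratio()

# Made-up example data: one reference translation and three MT outputs.
reference = "Press the power button to restart the device"
outputs = {
    "no pre-editing": "Push power button for restart of device",
    "light pre-editing": "Press the power button to restart device",
    "full pre-editing": "Press the power button to restart the device",
}

for level, mt_output in outputs.items():
    print(f"{level}: {similarity(mt_output, reference):.2f}")
```

For a real evaluation you would swap in an established metric and score a sample of a few hundred segments rather than a single sentence.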

Only after measurement can you make business decisions about how much time to allow for pre-editing.

When to Consider Pre-Editing

There will be a tipping point at which your money is better spent on pre-editing than post-editing, and vice versa. The question is “when?”

Pre-editing ROI is most typically achieved when a technical or user document is going to be translated into more than three languages. So, it’s definitely worth investigating the ROI for a pre-editing process when translating into dozens of languages. Why not use one resource before MT instead of dozens after?
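The tipping point is simple arithmetic: a one-time pre-editing cost set against a post-editing saving that repeats in every target language. Here is a sketch; the function names and the figures in the usage lines are hypothetical placeholders, so plug in numbers from your own tests.

```python
import math

def pre_editing_pays_off(pre_edit_cost, saving_per_language, languages):
    """True when the one-time pre-edit costs less than the post-editing it saves."""
    return pre_edit_cost < saving_per_language * languages

def break_even_languages(pre_edit_cost, saving_per_language):
    """Smallest number of target languages at which savings exceed the pre-edit cost."""
    if saving_per_language <= 0:
        raise ValueError("pre-editing must save something per language to break even")
    return math.floor(pre_edit_cost / saving_per_language) + 1

# Hypothetical figures: a 3,000 pre-edit that saves 900 of post-editing per language.
print(break_even_languages(3000, 900))     # 4
print(pre_editing_pays_off(3000, 900, 3))  # False
print(pre_editing_pays_off(3000, 900, 12)) # True
```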

However, pre-editing is not always the right approach, nor is it always necessary: if your source quality is already pretty good (as established by human review and automatic checks), and your MT engine is finely tuned with domain dictionaries and Translation Memories, then a light post-editing process may be all you need to make sure the translations make sense.

Is There a Tool for That?

A writer can’t remember all these rules, no way, no how. Some source quality improvement technologies may be helpful.

  • Traditional TM technology can facilitate source creation. A source content memory can provide useful feedback to your writers. For example, it can identify that multiple writers are producing very similar content and identify the differences so that the writing style can be kept consistent across writers and products over time.
  • Generic pre-editing plugins, or automated pre-editing rules, can help a writer reformulate the source text prior to MT.
  • Simplified Technical English or Controlled Language tools provide some automated formalization of the rules for writing for localization, which include short sentences, active voice, and standard word order, among other things. (However, writers can struggle with tools that simplify or control their work; see my blog on “Why Writers Hate Controlled Language.”)
  • Program- or client-specific custom tools check spelling, grammar, and preferred terminology. These are essentially grammar checkers gone nuts: rules customized for a specific program. Think of this approach as a “custom automated style guide.”
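To give a feel for what a custom automated style guide does under the hood, here is a toy sketch with two rules: a sentence-length limit and a preferred-terminology check. The limit, the glossary entries, and the function name are all assumptions for illustration; real tools use far richer linguistic rules than these regular expressions.

```python
import re

MAX_WORDS = 25  # hypothetical sentence-length limit
# Hypothetical glossary: discouraged term -> preferred term.
PREFERRED_TERMS = {"log-in": "login", "e-mail": "email"}

def check_source(text):
    """Return a list of (sentence, message) warnings for a writer to review."""
    warnings = []
    # Naive sentence split on ., !, or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    for sentence in sentences:
        word_count = len(sentence.split())
        if word_count > MAX_WORDS:
            warnings.append((sentence, f"{word_count} words; limit is {MAX_WORDS}"))
        for discouraged, preferred in PREFERRED_TERMS.items():
            if re.search(rf"\b{re.escape(discouraged)}\b", sentence, re.IGNORECASE):
                warnings.append((sentence, f"use '{preferred}' instead of '{discouraged}'"))
    return warnings

for sentence, message in check_source("Send an e-mail to support. Then click Save."):
    print(f"{message}: {sentence}")
```

The checker only flags issues and leaves the rewrite to the writer, which tends to go down better than tools that change text automatically.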

Making the Case for Pre-Editing

There are many benefits to justify the upfront effort of pre-editing, such as:

  • Productivity improvements. When source content is not great, a thorough one-time pre-edit will boost MT output quality and save post-editing time in every target language. The more targets, the more time saved.
  • Quality enhancements and better customer experience. Better content is, simply, more effective for users. Support costs drop if you have both better source and clearer translations.
  • Cost reductions. With good source content, a strong MT engine, and good past content, savings on a 500,000-word translation program with five target languages could easily be 20%. This surpasses the savings from a typical MT + full post-editing (PE) effort, which can reduce costs by approximately 10%.

How to Get Started

While pre-editing may not totally eliminate the need to post-edit, it’s worth checking out. Start by getting your source content evaluated for MT effectiveness. If it’s poor, do some tests that involve pre-editing at different levels. It’s a very low-effort investment compared to the potential upside. You will be testing source and translated content anyway if you are considering an MT program.

From there, you could pilot the pre-editing process on a specific project, for a specific language set. You can contrast that effort with costs from a past project of the same size and language set.

Have you ever worked to improve source files as a strategy to boost MT output? How did you prioritize the effort to get the most value?