Reducing skepticism: How does MT work?

You know by now that I enjoy defining things in my blogs. I believe definitions form a basis for discussing these concepts and for letting them evolve.

Today’s topic: Machine Translation

This may be elementary to many of you, but there are two main MT models, Statistical MT and Rule-Based MT, and they produce translations in different ways.

Statistical machine translation (SMT) generates translations based on statistical models of probability, which come from the analysis of bilingual texts. (The more text available, the better.)

A document is translated according to the probability that a phrase in the target language would be the translation of a phrase in the source language.
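To make the idea of phrase probability concrete, here is a minimal Python sketch. It assumes a tiny, hand-made list of aligned phrase pairs (hypothetical data, not a real corpus) and estimates the probability of each target phrase by simple relative frequency. Real SMT engines combine this kind of estimate with other models, so treat it as an illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made phrase pairs standing in for an aligned bilingual corpus.
aligned_pairs = [
    ("thank you", "gracias"),
    ("thank you", "gracias"),
    ("thank you", "muchas gracias"),
    ("good morning", "buenos días"),
]


def build_phrase_table(pairs):
    """Estimate P(target | source) as the relative frequency of observed pairs."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    table = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        table[source] = {t: n / total for t, n in targets.items()}
    return table


def best_translation(table, source):
    """Return the most probable target phrase, or None if the source is unseen."""
    candidates = table.get(source)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


phrase_table = build_phrase_table(aligned_pairs)
print(best_translation(phrase_table, "thank you"))  # gracias (seen 2 times out of 3)
```

A phrase that never appears in the sample data returns None, which is one reason the size of the corpus matters so much.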

Google Translate is an SMT system, and this web source summarizes both MT and Google’s system very well: With a wide and extensive corpus, Google is able to apply different algorithms based on probability and statistics to generate translations.

Rule-Based Machine Translation (RBMT) is a general term covering machine translation systems based on grammar rules (syntax, morphology, and semantics, if you are a language nerd) in both the source and target languages. An RBMT system provides translations by applying those rules. 
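As a rough illustration of applying rules, the Python sketch below uses a three-word English-to-Spanish lexicon and a single transfer rule (an adjective followed by a noun is reordered as noun then adjective). The lexicon, the tags, and the function name are all made up for this example; a real RBMT system holds far more rules and vocabulary.

```python
# Hypothetical mini-lexicon: English word -> (Spanish word, part of speech).
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "car": ("coche", "NOUN"),
}


def translate_rule_based(sentence):
    """Translate word by word, then apply one reordering (transfer) rule."""
    # Dictionary lookup; unknown words pass through untranslated and tagged UNK.
    tagged = [LEXICON.get(word, (word, "UNK")) for word in sentence.lower().split()]

    output = []
    i = 0
    while i < len(tagged):
        is_adj_noun = (
            i + 1 < len(tagged)
            and tagged[i][1] == "ADJ"
            and tagged[i + 1][1] == "NOUN"
        )
        if is_adj_noun:
            # Transfer rule: English ADJ NOUN becomes Spanish NOUN ADJ.
            output.extend([tagged[i + 1][0], tagged[i][0]])
            i += 2
        else:
            output.append(tagged[i][0])
            i += 1
    return " ".join(output)


print(translate_rule_based("the red car"))      # el coche rojo
print(translate_rule_based("the red bicycle"))  # el rojo bicycle
```

The second call shows what happens with a word the lexicon does not know: it is left in English, and the reordering rule does not fire because the word has no noun tag.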

Here is a table comparing the two:



RBMT needs a lot of knowledge that only linguistic experts can generate, such as the syntax and semantics of the words in one language, the corresponding rules for the second language, and the “transfer rules” between the two languages.

SMT needs to analyze parallel translations (source and target) to generate a translation engine; this is automated.

There are “out of the box” RBMT systems, and many buyers purchase these. From there, an RBMT system can require a lot of development and customization until it reaches the desired quality levels – which can take time. 

An SMT system can be developed rapidly if you have the appropriate corpus available, which shortens the time to ROI. That said, I’ve never seen an MT tool for sale that came cheaply. (Note that you get what you pay for when using the free tools.)

RBMT can be re-trained and calibrated constantly by adding new rules and vocabulary, which means more time is required of expert humans.

SMT automatically retrains when finding patterns or parallels not seen before, such as unknown or new words and changed translations.

RBMT may work better for more general domains.

Statistical machine translation works well for translations in a specific domain (like IT), when the engine is trained with content in that domain.

A rule-based grammar may not clearly identify exceptions, or it may mishandle an unrecognized word. Rules tend to generalize too much and cannot handle exceptions well.

SMT generates statistical patterns automatically, and in doing so it learns exceptions to the rules well.

There are also hybrid systems, which combine the best of both RBMT and SMT. Both types of systems can be combined with translation memory (TM), which tightens the generated text with previously human-translated phrases. Combining MT with TM is the way to go because it increases the consistency and quality of the translations.
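One simple way to picture MT combined with TM is a lookup that prefers a stored human translation and only falls back to the engine when there is no match. The Python sketch below assumes exact matching on a normalized segment and uses a placeholder function in place of a real MT engine; the names and sample data are hypothetical, and commercial tools typically support partial ("fuzzy") matches as well.

```python
def translate_with_tm(segment, translation_memory, mt_engine):
    """Return (translation, origin): reuse the TM on an exact match, else call MT."""
    # Normalize case and whitespace so trivial differences still match.
    key = " ".join(segment.lower().split())
    if key in translation_memory:
        return translation_memory[key], "TM"
    return mt_engine(segment), "MT"


# Hypothetical translation memory of previously human-translated segments.
tm = {"click here to continue": "haga clic aquí para continuar"}


def placeholder_mt(segment):
    """Stand-in for a real MT engine (RBMT, SMT, or hybrid); it only tags the input."""
    return f"[MT] {segment}"


print(translate_with_tm("Click here  to continue", tm, placeholder_mt))
print(translate_with_tm("Save file", tm, placeholder_mt))
```

Because every repeated segment comes back with the same human-approved wording, the output stays consistent, and the MT engine is only asked to handle genuinely new text.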

In your experience, which works better – RBMT or SMT – and for what type of content?