Neural machine translation is one of the most prominent applications of AI, particularly when it comes to language.
However, it’s far from the only application that can help global businesses understand multilingual information and communicate effectively. At RWS, we’re building AI technology solutions to help users go the last mile, beyond machine translation (MT).
Going beyond MT to make better decisions
We always ask our customers what happens to their content before it reaches our MT, and what happens to it after we translate it. And, crucially, how can we help our users automate their processes?
Here’s how we do it...
Content Insights is a state-of-the-art multilingual document summarization tool integrated directly into Language Weaver. Users can retrieve summaries of either native source documents or documents that have been machine translated.
With our real-time slider, users can control the length of the insight, showing either a longer summary of a given document or a more succinct overview. This lets users decide what to do next with a document, whether that means extracting a specific piece of intelligence, sending it for professional review, determining its relevance for eDiscovery, or something else.
Evaluation is always a hot topic in machine translation, but it generally happens after the fact.
What if we could tell how good the machine translation was at the segment and document level, in real time?
Language Weaver’s proprietary Quality Estimation (QE) technology does just that – ranking documents and segments using a traffic light system for a variety of language pairs. When integrated into our connector for Trados Studio, for example, it enables linguists to better determine which segments they should spend their editing effort on.
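To make the traffic light idea concrete, here is a minimal Python sketch. It assumes each segment comes with a quality score between 0 and 1. The score range, the two thresholds and the function names are illustrative assumptions, not details of Language Weaver's QE model.

```python
# Hypothetical cut-offs: the real QE model and its thresholds are not described here.
GREEN_MIN = 0.8   # at or above this, the segment is assumed good enough to skip
AMBER_MIN = 0.5   # between AMBER_MIN and GREEN_MIN, the segment needs a light edit


def traffic_light(score: float) -> str:
    """Map an assumed 0-1 quality score to a traffic light label."""
    if score >= GREEN_MIN:
        return "green"
    if score >= AMBER_MIN:
        return "amber"
    return "red"


def segments_to_review(scored_segments):
    """Return the non-green (text, score) pairs, lowest score first."""
    flagged = [
        (text, score)
        for text, score in scored_segments
        if traffic_light(score) != "green"
    ]
    return sorted(flagged, key=lambda pair: pair[1])
```

A linguist working through the output of `segments_to_review` would see the red segments first, then the amber ones, and could leave the green segments untouched.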
Entity detection and tagging
Named-entity recognition (NER), also known as entity detection, is a new feature coming soon. It is a process by which certain terms in the content are tagged, either before or after translation. The applications are extensive!
Consider the detection of personally identifiable information (PII) such as names, addresses, and social security numbers. This information can be tagged so that it can be anonymised before the content is sent for MT, or even used to identify sensitive content that should be routed to a solution like Language Weaver Edge.
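As a rough illustration of tagging and anonymising before translation, the Python sketch below swaps US-style social security numbers for placeholders and keeps a mapping so they can be put back afterwards. It is a simplification: a regular expression stands in for a real entity detection model, names and addresses are not covered, and the function names and placeholder format are assumptions made for this example.

```python
import re

# Assumed pattern: US social security numbers written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def anonymise(text):
    """Replace each SSN with a numbered placeholder; return the text and the mapping."""
    mapping = {}

    def _swap(match):
        placeholder = f"[SSN_{len(mapping) + 1}]"
        mapping[placeholder] = match.group(0)
        return placeholder

    return SSN_PATTERN.sub(_swap, text), mapping


def restore(text, mapping):
    """Put the original values back, for example after translation."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```

The masked text can be sent for translation while the mapping stays on the user's side. This only works if the placeholders pass through the MT system unchanged, which is an assumption of the sketch.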
If you don’t know what language your content is written in, don’t worry.
Language Weaver will auto-detect the language and route the content to the right machine translation system. Our R&D team is also developing a solution to allow for the detection and routing of multiple languages within a single document.
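The detect-then-route flow can be sketched in a few lines of Python. The detector here is a toy that counts common stopwords. It is a stand-in for a real language identification model, and the word lists and the `source-target` engine naming are assumptions made for this example.

```python
# Tiny illustrative stopword lists; a real detector would use a trained model.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to"},
    "fr": {"le", "la", "et", "est", "les"},
    "de": {"der", "die", "und", "ist", "das"},
}


def detect_language(text):
    """Return the language whose stopwords appear most often, or None if none match."""
    words = text.lower().split()
    counts = {
        lang: sum(word in vocab for word in words)
        for lang, vocab in STOPWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def route(text, target="en"):
    """Pick a hypothetical engine name such as 'fr-en' from the detected source language."""
    source = detect_language(text)
    if source is None:
        return None  # nothing matched; a caller could fall back to manual selection
    return f"{source}-{target}"
```

Applying `route` to each paragraph separately, instead of to the whole document, is one simple way to approximate routing for documents that mix languages.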
Optical character recognition (OCR)
PDFs are a notoriously challenging content type.
Language Weaver integrates with a number of different OCR tools. Whether the document is a text-based, scanned, or dead PDF, this helps ensure the most accurate character recognition, regardless of the language.