Scaling tone-of-voice conversion with AI and linguistic expertise

When two globally recognized consumer brands needed to update tone of voice across large translation memories, they turned to RWS to help them do it accurately, efficiently and at scale. By combining large language models with native-speaker expertise, prompt engineering and tailored QA, the LAIS team built a repeatable workflow that delivered strong results across four languages.
98%+ prompt accuracy across four languages
~7,400 hours saved compared with manual review
Under 200 hours of prompt engineering, data engineering and linguistic review

Key benefits

  • Standardized tone across multilingual translation memories 
  • Reduced manual effort for large-scale language updates 
  • Combined AI speed with native-speaker linguistic judgment 
  • Improved quality through tailored QA and structured review 
  • Built a scalable workflow that can be adapted for future languages and datasets

Two global consumer brands needed to update tone of voice across multilingual translation memories quickly, accurately and at scale.

For both clients, the challenge started with a shift in tone strategy. Existing translation memories contained large volumes of legacy content that no longer matched the preferred style. In these projects, the goal was to convert translations from informal to formal across Slovenian, French, Czech and Slovak, while preserving meaning, grammar, formatting and brand intent. 

That sounds straightforward on paper. In practice, it was anything but. Tone-of-voice conversion is not a simple pronoun swap. In languages like French, Czech or Slovak, changing tone of voice can affect verbs, adjectives, sentence structure and stylistic conventions. And when you are working across hundreds of thousands or millions of words, even small inconsistencies can become a large operational burden.

Precision, not just automation

RWS’s LAIS team was brought in to design a workflow that could apply tone changes at scale without introducing avoidable risk. The brief was not to “rewrite” content. It was to change only what needed to change and leave everything else untouched. 

To get there, the team combined data preprocessing, prompt engineering, native-linguist review, customized QA and post-processing into a single structured workflow. The result was a process built for control as much as speed, one that could help enterprise clients modernize large translation memories without sacrificing linguistic quality.

Challenges

  • Updating tone without changing meaning or style unnecessarily 
  • Identifying which segments should be converted and which should stay untouched 
  • Managing language-specific rules across four very different languages 
  • Handling client-specific edge cases and contextual exceptions 
  • Preserving tags, punctuation, spacing and formatting in final outputs

Solutions

  • Preprocessed TMX files to isolate only relevant segments 
  • Worked with native linguists to define language-specific rules and examples 
  • Tested and refined prompts on smaller samples before scaling 
  • Built QA checks to catch linguistic and formatting issues 
  • Reinserted approved outputs into original translation memory formats

Results

98%+ prompt accuracy across four languages
~7,400 hours saved vs. manual review
Reduced effort from ~7,600 to <200 hours
Expanded from pilot to larger datasets
Built a repeatable, scalable workflow

Teaching AI where the line is

One of the hardest parts of the project was not getting the model to make changes. It was getting the model to respect boundaries. 

The clients needed the LLM to update tone-related elements only. That meant no creative rewriting, no unnecessary “improvements” and no interference with segments that were already correct. Product descriptions, interface strings, quoted content and other out-of-scope text all required careful handling. As Eva Kalhousova explained in the internal write-up, the model needed to focus on grammatical adjustments tied to formality rather than drift into broader stylistic edits.

This is where the work became highly contextual. Some segments looked like ordinary strings in a translation memory but carried a usage context the model could not reliably infer on its own. In one client dataset, there were commands directed at a branded AI assistant. In another, there were movie quotations and references where a formal rewrite would have sounded unnatural or simply wrong. Those segments needed to be identified during analysis and excluded or reviewed with special care. 

That is a key point in this story. The challenge was not whether an LLM could generate formal language. It could. The question was whether it could do so only when appropriate – and in a way that respected context, language rules and the source material. Left unchecked, even a strong model could become overzealous. That is why RWS designed the process around control points rather than raw throughput. 

A workflow built for multilingual nuance

The LAIS team did not train the model itself. Instead, we focused on making the data and instructions usable for the task at hand. That started with translation memory analysis and preprocessing. Source and target segments were extracted from TMX files, then filtered so that only relevant content moved forward. Empty segments, numeric-only strings and identical source-target pairs could often be excluded early, reducing unnecessary cost and lowering the risk of introducing noise.

From there, native linguists played a central role. Even in languages the internal team knew well, RWS collaborated with native speakers to compile rules, examples and transformation patterns for each market. That mattered because formality is expressed differently across language families. Some languages have multiple forms of formal address. Others distinguish masculine and feminine forms or treat formal singular and plural differently. The prompts had to reflect those realities if the outputs were going to hold up under review.
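The TMX preprocessing described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual RWS pipeline: the language codes, function names and filter patterns are assumptions, and a production workflow would handle inline tags and TMX variants more carefully.

```python
# Minimal sketch: extract source/target pairs from a TMX file and
# filter out segments that should not be sent to the LLM.
import re
import xml.etree.ElementTree as ET

# TMX marks segment language with the xml:lang attribute, which
# ElementTree exposes under the XML namespace.
XMLNS = "{http://www.w3.org/XML/1998/namespace}"

def extract_pairs(tmx_path, src_lang="en", tgt_lang="fr"):
    """Yield (source, target) text pairs from a TMX file."""
    tree = ET.parse(tmx_path)
    for tu in tree.iter("tu"):
        texts = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XMLNS + "lang") or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                texts[lang] = "".join(seg.itertext())
        if src_lang in texts and tgt_lang in texts:
            yield texts[src_lang], texts[tgt_lang]

def is_relevant(source, target):
    """Apply the early exclusions: empty, numeric-only, identical pairs."""
    if not source.strip() or not target.strip():
        return False          # empty segments
    if re.fullmatch(r"[\d\s.,:%/-]+", target):
        return False          # numeric-only strings
    if source == target:
        return False          # identical source-target pairs
    return True
```

Filtering early keeps out-of-scope segments away from the model entirely, which reduces both cost and the risk of the LLM touching content it should never see.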

Once those rules were in place, RWS designed prompts and tested them on small samples, usually several hundred segments at a time. Linguists reviewed the outputs not as standard human translation QA but as a way of measuring prompt performance. Did the model correctly recognize which segments needed conversion? Did it preserve meaning? Did it apply the correct tone without introducing grammatical or stylistic issues? That review cycle helped the team refine prompts before scaling them to much larger datasets.
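The review cycle above treats linguist feedback as a measurement of prompt performance. A simple tally of that idea might look like the sketch below; the field names and the definition of "correct" are assumptions for illustration, not RWS's actual review schema.

```python
# Illustrative prompt-accuracy tally from a linguist review pass.
# A segment counts as correct only if the model made the right
# convert/don't-convert decision AND, when it converted, the formal
# rewrite itself was judged acceptable.
from dataclasses import dataclass

@dataclass
class ReviewedSegment:
    model_converted: bool     # did the model change the segment?
    should_convert: bool      # did the linguist say it needed conversion?
    rewrite_correct: bool     # if converted, was the rewrite acceptable?

def prompt_accuracy(reviews):
    """Share of reviewed segments the prompt handled correctly."""
    if not reviews:
        return 0.0
    ok = sum(
        1 for r in reviews
        if r.model_converted == r.should_convert
        and (not r.model_converted or r.rewrite_correct)
    )
    return ok / len(reviews)
```

Scoring samples of a few hundred segments this way makes it possible to compare prompt versions objectively and scale up only once accuracy clears the target.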

4 languages
98%+ accuracy
7,400 hours saved

Catching the errors that matter

Even strong outputs needed guardrails. During review, the team saw patterns that would have been easy to miss without targeted checks. In Slovenian, for example, the model sometimes replaced the possessive pronoun “my” with “our” or “your,” even though those changes had nothing to do with tone of voice and altered meaning. In other cases, the model introduced capitalized forms of “You” or “Your” to signal formality, even when that convention was not appropriate for the language or requested in the prompt. 

Formatting created another layer of complexity. Occasionally, the model returned comments instead of translations, left a segment blank or altered tags, punctuation or non-breaking spaces despite explicit instructions not to do so. At translation memory scale, those issues are not minor. A misplaced tag or corrupted placeholder can create downstream production problems quickly. 

So RWS responded with tailored QA. These checks did not replace linguistic judgment. They made that judgment more efficient, surfacing problems systematically and helping the team correct them before they reached the final files. Over time, the process got stronger. Based on lessons from the first projects, the team improved pre- and post-processing, placed greater emphasis on protecting tags and placeholders, and expanded QA to catch language-specific issues earlier. 
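The kind of tailored QA described above can be sketched as automated comparisons between the original target and the LLM output. The check names and patterns below are illustrative assumptions, not RWS's actual QA suite; the point is that formatting integrity can be verified mechanically so linguists can focus on language.

```python
# Sketch of automated QA on converted segments: flag blank outputs,
# altered inline tags or placeholders, and changed non-breaking spaces.
import re

# Assumed markup conventions: XML-style inline tags and {0}-style placeholders.
TAG_RE = re.compile(r"<[^>]+>|\{\d+\}")
NBSP = "\u00a0"

def qa_check(original, converted):
    """Return a list of issue labels; an empty list means the pair passed."""
    issues = []
    if not converted.strip():
        issues.append("blank-output")
    if sorted(TAG_RE.findall(original)) != sorted(TAG_RE.findall(converted)):
        issues.append("tag-mismatch")
    if original.count(NBSP) != converted.count(NBSP):
        issues.append("nbsp-changed")
    return issues
```

Segments that fail any check are routed back to review rather than reinserted into the translation memory, which is how a misplaced tag is caught before it becomes a downstream production problem.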

From successful pilots to a reusable model

The impact was substantial. Across the combined projects, RWS estimates that prompt engineering, data engineering and linguistic review took under 200 hours. A purely manual review of the same volume would have required roughly 7,600 hours, translating to an estimated savings of about 7,400 hours.

Accuracy was equally important. After iterative testing and refinement, final prompts achieved more than 98% accuracy across all four languages. Reviewers confirmed that the model was not only producing high-quality formal translations but also correctly identifying when a segment should remain unchanged.

Just as important, the work did not stop at a one-off success. One client expanded from an initial language to additional languages after seeing the results, while another moved from a pilot to a much larger dataset once confidence in the process was established.

“Tone-of-voice conversion works exceptionally well with LLMs when the process is grounded in linguistic analysis, carefully designed prompts and targeted QA. That combination is what allows us to scale without losing control.”

Eva Kalhousova
Program Manager at RWS

Where linguistic precision meets AI scale

Meeting the immediate need was only part of the story. Just as important was building a process that could adapt, improve and continue to deliver value as client requirements evolved. 

In these projects, RWS delivered much more than capacity. The LAIS team created a structured, repeatable approach to multilingual tone-of-voice conversion, one shaped by native-speaker expertise, strengthened by tailored automation and designed to handle the real complexity of enterprise language data. What emerged was not a quick fix but a reliable framework for making large translation memories more consistent, more usable and more aligned with brand expectations.  

That is the kind of foundation modern AI-enabled localization demands. Not automation for its own sake but deliberate solutions to the right problems, built with the right linguistic insight and designed to scale with confidence. 

Explore how RWS can help you modernize multilingual content with AI-driven linguistic solutions and a smarter approach to localization at scale. Connect with us today.

Contact us

We provide a range of specialized services and advanced technologies to help you take global further.