How to Clean Up Your Translation Memories [Step-by-Step Guide]


As your translation volumes grow, so do your translation memories (TMs), and rapid program growth often produces massive, unwieldy TMs. What content is lurking in there? Has it been audited for quality? Could your TMs be responsible for quality issues like terminology inconsistencies or the use of obsolete product names? If your translation memories have not been inspected for quite some time, they probably need a thorough cleanup. Go find your latex gloves, and let’s walk through the process.

Why TMs Need Maintenance

Much like a car needs regular maintenance to keep it running worry-free year after year, TMs also require regular upkeep to avoid breakdowns. If your TM maintenance is overdue, your issues can include:

  • Terminology conflicts within a single TM, or across the TMs for various divisions, products or domains
  • Wholly inaccurate translations in TMs. It says “apples” in the source language, but “oranges” in the target.
  • In-context exact (ICE) matches that contain errors yet are getting pulled in over and over.
  • Duplicate translations for a single source, usually caused by importing TMs after a review has occurred offline.
  • Multiple matches from a TM with no way to determine which is the “best” choice.

These ongoing issues degrade the reliability of match results and increase the chance of introducing more mistakes. This creates a vicious cycle that compounds the problem with each added translation and propagates incorrect translations into the future. For an enterprise’s TMs to become a dependable linguistic resource into the next decade and beyond, this issue has to be addressed.

How to Do It

Any TM overhaul project should comprise the following phases:

Phase 1: Setup

If you have a large volume of TMs, it is key to start by identifying and prioritizing the TMs that contain terminology core to the translation of content for all divisions. You might want to start the overhaul process with those TMs that:

  • Serve as the basis for a lot of other content or have the longest history
  • Have the highest consumer visibility (e.g., your flagship product)
  • Contain consumer-facing content (as opposed to, e.g., internal sales training)
  • Are in languages with the largest volumes of translations (such as FIGS and CJK) and thus contain more potential inconsistencies

Phase 2: Term Consolidation

The goal of this phase is to create a final terminology list that establishes a one-to-one (source-to-target) equivalency for each term, and flags terms where multiple translated variants are acceptable. It should also “blacklist” any terms that have been deprecated or are unapproved.

You should do the following to resolve term inconsistencies within your current set of linguistic resources:

  • Download those TMs related to the cleanup effort (as TMX exports)
  • “Mine” (extract) term candidates from TMs in a bilingual format, using the term-mining features of industry-standard desktop CAT tools or your TMS (alternatively you could use open source/freeware tools for n-gram mining)
  • Select the most commonly used term candidates
  • Match each source term with all possible target term equivalents used in the TMs
  • Tag your terms with specific metadata, such as:
    • The TM from which they came
    • The last date of utilization in a TM entry
    • The product line they relate to… etc.
  • Have reviewers (yours or your vendors’) choose the best one or two target terms for each source term
  • Provide a final terminology list to your reviewers for feedback and final sign-off, including approval of “blacklisted” terms
  • If there is an approved SEO keyword list for a particular language, include it in the review as well
  • Export a Master term list for use as a terminology database across all future projects
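As a rough illustration of the "match each source term with all possible target equivalents" step, here is a minimal Python sketch that reads a TMX export and counts how often each candidate target term appears for a given source term. The function names (`read_tmx_pairs`, `tally_variants`), the sample English/Spanish data and the plain substring matching are all assumptions made for this example; real term-mining features in CAT tools handle inflection, tokenization and inline tags far more carefully.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Tiny hypothetical TMX export, inlined so the sketch is self-contained.
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en-US"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Open the dashboard.</seg></tuv>
      <tuv xml:lang="es-ES"><seg>Abra el panel de control.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>The dashboard shows usage.</seg></tuv>
      <tuv xml:lang="es-ES"><seg>El panel de control muestra el uso.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Close the dashboard.</seg></tuv>
      <tuv xml:lang="es-ES"><seg>Cierre el tablero.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Save your changes.</seg></tuv>
      <tuv xml:lang="es-ES"><seg>Guarde sus cambios.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx_pairs(tmx_text, src_lang, tgt_lang):
    """Yield (source, target) segment pairs for the two languages (lowercase codes)."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # Older TMX versions use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tag elements.
                segs[lang] = "".join(seg.itertext()).strip()
        if src_lang in segs and tgt_lang in segs:
            yield segs[src_lang], segs[tgt_lang]


def tally_variants(pairs, source_term, candidate_targets):
    """Count candidate target terms in segments whose source contains source_term."""
    counts = Counter()
    for src, tgt in pairs:
        if source_term.lower() in src.lower():
            for cand in candidate_targets:
                if cand.lower() in tgt.lower():
                    counts[cand] += 1
    return counts


pairs = list(read_tmx_pairs(SAMPLE_TMX, "en-us", "es-es"))
counts = tally_variants(pairs, "dashboard", ["panel de control", "tablero"])
for variant, n in counts.most_common():
    print(f"dashboard -> {variant}: {n}")
```

A tally like this only gives reviewers a starting point: frequency shows which variant is most used, not which one is correct.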

Phase 3: Translation Memory Cleansing

This phase cleanses the TMs by carefully applying both linguistically intelligent technology and human validation processes.

The Master term list produced in the previous step will be used as a guide to cleanse the TMs chosen in the Setup phase of errors, false positives and any other linguistic “noise”. You would ideally execute all the following steps in order, but even just a few would help, and the order can be changed based on your preference.

  • Filter entries in the TMs for outdated translations. Cut those that were last used before a certain date: just as milk is undrinkable past its expiration date, old translation segments should not be reused.
  • Eliminate duplicate TM entries, where a single source segment has more than one target segment.
  • Filter entries in the TM for “blacklisted” terms and edit the segments with the newly approved terms. Once complete, filter and eliminate any newly created duplicate TM entries; when terminology is made consistent, new duplicates might be created.
  • Run automated terminology QA checks to identify when the translation of an approved source term is missing from the target. This will help flag mismatched translations where the target translation does not line up with the source.
  • Perform a spot-check (or full) review of all remaining translations.
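The automated parts of the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Entry` record, the function names, the sample data and the cutoff date are hypothetical, terms are matched by simple case-insensitive substring search, and duplicates are resolved by keeping the most recently used target, which is one possible policy rather than a rule from this guide.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    source: str
    target: str
    last_used: date


def drop_stale(entries, cutoff):
    """Keep only entries last used on or after the cutoff date."""
    return [e for e in entries if e.last_used >= cutoff]


def keep_latest_per_source(entries):
    """Resolve duplicates by keeping the most recently used target per source.

    Assumed policy: a reviewer might prefer a different rule.
    """
    latest = {}
    for e in entries:
        current = latest.get(e.source)
        if current is None or e.last_used > current.last_used:
            latest[e.source] = e
    return list(latest.values())


def with_blacklisted_terms(entries, blacklist):
    """Return entries whose target contains a blacklisted term and needs editing."""
    return [
        e for e in entries
        if any(term.lower() in e.target.lower() for term in blacklist)
    ]


def missing_approved_terms(entries, termbase):
    """Return (entry, source_term) pairs where the approved target term is absent."""
    problems = []
    for e in entries:
        for src_term, tgt_term in termbase.items():
            in_source = src_term.lower() in e.source.lower()
            in_target = tgt_term.lower() in e.target.lower()
            if in_source and not in_target:
                problems.append((e, src_term))
    return problems


# Hypothetical TM content and Master term list.
tm = [
    Entry("Open the dashboard.", "Abra el tablero.", date(2019, 3, 1)),
    Entry("Open the dashboard.", "Abra el panel de control.", date(2023, 6, 1)),
    Entry("Close the dashboard.", "Cierre el tablero.", date(2022, 1, 15)),
    Entry("Save your changes.", "Guarde sus cambios.", date(2012, 5, 20)),
]
termbase = {"dashboard": "panel de control"}
blacklist = ["tablero"]

fresh = drop_stale(tm, date(2016, 1, 1))
deduped = keep_latest_per_source(fresh)
flagged = with_blacklisted_terms(deduped, blacklist)
missing = missing_approved_terms(deduped, termbase)

for entry in flagged:
    print("blacklisted term in:", entry.target)
for entry, term in missing:
    print(f"approved translation of '{term}' missing in:", entry.target)
```

The sketch only flags problem segments. Editing them to use the approved term, and the final review, remain human tasks, and the duplicate check should be rerun after those edits.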

The deliverable at the end of the process will be one or more cleaned TMX files to import into your TMS.

Phase 4: Future-Proofing Updated Translation Assets

Once the Master term list is established and all related TMs are overhauled, it is important to take steps that will reduce the recurrence of errors and keep your TMs running smoothly. Such activities include:

  • Requiring that the Master term database is used on all projects, regardless of product or division, to automate enforcement.
  • Adding an automated QA check of all segments against the termbase for all new translations and, should a term not pass the check, requiring a written explanation of why the segment is an exception to the established rule.
  • Establishing a defined process for term maintenance: how a term moves from being added, edited or updated, through approval by all associated parties, to final acceptance.
  • Appointing an official “gatekeeper” of terminology to regulate the database. This person must grant approval before a new term is available for use by the translation teams.
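The automated QA check with a written exception could look something like the following Python sketch. The function name `check_segment`, its parameters and the sample termbase are hypothetical, and the substring matching is a simplification; the QA features of your TMS or CAT tool would normally do this work.

```python
def check_segment(source, target, termbase, exception_note=""):
    """Check one new translation against the termbase.

    Returns (passed, messages). A segment that misses an approved term
    passes only if a non-empty written explanation is supplied.
    """
    missing = [
        src_term
        for src_term, tgt_term in termbase.items()
        if src_term.lower() in source.lower()
        and tgt_term.lower() not in target.lower()
    ]
    if not missing:
        return True, []
    note = exception_note.strip()
    if note:
        return True, [f"exception accepted for '{m}': {note}" for m in missing]
    return False, [f"approved translation missing for '{m}'" for m in missing]


# Hypothetical termbase entry: approved source term -> approved target term.
termbase = {"dashboard": "panel de control"}

ok = check_segment("Open the dashboard.", "Abra el panel de control.", termbase)
rejected = check_segment("Open the dashboard.", "Abra el tablero.", termbase)
excepted = check_segment(
    "Open the dashboard.",
    "Abra el tablero.",
    termbase,
    exception_note="UI string in the legacy product still reads 'tablero'.",
)

print(ok)
print(rejected)
print(excepted)
```

Logging the accepted exceptions gives the terminology gatekeeper a record of where and why the rule was bypassed.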


The value of accurate product terminology grows over time because you can leverage past translations to improve your localization program’s ROI. What are the long-term benefits?

  • Reduce localization costs. An approved terminology database will help linguists work more quickly by reducing the time spent researching and translating terms.
  • Prevent misunderstandings and errors. Misunderstanding is frustrating and can be damaging to a brand. It can slow in-country product adoption.
  • Maintain your brand in-country. You rely on brand recognition. Further, your brand has been developed to most effectively reach your target market. Retaining your brand in-country can be key to your success in the new market.
  • Reduce customer service issues. When your web and marketing terminology aligns with the product documentation, it’s easier for customers to find the help they need on their own and avoid needing to reach out to the helpdesk.
  • Improve web traffic. Consistent use of branded keywords will help your ranking in online searches, boost findability, and drive quality website traffic.