Five Steps to Building a Quality Scoring Framework

Everyone has different ideas when it comes to the quality of a translation. If you ask five different people to judge the quality of a piece, you will probably get five different answers.

But subjective assessments aren’t good enough: you can’t see trends, personal preferences will get in the way, stakeholder agreement can take time, and as a result, the end user will get inconsistent deliverables. But there is a way out of this. Define quality, and then measure and track it with a formal structure.

Our Director of Language Services, Katka Gasova, talked about this in the recent TAUS Quality is Communication webinar. She teaches us why and how to build a quality framework that drives the ability to control quality across all your deliverables. Keep reading for insights and tactics you can apply to your quest for quality.

Map your content types to quality levels

You have many different types of content—from websites to technical user manuals to online help to product reviews. Do you know and agree on what quality is required for each type? It may not have crossed your mind that quality could vary by content type. Depending on the purpose and importance of each content piece, you can pinpoint the exact right level of quality. (And doing so can positively impact your budget and schedule.)

For example, you might not need the same quality for a service manual as for a web page. While top quality may be required for highly visible marketing material, only the gist might be needed for product knowledge bases.

Categorize your content types and consider how to link them to specific quality levels such as:

  • Premium: These deliverables will have no grammar, terminology, or fluency errors. The translation is faithful to the source text. Idioms, humor, and other regionalisms are adapted to the given locale. The source text’s tone of voice is preserved for the target locale.
  • Standard: Translations will be accurate and grammatically correct, but the content may not be stylistically perfect and there may be unimportant inconsistencies in terminology.
  • Basic: Deliverables with basic quality will be understandable and clear. They may have grammatical, stylistic, and terminology issues—but they won’t be serious.
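
If it helps to make the mapping concrete, here is a minimal sketch of how content types could be tied to the three levels above. The content type names and their assignments are illustrative assumptions (only the marketing and knowledge base pairings echo the example earlier in this section), so treat them as placeholders to agree on with your stakeholders.

```python
from enum import Enum


class QualityLevel(Enum):
    PREMIUM = "premium"
    STANDARD = "standard"
    BASIC = "basic"


# Hypothetical mapping: these content types and assignments are examples,
# not a recommendation. Replace them with what your team agrees on.
CONTENT_QUALITY = {
    "marketing_web_page": QualityLevel.PREMIUM,
    "user_manual": QualityLevel.STANDARD,
    "knowledge_base": QualityLevel.BASIC,
}


def quality_level_for(content_type: str) -> QualityLevel:
    """Return the agreed quality level, failing loudly for unmapped types."""
    try:
        return CONTENT_QUALITY[content_type]
    except KeyError:
        raise ValueError(f"No quality level agreed for {content_type!r}") from None
```

Raising an error for unmapped content types is a deliberate choice in this sketch: it forces a conversation about quality before a new type of content enters the workflow.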

Now that you’ve mapped deliverable types to quality levels, it’s time to build the framework for measuring and driving your quality program. There are five elements to this framework.

1. Error types

Error types help you understand the nature of translation mistakes. Typical error types include accuracy, language quality, fluency, punctuation, grammar, terminology, country standards, and any customer-specific requirements. You may follow existing industry standards such as SAE J2450, TAUS DQF/MQM, etc.

Besides specifying the main error types, it’s useful to define subcategories. Overtranslation, undertranslation, TM match not adapted, incorrect date format, punctuation, grammar, spelling: these labels will help you further analyze the types of issues you find and identify root causes. For example, terminology glossaries could be outdated, the translator’s performance could be inadequate, or the source could have been so ambiguous that it was difficult for the translator to be accurate.
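
One way to picture error types and subcategories is as a small two-level taxonomy. The sketch below is only an illustration: how the subcategories are grouped under main types is an assumption made for the example, and a standard such as MQM would define its own hierarchy.

```python
from collections import Counter

# Illustrative taxonomy. The grouping of subcategories under main error
# types is an assumption for this example, not an industry standard.
ERROR_TAXONOMY = {
    "accuracy": ["overtranslation", "undertranslation", "tm_match_not_adapted"],
    "country_standards": ["incorrect_date_format"],
    "language_quality": ["punctuation", "grammar", "spelling"],
}

# Reverse index so each reported subcategory can be rolled up to its main type.
SUBCATEGORY_TO_TYPE = {
    sub: main for main, subs in ERROR_TAXONOMY.items() for sub in subs
}


def tally_by_type(found_subcategories):
    """Count reported errors per main error type."""
    return Counter(SUBCATEGORY_TO_TYPE[sub] for sub in found_subcategories)
```

Rolling subcategories up like this is what makes trends visible: a spike under one main type points you toward a likely root cause, such as an outdated glossary.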

2. Error severity

Once you have determined error types, move on to severity. Errors can be categorized as critical, major, or minor. In this step, you consider the impact of the error in context. Does the error impact the reader’s ability to complete an action? Can the error have legal consequences for the product manufacturer? If yes, the error should be rated critical or major. If not, it can be labeled minor, as with extra spaces around quotation marks.

You may also want to define the severity of preferential or repetitive issues. These are typically rated minor, or zero on a scale of 0 to 10, but if a problem keeps recurring, its severity should be increased.
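
The rule for repetitive issues can be sketched as a small escalation function. The three severity names come from this section, but the repeat limit of 3 and the one-step escalation are assumptions chosen for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


# Assumption: an issue counts as repetitive from its third occurrence.
REPEAT_LIMIT = 3


def effective_severity(base: Severity, occurrences: int) -> Severity:
    """Raise the severity by one step once an issue becomes repetitive."""
    if occurrences >= REPEAT_LIMIT and base < Severity.CRITICAL:
        return Severity(base + 1)
    return base
```

With these assumed values, a minor issue seen once stays minor, while the same issue seen three times is treated as major.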

3. Error weight

Once you define error type and severity level, it’s time to assign a weight to each. Weights are numeric values, for example 1 for a minor error and 5 for a major one. If a reviewer marks an accuracy error as major, a weight of 5 is given. If that reviewer marks a terminology error as minor, then a weight of 1 is given. When you tally the points reported for each type and severity, you get a score.
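
The arithmetic above can be sketched in a few lines. The weights of 1 for minor and 5 for major follow the example in this step. The weight of 10 for critical is an assumption, and weighting by severity alone (rather than by type and severity together) is a simplification.

```python
# Weights for minor and major follow the example above.
# The weight for critical is an assumed value.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


def total_error_points(errors):
    """Sum the weights of all reported errors.

    `errors` is an iterable of (error_type, severity) pairs.
    """
    return sum(SEVERITY_WEIGHTS[severity] for _error_type, severity in errors)


reported = [("accuracy", "major"), ("terminology", "minor")]
print(total_error_points(reported))  # prints 6
```

If some error types should count more than others, the lookup could be keyed on the (type, severity) pair instead of severity alone.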

4. Pass/fail threshold

Now you determine what scores are acceptable. The thresholds are based on allowed error points for a certain number of words (usually 500 or 1,000).

For example, 10 error points per 1,000 words might be a failing score for a user manual, while ad copy might pass only with zero error points per 1,000 words.
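
Because reviewed samples vary in length, it helps to normalize error points to a fixed word count before comparing them with the threshold. Here is a minimal sketch that assumes a 1,000-word basis. The function names and the sample thresholds are hypothetical.

```python
def points_per_thousand(error_points: float, word_count: int) -> float:
    """Normalize error points to a 1,000-word basis."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return error_points * 1000 / word_count


def passes(error_points: float, word_count: int, allowed_per_thousand: float) -> bool:
    """Pass when normalized error points do not exceed the allowed amount."""
    return points_per_thousand(error_points, word_count) <= allowed_per_thousand
```

For instance, 6 error points in a 2,000-word sample normalize to 3 points per 1,000 words, which would pass a threshold of 5. The same 6 points in a 500-word sample normalize to 12 and would fail.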

And lastly…

5. Build an evaluation form

The final step is to build a form for reviewers to use. Ideally, this happens using a CAT tool plug-in connected to the QA database. The scorecard—usually in Excel—is created automatically by extracting the data into a template. Nothing is left to the reviewer except to determine each error’s type and severity. Formulas then calculate the error points allowed for the word count, assign points for each error and its severity, and calculate whether the text passes or fails.
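
To show what the formulas behind such a scorecard do, here is a rough sketch that writes one as CSV text. It swaps CSV in for the Excel template described above to stay self-contained, and it does not model the CAT tool plug-in or the QA database. The weights, column names, and summary rows are assumptions.

```python
import csv
import io

# Assumed weights; only minor = 1 and major = 5 come from the earlier example.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


def build_scorecard(errors, word_count, allowed_per_thousand):
    """Return (csv_text, passed) for a list of (error_type, severity) pairs."""
    # Error points allowed for this word count.
    allowed_points = allowed_per_thousand * word_count / 1000

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["error_type", "severity", "points"])

    # Assign points to each error according to its severity.
    total = 0
    for error_type, severity in errors:
        points = SEVERITY_WEIGHTS[severity]
        total += points
        writer.writerow([error_type, severity, points])

    # Decide whether the text passes or fails.
    passed = total <= allowed_points
    writer.writerow(["TOTAL", "", total])
    writer.writerow(["ALLOWED", "", allowed_points])
    writer.writerow(["RESULT", "", "pass" if passed else "fail"])
    return buffer.getvalue(), passed
```

The reviewer supplies only the error type and severity for each entry. Everything else in the scorecard is calculated.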


If you want to be able to track and measure quality in an objective way, there are no shortcuts: you need to assess the quality needs per content type and build a structured framework for defining errors. Soon you’ll be able to see patterns to quickly determine which fixes to make or leave alone, and which translators are performing well for you.