New day, new job, but wait… the source is a PDF, and you really do not want to deal with broken formatting, missed text, a huge number of tags, and all the misalignments. This is why pre-processing and cleaning up the source file before processing it in a CAT tool is important.
Unless you have access to professional optical character recognition (OCR) editing tools for processing PDF files, you can check our RWS AppStore and use the dedicated PDF Assistant for Trados (https://appstore.rws.com/plugin/197) to convert the source files into Word format for cleanup and further processing the usual way in Trados Studio.
This process is suitable not only for documents with tables, charts and illustrations, but also for text-editable PDFs, as it gives you full control over the formatting of the document: you can ensure that headers and footers are treated as headers and footers, that paragraphs flow naturally and that none of the source is missed.
After installing this app, you can simply open it from Trados Studio > Add-ins tab and add or drag and drop your PDFs. On the next screen, select whether you want to extract text from graphics (strongly recommended for any documents with charts, illustrations or non-editable text). After pre-processing, you can choose which images should be OCRed (i.e. their content analyzed and converted into editable text) and which should remain as pictures. You will probably convert all the content in illustrations and charts, but you will want to keep the client’s logo, stamps and similar items as images.
Your Word files are now created. Before you process them in Trados Studio, check whether the conversion succeeded, and beware of foreign words in the source, which may get corrupted during OCR. OCR may not have succeeded everywhere, i.e. some of the images or charts may remain as images. For these you will need another strategy, possibly recreating the relevant content manually in the Word file, unless you know the same content appears elsewhere in the document as text, in which case you can handle this part during post-production using your translation memory (TM) once the linguistic work is complete. It is also good practice to run a spell-check in Word and make sure there is no corrupted formatting (including line breaks, lists, headers and footers); if needed, select all text that should share the same formatting and set the style. Now your Word documents are ready for processing in Trados.
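If you have many converted files to review, a small script can complement the spell-check in Word by flagging tokens that look like OCR damage. The following is only a minimal sketch and is not part of PDF Assistant for Trados or Trados Studio: the function names and the two heuristics (letters and digits mixed inside one token, and the Unicode replacement character) are assumptions made for illustration. It reads only the main document body, so headers and footers still need a manual check, and legitimate codes such as part numbers will show up as false positives.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical heuristics, chosen for illustration only:
#  1. a token mixing letters and digits (e.g. "c0ntract")
#  2. a token containing the Unicode replacement character U+FFFD
SUSPECT = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b|\S*\ufffd\S*")


def docx_paragraphs(path):
    """Yield the plain text of each paragraph in the main document body."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    for para in root.iter(W_NS + "p"):
        # Join the text runs of the paragraph; tabs and breaks are ignored.
        yield "".join(t.text or "" for t in para.iter(W_NS + "t"))


def suspect_tokens(text):
    """Return the tokens in `text` that match one of the heuristics."""
    return SUSPECT.findall(text)


def report(path):
    """Return (paragraph_number, tokens) pairs for paragraphs with hits."""
    hits = []
    for number, text in enumerate(docx_paragraphs(path), start=1):
        tokens = suspect_tokens(text)
        if tokens:
            hits.append((number, tokens))
    return hits
```

Calling `report("converted.docx")` (a hypothetical file name) returns the paragraphs worth a second look. Treat the result as a reading aid: it does not replace comparing the Word file against the original PDF.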
Remember, once your linguistic work is done, you need to generate the target Word files and perform post-production so that the translated documents are as close as possible to the original PDF source. Depending on the agreed output format, you will either deliver cleaned Word files or export the files to PDF, which should be very close to the source in form and layout.

Author
Ladislav Hlavatý
Senior Language Manager
