How to get rid of a tag soup in Trados Studio.

If you read CAT tool forums at ProZ, Translators Café, Yahoo Groups and elsewhere, you will have noticed that questions about rogue or junk tags come up most weeks. Of course, if you’ve just opened a file in Studio and are faced with a plague of tags in every segment, sometimes even between individual characters, you may not have the patience to take a deep breath and search for the answer. So yet another desperate plea for help goes out.

Signs & Symptoms

Trados inserts tags within words. Is this a bug?

Studio is riddled with rogue tags. Urgent!

I have a rash of tags. How can I get rid of them?

Tag soup in the Editor window. Help!


Diagnosis

The good news is that the problem isn’t Trados Studio itself. In most cases, you’ve got a file that looks like a decent Word document but was originally a scanned image converted into text using OCR software.

Optical character recognition (OCR) is a lifesaver when you need an editable file, but if you don’t pre-process the output, you will end up with a tag every time the OCR application thought there was a change in font, size or spacing. That can mean a tag between every single character of a word.
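Under the hood, a .docx file is a zip archive whose text lives in word/document.xml, split into runs (w:r elements) that each carry their own character formatting (w:rPr). Roughly speaking, every point where the formatting changes from one run to the next is a candidate for a tag in Studio. The sketch below is a hypothetical Python illustration (standard library only; the function names are my own, and this is not how Studio itself decides where tags go) that counts those changes to give a rough idea of how bad the soup is. It assumes the OCR output has been saved as .docx.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def run_format(run):
    """Hashable summary of a run's direct character formatting (its w:rPr children)."""
    rpr = run.find(W + "rPr")
    if rpr is None:
        return ()
    # Sort so that the order in which properties were written does not matter.
    return tuple(sorted((child.tag, tuple(sorted(child.attrib.items()))) for child in rpr))

def count_format_changes(document_xml):
    """Count boundaries where neighbouring text runs in a paragraph differ in formatting."""
    root = ET.fromstring(document_xml)
    changes = 0
    for para in root.iter(W + "p"):
        previous = None  # formatting is compared within a paragraph only
        # Direct child runs only; runs nested in hyperlinks or fields are ignored here.
        for run in para.findall(W + "r"):
            if run.find(W + "t") is None:
                continue  # skip runs with no text, such as line breaks
            fmt = run_format(run)
            if previous is not None and fmt != previous:
                changes += 1
            previous = fmt
    return changes

def count_format_changes_in_docx(path):
    """Read word/document.xml from a .docx (a zip archive) and count its formatting changes."""
    with zipfile.ZipFile(path) as docx:
        return count_format_changes(docx.read("word/document.xml"))
```

For example, `count_format_changes_in_docx("ocr_output.docx")` (a hypothetical file name) returns a single number; running it again after one of the treatments below gives a rough measure of how many boundaries have gone.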

It’s not until you open the file in Studio that you see the extent of the problem.

Treatment

Close the file in Studio and go back to Word to clean it up. There are several ways to do this; read through them and decide which one suits your document:

1. Clear the formatting. If the text doesn’t have much basic formatting (bold, font size, bullet points, etc.), the easiest solution is to clear all the formatting. Select the whole document (Ctrl+A), then go to the Home tab and click Clear Formatting in the Font group. That leaves you with plain text and no tags.

2. Define the basic formatting. If you want to keep some formatting, such as bold, italics and tables, just get rid of the main culprits by defining the font, size and spacing. Again, select all (Ctrl+A), then open the Font dialog (Ctrl+D) and choose one font (e.g. Arial) and one size (e.g. 11). If you’re using Word 2010, go to the Advanced tab of the Font dialog, set Scale to 100% and Spacing to Normal, and make sure kerning is disabled. If you’re using Word 2007, you’ll find these settings in the Character Spacing tab.

3. Use a macro. If you want to automate the second method, here’s a quick macro to do it:

Click on the macro to download it in .doc format.

If you want to learn more about creating macros yourself, check out the article Record or run a macro in Microsoft Office Help and the Macros and VBA section of the Word MVPs site.

4. Use CodeZapper. The last option, an all-in-one solution, is CodeZapper, a well-known set of macros created by David Turner. I find it very useful when the formatting is complex and I need the final layout to look just like the original. It is a .dot template that you simply copy into your Word startup folder. For €20, it will save you a lot of headaches.
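None of the methods above needs any programming, but it may help to see roughly what methods 2 and 3 change inside the file. The hypothetical Python sketch below (standard library only; it is not the downloadable macro and not how CodeZapper works) works on the text of word/document.xml. It strips tracking (w:spacing), scale (w:w), kerning (w:kern) and position settings from every run, and, unlike method 2, which sets Arial 11 explicitly, it deletes direct font and size overrides so that text falls back to its paragraph style’s font and size. It then merges neighbouring text runs that end up with identical formatting, since fewer runs means fewer places for tags.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W_NS)  # write "w:" rather than "ns0:" on output

# Direct run properties removed by this sketch: tracking, scale, kerning and
# position, plus explicit font and size overrides (text then inherits its style's).
STRIP = {W + name for name in ("spacing", "w", "kern", "position", "rFonts", "sz", "szCs")}

def signature(run):
    """Hashable summary of a run's remaining formatting."""
    rpr = run.find(W + "rPr")
    if rpr is None:
        return ()
    return tuple(sorted((c.tag, tuple(sorted(c.attrib.items()))) for c in rpr))

def is_plain_text_run(run):
    """True for runs holding only an optional w:rPr plus exactly one w:t."""
    kids = [c for c in run if c.tag != W + "rPr"]
    return len(kids) == 1 and kids[0].tag == W + "t"

def clean_document_xml(document_xml):
    root = ET.fromstring(document_xml)
    # 1. Remove the "noise" properties from every w:rPr element.
    for rpr in root.iter(W + "rPr"):
        for child in list(rpr):
            if child.tag in STRIP:
                rpr.remove(child)
    # 2. Merge neighbouring plain-text runs whose formatting is now identical.
    for para in list(root.iter(W + "p")):
        previous = None
        for child in list(para):
            if child.tag != W + "r" or not is_plain_text_run(child):
                previous = None  # anything else (bookmarks, fields...) breaks the chain
                continue
            if previous is not None and signature(child) == signature(previous):
                prev_t = previous.find(W + "t")
                prev_t.text = (prev_t.text or "") + (child.find(W + "t").text or "")
                prev_t.set(XML_SPACE, "preserve")  # keep leading/trailing spaces
                para.remove(child)
            else:
                previous = child
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Treat this as an illustration rather than a tool: ElementTree drops and renames namespace prefixes when it writes XML, so putting the result back into a real .docx this way may give you a file Word refuses to open. For actual jobs, the Word-based methods above are the safer route.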

Prophylaxis

What can you do to prevent this happening in the future?

In Studio, make sure you’ve checked the option “Skip advanced font formatting (tracking, kerning, etc.)” for .docx files. With it enabled, Studio ignores the formatting set in the Advanced tab of the Font dialog in Word 2010 (the Character Spacing tab in Word 2007).

If you import a PDF straight into Studio, you do so at your own risk! A better solution is to use professional OCR software, such as ABBYY FineReader or OmniPage, which lets you process your file before you convert it. That way you can decide whether to preserve formatting, images, etc. Another option, launched in 2011, is Adobe Export PDF, which preserves headers, footers and bullet points better than the other applications but does not let you customise settings in advance.

A last but very important point is to decide whether you’re going to charge more for translating a PDF or scanned image than for an editable file. Should you charge by the hour or add a surcharge to your normal rate? Should you charge per source word, or would it be better to use the target word count for these jobs? What do you recommend?

Image attribution: Thanh
This entry was posted in 1. The Basics, SDL Trados Studio.

10 Responses to How to get rid of a tag soup in Trados Studio.

  1. wordstodeeds says:

    Excellent post, Emma! Clear, informative and really useful. Thanks also for the heads-up about Code Zapper.

  2. Vitaly says:

    Hello Emma,
    Thank you very much for the very useful article. I learnt several ways to get rid of these annoying tags. Many thanks.

  3. KolumbinaBT says:

    This is great stuff, Emma, thank you very much!

  4. I recommend Dave Turner’s CodeZapper to everyone who has a license for MS Word. It’s the best tool available for cleaning up rogue tags in most texts. Registered users also receive automatic updates by e-mail, and Dave continues to refine his macros.
    With regard to charges, it is really time that all translation providers – agencies and freelancers – routinely include tags in their cost calculations. Trados Studio counts tags, and a “word or character count weighting” can be set for tags. If you choose to count one tag as one word (actual testing has shown that the time burden imposed by a tag is at least at this level, perhaps closer to two words), then 200 tags in a document with a lot of formatting will count as 200 additional words on the invoice. Cleanup work for poorly formatted OCR documents received can also be compensated with such a calculation – after all, it costs time to do the cleanup by any method.

    • Hi Kevin, I like your idea of charging for tags, and if you’ve actually persuaded your clients to pay for them, I’m even more impressed. The mere mention of charging by the hour or adding a surcharge often results in an original editable file suddenly being found, or certainly more effort being put into the OCR, so having the option of charging per tag gives us another, perhaps fairer, card to play.
      Thanks for dropping by with your comment,
      Emma

  5. paulfilkin says:

    Reblogged this on multifarious and commented:
    A great article Emma… this question does come up quite a lot and you’ve done an excellent job of handling it.

  6. Jim Shanks says:

    Hi Emma/Paul,
    Paul’s hyperlink to multifarious is defunct. Perhaps someone would like to update it?
    Jim

  7. Elena says:

    Thank you Emma, I thought I was going insane! It’s my first project with Studio 2011.

  8. Rev Rave says:

    Brilliant! You have just saved my day. All the bad tags have vanished.
