Slate and big TMs: the perfect combination?

This blog post is about a Personalized Translation Engine™ that is about to be released. It’s called Slate™, a machine translation engine that you build from your own translation memories and keep on your own machine. An exciting idea. Is it for translators like you and me?

Disclaimer

First, let me clarify that I hold no commercial interest in Slate or in Precision Translation Tools, the company behind Slate. I haven’t even tested Slate, because it isn’t on the market yet (or rather, the Windows version isn’t). So this post is really me thinking out loud, at the same time as airing the topic, in case anyone hasn’t heard about Slate yet.

What is Slate Desktop?

It’s a Windows application based on Moses (a statistical machine translation toolkit) that you will be able to integrate in your CAT tool to get machine translation (MT) suggestions. In the case of SDL Trados Studio, for example, Slate would be listed as an MT provider. You should be able to train the engine more or less off the shelf, through its GUI (graphical user interface).

How does it differ from Google Translate?

This is where it gets interesting. You feed Slate your own translation memories (TMs) and other bilingual resources. It processes these resources on your own machine, typically overnight, and sets up your own personal engine. This means Slate will use your terminology, copy your spelling and mimic your style, which is a very exciting concept (and something Google Translate will never achieve).

The other big difference is that there are no confidentiality issues; you don’t even need to be connected to the internet.

TM size and computer requirements

A TM needs to have at least 100,000 translation units to start producing meaningful results in Slate. The first time you set it up, you can combine TMs and other resources to train Slate. Later, the engine can be regenerated by adding more resources, but it won’t be updated in real time, as you work.

Your computer must have at least 8 GB of RAM and 250 GB of free disk space.
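
If you want to check whether a TM meets the size threshold mentioned above, one option is to export it to TMX and count the translation units. The snippet below is only a rough sketch under that assumption: the names `count_translation_units`, `meets_minimum` and `MIN_UNITS` are made up for illustration, and nothing here reflects how Slate itself measures a TM.

```python
import io
import xml.etree.ElementTree as ET

# Threshold quoted in this post; an assumption, not an official Slate setting.
MIN_UNITS = 100_000


def count_translation_units(tmx_source):
    """Count <tu> elements in a TMX export (file path or file-like object)."""
    count = 0
    # iterparse streams the file, so a large TM does not have to fit in memory.
    for _event, elem in ET.iterparse(tmx_source, events=("end",)):
        # Strip a namespace prefix, if any, before comparing the tag name.
        if elem.tag.rsplit("}", 1)[-1] == "tu":
            count += 1
            elem.clear()  # drop the finished unit's children
    return count


def meets_minimum(unit_count, minimum=MIN_UNITS):
    """Return True if the TM has at least `minimum` translation units."""
    return unit_count >= minimum


# Tiny in-memory TMX with two translation units, for demonstration only.
sample = io.BytesIO(b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Hello</seg></tuv><tuv xml:lang="es"><seg>Hola</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Thanks</seg></tuv><tuv xml:lang="es"><seg>Gracias</seg></tuv></tu>
</body></tmx>""")

sample_count = count_translation_units(sample)
print(sample_count, meets_minimum(sample_count))
```

For a real TM you would pass the path of the exported file instead of the in-memory sample.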

The cost

The Windows desktop version of Slate is being crowdfunded through an Indiegogo campaign. So if you contribute now by buying a “perk” for $330, you get Slate Desktop at a 40% discount. Or you can wait until it’s released in January 2016 and pay $550 for it.

Will Slate produce good results?

That’s the burning question.

In my case, I have a big medical TM that definitely meets the size requirement. Some of my TMs are client specific, but only one of them is approaching the minimum size for Slate.

So, thinking out loud (and please join in this open conversation in the comments below), and looking at the types of medical texts I translate, one by one:

  • Clinical trial agreements are repetitive and I can imagine that Slate would produce excellent results for them, but my TM already returns very high matches for these projects, so my increase in productivity would be minimal.
  • Other clinical trial documentation (investigator brochures, protocols, etc.) is quite monothematic and repetitive. Slate could be very useful for this.
  • Articles for medical journals are confined to specific medical fields, so in theory Slate could be useful, but in practice each article is a world of its own, reporting new procedures and findings. Slate would simply leave any unknown terminology untranslated. On the other hand, strict journal style would be respected, so Slate will know that I always spell e-mail with a hyphen, and it won’t suggest that a patient “suffers” symptoms as I have never used this expression in my TM.
  • Product information (SmPCs, package leaflets and labelling) could benefit from Slate, but my own TMs in this field aren’t that big. I can’t use the publicly-available EPARs, because they’re too poorly aligned to feed into an MT engine.

The jury is out

I haven’t paid upfront yet (or made that “leap of faith”, as Slate’s Tom Hoar aptly phrased it in his interesting and informative webinar, which is still online). I expect I will jump on the Slate bandwagon, because I love trying out new translation technology and the temptation is huge. And it might just be fantastic for my work.

But before I make the final decision, I’d love to hear from you. How do your TMs size up? Do you think Slate would be useful in your work?


29 Responses to Slate and big TMs: the perfect combination?

  1. René López says:

    Sounds really interesting; however, I’m not sure I hit the minimum segment number required. I just finished a big translation project that was over 2 million words long, but never in my mind did I think about counting the segments. I used Trados & MateCat and created my own TMs according to each of the fields. Basically, engineering fields: electricity, electronics & mechanics. I started using MateCat a few months ago, and I have loved it. I used the TMs that I had already created, but the only thing is that sometimes clients won’t allow you to use internet-based TMs. So in these cases Trados is a must, at least for me.
    Having read your points, I think that in my case I would benefit a lot. These types of technical translations are really repetitive. So I’ll give it some thought and I might back these guys up.

    • Thanks for your comments, René.

      A rough rule of thumb for estimating the number of segments in a TM is to divide the total words by ten (i.e., each segment has an average of 10 words), so your > 2 million word project could well have 200,000 segments.
      In Studio, as I expect you know, you can see how many segments a TM has by going to the Translation Memories window > Settings > Translation units.

      Interesting point about working with MateCat and server-based TMs. As far as I understood from the Slate webinar, Slate won’t be able to connect to cloud apps, only desktop tools, so that would be a drawback.

      Glad to hear you think Slate would help you with your technical translations, especially as you have such clearly defined fields.
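
      The rule of thumb above can be written out as a quick calculation. This is only a back-of-the-envelope sketch: the function name `estimate_segments` is made up, and the 10-words-per-segment average and the 100,000-unit threshold are the rough figures quoted in this post, not measured values.

```python
# Rough figures from this post; both are assumptions, not measured values.
AVG_WORDS_PER_SEGMENT = 10
MIN_UNITS = 100_000


def estimate_segments(total_words, avg_words_per_segment=AVG_WORDS_PER_SEGMENT):
    """Estimate the number of segments in a TM from its total word count."""
    return total_words // avg_words_per_segment


estimate = estimate_segments(2_000_000)
print(estimate)               # 200000
print(estimate >= MIN_UNITS)  # True
```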

  5. Thanks for the interesting post, Emma. I’m one of the people who took the leap; I paid the $330 for the “Super Early Bird – Other” perk, and have requested a CafeTran connector. I hope they reach their goal.

    I have a very, very large collection of TMs (of many different kinds and sizes; publicly available ones as well as my own), and can’t wait to start experimenting with them. No idea if the results will be better than auto-assembly/fuzzy matches in CafeTran, but I am always on the lookout for new translation t̶o̶y̶s̶ tools 😉

    • Yes, I saw your name on the list, Michael. Well done for taking the leap!
      I expect you’ll be interested to see how termbases fit into the equation. If Slate can be taught that TBs take priority over TMs when training the engine, that would be excellent.
      I’ve been testing the SDL Language Cloud recently, and one of its features (with the paid version) is integrating your own TBs in the MT output. Actually, I never got this part working properly, but it’s definitely something that caught my attention.

  6. Nora Díaz says:

    This sounds very interesting, Emma. I like your breakdown by type of document; I think that’s what it will all come down to: having the right kind of source file. I agree that when you have good TMs and termbases, productivity gains might end up being minimal, so I wonder if the investment would make sense in that case. Like you, I have also experimented with SDL Language Cloud with my own TBs but was somewhat disappointed as I didn’t find it very consistent. Still, I like the idea of being able to integrate my own resources into an MT engine, so even though I don’t think I’d be prepared to pay upfront, I think I could be persuaded to join in once I see some convincing real-life results.

    • I didn’t know you’d been trying Language Cloud too, Nora. I was also interested in trying the Life Science vertical but it’s only available out of English. Language Cloud was quite good – about the same level as Google, I found.
      Anyway, you’ve come to a very sensible conclusion with Slate, Nora, letting other people be the guinea pigs 😉
      I’m not sure if I’m that sensible!

  7. tahoar says:

    I’m Tom, the founder/CEO of Precision Translation Tools (PTTools). Thank you, Emma, for sharing your clear review and hopes for Slate. I understand your and Nora’s positions. So, I’d like to announce two changes to the Indiegogo campaign (http://igg.me/at/slate-desktop). The first change is a new Perk. Now, translators like Nora with a more conservative threshold for risks can spend $10 and receive a $100 rebate voucher to purchase Slate Desktop after release. The voucher will be good through Dec 2016. The second change is the addition of a benefit to the existing “Super Early Bird” Perks. Each of these Perks now includes a perpetual 40% discount on all Slate Desktop upgrades as our way of saying “thank you” for accepting your role as a pioneer.

    A note about TM sizes. I’d like to hear translators’ perspectives about where “big” starts, i.e. how many segments. The most important aspect of using TMs to create an engine is that the segments represent the work you expect and translations within a segment are consistent across all segments. That does not mean the source language has repetition. It means that the source/target pairs consistently express the same wording/usage across all pairs. The next important aspect is that new jobs share continuity with the historic work. When these things are present, we have received reports of engines providing translators measurable benefits, even when created from very “small” TMs with as few as 50,000 segments. No matter how many the translator starts with, it costs nothing extra (except time) to use an improvement cycle that updates the engine every 5,000-10,000 segments (or so).

    Ultimately, this technology is a real-world extension of the proverb, “the proof of the pudding is in the eating.” We will do all we can to help you make the pudding. We’ll guide you through the process to select and prepare the ingredients (TMs). We’ll make the best possible mixing bowls, spatulas, pans and baking cups (tools). We’ll make recipes and offer training courses that help you learn to use the tools and optimize your results. What we can’t do is taste the pudding for you.

  8. Thanks for joining the conversation, Tom, and for the news about the new perk and extra bonus.

    I didn’t realise that Slate might work for smaller TMs, even as small as 50K segments. For people who work in confined fields, this would be worthwhile tasting, to follow your analogy. 🙂

  9. german2dutch says:

    I think that it’s a technology you just have to explore, as a freelance translator. Translation agencies will explore and use it, so you’ll have to know what will be coming up in the next few years. I’m not sure what it will bring: my glossaries already contain many fragments and auto-assembling is doing a good job for me (better than Google Translate or Bing). But it’s a technology that I’d like to explore.

    On the other hand, I guess that some agencies will use this technology to reduce the number of “No Match” segments, by “reusing” the segments that you have already delivered.

    So this whole innovation is not trivial and will have serious consequences for us freelance translators. We will just have to deal with those.

    • tahoar says:

      Although we design Slate Desktop as a productivity tool for translators and small agencies (i.e. the tens of thousands of agencies with 2-5 people), I agree that some agencies will be tempted to use it to supplement their process automation. The large agencies, however, are investing in their own tools. This presentation link is an example, just published 2 weeks ago:

      http://ufal.mff.cuni.cz/mtm15/files/02-real-world-application-of-mt-workflow-tomas-fulajtar.pdf

      Page 5 titled “The MT Ecosystem” shows MT at the center circled by 4 participants: Academic, Commercial Development, LSP (Language Service Providers), and In-house MT Owners. There’s no mention of translators participating in this vision of the ecosystem.

      Page 7 highlights they have 100+ engines. I request everyone join our Indiegogo campaign so our new Slate Desktop community surpasses their total! Let’s go for over 100 contributors!

    • I agree, german2dutch, as freelance translators we need to keep abreast of technology developments. As Tom points out, bigger and smaller translation agencies alike are already investigating and implementing customised MT. With long-term clients and high-quality TMs for each client, it makes sense for them.

      In practice, though, agency TMs are often poor quality. The bigger the agency, the poorer the TM quality, in fact. That’s where we’re at an advantage, because our TMs are our own work, and our work alone.

      I’m not worried that customised MT will be used to reduce the “No match” segments and increase the number of post-editing projects. I don’t feel it’s a threat for me in what remains of my working life, and it’s hard to tell what will happen further down the road.

      With CAT tools in general, “old-school” translators have always protested about investing in technology that only benefits agencies. I disagree: investing in technology makes good business sense for everyone, freelance translators and agencies alike.

      • tahoar says:

        Emma, we’ve processed countless TMs into engines over the years. By and large, our experience mirrors your comments. We traced the problem to inexperienced, lazy and incompetent localization engineers who improperly merge updates from various sources. Some agencies protect their TMs as trade secrets, i.e. never let them out of their control much less upload them to Cloud-based MT engines. We’ve worked with only two medium/large agencies like this when training their staff to use the tools, and their in-house staff translators love the benefits. They are definitely in the minority.

  10. Looks very good, Emma. However, I would prefer to continue using Google Translate, which gives excellent results overall, but be able to combine it with my TM and terminology. In short, a tweakable Google Translate or an app that allowed us to override GT when we need to. This would be the big leap forward for me but I fear it will be long in coming.

    • tahoar says:

      Alan, it sounds like you’re experiencing the good side of GT, which can perform well for some subjects and language pairs. I can’t say if Slate Desktop will improve your experience. Like I said, “the proof of the pudding…” I can say that it gives you unlimited tweaking options to override its defaults. So, the “big leap” has more to do with the leap of faith that we’ll deliver in January, which is not so long in coming. 🙂

    • The tweakable Google Translate that Hans (AKA german2dutch) has pointed to in CafeTran is just what you’re looking for, Alan!
      However, I think Slate has two advantages over this:
      1) no confidentiality issues
      2) it will learn that I use UK spelling, whereas Google Translate hasn’t woken up to the fact that there are more variants in this world than US English 🙂

  11. Ana Iaria says:

    I do like this idea and I would be interested. I agree that, as professional translators, we must keep abreast of tech developments and keep pace with them – or we’ll be left behind.

    I understand that I will be able to select my language variant, right? For us Portuguese speakers, the problem with engines like GT, Bing and SDL cloud (I could be wrong here, as I used it for a limited time) is that we get a mix of EU and BR, which proves to be a hindrance, as we have to sort out spelling, grammar, etc.

    If the variant language option is there, I will definitely join the “leap of faith.”

    • Hi Ana, as far as I understand (and maybe Tom can confirm), Slate works solely on what you put into it rather than a specific language variant. So, whatever Pt variant you put in, is the variant you’ll get out.

    • tahoar says:

      Hi Ana,

      Thank you for joining our growing community. Emma is also correct. The engine learns from your TMs. So, be careful to create one engine with BR TMs and another with PT TMs. Slate Desktop supports 29 languages including PT. If you have the TMs, you can create engines for PT-EN, EN-PT, EN-ES, ES-EN, PT-ZH, ZH-PT, etc. at no extra cost, without technical or licensing limits. This means with the TMs, you can create engines for 812 language pairs!

  12. tahoar says:

    Emma, back to your original point about hearing from others before you make your final decision. The TriKonf conference in Germany this weekend is a great opportunity to talk to your colleagues. Several current customers and backers will attend. Some attendees who have built their own Moses/Linux systems will also attend.
