Last September, I wrote a blog post about Slate Desktop, a new customised machine translation (MT) engine developed by Precision Translation Tools that was released in February this year. Since then, I’ve been testing Slate Desktop alongside the SDL Language Cloud Custom MT Engines solution. Here’s a report on my findings so far.
Some background facts
What is a customised machine translation engine?
A generic machine translation engine (think Google Translate) is the result of processing huge bilingual, parallel text corpora. It’s usually free or quite inexpensive for the end user. Customised MT is built on your own translation memories (TMs) and it produces more meaningful translations in your own fields and languages. What you get out depends entirely on what you put in. It’s not free (see below for prices).
How and where do you build your own MT engine?
Both Slate Desktop (Slate) and SDL Language Cloud Custom MT Engine (Language Cloud) need to be fed a TM with at least 90,000-100,000 units. Fewer units may be fine in very narrow fields. The engine building process is fairly straightforward in both cases, but it’s time consuming.
In the case of Slate, the engine is trained and stored on your own machine. Engine training time depends on your computer specs and TM size, but my 120,000-unit TM took about 4 hours on a powerful PC. (I actually left the process running overnight.) Apparently, the ability to add new content to an existing engine and rebuild it is just around the corner, which will save considerable time.
With Language Cloud, the Custom MT engine is built, encrypted and stored on SDL secure servers. Engine training takes about the same time as with Slate, but you have to upload the TM and then your request joins a queue, so it may take longer in practice. (My custom engines have taken between 4 hours and 2 days to build.)
In addition to the engine itself, both tools let you force specific terminology during MT look-up by giving priority to a glossary or dictionary.
With Slate, this means adding a tab-delimited text file to a specific folder; with Language Cloud, you need to upload a TBX file. A MultiTerm termbase can be converted easily to either file format with the Glossary Converter from the SDL AppStore.
Slate also offers the option of adding a text file for variables, which works like a terminology file, but isn’t language specific.
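If you'd rather build a small terminology file by hand than convert a termbase, a few lines of Python will do it. This is only a sketch: I'm assuming a simple two-column layout (source term, tab, target term) in UTF-8, and the function names, file name and sample terms are my own illustrations, so check the Slate documentation for the exact layout and folder it expects.

```python
import csv
import tempfile
from pathlib import Path


def write_glossary(terms, path):
    """Write (source, target) term pairs as a UTF-8 tab-delimited text file.

    Assumption: one term pair per line, source first, target second.
    """
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for source, target in terms:
            # Strip stray spaces so look-ups are not thrown off by padding.
            writer.writerow([source.strip(), target.strip()])
    return Path(path)


def read_glossary(path):
    """Read the file back as a list of (source, target) tuples."""
    with open(path, encoding="utf-8", newline="") as handle:
        return [tuple(row) for row in csv.reader(handle, delimiter="\t") if row]


# Illustrative Spanish > English clinical trial terms.
terms = [
    ("promotor", "sponsor"),
    ("consentimiento informado", "informed consent"),
]

# Written to a temporary folder here; in practice you would point this
# at whichever folder your engine reads its terminology from.
demo_path = Path(tempfile.mkdtemp()) / "glossary.txt"
write_glossary(terms, demo_path)
print(read_glossary(demo_path))
```

Reading the file back after writing it is a cheap way to confirm that accented characters and tabs survived the round trip.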
Both Slate and Language Cloud are integrated in Studio as Automated Translation providers. They can be added to a specific project under Project Settings > Language Pairs > All Language Pairs > Translation Memory and Automated Translation > Add, or under General Settings (File > Options).
Slate needs the free plug-in for Studio 2015 and it doesn’t work in earlier versions. However, it also connects with memoQ, OmegaT and CafeTran, and as a standalone application it can be used to pretranslate XLF and other file formats.
Slate Desktop costs $549.00 for a permanent license and it comes with a 30-day trial. You can build any number of engines and add any number of terminology files. Slate can be installed on two machines and it’s easy to transfer your engines from one to another.
The SDL Language Cloud Custom MT Engine package costs $90.00/month and you can try it out for free for 30 days. The subscription allows you one customised engine and one dictionary. It works on any machine where SDL Trados Studio is installed and the package includes other features that don’t require Studio at all, such as MS Office add-ins.
When I started looking at customised machine translation solutions last year, I was aware that much of my work wouldn’t be suitable at all.
Articles for medical journals, for example, are written in long-winded sentences in my source language and need considerable reworking and rewriting in English. They discuss new procedures that I haven’t translated before, so not even my TMs are particularly useful, let alone machine translation.
However, clinical trials are quite repetitive, have standardised terminology and should be written clearly, in fairly short sentences. These three aspects make this field a potential candidate for customised machine translation. I thought that informed consent forms, ethics committee letters and back translations of these documents would be good texts to try out with my customised MT engines.
Happily, since Slate and Language Cloud are based on my own TMs, they don’t make the typical mistakes that Google Translate makes. They know that in clinical trials, promotor in Spanish is not promoter in English:
Short, simple segments are generally easy for my custom machine engines. Here, both alternatives are fine:
(Although Language Cloud didn’t realise that análisis was plural.)
Longer segments are much more problematic:
(Here, Language Cloud does a better job than Slate.)
Sometimes, length has no bearing on the result:
(Here, Slate manages just one of these drug names, whereas Language Cloud gets them spot on.)
(Here, Language Cloud took too long to look up a segment that was neither long nor difficult to understand, while Slate almost got it spot on.)
Understandably, convoluted source segments are non-starters:
(Best to start from scratch here.)
Both tools have problems dealing with upper case segments:
Slate has problems with initial capitals and end punctuation:
Square brackets trigger rogue code in Slate:
Language Cloud times out if the look-up takes too long:
Language Cloud embeds tags correctly; Slate omits all tags:
Unfortunately, however, the time-out issue sometimes means that tags are included, but everything else is omitted:
Feedback from the developers
Slate Desktop has several technical issues, as seen in the screenshots above. I asked Slate’s developer, Tom Hoar, what is being done to sort them out and he provided the following responses:
1. Lack of tags: Yes, Slate Desktop v1.x removes all tags from the source segment and does not attempt to place them in the target. Our roadmap includes support to clone XLIFF 1.x and 2.x inline elements and place them as closely in the target element as possible. Our timeline and release version are TBD. As an interim step, we could clone the tags and simply place them at the end of the target segment.
2. End punctuation inconsistencies: Possibly fixed with a bug-fix implemented in the upcoming v1.1.
3. Rogue code: These are “escape sequences” that temporarily replace the open/close square brackets. A bug in Slate Desktop allowed them to leak through to the end-user (i.e. not temporary). Another customer reported this and we fixed it in the upcoming v1.1 release.
4. Upper-case usage: Slate Desktop restores target language casing according to casing and spacing found in the TMs’ target segments without regard to the source language casing. “Fixing” this example with a broad rule that copies source language casing could “break” otherwise desirable results for other language pairs, products or contexts. [We will probably try to] fix this specific use-case – where the entire source language segment is upper case – using a rules-based rather than a corpus-based approach, to simply ensure the target follows the source casing. As Slate Desktop’s user-interface matures, we will add user-configurable options to enable/disable different features like this one.
The ability to make these kinds of fine-tuning choices shows that Slate Desktop is a hybrid (statistical and rules) system. The statistical Moses system is at its core, and rules-based pre-processing and post-processing can accommodate a wide range of modifications. Translators with a personal interest in experimenting with these features can contact me. Eventually, we envision an SD community that creates and shares their experiments as features for everyone.
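To make the idea of rules-based pre-processing and post-processing more concrete, here is a minimal sketch of the two rules Tom mentions: temporarily replacing square brackets before the segment goes to the engine, and making the target follow an all-upper-case source. This is not Slate Desktop's actual code. The placeholder tokens, the function names, the stand-in engine and the Spanish sample segment are all my own assumptions for illustration.

```python
# Hypothetical placeholder tokens. The real escape sequences used by
# Slate Desktop are not given in this post.
OPEN_TOKEN = "__LSQB__"
CLOSE_TOKEN = "__RSQB__"


def mask_brackets(text):
    """Pre-processing: swap square brackets for placeholder tokens."""
    return text.replace("[", OPEN_TOKEN).replace("]", CLOSE_TOKEN)


def unmask_brackets(text):
    """Post-processing: restore the brackets so no token leaks through."""
    return text.replace(OPEN_TOKEN, "[").replace(CLOSE_TOKEN, "]")


def follow_source_case(source, target):
    """If the whole source segment is upper case, upper-case the target."""
    if source.isupper():
        return target.upper()
    return target


def translate_with_rules(source, engine):
    """Wrap an engine call with the pre- and post-processing rules."""
    masked = mask_brackets(source)
    raw = engine(masked)
    restored = unmask_brackets(raw)
    return follow_source_case(source, restored)


def fake_engine(text):
    """Stand-in for the statistical engine: it knows one phrase only."""
    return text.replace("NÚMERO DE REVISIÓN INTERNA", "Internal revision number")


print(translate_with_rules("NÚMERO DE REVISIÓN INTERNA [1]:", fake_engine))
```

The "rogue code" bug corresponds to skipping the restore step, and the casing rule only fires when the entire source segment is upper case, which is the narrow use-case described in point 4.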
SDL Language Cloud Custom MT Engines has ironed out most technical issues (apart from the upper case bug) but it has some serious logistical issues. I asked my Language Cloud contact at SDL, David Pooley, for some feedback and he responded as follows:
1. Upper-case usage: This appears to be a peculiarity with the engine training as our machine translation engines usually return translations in sentence case. If you translate the same text through FreeTranslation.com then you would get a translation of “Internal revision number:” which, even then, is potentially in the wrong case for your requirements. I will raise this observation with the engineering team and we will investigate in due course. When we do implement fixes, since SDL Language Cloud is a SaaS product, you’ll get them automatically and you’re always using the latest version.
David also mentioned (in case anyone doesn’t know) that you can toggle upper and lower case text by selecting it and pressing Shift+F3.
2. Server connection and time-out issues: We have a number of solutions built on SDL Language Cloud which include the integration with SDL Trados Studio as well as the SDL Translate mobile app and FreeTranslation.com. Recently it appears that we have become the victim of our own success with a big increase in the traffic, some of which is being generated by users looking to abuse our free translation offerings. We are aware of this and are taking steps to:
- Prevent users abusing our service
- Increase our server infrastructure to deal with the increased demand
- Ensure that the SDL Trados Studio integration is more robust and deals gracefully with failed connections to SDL Language Cloud
3. Price. [My comment: There’s a huge range of subscription options, but only the very top one, Specialist, offers a single custom engine for $90/month. It seems that you’re not targeting this project at freelance translators at all at this price?]
We created this package for the freelance market. We realize the price is higher than the other packages, but the benefit of being able to train a custom engine is significant. We are also currently working on new features and functionality that will be more appealing for freelance translators so watch this space. I will, however, illustrate an ROI calculation that you may find surprising (and I’ll err on the low side for some of these figures). A freelancer translating 250 words per hour at $0.05 per word would be earning $12.50 per hour. If using a personalized engine increases that productivity by 20% then the new hourly revenue is $15.00 (an increase of $2.50 per hour) and the $90 would be recouped in 36 hours which is roughly one week and yields up to $270 “profit” for the remainder of the month. The 250 words, $0.05 per word and 20% increase are conservative numbers and it’s entirely possible that the cost would be recouped much quicker.
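The arithmetic in David's example is easy to replay with your own figures. The sketch below is my own, not SDL's. The function and parameter names are invented for illustration, and the 144 working hours per month is an assumption (four 36-hour weeks), which is the figure that makes the $270 in the quote come out.

```python
def roi(words_per_hour, rate_per_word, productivity_gain, monthly_fee, hours_per_month):
    """Return the break-even point and net monthly gain for a subscription.

    productivity_gain is a fraction, e.g. 0.20 for a 20% increase.
    """
    base_hourly = words_per_hour * rate_per_word
    extra_hourly = base_hourly * productivity_gain
    break_even_hours = monthly_fee / extra_hourly
    net_monthly_gain = extra_hourly * hours_per_month - monthly_fee
    return {
        "base_hourly": base_hourly,
        "new_hourly": base_hourly + extra_hourly,
        "break_even_hours": break_even_hours,
        "net_monthly_gain": net_monthly_gain,
    }


# Figures from the quote, plus my assumed 144 working hours per month.
quoted = roi(
    words_per_hour=250,
    rate_per_word=0.05,
    productivity_gain=0.20,
    monthly_fee=90,
    hours_per_month=144,
)
for name, value in quoted.items():
    print(f"{name}: {value:.2f}")
```

The result is very sensitive to the productivity gain: halve it to 10% with the same inputs and the break-even point doubles to 72 hours, so it's worth plugging in a gain you have actually measured.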
Slate Desktop and SDL Language Cloud customised MT engines are new products that are still being developed and improved. My conclusions are based on the current builds.
Both tools produce useful suggestions for simple segments in my carefully selected fields. I don’t pretranslate whole files with them, but set them to automatic look-up when my TMs don’t find a match above 85%. In the past, I’ve always set this threshold to 70%, but TM hits in the 70-85% range need considerable rewriting in any case and my customised MT engines tend to be more useful in this fuzzy bracket.
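In Studio this is just a setting, but the logic behind it can be sketched in a few lines. This is an illustration of the decision rule only, not Studio's API. The function name, the data shapes and the stand-in MT function are hypothetical, and I'm treating a match at exactly the threshold as good enough for the TM.

```python
def pick_suggestion(source, tm_matches, mt_lookup, threshold=85):
    """Choose between the best TM match and a machine translation.

    tm_matches: list of (match_percentage, target_text) tuples.
    mt_lookup: function that takes the source text and returns a translation.
    """
    best = max(tm_matches, key=lambda match: match[0], default=None)
    if best is not None and best[0] >= threshold:
        return ("TM", best[1])
    # No TM match, or only a low fuzzy match: fall back to the MT engine.
    return ("MT", mt_lookup(source))


def fake_mt(source):
    """Stand-in for a custom MT engine look-up."""
    return f"<MT suggestion for: {source}>"


print(pick_suggestion("segment A", [(92, "TM target A")], fake_mt))
print(pick_suggestion("segment B", [(74, "TM target B")], fake_mt))
print(pick_suggestion("segment C", [], fake_mt))
```

Raising the threshold from 70 to 85 simply moves the 70-85% fuzzy matches from the first branch to the second.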
I think Language Cloud has the edge over Slate in terms of translation quality. Here, for example, it’s worth post-editing the Language Cloud suggestion, whereas the Slate version has substantial deviations and needs to be started from scratch:
Slate has solved the dichotomy between confidentiality and machine translation because the entire process takes place on your local machine. Client confidentiality cannot be breached.
With Language Cloud, my translation engine is stored in an encrypted environment and my source segments aren’t accessed by anyone else. But the cloud will never be as secure as my own computer.
On my machine, Slate look-ups take between one and three seconds, depending on segment length. Slate has the edge over Language Cloud not only in look-up times but also in making quick tweaks to a terminology file. Adding a couple of terms or variables for a specific project is a breeze.
Language Cloud performed well when I started testing it several months ago, but right now the lag is very significant (when it doesn’t time out completely).
Productivity / ROI
I personally don’t feel that my productivity increases by 20% as SDL suggest. For me, a customised MT engine is an additional tool in my toolkit, and it’s the sum of all these tools that makes me highly productive in terms of output per hour. Many years of experience also play a significant part in my productivity.
All in all, I’m happy to be on board the MT train. I can’t wait to find out which station comes next.