Thursday, September 30, 2010

Thoughts on practical use of machine translation

So, since I haven't had time to get OpenLogos running (I swear, the moment I started, the work came pouring in; I'm at 123,000 words for the month, phew), and since I was far, far behind schedule on a large, boring corporate charter in French, I decided to try Systran.

(Oh, no, he didn't go there!)

I hadn't looked at Systran since 2005, when I had some work post-editing its abysmal output for an agency in Italy. I came to the conclusion then that it was normally just as easy to translate a given text myself as to try to decipher what Systran had come up with and whip it into something comprehensible to an English speaker, and that translating it myself paid five times as well. So: no-brainer, and I actually lost my Systran install.

But, well, it's been five years. Surely they could do a better job by now, right? And hey, it's only 100 bucks for the home version, which now includes a whole raft of languages - in fact, with the exception of Hungarian, all the languages I work with. So.

Here's the workflow I used: I ran Systran on my file, aligned the output with the original, and loaded the result into my TM. Then I started down the file sentence by sentence in the normal manner, with the aligned segments coming up as I went.
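A minimal Python sketch of the align-and-load step, for anyone who wants to try the same thing. It assumes the MT output kept the segmentation intact (one output sentence per source sentence), so "alignment" is just pairing by position, and it writes TMX 1.4 as a generic exchange format; that is an assumption, not necessarily what the TRADOS setup in this post used. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespace with the reserved "xml:" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def align_one_to_one(source_segments, mt_segments):
    """Pair source sentences with MT sentences by position.

    Assumes (hypothetically) that the MT tool preserved segmentation.
    Raises ValueError if the segment counts differ.
    """
    if len(source_segments) != len(mt_segments):
        raise ValueError(
            f"segment count mismatch: {len(source_segments)} vs {len(mt_segments)}"
        )
    return list(zip(source_segments, mt_segments))


def build_tmx(pairs, src_lang="fr-FR", tgt_lang="en-US"):
    """Return a TMX 1.4 document (as a string) with one <tu> per pair."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "mt-align-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")
        # One <tuv> per language, each holding a single <seg>.
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


# Demo: one charter sentence plus its raw MT output.
demo = align_one_to_one(
    ["Le capital social est fixé à mille euros."],
    ["The share capital is fixed at millet euros."],
)
print(build_tmx(demo))
```

Pairing by position is the weakest possible aligner. If the MT tool ever merges or splits sentences, the count check fails loudly rather than silently misaligning the rest of the file.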

This worked pretty damn well, actually. OK, there were some Systranisms: mille, in a year, is generally not translated "millet", and I'm not sure why that would be the default. I dealt with these by loading the TM transfer file in my editor of choice and doing a global search-and-replace on them as I went. Then I'd import the edited segments back into the TM and proceed, so commonly mistranslated terms got better as I went. Since the file was 13,000 words, this approach had time to pay off.
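The fix-and-reimport loop boils down to a whole-word find-and-replace over the target side of the aligned segments. A minimal Python sketch, assuming the segments are held as (source, target) pairs the way a TM export would hold them; `global_replace` is a hypothetical helper, and the "millet" fix is the example from the paragraph above.

```python
import re


def global_replace(pairs, fixes):
    """Apply find/replace fixes to the target side of every (source, target) pair.

    `fixes` maps a wrong MT term to its correction. Matching is on whole
    words only, so "millet" will not touch "millets". Returns the updated
    pairs and the total number of replacements made.
    """
    total = 0
    updated = []
    for src, tgt in pairs:
        for wrong, right in fixes.items():
            pattern = rf"\b{re.escape(wrong)}\b"
            # A function replacement avoids backslash surprises in `right`.
            tgt, n = re.subn(pattern, lambda _m: right, tgt)
            total += n
        updated.append((src, tgt))
    return updated, total


segments = [("en l'an deux mille dix", "in the year two millet ten")]
fixed, count = global_replace(segments, {"millet": "thousand"})
print(fixed, count)
```

Returning the replacement count makes it easy to see whether a fix is still doing anything before re-importing the segments.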

I should note that nearly every sentence needed modification. There were some real screamers in terms of Martian word order, so this should be considered a rock-bottom baseline for MT quality; what I wanted to know was whether it would speed up my work even so.

My normal "fast" progress is 700 to 1000 words an hour. For this dreary text, I would probably have managed no more than 400 or 500 an hour. With this procedure, though, I managed a throughput of something between 1500 and 2500 words an hour. That ... that freaking works.
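To put those rates in perspective for the 13,000-word charter, here is a quick calculation using only the estimates quoted above (the ranges are rough and the variable names are illustrative):

```python
WORDS = 13_000  # size of the charter, from the post


def hours_needed(words, words_per_hour):
    """Hours required to translate `words` at a given hourly rate."""
    return words / words_per_hour


# (slowest, fastest) hourly rates quoted in the post
by_hand = (400, 500)     # this dreary text, no MT
with_mt = (1500, 2500)   # with the Systran-plus-TM workflow

for label, (slow, fast) in (("by hand", by_hand), ("with MT", with_mt)):
    print(f"{label}: {hours_needed(WORDS, fast):.1f} to "
          f"{hours_needed(WORDS, slow):.1f} hours")
```

This prints 26.0 to 32.5 hours by hand versus 5.2 to 8.7 hours with MT, a speedup of roughly 3 to 6 times (1500/500 at the low end, 2500/400 at the high end).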

I think quality suffered somewhat, although since it was a corporate charter, I don't think I would have produced fantastic quality anyway, so it's hard to say. I should keep trying this; the preliminary results on this one job were entirely convincing, and I now have much more confidence that machine translation belongs in my toolkit.

How would I improve things, you ask? Pretty much using the same tools I want to implement anyway:
  • Global search and replace for terms in a bilingual list. (This has two aspects: replacement should be sensitive to grammar in the target language, i.e. pluralizing correctly, but it should also be sensitive to the source phrase, a sort of "replace X with X' only if it's a translation of Y".)
  • Automation of simple TRADOS tasks (e.g. reloading the TM after I do a global search and replace.)
  • A database of rewording rules. This is slowly taking shape in my mind - it would be a valuable tool for any proofreader. It could also "translate" between American and British, if you see what I mean. Kind of a spellchecker on steroids, if you will.
  • Automation of Systran itself. The home version runs inside Word or as a standalone tool, and they don't really want you automating it unless you pay a lot more for the Professional or Enterprise versions.
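The first and third items could share one mechanism: a list of rewrite rules, some conditioned on the source segment ("replace X with X' only if it's a translation of Y") and some unconditional (American-to-British style rewording). A minimal Python sketch follows. `Rule`, `apply_rules`, and the example rules are hypothetical, and plurals are handled crudely by listing each form as its own rule rather than by real morphology.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """One entry in a hypothetical bilingual rule list.

    target:      regex matched in the translated text
    replacement: text substituted for each match
    source:      optional regex; if set, the rule fires only when the
                 source segment also matches it (case-insensitive)
    """
    target: str
    replacement: str
    source: Optional[str] = None


def apply_rules(src, tgt, rules):
    """Apply every applicable rule, in order, to the target segment."""
    for rule in rules:
        if rule.source is not None and not re.search(rule.source, src, re.IGNORECASE):
            continue  # source condition not met; leave the target alone
        tgt = re.sub(rule.target, lambda _m: rule.replacement, tgt)
    return tgt


RULES = [
    # Source-conditioned term fixes, one rule per grammatical number.
    Rule(r"\bsocial parts\b", "shares", source=r"\bparts sociales\b"),
    Rule(r"\bsocial part\b", "share", source=r"\bpart sociale\b"),
    # Unconditional rewording rule (American -> British spelling).
    Rule(r"\bcenter\b", "centre"),
]

print(apply_rules("Les parts sociales sont cessibles.",
                  "The social parts are transferable.", RULES))
```

Keeping the rules as plain data means the same list could be run over a TM export, a finished translation for proofreading, or the output of a different MT engine.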
Anyway, I wanted to post this while the job was fresh in my memory. Now it's back to work for me, this time without the Systran crutch.

The real takeaway for me: even bad MT, if well managed, can augment my throughput, potentially by a lot. And the various accessories I would need for Systran work will also apply to work with OpenLogos, so writing them won't be wasted effort if I ever get around to it.
