Saturday, December 4, 2010

That whole MT project

OK, so the post-editing project I foolishly agreed to help with consisted of:
  • OCR with Able2Extract
  • MT with a mixture of (I think) Google Translate and Systran
  • First-pass proofreading
  • Second-pass post-editing
So let's talk about that. A far, far better workflow would have been:
  • OCR with whatever
  • Source-language spell checking and correction
  • Identification of key phrases and terminology as cues for MT
  • TRADOS or similar to avoid rework of existing sentences
  • MT with whatever
  • Target-language spell checking, feeding anything it flags back through MT until everything is at least English
  • First-pass post-editing
  • Second-pass proofreading
This workflow uses (or at least could use) exactly the same tools as above, but without introducing errors at each step that make the later steps impossible to manage. First-pass post-editing should be done by a bilingual translator, using specialized post-editing tools (not yet written) plus a normal translation memory (and of course the TM should also be used before passing text off to the MT stage). Systematic errors should be documented and recycled through the MT process.
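To make the "TM before MT" step concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the memory is a plain dict of source-to-target sentences, the function names `tm_lookup` and `split_for_mt` are made up, and the 0.85 fuzzy-match threshold is arbitrary. It is not how TRADOS or any real CAT tool scores matches. It only shows the idea of reusing existing sentences and sending the rest to MT.

```python
from difflib import SequenceMatcher


def tm_lookup(segment, memory, threshold=0.85):
    """Return (translation, score) for the best TM match, or None.

    `memory` is a hypothetical dict mapping source sentences to their
    existing translations. The similarity score is difflib's ratio,
    used here as a stand-in for a real fuzzy-match score.
    """
    best_target = None
    best_score = 0.0
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_target, best_score = target, score
    if best_target is not None and best_score >= threshold:
        return best_target, best_score
    return None


def split_for_mt(segments, memory, threshold=0.85):
    """Split segments into those reusable from the TM and those needing MT."""
    reused = {}        # segment -> existing translation (fuzzy hits still need review)
    to_translate = []  # segments with no usable TM match, in original order
    for seg in segments:
        hit = tm_lookup(seg, memory, threshold)
        if hit is not None:
            reused[seg] = hit[0]
        else:
            to_translate.append(seg)
    return reused, to_translate


memory = {"O contrato foi assinado.": "The contract was signed."}
segments = ["O contrato foi assinado.", "A reunião começa amanhã."]
reused, to_translate = split_for_mt(segments, memory)
```

Only the segments in `to_translate` would go on to the MT stage. Anything in `reused` that scored below 1.0 is a fuzzy match and still has to be checked by the post-editor.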

One key insight: terminology research matters a lot more in this workflow than in normal CAT.

Thursday, December 2, 2010

More thoughts on a non-stupid text editor

I'm doing some post-editing for Portuguese today (I know, I know, never do MT post-editing, but this customer is a good one and I just couldn't say no). As usual with post-Systran work, there is a lot of dragging and dropping involved, and frankly? Word freaking sucks at dragging and dropping. Why should that be? Why can't I drag a word from the end of a punctuated sentence into its middle and have Word get the spacing right?

The mind boggles.

So it looks like I'm just going to have to break down and address non-stupid text editing again.
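As a starting point, the spacing behavior I want from drag and drop can be sketched in a few lines of Python. This is a rough sketch under assumptions, not a design: the names `tokenize`, `join` and `move_token` are made up, the sentence is treated as a flat list of word and punctuation tokens, and the spacing rules cover only common closing punctuation and opening brackets. Quotation marks and capitalization are not handled.

```python
import re

# A token is a word (optionally with internal hyphens or apostrophes)
# or a single punctuation character. \w is Unicode-aware in Python 3,
# so accented Portuguese words stay in one piece.
TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

NO_SPACE_BEFORE = set(".,;:!?)]}")
NO_SPACE_AFTER = set("([{")


def tokenize(text):
    """Split text into word and punctuation tokens, discarding whitespace."""
    return TOKEN.findall(text)


def join(tokens):
    """Rebuild text from tokens, recomputing the spacing from scratch."""
    out = []
    for tok in tokens:
        if out and tok not in NO_SPACE_BEFORE and out[-1] not in NO_SPACE_AFTER:
            out.append(" ")
        out.append(tok)
    return "".join(out)


def move_token(text, src, dst):
    """Move the token at index src so that it ends up at index dst.

    dst is an index into the token list after the token has been removed.
    """
    tokens = tokenize(text)
    tok = tokens.pop(src)
    tokens.insert(dst, tok)
    return join(tokens)


moved = move_token("The report was sent to the client yesterday.", 7, 3)
# moved == "The report was yesterday sent to the client."
```

The point is that the editor moves tokens rather than characters and regenerates the whitespace afterwards, so a word dragged away from in front of a period never leaves a stray space behind or lands glued to its new neighbor.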