Saturday, December 4, 2010

That whole MT project

OK, so the post-editing project I foolishly agreed to help with consisted of:
  • OCR with Able2Extract
  • MT with a mixture of (I think) Google Translate and Systran
  • First-pass proofreading
  • Second-pass post-editing
So let's talk about that. A far, far better workflow would have been:
  • OCR with whatever
  • Source-language spell checking and correction
  • Identification of key phrases and terminology as cues for MT
  • TRADOS or similar to avoid rework of existing sentences
  • MT with whatever
  • Target-language spell checking, feeding the results back through MT until everything is at least English
  • First-pass post-editing
  • Second-pass proofreading
This workflow uses (or at least could use) exactly the same tools as above, but without introducing errors at each step that make the later steps impossible to manage. First-pass post-editing should be done by a bilingual translator, using specialized post-editing tools (not yet written) plus a normal translation memory. The TM should of course also be applied before text is passed to the MT stage. Systematic errors should be documented and fed back into the MT process.
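To make the "TM before MT" ordering concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions: the translation memory is modeled as a plain dictionary of exact matches, and `translate_segments` and `fake_mt` are hypothetical names standing in for whatever TM and MT tools are actually used. Real TM tools also do fuzzy matching, which this does not attempt.

```python
from typing import Callable, Dict, List, Tuple


def translate_segments(
    segments: List[str],
    tm: Dict[str, str],
    mt: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (source, target, origin) triples, where origin is "TM" or "MT".

    Hypothetical helper: exact-match TM lookup first, MT only as a fallback.
    """
    results = []
    for src in segments:
        key = src.strip()
        if key in tm:
            # Sentence already translated once: reuse it, no rework.
            results.append((src, tm[key], "TM"))
        else:
            # Nothing in the TM: hand the segment to the MT engine.
            results.append((src, mt(key), "MT"))
    return results


def fake_mt(text: str) -> str:
    """Stand-in for a real MT engine (assumption, for illustration only)."""
    return "[MT] " + text


if __name__ == "__main__":
    memory = {"Guten Tag": "Good day"}
    for row in translate_segments(["Guten Tag", "Neue Zeile"], memory, fake_mt):
        print(row)
```

Tagging each segment with its origin means the post-editor can tell at a glance which sentences came from the TM and which are raw MT output that needs a closer look.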

One key insight: terminology research becomes much more important in this workflow than in normal CAT.
