This is really quite dismaying - and I can only imagine the discussions that must have gone on at TRADOS when they clearly suborned this restriction in their XML parser. It would have been far cleaner from an XML standpoint to have translated soft hyphens into a tag - but that would have made the editing experience far less clean. So they were stuck.
And now, I am too - I have to preprocess all TTX before passing it through the XML parser, which is a performance hit (which doesn't bother me too much) - but far worse, non-preprocessed TTX will infect the TM, so if I now make changes to the sanitized file and write it back out, it won't match the TM. This would be OK if we could be sure of sanitization before the TM were affected, but that's clearly too much to hope for in most real-world agency/freelancer workflows.
It's also rather nasty that TMX dumps from an infected TM will also contain 0x1F characters - meaning non-TRADOS tools won't be able to parse those, either. And they are supposed to be interoperable.
I think as a matter of policy I'm just going to sanitize and not worry overmuch about the rather small operability hit - at least until some actual project requires me to worry about it. Then I'll cross that bridge.
No comments:
Post a Comment