Thursday, July 15, 2010

A note on character encodings

Here's a sticky wicket (as character encodings always are). By default, Notepad (my text editor of choice for simple files) saves umlauted vowels as single bytes in an eight-bit character set (ISO 8859-1, or the closely related Windows code page). Padre (my Perl IDE of choice) saves umlauted vowels within strings as UTF-8, which is much better but uses two bytes for each of them.

Here's the problem: if I type a German word into a text file in Notepad and the same German word into a string literal in Padre, the two don't test as equal, because the bytes underneath are different. It's a widespread enough problem that I'm going to have to come up with a principled, central way to deal with it. So watch this space, I guess.
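
To make the mismatch concrete, here is a minimal sketch in Python (not Perl, purely for illustration). It assumes the text file is ISO 8859-1 and the string is UTF-8. The word "Tür" and the helper name `decode_text` are made up for the example. It shows the raw bytes differing, and one possible remedy: decode each side from its own encoding before comparing.

```python
# Hypothetical example word with an umlaut ("Tür"), written with an escape
# so the example does not depend on how this file itself is saved.
word = "T\u00fcr"

# The same word as an eight-bit editor and a UTF-8 editor would store it.
latin1_bytes = word.encode("iso-8859-1")  # one byte for the u-umlaut
utf8_bytes = word.encode("utf-8")         # two bytes for the u-umlaut


def decode_text(raw: bytes, encoding: str) -> str:
    """Turn raw bytes into text, given the encoding they were saved in."""
    return raw.decode(encoding)


# Comparing the raw bytes fails, which is the problem described above.
bytes_equal = latin1_bytes == utf8_bytes

# Decoding each side with its own encoding first makes the comparison work.
text_equal = decode_text(latin1_bytes, "iso-8859-1") == decode_text(utf8_bytes, "utf-8")

print(latin1_bytes, utf8_bytes)
print("bytes equal:", bytes_equal)
print("text equal:", text_equal)
```

The catch is that the program has to know which encoding each source uses. That is why a single, central place for decoding input seems like the right direction, though this sketch is only one way to go about it.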

