Brandon's Blog

6/1/2009

Encoding Wars

Efendi is actually at the roll-out milestone as of Saturday night, although it took me some more bashing on Sunday before I could convince myself it was actually there.  Turkish lessons are tonight, but very soon I expect to tag a revision as an alpha deployment release and start a running instance on the new server.  If all goes well I will remap the main antesonic.org address to the new server and pull out of gilford after an extensive backup.

The nadir of the Efendi coding process was certainly yesterday, as I spent several hours banging on character encodings.  Python has great Unicode support, but you kind of have to find it first, then understand it, then realize that it looks backwards at first glance but makes complete sense once you see the whole picture.

Unicode is quite the animal.  As English-speakers, we really don’t understand how bad the environment is for our friends speaking other languages.  I’ve only seen a handful of Turkish people who actually touch-type because the keyboards are so messed up.  But worse than that is the actual process of getting a language’s own characters into a file and onto the screen.  As soon as you deviate from the easy standard (ASCII), it gets very hairy very quickly.

In any case, I had an issue that turned out to be a non-issue: getting normal text converted to Unicode and then into the proper encoding for the web (UTF-8).  It turned out everything internal to the program was working properly, and all I needed was a simple HTML tag (a meta tag) to signal that the page was in UTF-8.
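For the curious, here is roughly what that pipeline looks like.  This is only a sketch in modern Python, not Efendi’s actual code: the function name render_page, the ISO-8859-9 (Turkish Latin-5) source encoding, and the bare-bones page layout are all assumptions made for illustration.  The idea is to decode bytes into text on the way in, work with text internally, encode to UTF-8 on the way out, and tell the browser which encoding it is getting.

```python
import html

# The tag that tells the browser the page bytes are UTF-8.
META_TAG = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'

PAGE_TEMPLATE = (
    "<html><head>{meta}<title>{title}</title></head>"
    "<body><p>{body}</p></body></html>"
)


def render_page(raw_title, raw_body, source_encoding="iso-8859-9"):
    """Turn encoded input bytes into a UTF-8 encoded HTML page.

    source_encoding is an assumption; use whatever the input really is.
    """
    # Step 1: bytes -> text (decode), as early as possible.
    title = raw_title.decode(source_encoding)
    body = raw_body.decode(source_encoding)

    # Step 2: work with text only; escape HTML-special characters.
    page = PAGE_TEMPLATE.format(
        meta=META_TAG,
        title=html.escape(title),
        body=html.escape(body),
    )

    # Step 3: text -> bytes (encode), as late as possible.
    return page.encode("utf-8")
```

Without the meta tag (or an equivalent HTTP header), the bytes are still correct UTF-8, but the browser may guess a different encoding and show garbage, which is exactly the kind of non-issue that looks like a bug.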

To make matters worse, it turns out cmd.exe, at least as it is set up by default, does not display Unicode properly, and thus sent me on many goose chases thinking there was a problem when actually the console was just lying to me.  I can’t believe Windows XP doesn’t handle Unicode on the console out of the box.
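One way to avoid being lied to is to never trust how the console draws a string, and to print an ASCII-only description of it instead.  The sketch below is in modern Python; the helper names describe and as_seen_by are made up for this post, and cp437 is only an assumed stand-in for whatever legacy code page a given console happens to use.

```python
def describe(text):
    """Return the code points of a string using only ASCII characters,
    so no console can garble the result."""
    return " ".join("U+%04X" % ord(ch) for ch in text)


def as_seen_by(text, console_encoding="cp437"):
    """Simulate a console that misreads UTF-8 bytes as a legacy code page.

    console_encoding is an assumption; real consoles vary.
    """
    return text.encode("utf-8").decode(console_encoding, errors="replace")


if __name__ == "__main__":
    word = "\u015f"  # Turkish s with cedilla
    print(describe(word))                # the truth, in plain ASCII
    print(describe(as_seen_by(word)))    # what the console would show instead
```

If describe reports the code points you expect, the data is fine and the display is the problem.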

Anyway, the debugging was so difficult I actually almost abandoned the project outright.  But, doing a search for “python blog engine” is an inspiring activity!

On the upside, my work on import/export has gone swimmingly, and from work I have added the ability to export comments with TxP.  That’s really great news, as I was afraid I would lose them in the transition.  I have a delightful little import format, which should be easily portable to just about any blog engine ever made.  I am prepared to support LiveJournal with Meta’s help, and I think together we can grok enough Perl to make it happen in Python.