Character encoding is a hard problem. I've written about it before, and I have gotten really annoyed trying to get it right. I wish there were just a magic function that I could push strings through that would fix them and make them usable in whatever context they end up in.

Currently, I am running iconv to solve the problem - this blog's software stores its data as ISO-8859-1 and uses the //IGNORE option to remove anything that isn't ISO-8859-1. This is one of the main reasons I will, despite previous promises, not release the source code for this blog's software. I don't want to spread this inelegant hack. If you are looking for blogging software, look for something that supports Unicode properly, because my software isn't it. This is not my fault. I am working within certain limitations, which I won't spell out. Legacy data is a pain, and it's one of the reasons why we need to be careful to design specifications and software that are going to stand the test of time.
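
To make the cost of that hack concrete, here is a rough sketch of the same idea in Python. It is not the code this blog runs (that goes through iconv); the function name `to_latin1_lossy` is made up for illustration, and I'm assuming Python's `errors="ignore"` handler is a fair stand-in for what //IGNORE does, which is to silently drop anything the target encoding can't hold.

```python
def to_latin1_lossy(text: str) -> str:
    """Hypothetical helper: drop every character ISO-8859-1 cannot represent."""
    # errors="ignore" discards unencodable characters without complaint,
    # which is roughly the behaviour of iconv's //IGNORE suffix.
    raw = text.encode("iso-8859-1", errors="ignore")
    # Decoding ISO-8859-1 never fails, since every byte maps to a character.
    return raw.decode("iso-8859-1")


if __name__ == "__main__":
    # The accented e survives; the curly quotes and the snowman vanish.
    sample = "caf\u00e9 \u201cquoted\u201d \u2603"
    print(repr(to_latin1_lossy(sample)))
```

Nothing warns you that the characters are gone, which is exactly why I call it inelegant.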
