Dealing with character encodings is (sometimes) hard. It's especially confusing for those who've never done it before. Converting text from Unicode to ASCII can be tricky.
A lot of the time, I'll import some data from a text file and just want to convert everything to ASCII, ignoring anything that isn't ASCII (like MS Word's smart quotes). Luckily, this is fairly easy when the data is a byte string, which is what reading a file gives you in Python 2:
mystring = mystring.decode('ascii', 'ignore')
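Here's a slightly fuller sketch of the same idea, in case it helps to see both directions. The sample string and the variable names (`raw`, `text`, `already_text`, `ascii_bytes`) are made up for illustration. The rule of thumb I'm assuming: bytes get `decode()`, text that is already Unicode gets `encode()`.

```python
# Bytes as they might come out of a file: UTF-8 text with Word-style smart quotes.
raw = u"\u201cHello\u201d caf\u00e9".encode("utf-8")

# Byte string -> text: any byte outside the ASCII range is silently dropped.
text = raw.decode("ascii", "ignore")
print(text)  # Hello caf

# Text that is already Unicode -> ASCII bytes: go the other way, with encode().
already_text = u"\u201cHello\u201d caf\u00e9"
ascii_bytes = already_text.encode("ascii", "ignore")
print(ascii_bytes.decode("ascii"))  # Hello caf
```

Note that 'ignore' throws characters away rather than replacing them, so the accented letter in the sample is lost entirely.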
There are tons of great Python resources (and code!) for all your character encoding needs. In no particular order, here are a few I've found:
- A Crash Course in Character Encoding
- Dive Into Python's Chapter on Unicode
- Beautiful Soup gives you Unicode, Dammit and there's the companion: ASCII, Dammit
- There's also unaccent.py, which seems to convert various Unicode characters to their ASCII equivalents.
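If you'd rather keep the base letter than drop an accented character outright, the standard library can get you part of the way. This is only a rough sketch of the general idea using `unicodedata`, not how unaccent.py or ASCII, Dammit actually work, and `strip_accents` is a name I made up for the example.

```python
import unicodedata


def strip_accents(text):
    """Return an ASCII-only version of a Unicode string (hypothetical helper)."""
    # NFKD splits an accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # The combining marks (and anything else non-ASCII) are then dropped.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_accents(u"caf\u00e9"))  # cafe
```

Characters with no decomposition, such as smart quotes, still disappear with this approach, so it's a complement to the tools above rather than a replacement.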
There's probably more, but most of these have helped me get the job done.