Dealing with Unicode and ASCII using Python

Published on 2010-03-25 15:21:00+00:00
ascii   python   unicode  

Dealing with character encodings is (sometimes) hard. It's especially confusing for those who've never done it before. Converting text from Unicode to ASCII can be tricky.

A lot of times, I'll import some data from a text file, and I just want to convert everything to ASCII and ignore anything that's not ASCII (like MS Word's smart quotes). Luckily, this is fairly easy when the data is a byte string:

mystring = mystring.decode('ascii', 'ignore')
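The one-liner above assumes a Python 2 byte string. In Python 3, str objects no longer have a decode method, so here is a rough sketch of the same idea that should handle both bytes and text. The helper name to_ascii and the sample data are made up for illustration.

```python
def to_ascii(data):
    """Return `data` as text containing only ASCII characters."""
    if isinstance(data, bytes):
        # Bytes (e.g. read from a file in binary mode): drop every
        # byte outside the ASCII range while decoding.
        return data.decode('ascii', 'ignore')
    # Text: encode to ASCII, dropping anything that doesn't fit,
    # then decode back so the result is text again.
    return data.encode('ascii', 'ignore').decode('ascii')


# Hypothetical sample: "Hello" wrapped in smart quotes, as UTF-8 bytes
# and as text.
raw = b'\xe2\x80\x9cHello\xe2\x80\x9d world'
text = '\u201cHello\u201d world'

print(to_ascii(raw))
print(to_ascii(text))
```

Note that 'ignore' silently throws characters away. If you'd rather see where something was dropped, the 'replace' error handler puts a placeholder there instead.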

There are tons of great Python resources (and code!) for all your character encoding needs. In no particular order, here are a few I've found:

There are probably more, but most of these have helped me get the job done.