[GRLUG] Character Encodings

Ben DeMott ben.demott at gmail.com
Fri May 28 16:07:08 EDT 2010


Hello everyone. This is an update to a discussion we had on the Python users
group, and I thought I would share it here.
I'm not sure whether any of you have had trouble filtering web form input.
Even with the character encoding library in Python, there are times when the
result still isn't right.

We finally seem to have come across a solution that works every time, even
when there are multiple character encodings within a single string,
Microsoft Office character encodings, or HTML pasted into the form.

Just passing everything through Tidy fixes pretty much everything and makes
it comply.
So for anyone attempting to sanitize multilingual form input from the web:
good luck, and hopefully this solves your problems.

http://tidy.sourceforge.net/docs/tidy_man.html
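
For anyone who wants to try this from Python, here is a rough sketch of what
"passing everything through Tidy" could look like. It is not the exact code
from the original discussion. It assumes the tidy command-line program is
installed and on the PATH, that Python 3 is in use, and that UTF-8 output is
what you want. The function names build_tidy_command and tidy_fragment are
made up for this example, and the option set is only one plausible choice
from the man page linked above.

```python
import shutil
import subprocess


def build_tidy_command(executable="tidy"):
    """Return an argument list for a quiet tidy run (hypothetical helper)."""
    return [
        executable,
        "-quiet",                   # suppress the banner and summary
        "-utf8",                    # assumption: UTF-8 for input and output
        "-asxhtml",                 # emit well-formed XHTML
        "--show-body-only", "yes",  # return just the fragment, not a full page
        "--show-warnings", "no",
        "--force-output", "yes",    # still produce output when errors are found
    ]


def tidy_fragment(data, executable="tidy"):
    """Pipe form input (str or bytes) through tidy and return cleaned text."""
    if shutil.which(executable) is None:
        raise RuntimeError("tidy does not appear to be installed")
    if isinstance(data, str):
        data = data.encode("utf-8", errors="replace")
    result = subprocess.run(
        build_tidy_command(executable),
        input=data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,  # tidy exits non-zero on warnings and errors
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling tidy_fragment(raw_form_value) on each submitted field would then give
you back a string to store or escape further. Tidy only cleans up the markup
and encoding, so any filtering of unwanted tags or scripts is a separate step.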