Simple if you know it:
pip install ftfy
That installs it as a command which is a lot easier than using it from Github at https://github.com/LuminosoInsight/python-ftfy
It knows how to solve the encoding issues in  the future of publishing at W3C.
It didn’t solve my non-Unicode encoding issue: “v3/43/4r” -> “v¾¾r” -> “vóór”.
That was caused by an infamous Western Latin character set confusion issue, in this case ISO-8859–/Windows- versus CP850/CP858 encoding issue (so: no Unicode involved at all, nor CP437 as it doesn’t have ¾).
So I put in a suggestion for ftfy to support finding the above.
–jeroen
via
- ftfy (fixes text for you) version 3.0 | Luminoso Blog.
- Which encoding failure did encode “vóór” into “v3/43/4r”? – Stack Overflow.
- The WTF-8 encoding | Hacker News.
Filed under: Development, Encoding, Software Development, Unicode, UTF-8, UTF8
