After yesterday's post on Testing and static methods don’t go well together, I read around a bit more on Source (kunststube [WayBack]) and found these very nice articles on encoding, Unicode and text:
- What Every Programmer Absolutely, Positively Needs to Know About Encodings and Character Sets to Work With Text [WayBack]
- The Great Escapism (or: What you Need to Know to Work With Text Within Text) [WayBack]
- Handling Unicode Front to Back in a Web App [WayBack]
Related to those, some other nice reading:
- Is there a set of “Lorem ipsums” files for testing character encoding issues? – Stack Overflow [WayBack]
- International Components for Unicode: ICU User Guide [WayBack]
- International Components for Unicode ː Repository Browser: repos: icu/data/trunk/charset/data/ucm
- ftp://ftp.unicode.org/Public/MAPPINGS [WayBack]
  Notes on the contents of the MAPPINGS directory:
  - EASTASIA: This directory is obsolete.
  - ETSI: ETSI GSM 03.38 7-bit default alphabet mapping.
  - ISO8859: These are the mapping tables of the ISO 8859 series (1 - 16).
  - OBSOLETE: Obsolete and unsupported mapping tables for historical and archival purposes only.
  - VENDORS: Miscellaneous mapping tables for small codesets, typically provided by vendors. The majority of current, useful tables are here.
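To get a feel for what those mapping tables contain, here is a minimal Python sketch. It assumes the plain-text layout I remember from the ISO8859 tables (a hex byte value, a tab, a hex Unicode code point, then a `#` comment with the character name). The `SAMPLE_TABLE` lines and the `parse_mapping_table` helper are my own illustration, not taken from the Unicode site, so check the header of the real file before relying on it.

```python
# A few lines in the assumed layout of an ISO 8859-15 mapping table:
# "0xXX<TAB>0xXXXX<TAB># CHARACTER NAME". Sample only, not the full file.
SAMPLE_TABLE = """\
# lines starting with a hash are comments
0x41\t0x0041\t# LATIN CAPITAL LETTER A
0xA4\t0x20AC\t# EURO SIGN
0xE9\t0x00E9\t# LATIN SMALL LETTER E WITH ACUTE
"""


def parse_mapping_table(text):
    """Return a dict mapping each byte value to its Unicode character."""
    mapping = {}
    for line in text.splitlines():
        # Drop the trailing comment, then surrounding whitespace.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank or comment-only line
        fields = data.split()
        if len(fields) < 2:
            continue  # byte value without a Unicode mapping
        byte_value = int(fields[0], 16)   # int() accepts the "0x" prefix in base 16
        code_point = int(fields[1], 16)
        mapping[byte_value] = chr(code_point)
    return mapping


mapping = parse_mapping_table(SAMPLE_TABLE)

# Cross-check the sample against Python's built-in ISO 8859-15 codec.
for byte_value, character in mapping.items():
    decoded = bytes([byte_value]).decode("iso8859_15")
    print(hex(byte_value), repr(character), decoded == character)
```

Comparing a parsed table against the codec your language ships with is a cheap way to spot where two "identical" encodings differ by a handful of code points.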
–jeroen
