I see a lot of programmers struggle with Unicode and think it is difficult, as getting the encoding and decoding right can take quite a bit of effort. But there is a lot of fun in Unicode as well: the number of code points (in layman's terms: characters) is huge, and they are well organized into planes, which in turn are divided into blocks of related code points. I like Charbase: A visual unicode database a lot, especially as it shows a pictogram for every code point, even if you don't have a font that your browser can use to display the character belonging to that code point. Here are a few links to characters and blocks of characters in their database that I like a lot:
- Popular Unicode Characters. This one is a blast, especially since the page updates itself over time based on the hits that Charbase gets.
- Charbase: Transport And Map Symbols. The page where I landed when I searched for a fire truck (the Unicode code point is called Fire Engine), for which this font is well suited: Symbojet – Glyph Details « MyFonts.
- Charbase: Miscellaneous Symbols And Pictographs. For instance: Tulip (am I Dutch or not? <g>).
- Charbase: Miscellaneous Symbols. Which includes the snowman.
- Charbase: Control Pictures. The non-printable control characters of ASCII have representations too!
- Charbase: Box Drawing. For all those that – like me – used them extensively in the MS/PC-DOS era.
- Charbase: Currency Symbols. You thought the world only had $, € and ¥? So wrong (:
- Charbase: Number Forms. Including fractions and Roman numeral forms. Other historic numerals live in a separate block: Charbase: Rumi Numeral Symbols.
- Charbase: Arrows. You thought there was only one block with arrows? Wrong: Charbase: Supplemental Arrows-A and Charbase: Supplemental Arrows-B contain more.
- A few blocks with game-related code points, though I haven't found a font for them yet: Charbase: Playing Cards, Charbase: Mahjong Tiles and Charbase: Domino Tiles.
- Charbase: Braille Patterns. The names are just numbered after the 8 possible braille dots. At first I missed their meaning, but then I realized that Braille has many different encodings (the original 6-dot French encoding of 1829 didn't even encode w), even after the unification. For instance English, French 1829, IPA and Japanese Braille. So Unicode encodes the dot patterns, not what they mean.
- Charbase: Optical Character Recognition. Thought the OCR-A and OCR-B fonts were fun? There are more OCR characters than those (:
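If you want to poke at these blocks from code instead of the browser, Python's standard `unicodedata` module gets you a long way. Below is a minimal sketch (the helper names `describe`, `control_picture` and `braille` are my own, hypothetical ones) that shows the plane and UTF-8 bytes of a code point, maps ASCII control characters to their Control Pictures counterparts, and builds a Braille Patterns character from dot numbers. It assumes that the Braille block starts at U+2800 with dot n stored in bit n-1, and that names come from whatever Unicode version your Python ships with.

```python
import unicodedata


def describe(ch: str) -> str:
    """Summarize one code point: number, plane, name and UTF-8 bytes."""
    cp = ord(ch)
    plane = cp >> 16  # each plane holds 0x10000 code points
    name = unicodedata.name(ch, "<unnamed>")
    utf8 = ch.encode("utf-8").hex(" ")  # bytes.hex(sep) needs Python 3.8+
    return f"U+{cp:04X} plane {plane} {name} [{utf8}]"


def control_picture(c: str) -> str:
    """Map an ASCII control character to its Control Pictures symbol."""
    cp = ord(c)
    if cp < 0x20:
        return chr(0x2400 + cp)  # U+2400..U+241F mirror 0x00..0x1F
    if cp == 0x7F:
        return chr(0x2421)  # SYMBOL FOR DELETE
    raise ValueError(f"{c!r} is not an ASCII control character")


def braille(*dots: int) -> str:
    """Build a Braille Patterns character from dot numbers 1..8."""
    mask = 0
    for d in dots:
        if not 1 <= d <= 8:
            raise ValueError(f"dot {d} out of range 1..8")
        mask |= 1 << (d - 1)  # dot n sets bit n-1
    return chr(0x2800 + mask)


if __name__ == "__main__":
    print(describe("\u2603"))      # U+2603 plane 0 SNOWMAN [e2 98 83]
    print(describe("\U0001F692"))  # U+1F692 plane 1 FIRE ENGINE [f0 9f 9a 92]
    # Decoding the UTF-8 bytes gives the fire engine back.
    print(bytes.fromhex("f0 9f 9a 92").decode("utf-8") == "\U0001F692")  # True
    print(unicodedata.name(control_picture("\x1b")))  # SYMBOL FOR ESCAPE
    print(unicodedata.name(braille(1, 2)))  # BRAILLE PATTERN DOTS-12
```

Note how the Braille name only lists the raised dots: that is exactly why the names look meaningless until you know which Braille system you are reading.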
–jeroen
Filed under: Development, Encoding, Software Development, Unicode, UTF-8, UTF8
