Channel: Encoding – The Wiert Corner – irregular stream of stuff

Unicode ligatures: not all software does normalised search, forgetting ﬃ


Via a private share, I found out that some software forgets to perform Unicode normalisation when searching.

That means that ligatures do not match their non-ligature counterparts, for instance in these words:

  • “ﬀ” and “ff”, as in “diﬀerence” versus “difference”
  • “ﬁ” and “fi”, as in “notiﬁcation” versus “notification”.
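Below is a minimal sketch of a normalised search in Python. It assumes that NFKC folding is acceptable for your use case. The helper name `normalised_contains` is hypothetical; only `unicodedata.normalize` comes from the Python standard library. The sample text spells the words with the ligatures U+FB01 (ﬁ) and U+FB00 (ﬀ).

```python
import unicodedata

def normalised_contains(haystack: str, needle: str, form: str = "NFKC") -> bool:
    """Hypothetical helper: substring search after normalising both strings to `form`."""
    return unicodedata.normalize(form, needle) in unicodedata.normalize(form, haystack)

# "notification" and "difference" spelled with the ligatures U+FB01 and U+FB00
text = "This noti\ufb01cation explains the di\ufb00erence."

print("difference" in text)                              # plain search: misses the ligature
print(normalised_contains(text, "difference"))           # NFKC search: finds it
print(normalised_contains(text, "notification", "NFC"))  # canonical form only: still misses it
```

NFKC also folds other compatibility characters, such as full-width letters and superscript digits (² becomes 2). In addition, match offsets in the normalised string no longer line up with the original text. A search that highlights hits therefore has to map positions back to the original.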

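For more information, read [WayBack] Unicode equivalence – Wikipedia. Ligatures such as ﬀ and ﬁ are compatibility (not canonical) equivalents of the letters they join, so only the compatibility forms NFKD and NFKC map them to plain letters. Make sure you know about these normal forms: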

  • NFD (Normalization Form Canonical Decomposition): characters are decomposed by canonical equivalence, and multiple combining characters are arranged in a specific order.
  • NFC (Normalization Form Canonical Composition): characters are decomposed and then recomposed by canonical equivalence.
  • NFKD (Normalization Form Compatibility Decomposition): characters are decomposed by compatibility, and multiple combining characters are arranged in a specific order.
  • NFKC (Normalization Form Compatibility Composition): characters are decomposed by compatibility, then recomposed by canonical equivalence.
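To compare the four forms side by side, here is a small Python sketch using the standard `unicodedata` module. The sample strings and the `codepoints` helper are illustrative only. It prints the code points that each form produces for three inputs:

  • the ﬁ ligature (U+FB01)
  • a precomposed é (U+00E9)
  • an e followed by a combining acute accent (U+0065 U+0301)

```python
import unicodedata

def codepoints(s: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

samples = {
    "fi ligature": "\ufb01",            # U+FB01 LATIN SMALL LIGATURE FI
    "precomposed e-acute": "\u00e9",    # U+00E9
    "e + combining acute": "e\u0301",   # U+0065 U+0301
}

for label, text in samples.items():
    for form in ("NFD", "NFC", "NFKD", "NFKC"):
        result = codepoints(unicodedata.normalize(form, text))
        print(f"{label:20} {form:4} -> {result}")
```

The canonical forms leave U+FB01 untouched, while NFKD and NFKC turn it into U+0066 U+0069. For the accented e, NFD and NFKD yield U+0065 U+0301 and NFC and NFKC yield U+00E9, whichever spelling went in.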

–jeroen

