A while ago, I needed to export pure ASCII text from a .NET app.
An important step was converting characters with diacritics to their plain ASCII base characters, which turned out to be enough for this case.
This is the code I used, which is based on extension methods and this trick from Blair Conrad:
The approach uses String.Normalize with NormalizationForm.FormD to decompose the input string into its constituent characters (basically separating the “base” characters from the combining diacritical marks), then scans the result and retains only the base characters. It’s a little complicated, but really you’re looking at a complicated problem.
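To make the decomposition step concrete, here is a minimal sketch (not part of the original code) that prints the UTF-16 code units of "é" before and after normalizing to FormD. The class name `DecompositionDemo` and the helper `Describe` are made up for this illustration; only `String.Normalize` and `CharUnicodeInfo.GetUnicodeCategory` come from the framework.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DecompositionDemo
{
    // Hypothetical helper: lists each UTF-16 code unit as U+XXXX(Category).
    public static string Describe(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.AppendFormat("U+{0:X4}({1})", (int)c, CharUnicodeInfo.GetUnicodeCategory(c));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string composed = "\u00E9"; // "é" as a single precomposed character
        string decomposed = composed.Normalize(NormalizationForm.FormD);

        Console.WriteLine(Describe(composed));   // U+00E9(LowercaseLetter)
        Console.WriteLine(Describe(decomposed)); // U+0065(LowercaseLetter) U+0301(NonSpacingMark)
    }
}
```

After decomposition the accent is a separate character in the NonSpacingMark category, which is exactly what the filter in the code skips.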
Example code:
    using System;
    using System.Text;
    using System.Globalization;

    namespace StringToAsciiConsoleApplication
    {
        class Program
        {
            static void Main(string[] args)
            {
                string unicode = "áìôüç";
                string ascii = unicode.ToAscii();
                Console.WriteLine("Unicode\t{0}", unicode);
                Console.WriteLine("ASCII\t{0}", ascii);
            }
        }

        public static class StringExtensions
        {
            public static string ToAscii(this string value)
            {
                return RemoveDiacritics(value);
            }

            // http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net
            private static string RemoveDiacritics(this string value)
            {
                string valueFormD = value.Normalize(NormalizationForm.FormD);
                StringBuilder stringBuilder = new StringBuilder();

                foreach (System.Char item in valueFormD)
                {
                    UnicodeCategory unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(item);
                    if (unicodeCategory != UnicodeCategory.NonSpacingMark)
                    {
                        stringBuilder.Append(item);
                    }
                }

                return (stringBuilder.ToString().Normalize(NormalizationForm.FormC));
            }
        }
    }
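Removing combining marks was enough for my input, but it does not guarantee pure ASCII for arbitrary text: letters such as "Ø" and "ß" have no canonical decomposition, so they pass through unchanged. The sketch below is one possible way to check the result, not part of the original code. `AsciiCheck` and `IsPureAscii` are hypothetical names, and `RemoveDiacritics` is repeated only so the example is self-contained.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class AsciiCheck
{
    // Same technique as the code above: decompose, drop combining marks, recompose.
    public static string RemoveDiacritics(string value)
    {
        string decomposed = value.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Hypothetical helper: true when every UTF-16 code unit is in the ASCII range 0..127.
    public static bool IsPureAscii(string value)
    {
        foreach (char c in value)
        {
            if (c > 127) return false;
        }
        return true;
    }

    public static void Main()
    {
        foreach (string sample in new[] { "áìôüç", "Øresund", "Straße" })
        {
            string stripped = RemoveDiacritics(sample);
            Console.WriteLine("{0} -> {1} (pure ASCII: {2})",
                sample, stripped, IsPureAscii(stripped));
        }
    }
}
```

Here "áìôüç" becomes "aiouc" and passes the check, while "Øresund" and "Straße" keep their non-ASCII letters and fail it. Those cases would need an extra, application-specific mapping step.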
–jeroen
