.NET/C#: from Unicode to ASCII (yes, this is one-way): converting Diacritics to “regular” ASCII characters.

A while ago, I needed to export pure ASCII text from a .NET app.

An important step there is to convert characters with diacritics to their “normal” ASCII base characters. That turned out to be enough for this case.

This is the code I used, which is based on extension methods and this trick from Blair Conrad:

The approach uses String.Normalize to split the input string into constituent glyphs (basically separating the “base” characters from the diacritics) and then scans the result and retains only the base characters. It’s just a little complicated, but really you’re looking at a complicated problem.
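To make the decomposition step a bit more tangible, here is a minimal sketch that lists what String.Normalize(NormalizationForm.FormD) produces for a given input. The class name NormalizationDemo and the method Describe are made up for this illustration; only the framework calls are real.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, for illustration only.
public static class NormalizationDemo
{
    // Returns one line per UTF-16 char of the FormD-decomposed input:
    // the code unit in hex followed by its Unicode category.
    public static string Describe(string value)
    {
        StringBuilder description = new StringBuilder();
        foreach (char item in value.Normalize(NormalizationForm.FormD))
        {
            description.AppendFormat(
                "U+{0:X4} {1}\n",
                (int)item,
                CharUnicodeInfo.GetUnicodeCategory(item));
        }
        return description.ToString();
    }
}
```

Calling NormalizationDemo.Describe("\u00E9") (a precomposed "é") should give two lines: U+0065 LowercaseLetter for the base "e" and U+0301 NonSpacingMark for the combining acute accent. The second kind is what the scan afterwards throws away.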

Example code:

using System;
using System.Text;
using System.Globalization;

namespace StringToAsciiConsoleApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            string unicode = "áìôüç";
            string ascii = unicode.ToAscii();
            Console.WriteLine("Unicode\t{0}", unicode);
            Console.WriteLine("ASCII\t{0}", ascii);
        }
    }

    public static class StringExtensions
    {
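        // Note: this only strips combining marks. Characters without a canonical
        // decomposition (for example 'ø' or 'ß') are returned unchanged.
        public static string ToAscii(this string value)
        {
            return RemoveDiacritics(value);
        }

        // http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net
        private static string RemoveDiacritics(this string value)
        {
            // FormD (canonical decomposition) splits e.g. 'á' into 'a' followed by U+0301.
            string valueFormD = value.Normalize(NormalizationForm.FormD);
            StringBuilder stringBuilder = new StringBuilder(valueFormD.Length);

            foreach (char item in valueFormD)
            {
                // Keep everything except the combining (non-spacing) marks.
                UnicodeCategory unicodeCategory = CharUnicodeInfo.GetUnicodeCategory(item);
                if (unicodeCategory != UnicodeCategory.NonSpacingMark)
                {
                    stringBuilder.Append(item);
                }
            }

            // Recompose whatever is left into FormC.
            return stringBuilder.ToString().Normalize(NormalizationForm.FormC);
        }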
    }
}
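One caveat: RemoveDiacritics only drops combining marks, so characters that have no canonical decomposition (such as "ø" or "ß") pass through ToAscii unchanged. If the output really must be pure ASCII, a final pass could replace whatever is left. The sketch below is one possible way to do that, not part of the original code; the class AsciiFallbackDemo and the method ReplaceNonAscii are hypothetical names.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class AsciiFallbackDemo
{
    // Replaces every UTF-16 char above U+007F with '?'.
    // A character outside the BMP is stored as a surrogate pair,
    // so it ends up as two question marks.
    public static string ReplaceNonAscii(string value)
    {
        StringBuilder stringBuilder = new StringBuilder(value.Length);
        foreach (char item in value)
        {
            stringBuilder.Append(item <= '\u007F' ? item : '?');
        }
        return stringBuilder.ToString();
    }
}
```

For example, AsciiFallbackDemo.ReplaceNonAscii("sm\u00F8rrebr\u00F8d") returns "sm?rrebr?d". Whether a question mark, an empty string, or a hand-written mapping such as "ø" to "o" is the right replacement depends on where the exported text is going.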

–jeroen

Filed under: .NET, .NET 3.5, .NET 4.0, .NET 4.5, ASCII, C#, C# 3.0, C# 4.0, C# 5.0, Development, Encoding, Software Development, Unicode
