A while ago I bumped into applications that alternately wrote UTF-16 and UTF-8 to the same files without checking which encoding the files were already using.
So here are some notes on stripping the NULL characters, to at least salvage some of the contents.
- PowerShell: about_Special_Characters (including NULL)
- sql server – How to remove NULL char (0x00) from object within PowerShell – Stack Overflow (actually a PowerShell one-liner that does the NULL stripping)
- regular expressions – Powershell 2: How to strip a specific character from a body of ASCII text – Server Fault
- tr: HOWTO remove null characters from a file « Remi Bergsma’s blog
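The same NULL stripping can be sketched in a few lines of Python. This is a substitute for the PowerShell and tr approaches in the links above, not a copy of them, and the function name `strip_nul_bytes` is made up for illustration. It assumes the text is mostly ASCII: UTF-16 pads each ASCII character with a 0x00 byte, so dropping those bytes leaves readable text, but any non-ASCII UTF-16 characters will come out mangled.

```python
def strip_nul_bytes(data: bytes) -> bytes:
    """Remove every NUL (0x00) byte from the given bytes."""
    return data.replace(b"\x00", b"")


# Simulate a file where one writer used UTF-8 and the next used UTF-16 LE.
mixed = "abc".encode("utf-8") + "def".encode("utf-16-le")

print(strip_nul_bytes(mixed))
```

For a real file, read it in binary mode (`open(path, "rb")`) and write the result to a new file, so the damaged original stays around in case the assumption does not hold.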
TODO: figure out how to strip the BOM.
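One possible way to approach this TODO, as a rough Python sketch rather than a tested fix: check whether the data starts with one of the well-known BOM byte sequences and cut it off. The helper name `strip_bom` is hypothetical. It only covers the UTF-8 and UTF-16 BOMs and only looks at the very start of the data, so a BOM that ended up in the middle of a file (likely when applications take turns appending) would need a separate search.

```python
import codecs

# BOM byte sequences from the standard library:
# UTF-8 is EF BB BF, UTF-16 LE is FF FE, UTF-16 BE is FE FF.
BOMS = (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def strip_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 or UTF-16 BOM, if one is present."""
    for bom in BOMS:
        if data.startswith(bom):
            return data[len(bom):]
    return data


print(strip_bom(codecs.BOM_UTF8 + b"hello"))
```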
–jeroen
