Channel: Encoding – The Wiert Corner – irregular stream of stuff

Coping with UTF-16 / UCS-2 little endian in Batch files: numbers from WMIC


A while ago, I needed to get the various date, time and week values from WMIC into environment variables, padded with leading zeros. I thought: easy job, just write a batch file.

Tough luck: I couldn’t get the values to expand properly. In the end, the cause was WMIC emitting UTF-16: the command interpreter does not expect two-byte characters, and that messed up my original batch file.

What I wanted           What I got
wmic_Day=21             Day=21
wmic_DayOfWeek=04       wmic_DayOfWeek=4
wmic_Hour=15            wmic_Hour=15
wmic_Milliseconds=00    wmic_Milliseconds=
wmic_Minute=02          wmic_Minute=4
wmic_Month=05           wmic_Month=5
wmic_Quarter=02         wmic_Quarter=2
wmic_Second=22          wmic_Second=22
wmic_WeekInMonth=04     wmic_WeekInMonth=4
wmic_Year=2015          wmic_Year=2015

WMIC uses this encoding because the Wide versions of Windows API calls use UTF-16 (sometimes still called UCS-2, the encoding that UTF-16 evolved from).

As Windows uses little-endian encoding by default, the low byte of each UTF-16 code unit comes first; for ASCII characters it is followed by a high byte that is zero. Those extra zero bytes mess up the command interpreter.

Luckily, rojo was of great help in solving this.

His solution is centered around set /A, which:

  • handles integer numbers and calls them “numeric” (hinting floating point, but those are truncated to integer; one of the tricks rojo uses)
  • and (be careful with this, as 08 and 09 are not valid octal numbers) uses these prefixes:
    • 0 for octal
    • 0x for hexadecimal

Enjoy and shiver with the online help extract:

    SET /A expression
    SET /P variable=[promptString]

The /A switch specifies that the string to the right of the equal sign
is a numerical expression that is evaluated.  The expression evaluator
is pretty simple and supports the following operations, in decreasing
order of precedence:

...

If you use any of the logical or modulus operators, you will need to
enclose the expression string in quotes.  

Any non-numeric strings in the
expression are treated as environment variable names whose values are
converted to numbers before using them.  

If an environment variable name
is specified but is not defined in the current environment, then a value
of zero is used.  

This allows you to do arithmetic with environment
variable values without having to type all those % signs to get their
values.  

...

Numeric values are decimal numbers, unless
prefixed by 0x for hexadecimal numbers, and 0 for octal numbers.
So 0x12 is the same as 18 is the same as 022. Please note that the octal
notation can be confusing: 08 and 09 are not valid numbers because 8 and
9 are not valid octal digits.
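To make the prefix rules from the help text concrete, here is a minimal sketch. The variable names (hex, oct, n, raw) are only illustrative, and the last step is a common workaround for zero-padded values, not part of rojo's answer: it prepends a 1 so the value is parsed as decimal, then subtracts 100 (which assumes exactly two digits).

```batch
@echo off
setlocal
rem Per the help text, 0x12 (hexadecimal) and 022 (octal) both equal decimal 18.
set /a "hex=0x12"
set /a "oct=022"
echo hex=%hex% oct=%oct%

rem 08 is not a valid octal number, so set /a reports an error here.
rem The error message is discarded by redirecting stderr to NUL.
set "n=unchanged"
2>NUL set /a "n=08"
echo n=%n%

rem Workaround (assumes a two-digit value): 108 - 100 is parsed as decimal.
set "raw=08"
set /a "n=1%raw% - 100"
echo n=%n%
endlocal
```

This is why padding has to happen after the arithmetic: once a value has a leading zero, feeding it back into set /A is risky.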

Anyway: here is the answer with the batch file, where you can remove “@echo off” to see the spurious new-lines that UCS-2 adds:

Also, I think part of the problem is that the encoding of values returned from WMI queries are encoded in UCS-2 Little Endian, which does weird things to an ANSI runtime. I found a way to get around that using set /a, appending .0 to each value (which is immediately dropped, since set /a only computes integers), and black holing error messages.

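@echo off
setlocal enabledelayedexpansion
rem Read every "name=value" line from WMIC and let set /a assign it.
rem The appended .0 is dropped because set /a only computes integers;
rem the resulting error messages are sent to NUL.
for /f "delims=" %%I in ('wmic path win32_localtime get * /format:list ^| findstr "="') do (
    2>NUL set /a "wmic_%%I.0"
)
rem Prefix single-digit values with a zero.
for /f "tokens=1,2 delims==" %%I in ('set wmic_') do (
    if %%J leq 9 set "%%I=0%%J"
)
rem Show the resulting wmic_ variables.
set wmic_
endlocal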

It’s ugly, but it seems to produce the output you want. The end justifies the means, I guess. :)
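The zero-padding step in the batch file above compares each value with 9. As an alternative sketch (not from rojo's answer), the same result for two-digit fields can be had with substring expansion: prepend a zero and keep the last two characters. The variable name padded is only illustrative, and the value 5 is hard-coded for the demonstration. Note that this would truncate a four-digit value such as the year, so it should only be applied to fields known to have at most two digits.

```batch
@echo off
setlocal
rem Example input: a single-digit month as left behind by set /a.
set "wmic_Month=5"

rem Prepend a zero, then keep only the last two characters.
set "padded=0%wmic_Month%"
set "padded=%padded:~-2%"
echo wmic_Month=%padded%
endlocal
```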

–jeroen

