Caution! A big bug in tolower() in kernel!

The following code works fine in user mode:

//
// 0x7f51 is the Unicode code point of the Chinese character ‘网’
//
n = tolower(0x7f51); // n will equal 0x7f51

However, if we are in kernel mode, n will equal 0x7f71 !!!

That is to say, the implementation of tolower in ntoskrnl.exe is wrong!

Does the right thing happen with RtlDowncaseUnicodeChar()? Remember, the C
language is supported in the kernel; the C runtimes are not.

Don Burn
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@gmail.com
Sent: Friday, November 22, 2013 5:38 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] Caution! A big bug in tolower() in kernel!



NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

On Fri, Nov 22, 2013 at 5:38 AM, wrote:

> tolower

is ASCII only. Inputting a non-ASCII character is a programming error.

“In order for _tolower to give the expected results, __isascii and
isupper must both return nonzero.”

http://msdn.microsoft.com/en-us/library/8h19t214.aspx
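
The precondition quoted above can be enforced in code. What follows is a minimal sketch in portable C, not anything taken from the CRT or the WDK: the helper name ascii_tolower_checked is hypothetical. It returns anything outside 7-bit ASCII unchanged instead of handing it to tolower().

```c
#include <ctype.h>

/* Hypothetical helper (not a CRT or WDK function): lower-case a value
 * only when it is 7-bit ASCII. Anything else, for example a UTF-16 code
 * unit such as 0x7f51, is returned unchanged, so tolower() never sees an
 * argument outside its defined domain. */
static int ascii_tolower_checked(int c)
{
    if (c < 0 || c > 0x7f) {
        return c;           /* not ASCII: leave it alone */
    }
    return tolower(c);      /* c is representable as unsigned char here */
}
```

With this guard, 'Q' becomes 'q', while the 0x7f51 from the original post comes back as 0x7f51.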

Mark Roddy

> That is to say, the implementation of tolower in ntoskrnl.exe is wrong!

Why do you think these long-obsolete calls from the 1970s are still supported and maintained?

These 1970s calls simply do not support Unicode; they are 7-bit ASCII only.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

There is support in the C libraries for Unicode, for example _totlower,
which maps to towlower for Unicode builds (and to _mbctolower for
multibyte builds). This goes to the “casemap” array to get the
transformation of the character; the casemap array is apparently set as a
consequence of the current locale. However, this is all user-level stuff;
there is a Win32 call, CharLowerW, available from user space, which
converts a character to lower case (note that it is limited in that it
does not handle characters which are accessed by surrogates). There is
the more general LCMapString API, which is more appropriate for locales.
Exactly how these map to internal kernel libraries is not clear, but I do
want to point out that many of the ctype functions call the kernel to
create a mapping table.

Note that such library calls would not be supported if linked into the
kernel, because they reference functions defined by ntdll.dll.

The simplest approach to dealing with the C library in kernel code is to
forget that it exists.
joe


…and your point is what, exactly? That an unsupported function works
differently in its unsupported environment, and doesn’t work like a
supported function in its supported environment? 0x51 is ‘Q’ and 0x71 is
‘q’. Looks to me like tolower did exactly what it is defined to do: it
converted the low-order 8 bits to lower case!
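
One way to see how 0x7f51 can turn into 0x7f71 is sketched below. This is an assumption made purely for illustration, not the actual ntoskrnl.exe code: the hypothetical function low_byte_tolower classifies its argument by the low-order 8 bits alone and then adds the ASCII case offset ('a' - 'A', which is 0x20) to the whole value.

```c
#include <ctype.h>

/* Hypothetical model, NOT the real kernel implementation: classify the
 * argument by its low-order byte only, then apply the ASCII case offset
 * ('a' - 'A' == 0x20) to the full value. */
static int low_byte_tolower(int c)
{
    unsigned char low = (unsigned char)(c & 0xff);

    if (isupper(low)) {
        return c + ('a' - 'A');
    }
    return c;
}
```

Under this model the low byte of 0x7f51 is 0x51 ('Q'), so the result is 0x7f71, the value reported in the original post; 0x7f71 itself is left alone because its low byte is already 'q'.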

Note that in app space, you were NOT using a function called “tolower”;
that is a macro. For Unicode builds, that macro is defined by a name I
forget precisely, and I am not at my Windows machine where I have access
to the header files, but it is something along the lines of “_mbclower”.
You have been doubly fooled. You mistook a piece of syntax for a piece of
semantics (“tolower” does not really exist as a function) and somehow
magically assumed that such a translation appeared in the kernel versions.

I repeat what I said before, only more strongly: the only way to correctly
use the CRT in the kernel is to not use it at all. If you want to use the
CRT, write user apps.

RTFM. Look at all the Rtl functions for string and character
manipulation. If you are too lazy to RTFM, and want to close your eyes
and pretend you are writing a user-level app, be prepared to be
disappointed.

And before you get bent out of shape about how unsupported functions do
exactly what they are specified to do, think. In this case, the thing you
need to think about is “Why is it I’m using an 8-bit-only function and
expect it to work correctly on 16-bit characters?” Also, it doesn’t hurt
to look at header files to see what they are doing. RTFM can save a lot
of grief and agony as well; if you RTFM about tolower, you would see
clearly that according to the C standard, it is specified to work on 8-bit
characters. In user space, to help newbies who don’t RTFM, it is
redefined so that it works in Unicode as well. This was probably a
mistake on the part of Microsoft; newbies need to learn correct
programming techniques, not be eased into mistaken beliefs.
joe



> I repeat what I said before, only more strongly: the only way to correctly use the CRT in the
> kernel is to not use it at all. If you want to use the CRT, write user apps.

Well, I would put it even more strongly (perhaps harshly): if you want to use the CRT in the kernel,
you are simply not qualified for system-level programming, because you don’t understand why UM libraries cannot be used by KM code…

Anton Bassov

I was trying to be polite, but actually that’s what I wanted to say. My
reference to “if you want to use the CRT, write apps” was trying to convey
that. But I agree with you.
joe


Alright I was going to stay out of this nonsense, but the fact is that
there is limited support for C runtime routines in the OS. It is not
forbidden. The calls that are forbidden are marked deprecated.

The OP thought he found a bug in Windows when he actually had a silly
programmer on device error. That was humiliating enough without the STERN
LECTURES that more or less commanded him to give up, go away, never compile
another line of code again.

Mark Roddy


As a highly experienced programmer, I would offer that device drivers are
the most demanding of all forms of programming one can engage in. Silly
errors turn the screen blue. My productivity as an app programmer was
measured in close to 200 lines of code per hour. In writing drivers, I
considered that writing at an order of magnitude less meant I was coding
far too fast and making stupid mistakes. Even more experienced driver
writers claimed that 20 lines per day was doing pretty well, except on the
days of negative productivity (adding lines that were in error and had to
be corrected/removed).

I taught drivers for many years, and I believe the word “gobsmacked”
defines the reaction of app programmers coming into the kernel. It is a
very different form of reality from app programming. In my set of “simple
rules for newbies”, one of them was “forget the CRT exists”. Yes, it is
another of my overkill rules, but like most of them, by the time you learn
the truth, you are experienced enough to understand how much of the truth
is really true. Expecting that “tolower” is going to work on Unicode means
a level of inexperience that is downright dangerous. It is dangerous in
app space, and more so in kernel space.

When I teach app-level programming, the students are dismayed when I tell
them that their favorite upper-to-lower-case hack of adding 0x20 to
characters in the range [A-Z] is hopelessly outdated. They don’t like
calling a subroutine because we all know that calling a subroutine is
expensive, so let’s do the code right in-line. We don’t need no stinkin’
subroutines. Then I explain that you can’t use strcmp, or lstrcmp, or any
of those trivial comparison routines to get proper collating sequence.
Oh, wait. If you have to do it, use lstrcmp, which is locale-sensitive.
On a good day. With a tail wind. But if you really care, use the
CompareString API.
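
To illustrate why a plain strcmp does not give a proper collating sequence, here is a small portable C sketch. It relies only on the standard strcmp; the helper name sorts_before is made up for the example, and it assumes an ASCII-based execution character set. In ASCII every upper-case letter has a smaller code than every lower-case letter, so a byte-wise comparison orders "Zebra" ahead of "apple".

```c
#include <string.h>

/* Hypothetical helper: returns 1 when a orders before b under strcmp,
 * that is, by raw byte value with no knowledge of case or locale. */
static int sorts_before(const char *a, const char *b)
{
    return strcmp(a, b) < 0;
}
```

Here sorts_before("Zebra", "apple") is true because 'Z' (0x5A) is less than 'a' (0x61), which is not the order a dictionary or a library catalogue would use.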

Sadly, one of the critical languages has been omitted from the locales:
“ALA filing rules”. As in American Library Association. This locale has
very specific things to say about how accented characters sort with
unaccented characters, for example. I know this because I had to
implement an “ALA” collating sequence routine for 8-bit characters Back In
The Day, and I have looked at what it would take for Unicode, and it is
somewhere along the complexity of Chinese. Folks who implement library
systems can take the time to do an ALA collating sequence subroutine that
can cover 90% of the rules just by transforming accented characters into
unaccented characters, but can’t handle “Mc” sorting equal to “Mac” when
those appear as the beginning of surnames, and a ton of other “corner
cases” I once knew (my SO was a librarian for nearly 40 years, before she
became a professor in the school of library science).

Note that the way tolower works is probably by adding 0x20 (32) to the
8-bit value if the 8-bit value passes the isupper() test. I am not near
my Unicode book, but I suspect that in Greek and other alphabets that have
the concept of upper and lower case, that doing isupper() on the low-order
8 bits is used as the predicate, and only the low-order 8 bits define the
character offset within the Unicode segment, so adding 0x20 probably is
sufficient for those languages. But in The Character Set Formerly Known
As ISO-8859-1 (Latin 1), not every accented character has a counterpart
at that offset (ß and ÿ have no upper-case form within the set). So life
gets a bit more complex.

The kernel is not app space. There is a lot of the C-standard CRT that
does not exist in the kernel version of the “CRT”, and knowing which part
is there, or has slightly different semantics, is not something you want
newbies to the kernel to discover empirically. Unless, of course, you
like the color blue.
joe


> Alright I was going to stay out of this nonsense, but the fact is that there is limited support
> for C runtime routines in the OS. It is not forbidden.

But this is already a different CRT that has nothing whatsoever to do with the one UM apps use, and it
may have constraints unheard of in userland. For example, certain functions may be callable only at PASSIVE_LEVEL. It is (hopefully) understandable that such a library just cannot be made 100% compliant
with ANSI/ISO/whatever standards for libraries that are meant to run in userland…

> That was humiliating enough without the STERN LECTURES that more or less commanded him
> to give up, go away, never compile another line of code again.

No one ever tells anyone to “never compile another line of code again”. However, certain posters may
indeed be strongly advised to step back for a while and read some OS-related books before running the next compilation. Let’s face it: a willingness to use the UM CRT in the kernel demonstrates a lack of understanding of how OSes work…

Anton Bassov

On 11/24/2013 7:33 PM, xxxxx@hotmail.com wrote:

> No one ever tells anyone to “never compile another line of code again”. However, certain posters may
> be, indeed, strongly advised to step back for a while and to read some OS-related books before running the next compilation process. Let’s face it - a willingness to use UM CRT in the kernel demonstrates the lack of understanding of how OSes work whatsoever…

Or maybe, seeing that calling tolower() didn’t result in a compile error
they presumed there was a corresponding valid KM function?


Bruce Cran

> Or maybe, seeing that calling tolower() didn’t result in a compile error they presumed
> there was a corresponding valid KM function?

Yes, but why does the OP assume that the KM function should behave like its UM counterpart? In any case, you have a good point: after all, the OP does seem to realize that he is using ntoskrnl.exe’s export
and not a UM CRT…

Anton Bassov

I had to check the date to make sure it’s not April Fool’s Day.

Where do you guys GET this stuff?

Certain CRTL functions are common and useful enough that, over time, people have ported them to kernel-mode and included them with NTOS. There is exactly NOTHING wrong with calling these functions.

And obviously, you need to understand the rules for invoking these functions. This is no different from calling any function, whether it’s tolower() or IoConnectInterruptEx().

Isn’t it hard enough to write drivers (as observed by Dr. Newcomer, his lines-per-day observations aside) without actually inventing non-existent rules like “Thou shalt not use any CRTL functions in kernel-mode, even if they are provided for your benefit by the OS developers”, just to make things harder?

You write drivers. You’re special. You call RtlCopyMemory instead of memcpy, and you write your own function to add 0x20 to each character of an ASCII string to lower-case it, instead of calling tolower(). Excellent. You’re brilliant. We get it. Move on.

But please don’t pretend that “never call a well-known and documented CRTL function in kernel mode” is an actual rule for programming in Windows kernel-mode. Because it’s not.

Peter
OSR

(man… it’s not even lunch time on the East Coast of the US yet, and I’m close to locking two threads. Good thing it’s a short week for us in the States).

The supported kernel-mode C runtime functions operate the same way that the
user-mode C runtime functions operate. As noted way up above, toupper() (not
the macro-obscured variations) requires an ASCII character set input value
and behaves the same in user mode as in kernel mode; specifically, the
result of inputting a non-ASCII value is undefined.
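
For values that are not ASCII, such as the 0x7f51 from the original post, standard C offers the wide-character routine towlower from <wctype.h>. The sketch below is user-mode, portable C only; the wrapper name wide_tolower is hypothetical. The kernel-mode routine suggested earlier in the thread, RtlDowncaseUnicodeChar, is not shown here.

```c
#include <wctype.h>

/* Hypothetical wrapper: lower-case a wide character (for example a UTF-16
 * code unit) with the wide-character routine instead of tolower().
 * towlower() returns its argument unchanged when the character has no
 * lower-case counterpart. */
static unsigned int wide_tolower(unsigned int code_unit)
{
    return (unsigned int)towlower((wint_t)code_unit);
}
```

A CJK ideograph has no case, so wide_tolower(0x7f51) gives back 0x7f51, while an ASCII 'Q' still maps to 'q'.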

Mark Roddy


xxxxx@hotmail.com wrote:

>> I repeat what I said before, only more strongly: the only way to correctly use the CRT in the
>> kernel is to not use it at all. If you want to use the CRT, write user apps.
> Well,I would put it even more strongly(perhaps harshly) - if you want to use the CRT in the kernel
> you are simply not qualified for system-level programming, because you don’t understand why UM libraries cannot be used by the KM code…

That, sir, is a complete crock. As I have said many times before, the
WDK library “libcntpr.lib” contains a virtually complete implementation
of the C run-time library, designed for use in the kernel, provided by
our friends at Microsoft. It works. It has for years.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> That, sir, is a complete crock. As I have said many times before, the
> WDK library “libcntpr.lib” contains a virtually complete implementation
> of the C run-time library, designed for use in the kernel, provided by
> our friends at Microsoft. It works. It has for years.

“Virtually complete” != “Complete”. And certainly not “Identical”. Note
also that the error was in assuming that a piece of behavior outside the
specified C standard would be implemented in the kernel, so when it
wasn’t, the OP was annoyed. But the point is that the OP should NOT have
assumed that the C library that is supported in the kernel is in some
bizarre way DIFFERENT from the C Standard, and the call of tolower() with
other than an 8-bit character is undefined by the C Standard. So the
correct approach is (a) RTFM: if the kernel library is C-Standard
compliant, then the only valid FM is the C Standard, not behavior observed
through erroneous use of the user-level Microsoft C library; (b) avoid the
C library altogether. While you describe the C library functions as
“virtually complete”, is there a document that says what parts are not there? Given
that there are already Rtl functions that accomplish this (some of which
are just macros wrapping the C library calls), why not use the “supported”
methods instead of making invalid assumptions about undefined behavior,
and being disappointed?
joe



xxxxx@flounder.com wrote:

> “Virtually complete” != “Complete”.

My point was that it is hardly a condemnation of his entire professional
reputation that he expected C run-time support in the kernel, as was
implied above. He saw differing behavior. He asked about it. His
expectations were incorrect, and we corrected those expectations. I
think both of you are castigating him unreasonably over a relatively
arcane question.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Dr. Newcomer: Listen to Mr. Roberts on this one. You’re climbing further and further out on a very rotten limb.

/thread