Zero terminated unicode strings?

Hi,

I just noticed, through achieving a PAGE_FAULT_IN_NONPAGED_AREA BSOD, that I’d been slightly foolish.
I’d assumed in one part of my code that UNICODE_STRINGs were zero terminated strings.
This certainly isn’t the case for UNICODE_STRINGs returned from a call to:

status = FltGetFileNameInformation( pData,
FLT_FILE_NAME_NORMALIZED |
FLT_FILE_NAME_QUERY_ALWAYS_ALLOW_CACHE_LOOKUP,
&pFileNameInformation );

pFileNameInformation->Name is not zero terminated, which is especially noticeable in dbg (checked) builds, where my memory allocator fills memory with 0xcd on allocation.

I noticed this when passing pFileNameInformation->Name to a LPCWCHAR* function of my own construction, which assumed a zero terminator - hence the 0xffff max size of the buffer was walked past and a BSOD ensued.

I am now searching for other places where such assumptions about unicode strings have been made.

My question is: is it correct to assume (as this instance has proved) that a UNICODE_STRING need not be zero terminated? Or is there a problem with FltGetFileNameInformation()? // highly unlikely.

So, digging Buffer from:

typedef struct _LSA_UNICODE_STRING {
USHORT Length;
USHORT MaximumLength;
PWSTR Buffer;
} LSA_UNICODE_STRING, *PLSA_UNICODE_STRING, UNICODE_STRING, *PUNICODE_STRING;

and passing it to wcslen() for example would { discounting a) being completely dumb, as of course that’s Length/sizeof(WCHAR) }, also be b) v. dangerous.
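A minimal sketch of the counted alternative to wcslen(), for illustration only. It is written as portable C so it compiles outside the WDK: WCHAR16, USTR, ustr_char_count and ustr_last_index_of are hypothetical stand-ins, not WDK names, and the 16-bit character type is an assumption that mirrors WCHAR on Windows.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable stand-ins for the kernel types (assumption: 16-bit WCHAR). */
typedef uint16_t WCHAR16;
typedef struct {
    uint16_t Length;         /* bytes in use, terminator NOT counted */
    uint16_t MaximumLength;  /* bytes available in Buffer */
    WCHAR16 *Buffer;         /* not guaranteed to be zero terminated */
} USTR;

/* Character count taken from Length; never scans for a terminator. */
static size_t ustr_char_count(const USTR *s)
{
    return s->Length / sizeof(WCHAR16);
}

/* Bounded search: index of the last occurrence of ch, or -1 if absent. */
static long ustr_last_index_of(const USTR *s, WCHAR16 ch)
{
    size_t n = ustr_char_count(s);
    while (n > 0) {
        if (s->Buffer[n - 1] == ch)
            return (long)(n - 1);
        n--;
    }
    return -1;
}
```

Every loop is bounded by Length, so whatever fill pattern (0xcd or otherwise) follows the string is never read.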

Just thought I’d post this up for comments.
I suppose it’s a case of RTM. However, I’m very surprised I’ve not had this type of crash much earlier.

Mike

You can never assume a UNICODE_STRING is terminated. As you have seen this
is a great way to BSOD.


Don Burn (MVP, Windows DDK)
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr
Remove StopSpam to reply


There is in general never a guarantee that a UNICODE_STRING has a null beyond the last character in the buffer. The assumption that you may treat all such strings as C-style zero terminated strings is invalid.

  • S

-----Original Message-----
From: xxxxx@scee.net
Sent: Tuesday, November 25, 2008 10:38
To: Windows File Systems Devs Interest List
Subject: [ntfsd] Zero terminated unicode strings?


NTFSD is sponsored by OSR

For our schedule debugging and file system seminars
(including our new fs mini-filter seminar) visit:
http://www.osr.com/seminars

You are currently subscribed to ntfsd as: xxxxx@valhallalegends.com
To unsubscribe send a blank email to xxxxx@lists.osr.com

Mike,

Search the ntdev archives, I recall a thread on this topic a couple
of years back.

The upshot was that under no circumstances can you expect that a
system-returned UNICODE_STRING will be null terminated. You’ve come
across one call type; another class is values returned from the registry.

The only time you can guarantee a null terminated buffer in the
UNICODE_STRING is if you construct it yourself. I would guess this
has been one of the larger classes of driver fault.

If you really do have to use string functions, copy the buffer to
somewhere local and ensure null termination. Although a pain at
first, I’ve found the strsafe functions to be invaluable.
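The copy-then-terminate advice could look roughly like the sketch below. It is portable C rather than driver code: WCHAR16, USTR and ustr_to_cstr are hypothetical names, and in a real driver the destination would be a stack or pool buffer of your own choosing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-ins for the kernel types (assumption: 16-bit WCHAR). */
typedef uint16_t WCHAR16;
typedef struct {
    uint16_t Length;         /* bytes in use, terminator NOT counted */
    uint16_t MaximumLength;  /* bytes available in Buffer */
    WCHAR16 *Buffer;
} USTR;

/*
 * Copy src into dst (capacity dst_chars, in characters) and terminate it.
 * Returns 0 on success, -1 if dst cannot hold the string plus terminator.
 * Reads exactly src->Length bytes from the source, never more.
 */
static int ustr_to_cstr(const USTR *src, WCHAR16 *dst, size_t dst_chars)
{
    size_t n = src->Length / sizeof(WCHAR16);

    if (dst_chars == 0 || n > dst_chars - 1)
        return -1;
    if (n > 0)
        memcpy(dst, src->Buffer, n * sizeof(WCHAR16));
    dst[n] = 0;              /* the terminator lives only in the local copy */
    return 0;
}
```

Only after this copy is it safe to hand the buffer to a routine that expects a C-style string.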

Cheers,

Mark.


Thanks for the info everyone.

I believe I’ve been lucky so far, through use of my own UtilUnicodeString functions and construction, in avoiding this issue. However, somewhat to my own embarrassment, I now see the error of my ways.
To provide a length and then waste memory with additional two byte zeros seems somewhat ott (even for me). PASCAL + C = Rather Unlikely Combination.

Mike

Wasting space, oh, yes, this is a really serious issue. Let’s see: in a
machine with 2GB physical memory, 2 bytes additional space…each
NUL-terminated string wastes 0.0000001% of the total address space…you
know, if you had a billion strings this could be a serious problem…

My impression of the reason UNICODE_STRING works as it does was to allow
substrings of a larger string (e.g., a directory path) to be represented
without requiring allocation-and-copy to get a NUL-terminated substring. It
was certainly not done for the purpose of saving the space of the
termination characters (hmmm…12 bytes for a UNICODE_STRING structure to
save 2 bytes of NUL termination, yes, that’s a good tradeoff, let’s do it
that way…each string now takes only 10 additional bytes. Why, we’re down
from allowing a billion strings to only 100 million…). The other reason
was to prevent the classic buffer-overrun because each string carried its
buffer size with it, and with proper use of the calls, buffer overrun would
be impossible. Neither of these goals has anything to do with saving the
NUL terminator space.
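To illustrate the substring point, here is a small sketch in portable C. WCHAR16, USTR and ustr_substring are hypothetical stand-ins, not WDK names. It builds a second counted string that points into the middle of an existing buffer without allocating or copying.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable stand-ins for the kernel types (assumption: 16-bit WCHAR). */
typedef uint16_t WCHAR16;
typedef struct {
    uint16_t Length;         /* bytes in use, terminator NOT counted */
    uint16_t MaximumLength;  /* bytes available in Buffer */
    WCHAR16 *Buffer;
} USTR;

/*
 * Make out a view of count characters of src, starting at character index
 * start. No allocation and no copy. Returns -1 if the range is out of bounds.
 */
static int ustr_substring(const USTR *src, size_t start, size_t count, USTR *out)
{
    size_t total = src->Length / sizeof(WCHAR16);

    if (start > total || count > total - start)
        return -1;
    out->Buffer = src->Buffer + start;
    out->Length = (uint16_t)(count * sizeof(WCHAR16));
    out->MaximumLength = out->Length;   /* the view owns nothing beyond itself */
    return 0;
}
```

The character that follows such a view is simply the next character of the parent string, which is exactly why a terminator cannot be expected there.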
joe

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@scee.net
Sent: Tuesday, November 25, 2008 12:46 PM
To: Windows File Systems Devs Interest List
Subject: RE:[ntfsd] Zero terminated unicode strings?


Just to amplify what Mark said a little bit, the registry strings (REG_SZ & REG_MULTI_SZ) are
supposed to be null terminated, but they just aren’t always in practice.

Good luck,

mm


Let me add to the clamour … just forget about zero terminated strings in
kernel mode … and once you are used to that, avoid zero terminated strings
in user mode unless they are mandated by the API you are using :)

The approach I’ve adopted for passing string data between kernel mode and
user mode is a zero terminated Unicode string with a character (or
byte) count that includes the zero terminator. I don’t use the zero terminator
in kernel mode - I use the count. I use the zero terminator in user mode
when I have no choice - if I have no choice, then it’s there. The count is
always the size of the buffer I need to store the thing.
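One possible shape for such a counted, zero terminated payload is sketched below in portable C. The layout, the CZS_MAX_CHARS limit and the names COUNTED_ZSTRING, czs_set and czs_chars are all hypothetical, chosen only to illustrate the idea. They are not taken from any real driver interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t WCHAR16;          /* assumption: 16-bit characters */

#define CZS_MAX_CHARS 260          /* hypothetical limit, terminator included */

/* Hypothetical payload for a custom ioctl. */
typedef struct {
    uint32_t ByteCount;            /* bytes used in Text, terminator included */
    WCHAR16  Text[CZS_MAX_CHARS];  /* always zero terminated by czs_set */
} COUNTED_ZSTRING;

/* Fill the payload from n counted characters. Returns -1 if they do not fit. */
static int czs_set(COUNTED_ZSTRING *m, const WCHAR16 *chars, size_t n)
{
    if (n >= CZS_MAX_CHARS)
        return -1;
    if (n > 0)
        memcpy(m->Text, chars, n * sizeof(WCHAR16));
    m->Text[n] = 0;
    m->ByteCount = (uint32_t)((n + 1) * sizeof(WCHAR16));
    return 0;
}

/* Kernel-side view: character count from ByteCount alone, terminator excluded. */
static size_t czs_chars(const COUNTED_ZSTRING *m)
{
    if (m->ByteCount < sizeof(WCHAR16))
        return 0;
    return m->ByteCount / sizeof(WCHAR16) - 1;
}
```

A receiver that does not trust the sender would still validate ByteCount against the size of the buffer it was actually handed before using either the count or the terminator.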


In general, the only time you ever see a NUL-terminated string in the kernel
is when you use RtlInitUnicodeString to initialize from a string literal.

When returning strings to user space, it is whatever you have told the user
to expect in terms of return value. In general, most I/O operations do not
guarantee NUL-termination, and users are accustomed to writing something
like

    BYTE buffer[MAX_SIZE + 1];
    DWORD bytesRead;

    if (!ReadFile(h, buffer, MAX_SIZE, &bytesRead, NULL))
    {
        /* … deal with error */
    }
    else
    {
        buffer[bytesRead] = 0;  /* bytesRead <= MAX_SIZE, so this stays in bounds */
    }

so it is rarely an issue. Note also that this works correctly only for ANSI
input (for Unicode, allocate MAX_SIZE + sizeof(TCHAR) bytes and store two 0
bytes; left as an Exercise For The Reader, see below).
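For what it is worth, the two-zero-byte variant might be sketched as follows. This is portable C with the read call left out. MAX_SIZE is given an arbitrary value here, terminate_utf16 is a hypothetical helper, and dropping a trailing odd byte is a choice made for this sketch rather than anything the post prescribes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SIZE 16   /* hypothetical payload limit, in bytes */

/*
 * buffer must be at least MAX_SIZE + sizeof(uint16_t) bytes long, and
 * bytes_read (as reported by the read call) must not exceed MAX_SIZE.
 * Writes a 16-bit zero after the last whole character and returns the
 * number of whole 16-bit characters in front of it.
 */
static size_t terminate_utf16(unsigned char *buffer, size_t bytes_read)
{
    size_t chars = bytes_read / sizeof(uint16_t);   /* drop a trailing odd byte */
    uint16_t zero = 0;

    memcpy(buffer + chars * sizeof(uint16_t), &zero, sizeof zero);
    return chars;
}
```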

At least, since I tend to spend most of my life in user space, that’s what I
write, what I see others writing, and what I teach.

Note that the ANSI/Unicode issue does arise, since
ReadFile/WriteFile/DeviceIoControl are byte-oriented and there is no
marshalling based on the type of call (the Registry calls will do
marshalling of Unicode to ANSI on reads and ANSI to Unicode on writes), but
there’s no ReadFileA/ReadFileW etc.

When using the Undocumented Windows API (Nebbett) from user space, there are
UNICODE_STRINGs returned for APIs that return strings, and they are
definitely *not* NUL-terminated. I’ve done this, and examined the return
values with the debugger.
joe

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Lyndon J Clarke
Sent: Tuesday, November 25, 2008 4:01 PM
To: Windows File Systems Devs Interest List
Subject: Re:[ntfsd] Zero terminated unicode strings?


The whole business of the registry storing nulls in string values is a big cluster, IMO. I suspect the problem has its roots in the awkward termination characteristics expected of REG_MULTI_SZ. If it were not for that, it would have been much better had strings in the registry never been “guaranteed” to be null terminated. I say “guaranteed” because, although that’s what you’re SUPPOSED to see, you might not see it, in which case you need to handle it anyway or risk compromising the system (if the value could be set by an untrusted user mode caller).

Null termination here in the registry is tricky, because while the string value in the registry is supposed to be null terminated, you probably do NOT want to include the terminating null in the actual string you get back from the registry. So you need to check for a terminating null and decrement the length of the buffer appropriately. But you don’t want to do that if you got back a buffer length < 2, or you’d underrun.

To make matters even more fun, user mode code has been (in)famous for misusing the registry APIs for years and forgetting to include the null terminator in the data passed to RegSetValue(Ex). To make matters even *more* interesting, they usually got away with it if they were using the ANSI (RegSetValue(Ex)A) versions, because those internally copied the user supplied string to an internal Unicode buffer, expanded it, and then applied the correct (Unicode string) null termination before passing the buffer to the raw system service, thus papering over the fact that their callers were broken. The Unicode (RegSetValue(Ex)W) APIs did not use to do this (IIRC), though they now do for compatibility with porting Win9x apps to Unicode APIs, where said Win9x apps sized their strings wrong.

Even with the compatibility hacks, it is STILL possible for an evil user mode caller to call the system service directly without specifying a null terminator.

There is a similar, equally delightful series of “worked-by-accident-until-I-was-Unicode” and other fun things with respect to user mode apps retrieving registry strings and not just setting them, IIRC.

The net of all of this is:

  • You might get back an empty string from the registry API (length zero). In this case, you shouldn’t try to strip the null terminator from it or you’d decrease the length of your UNICODE_STRING below zero, resulting in badness.
  • You might get back a string with a length that isn’t a legal multiple of WCHAR, which you need to handle, as CM doesn’t sanitize it for you.
  • You might get back a string with a missing null terminator if you’re using the Unicode build of a buggy app on an old OS (IIRC), in which case you probably *also* don’t want to try and chop off the extra null terminator before saving the string away in a UNICODE_STRING. Alternatively, you could just consider the registry data just plain bogus here and not handle this case.
  • You could get back a well-formed registry string with a null terminator, at which point you probably want to chop off the null terminator before storing the string in a UNICODE_STRING structure. (Otherwise, if it was, say, a filename, you might try to create a file with a null at the end of the filename, which wouldn’t be very nice.)
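The four cases above can be folded into one defensive conversion, sketched here in portable C. WCHAR16, USTR and reg_bytes_to_ustr are hypothetical stand-ins. Rounding an odd byte length down (rather than rejecting the value) and stripping exactly one terminator are assumptions of this sketch, and a stricter policy may suit your driver better.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable stand-ins for the kernel types (assumption: 16-bit WCHAR). */
typedef uint16_t WCHAR16;
typedef struct {
    uint16_t Length;         /* bytes in use, terminator NOT counted */
    uint16_t MaximumLength;  /* bytes available in Buffer */
    WCHAR16 *Buffer;
} USTR;

/*
 * Wrap raw registry string data (byte_len bytes at data) in a counted string.
 * Returns -1 if the data cannot be described by a 16-bit Length.
 */
static int reg_bytes_to_ustr(WCHAR16 *data, size_t byte_len, USTR *out)
{
    if (byte_len > UINT16_MAX)
        return -1;

    byte_len -= byte_len % sizeof(WCHAR16);      /* ignore a stray odd byte */
    out->Buffer = data;
    out->MaximumLength = (uint16_t)byte_len;

    /* Strip one trailing terminator, but only if there is a character to strip. */
    if (byte_len >= sizeof(WCHAR16) &&
        data[byte_len / sizeof(WCHAR16) - 1] == 0)
        byte_len -= sizeof(WCHAR16);

    out->Length = (uint16_t)byte_len;
    return 0;
}
```

The length check before the subtraction is what keeps an empty value from underflowing, and the final Length never includes a null that could end up in a filename.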

Like I said, registry and null terminated strings is a big black morass.

  • S


Just to be extra clear and clarify Joe’s statement here, RtlInitUnicodeString does NOT include the null terminator in the string that it has initialized. “The string” is defined as the half-open range [UnicodeString->Buffer, (PWCHAR)((PCHAR)UnicodeString->Buffer + UnicodeString->Length)).

The *buffer* may have a null at the end of it, but as far as the UNICODE_STRING struct is concerned, it won’t be included in the Length count.

Joe is referring to the C-style string argument to RtlInitUnicodeString, which is indeed null terminated. The resultant UNICODE_STRING structure does not include the terminating null character in its count, however.
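A rough portable model of that behaviour, for illustration only: ustr_init below is a hypothetical stand-in and not the real RtlInitUnicodeString, and returning -1 for over-long input is a simplification of this sketch. It shows Length excluding the terminator while MaximumLength covers it.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable stand-ins for the kernel types (assumption: 16-bit WCHAR). */
typedef uint16_t WCHAR16;
typedef struct {
    uint16_t Length;         /* bytes in use, terminator NOT counted */
    uint16_t MaximumLength;  /* bytes available in Buffer */
    WCHAR16 *Buffer;
} USTR;

/* Point dst at a zero terminated source string without copying it. */
static int ustr_init(USTR *dst, WCHAR16 *src)
{
    size_t n = 0;

    dst->Length = 0;
    dst->MaximumLength = 0;
    dst->Buffer = src;
    if (src == NULL)
        return 0;

    while (src[n] != 0)      /* safe only because src IS zero terminated */
        n++;
    if ((n + 1) * sizeof(WCHAR16) > UINT16_MAX)
        return -1;

    dst->Length = (uint16_t)(n * sizeof(WCHAR16));               /* no terminator */
    dst->MaximumLength = (uint16_t)((n + 1) * sizeof(WCHAR16));  /* with terminator */
    return 0;
}
```

The null is still physically present at Buffer[Length / sizeof(WCHAR16)] in this one case, but nothing in the structure promises that to a later consumer.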

  • S

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Joseph M. Newcomer
Sent: Tuesday, November 25, 2008 4:18 PM
To: Windows File Systems Devs Interest List
Subject: RE: [ntfsd] Zero terminated unicode strings?


I was referring to custom ioctls, where I’m in control of the format :)


> The approach I’ve adopted for passing string data between kernel mode and
> user mode …

I completely agree with everything you write about strings from the registry
API; how could anyone not agree!

> The approach I’ve adopted for passing string data between kernel mode and
> user mode …

I hope nobody passes a UNICODE_STRING between user mode and kernel mode. It
has a pointer in it, of course.

I was referring to a custom ioctl where I’m in control of the format;
elsewhere I’m as much a victim of the NT developers as the next chap ;)

“Skywing” wrote in message news:xxxxx@ntfsd…
Just to be extra clear and clarify Joe’s statement here,
RtlInitUnicodeString does NOT include the null terminator in the string that
it has initialized. “The string” is defined as the half-open range
[UnicodeString->Buffer,
(PWCHAR)((PCHAR)UnicodeString->Buffer + UnicodeString->Length)).

The buffer may have a null at the end of it, but as far as the
UNICODE_STRING struct is concerned, it won’t be included in the buffer count.

Joe is referring to the C-style string argument to RtlInitUnicodeString,
which is indeed null terminated. The resultant UNICODE_STRING structure
does not include the terminating null character in its count, however.
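
To make the "string is only [Buffer, Buffer + Length)" point concrete, here is a minimal user-mode sketch. The `counted_string` type and both helpers are hypothetical stand-ins (with `uint16_t` in place of WCHAR so it builds outside the kernel), not anything from the DDK; the point is only that every walk is bounded by the count and never looks for a terminator.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-mode stand-in for UNICODE_STRING; uint16_t replaces
   WCHAR so the sketch builds anywhere. */
typedef struct {
    uint16_t  Length;         /* bytes in use, terminator NOT counted */
    uint16_t  MaximumLength;  /* bytes available behind Buffer */
    uint16_t *Buffer;         /* may or may not be followed by a NUL */
} counted_string;

/* Number of characters, taken from the count rather than a scan for NUL. */
static size_t counted_chars(const counted_string *s)
{
    return s->Length / sizeof(uint16_t);
}

/* Find the last occurrence of ch, looking only inside
   [Buffer, Buffer + Length). Returns the character index, or -1. */
static ptrdiff_t counted_rfind(const counted_string *s, uint16_t ch)
{
    size_t i = counted_chars(s);
    while (i-- > 0) {
        if (s->Buffer[i] == ch)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

Anything sitting in the buffer past Length (allocator fill such as 0xCD, a stray NUL, or old data) is simply never examined.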

- S

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Joseph M. Newcomer
Sent: Tuesday, November 25, 2008 4:18 PM
To: Windows File Systems Devs Interest List
Subject: RE: [ntfsd] Zero terminated unicode strings?

In general, the only time you ever see a NUL-terminated string in the kernel
is when you use RtlInitUnicodeString to initialize from a string literal.

When returning strings to user space, it is whatever you have told the user
to expect in terms of return value. In general, most I/O operations do not
guarantee NUL-termination, and users are accustomed to writing something
like

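BYTE buffer[MAX_SIZE + 1];   /* one spare byte for the terminator */
DWORD bytesRead;

if (!ReadFile(h, buffer, MAX_SIZE, &bytesRead, NULL))
{
    … deal with error
}
else
{
    buffer[bytesRead] = 0;   /* bytesRead <= MAX_SIZE, so this stays in bounds */
}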

so it is rarely an issue. Note also that this works correctly only for ANSI
input (or MAX_SIZE + sizeof(TCHAR) and store two 0 bytes, left as an
Exercise For The Reader, see below)
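
For the Unicode variant left as an exercise, one possible sketch follows. It is portable C rather than Win32: `terminate_wide` and the `MAX_SIZE` value are illustrative names and numbers of my own, `uint16_t` stands in for WCHAR, and the read itself is assumed to have already delivered `bytes_read` bytes into a buffer declared with room for one extra character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SIZE 16  /* illustrative capacity in bytes; pick your own */

/* Terminate 'bytes_read' bytes of UTF-16 data held in a byte buffer that was
   declared with MAX_SIZE + sizeof(uint16_t) bytes. Returns the number of
   complete 16-bit characters before the terminator. */
static size_t terminate_wide(unsigned char *buffer, size_t bytes_read)
{
    size_t whole;

    assert(bytes_read <= MAX_SIZE);
    /* Ignore a trailing odd byte so the terminator lands on a character
       boundary. */
    whole = bytes_read - (bytes_read % sizeof(uint16_t));
    memset(buffer + whole, 0, sizeof(uint16_t));   /* the "two 0 bytes" */
    return whole / sizeof(uint16_t);
}
```

Because the buffer is one character larger than the largest read, a completely full read still has a slot for the terminator.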

At least, since I tend to spend most of my life in user space, that’s what I
write, what I see others writing, and what I teach.

Note that the ANSI/Unicode issue does arise, since
ReadFile/WriteFile/DeviceIoControl are byte-oriented and there is no
marshalling based on the type of call (the Registry calls will do
marshalling of Unicode to ANSI on reads and ANSI to Unicode on writes), but
there’s no ReadFileA/ReadFileW etc.

When using the Undocumented Windows API (Nebbett) from user space, there are
UNICODE_STRINGs returned for APIs that return strings, and they are
definitely not NUL-terminated. I’ve done this, and examined the return
values with the debugger.
joe

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Lyndon J Clarke
Sent: Tuesday, November 25, 2008 4:01 PM
To: Windows File Systems Devs Interest List
Subject: Re:[ntfsd] Zero terminated unicode strings?

Let me add to the clamour … just forget about zero terminated strings in
kernel mode … and when you’ve got used to that, avoid zero terminated strings
in user mode unless these are mandated by the API you are using :)

The approach I’ve adopted for passing string data between kernel mode and
user mode is that I have a zero terminated unicode string with a character (or
byte) count which includes the zero terminator. I don’t use the zero terminator
in kernel mode - I use the count. I use the zero terminator in user mode
when I have no choice - if I have no choice, then it’s there. The count is
always the size of the buffer I need to store the thing.
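
A rough sketch of what such a counted-and-terminated payload could look like is given here. The `string_payload` layout and the `payload_chars` validator are assumptions made up for illustration (the post does not show its actual format), and `uint16_t` stands in for WCHAR; the receiving side locates the end from the count alone and merely checks that the promised terminator is there.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ioctl payload: a byte count followed by UTF-16 text.
   ByteCount includes the terminating NUL, so it is also the buffer size
   needed to hold the text. */
typedef struct {
    uint32_t ByteCount;   /* bytes in Text, terminator included */
    uint16_t Text[];      /* variable length, last counted character is 0 */
} string_payload;

/* Validate a payload of 'input_size' bytes received from the other side.
   Returns the number of characters excluding the terminator, or -1 if the
   payload is malformed. Only the count is trusted to find the end. */
static long payload_chars(const void *input, size_t input_size)
{
    const string_payload *p = (const string_payload *)input;
    size_t header = offsetof(string_payload, Text);

    if (input_size < header + sizeof(uint16_t))
        return -1;                               /* no room for even a NUL */
    if (p->ByteCount < sizeof(uint16_t) || p->ByteCount % sizeof(uint16_t) != 0)
        return -1;                               /* empty or odd byte count */
    if (p->ByteCount > input_size - header)
        return -1;                               /* count runs past the buffer */
    if (p->Text[p->ByteCount / sizeof(uint16_t) - 1] != 0)
        return -1;                               /* promised terminator missing */
    return (long)(p->ByteCount / sizeof(uint16_t) - 1);
}
```

A real driver would additionally have to probe and capture the user buffer before reading it; that part is deliberately left out of this sketch.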


The way I handle this in user space is to store a NUL character after the
string (making sure that I really have a buffer big enough: query the length
of the string, then allocate a buffer of that length+1, then read the
string). Then use _tcslen/lstrlen to get the length, or, as is more typical in
my case, just call the CString constructor pointing to the string buffer.
If it had a NUL terminator already, I get a string which has the right
shape. If it didn’t, I get a string which has the right shape. Win-win.
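
In plain C, without CString, the same "allocate length+1 and store a NUL" idea might be sketched as follows. `dup_terminated` and `wide_len` are hypothetical helper names, and `uint16_t` again stands in for WCHAR so the sketch is not tied to Windows headers.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Copy 'length_bytes' bytes of UTF-16 text into a fresh buffer with one extra
   character of room, and terminate it. Works the same whether or not the
   source already ended in a NUL. Caller frees the result; NULL on failure. */
static uint16_t *dup_terminated(const uint16_t *text, size_t length_bytes)
{
    size_t chars = length_bytes / sizeof(uint16_t);
    uint16_t *copy = malloc((chars + 1) * sizeof(uint16_t));

    if (copy == NULL)
        return NULL;
    memcpy(copy, text, chars * sizeof(uint16_t));
    copy[chars] = 0;            /* the NUL stored after the string */
    return copy;
}

/* Length in characters of a NUL-terminated UTF-16 string (a wcslen stand-in). */
static size_t wide_len(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

If the source already carried its own terminator, the scan stops at that one; if it did not, it stops at the one appended here, so the measured length comes out the same either way.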
joe

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Skywing
Sent: Tuesday, November 25, 2008 4:28 PM
To: Windows File Systems Devs Interest List
Subject: RE: Re:[ntfsd] Zero terminated unicode strings?

The whole registry storing nulls in string values is a big cluster, IMO. I
suspect the problem has its roots in the awkward termination characteristics
expected of REG_MULTI_SZ. If it were not for that, we would have been much
better off had strings in the registry never been “guaranteed” to be
null terminated. I say “guaranteed”, because although that’s what you’re
SUPPOSED to see, you might not see it, in which case you need to handle it
anyway or risk compromising the system (if the value could be set by an
untrusted user mode caller).

Null termination here in the registry is tricky, because while the string
value in the registry is supposed to be null terminated, you probably do NOT
want to include the terminating null in the actual string you get back from
the registry. So you need to check for a terminating null and decrement the
length of the buffer appropriately. But you don’t want to do that if you
got back a buffer length < 2, or you’d underrun.

To make matters even more fun, user mode code has been (in)famous for
misusing the registry APIs for years and forgetting to include the null
terminator in the data passed to RegSetValue(Ex). To make matters even
*more* interesting, they usually got away with it if they were using the
ANSI (RegSetValue(Ex)A) versions, because those internally copied the user
supplied string to an internal Unicode buffer, expanded it, and then applied
the correct (Unicode string) null termination before passing the buffer to
the raw system service, thus papering over the fact that their callers were
broken. The Unicode (RegSetValue(Ex)W) APIs did not use to do this (IIRC),
though they now do for compatibility with porting Win9x apps to Unicode
APIs, where said Win9x apps sized their strings wrong.

Even with the compatibility hacks, it is STILL possible for an evil user
mode caller to call the system service directly without specifying a null
terminator.

There is a similar, equally delightful series of
“worked-by-accident-until-I-was-Unicode” and other fun things with respect
to user mode apps retrieving registry strings and not just setting them,
IIRC.

The net of all of this is:

  • You might get back an empty string from the registry API (length zero).
    In this case, you shouldn’t try to strip the null terminator from it or
    you’d decrease the length of your UNICODE_STRING below zero, resulting in
    badness.
  • You might get back a string with a length that isn’t a legal multiple of
    WCHAR, which you need to handle, as CM doesn’t sanitize it for you.
  • You might get back a string with a missing null terminator if you’re using
    the Unicode build of a buggy app on an old OS (IIRC), in which case you
    probably *also* don’t want to try and chop off the extra null terminator
    before saving the string away in a UNICODE_STRING. Alternatively, you could
    just consider the registry data just plain bogus here and not handle this
    case.
  • You could get back a well-formed registry string with a null terminator,
    at which point you probably want to chop off the null terminator before
    storing the string in a UNICODE_STRING structure. (Otherwise, if it was,
    say, a filename, you might try to create a file with a null at the end of
    the filename, which wouldn’t be very nice.)
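
The four cases in the list above can be folded into one small length calculation. This is only a sketch under my own assumptions: `registry_string_length` is a hypothetical helper, it takes the tolerant option for a missing terminator rather than rejecting the data, and it works on raw bytes with `uint16_t` standing in for WCHAR.

```c
#include <stddef.h>
#include <stdint.h>

/* Given 'data_bytes' bytes of REG_SZ-style data, return the byte length to
   store in a counted string's Length field:
     - empty data stays empty (nothing to strip);
     - a trailing odd byte is dropped so the length is a whole number of
       characters;
     - one trailing NUL is stripped if present; a missing one is tolerated. */
static size_t registry_string_length(const unsigned char *data, size_t data_bytes)
{
    size_t len = data_bytes - (data_bytes % sizeof(uint16_t));

    if (len >= sizeof(uint16_t) &&
        data[len - 2] == 0 && data[len - 1] == 0)
        len -= sizeof(uint16_t);
    return len;
}
```

Kernel code would also need to make sure the result fits in a USHORT before storing it in a UNICODE_STRING; that check is not shown.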

Like I said, registry and null terminated strings is a big black morass.

- S


Yes. Suppose the string was

L"123"

This is stored as the byte sequence (in hex)

31 00   32 00   33 00   00 00
L'1'    L'2'    L'3'    L'\0'

The NUL terminator is added because the definition of the C language
requires it.

If I do

UNICODE_STRING str;
RtlInitUnicodeString(&str, L"123");

The UNICODE_STRING structure will reveal that the .Length is 6 bytes, the
.MaximumLength will be 8 bytes (it covers the terminating NUL in the buffer,
which .Length does not count), and the .Buffer pointer will point to the
string specified. Note that string literals are normally stored into a
write-protected segment, so an attempt to append to or otherwise modify the
contents of the UNICODE_STRING will, in such a case, cause an access fault.
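
The arithmetic can be checked in user mode with a small stand-in. Everything named here (`counted_string`, `literal_123`, `describe_literal`) is hypothetical, and `uint16_t` replaces WCHAR; the helper fills the fields the way the RTL_CONSTANT_STRING macro describes a literal, with Length stopping before the terminator and MaximumLength covering the whole array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for UNICODE_STRING (uint16_t in place of WCHAR). */
typedef struct {
    uint16_t        Length;         /* bytes of text, terminator excluded */
    uint16_t        MaximumLength;  /* bytes of storage behind Buffer */
    const uint16_t *Buffer;
} counted_string;

/* The literal L"123" as it sits in memory: three characters plus the NUL. */
static const uint16_t literal_123[] = { '1', '2', '3', 0 };

/* Describe a NUL-terminated array without copying it: Length stops before
   the terminator, MaximumLength covers the whole array. 'source_bytes' is
   the sizeof of the array, terminator included. */
static counted_string describe_literal(const uint16_t *source, size_t source_bytes)
{
    counted_string s;

    s.Length        = (uint16_t)(source_bytes - sizeof(uint16_t));
    s.MaximumLength = (uint16_t)source_bytes;
    s.Buffer        = source;
    return s;
}
```

The terminator is still physically present at Buffer[Length / 2]; it is just outside what the structure calls "the string".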

I do find it odd that most DDI calls do not use the const specification on
UNICODE_STRING values (or pretty much any other objects) that are not
modified (e.g., IoCreateSymbolicLink specifies two PUNICODE_STRING
arguments, but since it does not modify either, they should be const
PUNICODE_STRING arguments). In general the failure to use const in the
kernel is dismaying.
joe


Well, you may be in charge of the format, but then the question is whether
or not the user can RTFM.

For example, some users will write

BYTE buffer[1024];
::ZeroMemory(buffer, sizeof(buffer));
ReadFile(h, buffer, sizeof(buffer), &bytesRead, NULL);

and be happy because they’ve zeroed out the buffer and therefore, even
though you did not put a NUL at the end, they never notice their code is
actually incorrect (if they read 1024 bytes, it would be in error, but if it
was followed by an int that never exceeded 64K (Unicode) or 16M (ANSI), it
would still give the illusion that it had worked).

If, on the other hand, they just write

BYTE buffer[1024];
ReadFile(…as above…);

then in debug mode the rest of the buffer is filled with 0xCC bytes (a weird
symbol in Unicode), and their code will fail, because they wrote it wrong.
So their solution is to always zero the buffer, which is time-consuming and
unnecessary.

So you have a choice, and it doesn’t hurt to show them how to write the code
correctly; nobody ever lost by underestimating the naiveté of an
inexperienced programmer.
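
As one way of showing users the correct pattern, here is a small portable sketch. `fake_read` is a made-up stand-in for a byte-oriented read such as ReadFile, and `BUFFER_SIZE` and `read_terminated` are likewise illustrative; the idea is to ask for one byte less than the buffer holds and to terminate from the returned count, so no zero-fill is needed and a full read still fits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUFFER_SIZE 8   /* illustrative; the post uses 1024 */

/* Hypothetical stand-in for ReadFile: copies up to 'capacity' bytes from
   'source' into 'buffer' and reports how many were delivered. It never
   writes a terminator, just like a byte-oriented read. */
static size_t fake_read(const char *source, size_t source_len,
                        char *buffer, size_t capacity)
{
    size_t n = source_len < capacity ? source_len : capacity;

    memcpy(buffer, source, n);
    return n;
}

/* Read at most BUFFER_SIZE - 1 bytes and terminate using the returned count.
   No zero-fill is needed, and a full read still leaves room for the NUL. */
static size_t read_terminated(const char *source, size_t source_len,
                              char buffer[BUFFER_SIZE])
{
    size_t n = fake_read(source, source_len, buffer, BUFFER_SIZE - 1);

    assert(n < BUFFER_SIZE);
    buffer[n] = '\0';
    return n;
}
```

The debug-fill bytes left in the rest of the buffer no longer matter, because nothing reads past the terminator that was placed from the count.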
joe


I think you probably mean:

const UNICODE_STRING*

and not:

const PUNICODE_STRING

…as the two are not equivalent in where the const is applied: const PUNICODE_STRING is the same as UNICODE_STRING * const (a constant pointer), not const UNICODE_STRING * (a pointer to a constant structure). The headers define a typedef, PCUNICODE_STRING, for the latter case.

However, note that specifying a UNICODE_STRING as const has limited value, as you can still modify the string through UNICODE_STRING::Buffer.

Most kernel things tend to be annotated nowadays, which gives static analysis tools a much better description of what will happen than const ever did.
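
The difference in where const lands can be seen with a few lines of ordinary C. The types below are hypothetical stand-ins for UNICODE_STRING, PUNICODE_STRING and PCUNICODE_STRING (with `uint16_t` for WCHAR), so this is a sketch of the language rule rather than of the real headers.

```c
#include <stdint.h>

/* Hypothetical stand-ins for UNICODE_STRING and its pointer typedefs. */
typedef struct {
    uint16_t  Length;
    uint16_t  MaximumLength;
    uint16_t *Buffer;
} counted_string;
typedef counted_string       *p_counted_string;   /* like PUNICODE_STRING  */
typedef const counted_string *pc_counted_string;  /* like PCUNICODE_STRING */

/* 'const p_counted_string' is 'counted_string * const': the pointer cannot be
   reseated, but the structure it points to is still writable. */
static void through_const_typedef(const p_counted_string s)
{
    s->Length = 0;            /* compiles: the struct is not const */
}

/* 'pc_counted_string' is 'const counted_string *': the fields are read-only,
   yet the characters behind Buffer are not covered by that const. */
static void through_pointer_to_const(pc_counted_string s)
{
    /* s->Length = 0; would be rejected by the compiler here */
    s->Buffer[0] = 'X';       /* compiles: only the Buffer pointer is const */
}
```

Both functions compile cleanly, which illustrates both points: the const typedef protects less than it appears to, and even the pointer-to-const form leaves the characters writable.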

  • S

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Joseph M. Newcomer
Sent: Tuesday, November 25, 2008 8:24 PM
To: Windows File Systems Devs Interest List
Subject: RE: [ntfsd] Zero terminated unicode strings?

Yes. Suppose the string was

L"123"

This is stored as the byte sequence (in hex)

31 00   32 00   33 00   00 00
L'1'    L'2'    L'3'    L'\0'

The NUL terminator is added because the definition of the C language
requires it.

If I do

UNICODE_STRING str;
RtlInitUnicodeString(&str, L"123");

The UNICODE_STRING structure will reveal that the .Length is 6 bytes, the
.MaximumLength is 8 bytes (RtlInitUnicodeString counts the literal’s
terminating NUL in MaximumLength, but not in Length), and the .Buffer
pointer points to the string specified. Note that string literals are
normally stored in a write-protected segment, so an attempt to append to or
otherwise modify the contents of the UNICODE_STRING will, in such a case,
cause an access fault.
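
As a rough illustration of the bookkeeping described above, here is a sketch of an initializer that behaves the way RtlInitUnicodeString is documented to behave for a short literal. InitCountedFromZ is a hypothetical name, the typedefs are stand-ins, and the clamping of very long inputs is an assumption of this sketch rather than a statement about the real routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in declarations (assumptions) so this builds without the WDK. */
typedef uint16_t WCHAR;
typedef unsigned short USHORT;
typedef struct {
    USHORT Length;         /* bytes in use, terminator not counted */
    USHORT MaximumLength;  /* bytes available in Buffer */
    WCHAR *Buffer;
} UNICODE_STRING;

/* Hypothetical re-implementation, for illustration only. */
void InitCountedFromZ(UNICODE_STRING *dst, const WCHAR *src)
{
    const size_t maxChars = (0xFFFF / sizeof(WCHAR)) - 1;
    size_t chars = 0;

    assert(dst != NULL);
    dst->Length = 0;
    dst->MaximumLength = 0;
    dst->Buffer = (WCHAR *)src;      /* no copy: dst aliases src */
    if (src == NULL) {
        return;
    }
    while (chars < maxChars && src[chars] != 0) {
        chars++;                     /* clamp so the counts fit in a USHORT */
    }
    dst->Length = (USHORT)(chars * sizeof(WCHAR));
    dst->MaximumLength = (USHORT)(dst->Length + sizeof(WCHAR));
}
```

For the four-element array { '1', '2', '3', 0 } this yields Length 6 and MaximumLength 8, with Buffer pointing at the original storage.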

I do find it odd that most DDI calls do not use the const specification on
UNICODE_STRING values (or pretty much any other objects) that are not
modified (e.g., IoCreateSymbolicLink specifies two PUNICODE_STRING
arguments, but since it does not modify either, they should be const
PUNICODE_STRING arguments). In general the failure to use const in the
kernel is dismaying.
joe

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Skywing
Sent: Tuesday, November 25, 2008 4:32 PM
To: Windows File Systems Devs Interest List
Subject: RE: [ntfsd] Zero terminated unicode strings?

Just to be extra clear and clarify Joe’s statement here,
RtlInitUnicodeString does NOT include the null terminator in the string that
it has initialized. “The string” is defined as the half-open range
[UnicodeString->Buffer,
(PWCHAR)((PCHAR)UnicodeString->Buffer + UnicodeString->Length)).

The *buffer* may have a null at the end of it, but as far as the
UNICODE_STRING struct is concerned, it won’t be included in the Length
count.

Joe is referring to the C-style string argument to RtlInitUnicodeString,
which is indeed null terminated. The resultant UNICODE_STRING structure
does not include the terminating null character in its Length, however.
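
A small sketch of what "the string is only the first Length bytes" means in practice: the loop bound comes from Length, never from a search for a terminator. CountChar is a hypothetical helper and the typedefs are stand-ins for the WDK ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in declarations (assumptions) so this builds without the WDK. */
typedef uint16_t WCHAR;
typedef unsigned short USHORT;
typedef struct {
    USHORT Length;         /* bytes in use, terminator not counted */
    USHORT MaximumLength;  /* bytes available in Buffer */
    WCHAR *Buffer;
} UNICODE_STRING;

/* Counts occurrences of ch, reading only the first Length bytes. */
size_t CountChar(const UNICODE_STRING *s, WCHAR ch)
{
    size_t chars, i, hits = 0;

    assert(s != NULL);
    chars = s->Length / sizeof(WCHAR);   /* character count, not wcslen() */
    for (i = 0; i < chars; i++) {
        if (s->Buffer[i] == ch) {
            hits++;
        }
    }
    return hits;
}
```

Whatever follows the counted range in the buffer (0xCD fill, another string, nothing at all) is never examined.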

  • S

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Joseph M. Newcomer
Sent: Tuesday, November 25, 2008 4:18 PM
To: Windows File Systems Devs Interest List
Subject: RE: [ntfsd] Zero terminated unicode strings?

In general, the only time you ever see a NUL-terminated string in the kernel
is when you use RtlInitUnicodeString to initialize from a string literal.

When returning strings to user space, it is whatever you have told the user
to expect in terms of return value. In general, most I/O operations do not
guarantee NUL-termination, and users are accustomed to writing something
like

BYTE buffer[MAX_SIZE + 1];
if(!ReadFile(h, buffer, MAX_SIZE, &bytesRead, NULL))
    … deal with error
else
{
    buffer[bytesRead] = 0;
}

so it is rarely an issue. Note also that this works correctly only for ANSI
input (for Unicode, allocate MAX_SIZE + sizeof(TCHAR) and store two 0 bytes;
this is left as an Exercise For The Reader, see below).
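
One possible answer to the exercise, as a portable sketch: after a byte-oriented read has filled a buffer with 16-bit characters, append a two-byte terminator if there is room. TerminateWide is a hypothetical helper, WCHAR is a stand-in typedef, and the choice to drop a trailing odd byte is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t WCHAR;   /* stand-in (assumption) for the Windows type */

/* Appends a 16-bit zero after the bytes that were read.
   Returns the number of whole characters before the terminator,
   or (size_t)-1 if the buffer has no room for the terminator. */
size_t TerminateWide(unsigned char *buffer, size_t capacityBytes,
                     size_t bytesRead)
{
    const WCHAR nul = 0;
    size_t usable;

    assert(buffer != NULL);
    usable = bytesRead - (bytesRead % sizeof(WCHAR)); /* drop odd byte */
    if (usable + sizeof(WCHAR) > capacityBytes) {
        return (size_t)-1;
    }
    memcpy(buffer + usable, &nul, sizeof nul);        /* two 0 bytes */
    return usable / sizeof(WCHAR);
}
```

The caller sizes the buffer as MAX_SIZE + sizeof(WCHAR), passes MAX_SIZE to the read, and then passes the full capacity and bytesRead to the helper.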

At least, since I tend to spend most of my life in user space, that’s what I
write, what I see others writing, and what I teach.

Note that the ANSI/Unicode issue does arise, since
ReadFile/WriteFile/DeviceIoControl are byte-oriented and there is no
marshalling based on the type of call (the Registry calls will do
marshalling of Unicode to ANSI on reads and ANSI to Unicode on writes), but
there’s no ReadFileA/ReadFileW etc.

When using the Undocumented Windows API (Nebbett) from user space, there are
UNICODE_STRINGs returned for APIs that return strings, and they are
definitely *not* NUL-terminated. I’ve done this, and examined the return
values with the debugger.
joe

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Lyndon J Clarke
Sent: Tuesday, November 25, 2008 4:01 PM
To: Windows File Systems Devs Interest List
Subject: Re:[ntfsd] Zero terminated unicode strings?

Let me add to the clamour … just forget about zero terminated strings in
kernel mode … and when you have got used to that, avoid zero terminated
strings in user mode unless these are mandated by the api you are using :)

The approach I’ve adopted for passing string data between kernel mode and
user mode is that I have a zero terminated unicode string with a character
(or byte) count which includes the zero terminator. I don’t use the zero
terminator in kernel mode - I use the count. I use the zero terminator in
user mode when I have no choice - if I have no choice, then it’s there. The
count is always the size of the buffer I need to store the thing.
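
A sketch of how the receiving side of such a scheme might check what it is handed before trusting either the count or the terminator. ValidateCountedZ and its parameters are hypothetical names invented for this example; the checks shown are one reasonable set, not a description of anyone's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t WCHAR;   /* stand-in (assumption) for the Windows type */

/* byteCount is the sender's count, terminator included.
   availableBytes is how much buffer was really supplied.
   On success stores the character count without the terminator. */
int ValidateCountedZ(const WCHAR *text, size_t byteCount,
                     size_t availableBytes, size_t *chars)
{
    size_t n;

    assert(text != NULL && chars != NULL);
    if (byteCount < sizeof(WCHAR)) return 0;       /* no room for terminator */
    if (byteCount % sizeof(WCHAR) != 0) return 0;  /* not whole characters */
    if (byteCount > availableBytes) return 0;      /* count overruns buffer */
    n = byteCount / sizeof(WCHAR);
    if (text[n - 1] != 0) return 0;                /* promised zero missing */
    *chars = n - 1;
    return 1;
}
```

Kernel-mode code would then work from the returned count, while user-mode code can rely on the terminator having been verified.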


Wow,

Having returned to the site this morning I notice one or two comments have appeared. :)
Apologies for kicking off a subject which has probably been over-discussed on this site in the past.

I’ll add one or two responses, and leave it at that:

In response to Joseph M. Newcomer:

‘Wasting space, oh yes, this is a really serious issue.’

You are correct. I appreciate your point about relative size. However, I perhaps didn’t make my thoughts explicit. Because UNICODE_STRING is defined with a Length, it is unnecessary to require a zero terminator. The contract is a PASCAL string, not a C string.
Obviously this decision restricts which apis you can use, but it’s a contract and therefore should be honoured.
The two byte zero (which I was previously adding) is, in my own mind, two unnecessary bytes, but I added them out of necessity at the time, so I could call some of my own lib functions such as:
FindLeafNameZTW(const WCHAR* const)
etc.
rather than having to write
FindLeafNameUnc(const UNICODE_STRING* const)
etc.
Yesterday I swiftly decided it was better to write the latter functions, and abandon the former inside kernel code.
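
For anyone making the same switch, here is a guess at what a counted-string version could look like. Only the name and parameter type come from the post; the return type, the body, and the stand-in typedefs are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in declarations (assumptions) so this builds without the WDK. */
typedef uint16_t WCHAR;
typedef unsigned short USHORT;
typedef struct {
    USHORT Length;         /* bytes in use, terminator not counted */
    USHORT MaximumLength;  /* bytes available in Buffer */
    WCHAR *Buffer;
} UNICODE_STRING;

/* Returns a view of the text after the last backslash. Nothing is
   allocated or copied: the result aliases path->Buffer, so it is only
   valid while the caller keeps the original buffer alive. */
UNICODE_STRING FindLeafNameUnc(const UNICODE_STRING *const path)
{
    UNICODE_STRING leaf;
    size_t chars, i, start = 0;

    assert(path != NULL);
    chars = path->Length / sizeof(WCHAR);
    for (i = chars; i > 0; i--) {
        if (path->Buffer[i - 1] == '\\') {
            start = i;               /* first character after the slash */
            break;
        }
    }
    leaf.Buffer = path->Buffer + start;
    leaf.Length = (USHORT)((chars - start) * sizeof(WCHAR));
    leaf.MaximumLength = leaf.Length;
    return leaf;
}
```

Because the scan runs backwards from Length, it never looks at whatever filler follows the name in the buffer.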

> My impression of the reason UNICODE_STRING works as it does was to allow
> substrings of a larger string (e.g., a directory path) to be represented
> without requiring allocation-and-copy to get a NUL-terminated substring

Yes, I thought along similar lines myself.
Also, I believe it was done because PASCAL strings are slightly superior to C strings in that you can get to the end of a PASCAL string without touching memory, but inferior in that they require (length+ptr) rather than just ptr to be carried. As, I’m sure, the kernel doesn’t want to touch memory if at all possible, PASCAL strings would be the better choice.

The idea of using one UNICODE_STRING to reference a substring of another UNICODE_STRING fills me with fear. Which one owns the memory pointed to by Buffer? Refcounted memory, and all the work done in std and boost for memory tracking, suddenly rears its head. Obviously coderz at Kernel level are expected to understand these issues, so of course won’t make such mistakes when deallocating unicode strings ;)
However I take your point: in localised scenarios, substring refs can be created without copies.

Lyndon,

> Let me add to the clamour … forget about zero terminated strings in kernel mode …

Yes, absolutely spot on! I’ve learnt.

As suggested elsewhere, I usually pass strings to user mode via:

BlockOMemory
struct
{
    /*blah*/
    ULONG OffsetToString;
    /*blah*/
};

where:
OffsetToString == 0 ? NULL : (WCHAR*)((UCHAR*)&OffsetToString + OffsetToString)
and the string is zero terminated.
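
A sketch of how the receiver of such a block might resolve the offset defensively. BLOCK_HEADER, its Flags field (standing in for the elided members), and ResolveString are hypothetical; the bounds and terminator checks are additions of this sketch, not something the scheme above is stated to include.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t WCHAR;   /* stand-in (assumption) for the Windows type */

typedef struct {
    uint32_t Flags;           /* placeholder for the elided fields */
    uint32_t OffsetToString;  /* bytes from this field to the text, 0 = none */
} BLOCK_HEADER;

/* Returns the string inside the block, or NULL if there is none or if the
   offset or terminator would take the reader outside blockBytes. */
const WCHAR *ResolveString(const void *block, size_t blockBytes)
{
    const BLOCK_HEADER *hdr = (const BLOCK_HEADER *)block;
    const WCHAR *text;
    size_t start, maxChars, i;

    assert(block != NULL);
    if (blockBytes < sizeof(BLOCK_HEADER) || hdr->OffsetToString == 0) {
        return NULL;
    }
    start = offsetof(BLOCK_HEADER, OffsetToString) + hdr->OffsetToString;
    if (start >= blockBytes || start % sizeof(WCHAR) != 0) {
        return NULL;                      /* outside block or misaligned */
    }
    text = (const WCHAR *)((const unsigned char *)block + start);
    maxChars = (blockBytes - start) / sizeof(WCHAR);
    for (i = 0; i < maxChars; i++) {
        if (text[i] == 0) {
            return text;                  /* terminator found in bounds */
        }
    }
    return NULL;                          /* no terminator inside block */
}
```

The point of the extra checks is that an offset coming from another component is just a number until it has been verified against the size of the block it claims to index.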

> I hope nobody passes UNICODE_STRING between user mode and kernel mode. It’s
> got a pointer in it of course.

Quite

> BYTE buffer[1024];
> ::ZeroMemory(buffer, sizeof(buffer));
> ReadFile(h, buffer, sizeof(buffer), &bytesRead, NULL);

This is bad form in this situation imho (from the performance point of view, and for other reasons).
The original bug hit me because I filled the memory I allocated with 0xcd in debug builds. 0xcd was chosen for the sane reason that it wasn’t 0. Had it been zeroed, the problem I experienced would no doubt not have shown up until some poor user out there had a bluescreen.
(Obviously ZeroMem tends to get used a fair bit when zeroing large structs.)

Anyhoo, some fine thoughts…

Cheers

Mike

See below…

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@scee.net
Sent: Wednesday, November 26, 2008 7:40 AM
To: Windows File Systems Devs Interest List
Subject: RE:[ntfsd] Zero terminated unicode strings?

Wow,

Having returned to the site this morning I notice one or two comments have
appeared. :)
Apologies for kicking off a subject which has probably been over-discussed
on this site in the past.

I’ll add one or two responses, and leave it at that:

In response to Joseph M. Newcomer:

‘Wasting space, oh yes, this is a really serious issue.’

You are correct. I appreciate your point about relative size. However, I
perhaps didn’t make my thoughts explicit. Because UNICODE_STRING is defined
with a Length, it is unnecessary to require a zero terminator. The
contract is a PASCAL string, not a C string.
Obviously this decision restricts which apis you can use, but it’s a
contract and therefore should be honoured.
The two byte zero (which I was previously adding) is, in my own mind, two
unnecessary bytes, but I added them out of necessity at the time, so I could
call some of my own lib functions such as:
FindLeafNameZTW(const WCHAR* const)
etc.
rather than having to write
FindLeafNameUnc(const UNICODE_STRING* const)
etc.
Yesterday I swiftly decided it was better to write the latter functions, and
abandon the former inside kernel code.

My impression of the reason UNICODE_STRING works as it does was to allow
substrings of a larger string (e.g., a directory path) to be represented
without requiring allocation-and-copy to get a NUL-terminated substring

Yes, I thought along similar lines myself.
Also, I believe it was done because PASCAL strings are slightly superior to
C strings in that you can get to the end of a PASCAL string without
touching memory, but inferior in that they require (length+ptr) rather
than just ptr to be carried. As, I’m sure, the kernel doesn’t want to touch
memory if at all possible, PASCAL strings would be the better choice.

The idea of using one UNICODE_STRING to reference a substring of another
UNICODE_STRING fills me with fear. Which one owns the memory pointed to
by Buffer? Refcounted memory, and all the work done in std and boost for
memory tracking, suddenly rears its head. Obviously coderz at Kernel level
are expected to understand these issues, so of course won’t make such
mistakes when deallocating unicode strings ;)
****
Consider that the UNICODE_STRING is on the stack, and needs to be examined
piecewise or processed piecewise (as you might when parsing a path name).
The caller owns the space, and only the caller is permitted to free it;
others just use it. No problems, and it saves the need to make copies,
particularly solely for the purpose of putting a NUL at the end of a
substring. It also means that you do not need to replace, say, an L'\\'
with an L'\0' temporarily, which is really good because if the string was a
literal string, this would cause an access fault. The idea is that someone
owns the string but clients of the string are sent references to pieces. If
const had been properly used everywhere, it would be evident that functions
that are called do not need to free the string, and in fact it would be
inappropriate for them to free the string. No reference counting is
required because once a function returns, it implicitly releases all claim
to the string.
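
A sketch of the piecewise processing described here: each path component is handed out as a view into the caller's buffer, with no copying and no temporary terminators written into the path. NextComponent is a hypothetical helper and the typedefs are stand-ins for the WDK ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in declarations (assumptions) so this builds without the WDK. */
typedef uint16_t WCHAR;
typedef unsigned short USHORT;
typedef struct {
    USHORT Length;         /* bytes in use, terminator not counted */
    USHORT MaximumLength;  /* bytes available in Buffer */
    WCHAR *Buffer;
} UNICODE_STRING;

/* Splits the first backslash-delimited component off the front of *rest.
   Both outputs alias the caller's buffer; nothing is written to it.
   Returns 0 when no component is left. */
int NextComponent(UNICODE_STRING *rest, UNICODE_STRING *component)
{
    size_t chars, i = 0, start;

    assert(rest != NULL && component != NULL);
    chars = rest->Length / sizeof(WCHAR);
    while (i < chars && rest->Buffer[i] == '\\') {
        i++;                               /* skip leading separators */
    }
    if (i == chars) {
        return 0;
    }
    start = i;
    while (i < chars && rest->Buffer[i] != '\\') {
        i++;                               /* scan to end of component */
    }
    component->Buffer = rest->Buffer + start;
    component->Length = (USHORT)((i - start) * sizeof(WCHAR));
    component->MaximumLength = component->Length;
    rest->Buffer += i;
    rest->Length = (USHORT)((chars - i) * sizeof(WCHAR));
    rest->MaximumLength = rest->Length;
    return 1;
}
```

The caller keeps ownership of the original buffer throughout; the views simply stop being meaningful once the caller frees it.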

The philosophy in the kernel is that “you allocate it, you free it” and no
one else is permitted to free something they did not create. Thus client
functions of the creator would NEVER be permitted to free the string; only
the creator is permitted to do that. Furthermore, retention of pointers
follows a protocol of lifetime; it is immoral, illegal, and fattening to
retain a pointer to an object that can be destroyed by the owner, unless
there is a protocol by which you inform the owner you are no longer using it
(e.g., IoCompleteRequest, which, by the way, is one of the most common
failures that results in a BSOD: keeping an IRP pointer of an IRP that has
been freed. Maybe reference counting isn’t such a bad idea after all…)
joe
****

However I take your point: in localised scenarios, substring refs can be
created without copies.

Lyndon,

Let me add to the clamour … forget about zero terminated strings in
kernel mode …

Yes, absolutely spot on! I’ve learnt.

As suggested elsewhere, I usually pass strings to user mode via:

BlockOMemory
struct
{
    /*blah*/
    ULONG OffsetToString;
    /*blah*/
};

where:
OffsetToString == 0 ? NULL : (WCHAR*)((UCHAR*)&OffsetToString + OffsetToString)
and the string is zero terminated.
****
But you are then doing badly what UNICODE_STRING was designed to do well!
****

I hope nobody passes UNICODE_STRING between user mode and kernel mode. It’s
got a pointer in it of course.

Quite

BYTE buffer[1024];
::ZeroMemory(buffer, sizeof(buffer));
ReadFile(h, buffer, sizeof(buffer), &bytesRead, NULL);

This is bad form in this situation imho (from the performance point of view,
and for other reasons).
The original bug hit me because I filled the memory I allocated with 0xcd in
debug builds. 0xcd was chosen for the sane reason that it wasn’t 0. Had it
been zeroed, the problem I experienced would no doubt not have shown up
until some poor user out there had a bluescreen.
(Obviously ZeroMem tends to get used a fair bit when zeroing large structs.)
****
I agree; I think it is always a mistake. Note that in debug mode, YOU do
not need to fill the buffer with anything, because it is initialized by the
debug runtime (including local variables). In fact, 0xCD would be a REALLY
bad idea because that is the filler for heap-allocated (not stack-allocated)
storage and would result in confusion if someone hit it.

Note that UNICODE_STRINGs *ARE* passed from user mode to kernel mode ALL THE
TIME; that’s how NTDLL.DLL transforms Win32 API calls into internal calls
(read Nebbett).

Anyhoo, some fine thoughts…

Cheers

Mike



> I’d assumed in one part of my code that UNICODE_STRINGS were zero terminated strings.

Bad assumption. There is no such guarantee.

I would refactor all kernel-mode code which uses LPCWCHAR* to use UNICODE_STRING* instead, to not rely on the strings being zero-terminated, and to use the proper RtlXxx and StringCbXxx functions to work with these strings.
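
Where a zero-terminated string really is unavoidable (a legacy helper, a user-mode API), one option is to make a bounded copy and add the terminator yourself. The sketch below is portable C with stand-in typedefs and a hypothetical CopyToZ helper; in a driver the equivalent work would normally go through the Rtl string routines (for example RtlCopyUnicodeString or RtlEqualUnicodeString for counted strings) rather than hand-rolled code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in declarations (assumptions) so this builds without the WDK. */
typedef uint16_t WCHAR;
typedef unsigned short USHORT;
typedef struct {
    USHORT Length;         /* bytes in use, terminator not counted */
    USHORT MaximumLength;  /* bytes available in Buffer */
    WCHAR *Buffer;
} UNICODE_STRING;

/* Copies the counted string into dst and appends a terminator.
   dstChars is the capacity of dst in characters.
   Returns 1 on success, 0 if dst is too small (dst is then untouched). */
int CopyToZ(const UNICODE_STRING *src, WCHAR *dst, size_t dstChars)
{
    size_t chars;

    assert(src != NULL && dst != NULL);
    chars = src->Length / sizeof(WCHAR);
    if (chars + 1 > dstChars) {
        return 0;                          /* no room for text + zero */
    }
    if (chars > 0) {
        memcpy(dst, src->Buffer, chars * sizeof(WCHAR));
    }
    dst[chars] = 0;
    return 1;
}
```

The copy costs an allocation or a stack buffer, which is a good reason to prefer counted-string versions of your own helpers wherever you control both sides.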


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com