Testing with verifier causes system crash

Hi,

I am running Driver Verifier to test my driver and found that, with Verifier enabled on a checked build of Windows 7, my driver crashes the system.
I know where it crashes, but I can’t quite figure out how to prevent it, as I am a beginner in this area.

Following is the part of the code that the kernel debugger displays when it crashes:

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 805a0050, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: 9edbb051, address which referenced memory

FAULTING_SOURCE_CODE:
217: (ExInterlockedCompareExchange64(
218: (LONGLONG *)&(g_af_trace.handle),
219: (LONGLONG *)&shandle,
220: (LONGLONG *)&zero_value,
221: NULL) == zero_value)) {

g_af_trace is a global variable, and both shandle and zero_value are local variables, declared as follows inside the function:

LONGLONG zero_value = 0;
LONGLONG socket_handle = handle;
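For reference, ExInterlockedCompareExchange64 returns the value that *Destination held before the call, and it stores *Exchange only when that value equals *Comparand. Comparing the result with zero_value therefore tests whether this call claimed the slot. Below is a portable C11 user-mode model of those semantics, a sketch and not the WDK routine itself; cas64_model, try_claim and slot_handle are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-mode model of the ExInterlockedCompareExchange64
 * contract: returns the value *dest held before the call, and stores
 * *exchange into *dest only if that value equals *comparand. */
static int64_t cas64_model(_Atomic int64_t *dest,
                           const int64_t *exchange,
                           const int64_t *comparand)
{
    int64_t expected = *comparand;
    /* On failure, C11 writes the current value of *dest into expected; on
     * success expected already equals the old value. Either way it now
     * holds the value *dest had before the operation. */
    atomic_compare_exchange_strong(dest, &expected, *exchange);
    return expected;
}

/* Hypothetical stand-in for g_af_trace.handle; 0 means "slot free". */
static _Atomic int64_t slot_handle = 0;

/* Mirrors the `== zero_value` test in the thread: returns 1 only if this
 * caller installed its handle into the free slot. */
static int try_claim(int64_t handle)
{
    const int64_t zero_value = 0;
    assert(handle != zero_value); /* 0 is reserved to mean "free" */
    return cas64_model(&slot_handle, &handle, &zero_value) == zero_value;
}
```

Calling try_claim twice shows the pattern: the first caller sees the old value 0 and wins, while the second sees the first caller's handle and loses.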

How can I prevent my driver from crashing when it faces the adverse conditions that Verifier creates? Thank you in advance.

YEH

xxxxx@gmail.com wrote:

> I am running Driver Verifier to test my driver and found that, with Verifier enabled on a checked build of Windows 7, my driver crashes the system.
> I know where it crashes, but I can’t quite figure out how to prevent it, as I am a beginner in this area.
>
> FAULTING_SOURCE_CODE:
> 217: (ExInterlockedCompareExchange64(
> 218: (LONGLONG *)&(g_af_trace.handle),
> 219: (LONGLONG *)&shandle,
> 220: (LONGLONG *)&zero_value,
> 221: NULL) == zero_value)) {

None of those casts are necessary. Casts are evil and can hide subtle
problems. You should remove them all.

It seems unlikely that the problem is in this call, if shandle and
zero_value really are on the stack. I notice this is part of a compound
“if” statement. What does the rest of the statement look like?
Remember that the debugger only knows that the crash happened somewhere
NEAR here.

> How can I prevent my driver from crashing when it faces the adverse conditions that Verifier creates?

By fixing the driver. The whole POINT of the Driver Verifier is to
create adverse situations that cause erroneous drivers to crash.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Also, what does 805a0050 correspond to in your driver?

Mark Roddy

On Wed, Jan 26, 2011 at 7:46 PM, Tim Roberts wrote:
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer
>

Are you sure that global variable hasn’t been placed in a pageable section? What’s at address 0x805a0050?

Dear Tim,

I initially did not have those casts, but I found the same call to ExInterlockedCompareExchange64 in filter.cpp in the WDK samples, so I added them to see whether that made a difference. Anyway, I will remove them. Thanks for the advice.

Here is the code leading up to the CAS (compare-and-swap) call:
shm_add_entry(
    __in INT64 handle)
{
    LONGLONG zero_value = 0;
    LONGLONG socket_handle = handle;

    __try {
        if ((g_af_trace.flag == 0) &&
            (ExInterlockedCompareExchange64(
                (LONGLONG *)&(g_af_trace.handle),
                (LONGLONG *)&shandle,
                (LONGLONG *)&zero_value,
                NULL) == zero_value)) {

What I am suspicious of is the call to shm_add_entry() itself. I call it as shm_add_entry(flowData->handle). Before the call, I assert that flowData is not NULL; flowData was allocated with ExAllocatePoolWithTag from NonPagedPool, and I also assert if that allocation fails. Yet none of those asserts fire; the crash happens only here, so I am not really sure what went wrong.

Also, could you tell me how I can check whether a global variable is in a pageable section? Thank you very much.

xxxxx@gmail.com wrote:

> Here is the code leading up to the CAS (compare-and-swap) call:

> What I am suspicious of is the call to shm_add_entry() itself. I call it as shm_add_entry(flowData->handle). Before the call, I assert that flowData is not NULL; flowData was allocated with ExAllocatePoolWithTag from NonPagedPool, and I also assert if that allocation fails. Yet none of those asserts fire; the crash happens only here, so I am not really sure what went wrong.

My guess is that we’ll find that g_af_trace is in a pageable or discarded
segment, so the read of g_af_trace.flag in (g_af_trace.flag == 0) is what faults.

> Also, could you tell me how I can check whether a global variable is in a pageable section? Thank you very much.

If you have
#pragma data_seg("INIT")
that would certainly do it. If you want, send me the whole source file
by private email and I’ll take a quick look.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

First, I agree that the role of the Driver Verifier is to make bad drivers
crash. I get this question a lot in user space: “There’s something wrong
with the debug version of the MFC library. When I use it, I get lots of
crashes” [Note: by “crash”, they mean “assert failures”] “but when I use the
release library, I don’t get those crashes” [read: I don’t see asserts from
a build that compiles all the asserts out].

Another thing to consider here is whether you are using optimization
(common in release builds). The observation that the crash is only “near”
the line you see is important, because the optimizing compiler is truly
astonishingly good and performs transformations on the code that are simply
mind-boggling. So in addition to the verifier quite likely forcing an error in
a bad driver, if the code is optimized, the report of *where* the error
occurred is unreliable. Note that in many, many years of using MS C/C++ I
have never encountered an optimization error that produced erroneous code,
although I have seen optimization expose programming errors that the debug
build masked. The classic case: a buffer overrun clobbered an
unused-but-later-initialized variable on the stack. In optimized code, that
variable lived only in a register and was not allocated on the stack, so the
overrun clobbered the *next* variable, which really did matter (it was a
pointer). This was not in my code, but in a piece of client code I was
debugging (“We never use optimization because optimized code always
fails”); the problem was their own error, which became deadly only when the
optimizer changed the stack layout.
joe

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Tim Roberts
Sent: Wednesday, January 26, 2011 7:47 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] Testing with verifier causes system crash

First of all, thank you very much for all the responses. I also thank Mr. Roberts for his willingness to help; if I can’t get this working through this post, I will reach out to you. ;-)

It took me some time to get everything set up correctly to answer your questions.

I found that the referenced memory address is my global variable g_af_trace.
In fact, g_af_trace refers to shared memory that I create when the driver starts, so I assumed the referenced page must have been paged out. Just for testing, I removed my call to ExInterlockedCompareExchange64() and found that I still get the crash. The crash does not happen every time, but when it does and I run !verifier, I see the trims count at some number such as 5.

So how can I keep my shared memory from being paged out? If there is a better approach, please let me know.
Here is the part of my code that creates and maps the shared memory. I used this approach so that a user-mode process can read the data I gather, for monitoring purposes.

RtlInitUnicodeString(&uname, map_name);
InitializeObjectAttributes(
    &oa,
    &uname,
    OBJ_CASE_INSENSITIVE | OBJ_FORCE_ACCESS_CHECK | OBJ_KERNEL_HANDLE,
    NULL,
    &sd);

mapsize.u.HighPart = 0;
mapsize.u.LowPart = (DWORD)size;

status = ZwCreateSection(
    &mapfile,
    STANDARD_RIGHTS_REQUIRED | SECTION_MAP_READ | SECTION_MAP_WRITE | SECTION_QUERY,
    &oa,
    &mapsize,
    PAGE_READWRITE,
    SEC_COMMIT,
    NULL);
if (!NT_SUCCESS(status)) {
    AFDRVTracePrint(TRACE_LEVEL_ERROR, ("ZwCreateSection failed with %x\n", status));
    return status;
}

status = ObReferenceObjectByHandle(
    mapfile,
    STANDARD_RIGHTS_REQUIRED | SECTION_MAP_READ | SECTION_MAP_WRITE | SECTION_QUERY,
    NULL,
    KernelMode,
    &object,
    NULL);
if (!NT_SUCCESS(status)) {
    AFDRVTracePrint(TRACE_LEVEL_ERROR, ("ObReferenceObjectByHandle failed with %x\n", status));
    ZwClose(mapfile);
    return status;
}

ob_mapsize = size;

// MmMapViewInSystemSpace maps the view into system address space,
// not into the current process
// check out - http://www.osronline.com/showThread.cfm?link=30527
// Note: the mapped view is pageable; it is not safe to touch at
// DISPATCH_LEVEL unless the pages are locked first.
status = MmMapViewInSystemSpace(object, &start_addr, &ob_mapsize);
ObDereferenceObject(object);
if (!NT_SUCCESS(status)) {
    AFDRVTracePrint(TRACE_LEVEL_ERROR, ("MmMapViewInSystemSpace failed with %x\n", status));
    ZwClose(mapfile);
    return status;
}

Is there anything else I should do to ensure that this portion of memory does not get paged out?
Thank you very much again for all the advice.

YEH

OK, generally speaking, mapping shared memory between user land and kernel land is not recommended: first, because it’s a pain in the ass to keep user-mode and kernel-mode accesses in sync, which tends to end in a BSOD; and second, as has been discussed here a legion of times, because it opens a security hole. One recommended alternative is the inverted call, in which the application pends an I/O request that the driver later completes, returning the needed or requested data.
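A single-threaded, user-mode model of the inverted-call bookkeeping, meant to show the shape of the idea rather than a driver implementation: each pended request is represented as a caller-supplied output buffer in a FIFO, and when the driver has an event it completes the oldest pended request by copying data into it. In a real driver this role is typically played by a manually dispatched I/O queue and request completion; every name and the MAX_PENDED limit below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDED 8

/* Hypothetical stand-in for a pended IOCTL: a caller-supplied output
 * buffer plus completion status. */
struct pended_request {
    char *out_buf;
    size_t out_len;
    size_t bytes_returned;
    int completed;
};

/* FIFO of requests the "driver" is holding until it has data. */
struct pend_queue {
    struct pended_request *items[MAX_PENDED];
    size_t head, count;
};

/* Application side: hand a request to the driver, which pends it. */
static int pend_request(struct pend_queue *q, struct pended_request *r)
{
    if (q->count == MAX_PENDED)
        return 0; /* queue full; a real driver would fail the IOCTL */
    q->items[(q->head + q->count) % MAX_PENDED] = r;
    q->count++;
    return 1;
}

/* Driver side: an event occurred, so complete the oldest pended request
 * with the event data. Returns 0 if no request is waiting. */
static int complete_oldest(struct pend_queue *q, const char *data, size_t len)
{
    if (q->count == 0)
        return 0; /* nobody waiting; a real driver might buffer the event */
    struct pended_request *r = q->items[q->head];
    q->head = (q->head + 1) % MAX_PENDED;
    q->count--;
    size_t n = len < r->out_len ? len : r->out_len;
    memcpy(r->out_buf, data, n);
    r->bytes_returned = n;
    r->completed = 1;
    return 1;
}
```

The application keeps one or more requests outstanding at all times, so the driver never has to wait for the application to ask before it can deliver data.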

Gary G. Little

----- Original Message -----
From: xxxxx@gmail.com
To: “Windows System Software Devs Interest List”
Sent: Thursday, January 27, 2011 10:25:23 AM
Subject: RE:[ntdev] Testing with verifier causes system crash

Thank you very much for the advice, Mr. Little. My driver is a WFP module that monitors IIS traffic and logs all socket connection activity, and a user-mode application gathers the data from shared memory. I thought it would be too much overhead to make system calls back and forth to gather all the socket data. The user-mode app only reads the memory and the kernel driver only writes to it, so I thought it might be okay. But I am also concerned about what you pointed out. Do you think I should still follow your advice to be safe?

After posting my question, I also found that shared memory can be swapped out (http://www.osronline.com/showThread.cfm?link=14975).
I will try the approach described in that post to see if it resolves the issue. Thank you very much!!!

The memory won’t be swapped out if the driver allocates it from nonpaged pool, if the IOCTL is defined with METHOD_IN_DIRECT or METHOD_OUT_DIRECT, or if the driver probes and locks the user buffer. There are lots of ways for a driver to lock memory and avoid paging.

WFP has both kernel and user components. The recommendation is to keep things in user land. You can much more easily do what you’re trying to do in the kernel in a user application, or even in a service, and avoid both the hassle of the kernel and the overhead of switching between kernel and user land. Having recently written a WFP driver where the customer insisted on doing WFP in the kernel, I found that the major overhead is not passing a tailored IOCTL between user and kernel; it is the time it takes to interact with things such as DNS servers to validate domain names and/or IP addresses. But if you can avoid maintaining some kind of whitelist or blacklist in the kernel and keep everything in user land, you are well ahead of the development curve.

Gary G. Little

----- Original Message -----
From: xxxxx@gmail.com
To: “Windows System Software Devs Interest List”
Sent: Thursday, January 27, 2011 11:05:24 AM
Subject: RE:[ntdev] Testing with verifier causes system crash

xxxxx@gmail.com wrote:

> Thank you very much for the advice, Mr. Little. My driver is a WFP module that monitors IIS traffic and logs all socket connection activity, and a user-mode application gathers the data from shared memory. I thought it would be too much overhead to make system calls back and forth to gather all the socket data.

Are you talking about dozens of times per second, or thousands of times
per second? The kernel/user transition is not all that expensive, and
it’s a lot safer than sharing memory.

> The user-mode app only reads the memory and the kernel driver only writes to it, so I thought it might be okay.

Well, it CAN be OK, as long as you know whether you need the space to be
locked into memory or not. You created pageable memory, and then tried
to use it in a place where pageable memory cannot be used.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Thank you for your input, Mr. Roberts and Mr. Little.
I need to scan the shared memory to gather network traffic data a couple of times per minute. Currently I keep two hundred fixed-size entries in the shared memory. Rather than copying all the data, I felt it was better to share the view.

I also asked on the MSDN forum whether a WFP user-mode approach could handle this task, and it was confirmed that a kernel-mode driver is required for my task, which gathers network data by registering callouts.

Lastly, I tried locking the pages, and that seems to resolve the crash. For future reference, I just had to call IoAllocateMdl() and MmProbeAndLockPages() to lock the pages.

Thank you again for all the input; I really enjoy posting questions here, because everyone is very helpful.

YEH

xxxxx@gmail.com wrote:

> I need to scan the shared memory to gather network traffic data a couple of times per minute. Currently I keep two hundred fixed-size entries in the shared memory. Rather than copying all the data, I felt it was better to share the view.

No, that’s an absolutely trivial amount of data. Just use an ioctl.
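To make “just use an ioctl” concrete: with two hundred fixed-size entries, the driver's handler only needs to copy the table into the caller's output buffer. The portable sketch below shows that copy step, assuming a METHOD_BUFFERED-style output buffer. The flow_entry layout, MAX_ENTRIES and copy_snapshot are hypothetical, and in a driver the table would also need to be protected by a lock while it is copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 200 /* the thread's "two hundred fixed-size entries" */

/* Hypothetical fixed-size record; the real fields are not shown in the thread. */
struct flow_entry {
    int64_t handle;                  /* 0 = unused slot */
    uint32_t local_addr, remote_addr;
    uint16_t local_port, remote_port;
    uint64_t bytes;
};

/* Copy the whole table into the caller's buffer, as an IOCTL handler
 * would before completing the request. Returns the number of bytes
 * written, or 0 if the buffer is too small for the full snapshot. */
static size_t copy_snapshot(const struct flow_entry table[MAX_ENTRIES],
                            void *out_buf, size_t out_len)
{
    const size_t needed = sizeof(struct flow_entry) * MAX_ENTRIES;
    assert(out_buf != NULL || out_len == 0);
    if (out_len < needed)
        return 0; /* a driver would fail with STATUS_BUFFER_TOO_SMALL */
    memcpy(out_buf, table, needed);
    return needed;
}
```

Under this assumed layout the whole snapshot is a few kilobytes, which, copied a couple of times per minute, is negligible next to the cost of getting shared-memory synchronization and paging right.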


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> many years of using MS C/C++ I have never encountered an optimization error
> that produced erroneous code

I saw this once in 1997 in MSVC 5.

This was a bad release. MSVC 6 was much better.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

Back when VC still had the -Ow switch (cross function aliasing), it was
possible to break things.
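An illustration of the kind of code such a switch could break, as I understand the old option: /Ow let the compiler assume that pointers do not alias within a function body (only across calls), so it could load *b once and reuse it. Standard C must assume that a and b may point to the same object, and the result below depends on that. The function name add_twice is hypothetical.

```c
#include <assert.h>

/* Adds *b to *a twice. Correct C must re-read *b after the first store,
 * because a and b may point at the same int. A compiler told to assume
 * no aliasing inside the function could keep *b in a register instead. */
static void add_twice(int *a, const int *b)
{
    *a += *b;
    *a += *b;
}
```

With a and b aliased to an int holding 1, standard C yields 4 (1 to 2 to 4), whereas reusing a cached *b would yield 3.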

mm
-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Maxim S. Shatskih
Sent: Sunday, January 30, 2011 4:38 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] Testing with verifier causes system crash
