Getting a Fatal System Error BSOD, but does not crash with Verifier on

I am trying to track down what exactly is causing a driver crash I am getting. I most likely has something to do with the WdfMemoryCreate call. Running my application, I can cause the crash to happen in seconds. If I turn on the verifier on, it runs without crashing (it ran for over 12 hours). I am not sure how to go about finding what in my driver is causing this crash.

Here is the crash info:

*** Fatal System Error: 0x0000001a
(0x00041287,0x00000080,0x00000000,0x00000000)

Break instruction exception - code 80000003 (first chance)

2: kd> !analyze -v
Connected to Windows 7 7601 x86 compatible target at (Wed Feb 25 07:59:37.179 2015 (UTC - 5:00)), ptr64 FALSE
Loading Kernel Symbols

Press ctrl-c (cdb, kd, ntsd) or ctrl-break (windbg) to abort symbol loads that take too long.
Run !sym noisy before .reload to track down problems loading symbols.




Loading User Symbols

Press ctrl-c (cdb, kd, ntsd) or ctrl-break (windbg) to abort symbol loads that take too long.
Run !sym noisy before .reload to track down problems loading symbols.



Loading unloaded module list

*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

MEMORY_MANAGEMENT (1a)

Any other values for parameter 1 must be individually examined.

Arguments:
Arg1: 00041287, An illegal page fault occurred while holding working set synchronization.
Parameter 2 contains the referenced virtual address.
Arg2: 00000080
Arg3: 00000000
Arg4: 00000000

Debugging Details:

BUGCHECK_STR: 0x1a_41287

DEFAULT_BUCKET_ID: WIN7_DRIVER_FAULT

PROCESS_NAME: svchost.exe

CURRENT_IRQL: 2

ANALYSIS_VERSION: 6.3.9600.17298 (debuggers(dbg).141024-1500) x86fre

TRAP_FRAME: 8cdc378c – (.trap 0xffffffff8cdc378c)
ErrCode = 00000000
eax=00000080 ebx=00001000 ecx=00020007 edx=00020008 esi=00000000 edi=00000001
eip=82651e1d esp=8cdc3800 ebp=82745308 iopl=0 nv up ei pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010202
nt!ExpInterlockedPopEntrySListFault:
82651e1d 8b18 mov ebx,dword ptr [eax] ds:0023:00000080=???
Resetting default scope

LAST_CONTROL_TRANSFER: from 826ef083 to 8268b110

STACK_TEXT:
8cdc32d4 826ef083 00000003 215bc59c 00000065 nt!RtlpBreakWithStatusInstruction
8cdc3324 826efb81 00000003 84c177f8 86542030 nt!KiBugCheckDebugBreak+0x1c
8cdc36e8 8269e41b 0000001a 00041287 00000080 nt!KeBugCheck2+0x68b
8cdc3774 826513d8 00000000 00000080 00000000 nt!MmAccessFault+0x106
8cdc3774 82651e1d 00000000 00000080 00000000 nt!KiTrap0E+0xdc
8cdc3808 827304df 00000000 82747940 00001000 nt!ExpInterlockedPopEntrySListFault
8cdc3868 826ab674 00000000 00001000 00000ff0 nt!MiAllocatePoolPages+0x77
8cdc38c0 82731132 00000000 00000000 00001000 nt!ExpAllocateBigPool+0xa6
8cdc3924 826adae4 00000000 00001000 63416d4d nt!ExAllocatePoolWithTag+0x12d
8cdc3948 82695b87 c03a54b8 83e7a1c0 c0803a9c nt!MiAllocateAccessLog+0x9b
8cdc3994 826957ff c03a54b8 86542220 8cdc3b74 nt!MiLogPageAccess+0x40
8cdc3b10 8269094b 86542220 00000001 00000001 nt!MiAgeWorkingSet+0x251
8cdc3b5c 82690610 00000002 00000001 00000000 nt!MiProcessWorkingSets+0x25b
8cdc3b88 82688f2f 00000000 84c177f8 00000000 nt!MmWorkingSetManager+0xa4
8cdc3c50 82819f5e 00000000 215bca28 00000000 nt!KeBalanceSetManager+0x1af
8cdc3c90 826c1219 82688d80 00000000 00000000 nt!PspSystemThreadStartup+0x9e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x19

STACK_COMMAND: kb

FOLLOWUP_IP:
nt!KiTrap0E+dc
826513d8 85c0 test eax,eax

SYMBOL_STACK_INDEX: 4

SYMBOL_NAME: nt!KiTrap0E+dc

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: nt

IMAGE_NAME: ntkrpamp.exe

DEBUG_FLR_IMAGE_TIMESTAMP: 4ce78a09

IMAGE_VERSION: 6.1.7601.17514

FAILURE_BUCKET_ID: 0x1a_41287_nt!KiTrap0E+dc

BUCKET_ID: 0x1a_41287_nt!KiTrap0E+dc

ANALYSIS_SOURCE: KM

FAILURE_ID_HASH_STRING: km:0x1a_41287_nt!kitrap0e+dc

FAILURE_ID_HASH: {227501d2-d717-b436-e1dc-63914ef7b2fa}

Followup: MachineOwner

Cool.

WHICH Verifier are you enabling that causes the problem to stop happening? Windows Driver Verifier or KMDF Verifier?

What is your driver doing at the time of the crash?

Peter
OSR
@OSRDrivers

I enabled the Windows Driver Verifier and turned on Special Pool and Pool Tracking only. I also tried using only one or the other, but both have the same affect (no crash).

The driver is continuously communicating to a USB device and also receiving asynchronous data from the device.

I also tried enabling KMDF Verifier. The driver still crashes and reports the same stack info. I did make sure that it is enabled. After the crash I executed the following:

2: kd> !wdfkd.wdfdriverinfo mydriver 0x01

Default driver image name: mydriver
WDF library image name: Wdf01000
FxDriverGlobals 0x86543840
WdfBindInfo 0x946bd4e0
Version v1.5
Library module 0x84c29e48
ServiceName \Registry\Machine\System\CurrentControlSet\Services\Wdf01000
ImageName Wdf01000

Count of WDF Objects

WDFDRIVER = 1
WDFDEVICE = 1
WDFQUEUE = 10
WDFREQUEST = 4
WDFFILEOBJECT = 5
WDFWAITLOCK = 1
WDFWORKITEM = 4
WDFCHILDLIST = 1
WDFCMRESLIST = 2
WDFIOTARGET = 1
WDFUSBDEVICE = 1
WDFUSBPIPE = 4
WDFUSBINTERFACE = 1

WDF Verifier settings for mydriver.sys is OFF

WDF Verifier is *not* enabled according to this output.

I’m sorry to say that this looks like a pool corruption problem to me. Did you enable Windows Driver Verifier for your driver AND ALSO for the Framework (WDF0001.sys)?

Peter
OSR
@OSRDrivers

When you enable verifier special pool, allocations happen at the end of a
page with the next page invalid, to detect overwrites at the end. Since
for you, when you enable this verifier option things get better, it might
mean you are getting an overwrite before the beginning of block, which
will generally be valid with special pool. There is a verifier option you
can set to detect overruns below the beginning of block, which basically
causes all allocations to start on a page boundary with an invalid page
below.

Jan

On 2/25/15, 2:12 PM, “xxxxx@gmail.com
wrote:

>I enabled the Windows Driver Verifier and turned on Special Pool and Pool
>Tracking only. I also tried using only one or the other, but both have
>the same affect (no crash).
>
>The driver is continuously communicating to a USB device and also
>receiving asynchronous data from the device.
>
>—
>NTDEV is sponsored by OSR
>
>Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev
>
>OSR is HIRING!! See http://www.osr.com/careers
>
>For our schedule of WDF, WDM, debugging and other seminars visit:
>http://www.osr.com/seminars
>
>To unsubscribe, visit the List Server section of OSR Online at
>http://www.osronline.com/page.cfm?name=ListServer

I have it enabled now, not sure what happened as the registry entry was gone.


WDF Verifier settings for mydriver.sys is ON
Pool tracking is ON
Handle verification is ON
IO verification is ON
Lock verification is ON
Handle reference tracking is OFF

Also got the output for the Win Driver Verifier:
*******************************************************************************
* This is the string you add to your checkin description
* Driver Verifier: Enabled for Wdf01000.sys on Build 7601 dauz9m6m0utaLKw9FJCpAH
*******************************************************************************
*******************************************************************************
* This is the string you add to your checkin description
* Driver Verifier: Enabled for mydriver.sys on Build 7601 EIcxiSLEPZd5lu2WN6OjTH
*******************************************************************************

<< after running app >>
Page heap: pid 0x908: page heap enabled with flags 0x3.

When I ran with the KDMF Verifier on for my driver and the Win Driver Verifier enabled (custom settings, predefined standard settings, for my driver and wdf01000), it just keeps running (I ran it for 10 minutes). Without it enabled, it would crash after a few seconds. I also tried just enabling for wdf01000 and no crash.

Hmmm. That IS puzzling. Sorry, Mr. Porter, I know that’s not helpful.


MEMORY_MANAGEMENT (1a)

Any other values for parameter 1 must be individually examined.

Arguments:
Arg1: 00041287, An illegal page fault occurred while holding working set
synchronization.
Parameter 2 contains the referenced virtual address.
Arg2: 00000080
Arg3: 00000000
Arg4: 00000000

So, Arg 2 contains the referenced address, which is 0x080. And this is probably an offset into a structure, so there’s a NULL pointer somewhere. Do you RtlZeroMemory/memset a block of memory that you allocate from pool anywhere in your driver? Are you confident about the pointer and length? You might review that code.

Well, lacking a more constructive solution, I vote for Mr. Bottorff’s suggestion to change Pool Tracking to “Verify Start” to see if THAT finds anything. You can do this by running GFLAGS.EXE on the system you’re testing on.

I wish I could be more helpful, because this is a whole lot more interesting than working on my article for the next issue of The NT Insider :wink:

Peter
OSR
@OSRDrivers

Usually crashes like this are caused by write-after-free bugs.

In this case MiAllocatePoolPages is trying to obtain a page from a cache of free pages which are kept as an SLIST. One of the links in the SLIST chain has probably been corrupted by somebody writing to a page after it has been freed.

Special pool is probably not going to help here because it only works for allocations that are smaller than a page. These small allocations normally have a pool header preceding them, so they are not page-aligned. Which in turn means they are unlikely to corrupt the SLIST link which (on win7) is located at the start of the page.

I would disassemble things backwards from the faulting instruction to find out where the 0x80 value came from. It should be at the beginning of some freed pool page. I would then dump the rest of that page and try to figure out what sort of allocation it used to be.

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@osr.com
Sent: Wednesday, February 25, 2015 11:09 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] Getting a Fatal System Error BSOD, but does not crash with Verifier on

Hmmm. That IS puzzling. Sorry, Mr. Porter, I know that’s not helpful.


MEMORY_MANAGEMENT (1a)

Any other values for parameter 1 must be individually examined.

Arguments:
Arg1: 00041287, An illegal page fault occurred while holding working set synchronization.
Parameter 2 contains the referenced virtual address.
Arg2: 00000080
Arg3: 00000000
Arg4: 00000000

So, Arg 2 contains the referenced address, which is 0x080. And this is probably an offset into a structure, so there’s a NULL pointer somewhere. Do you RtlZeroMemory/memset a block of memory that you allocate from pool anywhere in your driver? Are you confident about the pointer and length? You might review that code.

Well, lacking a more constructive solution, I vote for Mr. Bottorff’s suggestion to change Pool Tracking to “Verify Start” to see if THAT finds anything. You can do this by running GFLAGS.EXE on the system you’re testing on.

I wish I could be more helpful, because this is a whole lot more interesting than working on my article for the next issue of The NT Insider :wink:

Peter
OSR
@OSRDrivers


NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

OK… I found the buffer that is being used. I verified that this was the buffer, by writing a value of 0x1234 into it and saw that this was the value instead of 0x80.

I am still trying to figure out where in the code things are going wrong. I instrumented the code to see when the following calls are being made:

WdfMemoryCreate, WdfMemoryGetBuffer, WdfObjectDelete, and similar calls

I see that the sequence of create, get, use, delete happened just before the crash. It does look like a write-after-free error. The part of the code just before the write I put in has to do with doing a USB read. It does the following calls:

* zero out the data buffer in the WDFMEMORY
* calls WdfMemoryCreatePreallocated on the buffer to get a new WDFMEMORY
* calls WdfUsbTargetPipeFormatRequestForWrite with the new WDFMEMORY
* calls WdfUsbTargetPipeFormatRequestForRead with the new WDFMEMORY
* calls WdfObjectDelete on the new WDFMEMORY
* return from function and write to buffer with 0x1234, then crash after 10 seconds of executing this sequence over and over

xxxxx@gmail.com wrote:

OK… I found the buffer that is being used. I verified that this was the buffer, by writing a value of 0x1234 into it and saw that this was the value instead of 0x80.

I see that the sequence of create, get, use, delete happened just before the crash. It does look like a write-after-free error. The part of the code just before the write I put in has to do with doing a USB read. It does the following calls:

* zero out the data buffer in the WDFMEMORY
* calls WdfMemoryCreatePreallocated on the buffer to get a new WDFMEMORY
* calls WdfUsbTargetPipeFormatRequestForWrite with the new WDFMEMORY
* calls WdfUsbTargetPipeFormatRequestForRead with the new WDFMEMORY
* calls WdfObjectDelete on the new WDFMEMORY
* return from function and write to buffer with 0x1234, then crash after 10 seconds of executing this sequence over and over

Why use WdfMemoryCreatePreallocated? Where does the memory itself come
from? One of the problems with WdfMemoryCreatePreallocated is that the
framework doesn’t own the memory. If you call WdfMemoryCreate, then you
can be sure the memory will not go away until the object is deleted.
With …Preallocated, there’s no telling. If, for example, you called
WdfMemoryCreatePreallocated with a buffer on the stack, disaster would
result when the function returned.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

On WHICH buffer?

If this is a buffer you’re receiving from a Request, you want to call WdfRequestRetrieveInputMemory or WdfRequestRetrieveOutputMemory.

Peter
OSR
@OSRDrivers

This is how the WdfMemoryCreatePreallocated gets setup.

WDFMEMORY hWdfMemory;
pUsbHeader = (USB_HEADER*)WdfMemoryGetBuffer( rdwrReqMemory, NULL);
WdfMemoryCreatePreallocated( WDF_NO_OBJECT_ATTRIBUTES, pRdData, outlen, &hWdfMemory);

As for why is it being used… this code was written years ago by several people. I will probably have to change it to only use the necessary functions. Before that I need to understand what all this code is trying to do.

xxxxx@gmail.com wrote:

This is how the WdfMemoryCreatePreallocated gets setup.

WDFMEMORY hWdfMemory;
pUsbHeader = (USB_HEADER*)WdfMemoryGetBuffer( rdwrReqMemory, NULL);
WdfMemoryCreatePreallocated( WDF_NO_OBJECT_ATTRIBUTES, pRdData, outlen, &hWdfMemory);

As for why is it being used… this code was written years ago by several people. I will probably have to change it to only use the necessary functions. Before that I need to understand what all this code is trying to do.

I can’t imagine what it is trying to do. Where did pRdData come from?
That’s the key point. The implication here is that the memory pointerd
to by pRdData is being released before the request is completed.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.