GUIDANCE: Why Is My Driver Crashing??

We seem to get a large number of emails to the list that ask, in essence,
“why is my driver crashing”. Though I’d like to help (well, usually), I
often find myself frustrated at the lack of information provided by the
author. To assist similar questioners in the future, I thought I’d put
together some guidance on the kinds of information that I find helpful in
diagnosing the reason for a crash.

Crash analysis is non-trivial. But with the right information, it’s usually
possible to at least determine why the system crashed, if not identify the
actual cause of the crash. Understand that these are different: The system
may crash due to dereferencing a bad pointer. That’s WHY it crashed. The
pointer may be bad because some driver that was unloaded long ago
trashed a data structure in non-paged pool. That’s the CAUSE of the crash.

To help in the analysis, please provide as much of the following information
as possible (note: This list is in approximate order of importance):

  1. Please supply at least a symbolic stack trace, WITH THE CORRECT SYMBOLS
    set up for your driver, the kernel, and the HAL. This is of primary
    importance. If possible, provide the stack trace (and all the other
    details) from !analyze -v in WinDbg. Post the output in the command window
    starting with and including the line on which you typed !analyze -v, all the
    way through the end of the output. If you can’t be bothered getting and
    posting a proper stack trace, I suggest you not expect the list members to
    be bothered with diagnosing your crash.

  2. Please look at the parts of the stack trace that appear to be within your
    driver, and tell us what your driver is doing at each location. If
    possible, supply a few lines of code preceding each entry made for your
    driver on the stack. This should be as simple as double-clicking on your
    driver’s entries in the crash stack in WinDbg.

  3. If this is your driver, crashing in your lab or on your test machine,
    reproduce the problem with Driver Verifier running (with all options turned
    on EXCEPT for low resource simulation), and using the checked kernel and
    HAL. Don’t waste your time (or ours) trying to repro the problem on a
    system running the free builds or without Verifier, UNLESS the problem ONLY
    occurs on the free build or with Verifier off (and if this is the case, be
    SURE to tell us).

  4. Describe the scenario under which the crash is obtained, in as much
    detail as you can manage. Bear in mind that the several thousand people who
    read this list are not each intimately familiar with your project. So
    please be clear.

  5. If you have a suspicion about the cause of the problem, by all means let
    us know.

  6. Monitor the list closely for follow-up questions. Even with all the
    information above, you should expect there to be one or two more questions
    about the problem.

Given the above, the people on this list can probably help identify just
about ANY crash problem you come across.

Peter
OSR

This is all the information from my WinDbg:

kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pagable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 00000008, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: fc699ee8, address which referenced memory

Debugging Details:

READ_ADDRESS: 00000008

CURRENT_IRQL: 2

FAULTING_IP:
ndishk!TCPIP_ReceiveHandler+1d7
fc699ee8 8b4240 mov eax,[edx+0x40]

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0xD1

LAST_CONTROL_TRANSFER: from 8042bcb9 to 80452e70

STACK_TEXT:
fc81fa48 8042bcb9 00000003 fc81fa90 00000008 nt!RtlpBreakWithStatusInstruction
fc81fa78 8042c068 00000003 00000008 fc699ee8 nt!KiBugCheckDebugBreak+0x31
fc81fe00 80464b1f 00000000 00000008 00000002 nt!KeBugCheckEx+0x37b
fc81fe00 fc699ee8 00000000 00000008 00000002 nt!KiTrap0E+0x27c
fc81ff0c fef02008 ff0222c8 ff074fa8 ffb95000 ndishk!TCPIP_ReceiveHandler+0x1d7 [C:\NDISPIM\BASE\NTPIMEB\mstcpip.c @ 856]
fc81ff78 fc720bc4 ff092101 ff0778c4 00000001 NDIS!ethFilterDprIndicateReceivePacket+0x312
fc81ffc4 feeec28f ff076237 0b7f1320 00000000 pcntn5m!LanceHandleInterrupt+0x41c
fc81ffe0 80460bd4 ff0773d0 ff0773bc 00000000 NDIS!ndisMDpc+0xc8
fc81fff4 80403a82 fc8b7608 00000000 00000000 nt!KiRetireDpcList+0x30

FOLLOWUP_IP:
ndishk!TCPIP_ReceiveHandler+1d7
fc699ee8 8b4240 mov eax,[edx+0x40]

FOLLOWUP_NAME: MachineOwner

SYMBOL_NAME: ndishk!TCPIP_ReceiveHandler+1d7

MODULE_NAME: ndishk

IMAGE_NAME: ndishk.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 3d77a882

STACK_COMMAND: kb

BUCKET_ID: 0xD1_ndishk!TCPIP_ReceiveHandler+1d7

Followup: MachineOwner


The part of code I ported was:



    POPEN_INSTANCE      open;
    PIO_STACK_LOCATION  irpSp;
    PIRP                irp;
    PLIST_ENTRY         packetListEntry;
    PNDIS_PACKET        pPacket;
    ULONG               sizeToTransfer;
    NDIS_STATUS         status;
    UINT                bytesTransfered = 0;
    ULONG               bufferLength;
    PPACKET_RESERVED    reserved;
    PMDL                pMdl;

    DebugPrint(("ReceiveIndicate\n"));

    open = (POPEN_INSTANCE)ProtocolBindingContext;

    if (HeaderBufferSize > ETHERNET_HEADER_LENGTH) {

        return NDIS_STATUS_SUCCESS;
    }

    //
    // See if there are any pending reads that we can satisfy
    //
    packetListEntry = ExInterlockedRemoveHeadList(
                          &open->RcvList,
                          &open->RcvQSpinLock
                          );

    if (packetListEntry == NULL) {
        DebugPrint(("No pending read, dropping packets\n"));
        return NDIS_STATUS_NOT_ACCEPTED;
    }

    reserved = CONTAINING_RECORD(packetListEntry, PACKET_RESERVED, ListElement);
    pPacket = CONTAINING_RECORD(reserved, NDIS_PACKET, ProtocolReserved);

    irp = RESERVED(pPacket)->Irp;
    irpSp = IoGetCurrentIrpStackLocation(irp);

    //
    // We don't have to worry about the situation where the IRP is cancelled
    // after we remove it from the queue and before we reset the cancel
    // routine because the cancel routine has been coded to cancel an IRP
    // only if it's in the queue.
    //

    IoSetCancelRoutine(irp, NULL);

    //
    // This is the length of our partial MDL
    //
    bufferLength = irpSp->Parameters.Read.Length - ETHERNET_HEADER_LENGTH;

    //
    // Find out how much to transfer
    //
    sizeToTransfer = (PacketSize < bufferLength) ?
                         PacketSize : bufferLength;

    //
    // Copy the ethernet header into the actual read buffer
    //
    NdisMoveMappedMemory(
        MmGetSystemAddressForMdlSafe(irp->MdlAddress, NormalPagePriority),
        HeaderBuffer,
        HeaderBufferSize
        );

    //
    // Allocate an MDL to map the portion of the buffer following the
    // header
    //
    pMdl = IoAllocateMdl(
               MmGetMdlVirtualAddress(irp->MdlAddress),
               MmGetMdlByteCount(irp->MdlAddress),
               FALSE,
               FALSE,
               NULL
               );

    if (pMdl == NULL) {
        DebugPrint(("Packet: Read-Failed to allocate Mdl\n"));
        status = NDIS_STATUS_RESOURCES;
        goto ERROR;
    }

    //
    // Build the MDL to point to the portion of the buffer following
    // the header
    //
    IoBuildPartialMdl(
        irp->MdlAddress,
        pMdl,
        ((PUCHAR)MmGetMdlVirtualAddress(irp->MdlAddress)) + ETHERNET_HEADER_LENGTH,
        0
        );

    //
    // Clear the next link in the new MDL
    //

    pMdl->Next = NULL;

    RESERVED(pPacket)->pMdl = pMdl;

    //
    // Attach our partial MDL to the packet
    //

    NdisChainBufferAtFront(pPacket, pMdl);


This code was ported with only minor changes. Does it give you any hints as to
why it crashed?

Thanks a lot!

Yuanhui

----- Original Message -----
From: “Peter Viscarola”
Newsgroups: ntdev
To: “NT Developers Interest List”
Sent: Thursday, September 05, 2002 2:24 PM
Subject: [ntdev] GUIDANCE: Why Is My Driver Crashing??


Looks like a null pointer problem. You tried to touch virtual address
00000008. (Probably pointer + 0x8)

-----Original Message-----
From: Yuanhui Zhao [mailto:xxxxx@nexland.com]
Sent: Thursday, September 05, 2002 1:24 PM
To: NT Developers Interest List
Subject: [ntdev] Re: GUIDANCE: Why Is My Driver Crashing??


