How do I locate the code that caused this bugcheck?

Hi,

I have been trying for days to find out where my NDIS-WDM driver
is corrupting memory. But each crash has different characteristics
and I can’t pinpoint the root cause of the problem. Sometimes
I get a 0x0A bugcheck, sometimes a 0xD1 bugcheck and at other
times a 0xc5 error.

I am running the checked versions of ntoskrnl.exe, halacpi.dll
and ndis.sys for WinXP SP1 and using the driver verifier. The latest
bugcheck shows some details that I didn’t see before (the ‘MM:’
lines). But it still does not indicate where the problem occurred.
I have a lot of ‘dbgprintf()’ in the code and the last line printed was
“Exiting MiniportSendPackets…”, the last line of the
‘SendPacketsHandler’ function.

How do I use the information below to locate where the problem
occurred?

Thanks,

Harshal

MM:***PAG FAULT AT IRQL > 1 Va CD34CD1A, IRQL 2
MM:***EIP 80CBEF24, EFL 00010206
MM:***EAX CD34CD1A, ECX 8279CF68 EDX CD34CD32
MM:***EBX 82181E01, ESI 82181C10 EDI 80AF0D24

*** Fatal System Error: 0x0000000a
(0xCD34CD1A,0x00000002,0x00000000,0x80CBEF24)

kd> !analyze -v

Arg1: cd34cd1a, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: 80cbef24, address which referenced memory

Debugging Details:

READ_ADDRESS: cd34cd1a Nonpaged pool

CURRENT_IRQL: 2

FAULTING_IP:
nt!ViIrpDatabaseFindPointer+20
80cbef24 3908 cmp [eax],ecx

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0xA

LAST_CONTROL_TRANSFER: from 80cbf6f5 to 80cbef24

TRAP_FRAME: f7ae5b10 -- (.trap fffffffff7ae5b10)
ErrCode = 00000000
eax=cd34cd1a ebx=82181e01 ecx=8279cf68 edx=cd34cd32 esi=82181c10 edi=80af0d24
eip=80cbef24 esp=f7ae5b84 ebp=f7ae5ba0 iopl=0 nv up ei pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010206
nt!ViIrpDatabaseFindPointer+0x20:
80cbef24 3908 cmp [eax],ecx
Resetting default scope

STACK_TEXT:
f7ae5b84 80cbf6f5 81e43638 8279cf68 82181c10 nt!ViIrpDatabaseFindPointer+0x20
f7ae5ba0 80cb736c 81e43638 8279cf68 8279cf68 nt!VfIrpDatabaseEntryFindAndLock+0x3f
f7ae5bb8 80cae3b2 82181e01 f7ae5bdb f7ae5be8 nt!VerifierIoInitializeIrp+0xe
f7ae5bc8 80a21383 8279cf68 00000094 82181e01 nt!IovInitializeIrp+0x1e
f7ae5be8 80cb74cd 8279cf68 00000094 82181e01 nt!IoInitializeIrp+0x1f
f7ae5c18 80cb7572 f7ae5c3c 82039728 00000001 nt!ViIrpAllocateLockedPacket+0x73
f7ae5c34 80cae553 00000000 81e43638 82039728 nt!VerifierIoAllocateIrp1+0x3c
f7ae5c6c 80b18df0 00000001 00000001 f7ae5d58 nt!IovAllocateIrp+0x1d
f7ae5cf4 80b0ec70 000002cc 00000000 00000000 nt!IopXxxControlFile+0x3e4
f7ae5d28 80ac2efc 000002cc 00000000 00000000 nt!NtDeviceIoControlFile+0x28
f7ae5d28 7ffe0304 000002cc 00000000 00000000 nt!KiSystemService+0x13b
009bf534 77f5b864 77e7565b 000002cc 00000000 SharedUserData!SystemCallStub+0x4
009bf538 77e7565b 000002cc 00000000 00000000 ntdll!ZwDeviceIoControlFile+0xc
009bf598 76d61c26 000002cc 00120003 009bf7e8 kernel32!DeviceIoControl+0xdd
009bf5f4 76d6246e 00000006 00000000 009bf7e8 iphlpapi!WsControl+0xf3
009bf860 76d61c5b 009bfb18 0015ffd0 00160018 iphlpapi!GetAdapterList+0x443
009bf894 76d61fad 00000000 0015ffd0 00160018 iphlpapi!GetAdapterInfo+0x1f
009bf8e8 75d38759 00000000 009bfb18 00000000 iphlpapi!GetAdapterInfoEx+0x1c
009bfb10 75d3c196 00000000 015a8358 009bfc10 NETSHELL!HrGetAutoNetSetting+0x35
009bfbd0 75d3a044 009bfbec 00000001 000d3e30 NETSHELL!CLanStatEngine::HrUpdateData+0x15d
009bfbf4 75d36454 0015ffe0 009bfc10 009bfca0 NETSHELL!CNetStatisticsEngine::UpdateStatistics+0x2b
009bfc18 75d37393 00063c83 75d3735f 0017a060 NETSHELL!CNetStatisticsCentral::RefreshStatistics+0x4c
009bfc2c 77d43a50 00000000 00000113 00007fd2 NETSHELL!CNetStatisticsCentral::TimerCallback+0x34
009bfc58 77d442c5 75d3735f 00000000 00000113 USER32!InternalCallWinProc+0x1b
009bfcc0 77d43e6f 00000000 75d3735f 00000000 USER32!UserCallWinProc+0xf3
009bfd18 77d43ddf 009bfd6c 00000000 74b015d7 USER32!DispatchMessageWorker+0x10e
009bfd24 74b015d7 009bfd6c 771c301d 74b00000 USER32!DispatchMessageW+0xb
009bfd90 74b02f1b 74b00000 00000000 000200f0 stobject!SysTrayMain+0x175
009bffb4 77e7d28e 00000000 771c301d 00a9f580 stobject!CSysTray::SysTrayThreadProc+0x45
009bffec 00000000 74b02ed6 00000000 00000000 kernel32!BaseThreadStart+0x37

FOLLOWUP_IP:
nt!ViIrpDatabaseFindPointer+20
80cbef24 3908 cmp [eax],ecx

SYMBOL_STACK_INDEX: 0

FOLLOWUP_NAME: MachineOwner

SYMBOL_NAME: nt!ViIrpDatabaseFindPointer+20

MODULE_NAME: nt

IMAGE_NAME: ntoskrnl.exe

DEBUG_FLR_IMAGE_TIMESTAMP: 3d6dd014

STACK_COMMAND: .trap fffffffff7ae5b10 ; kb

FAILURE_BUCKET_ID: 0xA_VRF_nt!ViIrpDatabaseFindPointer+20

BUCKET_ID: 0xA_VRF_nt!ViIrpDatabaseFindPointer+20

Followup: MachineOwner

Since this is an NDIS-WDM driver, I am assuming you are allocating your
own PIRP(s). In the completion routine(s) for the PIRP(s), do you have
code that looks like this:

if (Irp->PendingReturned) {
    IoMarkIrpPending (Irp);
}

?

If so, you are corrupting memory here, b/c the PIRPs you allocate no longer
have a valid current stack location when they complete back to you, and
IoMarkIrpPending modifies the current stack location. The effect is
that you are touching memory just beyond the end of the PIRP.
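A minimal user-mode model may make the off-by-one easier to see. It is a sketch under stated assumptions, not the real IRP: `MODEL_IRP` and the `model_*` functions are hypothetical and track only the 1-based CurrentLocation counter, assuming that IoInitializeIrp/IoAllocateIrp start it one past the last stack location, that IoCallDriver moves it down by one, and that completion moves it back up before the allocating driver's completion routine runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of an IRP's stack bookkeeping.
   This is NOT the real IRP structure; it only tracks the 1-based
   CurrentLocation counter the way the I/O manager does. */
typedef struct {
    int stack_count;       /* number of stack locations, like Irp->StackCount */
    int current_location;  /* 1-based, like Irp->CurrentLocation */
} MODEL_IRP;

/* Like IoInitializeIrp / IoAllocateIrp: start one past the last location. */
static void model_init_irp(MODEL_IRP *irp, int stack_count)
{
    irp->stack_count = stack_count;
    irp->current_location = stack_count + 1;
}

/* Like IoCallDriver: the "next" location becomes the current one.
   The real IoCallDriver bugchecks (NO_MORE_IRP_STACK_LOCATIONS) when no
   location is left; the model just asserts. */
static void model_call_driver(MODEL_IRP *irp)
{
    assert(irp->current_location > 1);
    irp->current_location--;
}

/* Like completion unwinding back to the driver that allocated the IRP:
   the location is advanced before that driver's completion routine runs. */
static void model_complete_to_owner(MODEL_IRP *irp)
{
    irp->current_location++;
}

/* Zero-based index that IoGetCurrentIrpStackLocation would point at. */
static int model_current_index(const MODEL_IRP *irp)
{
    return irp->current_location - 1;
}

/* True when the "current" location lies past the end of the stack array,
   i.e. writing to it (as IoMarkIrpPending does) touches memory beyond the IRP. */
static bool model_current_is_past_end(const MODEL_IRP *irp)
{
    return model_current_index(irp) >= irp->stack_count;
}
```

With one stack location, the index seen in the allocating driver's completion routine is 1, i.e. past the end of a one-element array, which is where IoMarkIrpPending would write.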

d

Doron,

Thanks for the suggestion. I am allocating and freeing the PIRPs but I don’t
have any call to IoMarkIrpPending() in the completion routine.

The bulk of the completion function looks like this:

NTSTATUS USBwinReadWrite_Complete(IN PDEVICE_OBJECT deviceObject,
                                  IN PIRP cirp,
                                  IN PVOID Context)
{
    NTSTATUS ntStatus = OS_STATUS_FAILURE;

    cfContext = (PUSB_ASYNC_RDRW_CONEXT) Context;

    urb = cfContext->urb; //get the urb we alloced for this xfer
    irp = cfContext->irp;
    pAdapter = (PTIWLAN_T)(cfContext->pAdapter);

    usbstatus = ((struct _URB_HEADER *)urb)->Status;

    USBExtension = &pAdapter->USBExtension;

    ASSERT (cirp == irp);
    ASSERT (urb != NULL);

    dprintf (2, "USBwinReadWrite_Complete Status: %x usb status: %x\n",
             irp->IoStatus.Status, usbstatus);

    irp->IoStatus.Information =
        urb->UrbBulkOrInterruptTransfer.TransferBufferLength;

    TransferBuffer = urb->UrbBulkOrInterruptTransfer.TransferBuffer;

    if (irp->IoStatus.Status == STATUS_SUCCESS)
    {
        // Check if Recv
        if (urb->UrbBulkOrInterruptTransfer.TransferFlags & USBD_TRANSFER_DIRECTION_IN)
        {
            USBRxComplete(pAdapter, TransferBuffer,
                          irp->IoStatus.Status == STATUS_SUCCESS ?
                          urb->UrbBulkOrInterruptTransfer.TransferBufferLength : 0);
        }
        else // Transmit
        {
            USBSendComplete(pAdapter, cfContext->TxBuffer);
        }
    }
    else
    {
        dprintf (2, "USBwinReadWrite_Complete FAILURE Status: %x usb status: %x\n",
                 irp->IoStatus.Status, usbstatus);

        pAdapter->bUsbHalted = pAdapter->bUsbError = TRUE;
        dprintf(2, "\n** USBwinReadWrite_Complete: USB ERRORRRRRRRR\n");
    }

    ntStatus = STATUS_MORE_PROCESSING_REQUIRED;

    irp->IoStatus.Status = STATUS_SUCCESS;

    curListPtr = pAdapter->USBExtension.UrbQueueHead;

    if (curListPtr == Context)
        pAdapter->USBExtension.UrbQueueHead = NULL;
    else
        while (curListPtr && curListPtr->Next)
        {
            if (curListPtr->Next == Context)
                curListPtr->Next = ((PUSB_ASYNC_RDRW_CONEXT)Context)->Next ? ((PUSB_ASYNC_RDRW_CONEXT)Context)->Next : NULL;
            curListPtr = curListPtr->Next;
        }

    dprintf(2, "Freeing IRP: %8x\n", irp);

    osFreeMemory(irp, pAdapter->USBExtension.IrpSize);
    osFreeMemory(urb, sizeof(struct _URB_BULK_OR_INTERRUPT_TRANSFER));
    osFreeMemory(cfContext, sizeof(USB_ASYNC_RDRW_CONEXT));

    DecrementIoCount(pAdapter);

    dprintf(2, "Returning from USBwinReadWrite_Complete\n");

    return ntStatus;
}

The ‘pAdapter->bUsbError = TRUE;’ flag is handled in the
MiniportCheckForHang() function.

For all the packets processed before the crash, irp->IoStatus.Status
and usbstatus are both 0.

Thanks again for the suggestion.

Harshal

One quick note: you should use

if (NT_SUCCESS(irp->IoStatus.Status))

instead of

if (irp->IoStatus.Status == STATUS_SUCCESS)
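The difference matters because NT_SUCCESS accepts every non-negative NTSTATUS (success and informational codes), while `== STATUS_SUCCESS` accepts only zero. The sketch below re-declares the needed definitions so it compiles in user mode; the macro mirrors the DDK's NT_SUCCESS, the status values are the standard ones, and the two `*_check` helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* User-mode stand-ins for the DDK definitions. */
typedef int32_t NTSTATUS;
#define NT_SUCCESS(Status) (((NTSTATUS)(Status)) >= 0)

#define STATUS_SUCCESS         ((NTSTATUS)0x00000000L)
#define STATUS_PENDING         ((NTSTATUS)0x00000103L) /* informational: non-negative */
#define STATUS_BUFFER_OVERFLOW ((NTSTATUS)0x80000005L) /* warning: negative */
#define STATUS_UNSUCCESSFUL    ((NTSTATUS)0xC0000001L) /* error: negative */

/* The test used in the completion routine: only exact zero passes. */
static bool exact_success_check(NTSTATUS status)
{
    return status == STATUS_SUCCESS;
}

/* The recommended test: any success or informational code passes. */
static bool nt_success_check(NTSTATUS status)
{
    return NT_SUCCESS(status);
}
```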

d


I find your completion routine a bit confused. Perhaps it is just me who is
confused. Anyhow, you seem to be wasting time setting various bits of state
in an IRP that you unconditionally free at the end of the routine.

For example here:

irp->IoStatus.Information =
urb->UrbBulkOrInterruptTransfer.TransferBufferLength;

And then again here:

irp->IoStatus.Status = STATUS_SUCCESS;

Before you go off and do this:

osFreeMemory(irp, pAdapter->USBExtension.IrpSize);

Which appears to be deallocating the Irp in question. What’s up with that?

Also where exactly did “irp” come from? And what exactly does osFreeMemory
do?

You didn’t by any chance allocate the IRP using IoAllocateIrp, did you? If
not why not, and if not how did you allocate the IRP in question?

=====================
Mark Roddy
Windows .NET/XP/2000 Consulting
Hollis Technology Solutions 603-321-1032
www.hollistech.com

You don’t appear to be doing any locking on your queue management. Is it
possible to have a race condition with queuing a new request at the same
time you are walking the list here?
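A minimal sketch of what locked queue management could look like, assuming a push-at-head list like the one AsyncCallUSBD builds. It is a user-mode stand-in: a C11 `atomic_flag` spin lock plays the role of an NDIS spin lock (NdisAcquireSpinLock/NdisReleaseSpinLock in the driver), and `CONTEXT_NODE`, `CONTEXT_QUEUE` and the `queue_*` helpers are hypothetical. The unlink walks a pointer-to-pointer so that removing the head relinks it to the next entry; the posted completion routine appears to set the head to NULL in that case, which would drop any other entries still queued.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for USB_ASYNC_RDRW_CONEXT: only the link matters here. */
typedef struct CONTEXT_NODE {
    struct CONTEXT_NODE *Next;
    int id;
} CONTEXT_NODE;

/* Queue head plus the lock that must be held for every access to it. */
typedef struct {
    atomic_flag lock;
    CONTEXT_NODE *head;
} CONTEXT_QUEUE;

static void queue_init(CONTEXT_QUEUE *q)
{
    atomic_flag_clear(&q->lock);
    q->head = NULL;
}

static void spin_acquire(atomic_flag *lock)
{
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ; /* spin until the holder releases it */
}

static void spin_release(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Push at the head, as AsyncCallUSBD does, but under the lock. */
static void queue_push(CONTEXT_QUEUE *q, CONTEXT_NODE *node)
{
    spin_acquire(&q->lock);
    node->Next = q->head;
    q->head = node;
    spin_release(&q->lock);
}

/* Unlink one node under the lock; returns 1 if it was found.
   Walking a pointer-to-pointer handles the head and interior cases alike. */
static int queue_remove(CONTEXT_QUEUE *q, CONTEXT_NODE *node)
{
    int found = 0;
    spin_acquire(&q->lock);
    for (CONTEXT_NODE **link = &q->head; *link != NULL; link = &(*link)->Next) {
        if (*link == node) {
            *link = node->Next;
            node->Next = NULL;
            found = 1;
            break;
        }
    }
    spin_release(&q->lock);
    return found;
}
```

Both the submit path and the completion routine would take the same lock; since completion routines can run at DISPATCH_LEVEL, a spin lock rather than a waitable lock is the appropriate choice there.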

Loren


Mark,

You are right about the confusing code. I got the driver code from
someone else and I am modifying it for a different project. I have
changed only the functions that I needed to and left the others
as-is. There is some stuff in there that is questionable (like the
lines you point out) but since it seems harmless, I didn’t mess
with it.

> Also where exactly did “irp” come from? And what exactly does osFreeMemory
> do?

‘irp’ is declared as:

PIRP irp;

in this function and assigned as

irp = cfContext->irp;

‘osFreeMemory()’ is just another name for NdisFreeMemory(). It is defined
as:

#define osFreeMemory(buffer, length) NdisFreeMemory(buffer, length, 0)

> You didn’t by any chance allocate the IRP using IoAllocateIrp, did you? If
> not why not, and if not how did you allocate the IRP in question?

The function that passes the IRP to the USB drivers with IoCallDriver()
declares the irp as a local pointer. It does not use IoAllocateIrp; instead
it uses osAllocateMemory() (which calls NdisAllocateMemoryWithTag() for
the actual allocation) and then IoInitializeIrp().

The function that passes the IRP to the USB sub-system is:

static OS_STATUS AsyncCallUSBD(IN PTIWLAN_T pAdapter,
                               IN PURB Urb,
                               tUsbTxBuffer *TxBuffer,
                               PIO_COMPLETION_ROUTINE CompletionRoutine)
{
    OS_STATUS Status;
    PIRP irp;
    IO_STATUS_BLOCK ioStatus;
    PIO_STACK_LOCATION nextStack;
    PUSB_ASYNC_RDRW_CONEXT pCfContext;

    irp = osAllocateMemory(pAdapter->USBExtension.IrpSize);
    if (!irp)
    {
        dprintf(1, "Failed to allocate memory for irp\n");
        return OS_STATUS_INSUFFICIENT_RESOURCES;
    }

    pCfContext = osAllocateMemory(sizeof(USB_ASYNC_RDRW_CONEXT));
    if (!pCfContext)
    {
        dprintf(1, "Failed to allocate memory for context\n");
        return OS_STATUS_INSUFFICIENT_RESOURCES;
    }

    IoInitializeIrp(irp, (USHORT)pAdapter->USBExtension.IrpSize,
                    (CCHAR)pAdapter->USBExtension.usbStackSize);

    pCfContext->irp = irp;
    pCfContext->urb = Urb;
    pCfContext->TxBuffer = TxBuffer;
    pCfContext->pAdapter = (struct TIWLAN_T *)pAdapter;
    pCfContext->Next = pAdapter->USBExtension.UrbQueueHead;
    pAdapter->USBExtension.UrbQueueHead = pCfContext;

    nextStack = IoGetNextIrpStackLocation(irp);
    ASSERT(nextStack != NULL);
    dprintf(4, "AsyncCallUSBD: Urb: %x\n", Urb);
    nextStack->Parameters.Others.Argument1 = Urb;

    nextStack->MajorFunction = IRP_MJ_INTERNAL_DEVICE_CONTROL;
    nextStack->Parameters.DeviceIoControl.IoControlCode = IOCTL_INTERNAL_USB_SUBMIT_URB;
    IoSetCompletionRoutine(irp, USBwinReadWrite_Complete, pCfContext, TRUE, TRUE, TRUE);

    Status = IoCallDriver(pAdapter->USBExtension.TopOfStackDeviceObject, irp);

    if (Status == OS_STATUS_PENDING)
    {
        IncrementIoCount(pAdapter);
    }
    return Status;
}
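Because this IRP comes from osAllocateMemory() rather than IoAllocateIrp, IrpSize must be at least what IoSizeOfIrp(usbStackSize) returns, or IoInitializeIrp and the lower drivers will write past the allocation. The sketch below reproduces the arithmetic of the DDK's IoSizeOfIrp macro in user mode. The 32-bit x86 sizes (0x70 for IRP, 0x24 for IO_STACK_LOCATION) are assumptions for illustration, though they agree with the verifier trace in the original post, where IoInitializeIrp was called with a size of 0x94 for one stack location. `irp_size_is_sufficient` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed 32-bit x86 sizes; in a real build use sizeof(IRP) and
   sizeof(IO_STACK_LOCATION) from the DDK headers instead. */
#define MODEL_SIZEOF_IRP          0x70u
#define MODEL_SIZEOF_IO_STACK_LOC 0x24u

/* Same arithmetic as IoSizeOfIrp(StackSize):
   the IRP header followed by StackSize stack locations. */
static size_t model_io_size_of_irp(unsigned stack_size)
{
    return MODEL_SIZEOF_IRP + (size_t)stack_size * MODEL_SIZEOF_IO_STACK_LOC;
}

/* Hypothetical sanity check for a driver-allocated IRP buffer: it must hold
   the header plus every stack location, and fit in IoInitializeIrp's USHORT
   PacketSize parameter (the code above casts IrpSize to USHORT). */
static bool irp_size_is_sufficient(size_t irp_size, unsigned stack_size)
{
    return irp_size >= model_io_size_of_irp(stack_size) && irp_size <= 0xFFFFu;
}
```

In the driver, the equivalent check would compare IrpSize against IoSizeOfIrp(usbStackSize), with usbStackSize typically taken from the StackSize of the device object the IRP is sent to.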

Thanks for your help,

Harshal

Loren,

Could be. I have now added spinlocks around every access to the
packet queues. For example:

osAcquireSpinLock( &pDc->TxPacketQLock );
pHdr = (PTIWLAN_PACKET_HEADER) QueuePopHead(&pDc->PendingTxPacketQ);
osReleaseSpinLock( &pDc->TxPacketQLock );

I now get failed assertions like the one below (I am running a checked
build of ntoskrnl.exe and halacpi.dll):

*** Assertion failed: NextEntry->Owner == Entry
*** Source File: d:\xpsp1\base\ntos\mm\allocpag.c, line 2664
Break repeatedly, break Once, Ignore, terminate Process, or terminate Thread (boipt)? o
o
Execute ‘.cxr B2388894’ to dump context

kd> .cxr b2388894

[many lines of ‘loading symbols…’ skipped]

Couldn’t resolve error at ‘b2388894’

A second run ended with this failed assertion:

*** Assertion failed: FreePageInfo->Signature == MM_FREE_POOL_SIGNATURE
*** Source File: d:\xpsp1\base\ntos\mm\allocpag.c, line 1267

I guess the memory corruption still exists, but I am at a loss as to what is
causing it.

Another run ended with a 0x19 (Bad_pool_header) bugcheck with the message
that ‘the pool freelist is corrupt.’ Again the verifier had no stats on
the memory usage:

kd> !verifier

Verify Level bb … enabled options are:
special pool
special irql
all pool allocations checked on unload
Io subsystem checking enabled
Deadlock detection enabled
DMA checking enabled

Summary of All Verifier Statistics

RaiseIrqls 0x0
AcquireSpinLocks 0x0
Synch Executions 0x0
Trims 0x0

Pool Allocations Attempted 0x0
Pool Allocations Succeeded 0x0
Pool Allocations Succeeded SpecialPool 0x0
Pool Allocations With NO TAG 0x0
Pool Allocations Failed 0x0
Resource Allocations Failed Deliberately 0x0

Current paged pool allocations 0x0 for 00000000 bytes
Peak paged pool allocations 0x0 for 00000000 bytes
Current nonpaged pool allocations 0x0 for 00000000 bytes
Peak nonpaged pool allocations 0x0 for 00000000 bytes

The bugcheck details also said:
The internal pool links must be walked to figure out a possible cause of
the problem, and then special pool applied to the suspect tags or the driver
verifier to a suspect driver.

I am already doing the latter and the ‘!pool’ command failed when I
used it with the value of the pool entry in the bugcheck.

At other times I get the 0xD1 bugcheck, whose arguments include the
address that referenced a pageable or invalid memory location. But
the address is typically in a system function somewhere and the
stack trace has no mention of my driver.

I am running out of things to do to find and fix the problem.

Thanks,

Harshal

> *** Assertion failed: NextEntry->Owner == Entry
> *** Source File: d:\xpsp1\base\ntos\mm\allocpag.c, line 2664

If you missed some place where the queuing happens, that could still be
the cause.
However, considering this failed in MM with broken links, I think there are
three better possibilities here:

1. You have a small buffer that is allocated someplace, and you are
overrunning the end of the buffer, possibly with IO data, or possibly by
walking off the end of the buffer.
2. You are doing something like returning the same buffer twice.
3. You are returning a buffer that is still in use by something or other.
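For the first possibility, one cheap check is a trailing guard pattern: allocate a few extra bytes, fill them with a known value, and verify them before freeing. This is a hand-rolled, user-mode sketch of the idea behind special pool (which instead faults at the overrunning instruction); `guarded_alloc`, `guard_intact`, `GUARD_SIZE` and `GUARD_BYTE` are hypothetical names, and in a driver the allocation would go through osAllocateMemory()/NdisAllocateMemoryWithTag() rather than malloc().

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 16    /* hypothetical: extra bytes after each block */
#define GUARD_BYTE 0xAB  /* hypothetical: recognizable guard value */

/* Allocate size usable bytes followed by a guard region. */
static unsigned char *guarded_alloc(size_t size)
{
    unsigned char *p = malloc(size + GUARD_SIZE);
    if (p != NULL)
        memset(p + size, GUARD_BYTE, GUARD_SIZE);
    return p;
}

/* Returns false if anything wrote past the first size bytes. */
static bool guard_intact(const unsigned char *p, size_t size)
{
    for (size_t i = 0; i < GUARD_SIZE; i++)
        if (p[size + i] != GUARD_BYTE)
            return false;
    return true;
}
```

Calling guard_intact() just before each free (and asserting on failure) points at the allocation that was overrun, which is usually much closer to the bug than the later pool corruption.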

> *** Assertion failed: FreePageInfo->Signature == MM_FREE_POOL_SIGNATURE
> *** Source File: d:\xpsp1\base\ntos\mm\allocpag.c, line 1267

We’re still in allocpag.c. I don’t have source here to see which routine we
are in, but the stack trace probably would have shown that. It is likely failing
an allocation, so the fault probably happened at the previous deallocation, or
through use of a buffer after that deallocation.

> Another run ended with a 0x19 (Bad_pool_header) bugcheck with the message
> that ‘the pool freelist is corrupt.’ Again the verifier had no stats on

Sounds again like things are getting walked on after they are freed.

My best bet at the moment is that you are freeing something and then using
it after it is freed, which could mean after it has been allocated to
something else.

You could try zeroing all pointers after you free the things they are
pointing to; this *might* catch someone picking up the pointer later.
However, if people construct pointers into buffers and keep them around (for
instance, on the stack in some function waiting for an event), then zeroing
the original pointer won’t help.
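Zeroing the pointer at the point of the free can be folded into one macro so it is not forgotten. A minimal sketch: `FREE_AND_NULL` and `XFER_BUFFERS` are hypothetical names, and `free()` stands in for osFreeMemory()/NdisFreeMemory(), which also take a length. As noted above, it only clears the pointer you pass it; copies held elsewhere are untouched.

```c
#include <stdlib.h>

/* Free the block and clear the caller's pointer in one step, so a later
   dereference of this pointer faults on NULL instead of silently touching
   memory that may already belong to someone else. p must be a plain
   lvalue without side effects, since it is evaluated twice. */
#define FREE_AND_NULL(p) \
    do {                 \
        free(p);         \
        (p) = NULL;      \
    } while (0)

/* Hypothetical per-transfer bookkeeping, loosely like the IRP/URB pair. */
typedef struct {
    void *irp;
    void *urb;
} XFER_BUFFERS;

static void xfer_release(XFER_BUFFERS *x)
{
    FREE_AND_NULL(x->irp);
    FREE_AND_NULL(x->urb);
}
```

A second xfer_release() on the same structure is harmless, because free(NULL) is a no-op; without the NULLing it would be a double free.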

If you know the sizes of the things you are freeing, you could also try
zapping them with a recognizable pattern just before freeing them. If you
find one of these zapped things lying around with other data on it, the data
pattern might tell you something useful.
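A sketch of the zapping idea, assuming you know each block's size at free time (NdisFreeMemory requires it anyway). The 0xDEADBEEF pattern and the `poison_buffer`/`is_poisoned`/`poison_and_free` names are hypothetical choices, and `free()` stands in for osFreeMemory(). Keep in mind that the pool manager may reuse the first bytes of a freed block for its own bookkeeping, so the start of a block is not a reliable place to look for the pattern.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fill pattern: 0xDEADBEEF stored little-endian. */
static const uint8_t POISON[4] = { 0xEF, 0xBE, 0xAD, 0xDE };

/* Overwrite a block with the repeating pattern. */
static void poison_buffer(void *p, size_t size)
{
    uint8_t *b = p;
    for (size_t i = 0; i < size; i++)
        b[i] = POISON[i % sizeof POISON];
}

/* True if every byte still holds the pattern, i.e. nobody wrote to it. */
static bool is_poisoned(const void *p, size_t size)
{
    const uint8_t *b = p;
    for (size_t i = 0; i < size; i++)
        if (b[i] != POISON[i % sizeof POISON])
            return false;
    return true;
}

/* Poison, then free. In the driver this would wrap osFreeMemory(p, size). */
static void poison_and_free(void *p, size_t size)
{
    if (p != NULL) {
        poison_buffer(p, size);
        free(p);
    }
}
```

If DEADBEEF later shows up in a register, a pointer field, or a packet, something read a block after it was freed.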

I’m afraid this is likely to be a hard one to find, and will take code
inspection. I’m betting on using a freed allocation. So I’d start by
zotting the pointers to things when they are freed in the (probably vain)
hope that that might help. Then I’d look at any function that saves a
pointer to (or into) something that gets allocated and freed, and see if it
can end up waiting on something before it finishes using the pointer. Keep
in mind the wait can be in a subroutine. Once I found something holding a
pointer across a wait, I’d be real suspicious. I’d try to put some code in
the function to verify that the pointer is still valid after the wait. Or
if possible re-fetch the pointer after the wait.
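One way to make "re-fetch the pointer after the wait" systematic is to hold a generation-checked handle across the wait instead of a raw pointer. This is a swapped-in technique, not something suggested in the thread, and everything in the sketch (`SLOT`, `HANDLE_REF`, `obj_alloc`, `obj_free`, `obj_lookup`) is hypothetical; in a driver the table would also need its own spin lock.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJECTS 8  /* hypothetical table size */

/* Each slot carries a generation number that is bumped whenever the slot
   is freed, so references taken before the free can be detected. */
typedef struct {
    int in_use;
    unsigned generation;
    int payload;
} SLOT;

typedef struct {
    unsigned index;
    unsigned generation;
} HANDLE_REF;

static SLOT table[MAX_OBJECTS];

static bool obj_alloc(HANDLE_REF *out)
{
    for (unsigned i = 0; i < MAX_OBJECTS; i++) {
        if (!table[i].in_use) {
            table[i].in_use = 1;
            out->index = i;
            out->generation = table[i].generation;
            return true;
        }
    }
    return false;
}

static void obj_free(HANDLE_REF ref)
{
    if (ref.index < MAX_OBJECTS && table[ref.index].in_use &&
        table[ref.index].generation == ref.generation) {
        table[ref.index].in_use = 0;
        table[ref.index].generation++;  /* invalidates outstanding refs */
    }
}

/* Re-fetch after a wait: returns NULL if the object was freed (and possibly
   reallocated to someone else) while we were waiting. */
static SLOT *obj_lookup(HANDLE_REF ref)
{
    if (ref.index >= MAX_OBJECTS)
        return NULL;
    SLOT *s = &table[ref.index];
    return (s->in_use && s->generation == ref.generation) ? s : NULL;
}
```

After the wait, call obj_lookup() again and bail out if it returns NULL, rather than dereferencing the pointer saved before the wait.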

There are some other things that can be done if the code inspection doesn’t
find it, but they start to get tedious to hard. I’d try the code inspection
method two or three times before moving on to the next step, in the hopes
that that would find at least one problem. (If you are having a bad day
this may be more than one problem!)

Loren

Loren,

Thank you very much for the very detailed and very helpful
answer. It helped me track down the problem - a memory
allocation for a queue that grabbed way more memory than it
should have. The initial part of the queue was probably OK
which is why the driver seemed to work for a hundred packets
or so.

The problem was driving me crazy and I couldn’t find a way
to track it down (every crash had a different
bugcheck code and seemed to occur in a different place).

Thanks to you and everyone else here (Mark, Doron, Maxim etc)
who suggested ideas to figure out the problem.

I still have some other problems with the driver, but they
seem to be USB related, e.g. usbd returns USBD_STATUS_BABBLE_DETECTED
when transferring data at high rates.

Thanks again for your help.

Harshal

Questions? First check the Kernel Driver FAQ at http://www.osronline.com/article.cfm?id=256

You are currently subscribed to ntdev as: xxxxx@chhaya.org
To unsubscribe send a blank email to xxxxx@lists.osr.com


http://www.mumbai-central.com : Where Mumbaikars meet

Babble detected means your device transmitted on the USB bus when the
host did not ask it to transmit. That would indicate a hardware or
firmware error.

d

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Harshal Chhaya
Sent: Tuesday, January 04, 2005 2:34 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] How do I locate the code that caused this bugcheck?

Loren,

Thank you very much for the very detailed and very helpful
answer below. It helped me track down the problem - a memory
allocation for a queue that grabbed way more memory than it
should have. The initial part of the queue was probably OK
which is why the driver seemed to work for a hundred packets
or so.
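The thread doesn't show the faulty queue allocation, so the following is not Harshal's fix. It is a hypothetical user-mode sketch of one defensive pattern against this class of bug: size the queue's backing array once from the element count and element size (with an overflow check), then bounds-check every enqueue so that a full queue returns an error instead of writing past the allocation into neighboring pool memory. malloc/free stand in for whatever pool allocator a driver actually uses, and the PACKET_QUEUE names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    void  **items;      /* ring buffer of packet pointers */
    size_t  capacity;   /* number of slots actually allocated */
    size_t  head;       /* index of the oldest item */
    size_t  count;      /* number of items currently queued */
} PACKET_QUEUE;

/* Returns 0 on success, -1 for a zero/overflowing size or allocation failure. */
static int queue_init(PACKET_QUEUE *q, size_t capacity)
{
    if (capacity == 0 || capacity > SIZE_MAX / sizeof(void *))
        return -1;
    q->items = malloc(capacity * sizeof(void *)); /* capacity pointer-sized slots */
    if (q->items == NULL)
        return -1;
    q->capacity = capacity;
    q->head = 0;
    q->count = 0;
    return 0;
}

/* Returns -1 when full instead of writing past the end of the array. */
static int queue_push(PACKET_QUEUE *q, void *packet)
{
    if (q->count == q->capacity)
        return -1;
    q->items[(q->head + q->count) % q->capacity] = packet;
    q->count++;
    return 0;
}

/* Returns the oldest packet, or NULL if the queue is empty. */
static void *queue_pop(PACKET_QUEUE *q)
{
    if (q->count == 0)
        return NULL;
    void *packet = q->items[q->head];
    q->head = (q->head + 1) % q->capacity;
    q->count--;
    return packet;
}

static void queue_destroy(PACKET_QUEUE *q)
{
    free(q->items);
    q->items = NULL;    /* zero the pointer after freeing what it points to */
    q->capacity = q->count = q->head = 0;
}
```

A driver using something like this would treat a -1 from queue_push as "queue full" (for example, by failing or holding the send) rather than silently corrupting pool, which is the kind of overrun that could show up as a different bugcheck on every run.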

The problem was driving me crazy and I couldn’t find a way
to track it down (every crash had a different bugcheck code
and seemed to occur in a different place).

Thanks to you and everyone else here (Mark, Doron, Maxim etc)
who suggested ideas to figure out the problem.

I still have some other problems with the driver, but they
seem to be USB-related; e.g., usbd returns USBD_STATUS_BABBLE_DETECTED
when transferring data at high rates.

Thanks again for your help.

  • Harshal

Doron Holan wrote:

Babble detected means your device transmitted on the USB bus when the
host did not ask it to transmit. That would indicate a hardware or
firmware error.

Thanks! Now I know what to look for. I am off to find a USB protocol
analyzer that can tell me more.

Regards,

  • Harshal