Virtual Storport Miniport :: Bugcheck C2 - d

Hi,

We have developed a virtual storport miniport driver and one more driver
which exposes few IOCTLs and does IO from network. The storport miniport
driver issues IRPs to this driver and these IRPs get completed in an
IoCompletionRoutine set by the storport miniport driver. But if some IO is
run on the disk for some time, within minutes the system bugchecks with
bugcheck code 0xC2 with first parameter as D. Which means, “The current
thread attempted to release a quota on a corrupted pool allocation.”

The strange thing is that if we turn on Verifier for both of our drivers we
dont get this bugcheck at all. I am not able to find out what is going
wrong. I am giving the bugcheck analysis below. Please let me know if you
can find something or suggest something to try.

kd> !analyze -v

*******************************************************************************

*
*

* Bugcheck
Analysis *

*
*

*******************************************************************************

BAD_POOL_CALLER (c2)

The current thread is making a bad pool request. Typically this is at a bad
IRQL level or double freeing the same allocation, etc.

Arguments:

Arg1: 0000000d, type of pool violation the caller is guilty of.

Arg2: 81ee9238

Arg3: 20707249

Arg4: 01000000

Debugging Details:


FAULTING_IP:

nt!IopFreeIrp+e9

8081bab1 5f pop edi

BUGCHECK_STR: 0xc2_d

DEFAULT_BUCKET_ID: DRIVER_FAULT

PROCESS_NAME: svchost.exe

CURRENT_IRQL: 0

LAST_CONTROL_TRANSFER: from 80825b3b to 8086cf18

STACK_TEXT:

f79f07f4 80825b3b 00000003 0000000d 00000000
nt!RtlpBreakWithStatusInstruction

f79f0840 80826a2f 00000003 00000014 81ee9238 nt!KiBugCheckDebugBreak+0x19

f79f0bd8 80826dc7 000000c2 0000000d 81ee9238 nt!KeBugCheck2+0x5d1

f79f0bf8 8088c76c 000000c2 0000000d 81ee9238 nt!KeBugCheckEx+0x1b

f79f0c5c 8081bab1 81ee9238 00000000 81f45648 nt!ExFreePoolWithTag+0x436

f79f0c78 808f1718 81ee9238 81f45630 8259d900 nt!IopFreeIrp+0xe9

f79f0cac 8092c98c 81fe5d88 820d0030 0012019f nt!IopCloseFile+0x2ea

f79f0cdc 8092c28d 81fe5d88 00000001 8259d900 nt!ObpDecrementHandleCount+0xcc

f79f0d04 8092c326 e19a23e0 81f45648 00000b2c
nt!ObpCloseHandleTableEntry+0x131

f79f0d48 8092c443 00000b2c 00000001 f79f0d64 nt!ObpCloseHandle+0x82

f79f0d58 80883908 00000b2c 00f5fd4c 7c8285ec nt!NtClose+0x1b

f79f0d58 7c8285ec 00000b2c 00f5fd4c 7c8285ec nt!KiFastCallEntry+0xf8

00f5fd3c 7c826d2b 77e63eb3 00000b2c 00f5fd5c ntdll!KiFastSystemCallRet

00f5fd40 77e63eb3 00000b2c 00f5fd5c 77c6e708 ntdll!ZwClose+0xc

WARNING: Stack unwind information not available. Following frames may be
wrong.

00f5fd4c 77c6e708 00000b2c 0008cfdc 00f5fd70 kernel32!CloseHandle+0x44

00f5fd5c 77c6e65b 0008cfdc 00000000 0008cec0 RPCRT4!I_RpcLogEvent+0x3c9

00f5fd70 77c6de36 0008cfdc 00000000 0008cec0 RPCRT4!I_RpcLogEvent+0x31c

00f5fdac 77c6deb1 0008cec0 0008cec0 0008cec0
RPCRT4!NdrConformantStructUnmarshall+0x87c

00f5fdc0 77c6de66 000b4750 00f5fdf0 77c82750
RPCRT4!NdrConformantStructUnmarshall+0x8f7

00f5fdcc 77c82750 00000001 77c6e2c9 7c81a3ab
RPCRT4!NdrConformantStructUnmarshall+0x8ac

00f5fdf0 77c6ed3b 000006bf 00000000 000987d0 RPCRT4!RpcBindingFree+0x28c

00f5fe14 77c88965 00000000 80020000 000987d0 RPCRT4!I_RpcLogEvent+0x9fc

00f5fe28 77c88a77 000987d0 00000000 0064004c RPCRT4!I_RpcFree+0xda3

00f5ff84 77c88792 00f5ffac 77c8872d 000987d0 RPCRT4!I_RpcFree+0xeb5

00f5ff8c 77c8872d 000987d0 00000000 00000000 RPCRT4!I_RpcFree+0xbd0

00f5ffac 77c7b110 0008ada0 00f5ffec 77e64829 RPCRT4!I_RpcFree+0xb6b

00f5ffb8 77e64829 00098980 00000000 00000000
RPCRT4!NdrFullPointerInsertRefId+0x3ba

00f5ffec 00000000 77c7b0f5 00098980 00000000 kernel32!GetModuleHandleA+0xdf

STACK_COMMAND: kb

FOLLOWUP_IP:

nt!ExFreePoolWithTag+436

8088c76c ff75fc push dword ptr [ebp-4]

SYMBOL_STACK_INDEX: 4

SYMBOL_NAME: nt!ExFreePoolWithTag+436

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: nt

IMAGE_NAME: ntkrnlpa.exe

DEBUG_FLR_IMAGE_TIMESTAMP: 45d69710

FAILURE_BUCKET_ID: 0xc2_d_nt!ExFreePoolWithTag+436

BUCKET_ID: 0xc2_d_nt!ExFreePoolWithTag+436

Followup: MachineOwner


Thanks and Regards,

Charansing D Deore
Sr. Software Developer,
CalSoft Pvt Ltd.
Baner Road, Pune-411045
Office: +91 20 40792900 Ext: 3052
Cell: +91 9850960550

Hi,

I have done a bit more analysis on this bugcheck. But still could not go to
the root of it. If you guys can derive some conclusion out of this please
let me know.

BugCheck C2, {d, fe5d4c18, 20707249, 1000000}

Probably caused by : ntkrnlpa.exe ( nt!KeRegisterBugCheckReasonCallback+ecc
)

Followup: MachineOwner

nt!DbgBreakPointWithStatus+0x4:
8086cf18 cc int 3
kd> !analyze -v
BAD_POOL_CALLER (c2)

The current thread is making a bad pool request. Typically this is at a bad
IRQL level or double freeing the same allocation, etc.

Arguments:

Arg1: 0000000d, type of pool violation the caller is guilty of.

Arg2: fe5d4c18
Arg3: 20707249

Arg4: 01000000

Debugging Details:


MODULE_NAME: nt

FAULTING_MODULE: 80800000 nt

DEBUG_FLR_IMAGE_TIMESTAMP: 45d69710

FAULTING_IP:
nt!IoFreeIrp+fb
8081bab1 5f pop edi

BUGCHECK_STR: 0xc2_d

DEFAULT_BUCKET_ID: WRONG_SYMBOLS

LAST_CONTROL_TRANSFER: from 80826d76 to 8086cf18

STACK_TEXT:
WARNING: Stack unwind information not available. Following frames may be
wrong.

f9314894 80826d76 00000004 00000014 fe5d4c18 nt!DbgBreakPointWithStatus+0x4
f9314c2c 80826dc7 000000c2 0000000d fe5d4c18
nt!KeRegisterBugCheckReasonCallback+0xecc
f9314c4c 8088c76c 000000c2 0000000d fe5d4c18 nt!KeBugCheckEx+0x1b
f9314cb0 8081bab1 fe5d4c18 00000000 ffff0000 nt!ExFreePoolWithTag+0x436
f9314ccc 808e562d fe5d4c18 f9314d64 0099fecc nt!IoFreeIrp+0xfb
f9314d48 80883908 000002e8 0099ff04 0099fef4 nt!IoSetIoCompletion+0x30f
f9314d64 7c8285ec badb0d00 0099feb8 00000000
nt!KeReleaseInStackQueuedSpinLockFromDpcLevel+0xb14
f9314d68 badb0d00 0099feb8 00000000 00000000 ntdll!KiFastSystemCallRet
f9314d6c 0099feb8 00000000 00000000 00000000 0xbadb0d00
f9314d70 00000000 00000000 00000000 00000000 0x99feb8

STACK_COMMAND: kb

FOLLOWUP_IP:

nt!KeRegisterBugCheckReasonCallback+ecc
80826d76 8b4dfc mov ecx,dword ptr [ebp-4]

SYMBOL_STACK_INDEX: 1

SYMBOL_NAME: nt!KeRegisterBugCheckReasonCallback+ecc

FOLLOWUP_NAME: MachineOwner

IMAGE_NAME: ntkrnlpa.exe

BUCKET_ID: WRONG_SYMBOLS

Followup: MachineOwner

kd> !pool fe5d4c18
Pool page fe5d4c18 region is Unknown
fe5d4000 size: 270 previous size: 0 (Allocated) Thre (Protected)
fe5d4270 size: 10 previous size: 270 (Free) …~
fe5d4280 size: 28 previous size: 10 (Allocated) Ntfn
fe5d42a8 size: 40 previous size: 28 (Allocated) Ntfr
fe5d42e8 size: 28 previous size: 40 (Allocated) NtFs
fe5d4310 size: 20 previous size: 28 (Allocated) VadS
fe5d4330 size: e0 previous size: 20 (Allocated) WmiG (Protected)
fe5d4410 size: 110 previous size: e0 (Allocated) Ntfi
fe5d4520 size: 298 previous size: 110 (Allocated) Proc (Protected)
fe5d47b8 size: 40 previous size: 298 (Allocated) Ntfr
fe5d47f8 size: 30 previous size: 40 (Allocated) Vad
fe5d4828 size: 40 previous size: 30 (Allocated) CMpa Process:
fe315390
fe5d4868 size: 20 previous size: 40 (Allocated) CM44 Process:
fe315390
fe5d4888 size: 40 previous size: 20 (Allocated) Ntfr
fe5d48c8 size: 30 previous size: 40 (Allocated) Even (Protected)
fe5d48f8 size: 30 previous size: 30 (Allocated) Even (Protected)
fe5d4928 size: a0 previous size: 30 (Allocated) Irp Process: 1000000
fe5d49c8 size: 98 previous size: a0 (Allocated) File (Protected)
fe5d4a60 size: a0 previous size: 98 (Allocated) MmIn
fe5d4b00 size: 8 previous size: a0 (Free) Mmdl
fe5d4b08 size: 28 previous size: 8 (Allocated) NtFs
fe5d4b30 size: e0 previous size: 28 (Allocated) MmCi
*fe5d4c10 size: a0 previous size: e0 (Free ) *Irp Process: 1000000
Pooltag Irp : Io, IRP packets
fe5d4cb0 size: a0 previous size: a0 (Allocated) Irp
fe5d4d50 size: 10 previous size: a0 (Free) CcPL
fe5d4d60 size: 98 previous size: 10 (Allocated) File (Protected)
fe5d4df8 size: 28 previous size: 98 (Allocated) Ntfn
fe5d4e20 size: a0 previous size: 28 (Allocated) MmCi
fe5d4ec0 size: a0 previous size: a0 (Allocated) Irp
fe5d4f60 size: a0 previous size: a0 (Allocated) Irp Process: 1000000
kd> !irp fe5d4c10
IRP signature does not match, probably not an IRP
kd> !poolval fe5d4c10 3
Pool page fe5d4c10 region is Unknown

Validating Pool headers for pool page: fe5d4c10

Pool page [fe5d4000] is VALID.

kd> !process 1000000 7
Searching for Process with Cid == 1000000
kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
PROCESS 81ba92a8 SessionId: none Cid: 0004 Peb: 00000000 ParentCid:
0000
DirBase: 0062a000 ObjectTable: e1000b78 HandleCount: 393.
Image: System

PROCESS fe5eb4e0 SessionId: none Cid: 010c Peb: 7ffd9000 ParentCid:
0004
DirBase: 06843020 ObjectTable: e1457058 HandleCount: 19.
Image: smss.exe

PROCESS faedcd88 SessionId: 0 Cid: 013c Peb: 7ffdf000 ParentCid: 010c
DirBase: 06843040 ObjectTable: e1548670 HandleCount: 413.
Image: csrss.exe

PROCESS fc9055a8 SessionId: 0 Cid: 0154 Peb: 7ffda000 ParentCid: 010c
DirBase: 06843060 ObjectTable: e13aa4c8 HandleCount: 532.
Image: winlogon.exe

PROCESS fcd39bf0 SessionId: 0 Cid: 0184 Peb: 7ffdf000 ParentCid: 0154
DirBase: 06843080 ObjectTable: e20fc0a0 HandleCount: 275.
Image: services.exe

PROCESS fcd6a500 SessionId: 0 Cid: 0190 Peb: 7ffdc000 ParentCid: 0154
DirBase: 068430a0 ObjectTable: e2132c78 HandleCount: 412.
Image: lsass.exe

PROCESS fe5fa020 SessionId: 0 Cid: 025c Peb: 7ffdb000 ParentCid: 0184
DirBase: 068430c0 ObjectTable: e1448138 HandleCount: 31.
Image: vmacthlp.exe

PROCESS fd9e4020 SessionId: 0 Cid: 027c Peb: 7ffdb000 ParentCid: 0184
DirBase: 068430e0 ObjectTable: e214bf50 HandleCount: 93.
Image: svchost.exe

PROCESS fdac8200 SessionId: 0 Cid: 02b4 Peb: 7ffd7000 ParentCid: 0184
DirBase: 06843100 ObjectTable: e13e0400 HandleCount: 210.
Image: svchost.exe

PROCESS fe5d1148 SessionId: 0 Cid: 02ec Peb: 7ffdc000 ParentCid: 0184
DirBase: 06843120 ObjectTable: e2168100 HandleCount: 133.
Image: svchost.exe

PROCESS fb073528 SessionId: 0 Cid: 0304 Peb: 7ffd8000 ParentCid: 0184
DirBase: 06843140 ObjectTable: e2184790 HandleCount: 184.
Image: svchost.exe

PROCESS fe315390 SessionId: 0 Cid: 0320 Peb: 7ffd5000 ParentCid: 0184
DirBase: 06843160 ObjectTable: e214ded0 HandleCount: 1028.
Image: svchost.exe

PROCESS feea8628 SessionId: 0 Cid: 03bc Peb: 7ffde000 ParentCid: 0184
DirBase: 06843180 ObjectTable: e144f868 HandleCount: 136.
Image: spoolsv.exe

PROCESS fed11bf0 SessionId: 0 Cid: 03ec Peb: 7ffda000 ParentCid: 0184
DirBase: 068431a0 ObjectTable: e144f818 HandleCount: 152.
Image: msdtc.exe

PROCESS fe093d88 SessionId: 0 Cid: 044c Peb: 7ffdc000 ParentCid: 0184
DirBase: 068431c0 ObjectTable: e225bcc0 HandleCount: 68.
Image: svchost.exe

PROCESS fe83c538 SessionId: 0 Cid: 0474 Peb: 7ffd4000 ParentCid: 0184
DirBase: 068431e0 ObjectTable: e220b860 HandleCount: 39.
Image: svchost.exe

PROCESS fc45f020 SessionId: 0 Cid: 04a8 Peb: 7ffdf000 ParentCid: 0184
DirBase: 06843220 ObjectTable: e221ee40 HandleCount: 59.
Image: VMwareService.e

PROCESS fb126268 SessionId: 0 Cid: 05a4 Peb: 7ffd9000 ParentCid: 0184
DirBase: 06843240 ObjectTable: e228de88 HandleCount: 79.
Image: dfssvc.exe

PROCESS fe5f8ae0 SessionId: 0 Cid: 05ec Peb: 7ffde000 ParentCid: 0184
DirBase: 06843260 ObjectTable: e22585f8 HandleCount: 142.
Image: svchost.exe

PROCESS fb0a4388 SessionId: 0 Cid: 0658 Peb: 7ffd8000 ParentCid: 0184
DirBase: 06843200 ObjectTable: e23215d0 HandleCount: 93.
Image: alg.exe

PROCESS fe5f9718 SessionId: 0 Cid: 06c0 Peb: 7ffd6000 ParentCid: 027c
DirBase: 06843280 ObjectTable: e14701e0 HandleCount: 181.
Image: wmiprvse.exe

PROCESS fd4cc990 SessionId: 0 Cid: 07b8 Peb: 7ffdf000 ParentCid: 07a8
DirBase: 068432c0 ObjectTable: e1704ce8 HandleCount: 265.
Image: explorer.exe

PROCESS fe103958 SessionId: 0 Cid: 0080 Peb: 7ffd5000 ParentCid: 07b8
DirBase: 068432e0 ObjectTable: e23d8fb8 HandleCount: 42.
Image: VMwareTray.exe

PROCESS fb434940 SessionId: 0 Cid: 00a0 Peb: 7ffdf000 ParentCid: 07b8
DirBase: 06843300 ObjectTable: e24182c0 HandleCount: 54.
Image: VMwareUser.exe

PROCESS fe5d4540 SessionId: 0 Cid: 0104 Peb: 7ffd7000 ParentCid: 0320
DirBase: 06843320 ObjectTable: e23d19d0 HandleCount: 122.
Image: wuauclt.exe

kd> !address fe5d4c10
fa434000 - 057ac000
Usage KernelSpaceUsageNonPagedPoolExpansion

kd> !irp fe5d4c10
IRP signature does not match, probably not an IRP

Thanks and Regards,

Charansing D Deore
Sr. Software Developer,
CalSoft Pvt Ltd.
Baner Road, Pune-411045
Office: +91 20 40792900 Ext: 3052
Cell: +91 9850960550

“or double freeing the same allocation, etc.”

As in the IRP object in question has already been completed and
recycled for other purposes.

kd> !irp fe5d4c10
IRP signature does not match, probably not an IRP

Isn’t the IRP at fe5d4c18? doing a !irp on the pool header isn’t going to work.

BUCKET_ID: WRONG_SYMBOLS
Do you have the correct symbols loaded?

Mark Roddy

2009/3/18 Charansing Rajput :
> or double freeing the same allocation, etc.

Hi Charansing,

Pool blocks that charged quota have a pointer to the process that allocated them, immediately after the end of the data stored inside the pool block. In your case, the pointer to the process was 0x1000000. That can never be the address of a process object, because 0x1000000 is a user-mode address. So the end of that pool block is corrupted.

Next step for the corruption below is to run ?dc fe5d4c10 l28? to take a look at the contents of that IRP, and try to figure out how the bogus pointer 0x1000000 ended up there.

Note that you have below a couple more IRPs that were corrupted in the same way ? one at fe5d4928+8 and another at fe5d4f60+8. So you can use ?!irp fe5d4928+8? and ?!irp fe5d4f60+8? to take a look at these other corrupted IRPs.

If you really need to force ?!irp fe5d4c10+8? to work too, you can display what?s left of that freed IRP using ?dt nt!_IRP fe5d4c10+8? and then write the value 6 (IO_TYPE_IRP) in the Type field of the freed IRP. Then I expect that !irp will start working for this freed IRP.

Verifier is allocating IRPs in a different way. For example, it might try to allocate them from Special Pool, and doesn?t use a similar Quota pointer at the end of the pool block. That?s why the bugcheck 0xC2 doesn?t happen with Verifier enabled. I am still surprised that Verifier didn?t expose this problem in a different way though. So after you figure out the corruption, it would be great if you can explain the nature of the corruption to me. That way maybe we can find a way to expose that corruption better with Verifier, in the future. BTW, what Windows version are you using?

fe5d4888 size: 40 previous size: 20 (Allocated) Ntfr
fe5d48c8 size: 30 previous size: 40 (Allocated) Even (Protected)
fe5d48f8 size: 30 previous size: 30 (Allocated) Even (Protected)
fe5d4928 size: a0 previous size: 30 (Allocated) Irp Process: 1000000
fe5d49c8 size: 98 previous size: a0 (Allocated) File (Protected)
fe5d4a60 size: a0 previous size: 98 (Allocated) MmIn
fe5d4b00 size: 8 previous size: a0 (Free) Mmdl
fe5d4b08 size: 28 previous size: 8 (Allocated) NtFs
fe5d4b30 size: e0 previous size: 28 (Allocated) MmCi
*fe5d4c10 size: a0 previous size: e0 (Free ) *Irp Process: 1000000
Pooltag Irp : Io, IRP packets
fe5d4cb0 size: a0 previous size: a0 (Allocated) Irp
fe5d4d50 size: 10 previous size: a0 (Free) CcPL
fe5d4d60 size: 98 previous size: 10 (Allocated) File (Protected)
fe5d4df8 size: 28 previous size: 98 (Allocated) Ntfn
fe5d4e20 size: a0 previous size: 28 (Allocated) MmCi
fe5d4ec0 size: a0 previous size: a0 (Allocated) Irp
fe5d4f60 size: a0 previous size: a0 (Allocated) Irp Process: 1000000

Dan

Looks like the Forum web page ended up with ? characters every time I used a quote character, so please replace the ? characters when you read what I said.

Dan

Your symbols have to be incorrect.

  1. As Mark already mentioned, BUCKET_ID: WRONG_SYMBOLS.

  2. This call sequence makes no sense, at least as far as I can tell:

f9314c2c 80826dc7 000000c2 0000000d fe5d4c18 nt!KeRegisterBugCheckReasonCallback+0xecc
> f9314c4c 8088c76c 000000c2 0000000d fe5d4c18 nt!KeBugCheckEx+0x1b

Assuming that you’re symbols were correct in your first post, KeRegisterBugCheckReasonCallback+0xECC
== KeRegisterBugCheck2+0x5d1.

  1. This looks like it might be part of trap frame: 0xbadb0d00. Have you perhaps trashed the stack
    somehow?

Have your tried:

.sympath srv*<local_path>*http://msdl.microsoft.com/download/symbols
.symopt+0x80000400 (SYMOPT_DEBUG | SYMOPT_EXACT_SYMBOLS)
.reload -f -n
lme

Good luck,

mm

Charansing Rajput wrote:
> Hi,
>
> I have done a bit more analysis on this bugcheck. But still could not go
> to the root of it. If you guys can derive some conclusion out of this
> please let me know.
>
>
> BugCheck C2, {d, fe5d4c18, 20707249, 1000000}
>
> Probably caused by : ntkrnlpa.exe (
> nt!KeRegisterBugCheckReasonCallback+ecc )
>
> Followup: MachineOwner
> ---------
> nt!DbgBreakPointWithStatus+0x4:
> 8086cf18 cc int 3
> kd> !analyze -v
> BAD_POOL_CALLER (c2)
>
> The current thread is making a bad pool request. Typically this is at a
> bad IRQL level or double freeing the same allocation, etc.
>
> Arguments:
>
> Arg1: 0000000d, type of pool violation the caller is guilty of.
>
> Arg2: fe5d4c18
> Arg3: 20707249
>
> Arg4: 01000000
>
>
>
> Debugging Details:
>
> ------------------
>
> MODULE_NAME: nt
>
> FAULTING_MODULE: 80800000 nt
>
>
> DEBUG_FLR_IMAGE_TIMESTAMP: 45d69710
>
>
>
> FAULTING_IP:
> nt!IoFreeIrp+fb
> 8081bab1 5f pop edi
>
>
>
> BUGCHECK_STR: 0xc2_d
>
>
>
> DEFAULT_BUCKET_ID: WRONG_SYMBOLS
>
> LAST_CONTROL_TRANSFER: from 80826d76 to 8086cf18
>
> STACK_TEXT:
> WARNING: Stack unwind information not available. Following frames may be
> wrong.
>
> f9314894 80826d76 00000004 00000014 fe5d4c18 nt!DbgBreakPointWithStatus+0x4
> f9314c2c 80826dc7 000000c2 0000000d fe5d4c18
> nt!KeRegisterBugCheckReasonCallback+0xecc
> f9314c4c 8088c76c 000000c2 0000000d fe5d4c18 nt!KeBugCheckEx+0x1b
> f9314cb0 8081bab1 fe5d4c18 00000000 ffff0000 nt!ExFreePoolWithTag+0x436
> f9314ccc 808e562d fe5d4c18 f9314d64 0099fecc nt!IoFreeIrp+0xfb
> f9314d48 80883908 000002e8 0099ff04 0099fef4 nt!IoSetIoCompletion+0x30f
> f9314d64 7c8285ec badb0d00 0099feb8 00000000
> nt!KeReleaseInStackQueuedSpinLockFromDpcLevel+0xb14
> f9314d68 badb0d00 0099feb8 00000000 00000000 ntdll!KiFastSystemCallRet
> f9314d6c 0099feb8 00000000 00000000 00000000 0xbadb0d00
> f9314d70 00000000 00000000 00000000 00000000 0x99feb8
>
>
>
>
> STACK_COMMAND: kb
>
>
>
> FOLLOWUP_IP:
>
> nt!KeRegisterBugCheckReasonCallback+ecc
> 80826d76 8b4dfc mov ecx,dword ptr [ebp-4]
>
> SYMBOL_STACK_INDEX: 1
>
> SYMBOL_NAME: nt!KeRegisterBugCheckReasonCallback+ecc
>
> FOLLOWUP_NAME: MachineOwner
>
> IMAGE_NAME: ntkrnlpa.exe
>
> BUCKET_ID: WRONG_SYMBOLS
>
> Followup: MachineOwner
> ---------
>
>
> kd> !pool fe5d4c18
> Pool page fe5d4c18 region is Unknown
> fe5d4000 size: 270 previous size: 0 (Allocated) Thre (Protected)
> fe5d4270 size: 10 previous size: 270 (Free) …~
> fe5d4280 size: 28 previous size: 10 (Allocated) Ntfn
> fe5d42a8 size: 40 previous size: 28 (Allocated) Ntfr
> fe5d42e8 size: 28 previous size: 40 (Allocated) NtFs
> fe5d4310 size: 20 previous size: 28 (Allocated) VadS
> fe5d4330 size: e0 previous size: 20 (Allocated) WmiG (Protected)
> fe5d4410 size: 110 previous size: e0 (Allocated) Ntfi
> fe5d4520 size: 298 previous size: 110 (Allocated) Proc (Protected)
> fe5d47b8 size: 40 previous size: 298 (Allocated) Ntfr
> fe5d47f8 size: 30 previous size: 40 (Allocated) Vad
> fe5d4828 size: 40 previous size: 30 (Allocated) CMpa Process:
> fe315390
> fe5d4868 size: 20 previous size: 40 (Allocated) CM44 Process:
> fe315390
> fe5d4888 size: 40 previous size: 20 (Allocated) Ntfr
> fe5d48c8 size: 30 previous size: 40 (Allocated) Even (Protected)
> fe5d48f8 size: 30 previous size: 30 (Allocated) Even (Protected)
> fe5d4928 size: a0 previous size: 30 (Allocated) Irp Process: 1000000
> fe5d49c8 size: 98 previous size: a0 (Allocated) File (Protected)
> fe5d4a60 size: a0 previous size: 98 (Allocated) MmIn
> fe5d4b00 size: 8 previous size: a0 (Free) Mmdl
> fe5d4b08 size: 28 previous size: 8 (Allocated) NtFs
> fe5d4b30 size: e0 previous size: 28 (Allocated) MmCi
> *fe5d4c10 size: a0 previous size: e0 (Free ) *Irp Process: 1000000
> Pooltag Irp : Io, IRP packets
> fe5d4cb0 size: a0 previous size: a0 (Allocated) Irp
> fe5d4d50 size: 10 previous size: a0 (Free) CcPL
> fe5d4d60 size: 98 previous size: 10 (Allocated) File (Protected)
> fe5d4df8 size: 28 previous size: 98 (Allocated) Ntfn
> fe5d4e20 size: a0 previous size: 28 (Allocated) MmCi
> fe5d4ec0 size: a0 previous size: a0 (Allocated) Irp
> fe5d4f60 size: a0 previous size: a0 (Allocated) Irp Process: 1000000
> kd> !irp fe5d4c10
> IRP signature does not match, probably not an IRP
> kd> !poolval fe5d4c10 3
> Pool page fe5d4c10 region is Unknown
>
> Validating Pool headers for pool page: fe5d4c10
>
> Pool page [fe5d4000] is VALID.
>
> kd> !process 1000000 7
> Searching for Process with Cid == 1000000
> kd> !process 0 0
> NT ACTIVE PROCESS DUMP
> PROCESS 81ba92a8 SessionId: none Cid: 0004 Peb: 00000000
> ParentCid: 0000
> DirBase: 0062a000 ObjectTable: e1000b78 HandleCount: 393.
> Image: System
>
> PROCESS fe5eb4e0 SessionId: none Cid: 010c Peb: 7ffd9000
> ParentCid: 0004
> DirBase: 06843020 ObjectTable: e1457058 HandleCount: 19.
> Image: smss.exe
>
> PROCESS faedcd88 SessionId: 0 Cid: 013c Peb: 7ffdf000 ParentCid: 010c
> DirBase: 06843040 ObjectTable: e1548670 HandleCount: 413.
> Image: csrss.exe
>
> PROCESS fc9055a8 SessionId: 0 Cid: 0154 Peb: 7ffda000 ParentCid: 010c
> DirBase: 06843060 ObjectTable: e13aa4c8 HandleCount: 532.
> Image: winlogon.exe
>
> PROCESS fcd39bf0 SessionId: 0 Cid: 0184 Peb: 7ffdf000 ParentCid: 0154
> DirBase: 06843080 ObjectTable: e20fc0a0 HandleCount: 275.
> Image: services.exe
>
> PROCESS fcd6a500 SessionId: 0 Cid: 0190 Peb: 7ffdc000 ParentCid: 0154
> DirBase: 068430a0 ObjectTable: e2132c78 HandleCount: 412.
> Image: lsass.exe
>
> PROCESS fe5fa020 SessionId: 0 Cid: 025c Peb: 7ffdb000 ParentCid: 0184
> DirBase: 068430c0 ObjectTable: e1448138 HandleCount: 31.
> Image: vmacthlp.exe
>
> PROCESS fd9e4020 SessionId: 0 Cid: 027c Peb: 7ffdb000 ParentCid: 0184
> DirBase: 068430e0 ObjectTable: e214bf50 HandleCount: 93.
> Image: svchost.exe
>
> PROCESS fdac8200 SessionId: 0 Cid: 02b4 Peb: 7ffd7000 ParentCid: 0184
> DirBase: 06843100 ObjectTable: e13e0400 HandleCount: 210.
> Image: svchost.exe
>
> PROCESS fe5d1148 SessionId: 0 Cid: 02ec Peb: 7ffdc000 ParentCid: 0184
> DirBase: 06843120 ObjectTable: e2168100 HandleCount: 133.
> Image: svchost.exe
>
> PROCESS fb073528 SessionId: 0 Cid: 0304 Peb: 7ffd8000 ParentCid: 0184
> DirBase: 06843140 ObjectTable: e2184790 HandleCount: 184.
> Image: svchost.exe
>
> PROCESS fe315390 SessionId: 0 Cid: 0320 Peb: 7ffd5000 ParentCid: 0184
> DirBase: 06843160 ObjectTable: e214ded0 HandleCount: 1028.
> Image: svchost.exe
>
> PROCESS feea8628 SessionId: 0 Cid: 03bc Peb: 7ffde000 ParentCid: 0184
> DirBase: 06843180 ObjectTable: e144f868 HandleCount: 136.
> Image: spoolsv.exe
>
> PROCESS fed11bf0 SessionId: 0 Cid: 03ec Peb: 7ffda000 ParentCid: 0184
> DirBase: 068431a0 ObjectTable: e144f818 HandleCount: 152.
> Image: msdtc.exe
>
> PROCESS fe093d88 SessionId: 0 Cid: 044c Peb: 7ffdc000 ParentCid: 0184
> DirBase: 068431c0 ObjectTable: e225bcc0 HandleCount: 68.
> Image: svchost.exe
>
> PROCESS fe83c538 SessionId: 0 Cid: 0474 Peb: 7ffd4000 ParentCid: 0184
> DirBase: 068431e0 ObjectTable: e220b860 HandleCount: 39.
> Image: svchost.exe
>
> PROCESS fc45f020 SessionId: 0 Cid: 04a8 Peb: 7ffdf000 ParentCid: 0184
> DirBase: 06843220 ObjectTable: e221ee40 HandleCount: 59.
> Image: VMwareService.e
>
> PROCESS fb126268 SessionId: 0 Cid: 05a4 Peb: 7ffd9000 ParentCid: 0184
> DirBase: 06843240 ObjectTable: e228de88 HandleCount: 79.
> Image: dfssvc.exe
>
> PROCESS fe5f8ae0 SessionId: 0 Cid: 05ec Peb: 7ffde000 ParentCid: 0184
> DirBase: 06843260 ObjectTable: e22585f8 HandleCount: 142.
> Image: svchost.exe
>
> PROCESS fb0a4388 SessionId: 0 Cid: 0658 Peb: 7ffd8000 ParentCid: 0184
> DirBase: 06843200 ObjectTable: e23215d0 HandleCount: 93.
> Image: alg.exe
>
> PROCESS fe5f9718 SessionId: 0 Cid: 06c0 Peb: 7ffd6000 ParentCid: 027c
> DirBase: 06843280 ObjectTable: e14701e0 HandleCount: 181.
> Image: wmiprvse.exe
>
> PROCESS fd4cc990 SessionId: 0 Cid: 07b8 Peb: 7ffdf000 ParentCid: 07a8
> DirBase: 068432c0 ObjectTable: e1704ce8 HandleCount: 265.
> Image: explorer.exe
>
> PROCESS fe103958 SessionId: 0 Cid: 0080 Peb: 7ffd5000 ParentCid: 07b8
> DirBase: 068432e0 ObjectTable: e23d8fb8 HandleCount: 42.
> Image: VMwareTray.exe
>
> PROCESS fb434940 SessionId: 0 Cid: 00a0 Peb: 7ffdf000 ParentCid: 07b8
> DirBase: 06843300 ObjectTable: e24182c0 HandleCount: 54.
> Image: VMwareUser.exe
>
> PROCESS fe5d4540 SessionId: 0 Cid: 0104 Peb: 7ffd7000 ParentCid: 0320
> DirBase: 06843320 ObjectTable: e23d19d0 HandleCount: 122.
> Image: wuauclt.exe
>
> kd> !address fe5d4c10
> fa434000 - 057ac000
> Usage KernelSpaceUsageNonPagedPoolExpansion
>
> kd> !irp fe5d4c10
> IRP signature does not match, probably not an IRP
>
> Thanks and Regards,
> ----------------------------------------------------------------
> Charansing D Deore
> Sr. Software Developer,
> CalSoft Pvt Ltd.
> Baner Road, Pune-411045
> Office: +91 20 40792900 Ext: 3052
> Cell: +91 9850960550</local_path>

Hi guys,

I am a colleague of Charansing and am working with him on this project. We are investigating this bug together. After few investigation we have come to a conclusion that the IRP which we form, for TCP communication using TDI API call “TdiBuildInternalDeviceControlIrp” is getting corrupted somewhere. This IRP is freed by the IO manager during which we were getting the above bug check.

Now i made some changes to code and the bug has changed from C2 to 7E. Following is the abstract of code where i am building the IRP to be sent and queuing it.

pMemVars = (PMEM_C)ExAllocatePoolWithTag(NonPagedPool,sizeof(MEM_C),‘memp’);

pTdiDeviceObj = IoGetRelatedDeviceObject(pConnFileObj);

pMemVars->pStatusBlk = pStatusBlock = (PIO_STATUS_BLOCK)ExAllocatePoolWithTag(NonPagedPool,sizeof(IO_STATUS_BLOCK),‘stbk’);

RtlZeroMemory(pStatusBlock,sizeof(IO_STATUS_BLOCK));

pSendIrp = TdiBuildInternalDeviceControlIrp(TDI_SEND,
pTdiDeviceObj,
pConnFileObj,
&pConnCntx->SendQueue.IoComplete,
pStatusBlock);

if(!pSendIrp)
{
DbgPrint(“IRP Allocation Failed!”);
return STATUS_INSUFFICIENT_RESOURCES;
}

if(pCmdCntx->CommandCode == NBD_CMD_WRITE)
{
pMemVars->pPacket = pPacketData = ExAllocatePoolWithTag(PagedPool,PacketLength + pCmdCntx->Length,‘dnes’);
PacketLength += pCmdCntx->Length;
}
else
{
pMemVars->pPacket = pPacketData = ExAllocatePoolWithTag(PagedPool,PacketLength,‘dnes’);
}

//check if packet data was allocated properly or not

if(!pPacketData)
{
DbgPrint(“Packet Data Allocation Failed!”);
return STATUS_INSUFFICIENT_RESOURCES;
}

pSendMdl = IoAllocateMdl(pPacketData, PacketLength, NULL, FALSE, pSendIrp);

if(!pSendMdl)
{
NBDProtErrDbgPrint(“MDL Allocation Failed!”);
return STATUS_INSUFFICIENT_RESOURCES;
}

__try
{
MmProbeAndLockPages(pSendMdl, KernelMode, IoModifyAccess);
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
IoFreeMdl(pSendMdl);
pSendMdl = NULL;
}

PNBD_REQ_PACKET reqpacket = (PNBD_REQ_PACKET)pPacketData;

pCmdCntx->ExpectedSequenceNumber = reqpacket->Handle = pConnCntx->CurrentSequenceNumber;
pConnCntx->CurrentSequenceNumber++;

reqpacket->Length = Length;
reqpacket->offset = Offset;
reqpacket->magic = NBD_REQUEST_MAGIC;
reqpacket->OperationType = Command;

if(!pData)
{
DbgPrint(“Data Buffer Received is Empty!”);
return STATUS_INSUFFICIENT_RESOURCES;
}

if (NBD_CMD_WRITE == Command)
{
RtlCopyMemory((PVOID)((char*)pPacketData+sizeof(NBD_REQ_PACKET)),
(PVOID)((char*)pData+sizeof(SCSI_REQ)),
SwapEndianness(Length));
}

pCmdCntx->pPacketBuffer = pPacketData;

TdiBuildSend(pSendIrp, pTdiDeviceObj, pConnFileObj, NULL, NULL,
pSendMdl, 0, PacketLength);

if(NULL == pCmdCntx)
{
DbgPrint(“Null context!!!”);
NBDPROTASSERT(0);
}

KeAcquireSpinLock(&pConnCntx->SendLock,&OldIrql);
IoCsqInsertIrp(&pConnCntx->SendQueue.CancelSafeQueue,pSendIrp,NULL);
ExInterlockedInsertTailList(&pConnCntx->MemoryFreeList,pMemVars,&pConnCntx->MemoryLock);
ExInterlockedInsertTailList(&pConnCntx->ContextList,pCmdCntx,&pConnCntx->ContextLock);
KeSetEvent(&pConnCntx->SendQueue.NBDQueueEvent,0,FALSE);
KeReleaseSpinLock(&pConnCntx->SendLock,OldIrql);

The following thread removes the IRPs from the queue and sends it to TCP

do
{
ServiceIrp = IoCsqRemoveNextIrp(&pConnCntx->SendQueue.CancelSafeQueue,NULL);

if(ServiceIrp != NULL)
{

ntStatus = IoCallDriver(NbdDeviceObj, ServiceIrp);

if(STATUS_PENDING ==(ntStatus))
{
KeWaitForSingleObject(&pConnCntx->SendQueue.IoComplete, Executive, KernelMode, FALSE,NULL);

if((pMemToFree = (PMEM_C)ExInterlockedRemoveHeadList(&pConnCntx->MemoryFreeList,&pConnCntx->MemoryLock))!=NULL)
{
ExFreePoolWithTag(pMemToFree->pPacket,‘dnes’);
ExFreePoolWithTag(pMemToFree->pStatusBlk,‘stbk’);
ExFreePoolWithTag(pMemToFree,‘memp’);
}
}

}
}while(ServiceIrp != NULL);

The following is the analysis of the bugcheck that i receive. The analysis points to a part in code, but I dont think it has much relevance. The bug only occurs when verifier is off. With verifier on everything work fine. And it seems to appear for windows 2003 only. The driver works flawlessly for windows XP.

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 8088c753, The address that the exception occurred at
Arg3: f8a93b50, Exception Record Address
Arg4: f8a9384c, Context Record Address

Debugging Details:

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at “0x%08lx” referenced memory at “0x%08lx”. The memory could not be “%s”.

FAULTING_IP:
nt!ExFreePoolWithTag+41d
8088c753 803803 cmp byte ptr [eax],3

EXCEPTION_RECORD: f8a93b50 – (.exr 0xfffffffff8a93b50)
ExceptionAddress: 8088c753 (nt!ExFreePoolWithTag+0x0000041d)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000000
Parameter[1]: 01000000
Attempt to read from address 01000000

CONTEXT: f8a9384c – (.cxr 0xfffffffff8a9384c)
eax=01000000 ebx=808a7bc0 ecx=000317b0 edx=00000e0e esi=ffbcc798 edi=00000014
eip=8088c753 esp=f8a93c18 ebp=f8a93c60 iopl=0 nv up ei pl nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010206
nt!ExFreePoolWithTag+0x41d:
8088c753 803803 cmp byte ptr [eax],3 ds:0023:01000000=??
Resetting default scope

DEFAULT_BUCKET_ID: DRIVER_FAULT

PROCESS_NAME: System

CURRENT_IRQL: 1

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at “0x%08lx” referenced memory at “0x%08lx”. The memory could not be “%s”.

READ_ADDRESS: 01000000

BUGCHECK_STR: 0x7E

IRP_ADDRESS: ffbcc7a0

LAST_CONTROL_TRANSFER: from 8081bab1 to 8088c753

STACK_TEXT:
f8a93c60 8081bab1 ffbcc7a0 00000000 00000000 nt!ExFreePoolWithTag+0x41d
f8a93c7c 80820ef3 ffbcc7a0 ffbcc7e0 817f45e8 nt!IopFreeIrp+0xe9
f8a93ccc 8082c34e ffbcc7e0 f8a93d18 f8a93d0c nt!IopCompleteRequest+0x3db
f8a93d1c 8082ffd3 00000000 00000000 00000000 nt!KiDeliverApc+0xb8
f8a93d40 808287b4 00000000 817f45e8 00000000 nt!KiSwapThread+0x9f
f8a93d84 f921085d ffaffcf0 00000000 00000000 nt!KeWaitForSingleObject+0x2e0
f8a93dac 809418f4 ffaffccc 00000000 00000000 DISK!RunThread+0x1d [l:\vdi\client\windows\nbddisk\DISK\DISK.cpp @ 161]
f8a93ddc 80887f4a f9210840 ffaffccc 00000000 nt!PspSystemThreadStartup+0x2e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

FOLLOWUP_IP:
STOR_NBD!RunThread+1d [l:\vdi\client\windows\nbddisk\DISK\DISK.cpp @ 161]
f921085d 8b5508 mov edx,dword ptr [ebp+8]

FAULTING_SOURCE_CODE:
157: KeWaitForSingleObject(&AdaVx->RunThreadEvent, Executive, KernelMode, FALSE, NULL);
158:
159: SRB_VX *SrbVx;
160:

161: while((SrbVx = (SRB_VX *)ExInterlockedRemoveHeadList(&AdaVx->RunQueue, &AdaVx->RunQueueLock)) != NULL)
162: StartSRB(AdaVx, SrbVx->LunVx, SrbVx->Srb);
163:
164: }while(!AdaVx->RunThreadExit);
165:
166: /*------------------------------------*/

SYMBOL_STACK_INDEX: 6

SYMBOL_NAME: DISK!RunThread+1d

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: DISK

IMAGE_NAME: DISK.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 49c75c98

STACK_COMMAND: .cxr 0xfffffffff8a9384c ; kb

FAILURE_BUCKET_ID: 0x7E_DISK!RunThread+1d

BUCKET_ID: 0x7E_DISK!RunThread+1d

Followup: MachineOwner

Hi Imtiaz,

As you said, your IRPs are still getting corrupted, regardless if that corruption shows up as bugcheck C2 or 7E. My advice is the same one I mentioned last week. Here it is again, hopefully this time without the confusing ? characters:

Pool blocks that charged quota have a pointer to the process that allocated
them, immediately after the end of the data stored inside the pool block. In
your case, the pointer to the process was 0x1000000. That can never be the
address of a process object, because 0x1000000 is a user-mode address. So the
end of that pool block is corrupted.

Next step for the corruption below is to run ‘dc fe5d4c10 l28’ to take a look at
the contents of that IRP, and try to figure out how the bogus pointer 0x1000000
ended up there.

Note that you have below a couple more IRPs that were corrupted in the same way,
one at fe5d4928+8 and another at fe5d4f60+8. So you can use ‘!irp fe5d4928+8’
and ‘!irp fe5d4f60+8’ to take a look at these other corrupted IRPs.

If you really need to force ‘!irp fe5d4c10+8’ to work too, you can display
what’s left of that freed IRP using ‘dt nt!_IRP fe5d4c10+8’ and then write the
value 6 (IO_TYPE_IRP) in the Type field of the freed IRP. Then I expect that
!irp will start working for this freed IRP.

Verifier is allocating IRPs in a different way. For example, Verifier might try to
allocate them from Special Pool, and doesn’t use a similar Quota pointer at the
end of the pool block. That’s why the bugcheck 0xC2 doesn’t happen with Verifier
enabled. I am still surprised that Verifier didn’t expose this problem in a
different way though. So after you figure out the corruption, it would be great
if you can explain the nature of the corruption to me. That way maybe we can
find a way to expose that corruption better with Verifier, in the future. BTW,
what Windows version are you using?

fe5d4888 size: 40 previous size: 20 (Allocated) Ntfr
fe5d48c8 size: 30 previous size: 40 (Allocated) Even (Protected)
fe5d48f8 size: 30 previous size: 30 (Allocated) Even (Protected)
fe5d4928 size: a0 previous size: 30 (Allocated) Irp Process: 1000000
fe5d49c8 size: 98 previous size: a0 (Allocated) File (Protected)
fe5d4a60 size: a0 previous size: 98 (Allocated) MmIn
fe5d4b00 size: 8 previous size: a0 (Free) Mmdl
fe5d4b08 size: 28 previous size: 8 (Allocated) NtFs
fe5d4b30 size: e0 previous size: 28 (Allocated) MmCi
*fe5d4c10 size: a0 previous size: e0 (Free ) *Irp Process: 1000000
Pooltag Irp : Io, IRP packets
fe5d4cb0 size: a0 previous size: a0 (Allocated) Irp
fe5d4d50 size: 10 previous size: a0 (Free) CcPL
fe5d4d60 size: 98 previous size: 10 (Allocated) File (Protected)
fe5d4df8 size: 28 previous size: 98 (Allocated) Ntfn
fe5d4e20 size: a0 previous size: 28 (Allocated) MmCi
fe5d4ec0 size: a0 previous size: a0 (Allocated) Irp
fe5d4f60 size: a0 previous size: a0 (Allocated) Irp Process: 1000000

Dan

Iocsq’s are for IRPs that will be canceled. This is an irp you created yourself, no one will be canceling it so there is no need to go with the complexity of a csq with a cancel routine. A simple LIST_ENTRY queue heahd and using Tail.Overlay.ListEntry to enqueue the irp on it will be good enough.

Your problem is that the IoCSQ marks the irp as pending with a call to IoMarkIrpPending. IoMarkIrpPending uses the currents tack location

FORCEINLINE
VOID
IoMarkIrpPending(
__inout PIRP Irp
)
/*++
Routine Description:

This routine marks the specified I/O Request Packet (IRP) to indicate
that an initial status of STATUS_PENDING was returned to the caller.
This is used so that I/O completion can determine whether or not to
fully complete the I/O operation requested by the packet.

Arguments:

Irp - Pointer to the I/O Request Packet to be marked pending.

Return Value:

None.

–*/
{
IoGetCurrentIrpStackLocation( (Irp) )->Control |= SL_PENDING_RETURNED;
}

For self allocated IRPs, there is *NO CURRENT STACK LOCATION* (it is invalid). Stack locations are allocated after the IRP itself and grow backwards so the current stack location is one past the end of the IRP allocation. This means that you are touching memory that does not belong to you that is immediate after the irp.

Remove the csq from the equation and you are good to go until your next bug shows up :slight_smile:

d

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@yahoo.com
Sent: Tuesday, March 24, 2009 12:15 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] Virtual Storport Miniport :: Bugcheck C2 - d

Hi guys,

I am a colleague of Charansing and am working with him on this project. We are investigating this bug together. After few investigation we have come to a conclusion that the IRP which we form, for TCP communication using TDI API call “TdiBuildInternalDeviceControlIrp” is getting corrupted somewhere. This IRP is freed by the IO manager during which we were getting the above bug check.

Now i made some changes to code and the bug has changed from C2 to 7E. Following is the abstract of code where i am building the IRP to be sent and queuing it.

pMemVars = (PMEM_C)ExAllocatePoolWithTag(NonPagedPool,sizeof(MEM_C),‘memp’);

pTdiDeviceObj = IoGetRelatedDeviceObject(pConnFileObj);

pMemVars->pStatusBlk = pStatusBlock = (PIO_STATUS_BLOCK)ExAllocatePoolWithTag(NonPagedPool,sizeof(IO_STATUS_BLOCK),‘stbk’);

RtlZeroMemory(pStatusBlock,sizeof(IO_STATUS_BLOCK));

pSendIrp = TdiBuildInternalDeviceControlIrp(TDI_SEND,
pTdiDeviceObj,
pConnFileObj,
&pConnCntx->SendQueue.IoComplete,
pStatusBlock);

if(!pSendIrp)
{
DbgPrint(“IRP Allocation Failed!”);
return STATUS_INSUFFICIENT_RESOURCES;
}

if(pCmdCntx->CommandCode == NBD_CMD_WRITE)
{
pMemVars->pPacket = pPacketData = ExAllocatePoolWithTag(PagedPool,PacketLength + pCmdCntx->Length,‘dnes’);
PacketLength += pCmdCntx->Length;
}
else
{
pMemVars->pPacket = pPacketData = ExAllocatePoolWithTag(PagedPool,PacketLength,‘dnes’);
}

//check if packet data was allocated properly or not

if(!pPacketData)
{
DbgPrint(“Packet Data Allocation Failed!”);
return STATUS_INSUFFICIENT_RESOURCES;
}

pSendMdl = IoAllocateMdl(pPacketData, PacketLength, NULL, FALSE, pSendIrp);

if(!pSendMdl)
{
NBDProtErrDbgPrint(“MDL Allocation Failed!”);
return STATUS_INSUFFICIENT_RESOURCES;
}

__try
{
MmProbeAndLockPages(pSendMdl, KernelMode, IoModifyAccess);
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
IoFreeMdl(pSendMdl);
pSendMdl = NULL;
}

PNBD_REQ_PACKET reqpacket = (PNBD_REQ_PACKET)pPacketData;

pCmdCntx->ExpectedSequenceNumber = reqpacket->Handle = pConnCntx->CurrentSequenceNumber;
pConnCntx->CurrentSequenceNumber++;

reqpacket->Length = Length;
reqpacket->offset = Offset;
reqpacket->magic = NBD_REQUEST_MAGIC;
reqpacket->OperationType = Command;

if(!pData)
{
DbgPrint(“Data Buffer Received is Empty!”);
return STATUS_INSUFFICIENT_RESOURCES;
}

if (NBD_CMD_WRITE == Command)
{
RtlCopyMemory((PVOID)((char*)pPacketData+sizeof(NBD_REQ_PACKET)),
(PVOID)((char*)pData+sizeof(SCSI_REQ)),
SwapEndianness(Length));
}

pCmdCntx->pPacketBuffer = pPacketData;

TdiBuildSend(pSendIrp, pTdiDeviceObj, pConnFileObj, NULL, NULL,
pSendMdl, 0, PacketLength);

if(NULL == pCmdCntx)
{
DbgPrint(“Null context!!!”);
NBDPROTASSERT(0);
}

KeAcquireSpinLock(&pConnCntx->SendLock,&OldIrql);
IoCsqInsertIrp(&pConnCntx->SendQueue.CancelSafeQueue,pSendIrp,NULL);
ExInterlockedInsertTailList(&pConnCntx->MemoryFreeList,pMemVars,&pConnCntx->MemoryLock);
ExInterlockedInsertTailList(&pConnCntx->ContextList,pCmdCntx,&pConnCntx->ContextLock);
KeSetEvent(&pConnCntx->SendQueue.NBDQueueEvent,0,FALSE);
KeReleaseSpinLock(&pConnCntx->SendLock,OldIrql);

The following thread removes the IRPs from the queue and sends it to TCP

do
{
ServiceIrp = IoCsqRemoveNextIrp(&pConnCntx->SendQueue.CancelSafeQueue,NULL);

if(ServiceIrp != NULL)
{

ntStatus = IoCallDriver(NbdDeviceObj, ServiceIrp);

if(STATUS_PENDING ==(ntStatus))
{
KeWaitForSingleObject(&pConnCntx->SendQueue.IoComplete, Executive, KernelMode, FALSE,NULL);

if((pMemToFree = (PMEM_C)ExInterlockedRemoveHeadList(&pConnCntx->MemoryFreeList,&pConnCntx->MemoryLock))!=NULL)
{
ExFreePoolWithTag(pMemToFree->pPacket,‘dnes’);
ExFreePoolWithTag(pMemToFree->pStatusBlk,‘stbk’);
ExFreePoolWithTag(pMemToFree,‘memp’);
}
}

}
}while(ServiceIrp != NULL);

The following is the analysis of the bugcheck that i receive. The analysis points to a part in code, but I dont think it has much relevance. The bug only occurs when verifier is off. With verifier on everything work fine. And it seems to appear for windows 2003 only. The driver works flawlessly for windows XP.

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 8088c753, The address that the exception occurred at
Arg3: f8a93b50, Exception Record Address
Arg4: f8a9384c, Context Record Address

Debugging Details:

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at “0x%08lx” referenced memory at “0x%08lx”. The memory could not be “%s”.

FAULTING_IP:
nt!ExFreePoolWithTag+41d
8088c753 803803 cmp byte ptr [eax],3

EXCEPTION_RECORD: f8a93b50 – (.exr 0xfffffffff8a93b50)
ExceptionAddress: 8088c753 (nt!ExFreePoolWithTag+0x0000041d)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000000
Parameter[1]: 01000000
Attempt to read from address 01000000

CONTEXT: f8a9384c – (.cxr 0xfffffffff8a9384c)
eax=01000000 ebx=808a7bc0 ecx=000317b0 edx=00000e0e esi=ffbcc798 edi=00000014
eip=8088c753 esp=f8a93c18 ebp=f8a93c60 iopl=0 nv up ei pl nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010206
nt!ExFreePoolWithTag+0x41d:
8088c753 803803 cmp byte ptr [eax],3 ds:0023:01000000=??
Resetting default scope

DEFAULT_BUCKET_ID: DRIVER_FAULT

PROCESS_NAME: System

CURRENT_IRQL: 1

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at “0x%08lx” referenced memory at “0x%08lx”. The memory could not be “%s”.

READ_ADDRESS: 01000000

BUGCHECK_STR: 0x7E

IRP_ADDRESS: ffbcc7a0

LAST_CONTROL_TRANSFER: from 8081bab1 to 8088c753

STACK_TEXT:
f8a93c60 8081bab1 ffbcc7a0 00000000 00000000 nt!ExFreePoolWithTag+0x41d
f8a93c7c 80820ef3 ffbcc7a0 ffbcc7e0 817f45e8 nt!IopFreeIrp+0xe9
f8a93ccc 8082c34e ffbcc7e0 f8a93d18 f8a93d0c nt!IopCompleteRequest+0x3db
f8a93d1c 8082ffd3 00000000 00000000 00000000 nt!KiDeliverApc+0xb8
f8a93d40 808287b4 00000000 817f45e8 00000000 nt!KiSwapThread+0x9f
f8a93d84 f921085d ffaffcf0 00000000 00000000 nt!KeWaitForSingleObject+0x2e0
f8a93dac 809418f4 ffaffccc 00000000 00000000 DISK!RunThread+0x1d [l:\vdi\client\windows\nbddisk\DISK\DISK.cpp @ 161]
f8a93ddc 80887f4a f9210840 ffaffccc 00000000 nt!PspSystemThreadStartup+0x2e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

FOLLOWUP_IP:
STOR_NBD!RunThread+1d [l:\vdi\client\windows\nbddisk\DISK\DISK.cpp @ 161]
f921085d 8b5508 mov edx,dword ptr [ebp+8]

FAULTING_SOURCE_CODE:
157: KeWaitForSingleObject(&AdaVx->RunThreadEvent, Executive, KernelMode, FALSE, NULL);
158:
159: SRB_VX *SrbVx;
160:

161: while((SrbVx = (SRB_VX *)ExInterlockedRemoveHeadList(&AdaVx->RunQueue, &AdaVx->RunQueueLock)) != NULL)
162: StartSRB(AdaVx, SrbVx->LunVx, SrbVx->Srb);
163:
164: }while(!AdaVx->RunThreadExit);
165:
166: /*------------------------------------*/

SYMBOL_STACK_INDEX: 6

SYMBOL_NAME: DISK!RunThread+1d

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: DISK

IMAGE_NAME: DISK.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 49c75c98

STACK_COMMAND: .cxr 0xfffffffff8a9384c ; kb

FAILURE_BUCKET_ID: 0x7E_DISK!RunThread+1d

BUCKET_ID: 0x7E_DISK!RunThread+1d

Followup: MachineOwner


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

Hi Doron,

Ur awesome man ! That was precisely the problem. Actually I had implemented the cancel safe queue framework as a part of an earlier design which was meant to queue IRPs coming from the SCSI driver. After a while I ended up queuing my own IRPs too with it. I had completely forgotten about the framework marking the IRPs pending. Ne ways all thanks to you , the driver works just fine now.

Also there are a few questions troubling me though.

  1. Why didnt the system bug check on windows XP without verifier ? (PS : the drivers were stress tested for about 16 hours without verifier on windows XP and with no problems) The problem was only encountered with windows 2003.
  2. Even if the verifier does allocation in a different way than the normal one, shouldnt it have detected the corruption as a part of overrun or underrun checks ?? Suprisingly the systems worked flawlessly with verifier on.
    Dan I hope this clears out the nature of the problem to you and we can expect verifier to catch such bugs in future. :slight_smile:

Ne ways thanx a lot all of you guys. You guys have been really great.

Thanx again Doron for making our life bit less buggy !! :smiley:

  1. b/c there was not sufficient verifier infrastructure to reliably catch the mistake. You were probably corrupting another irp and were getting away with it 99.99% of the time. Trust me, if you tested on xp long enough you would have blown up in some weird fashion

  2. verifier is not all powerful. On some OS releases it does not hook in the irp allocation path so there is no way to do the under/overrun checking.

What it boils down to is that verifier evolves over OS releases, what was not done on a previous release may now be checked on the current one (like getting chk behavior for io remove locks even though you built a fre version of your driver).

d

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@yahoo.com
Sent: Monday, March 30, 2009 12:33 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] Virtual Storport Miniport :: Bugcheck C2 - d

Hi Doron,

Ur awesome man ! That was precisely the problem. Actually I had implemented the cancel safe queue framework as a part of an earlier design which was meant to queue IRPs coming from the SCSI driver. After a while I ended up queuing my own IRPs too with it. I had completely forgotten about the framework marking the IRPs pending. Ne ways all thanks to you , the driver works just fine now.

Also there are a few questions troubling me though.

  1. Why didnt the system bug check on windows XP without verifier ? (PS : the drivers were stress tested for about 16 hours without verifier on windows XP and with no problems) The problem was only encountered with windows 2003.
  2. Even if the verifier does allocation in a different way than the normal one, shouldnt it have detected the corruption as a part of overrun or underrun checks ?? Suprisingly the systems worked flawlessly with verifier on.
    Dan I hope this clears out the nature of the problem to you and we can expect verifier to catch such bugs in future. :slight_smile:

Ne ways thanx a lot all of you guys. You guys have been really great.

Thanx again Doron for making our life bit less buggy !! :smiley:


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

In addition to what Doron said, even the latest Verifier version (i.e. the Win7 version) would have masked this buffer overrun. :frowning: I have just added to our TODO list an item to try to expose this kind of driver problem in the future - maybe at the time when the driver makes the incorrect IoCsqInsertIrp call.

I believe Verifier masked this buffer overrun because of a combination of all of these factors:

  1. Special Pool was enabled, so Verifier used Special Pool to allocate IRPs.

  2. This driver was using an odd number of stack locations.

  3. sizeof (IO_STACK_LOCATION) = 0x24 is not a multiple of the pool block alignment on this system (0x8).

  4. Special pool is using a byte value that is always an odd number for filling the memory space up to the pool alignment.

  5. The value of SL_PENDING_RETURNED is 0x1.

If we make changes to any of these factors, Verifier will have a better chance to catch this buffer overrun.

Obviously we cannot change #3 or #5.

Changing #1 is usually not a good idea.

#2 depends on the luck of a particular driver. Your driver was unlucky, because it didn’t use an even number of stack locations.

#4 is typically useful, because our experience has been that overrunning memory buffers by writing even values is more common than writing an even number over the end.

So we will look into catching this problem without changing any of these 5 conditions - e.g. by starting to verify calls to IoCsqInsertIrp.

Thanks for bringing this up!

As an example, here’s an IRP with just 1 stack location, allocated from Special Pool:

1: kd> dc 89374F68-20 [DMIHAI-DEV1 2:08 PM]
89374f48 19191919 19191919 19191919 19191919 …
89374f58 19191919 19191919 19191919 19191919 …
89374f68 00940006 00000000 00060000 00000000 …
89374f78 94814f78 86740ab4 00000000 00000000 xO…t…
89374f88 02010000 80000000 a615fbbc a615fbac …
89374f98 00000000 00000000 00000000 00000000 …
89374fa8 00000000 00000000 00000000 00000000 …
89374fb8 86740868 00000000 00000000 00000000 h.t…
1: kd> dc
89374fc8 89374ffc 00000000 00000000 00000000 .O7…
89374fd8 0000000f 00000000 00000000 12341234 …4.4.
89374fe8 00000000 00000000 00000000 00000000 …
89374ff8 00000000 19191919 ??? ??? …???
89375008 ??? ??? ??? ??? ???
89375018 ??? ??? ??? ??? ???
89375028 ??? ??? ??? ??? ???
89375038 ??? ??? ??? ??? ???

Marking this IRP pending will just set a bit to 0x1 inside the DWORD at address 89374ffc. Since that bit is already set to 1 by Special Pool, that will not corrupt anything, and we will miss the driver bug. However, a lucky driver using 2 stack locations (or another even number) will step over the next memory page and crash immediately, taking advantage of the Special Pool that way.

Dan