Given the state of sanity you’ve reached, I thought my “guess” might give
you yet another path to explore during your search. I am actually not
too familiar with this particular area, but…
I recall that some time ago, people were corrupting ACLs by modifying
them directly. The problem was that some ACLs are hard-coded and stored
in only one location because the kernel uses them so often. Modifying
them directly (rather than through the appropriate APIs) produced
strange results.
I wish I could find the posts that described this. Anyway, it’s worth
checking whether you modify your device’s ACL at any point, as the crash
dump is coming from “nt!RtlpInheritAcl+0x28”, which certainly sounds like
someone is corrupting the ACL. Driver Verifier enabled on your device
driver won’t catch this, since the memory is allocated by the system for
the device object and related bits. You could always try enabling
Verifier for *everything*, as follows:
verifier.exe /flags 0xFB /all
Note that this will seriously slooooowwww down the system, so it’s
probably only useful when you’ve got an “impossible” issue like this one
to deal with.
Good luck, and know that we’ve all been there at some point…
P.S. – This type of bug is commonly referred to as a “Heisenbug”. See
http://www.jargon.net/jargonfile/h/heisenbug.html
-----Original Message-----
From: Ralph Shnelvar [mailto:xxxxx@dos32.com]
Sent: Monday, February 07, 2005 2:52 PM
Subject: RE: IRQL = 0xFF mystery
Loren and all:
On Sun, 6 Feb 2005 04:59:46 -0800, you wrote:
> Having irql = high_level
Please clear this up for me. I thought IRQLs were limited to 0-31.
See, for instance, the MS white paper:
http://www.microsoft.com/whdc/driver/kernel/IRQL.mspx
> is what you would expect once you are into a crash situation. It
> probably was something more reasonable while driver entry and add
> device were running. It doubtless got bumped as the result of the
> crash.
> Just from what you describe, I’d bet a jelly donut that you are
> allocating an incorrect size for your device extension (or possibly a
> driver extension?)
I have searched high and low and have been unable to determine how one
sets the size of the driver extension. I see
IoAllocateDriverObjectExtension but that seems to be unrelated to
PDRIVER_OBJECT->PDRIVER_EXTENSION.
The DDK says in the comments to PDRIVER_EXTENSION “Note: any new shared
fields get added here”. David Craig tells me that this is likely to be
a comment for the benefit of MS programmers rather than us members of
the mere hoi polloi.
> and are walking off the end of it during add device.
Anything and everything is possible.
But I have turned on special pool. Wouldn’t that catch that sort of
thing?
> For instance, do you have a table of device pointers in the driver
> extension that you construct during add device for each device? If so,
> what are you using for an index as you build that entry, and do you
> check it against the number of slots you allocated?
I am modifying someone else’s code. I put ASSERTs in there to check for
that sort of thing. I put magic numbers into various data structures to
validate that the thing I am getting back via IoGetDriverObjectExtension
is the same block that I think I’m putting in via
IoAllocateDriverObjectExtension.
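A minimal user-mode sketch of the two checks described above: a signature (“magic number”) stamped into the driver-object extension, and a table index checked against the slot count before every write. All names here (DRIVER_EXT, DRVEXT_MAGIC, MAX_DEVICES, the drvext_* helpers) are hypothetical; in a real driver the block would come from IoAllocateDriverObjectExtension and be looked up with IoGetDriverObjectExtension, which this sketch does not call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature; stored little-endian it reads "DEXT" in memory. */
#define DRVEXT_MAGIC 0x54584544u
#define MAX_DEVICES  4u             /* hypothetical slot count */

/* Hypothetical layout of the per-driver extension block. */
typedef struct {
    uint32_t magic;                 /* stamped once at allocation time */
    size_t   count;                 /* number of slots in use */
    void    *devices[MAX_DEVICES];  /* table built up during AddDevice */
} DRIVER_EXT;

/* Stamp a freshly allocated block (in a driver: right after the
   extension is allocated). */
void drvext_init(DRIVER_EXT *ext)
{
    memset(ext, 0, sizeof *ext);
    ext->magic = DRVEXT_MAGIC;
}

/* Nonzero if the block looks like one drvext_init() produced
   (in a driver: check whatever the extension lookup returns). */
int drvext_valid(const DRIVER_EXT *ext)
{
    return ext != NULL && ext->magic == DRVEXT_MAGIC &&
           ext->count <= MAX_DEVICES;
}

/* Append a device pointer, refusing to write past the last slot.
   Returns 0 on success, -1 for a bad block or a full table. */
int drvext_add_device(DRIVER_EXT *ext, void *dev)
{
    if (!drvext_valid(ext))
        return -1;
    if (ext->count >= MAX_DEVICES)  /* index checked against slot count */
        return -1;
    ext->devices[ext->count++] = dev;
    assert(ext->count <= MAX_DEVICES);  /* debug-build invariant, like ASSERT */
    return 0;
}
```

A signature only proves the pointer lands on a block you stamped; it will not catch a write that overruns into a neighboring allocation, which is the case special pool is designed to trap.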
> Or you have a device extension and you are loading data past the size
> you allocated. Maybe you allocated space for a unicode string in
> characters rather than bytes?
I’ve been staring at this code for almost 4 weeks now. At this point if
you asked me how many fingers I have on my right hand, I’d say something
like “I think it’s five. I count five. I’m pretty sure it’s five, but I
wouldn’t swear to it.”
Similarly, I’m pretty sure that all unicode string lengths are in bytes
rather than chars.
I have determined from MS documentation that when passing the number of
fingers in a hand to an MS kernel routine, “fingers” is measured in
units of knuckles.
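To illustrate the characters-versus-bytes trap raised above, here is a user-mode sketch that mirrors the kernel UNICODE_STRING layout, in which Length and MaximumLength are byte counts that exclude any terminating NUL. The USTR and WCHAR16 types and ustr_alloc_copy() are hypothetical stand-ins, not kernel APIs; WCHAR16 exists because the kernel’s WCHAR is 16 bits, while wchar_t on many non-Windows compilers is 32 bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t WCHAR16;  /* stand-in for the kernel's 16-bit WCHAR */

/* User-mode mirror of the UNICODE_STRING layout: both lengths are
   byte counts, not character counts, and exclude any terminating NUL. */
typedef struct {
    uint16_t Length;         /* bytes currently used */
    uint16_t MaximumLength;  /* bytes available in Buffer */
    WCHAR16 *Buffer;
} USTR;

/* Allocate room for max_chars characters and copy src_chars characters
   from src. Returns 0 on success, -1 on bad sizes or allocation failure. */
int ustr_alloc_copy(USTR *us, const WCHAR16 *src, size_t src_chars,
                    size_t max_chars)
{
    size_t max_bytes = max_chars * sizeof(WCHAR16);  /* characters -> bytes */
    size_t src_bytes = src_chars * sizeof(WCHAR16);

    /* Reject empty or oversized requests; the length fields are 16-bit. */
    if (max_chars == 0 || src_chars > max_chars || max_bytes > UINT16_MAX)
        return -1;

    /* The suspected bug would be malloc(max_chars): half the space needed,
       so the memcpy below could write past the end of the block. */
    us->Buffer = malloc(max_bytes);
    if (us->Buffer == NULL)
        return -1;
    if (src_bytes != 0)
        memcpy(us->Buffer, src, src_bytes);
    us->Length = (uint16_t)src_bytes;
    us->MaximumLength = (uint16_t)max_bytes;
    assert(us->Length <= us->MaximumLength);
    return 0;
}
```

For a four-character name copied into an eight-character buffer, Length comes out as 8 and MaximumLength as 16: every size the structure stores is already multiplied by sizeof(WCHAR16).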
After further debugging, I am now pretty sure that the bug is happening
before my AddDevice routine is called.
I’m pretty sure of it because of the trace information I leave in
memory, which lets me see exactly what was happening when I get a crash
dump.
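One common way to leave trace information in memory for post-mortem inspection is a fixed-size ring buffer held in a global array, which can then be examined in the crash dump by symbol name. The sketch below is an assumption about how such a buffer might look, not the code described in this thread; TRACE_SLOTS, TRACE_ENTRY, g_trace, and trace() are hypothetical, and a real driver that can be entered concurrently would bump the sequence number with InterlockedIncrement rather than a plain increment.

```c
#include <assert.h>
#include <stdint.h>

#define TRACE_SLOTS 64u  /* hypothetical size; must be a power of two */

static_assert((TRACE_SLOTS & (TRACE_SLOTS - 1)) == 0,
              "TRACE_SLOTS must be a power of two so masking wraps");

typedef struct {
    uint32_t  seq;   /* sequence number: shows ordering after wraparound */
    uint32_t  code;  /* hypothetical event code, e.g. 1 = entered AddDevice */
    uintptr_t arg;   /* one value worth recording with the event */
} TRACE_ENTRY;

/* Globals, so a debugger can locate them by symbol in a crash dump;
   volatile so the compiler keeps every store. */
static volatile TRACE_ENTRY g_trace[TRACE_SLOTS];
static volatile uint32_t    g_trace_seq;

/* Record one event, overwriting the oldest slot once the buffer is full.
   Not safe for concurrent callers: the increment is not atomic. */
void trace(uint32_t code, uintptr_t arg)
{
    uint32_t seq = g_trace_seq++;
    volatile TRACE_ENTRY *e = &g_trace[seq & (TRACE_SLOTS - 1)];

    e->seq  = seq;
    e->code = code;
    e->arg  = arg;
}
```

The highest seq value in the array marks the most recent event, so the last thing the code did before the crash can be read straight from the dump.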
Unfortunately, sometimes I don’t get a crash dump and the machine merely
hangs.
For the curious and those interested in self-flagellation, a stack trace
of the current crash is appended below.
What is of interest is that the crash happens about 25% of the time
inside of ExAllocatePoolWithTag with exactly the same error code (c5).
Of course, when the debugger is enabled in boot.ini (i.e.,
/debugport=1394 /channel=33), everything works perfectly. No crashes. I
don’t even have to have WinDbg running; all I need is to have the
debugger enabled.
If I step through the code so that I replicate the stack, below, I can
see that the IRQL is 0.
To repeat from a previous post: This bug only shows up on XP Home when
the debugger is disabled. It does not show up in XP PRO SP2 under any
circumstance.
Loren
I just got a crash dump that shows the following:
Args to Child
badb0d00 85627020 000000f5 nt!KiTrap0E+0x238 (FPO: [0,0] TrapFrame @ f78d65c4)
00000001 00000000 63416553 nt!ExAllocatePoolWithTag+0x863 (FPO: [Non-Fpo])
e10038ac 00000000 00000000 nt!RtlpInheritAcl+0x28 (FPO: [Non-Fpo])
e1003898 e16820b0 f78d67e8 nt!RtlpNewSecurityObject+0x485 (FPO: [Non-Fpo])
e1003898 00000000 f78d67e8 nt!SeAssignSecurity+0x4f (FPO: [Non-Fpo])
f78d6810 e1003898 849e7690 nt!ObAssignSecurity+0x35 (FPO: [Non-Fpo])
849e7690 f78d6810 00000000 nt!ObInsertObject+0x47d (FPO: [Non-Fpo])
f78d69c0 8067f15b 00220020 nt!IoCreateDriver+0x199 (FPO: [Non-Fpo])
84a7fdc0 00000003 f78d6a00 nt!VfDriverAttachFilter+0x2d (FPO: [Non-Fpo])
84a7fdc0 84b82cc0 edd5f9f4 nt!VfDevObjPreAddDevice+0x33 (FPO: [Non-Fpo])
edd5f9f4 00000004 00000001 nt!PpvUtilCallAddDevice+0x29 (FPO: [Non-Fpo])
00000000 02000001 00000000 nt!PipCallDriverAddDevice+0x3b9 (FPO: [Non-Fpo])
84a801b0 00000001 00000000 nt!PipProcessDevNodeTree+0x1a4 (FPO: [Non-Fpo])
00000003 805625c0 8056b4fc nt!PiRestartDevice+0x80 (FPO: [Non-Fpo])
00000000 00000000 855f7b30 nt!PipDeviceActionWorker+0x168 (FPO: [Non-Fpo])
00000000 00000000 00000000 nt!ExpWorkerThread+0xef (FPO: [Non-Fpo])
804e2912 00000001 00000000 nt!PspSystemThreadStartup+0x34 (FPO: [Non-Fpo])
00000000 00000000 00000000 nt!KiThreadStartup+0x16
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
DRIVER_CORRUPTED_EXPOOL (c5)
An attempt was made to access a pageable (or completely invalid) address
at an interrupt request level (IRQL) that is too high. This is caused
by drivers that have corrupted the system pool. Run the driver verifier
against any new (or suspect) drivers, and if that doesn’t turn up the
culprit, then use gflags to enable special pool.
Arguments:
Arg1: e10f5000, memory referenced
Arg2: 000000ff, IRQL
Arg3: 00000001, value 0 = read operation, 1 = write operation
Arg4: 805524d5, address which referenced memory
Debugging Details:
DEFAULT_BUCKET_ID: DRIVER_FAULT
BUGCHECK_STR: 0xC5
LAST_CONTROL_TRANSFER: from 80575b64 to 805524d5
STACK_TEXT:
f78d668c 80575b64 00000001 00000000 63416553 nt!ExAllocatePoolWithTag+0x863
f78d66b4 805757e1 e10038ac 00000000 00000000 nt!RtlpInheritAcl+0x28
f78d6790 80575a6c e1003898 e16820b0 f78d67e8 nt!RtlpNewSecurityObject+0x485
f78d67bc 80575df0 e1003898 00000000 f78d67e8 nt!SeAssignSecurity+0x4f
f78d67ec 80575f8c f78d6810 e1003898 849e7690 nt!ObAssignSecurity+0x35
f78d68d4 805b39b5 849e7690 f78d6810 00000000 nt!ObInsertObject+0x47d
f78d69b0 8067f1d3 f78d69c0 8067f15b 00220020 nt!IoCreateDriver+0x199
f78d69cc 8067d760 84a7fdc0 00000003 f78d6a00 nt!VfDriverAttachFilter+0x2d
f78d69dc 80521b3b 84a7fdc0 84b82cc0 edd5f9f4 nt!VfDevObjPreAddDevice+0x33
f78d6a00 805a43b3 edd5f9f4 00000004 00000001 nt!PpvUtilCallAddDevice+0x29
f78d6ac8 805a0129 00000000 02000001 00000000 nt!PipCallDriverAddDevice+0x3b9
f78d6d24 806269d9 84a801b0 00000001 00000000 nt!PipProcessDevNodeTree+0x1a4
f78d6d54 8050cce3 00000003 805625c0 8056b4fc nt!PiRestartDevice+0x80
f78d6d7c 804e29d6 00000000 00000000 855f7b30 nt!PipDeviceActionWorker+0x168
f78d6dac 80576b24 00000000 00000000 00000000 nt!ExpWorkerThread+0xef
f78d6ddc 804eed86 804e2912 00000001 00000000 nt!PspSystemThreadStartup+0x34
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16
FOLLOWUP_IP:
nt!KiTrap0E+238
804e106f f7457000000200 test dword ptr [ebp+0x70],0x20000
SYMBOL_STACK_INDEX: 0
FOLLOWUP_NAME: MachineOwner
SYMBOL_NAME: nt!KiTrap0E+238