Re: w7/ 64 crash in my driver, can't see the stack

> While stress-testing my 64-bit driver on an 8-core w7 machine I got a BSOD and am
> not able to see the offending stack. Also, I don't understand why WinDbg
> seems to enter a sort of 32-bit x86 mode (maybe the result of some sort of
> corruption?).
> My driver is an NDIS IM driver derived from the NDIS 6 MUX sample.

I see no sign that it has entered 32-bit mode; the instruction, it is
true, references ecx instead of rcx, but that doesn't much matter:
rcx/ecx holds a NULL-based address (the bugcheck reports address 6),
which most likely represents a bug in your code.

I would suggest sprinkling a few ASSERT(p != NULL) checks (with p replaced
by the relevant variable) at critical places in your code where you assume
that a pointer is non-NULL, and waiting for one to jump out at you.
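A minimal user-mode sketch of that idea, assuming C and using the C library's assert as a stand-in for the kernel's ASSERT (in the WDK headers ASSERT is only active when DBG is nonzero, so it fires in checked/debug builds and compiles away in free builds). The helper name CompareMacAddresses and the 6-byte length are hypothetical. The point is to check each pointer right before it reaches memcmp, so a bad caller trips an assertion with a useful stack instead of faulting inside memcmp+30.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAC_LENGTH 6 /* hypothetical: length of an Ethernet MAC address */

/* Hypothetical helper: returns 1 if the two MAC addresses match, else 0.
   Each pointer is asserted non-NULL immediately before memcmp touches it,
   so a bad caller trips the assertion here instead of faulting in memcmp. */
int CompareMacAddresses(const unsigned char *a, const unsigned char *b)
{
    assert(a != NULL);
    assert(b != NULL);
    return memcmp(a, b, MAC_LENGTH) == 0;
}
```

In the driver itself the same two checks would be written as ASSERT(a != NULL) and ASSERT(b != NULL).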

The message suggests that a 32-bit user-mode program is calling you. Did
you do the suggested .load, and did it change the outcome in any way? And
I mean BEFORE you run !analyze -v, not AFTER.

> Q1) Is there some way to find the offending stack?
> Q2) What is the prompt “16.0: kd>” telling me? (I'm used to one number, for the
> current processor only.)

See my WinDbg output:

The context is partially valid. Only x86 user-mode context is available.
The wow64exts extension must be loaded to access 32-bit state.
.load wow64exts will do this if you haven’t loaded it already.
*******************************************************************************
*
*
* Bugcheck Analysis
*
*
*
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck D1, {6, 2, 0, fffff8800428b120}

*** ERROR: Module load completed but symbols could not be loaded for
.sys
> Probably caused by : Unknown_Image ( +f120 )
>
> Followup: MachineOwner
> ---------
>
> 16.0: kd:x86> < === Fixed symols for MyDriver === >
> 16.0: kd:x86> !analyze -v
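> *******************************************************************************
> *
> *
> * Bugcheck Analysis
> *
> *
> *
> *******************************************************************************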

>
> DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
> An attempt was made to access a pageable (or completely invalid) address at an
> interrupt request level (IRQL) that is too high. This is usually
> caused by drivers using improper addresses.
> If kernel debugger is available get stack backtrace.
> Arguments:
> Arg1: 0000000000000006, memory referenced
> Arg2: 0000000000000002, IRQL
> Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
> Arg4: fffff8800428b120, address which referenced memory

You took an invalid memory access at DISPATCH_LEVEL (IRQL == 2), but it is
clearly a NULL-pointer dereference.

The address 0000000000000006 is a dead giveaway that you passed someone a
NULL pointer.
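Why 6 points at NULL: when code reads a field through a structure pointer that is NULL, the address touched is simply the field's offset. A user-mode C sketch, assuming a hypothetical Ethernet-header layout (nothing in the dump says which structure was involved): the source MAC sits at offset 6, so handing Header->SrcAddr to memcmp with Header == NULL would fault at address 0x6, matching Arg1 and READ_ADDRESS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout used only for illustration: a 14-byte Ethernet header. */
typedef struct EthHeader {
    unsigned char DstAddr[6]; /* offset 0  */
    unsigned char SrcAddr[6]; /* offset 6  */
    unsigned short EtherType; /* offset 12 */
} EthHeader;

/* Address the CPU would touch for a field at fieldOffset when the structure
   pointer is NULL. Computed arithmetically, because actually evaluating
   &((EthHeader *)NULL)->SrcAddr is undefined behavior in C. */
uintptr_t FaultAddressForNullBase(size_t fieldOffset)
{
    const uintptr_t nullBase = 0;
    return nullBase + (uintptr_t)fieldOffset;
}
```

Any small faulting address (well below the first valid page) should be read the same way: a NULL base plus a field offset.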
>
> Debugging Details:
> ------------------
>
>
> READ_ADDRESS: 0000000000000006
>
> CURRENT_IRQL: 0

Note that this conflicts with the IRQL of 2 reported in the bugcheck
arguments, but it doesn't matter. An access through a NULL pointer is always
an error, even at PASSIVE_LEVEL.

>
> FAULTING_IP:
> !memcmp+30
> [d:\winmain\minkernel\crts\crtw32\string\amd64\memcmp.asm @ 99]
> fffff880`0428b120 8a01 mov al,byte ptr [ecx]

You are doing a memcmp against a NULL pointer.

>
> DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
>
> BUGCHECK_STR: 0xD1
>
> LAST_CONTROL_TRANSFER: from 0000000000000000 to 0000000000000000
>
> STACK_TEXT:
> 00000000 00000000 00000000 00000000 00000000 0x0
>
>
> STACK_COMMAND: .bugcheck ; kb
>
> FOLLOWUP_IP:
> <mydriver>!memcmp+30
> [d:\winmain\minkernel\crts\crtw32\string\amd64\memcmp.asm @ 99]
> fffff880`0428b120 8a01 mov al,byte ptr [ecx]
>
> SYMBOL_NAME: !memcmp+30
>
> FOLLOWUP_NAME: MachineOwner
>
> IMAGE_NAME: Unknown_Image
>
> DEBUG_FLR_IMAGE_TIMESTAMP: 0
>
> BUCKET_ID: INVALID_KERNEL_CONTEXT
>
> MODULE_NAME: Unknown_Module
>
> Followup: MachineOwner
> ---------
> 16.0: kd:x86> kv
> ChildEBP RetAddr Args to Child
> WARNING: Frame IP not in any known module. Following frames may be wrong.
> 00000000 00000000 00000000 00000000 00000000 0x0
> 16.0: kd:x86> .load wow64exts
> 16.0: kd:x86> !wow64exts.sw
> Switched to 64bit mode
> 16.0: kd> kv
> Child-SP RetAddr : Args to Child
> : Call Site
> 0000000000000000 0000000000000000 : 0000000000000000 0000000000000000
> 0000000000000000 0000000000000000 : 0x0
> 16.0: kd> r
> rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
> rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
> rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
> r8=0000000000000000 r9=0000000000000000 r10=0000000000000000
> r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
> r14=0000000000000000 r15=0000000000000000
> iopl=0 nv up di pl nz na pe nc
> cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000
> efl=00000000
> 00000000`00000000 ?? ???
>
>
> 16.0: kd> ||
> . 0 64-bit Full kernel dump:
> D:\SharedDir\Test_1\Dumps\BSOD_WhenStressed\MEMORY.DMP
>
> 16.0: kd> !errlog
> errorlog is empty
> 16.0: kd> !dpcs
> CPU Type KDPC Function
> 0: Normal : 0xfffffa80075d4328 0xfffff880014cea00 ndis!ndisInterruptDpc
>
>
> —
> WINDBG is sponsored by OSR
>
> OSR is hiring!! Info at http://www.osr.com/careers
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer
>

xxxxx@flounder.com wrote:

> I see no sign that it has entered 32-bit mode;

Don’t you? It’s certainly clear to me. Besides the unusual notation in
the prompt:

16.0: kd:x86>

Take a look at the faulting instruction:

FAULTING_IP:
!memcmp+30 [d:\winmain\minkernel\crts\crtw32\string\amd64\memcmp.asm @ 99]
fffff880`0428b120 8a01 mov al,byte ptr [ecx]

That instruction has a 64-bit address, and the name of the source file
strongly implies it should be 64-bit code, but it is being interpreted
as a 32-bit instruction. “8A 01” in 64-bit mode is

mov al, byte ptr [rcx]

So windbg is clearly decoding the instructions as if they were 32-bit
code, which cannot be the case, given the IP address. I don't know how
it got so confused.
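Tim's point can be checked by hand-decoding the two bytes. Opcode 0x8A is MOV r8, r/m8; ModRM 0x01 has mod=00, reg=000 (AL) and rm=001, which names ECX under 32-bit addressing and RCX under 64-bit addressing (64-bit mode uses 64-bit addresses by default; only a 0x67 address-size prefix would make it ECX). The C sketch below decodes just that subset; DecodeMovR8Indirect is a made-up name and this is not a general disassembler.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decodes only opcode 0x8A (MOV r8, r/m8) with mod == 00 and no SIB or
   displacement (rm != 100 and rm != 101). Prefixes (REX, 0x67) are not
   handled. Returns 1 and writes the text into out on success, else 0. */
int DecodeMovR8Indirect(const unsigned char *code, int is64BitMode,
                        char *out, size_t outSize)
{
    static const char *reg8[8] = {"al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"};
    static const char *base32[8] = {"eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"};
    static const char *base64[8] = {"rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"};
    unsigned char modrm;
    unsigned mod, reg, rm;

    if (code[0] != 0x8A)
        return 0;
    modrm = code[1];
    mod = (modrm >> 6) & 3; /* addressing form */
    reg = (modrm >> 3) & 7; /* destination 8-bit register */
    rm = modrm & 7;         /* base register of the memory operand */
    if (mod != 0 || rm == 4 || rm == 5)
        return 0;

    /* Same bytes, different base register name depending on address size. */
    snprintf(out, outSize, "mov %s,byte ptr [%s]", reg8[reg],
             is64BitMode ? base64[rm] : base32[rm]);
    return 1;
}
```

Decoding 8A 01 with is64BitMode set to 0 gives WinDbg's [ecx] text; with it set to 1 it gives [rcx], which is what the CPU actually executed at that fffff880`... address.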


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Cool crash dump :-)

The bugcheck code seems reasonable enough, so I’m inclined to believe that
you do have a NULL pointer dereference. The dump does appear to be seriously
corrupted though.

Searching the NDIS 6 MUX sample, I see only one call to memcmp, made while
walking an internal list of VELANs in PtFindVElan. If I were you, I would try
to get my structures back from the dump and recreate the processing of that
routine (the list head is part of the PADAPT, which maybe you can reconstruct
with the !ndiskd extensions?). Maybe the dump is intact enough to get this far,
and you can see whether your structures are corrupt or freed.
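A rough sketch of the kind of lookup Scott describes, assuming C and a simple singly linked list. The VElan and Adapter types, their field names and FindVElanByMac are hypothetical and are not the MUX sample's actual code; the sketch only shows where a bad adapter, address or list entry would end up inside memcmp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAC_LENGTH 6 /* hypothetical: Ethernet MAC address length */

/* Hypothetical stand-in for a per-VELAN record; not the sample's layout. */
typedef struct VElan {
    struct VElan *Next;
    unsigned char CurrentAddress[MAC_LENGTH];
} VElan;

/* Hypothetical adapter holding the list head, loosely modeled on a PADAPT. */
typedef struct Adapter {
    VElan *VElanList;
} Adapter;

/* Returns the first VELAN whose MAC matches macAddress, or NULL.
   Guards against a NULL adapter or address; a freed entry that is still
   linked into the list cannot be detected this way. */
VElan *FindVElanByMac(const Adapter *adapter, const unsigned char *macAddress)
{
    VElan *entry;

    if (adapter == NULL || macAddress == NULL)
        return NULL;

    for (entry = adapter->VElanList; entry != NULL; entry = entry->Next) {
        if (memcmp(entry->CurrentAddress, macAddress, MAC_LENGTH) == 0)
            return entry;
    }
    return NULL;
}
```

Reconstructing the real list from the dump and walking it by hand with the same logic should show which entry, if any, carries the bad pointer.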

-scott
OSR


Hi all,
Thanks for the comments and suggestions.
@Scott: this driver has a few more memcmp calls, and also memcpy calls on the
same data, than the original sample, so I will inspect them very closely.
And yes, I was able to find some structures in the dump; one of the _ADAPT structs had this:
+0x1cc OutstandingSends : 0x902
I believe this is rather high (0x902 = 2306), and maybe we have a low-resources situation.

Here is a curiosity in this dump:
“Microsoft (R) Windows Debugger Version 6.12.0002.633 AMD64”
enters this x86 mode by itself when opening the dump.

16.0: kd:x86> .load wow64exts
16.0: kd:x86> !wow64exts.sw
The context is partially valid. Only x86 user-mode context is available.
Switched to 64bit mode
16.0: kd> .echo “try to switch back again”
16.0: kd> !wow64exts.sw
The current thread doesn’t have an x86 context.

I can see the other CPUs' stacks:

16.0: kd> k
Child-SP RetAddr Call Site
0000000000000000 0000000000000000 0x0
16.0: kd> ~1s
16.1: kd> k
Child-SP RetAddr Call Site
fffff8800311bb58 fffff800030d7f09 intelppm+0x2c61
fffff8800311bb60 fffff800030c633c nt!PoIdle+0x52a
fffff8800311bc40 0000000000000000 nt!KiIdleLoop+0x2c
16.1: kd> ~7s
16.7: kd> k
Child-SP RetAddr Call Site
fffff88003395b58 fffff800030d7f09 intelppm+0x2c61

Regards
Kjell Gunnar

> enters this x86 mode by itself when opening the dump.

The dump type is in the dump header; maybe it has been corrupted? You can
dump the header with .dumpdebug and see what it says.

What driver are you using for the storage adapter on this system? Is it an
inbox driver or a third party driver?

-scott
OSR


> What driver are you using for the storage adapter

The one and only disk driver is disk.sys from Microsoft (6.1.7600.16385). Here is the dump header (looks fine to me!):

16.0: kd:x86> .dumpdebug
----- 64 bit Kernel Full Dump Analysis

DUMP_HEADER64:
MajorVersion 0000000f
MinorVersion 00001db1
KdSecondaryVersion 00000000
DirectoryTableBase 0000000000187000 PfnDataBase fffffa8000000000
PsLoadedModuleList fffff80003293e90 PsActiveProcessHead fffff80003275b90
MachineImageType 00008664
NumberProcessors 00000008
BugCheckCode 000000d1
BugCheckParameter1 0000000000000006 BugCheckParameter2 0000000000000002
BugCheckParameter3 0000000000000000 BugCheckParameter4 fffff8800428b120
KdDebuggerDataBlock fffff800`0323f0a0
SecondaryDataState 00000000
ProductType 00000001
SuiteMask 00000110

Physical Memory Description:
Number of runs: 6 (limited to 6)
FileOffset        Start Address     Length
0000000000001000  0000000000001000  000000000009e000
000000000009f000  0000000000100000  000000001ff00000
000000001ff9f000  0000000020200000  000000001fe04000
000000003fda3000  0000000040005000  0000000099353000
00000000d90f6000  00000000d9c03000  00000000003fd000
00000000d94f3000  0000000100000000  000000011ee00000
Last Page: 00000001f82f2000 000000021edff000

KiProcessorBlock at fffff800032ff900 8 KiProcessorBlock entries: fffff80003240e80 fffff880009eb180 fffff88003165180 fffff880031d7180 fffff880009b3180 fffff88003289180 fffff880032fb180 fffff880`0336d180
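Kjell's "looks fine" can be backed with some arithmetic. In the printed runs, each run's FileOffset equals the previous run's FileOffset plus its Length (so the runs appear to be stored back to back in the dump file), and the Last Page line is the final page of the final run, both as a file offset and as a physical address. The C sketch below hard-codes the six runs from the .dumpdebug output in this post and checks exactly those two properties; the type and function names are made up for the example. Passing this check only shows that the memory-run table is self-consistent; it says nothing about the saved register context of the crashing processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 0x1000ULL

/* One row of the "Physical Memory Description" printed by .dumpdebug. */
typedef struct MemoryRun {
    uint64_t FileOffset;
    uint64_t StartAddress;
    uint64_t Length;
} MemoryRun;

/* The six runs from the .dumpdebug output in this thread. */
static const MemoryRun kRuns[] = {
    {0x0000000000001000ULL, 0x0000000000001000ULL, 0x000000000009e000ULL},
    {0x000000000009f000ULL, 0x0000000000100000ULL, 0x000000001ff00000ULL},
    {0x000000001ff9f000ULL, 0x0000000020200000ULL, 0x000000001fe04000ULL},
    {0x000000003fda3000ULL, 0x0000000040005000ULL, 0x0000000099353000ULL},
    {0x00000000d90f6000ULL, 0x00000000d9c03000ULL, 0x00000000003fd000ULL},
    {0x00000000d94f3000ULL, 0x0000000100000000ULL, 0x000000011ee00000ULL},
};

/* Returns 1 if every run starts in the file where the previous one ended
   and the reported last page is the final page of the final run. */
int RunsAreConsistent(const MemoryRun *runs, size_t count,
                      uint64_t lastPageFileOffset, uint64_t lastPageAddress)
{
    size_t i;
    const MemoryRun *last;

    if (count == 0)
        return 0;
    for (i = 1; i < count; i++) {
        if (runs[i].FileOffset != runs[i - 1].FileOffset + runs[i - 1].Length)
            return 0;
    }
    last = &runs[count - 1];
    return lastPageFileOffset == last->FileOffset + last->Length - PAGE_SIZE_BYTES
        && lastPageAddress == last->StartAddress + last->Length - PAGE_SIZE_BYTES;
}
```

Called with the six runs and the Last Page values 0x1f82f2000 and 0x21edff000, the check passes.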


Regards
Kjell Gunnar

This usually means that something went wrong with capturing the register state of one of the processors at bugcheck time, either because a processor has gotten severely jammed somehow, or things were sufficiently corrupted that something went non-recoverably wrong during bugcheck handling on that processor. The “16.” prefix at the KD prompt means that the debugger thinks that the target is in V86 mode, most likely because it couldn’t read valid code segment descriptor information for the crashed processor due to bogus/incomplete register context being stored in the crash dump file.

You could attempt to use the “.segmentation /V /X /a” command to rescue the debugger session (this needs at least a Win8-era debugger). While this will often fix the debugger's inability to do much of anything once it has been confused into thinking that it's in V86 decode mode, it probably won't get the context of the wedged processor(s) back.

For that, you might be able to manually dump data out of the current thread stack of the thread running on the wedged processor (from its PRCB->CurrentThread), or the DPC stack of that processor, etc., to get further clues as to what might have happened.

- S (Msft)

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Scott Noone
Sent: Monday, July 01, 2013 8:53 AM
To: Kernel Debugging Interest List
Subject: Re:[windbg] w7/ 64 crash in my driver, can't see the stack
