Need help with BSOD

Velio_Roumenov-2 · February 19, 2007, 11:36am

Hi, I just got a minidump from one of my colleagues who was testing the
application, which includes a driver I have written.

I do not really understand the output of this crash and hope you (ntdev users)
can help me or point me toward a way to solve this issue.

Thanks for all your help.

Here comes the bugcheck analysis.

PS. It’s a TDI Filter Driver.

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: ffffffe0, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: 805364b6, address which referenced memory

Debugging Details:

READ_ADDRESS: ffffffe0

CURRENT_IRQL: 2

FAULTING_IP:
nt!MiEndingOffset+106
805364b6 8b51e0 mov edx,[ecx-0x20]

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0xA

LAST_CONTROL_TRANSFER: from 80536a82 to 805364b6

STACK_TEXT:
bad1fd2c 80536a82 80563b50 80563b48 8053f053 nt!MiEndingOffset+0x106
bad1fdac 805ce7d4 00000000 00000000 00000000 nt!MiFlushDirtyBitsToPfn+0x112
bad1fddc 805451ce 8053efc6 00000000 00000000 nt!MiPageFileTraces+0x1014
00000000 00000000 00000000 00000000 00000000 nt!MiAddValidPageToWorkingSet+0x224

FOLLOWUP_IP:
nt!MiEndingOffset+106
805364b6 8b51e0 mov edx,[ecx-0x20]

SYMBOL_STACK_INDEX: 0

FOLLOWUP_NAME: MachineOwner

SYMBOL_NAME: nt!MiEndingOffset+106

MODULE_NAME: nt

DEBUG_FLR_IMAGE_TIMESTAMP: 434c50c7

STACK_COMMAND: kb

IMAGE_NAME: memory_corruption

FAILURE_BUCKET_ID: 0xA_nt!MiEndingOffset+106

BUCKET_ID: 0xA_nt!MiEndingOffset+106

Followup: MachineOwner

Gianluca_Varenni · February 19, 2007, 12:17pm

Have you run “!analyze -v” in WinDbg?

In any case, the BSOD seems to point at a very bad memory corruption in the kernel memory structures (probably due to some buffer overrun or similar), causing the memory manager to go nuts.
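
One detail that can be read straight from the dump: the faulting instruction is `mov edx,[ecx-0x20]` and the address referenced is ffffffe0, so in 32-bit arithmetic ecx must have been zero. That may suggest a NULL (or zeroed) pointer being used with a negative structure offset, though the dump alone does not say who zeroed it. A minimal sketch of the arithmetic, where the helper name `base_from_fault` is purely illustrative:

```c
#include <assert.h>   /* for sanity checks by the caller */
#include <stdint.h>

/*
 * Recover the base register value from a faulting address and the
 * displacement encoded in the instruction, using 32-bit wraparound
 * arithmetic (the dump is from a 32-bit kernel).
 * Hypothetical helper, not part of any debugger API.
 */
static uint32_t base_from_fault(uint32_t fault_addr, int32_t displacement)
{
    /* fault_addr = base + displacement (mod 2^32), so subtract it back out */
    return fault_addr - (uint32_t)displacement;
}
```

For this crash, `base_from_fault(0xffffffe0u, -0x20)` gives the value ecx held at the time of the fault.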

Hope it helps
GV

— Questions? First check the Kernel Driver FAQ at http://www.osronline.com/article.cfm?id=256 To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

Thomas_Divine-2 · February 19, 2007, 1:17pm

Something is terribly wrong in your driver. Memory is so corrupted that the
little bit of information you have given provides no clue.

You need to work with your colleague to set up a way to reproduce the problem
in an environment where you can debug the driver.

Good luck,

Thomas F. Divine

OSR_Community_User · February 19, 2007, 1:54pm

I would suggest setting your driver up for Driver Verifier and then attempting to reproduce the problem. The standard settings are usually enough to catch memory corrupters in the act.

Velio_Roumenov-2 · February 19, 2007, 2:54pm

Thanks to all of you,
Gianluca, Thomas and Bob.

I did run !analyze -v and that was the output. Anyway, as you all mention, I
am probably messing up the memory somewhere in the driver and will probably have
to do some more testing on my own with Driver Verifier and all that.

Thank you all for a quick reply.

/Faik

Velio_Roumenov-2 · February 19, 2007, 6:27pm

Just another question related to the possible memory corruption I am
causing and Driver Verifier.

I was doing some casual reading in the ntdev lists and came across a
post stating that Driver Verifier cannot be used to verify memory allocated
from lookaside lists (or maybe it was the comments in some of the example
code in the IFS Kit). Is this true?

And another question: could, for example, allocating non-paged memory
for a global KSPIN_LOCK be the cause of the memory corruption? I recently
learned that all globals are already non-paged.

Sorry for posting several times, but this came up well after I posted the last
message.

Thanks again

/Faik

OSR_Community_User · February 20, 2007, 9:08am

* Easy answer first: Allocating paged memory for a spin lock would be a tremendously bad idea, because the lock is touched at IRQL >= DISPATCH_LEVEL, where page faults cannot be serviced. But allocating non-paged memory [which is what you asked about] is fine. Global data isn’t necessarily non-paged, but it will be unless you are explicitly creating a paged data segment. So that isn’t likely to be your problem.

The issue with lookaside lists is that DV can be less effective in detecting corruption problems at the earliest opportunity with them. That is not the same as being unable to work.

* One reason is that the lists don’t usually free items through the pool API when you “free” them; the memory remains allocated so you can use the item again. The worst case is items smaller than the size of a page of memory.

For items smaller than a page, DV allocates each one on its own page of “special pool” and aligns it at the end [by default] or the beginning [this is described in the WDK under “global flags”] of the page, with an invalid “guard page” either after [in the first case] or before the item. It fills the rest of the page with a fill pattern to detect writes outside the buffer on the page itself. So overflows at one end of the buffer hit the guard page, trigger a page fault, and are caught “in the act”. Overflows at the other end don’t get caught until the item is freed (at which point DV looks for corruption of the fill pattern).
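
To make the fill-pattern half of that concrete, here is a rough user-mode sketch of the idea, not the real special pool code. The page size, the fill byte `SP_FILL`, and all `sp_*` names are assumptions made up for illustration, and the guard page itself is not modeled (that side needs the MMU), only the check-on-free side:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SP_PAGE_SIZE 4096u  /* assumed page size for the simulation */
#define SP_FILL      0xA5u  /* hypothetical fill byte, not DV's real pattern */

typedef struct {
    unsigned char *page;  /* start of the simulated page */
    size_t         size;  /* size of the caller's buffer */
} SpAlloc;

/* Place the buffer at the END of the page and pattern-fill the slack before it. */
static bool sp_alloc(SpAlloc *a, size_t size)
{
    if (size == 0 || size > SP_PAGE_SIZE)
        return false;
    a->page = (unsigned char *)malloc(SP_PAGE_SIZE);
    if (a->page == NULL)
        return false;
    a->size = size;
    memset(a->page, SP_FILL, SP_PAGE_SIZE - size);
    return true;
}

/* The caller's buffer: the last `size` bytes of the page. */
static unsigned char *sp_buffer(const SpAlloc *a)
{
    return a->page + (SP_PAGE_SIZE - a->size);
}

/* On free, scan the slack. Returns true if the pattern is intact. */
static bool sp_free(SpAlloc *a)
{
    size_t slack = SP_PAGE_SIZE - a->size;
    bool intact = true;
    for (size_t i = 0; i < slack; i++) {
        if (a->page[i] != SP_FILL) {
            intact = false;  /* something wrote before the buffer */
            break;
        }
    }
    free(a->page);
    a->page = NULL;
    return intact;
}
```

A write to `sp_buffer(&a)[-1]` goes unnoticed when it happens and is only reported by `sp_free`, which is the "not caught until the item is freed" case described above.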

Applying this to a lookaside list with the default pool allocate and free, you can see that the pool corruption checks that look for writes on the same memory page won’t get triggered as soon as they otherwise would. They will at least get caught when you enter a cleanup phase and destroy the list [and if you unload your driver without doing that, DV will catch you not doing so].

If your list items are all bigger than a page of memory (minus a pool header for those being absolutely precise), then I don’t believe you lose a thing vs. not using a lookaside. IIRC you get a guard page on either end, and DV will thus bugcheck the moment you step out of bounds.

* A second reason is that a driver can provide custom allocate and free routines for lookasides. The driver would have to do its own verification in that case, because it is opting out of using the API DV is checking. But somewhere, those routines still have to call kernel routines, and unless they are using undocumented calls, some level of verification will still happen. Again, I believe DV is as effective as it can be. Analyzing these cases requires more detailed knowledge than a general question of this sort would provide, though.

So, whether that makes the statement about lookaside lists true or false I’ll leave to the individual. As I see it, it’s partially true in some cases, but the problem is not anywhere near as severe as implied.

Velio_Roumenov-2 · February 20, 2007, 9:38am

Thank you for the explanation Bob.

It really does seem to help to ask questions on these lists/forums.

/Faik

Doron_Holan · February 20, 2007, 10:31am

As a point of clarification, when you enable DV, the lookaside list
functionality is essentially turned off. The depth of the list will
always be zero which means that you will allocate from pool on every
call and free back to pool instead of to the list. Win2k might not do
this, but this is there for XP and beyond …
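
A toy user-mode model of what a depth of zero means, in case it helps. This is not the kernel's lookaside implementation; `SimLookaside` and the `la_*` functions are made-up names, and `malloc`/`free` stand in for the pool:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define LA_MAX_DEPTH 8  /* arbitrary cap for the simulation */

typedef struct {
    void  *cache[LA_MAX_DEPTH];
    size_t count;        /* items currently cached on the list */
    size_t depth;        /* max items to cache; 0 disables caching */
    size_t item_size;
    size_t pool_allocs;  /* requests that reached the underlying allocator */
    size_t pool_frees;   /* frees that reached the underlying allocator */
} SimLookaside;

static void la_init(SimLookaside *l, size_t item_size, size_t depth)
{
    l->count = 0;
    l->depth = depth > LA_MAX_DEPTH ? LA_MAX_DEPTH : depth;
    l->item_size = item_size;
    l->pool_allocs = 0;
    l->pool_frees = 0;
}

/* Reuse a cached item if there is one, otherwise go to the "pool". */
static void *la_alloc(SimLookaside *l)
{
    if (l->count > 0)
        return l->cache[--l->count];
    l->pool_allocs++;
    return malloc(l->item_size);
}

/* Cache the item if the list has room, otherwise return it to the "pool". */
static void la_free(SimLookaside *l, void *p)
{
    if (p == NULL)
        return;
    if (l->count < l->depth) {
        l->cache[l->count++] = p;
        return;
    }
    l->pool_frees++;
    free(p);
}

/* Release anything still cached. */
static void la_destroy(SimLookaside *l)
{
    while (l->count > 0) {
        free(l->cache[--l->count]);
        l->pool_frees++;
    }
}
```

With `depth` set to 0, every `la_alloc` and `la_free` goes to the underlying allocator, so whatever checking is attached to that allocator sees each item at the moment it is freed rather than only when the list is torn down.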

d

OSR_Community_User · February 20, 2007, 10:32am

Faik-

I think we all try to help when and as we can, that’s one reason why the lists exist.

If you are using the WDK, there are now two additional analysis tools that might also help with this sort of problem. PREfast for Drivers (PFD) can flag potential memory corruption issues [and other problems] at the time you compile your driver. Static Driver Verifier requires a little more preparation, but it looks deeper and further in trying to detect bad call patterns and the like. PFD is particularly easy to use. Both can find problems without your needing a test machine and debugger. IMO, both are useful tools to have when facing a problem like this.

There are also things (some of which may have been mentioned; my memory for all the posts isn’t very good) like turning up the compiler warning level to /W4 that can help.

Unfortunately, there are no guarantees- memory corruption problems are among the hardest to detect and reproduce. So while all of these tools and methods can and do help, it can still take a lot of time, patience and your own ingenuity to finally resolve the issue. Good luck with your efforts.

OSR_Community_User · February 20, 2007, 11:29am

Thanks for the clarification, Doron-

It’s always good to know something works even better than you thought it did!

Velio_Roumenov-2 · February 20, 2007, 11:37am

Thanks again Bob, I will certainly have to start using all the tools available.

And Thank you Doron for clarifying that about DV.

Cheers
/Faik