Tracking down cause of page fault

OSR_Community_User · October 22, 2003, 12:25pm

I’m currently examining a problem with a driver that at some point causes a
BSOD due to a DRIVER_IRQL_NOT_LESS_OR_EQUAL. What I would like to do is get
the help of WinDbg in telling me why the address that I am using may have
become invalid.

The sequence is

1) app issues a Read to my driver’s device which is configured for
DO_DIRECT_IO
2) I take the mdl and use MmGetSystemAddressForMdlSafe (Irp->MdlAddress,
HighPagePriority) to get a system address that I can use within the driver
at any IRQL. This address should reference nonpaged memory (i.e. the buffer
supplied in the read request)
3) Subsequent read requests are queued (using a section of the read buffer
which is now owned by the driver until the read request completes) - I know
I could just queue the Irps themselves but that’s another story
4) So, I now have a queue of read buffers and at some, seemingly random
point, one of these pointers suddenly becomes invalid and, when used, causes
the BSOD.

I have numerous calls to a validation routine from within the driver which
is how I know that

a) when the BSOD occurs the address is the same value that was previously OK
(i.e. the last time the validation routine was called)
b) the address has become invalid when none of my driver code is running
(and the app has apparently done nothing either)

It looks like it has been paged out - but how could that be?

!analyze -v shows:

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pagable (or completely invalid) address at
an interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: fb1bab56, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: f8778113, address which referenced memory

Debugging Details:

READ_ADDRESS: fb1bab56 Nonpaged pool

CURRENT_IRQL: 2

This clearly shows the address referring to nonpaged pool, and yet fb1bab56 is
displayed as ??? if dumped (and was OK the last time the validation
routine was called).

Any suggestions as to what info I can glean concerning this address and the
memory it references, and why it should have become invalid?

Thanks

Will
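
One thing the debugger can show for an address like fb1bab56 is its page table entry; the `!pte` extension displays it. As a rough illustration of what that lookup involves, the sketch below computes where the PDE and PTE for a virtual address live. It assumes 32-bit x86 without PAE (page tables self-mapped at 0xC0000000, page directory at 0xC0300000). That configuration is an assumption, since the thread does not say whether PAE was enabled, and the function names are made up for this example.

```c
#include <stdint.h>

/* Assumption: 32-bit x86 Windows without PAE, where the page tables are
 * self-mapped at 0xC0000000 and the page directory at 0xC0300000.
 * All names below are illustrative, not WDK or debugger APIs. */
#define PTE_BASE_NON_PAE 0xC0000000u
#define PDE_BASE_NON_PAE 0xC0300000u

/* Virtual address of the 4-byte PTE that maps `va` (one PTE per 4 KB page). */
uint32_t pte_va_non_pae(uint32_t va)
{
    return PTE_BASE_NON_PAE + (va >> 12) * 4u;
}

/* Virtual address of the 4-byte PDE that covers `va` (one PDE per 4 MB). */
uint32_t pde_va_non_pae(uint32_t va)
{
    return PDE_BASE_NON_PAE + (va >> 22) * 4u;
}

/* Byte offset of `va` inside its 4 KB page. */
uint32_t page_offset(uint32_t va)
{
    return va & 0xFFFu;
}

/* Bit 0 of an x86 PTE is the hardware Valid (present) bit. */
int pte_is_valid(uint32_t pte)
{
    return (int)(pte & 1u);
}
```

Under that non-PAE assumption, fb1bab56 maps through the PTE at 0xC03EC6E8 with a page offset of 0xb56. If bit 0 of the dword stored there is clear, the page is simply not mapped at that moment, whatever region the address falls in.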

OSR_Community_User · October 22, 2003, 2:12pm

Silly question… but are you completing the read IRPs after chaining the
read requests?

-Jeff

-----Original Message-----
From: Will Barker [mailto:xxxxx@farsite.co.uk]
Sent: Wednesday, October 22, 2003 12:24 PM
To: Kernel Debugging Interest List
Subject: [windbg] Tracking down cause of page fault

You are currently subscribed to windbg as: xxxxx@concord.com To
unsubscribe send a blank email to xxxxx@lists.osr.com

OSR_Community_User · October 23, 2003, 4:57am

Indeed. In fact it all works fine normally. I can only get the problem to
occur if, ~20 mins previously, I (unsafely) ejected a particular PCMCIA
card. This is a PCMCIA card (and driver) that we are developing and the
problem is almost certainly related to its (unsafe) removal. However, I can
find no fault in that support. Although one shouldn’t do a hot removal of a
PCMCIA card, the driver should cope with it. The main issue is that the
problem doesn’t manifest itself for so long that a user may not even
associate it with the previous removal.

Therefore, I’m trying to get WinDbg to give me some info re: the symptoms
(which appear in Driver A) of the problem (probably caused by Driver B) such
that I can track down the cause.

So the question is what info can I get out with the aid of WinDbg re: the
memory which has become inaccessible. It doesn’t look like the memory itself
has become corrupted. It looks more like it has been paged out, and
yet it is nonpaged memory.

Will

“Curless, Jeffrey” wrote in message
news:xxxxx@windbg…
>
> Silly question… but are you completing the read IRPs after chaining the
> read requests?
>
> -Jeff

OSR_Community_User · October 23, 2003, 11:46am

I’m assuming you are handling the SURPRISE_REMOVAL that you get. It is
freaky that nonpaged memory looks like it is paged out. Have you tried
!poolval? I know it doesn’t seem corrupted, but that might give you a hint.

And what I was asking earlier: you are returning STATUS_PENDING for the
IRPs when you chain the MDL, and not STATUS_SUCCESS, right? As I understand
it, if you return STATUS_SUCCESS the I/O manager will clean up the MDL
eventually, which includes unlocking it.

Try asking this question on NTDEV, you may get more feedback.

-Jeff
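
The ownership rule behind Jeff's question can be modelled outside the kernel. In a real driver, a queued IRP is marked with IoMarkIrpPending and the dispatch routine returns STATUS_PENDING. Once the IRP is completed with IoCompleteRequest, the I/O manager unlocks the direct-I/O pages and frees the MDL, so a system address cached from it is no longer safe to use. The sketch below is only a user-mode stand-in for that lifetime: every type and function name in it is hypothetical, and malloc/free merely play the part of mapping and unmapping.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-mode stand-ins; none of these are WDK types. */
typedef enum { MODEL_SUCCESS, MODEL_PENDING } ModelStatus;

typedef struct ModelRequest {
    unsigned char *system_va; /* stands in for the mapped system address */
    size_t         length;
    int            completed; /* nonzero once the request is completed */
} ModelRequest;

/* A queue entry that keeps the owning request next to the cached pointer,
 * so a validation routine can check the request state, not just the value. */
typedef struct QueuedBuffer {
    ModelRequest  *owner;
    unsigned char *cursor;    /* section of the owner's buffer */
} QueuedBuffer;

/* Models the setup of a direct-I/O read: the buffer stays mapped for the
 * lifetime of the request. Returns 0 on success, -1 on failure. */
int model_begin_read(ModelRequest *req, size_t length)
{
    req->system_va = malloc(length);
    req->length = length;
    req->completed = 0;
    return req->system_va != NULL ? 0 : -1;
}

/* Models a dispatch routine that queues the buffer and keeps ownership. */
ModelStatus model_dispatch_read(ModelRequest *req, QueuedBuffer *slot)
{
    slot->owner = req;
    slot->cursor = req->system_va;
    return MODEL_PENDING;
}

/* Models completion: the mapping is torn down, so any pointer cached
 * from system_va is stale from here on. */
void model_complete(ModelRequest *req)
{
    free(req->system_va);
    req->system_va = NULL;
    req->completed = 1;
}

/* Validation that consults the owner before trusting the cached pointer. */
int queued_buffer_usable(const QueuedBuffer *q)
{
    return q->owner != NULL && !q->owner->completed && q->cursor != NULL;
}
```

In this model the cached pointer keeps the same numeric value after completion, which matches the symptom described in the thread; only the check against the owning request reveals that it has gone stale.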
