how to debug DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS

Dear list,

I have read a lot in this forum/list in the last couple of weeks in order to try to find a way to solve a BSOD that I’m dealing with. I read every thread that exist here and other sites and I still didn’t come up with an answer to a question: how to debug the DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS BSOD. The purpose of this post is to consult with you how can I find out what pages were locked and by whom, and why is the I/O manager can’t release the lock.

To give you a context to this conversation I would say that I have developed a TDI driver that analyzes the network usage parameters by processes that are running on a Windows machine. The BSOD occurs on Windows2003/XP machines and doesn’t occur on Windows7/2008. It happens once a day (more or less).

I have a full memory dump and I was trying to analyze it with the methods I read here and there, but I didn’t get to any conclusion.

The !analyze -v command shows some information, the stack trace doesn’t involve my driver:

DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS (cb)
Caused by a driver not cleaning up completely after an I/O.
When possible, the guilty driver’s name (Unicode string) is printed on
the bugcheck screen and saved in KiBugCheckDriver.
Arguments:
Arg1: f694a2f2, The calling address in the driver that locked the pages or if the
IO manager locked the pages this points to the dispatch routine of
the top driver on the stack to which the IRP was sent.
Arg2: f694b931, The caller of the calling address in the driver that locked the
pages. If the IO manager locked the pages this points to the device
object of the top driver on the stack to which the IRP was sent.
Arg3: 821a3008, A pointer to the MDL containing the locked pages.
Arg4: 00000004, The number of locked pages.

FAULTING_IP:
afd!AfdAllocateMdlChain+bf
f694a2f2 8b45dc mov eax,dword ptr [ebp-24h]

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0xCB

PROCESS_NAME: RequestTicket.e

CURRENT_IRQL: 0

LAST_CONTROL_TRANSFER: from 8084ddb3 to 80827e33

STACK_TEXT:
f4db7c54 8084ddb3 000000cb f694a2f2 f694b931 nt!KeBugCheckEx+0x1b
f4db7c84 8094e2c7 ffffffe9 826366f8 82636938 nt!MmCleanProcessAddressSpace+0x551
f4db7d0c 8094e41b 00000025 00000000 83e625f0 nt!PspExitThread+0x5f1
f4db7d24 8094e615 826366f8 00000025 00000001 nt!PspTerminateThreadByPointer+0x4b
f4db7d54 8088b46c 00000000 00000025 0012fe90 nt!NtTerminateProcess+0x125
f4db7d54 7c82845c 00000000 00000025 0012fe90 nt!KiFastCallEntry+0xfc
WARNING: Frame IP not in any known module. Following frames may be wrong.
0012fe90 00000000 00000000 00000000 00000000 0x7c82845c

STACK_COMMAND: .bugcheck ; kb

FOLLOWUP_IP:
afd!AfdAllocateMdlChain+bf
f694a2f2 8b45dc mov eax,dword ptr [ebp-24h]

SYMBOL_NAME: afd!AfdAllocateMdlChain+bf

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: afd

IMAGE_NAME: afd.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4ef9d207

FAILURE_BUCKET_ID: 0xCB_afd!AfdAllocateMdlChain+bf

BUCKET_ID: 0xCB_afd!AfdAllocateMdlChain+bf


I thought that I could search for VA that are associated with the MDL and then print to a log buffer all the places that my driver allocates/dealocated/touches/locks/unlocks the memory, but it was useless since searching to the MDL pointer in the dump provided nothing:

15: kd> !search 821a3008
Searching PFNs in range 00000001 - 0003FFFF for [FFFFFFFF821A3008 - FFFFFFFF821A3008]

Pfn Offset Hit Va Pte


Search done.


Using the dt nt!_mdl command on the MDL pointer showed the following information:

15: kd> dt nt!_mdl 821a3008
+0x000 Next : (null)
+0x004 Size : 0n36
+0x006 MdlFlags : 0n138
+0x008 Process : 0x83e625f0 _EPROCESS
+0x00c MappedSystemVa : 0xf7bcae58 Void
+0x010 StartVa : 0x0012e000 Void
+0x014 ByteCount : 0x1000
+0x018 ByteOffset : 0xdf4

But it was a dead end to me too because I didn’t understand what use I can make of that information except of interpreting the MdlFlags parameter. If I understood correctly, then this MDL deals with:

MDL_PAGES_LOCKED 0x0002
MDL_ALLOCATED_FIXED_SIZE 0x0008
MDL_WRITE_OPERATION 0x0080

(information learned from this post: http://www.sysnative.com/forums/bsod-kernel-dump-analysis-debugging-information/269-fun-mdls.html)


I hope you would be able to suggest some more techniques to debug the DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS issue.

You are mapping 4 pages of data into the RequestTicket.exe process. And your
driver does not do the proper cleanup when the process unexpectedly
terminates. Your driver must detect when the process into which you map the
memory is about to terminate and there unmap everything that you have mapped
into the process. The best place to do this is at dispatch cleanup
(IRP_MJ_CLEANUP).

//Daniel

Hi Daniel,

Thanks for your reply. Is this error always caused by MDL related functions like MmProbeAndLockPages() with no MmUnlockPages() or can be cause by spin locks or something like that? I’m trying to isolate the possible erroneous place in my code.

I would expect so, in particular a missing MmUnmapLockedPages after a
MmMapLockedPages. Are you suggesting that you are not sharing memory and not
making use of these calls ?

I am reading the bug check data and it says “Starting with Windows Vista,
this bug check can also be issued by Driver Verifier when the Pool Tracking
option is enabled.”

//Daniel

Hi Daniel,

Thanks for your reply. Is this error always caused by MDL related functions
like MmProbeAndLockPages() with no MmUnlockPages() or can be cause by spin
locks or something like that? I’m trying to isolate the possible erroneous
place in my code.

I’m not calling MmMapLockedPages/MmUnmapLockedPages function in my code. The only MDL related code in my driver looks like:

IoAllocateIrp
IoAllocateMdl
MmProbeAndLockPages
TdiBuildQueryInformation
MmUnlockPages
IoFreeMdl
IoFreeIrp

No “return” calls in between. So I don’t understand how MmUnlockPages was not called.

Regarding your Verifier comment, this BSOD happens only on Win2003 machines (not higher) with no Verifier turned on.

So it looks like you have to take care of outstanding
TdiBuildQueryInformation requests that are associated with your userbuffer
at dispatch cleanup.
This normally done by canceling the outstanding IRPs. But perhaps someone
can comment.

//Daniel

wrote in message news:xxxxx@ntdev…

I’m not calling MmMapLockedPages/MmUnmapLockedPages function in my code. The
only MDL related code in my driver looks like:

IoAllocateIrp
IoAllocateMdl
MmProbeAndLockPages
TdiBuildQueryInformation
MmUnlockPages
IoFreeMdl
IoFreeIrp

No “return” calls in between. So I don’t understand how MmUnlockPages was
not called.

Regarding your Verifier comment, this BSOD happens only on Win2003 machines
(not higher) with no Verifier turned on.

Maybe this would shed some more light on what’s happening:

I’m was searching for uncompleted IRPs within the context of the RequestTicket.exe process that have my driver in their stack and found those two:

15: kd> !irp 82166008
Irp is active with 4 stacks 3 is current (= 0x821660c0)
Mdl=84058780: No System Buffer: Thread 00000000: Irp stack trace.
cmd flg cl Device File Completion-Context
[0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000
[0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000

[f, 8] 0 e0 83e8b3a8 82204070 f694cb70-8265d680 Success Error Cancel
\Driver\MyDriver afd!AfdRestartBufferReceiveWithUserIrp
Args: 00001000 00010020 00000000 00000000
[e, 5] 5 0 840e46d0 8220e938 00000000-00000000
\Driver\AFD
Args: 00001000 00000000 20000020 00000000
15: kd> !irp 8258e550
Irp is active with 4 stacks 3 is current (= 0x8258e608)
Mdl=821a3008: No System Buffer: Thread 00000000: Irp stack trace.
cmd flg cl Device File Completion-Context
[0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000
[0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000

[f, 8] 0 e0 83e8b3a8 847008b8 f694cb70-82533a10 Success Error Cancel
\Driver\MyDriver afd!AfdRestartBufferReceiveWithUserIrp
Args: 00001000 00010020 00000000 00000000
[e, 5] 5 0 840e46d0 82230028 00000000-00000000
\Driver\AFD
Args: 00001000 00000000 20000020 00000000

As you can see, both mentioning the AfdRestartBufferReceiveWithUserIrp function which immediately was a warning sign to me since the bugcheck output points out the AfdAllocateMdlChain function.

What else can I check?

Is there a way I can find out the IOCTL that is responsible for the unfinished IRP?

Something to consider is that while there is a pended/uncompleted irp from the app, the application cannot exit. As such, you can’t get the BSOD while there is an irp pending. Now if there is a level of indirection where you mapped a buffer and then left it mapped after the irp completed, you can get to this state. Have you turned on driver verifier for the entire system to see if it catches something earlier?

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@gmail.com
Sent: Sunday, February 17, 2013 8:24 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] how to debug DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS

Is there a way I can find out the IOCTL that is responsible for the unfinished IRP?


NTDEV is sponsored by OSR

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

Doron, thanks for your reply.

I’m confused. I read in other thread here (http://www.osronline.com/showThread.cfm?link=33339) that you get this bugcheck when there’s an uncompleted IRP and the process terminates. Isn’t that correct? What would you suggest to do to locate the problem?

This bugcheck was fired w/o a verifier present. I tried to turn on IRP and deadlock checks but didn’t came up with something interesting yet.

OK, I’m feel like this should be helpful. I turned on some extra verifier option and got a bugcheck:

DRIVER_VERIFIER_IOMANAGER_VIOLATION (c9)
The IO manager has caught a misbehaving driver.
Arguments:
Arg1: 00000226, An IRP dispatch handler has returned without passing down or completing this IRP,
or someone forgot to return STATUS_PENDING.
Arg2: f57bf3f0, The address in the driver’s code where the error was detected.
Arg3: 00000000, IRP address.
Arg4: 00000000

Arg2 points to my main dispatch function while the IRP address is null (completed?). What can I look for to find the source of this problem?

I would suggest looking at places where an IRP pointer might be
inadveryenly used. One approach woul be to liberally sprinkle ASSER(irp
!= NULL) in you code (or ASSERTMSG(…), which lets you put a suitable
message out so you can tell one from another. It’s a little strange that
it sees this NULL pointer as an uncompleted IRP, rather tan taking an
access fault, which makes me a bit suspicious that the pointer address is
something non-NULL but is erroneously reported as NULL. This error is
most often reurned from a top-level dispatch routine, so I found that
using KdPrint at dispatch routine entry and exit. If you see the entry
message, but then this verifier error, that’s a strong indication of where
the error is.
joe

OK, I’m feel like this should be helpful. I turned on some extra verifier
option and got a bugcheck:

DRIVER_VERIFIER_IOMANAGER_VIOLATION (c9)
The IO manager has caught a misbehaving driver.
Arguments:
Arg1: 00000226, An IRP dispatch handler has returned without passing down
or completing this IRP,
or someone forgot to return STATUS_PENDING.
Arg2: f57bf3f0, The address in the driver’s code where the error was
detected.
Arg3: 00000000, IRP address.
Arg4: 00000000

Arg2 points to my main dispatch function while the IRP address is null
(completed?). What can I look for to find the source of this problem?


NTDEV is sponsored by OSR

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

This may be caused by returning STATUS_PENDING without matching IoMarkIrpPending. Such mismatch may cause the IRP not to be properly finalized, leaving the MDL with locked pages.

You might try running prefast and static driver verifier on your source code. It may have IRP processing validation rules, and it may be able to point out what code path leaves a dangling IRP.

Jan

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@gmail.com
Sent: Monday, February 18, 2013 7:08 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] how to debug DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS

OK, I’m feel like this should be helpful. I turned on some extra verifier option and got a bugcheck:

DRIVER_VERIFIER_IOMANAGER_VIOLATION (c9) The IO manager has caught a misbehaving driver.
Arguments:
Arg1: 00000226, An IRP dispatch handler has returned without passing down or completing this IRP,
or someone forgot to return STATUS_PENDING.
Arg2: f57bf3f0, The address in the driver’s code where the error was detected.
Arg3: 00000000, IRP address.
Arg4: 00000000

Arg2 points to my main dispatch function while the IRP address is null (completed?). What can I look for to find the source of this problem?

Jan, can I use SDV with a TDI driver?

You need to find a path through your source code, in your dispatch routine, where you do what Verifier said. That is, where do you return without (a) completing the IRP, (b) calling IoCallDriver, or (c) returning STATUS_PENDING.

So, some careful code inspection is in order. And anyplace you have in your code where “well, THAT branch can’t happen” add an ASSERT.

And I’d at least TRY SDV… It *can* be quite clever at finding obscure problems.

Peter
OSR

Thanks guys, I’ll try your suggestions and come back with findings!

Problem solved!

Following Joseph’s and Peter’s valuable advices and the below described investigation I managed to find the faulting IRP and find the flow that didn’t complete the IRP.

As you can see from the DRIVER_VERIFIER_IOMANAGER_VIOLATION bugcheck arguments, the IRP there is null. But an uncompleted IRP cannot be null, so I looked at the call stack and found the function IofCallDriver is being called. This is the same as IoCallDriver but with ?fastcall? calling convention which means that the parameters are not passed on the stack, but via registers. So next I was interested to know what is the value of the EDX register before calling to IofCallDriver because it always contains the second parameter of the function which in our case is the pointer to the IRP. Finding the caller to IofCallDriver, unassembling it and do some registers math revealed the IRP address eventually. Then using the !irp extension I found the major and minor functions that the IRP called. From there it was easy to find the exact problematic flow.