PFN_LIST_CORRUPT

Hi all
A PFN_LIST_CORRUPT memory dump similar to the one below has been reported by three different customers following the installation of our minifilter driver. In each case the fault was attributed to a different third-party driver (in this case McAfee’s), but I don’t think that is where the problem lies, particularly since one of them was attributed to AFD.SYS.

The problem occurs because of an elevated reference count on the physical page backing a memory buffer (2 in this example). I’ve confirmed that this is the same memory buffer that is passed to Ntfs!NtfsDeleteMdlAndBuffer (84f8d000).

0: kd> !pfn 00004f8d
PFN 00004F8D at address 8148B36C
flink 00000000 blink / share count 00000001 pteaddress C0427C68
reference count 0002 Cached color 0
restore pte 00000000 containing page 000B45 Active R
ReadInProgress

0: kd> !pte C0427C68
VA 84f8d000
PDE at C0602138 PTE at C0427C68
contains 0000000004E009E3 contains 0000000000000000
pfn 4e00 -GLDA--KWEV LARGE PAGE pfn 4f8d
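As a cross-check of the !pte output above: the PDE value 0x04E009E3 has bit 7 (PS) set, so on this 32-bit PAE system VA 84f8d000 lies inside a 2 MB large page, consistent with !pte reporting LARGE PAGE and an empty PTE column. The PFN is then the large page's base PFN (0x4e00) plus the index of the 4 KB page within it (0x18d), which gives 0x4f8d, the PFN named in the bugcheck. A minimal sketch of that arithmetic, assuming the x86 PAE entry layout; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define X86_PAGE_SHIFT  12
#define PAE_PDE_PS      0x80ull                /* bit 7: PDE maps a 2 MB page  */
#define PAE_LARGE_MASK  0x000FFFFFFFE00000ull  /* 2 MB-aligned physical base   */

/* Nonzero if a PAE page directory entry maps a 2 MB large page. */
static int pae_pde_is_large(uint64_t pde)
{
    return (pde & PAE_PDE_PS) != 0;
}

/* PFN of the 4 KB page backing 'va' inside the large page mapped by 'pde':
   the large page's base PFN plus the index of 'va' within it (0..511). */
static uint64_t pae_large_page_pfn(uint64_t pde, uint32_t va)
{
    assert(pae_pde_is_large(pde));
    uint64_t base_pfn = (pde & PAE_LARGE_MASK) >> X86_PAGE_SHIFT;
    uint64_t index    = (va >> X86_PAGE_SHIFT) & 0x1FFu;
    return base_pfn + index;
}
```

For example, pae_large_page_pfn(0x04E009E3, 0x84f8d000) yields 0x4f8d, matching the !pfn command above.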

I know that this issue is typically related to incorrect MDL handling but we don’t do any MDL manipulation directly (at least on the IO path).

After a little disassembly, it appears that NtfsCommonWrite allocates this buffer by calling NtfsCreateMdlAndBuffer and then copies the write data into it. In this case the file object is for $MFT, and the write buffer contains the telltale MFT entry header ("FILE0"), which supports this:

0: kd> db 84f8d000
84f8d000 46 49 4c 45 30 00 03 00-9f 14 6a 03 00 00 00 00 FILE0…j…
84f8d010 01 00 01 00 38 00 01 00-f0 01 00 00 00 04 00 00 …8…
84f8d020 00 00 00 00 00 00 00 00-03 00 00 00 94 4a 00 00 …J…
84f8d030 88 02 00 00 00 00 00 00-10 00 00 00 60 00 00 00 …`…
84f8d040 00 00 00 00 00 00 00 00-48 00 00 00 18 00 00 00 …H…
84f8d050 e2 05 d2 4f 66 6f c9 01-30 3f cc bf 0f 99 c3 01 …Ofo…0?..
84f8d060 b9 d7 2c 1d 06 9a cc 01-6b 75 fb 31 d9 6d cc 01 …,…ku.1.m…
84f8d070 01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …

So I’ve been looking at the lifecycle of this buffer and, as far as I can tell, NtfsCommonWrite calls both NtfsCreateMdlAndBuffer and NtfsDeleteMdlAndBuffer, so it’s difficult to see how our minifilter driver could influence the reference count on this buffer if it is only used internally.

Is anybody able to hazard a guess as to what we might be seeing here?

Regards

Mark

0: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

PFN_LIST_CORRUPT (4e)
Typically caused by drivers passing bad memory descriptor lists (ie: calling
MmUnlockPages twice with the same list, etc). If a kernel debugger is
available get the stack trace.
Arguments:
Arg1: 0000009a,
Arg2: 00004f8d
Arg3: 00000006
Arg4: 00000002

Debugging Details:

BUGCHECK_STR: 0x4E_9a

DEFAULT_BUCKET_ID: DRIVER_FAULT

PROCESS_NAME: System

CURRENT_IRQL: 0

LAST_CONTROL_TRANSFER: from 8086598b to 80827c83

STACK_TEXT:
f78e6540 8086598b 0000004e 0000009a 00004f8d nt!KeBugCheckEx+0x1b
f78e655c 80891779 8148b36c 808aeae0 014d2578 nt!MiBadRefCount+0x33
f78e6594 808925bb 84f8d000 84f8f000 853538e0 nt!MiFreePoolPages+0x5c9
f78e65ec f71017d9 3966744e 00000000 f78e67f0 nt!ExFreePoolWithTag+0x277
f78e65fc f7101817 8521af48 84f8d000 f7106f4b Ntfs!NtfsDeleteMdlAndBuffer+0x31
f78e66e8 f7103177 f78e6700 f71031dc e29345d0 Ntfs!NtfsCommonWrite+0x180b
f78e67f0 f71057d0 f78e6800 853538e0 0108070a Ntfs!NtfsCompleteRequest+0x35
f78e696c 8081df85 86106020 853538e0 853538e0 Ntfs!NtfsFsdWrite+0x16a
f78e6980 f7220d28 8500b700 8646d1c8 00000000 nt!IofCallDriver+0x45
f78e69ac 8081df85 86444200 853538e0 853538e0 fltmgr!FltpDispatch+0x152
f78e69c0 f7220b25 86444020 853538e0 8608cc18 nt!IofCallDriver+0x45
f78e69e4 f7220cf5 f78e6a04 86444020 00000000 fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x20b
f78e6a1c 8081df85 86444020 853538e0 8578e480 fltmgr!FltpDispatch+0x11f
f78e6a30 f5d8a426 86093f90 f78e6ad4 856c0548 nt!IofCallDriver+0x45
WARNING: Stack unwind information not available. Following frames may be wrong.
f78e6ab4 f5d97bd9 86093f90 853538e0 f78e6aec mfehidk+0x7426
f78e6ac4 f5d97c29 f78e6ad4 8579af38 856c0548 mfehidk+0x14bd9
f78e6aec 8081df85 856c0548 853538e0 012a5000 mfehidk!DEVICEDISPATCH::DispatchPassThrough+0x48
f78e6b00 8081e69f 00000000 f78e6b3c 863af5f0 nt!IofCallDriver+0x45
f78e6b14 80836446 86093f0a f78e6b3c f78e6c04 nt!IoSynchronousPageWrite+0xaf
f78e6c30 8083782b e20b1528 e20b1538 863af5f0 nt!MiFlushSectionInternal+0x6ba
f78e6c74 8080f8fe 86093dc0 f78e6c00 00002000 nt!MmFlushSection+0x211
f78e6cfc 8080fc77 00002000 00000000 00000001 nt!CcFlushCache+0x3a6
f78e6d40 808127c2 8658e020 808ae5c0 8658c1f8 nt!CcWriteBehind+0x11b
f78e6d80 80880469 8658c1f8 00000000 8658e020 nt!CcWorkerThread+0x15a
f78e6dac 80949b7c 8658c1f8 00000000 00000000 nt!ExpWorkerThread+0xeb
f78e6ddc 8088e092 8088037e 00000000 00000000 nt!PspSystemThreadStartup+0x2e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

STACK_COMMAND: kb

FOLLOWUP_IP:
mfehidk+7426
f5d8a426 8945f8 mov dword ptr [ebp-8],eax

SYMBOL_STACK_INDEX: e

SYMBOL_NAME: mfehidk+7426

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: mfehidk

IMAGE_NAME: mfehidk.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4564d4e5

FAILURE_BUCKET_ID: 0x4E_9a_mfehidk+7426

BUCKET_ID: 0x4E_9a_mfehidk+7426

Followup: MachineOwner
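One reading of the four bugcheck arguments, going by the public reference for bug check 0x4E (worth re-checking against current documentation): Arg1 0x9A means pages were being freed while still locked for I/O, Arg2 is the PFN (0x4f8d), Arg3 is the page's current state and Arg4 its reference count (2, matching the !pfn output). The sketch below formats them; the state table assumes the memory manager's MMLISTS ordering, in which 6 is ActiveAndValid, consistent with !pfn reporting the page as Active. The function names are hypothetical:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Page list states, assumed to follow the memory manager's MMLISTS ordering. */
static const char *page_state_name(uint32_t state)
{
    static const char *const names[] = {
        "ZeroedPageList", "FreePageList", "StandbyPageList",
        "ModifiedPageList", "ModifiedNoWritePageList", "BadPageList",
        "ActiveAndValid", "TransitionPage",
    };
    return state < sizeof names / sizeof names[0] ? names[state] : "Unknown";
}

/* Formats a PFN_LIST_CORRUPT (0x4E) whose Arg1 is 0x9A (a page freed while
   still locked). Returns snprintf's result, or -1 for other subcodes. */
static int describe_4e_9a(char *out, size_t len, uint32_t arg1,
                          uint32_t arg2, uint32_t arg3, uint32_t arg4)
{
    assert(out != NULL && len > 0);
    if (arg1 != 0x9A)
        return -1;
    return snprintf(out, len,
                    "freeing locked page: PFN 0x%" PRIx32 ", state %s, refcount %" PRIu32,
                    arg2, page_state_name(arg3), arg4);
}
```

With the arguments above (0x9a, 0x4f8d, 6, 2) this produces "freeing locked page: PFN 0x4f8d, state ActiveAndValid, refcount 2".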

  1. Do !search for the PFN number (4f8d). There may be another locked MDL which contains this PFN. If you find one, search for pointers to that MDL and see if you can identify the driver that created it.

  2. Does your driver work (with no breaks/asserts) under verifier and on checked builds?

  3. If any of your customers are able to reproduce this you can ask them to try setting TrackLockedPages=1 registry option as described in http://support.microsoft.com/kb/256010. You can then use !lockedpages on the System process to dump stack traces for all outstanding MDLs describing probe-and-locked kernel memory.

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@gmail.com
Sent: Monday, August 13, 2012 12:53 PM
To: Windows File Systems Devs Interest List
Subject: [ntfsd] PFN_LIST_CORRUPT

NTFSD is sponsored by OSR

For our schedule of debugging and file system seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

Thanks Pavel
I’ve passed on a request to have our customer set the TrackLockedPages value. We’ve already asked them to run with verifier. It works fine under verifier here, but then the problem doesn’t manifest itself here, so the root cause is probably environment-specific.

One common factor is that in each case it occurred on Windows 2003. Unfortunately we cannot set up a 2003 checked environment because we don’t have MSDN access to get the checked build. I’ll see if I can find anything from the memory search, but it’s turning up a lot of hits…

Regards

mark

>One common factor is that in each case it occurred on Windows 2003.
>Unfortunately we cannot set up a 2003 checked environment because we don’t
>have MSDN access to get the checked build. I’ll see if I can find anything
>from the memory search but it’s turning up a lot of hits…

Note that the checked Service Packs are available for free download. For
example:

http://www.microsoft.com/en-us/download/details.aspx?id=9445

While it’s not the entire checked distribution, usually it’s sufficient to
simply extract the checked kernel and HAL and run with those (particularly
in the case of MDL issues).

-scott


Scott Noone
Consulting Associate and Chief System Problem Analyst
OSR Open Systems Resources, Inc.
http://www.osronline.com


Hi Scott
Worked a treat. Many thanks for the steer.

If you ever use partial MDLs, double-check that they do not outlive their master MDLs and are destroyed at the proper moments.
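To make the ordering concrete: a partial MDL built with IoBuildPartialMdl describes pages that are locked through its master MDL, so the partial should be freed (IoFreeMdl) before MmUnlockPages is called on the master. The toy model below calls no Windows API; it only counts outstanding partials (all names are hypothetical) and refuses to unlock a master that still has one:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for a master MDL and the partial MDLs built from it. */
struct toy_master {
    bool     locked;          /* master's pages are probed and locked */
    unsigned live_partials;   /* partial MDLs not yet freed           */
};

/* A partial may only be built over a master whose pages are locked. */
static void toy_build_partial(struct toy_master *m)
{
    assert(m->locked);
    m->live_partials++;
}

static void toy_free_partial(struct toy_master *m)
{
    assert(m->live_partials > 0);
    m->live_partials--;
}

/* Returns false (and leaves the master locked) if a partial still refers to it. */
static bool toy_unlock_master(struct toy_master *m)
{
    if (m->live_partials != 0)
        return false;         /* partial would outlive its master */
    m->locked = false;
    return true;
}
```

In this model, unlocking a master straight after toy_build_partial fails; calling toy_free_partial first lets the unlock succeed.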


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

Hello
Sorry to dredge up an old thread, but I’ve been waiting for the customer to get back to us after enabling verifier and the TrackLockedPages registry value suggested by Pavel. In the event, the latest bugcheck was another PFN_LIST_CORRUPT and not the verifier crash I was hoping for.

The !lockedpages WinDbg extension reports “Unable to get lock header data”. I’ve tried enabling TrackLockedPages on a similar Windows 2003 environment in house and this too reports the same error. Not sure why, but not much help there :(

But at least the latest crash happened in our own driver, which is a first, so at least I know how we were using the memory. In this case we allocated a page-aligned 4K block of memory at address 891d5000, and the crash occurred when we freed this block later in the same function. As before, the crash was the result of an elevated reference count on the PFN, which was reported as 0x8394. And sure enough, this maps onto our 4K block:

0: kd> !pfn 00008394
PFN 00008394 at address 814E6430
flink 00000000 blink / share count 00000001 pteaddress C0448EA8
reference count 0002 Cached color 0
restore pte 00000000 containing page 009443 Active R V
ReadInProgress VerifierAllocation

0: kd> !pte C0448EA8
VA 891d5000
PDE at C0602240 PTE at C0448EA8
contains 0000000009443863 contains 0000000008394963
pfn 9443 ---DA--KWEV pfn 8394 -G-DA--KWEV

In the absence of any other ideas, I searched for all "Mdl " pool tags and looked at the mapped addresses. This found the following MDL, whose StartVa equals my buffer:

0: kd> dt _mdl 8997acf0
ntdll!_MDL
+0x000 Next : (null)
+0x004 Size : 0n32
+0x006 MdlFlags : 0n139
+0x008 Process : (null)
+0x00c MappedSystemVa : 0xf7a86aa8 Void
+0x010 StartVa : 0x891d5000 Void
+0x014 ByteCount : 0x29c
+0x018 ByteOffset : 0xaa8

!pte 0xf7a86aa8
VA f7a86aa8
PDE at C0603DE8 PTE at C07BD430
contains 00000000093C4863 contains 0000000008394963
pfn 93c4 ---DA--KWEV pfn 8394 -G-DA--KWEV

Not really sure how to interpret this, because I don’t understand the difference between MappedSystemVa and StartVa in the MDL. Both addresses map to the same PFN, though, which surprised me, as my understanding is that a PFN can only be mapped by a single PTE.

Would it be reasonable to assume that something has freed off the buffer referenced by the startVA in this MDL (thereby making it available for reallocation) but failed to unlock the page (hence the elevated reference count on the PFN)?
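That hypothesis can be illustrated with a toy user-mode model (not the real PFN database; every name is hypothetical). A nonpaged pool page carries one reference while resident and one more per outstanding lock. Purely for illustration, the model assumes that freeing a sub-page block does not re-check the page's reference count, while freeing a whole-page allocation does (the path the stacks show, ExFreePoolWithTag into MiFreePoolPages). Under that assumption, a driver that locks an MDL over a small buffer and frees the buffer without unlocking leaves the count at 2, and the bugcheck fires in whichever driver later frees the page as a 4K block:

```c
#include <assert.h>

/* Toy model of one nonpaged pool page: NOT the real PFN database. */
struct toy_page {
    unsigned ref_count;     /* 1 while resident in pool, +1 per outstanding lock */
    unsigned small_blocks;  /* sub-page allocations currently carved from it      */
};

/* Models locking an MDL that describes this page, and the matching unlock. */
static void toy_lock(struct toy_page *p)   { p->ref_count++; }
static void toy_unlock(struct toy_page *p) { assert(p->ref_count > 1); p->ref_count--; }

/* Sub-page allocate/free. ASSUMPTION for illustration: freeing a small block
   does not re-check the page's reference count. */
static void toy_alloc_small(struct toy_page *p) { p->small_blocks++; }
static void toy_free_small(struct toy_page *p)  { assert(p->small_blocks > 0); p->small_blocks--; }

/* Freeing a whole-page allocation checks the reference count. Returns 0 on
   success, or 0x9A (the PFN_LIST_CORRUPT subcode seen in these dumps). */
static unsigned toy_free_whole_page(const struct toy_page *p)
{
    assert(p->small_blocks == 0);
    return p->ref_count == 1 ? 0u : 0x9Au;
}
```

Starting from a page with ref_count 1, the sequence toy_alloc_small, toy_lock, toy_free_small leaves ref_count at 2, and toy_free_whole_page then returns 0x9A; adding toy_unlock before toy_free_small makes the whole-page free return 0.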

Regards

Mark

>The !lockedpages windbg extension reports “Unable to get lock header data”.
>I’ve tried enabling TrackLockedPages on a similar Windows 2003
>environment in house and this too reports the same error. Not sure why but
>not much help there :(

Sounds like it’s broken with public symbols. Can you run the following and
then try !lockedpages again? It won’t fix the problem, but it should
indicate which data types cannot be found:

.show_sym_failures /s /t

Maybe they are documented elsewhere, which would allow you to manually add
the types into the NT PDB.

>the crash occurred when we freed this block of memory off in the same
>function

What did you do with the memory between allocating it and freeing it?

>Not really sure how to interpret this because I don’t understand the
>difference between the MappedSystemVA and StartVA in the MDL.

StartVA is the virtual address used when allocating the MDL. This address is
potentially process context specific. MappedSystemVA is the address returned
from MmGetSystemAddressForMdlSafe, which is process context independent.
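Applied to the MDL dumped earlier, the same arithmetic as the WDK's MmGetMdlVirtualAddress and ADDRESS_AND_SIZE_TO_SPAN_PAGES macros gives a described range of 0x29c bytes starting at 0x891d5000 + 0xaa8 = 0x891d5aa8, which fits in a single page; MappedSystemVa (0xf7a86aa8) carries the same 0xaa8 offset within its own page. On 32-bit x86 the MDL header ends at +0x1c and one 4-byte PFN entry per page follows, which matches the Size of 0n32. A sketch, assuming that 32-bit layout; the struct is a hypothetical stand-in for the dumped fields, not the WDK type:

```c
#include <assert.h>
#include <stdint.h>

#define X86_PAGE_SIZE   0x1000u
#define X86_PAGE_SHIFT  12
#define X86_MDL_HEADER  0x1Cu  /* fields at +0x000..+0x018 in the dt _mdl output */
#define X86_PFN_ENTRY   4u     /* one 32-bit PFN per described page follows it   */

/* Hypothetical copy of the dumped MDL fields. */
struct mdl_fields {
    uint32_t mapped_system_va;
    uint32_t start_va;
    uint32_t byte_count;
    uint32_t byte_offset;
};

/* First byte described, in the context the MDL was built in: StartVa + ByteOffset. */
static uint32_t mdl_virtual_address(const struct mdl_fields *m)
{
    return m->start_va + m->byte_offset;
}

/* Pages spanned by ByteOffset .. ByteOffset + ByteCount. */
static uint32_t mdl_span_pages(const struct mdl_fields *m)
{
    assert(m->byte_offset < X86_PAGE_SIZE);
    return (m->byte_offset + m->byte_count + X86_PAGE_SIZE - 1) >> X86_PAGE_SHIFT;
}

/* MDL Size field expected for that page count on 32-bit x86. */
static uint32_t mdl_expected_size(const struct mdl_fields *m)
{
    return X86_MDL_HEADER + X86_PFN_ENTRY * mdl_span_pages(m);
}
```

With the dumped values { 0xf7a86aa8, 0x891d5000, 0x29c, 0xaa8 } this gives a virtual address of 0x891d5aa8, one spanned page and an expected Size of 32.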

>Both addresses have the same PFN though which surprised me as my
>understanding is that a PFN can only be mapped to a single PTE.

That is not correct, multiple virtual addresses can certainly map to the
same PFN. This is how shared memory works, for example. I think you’re
confusing this with the fact that a PFN entry tracks a single PTE that maps
to it, though that doesn’t mean there can’t be more than one PTE.
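The two !pte dumps in Mark's message show this directly: the PTE for 891d5000 (at C0448EA8) and the PTE for f7a86aa8 (at C07BD430) both contain 0000000008394963, so two different system VAs resolve to PFN 0x8394. A minimal decode, assuming 32-bit PAE PTEs in which the physical page number starts at bit 12 and bit 0 is the hardware Valid bit (helper names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAE_PFN_MASK  0x000FFFFFFFFFF000ull  /* physical address bits of a 4 KB PTE */
#define PTE_VALID     0x1ull                 /* bit 0: hardware Valid               */

/* PFN referenced by a valid 4 KB PAE PTE. */
static uint64_t pae_pte_pfn(uint64_t pte)
{
    assert(pte & PTE_VALID);
    return (pte & PAE_PFN_MASK) >> 12;
}

/* True if two PTEs (for example, for two different VAs) map the same physical page. */
static bool ptes_share_page(uint64_t pte_a, uint64_t pte_b)
{
    return pae_pte_pfn(pte_a) == pae_pte_pfn(pte_b);
}
```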

In the MDL that you dumped, the flags value of 0n139 (0x8b) maps to:

MDL_MAPPED_TO_SYSTEM_VA |
MDL_PAGES_LOCKED |
MDL_ALLOCATED_FIXED_SIZE |
MDL_WRITE_OPERATION

That certainly suggests to me that the MDL still has the underlying pages
locked, so the bugcheck is correct here. This is why I’m wondering what
you did with the buffer between allocating it and freeing it.
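The flag decoding above can be reproduced mechanically. This sketch covers only the eight low-order MDL flags, using the values defined in wdm.h, and returns any bits it does not recognize; the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Low-order MDL flag values as defined in wdm.h. */
static const struct { unsigned short bit; const char *name; } mdl_flag_names[] = {
    { 0x0001, "MDL_MAPPED_TO_SYSTEM_VA" },
    { 0x0002, "MDL_PAGES_LOCKED" },
    { 0x0004, "MDL_SOURCE_IS_NONPAGED_POOL" },
    { 0x0008, "MDL_ALLOCATED_FIXED_SIZE" },
    { 0x0010, "MDL_PARTIAL" },
    { 0x0020, "MDL_PARTIAL_HAS_BEEN_MAPPED" },
    { 0x0040, "MDL_IO_PAGE_READ" },
    { 0x0080, "MDL_WRITE_OPERATION" },
};

/* Prints each known flag set in 'flags'; returns the bits not covered above. */
static unsigned short decode_mdl_flags(unsigned short flags, FILE *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < sizeof mdl_flag_names / sizeof mdl_flag_names[0]; i++) {
        if (flags & mdl_flag_names[i].bit) {
            fprintf(out, "%s\n", mdl_flag_names[i].name);
            flags &= (unsigned short)~mdl_flag_names[i].bit;
        }
    }
    return flags;
}

/* The check that matters here: does the MDL still hold its pages locked? */
static int mdl_pages_locked(unsigned short flags)
{
    return (flags & 0x0002) != 0;
}
```

decode_mdl_flags(139, stdout) prints the four names listed above and returns 0, and mdl_pages_locked(139) is nonzero.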

-scott


What’s the output of !analyze -v?


From: Scott Noone
Sent: 9/7/2012 6:38 AM
To: Windows File Systems Devs Interest List
Subject: Re:[ntfsd] PFN_LIST_CORRUPT


Hi Scott
Haven’t come across .show_sym_failures before. Very useful. When I set this I see the following error:

0: kd> !lockedpages
Process: ffffffff895dace0
type lookup ‘nt!_LOCK_HEADER’ failure.
894e8d68: Unable to get lock header data.

In terms of what we do with the buffer, it’s very little really. We allocate a 4K buffer using ExAllocatePoolWithTag and use it to store a UNICODE_STRING that we compare against values in a list. We then free it with ExFreePool at the end of the function, and this is where the bugcheck occurs. Note, though, that this is the first time the problem has manifested itself in our driver. Last time it was an internal memory buffer in NTFS itself which, like ours, was allocated and freed in the same function. I think the root problem is that the same page is already mapped elsewhere (but freed, and thereby available for reallocation), so the bugcheck lands in whichever driver happens to receive that memory block from an allocation request.

The analyze -v output is:

PFN_LIST_CORRUPT (4e)
Typically caused by drivers passing bad memory descriptor lists (ie: calling
MmUnlockPages twice with the same list, etc). If a kernel debugger is
available get the stack trace.
Arguments:
Arg1: 0000009a,
Arg2: 00008394
Arg3: 00000006
Arg4: 00000002

Debugging Details:

type lookup ‘nt!_MM_DRIVER_VERIFIER_DATA’ failure, field ‘OptionChanges’ not found.
type lookup ‘nt!_MM_DRIVER_VERIFIER_DATA’ failure, field ‘VerifyMode’ not found.
type lookup ‘nt!_KPRCB’ failure, field ‘WaitLock’ not found.
type lookup ‘nt!KPRCB’ failure.
sym lookup ‘nt!_KiDoubleFaultStack’ failure
sym lookup ‘nt!_KiServiceTable’ failure
sym lookup ‘hal!_HalpRealModeStart’ failure

BUGCHECK_STR: 0x4E_9a

DEFAULT_BUCKET_ID: DRIVER_FAULT

PROCESS_NAME: Nhstw32.exe

CURRENT_IRQL: 0

LAST_CONTROL_TRANSFER: from 8086598b to 80827c83

STACK_TEXT:
b9a25944 8086598b 0000004e 0000009a 00008394 nt!KeBugCheckEx+0x1b
b9a25960 80891779 814e6430 808aeae0 012a7e10 nt!MiBadRefCount+0x33
b9a25998 808925bb 891d5000 891d5000 b9a25b34 nt!MiFreePoolPages+0x5c9
b9a259f0 bab8d3ee 5267476c 00000000 031b44f0 nt!ExFreePoolWithTag+0x277
b9a25a30 bab8d423 b9a25b34 b9a25a80 808b7b45 eps!RegPreCreateKeyEx+0x98 [c:\buildwork\60\409\sources\eps\latest\source\bootstrap\registry.c @ 638]
b9a25a3c 808b7b45 00000000 0000001a b9a25b34 eps!RegistryCallback+0x25 [c:\buildwork\60\409\sources\eps\latest\source\bootstrap\registry.c @ 666]
b9a25a80 808da118 0000001a b9a25b34 00000001 nt!CmpCallCallBacks+0xa7
b9a25b6c 809374b1 e254a380 b9a25cbc 895624c0 nt!CmpParseKey+0xd4
b9a25bec 80933a76 0000035c b9a25c2c 00000040 nt!ObpLookupObjectName+0x11f
b9a25c40 808b9cbf 00000000 8a0b4e70 0017dd01 nt!ObOpenObjectByName+0xea
b9a25d40 808897bc 0012ceec 02000000 0012ca24 nt!NtCreateKey+0x2d3
b9a25d40 7c82860c 0012ceec 02000000 0012ca24 nt!KiFastCallEntry+0xfc
0012c9f0 7c826e49 7d20718c 0012ceec 02000000 ntdll!KiFastSystemCallRet
0012c9f4 7d20718c 0012ceec 02000000 0012ca24 ntdll!NtCreateKey+0xc
0012cc10 7d20730c 0000035c 0012cc58 0012cc60 ADVAPI32!LocalBaseRegCreateKey+0x293
0012cc6c 7c914f15 0000035c 0012ccb4 00000000 ADVAPI32!RegCreateKeyExW+0xf1
0012ceb8 7c915071 7c8d7b5c 00000358 7c8d926c SHELL32!_OpenKeyForFolder+0xf8
0012cedc 7c90ecd6 00000000 7c8d7b5c 00000000 SHELL32!_GetFolderPath+0x8f
0012cf08 7c92dde7 00000000 7c8d7b5c 00000358 SHELL32!_GetFolderPathCached+0x34
0012cf38 766d13be 00000000 00000005 00000358 SHELL32!SHGetFolderPathW+0x9b
0012cf5c 766d135d 00000000 00000005 00000000 shfolder!_SHGetFolderPath+0x3e
0012cf84 766d1501 00000000 00000005 00000000 shfolder!SHGetFolderPathW+0x2d
0012d1b4 200c23b8 00000000 00000005 00000354 shfolder!SHGetFolderPathA+0x3b
WARNING: Stack unwind information not available. Following frames may be wrong.
0012d2d4 200c27cc 00000005 200dbd58 0012d308 dwutil!dwuGetEnvironmentVariableA+0x412
0012d2e4 200c1c24 200c1bca 00406f87 00000000 dwutil!dwuGetEnvironmentVariableA+0x826
0012d308 00419dc3 0012d738 00526584 00ae3970 dwutil!dwuProperDirsReInit+0x6d
0012d320 00402d1d 00ae3b24 00000001 00000001 NHSTW32!_GetExceptDLLinfo+0x18d7d
0012d344 00402dc3 00ae3970 00000001 00000001 NHSTW32!_GetExceptDLLinfo+0x1cd7
0012d35c 2002a440 00ae3970 00000000 00000000 NHSTW32!_GetExceptDLLinfo+0x1d7d
0012d370 20031084 00ae3970 00402d3c 00000000 DWRT32!ags+0x24
0012d390 2004fb15 00ae3a47 0012d418 00000000 DWRT32!aof+0x28
0012d3e4 20050446 00ae3a53 0012d418 00000000 DWRT32!agh+0x97
0012d440 20050546 00ae3a53 0000060a 00000000 DWRT32!bdg+0x1b9
0012d474 200505cd 00ae3a53 0000060a 00000000 DWRT32!bdg+0x2b9
0012d4c0 2005056e 00ae3a53 0000060a 00000000 DWRT32!bdg+0x340
0012d4d8 7739b6e3 000100bc 0000060a 00000000 DWRT32!bdg+0x2e1
0012d504 7739b874 01ed01e0 000100bc 0000060a USER32!InternalCallWinProc+0x28
0012d57c 7739ba92 00000000 01ed01e0 000100bc USER32!UserCallWinProcCheckWow+0x151
0012d5e4 773a16e5 0012d644 00000001 0012d628 USER32!DispatchMessageWorker+0x327
0012d5f4 20053d9b 0012d644 0012d749 00000001 USER32!DispatchMessageA+0xf
0012d628 2001a6f3 0012d738 0012d644 00000000 DWRT32!ahk+0xab
0012d660 20053a63 0012d749 00000000 2009dca4 DWRT32!afm+0x36
0012d6b4 2005365f 0012d738 00000000 2009dca4 DWRT32!ahh+0x54
0012d708 00401e70 0012d738 0012ff58 00523052 DWRT32!ahl+0x58
0012ff3c 0050c1f2 00000001 00ae338c 00000000 NHSTW32!_GetExceptDLLinfo+0xe2a
0012ff84 200116e2 00400000 00000000 001624d5 NHSTW32!_GetExceptDLLinfo+0x10b1ac
0012ffbc 0050925a 77e6f23b 00000000 00000000 DWRT32!acm+0x132
0012fff0 00000000 00509248 00000000 78746341 NHSTW32!_GetExceptDLLinfo+0x108214

STACK_COMMAND: kb

FOLLOWUP_IP:
eps!RegPreCreateKeyEx+98 [c:\buildwork\60\409\sources\eps\latest\source\bootstrap\registry.c @ 638]
bab8d3ee 8b45e0 mov eax,dword ptr [ebp-20h]

SYMBOL_STACK_INDEX: 4

SYMBOL_NAME: eps!RegPreCreateKeyEx+98

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: eps

IMAGE_NAME: eps.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4fa005a4

FAILURE_BUCKET_ID: 0x4E_9a_VRF_eps!RegPreCreateKeyEx+98

BUCKET_ID: 0x4E_9a_VRF_eps!RegPreCreateKeyEx+98

Followup: MachineOwner

Regards

Mark

>type lookup ‘nt!_LOCK_HEADER’ failure.
>894e8d68: Unable to get lock header data.

I can reproduce this problem. Walking through the !lockedpages code a bit I
can see there is at least one other structure that is also missing from the
public symbols (_LOCK_TRACKER). I filed a bug against the extension command
for this, though I wouldn’t expect it to be fixed in the immediate future.

All signs really do point to someone misbehaving and leaking the MDL. Too
bad !lockedpages isn’t working; it might have given us a clue.

-scott

wrote in message news:xxxxx@ntfsd…

Hi Scott
Haven’t come across .show_sym_failures before. Very useful. When I set
this, I see the following error:

0: kd> !lockedpages
Process: ffffffff895dace0
type lookup ‘nt!_LOCK_HEADER’ failure.
894e8d68: Unable to get lock header data.

In terms of what we do with the buffer, it’s very little really. We allocate
a 4K buffer using ExAllocatePoolWithTag and use it to store a
UNICODE_STRING that we compare to values in a list. We then free it with
ExFreePool at the end of the function, and this is where the bugcheck occurs.
Note, though, that this is the first time this problem has manifested itself
in our driver. The last time this occurred, it was an internal memory buffer
in NTFS itself which, like ours, was allocated and freed in the same
function. I think the root problem is that the same page is still mapped
elsewhere even though it has been freed and is therefore available for
reallocation, so the bugcheck simply lands on whichever driver happens to
receive that memory block from an allocation request.
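To make that reasoning concrete, here is a deliberately simplified user-mode model in C. It is a sketch under stated assumptions, not how the Windows memory manager is implemented: every name (`toy_page`, `toy_mdl_lock`, `run_scenario`, and so on) is hypothetical, and the only state tracked is a per-page reference count. It shows why a lock taken through an MDL and never released can surface as a bad reference count at a later free (the nt!MiFreePoolPages calling nt!MiBadRefCount path in the stack trace below), and why the crash names whoever frees the memory rather than whoever leaked the lock.

```c
/* Toy model (hypothetical names, NOT the real memory manager): the only
 * state tracked is a per-page reference count. */
typedef struct {
    int ref_count;
} toy_page;

/* A pool allocation holds one reference on its page. */
static void toy_pool_alloc(toy_page *p) { p->ref_count = 1; }

/* Locking the page through an MDL (cf. MmProbeAndLockPages) adds a
 * reference; unlocking it (cf. MmUnlockPages) drops that reference. */
static void toy_mdl_lock(toy_page *p)   { p->ref_count++; }
static void toy_mdl_unlock(toy_page *p) { p->ref_count--; }

/* Freeing expects only the allocation's own reference to remain.
 * Returns 0 on success; otherwise returns the unexpected count, which is
 * the point where the real system would stop with PFN_LIST_CORRUPT. */
static int toy_pool_free(toy_page *p)
{
    if (p->ref_count != 1)
        return p->ref_count;
    p->ref_count = 0;
    return 0;
}

/* One scenario: the buffer's owner allocates it, another component locks
 * the same page via an MDL and optionally unlocks it, then the owner frees
 * the buffer. The check fires in the owner's free, not in the component
 * that leaked the lock. */
static int run_scenario(int other_component_unlocks)
{
    toy_page page = {0};
    toy_pool_alloc(&page);          /* owner allocates its buffer        */
    toy_mdl_lock(&page);            /* another component locks the page  */
    if (other_component_unlocks)
        toy_mdl_unlock(&page);      /* balanced: the MDL is cleaned up   */
    return toy_pool_free(&page);    /* owner frees: the check runs here  */
}
```

In this model `run_scenario(1)` returns 0 and `run_scenario(0)` returns 2: the unbalanced lock is only detected when the innocent owner frees its buffer.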

The analyze -v output is:

PFN_LIST_CORRUPT (4e)
Typically caused by drivers passing bad memory descriptor lists (ie: calling
MmUnlockPages twice with the same list, etc). If a kernel debugger is
available get the stack trace.
Arguments:
Arg1: 0000009a,
Arg2: 00008394
Arg3: 00000006
Arg4: 00000002

Debugging Details:

type lookup ‘nt!_MM_DRIVER_VERIFIER_DATA’ failure, field ‘OptionChanges’ not found.
type lookup ‘nt!_MM_DRIVER_VERIFIER_DATA’ failure, field ‘VerifyMode’ not found.
type lookup ‘nt!_KPRCB’ failure, field ‘WaitLock’ not found.
type lookup ‘nt!KPRCB’ failure.
sym lookup ‘nt!_KiDoubleFaultStack’ failure
sym lookup ‘nt!_KiServiceTable’ failure
sym lookup ‘hal!_HalpRealModeStart’ failure

BUGCHECK_STR: 0x4E_9a

DEFAULT_BUCKET_ID: DRIVER_FAULT

PROCESS_NAME: Nhstw32.exe

CURRENT_IRQL: 0

LAST_CONTROL_TRANSFER: from 8086598b to 80827c83

STACK_TEXT:
b9a25944 8086598b 0000004e 0000009a 00008394 nt!KeBugCheckEx+0x1b
b9a25960 80891779 814e6430 808aeae0 012a7e10 nt!MiBadRefCount+0x33
b9a25998 808925bb 891d5000 891d5000 b9a25b34 nt!MiFreePoolPages+0x5c9
b9a259f0 bab8d3ee 5267476c 00000000 031b44f0 nt!ExFreePoolWithTag+0x277
b9a25a30 bab8d423 b9a25b34 b9a25a80 808b7b45 eps!RegPreCreateKeyEx+0x98 [c:\buildwork\60\409\sources\eps\latest\source\bootstrap\registry.c @ 638]
b9a25a3c 808b7b45 00000000 0000001a b9a25b34 eps!RegistryCallback+0x25 [c:\buildwork\60\409\sources\eps\latest\source\bootstrap\registry.c @ 666]
b9a25a80 808da118 0000001a b9a25b34 00000001 nt!CmpCallCallBacks+0xa7
b9a25b6c 809374b1 e254a380 b9a25cbc 895624c0 nt!CmpParseKey+0xd4
b9a25bec 80933a76 0000035c b9a25c2c 00000040 nt!ObpLookupObjectName+0x11f
b9a25c40 808b9cbf 00000000 8a0b4e70 0017dd01 nt!ObOpenObjectByName+0xea
b9a25d40 808897bc 0012ceec 02000000 0012ca24 nt!NtCreateKey+0x2d3
b9a25d40 7c82860c 0012ceec 02000000 0012ca24 nt!KiFastCallEntry+0xfc
0012c9f0 7c826e49 7d20718c 0012ceec 02000000 ntdll!KiFastSystemCallRet
0012c9f4 7d20718c 0012ceec 02000000 0012ca24 ntdll!NtCreateKey+0xc
0012cc10 7d20730c 0000035c 0012cc58 0012cc60 ADVAPI32!LocalBaseRegCreateKey+0x293
0012cc6c 7c914f15 0000035c 0012ccb4 00000000 ADVAPI32!RegCreateKeyExW+0xf1
0012ceb8 7c915071 7c8d7b5c 00000358 7c8d926c SHELL32!_OpenKeyForFolder+0xf8
0012cedc 7c90ecd6 00000000 7c8d7b5c 00000000 SHELL32!_GetFolderPath+0x8f
0012cf08 7c92dde7 00000000 7c8d7b5c 00000358 SHELL32!_GetFolderPathCached+0x34
0012cf38 766d13be 00000000 00000005 00000358 SHELL32!SHGetFolderPathW+0x9b
0012cf5c 766d135d 00000000 00000005 00000000 shfolder!_SHGetFolderPath+0x3e
0012cf84 766d1501 00000000 00000005 00000000 shfolder!SHGetFolderPathW+0x2d
0012d1b4 200c23b8 00000000 00000005 00000354 shfolder!SHGetFolderPathA+0x3b
WARNING: Stack unwind information not available. Following frames may be wrong.
0012d2d4 200c27cc 00000005 200dbd58 0012d308 dwutil!dwuGetEnvironmentVariableA+0x412
0012d2e4 200c1c24 200c1bca 00406f87 00000000 dwutil!dwuGetEnvironmentVariableA+0x826
0012d308 00419dc3 0012d738 00526584 00ae3970 dwutil!dwuProperDirsReInit+0x6d
0012d320 00402d1d 00ae3b24 00000001 00000001 NHSTW32!_GetExceptDLLinfo+0x18d7d
0012d344 00402dc3 00ae3970 00000001 00000001 NHSTW32!_GetExceptDLLinfo+0x1cd7
0012d35c 2002a440 00ae3970 00000000 00000000 NHSTW32!_GetExceptDLLinfo+0x1d7d
0012d370 20031084 00ae3970 00402d3c 00000000 DWRT32!ags+0x24
0012d390 2004fb15 00ae3a47 0012d418 00000000 DWRT32!aof+0x28
0012d3e4 20050446 00ae3a53 0012d418 00000000 DWRT32!agh+0x97
0012d440 20050546 00ae3a53 0000060a 00000000 DWRT32!bdg+0x1b9
0012d474 200505cd 00ae3a53 0000060a 00000000 DWRT32!bdg+0x2b9
0012d4c0 2005056e 00ae3a53 0000060a 00000000 DWRT32!bdg+0x340
0012d4d8 7739b6e3 000100bc 0000060a 00000000 DWRT32!bdg+0x2e1
0012d504 7739b874 01ed01e0 000100bc 0000060a USER32!InternalCallWinProc+0x28
0012d57c 7739ba92 00000000 01ed01e0 000100bc USER32!UserCallWinProcCheckWow+0x151
0012d5e4 773a16e5 0012d644 00000001 0012d628 USER32!DispatchMessageWorker+0x327
0012d5f4 20053d9b 0012d644 0012d749 00000001 USER32!DispatchMessageA+0xf
0012d628 2001a6f3 0012d738 0012d644 00000000 DWRT32!ahk+0xab
0012d660 20053a63 0012d749 00000000 2009dca4 DWRT32!afm+0x36
0012d6b4 2005365f 0012d738 00000000 2009dca4 DWRT32!ahh+0x54
0012d708 00401e70 0012d738 0012ff58 00523052 DWRT32!ahl+0x58
0012ff3c 0050c1f2 00000001 00ae338c 00000000 NHSTW32!_GetExceptDLLinfo+0xe2a
0012ff84 200116e2 00400000 00000000 001624d5 NHSTW32!_GetExceptDLLinfo+0x10b1ac
0012ffbc 0050925a 77e6f23b 00000000 00000000 DWRT32!acm+0x132
0012fff0 00000000 00509248 00000000 78746341 NHSTW32!_GetExceptDLLinfo+0x108214

STACK_COMMAND: kb

FOLLOWUP_IP:
eps!RegPreCreateKeyEx+98 [c:\buildwork\60\409\sources\eps\latest\source\bootstrap\registry.c @ 638]
bab8d3ee 8b45e0 mov eax,dword ptr [ebp-20h]

SYMBOL_STACK_INDEX: 4

SYMBOL_NAME: eps!RegPreCreateKeyEx+98

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: eps

IMAGE_NAME: eps.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4fa005a4

FAILURE_BUCKET_ID: 0x4E_9a_VRF_eps!RegPreCreateKeyEx+98

BUCKET_ID: 0x4E_9a_VRF_eps!RegPreCreateKeyEx+98

Followup: MachineOwner

Regards

Mark

There is often a situation in which “ownership” is an issue; that is,
under some conditions, a copy of the string is made, and you retain
ownership of the input parameter. In other cases, you effectively hand over
ownership of the string to some component, and therefore must not delete
it.

If you allocated too little space for your UNICODE_STRING buffer, you
might be clobbering an innocent bystander.
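One common way to allocate too little is to mix up units: a UNICODE_STRING’s Length and MaximumLength are byte counts, not character counts, and the buffer need not be NUL-terminated. The sketch below is a user-mode illustration that uses a local struct with the same three fields (it does not include the real kernel headers) plus hypothetical helpers (`toy_init_over_buffer`, `toy_copy_checked`) that refuse to copy when the destination is too small. In a driver, RtlCopyUnicodeString bounds the copy by the destination’s MaximumLength, truncating rather than failing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in with the same fields as UNICODE_STRING. WCHAR is
 * 16 bits on Windows, so uint16_t is used instead of wchar_t. */
typedef uint16_t toy_wchar;
typedef struct {
    uint16_t   Length;          /* bytes currently in use    */
    uint16_t   MaximumLength;   /* bytes available in Buffer */
    toy_wchar *Buffer;
} toy_unicode_string;

/* Describe a caller-supplied buffer (e.g. a 4K pool allocation) as an
 * empty string. buffer_bytes is a size in BYTES. */
static void toy_init_over_buffer(toy_unicode_string *s, toy_wchar *buf,
                                 size_t buffer_bytes)
{
    if (buffer_bytes > 0xFFFE)          /* MaximumLength is a USHORT */
        buffer_bytes = 0xFFFE;
    s->Length = 0;
    s->MaximumLength = (uint16_t)buffer_bytes;
    s->Buffer = buf;
}

/* Copy src into dst only if it fits; returns 0 on success, -1 if dst is
 * too small. Bytes are compared with bytes: using a character count here
 * (Length / sizeof(toy_wchar)) would under-size the check and overrun
 * whatever sits next to the buffer. */
static int toy_copy_checked(toy_unicode_string *dst,
                            const toy_unicode_string *src)
{
    if (src->Length > dst->MaximumLength)
        return -1;
    memcpy(dst->Buffer, src->Buffer, src->Length);
    dst->Length = src->Length;
    return 0;
}
```

A five-character string has a Length of 10 bytes, so it is rejected by an 8-byte destination and accepted by a 4096-byte one.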

In order to understand the code, we would have to see it: your allocation
of memory, the way you hand the string off, what you are handing it off
to, how you delete it, etc. The PTP (Psychic Transfer Protocol) is not
implemented on any of my machines, neither PTP-v4 nor PTP-v6, so I can’t see
your code unless you post it.

Note that when posting code you must show the declarations of all
variables involved in the code, and show how they are initialized to
meaningful values, and what their scope and extent is. Don’t say
UNICODE_STRING str;
unless you say
“In my device extension, I declare:”
or
“At the start of my function, I declare:”
as part of the description.
joe



NTFSD is sponsored by OSR

For our schedule of debugging and file system seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Do a search for the address of the leaked MDL. There should be one hit in the lock tracking structures (which are allocated from nonpaged pool with the MmLk tag). Run ‘dps’ on the found address and you should see a stack trace shortly after the MDL pointer. (On Windows Server 2003 through Windows 7 the stack trace contains only two frames; Windows 8 saves eight.)
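The search step can be pictured as a plain scan over pointer-sized words. In this hypothetical sketch, `words` stands in for MmLk-tagged pool read out of the dump; in practice you would use the debugger’s own memory search and ‘dps’. Because the layout of the lock tracking structure is undocumented, the code makes no assumption about where the stack trace starts: it only returns the words that follow the hit so they can be inspected.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first word equal to target, or -1 if absent.
 * 'words' is a hypothetical copy of MmLk-tagged pool from the dump. */
static ptrdiff_t toy_find_pointer(const uintptr_t *words, size_t count,
                                  uintptr_t target)
{
    for (size_t i = 0; i < count; i++)
        if (words[i] == target)
            return (ptrdiff_t)i;
    return -1;
}

/* Copy up to max_out words that follow index 'hit' into out[], the way
 * 'dps' would show the memory after the MDL pointer. Which of these are
 * return addresses is left to the reader, since the tracker layout is
 * not documented. Returns the number of words copied. */
static size_t toy_words_after(const uintptr_t *words, size_t count,
                              size_t hit, uintptr_t *out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = hit + 1; i < count && n < max_out; i++)
        out[n++] = words[i];
    return n;
}
```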

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Scott Noone
Sent: Friday, September 7, 2012 11:15 AM
To: Windows File Systems Devs Interest List
Subject: Re:[ntfsd] PFN_LIST_CORRUPT

type lookup ‘nt!_LOCK_HEADER’ failure.
894e8d68: Unable to get lock header data.

I can reproduce this problem. Walking through the !lockedpages code a bit I can see there is at least one other structure that is also missing from the public symbols (_LOCK_TRACKER). I filed a bug against the extension command for this, though I wouldn’t expect it to be fixed in the immediate future.

All signs really do point to someone misbehaving and leaking the MDL. Too bad !lockedpages isn’t working, it might have a clue.

Hi Pavel,
Thanks for that. I’d already found the MmLk tag using the technique you proposed, but hadn’t appreciated what it was, as the tag is not documented.

In fact, Scott had already looked at the dump file, found this, and traced it to a call we made to FltLockUserBuffer on the IRP_MJ_DIRECTORY_CONTROL path. It transpires that FltLockUserBuffer either updates the MDL in the callback data or allocates an MDL if none exists. Under normal conditions this MDL is stored in the IRP and is released by the I/O manager before the IRP is freed.

In this case, though, it appears that the IRP was allocated by a legacy filter above Filter Manager and freed in that filter’s I/O completion routine without first checking for the presence of an MDL. So the MDL was never freed.
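The step the legacy filter skipped can be sketched as follows. The types and functions are hypothetical user-mode stand-ins so the example compiles and runs outside the WDK; the comments name the real routines this pattern usually maps to (MmUnlockPages, IoFreeMdl, IoFreeIrp). The idea: a completion routine that frees an IRP it allocated itself, and returns STATUS_MORE_PROCESSING_REQUIRED, must first unlock and free any MDL chain hanging off Irp->MdlAddress, because the I/O manager’s final completion processing, where such MDLs would otherwise be released, never runs for that IRP.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the fields that matter: an IRP's MdlAddress
 * heads a singly linked chain of MDLs joined through Next. */
typedef struct toy_mdl {
    struct toy_mdl *Next;
    int pages_locked;            /* cf. MDL_PAGES_LOCKED in MdlFlags */
} toy_mdl;

typedef struct {
    toy_mdl *MdlAddress;
} toy_irp;

static int g_locked_mdls;        /* MDLs that still hold a page lock */

/* Plays the role of FltLockUserBuffer: attach a locked MDL to the IRP.
 * Returns 0 on success, -1 if the allocation fails. */
static int toy_attach_locked_mdl(toy_irp *irp)
{
    toy_mdl *m = calloc(1, sizeof *m);
    if (m == NULL)
        return -1;
    m->pages_locked = 1;
    m->Next = irp->MdlAddress;
    irp->MdlAddress = m;
    g_locked_mdls++;
    return 0;
}

/* Cleanup for an IRP the driver allocated itself. In a real completion
 * routine the loop body would typically be
 *     if (mdl->MdlFlags & MDL_PAGES_LOCKED) MmUnlockPages(mdl);
 *     IoFreeMdl(mdl);
 * followed by IoFreeIrp(Irp) and return STATUS_MORE_PROCESSING_REQUIRED. */
static void toy_free_irp_and_mdls(toy_irp *irp)
{
    toy_mdl *mdl = irp->MdlAddress;
    while (mdl != NULL) {
        toy_mdl *next = mdl->Next;   /* save the link before freeing */
        if (mdl->pages_locked) {
            mdl->pages_locked = 0;   /* cf. MmUnlockPages */
            g_locked_mdls--;
        }
        free(mdl);                   /* cf. IoFreeMdl */
        mdl = next;
    }
    irp->MdlAddress = NULL;
    free(irp);                       /* cf. IoFreeIrp */
}
```

With two MDLs attached, `g_locked_mdls` drops from 2 to 0 after `toy_free_irp_and_mdls`; freeing only the IRP structure would leave both counted as locked, which is the leak described above.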

We’ve resolved this in our own driver by refactoring the code to remove the FltLockUserBuffer call, but I don’t believe we were doing anything wrong by using this API, other than exposing a shortcoming of the legacy filters layered above us.

Kudos to Scott for getting down to the root cause of this. The man’s a genius :)