PFN_LIST_CORRUPT Arg1: 9A

OSR_Community_User · April 16, 2007, 11:43pm

Hi,

I know this bug is very hard to trace, but it is very important
for us because, in one very particular configuration (OS, AV, apps and
our driver), it is very easy to repro.
The bugcheck is 0x4E with Arg1: 9A (attempt to free pool that is
still locked for I/O); the WinDBG output follows at the end of this e-mail.
Most of the time (around 40% of crash dumps) it occurs in our driver on
the same line, very rarely on another line, and the rest (40%) in one of
the AV or OS (ntfs, afd, rdp) drivers. The quirky part is that when it
occurs in our driver, the line where it occurs is guaranteed not to free
any locked pool: it is our temporary pool, which is not shared with
other drivers, and in the few rare cases it is not even used, i.e. it is
simply allocated and deallocated. I am sure the buffer address is not
corrupted between pre- and post-QUERY_DIRECTORY because, for testing,
the buffer had two fields for comparison, and they match at the time of
the crash.
I’ve done a !search on the pool to see which MDLs have it locked
(apparently many!), but a !search for each MDL address leads to no
active IRPs.
Turning on DV (Driver Verifier) with selected or all options yielded no
different results (harder to repro, but we still got a BSOD with the
same bugcheck/Arg1).
So as not to make this post too long: does anyone have any idea
what else I can try? The WinDBG output follows.
(If the AV authors would like to investigate this, I can provide
several dumps where the bugcheck apparently occurs in their driver. It
is obvious some driver is corrupting memory, but I can’t track it down
further :( )

BugCheck 4E, {9a, e6b8, 6, 9}

*** ERROR: Symbol file could not be found. Defaulted to export symbols for mfehidk.sys -
Probably caused by : VShield.sys ( VShield!DirCtrlPostOp+ce )

Followup: MachineOwner

kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

PFN_LIST_CORRUPT (4e)
Typically caused by drivers passing bad memory descriptor lists (ie: calling
MmUnlockPages twice with the same list, etc). If a kernel debugger is
available get the stack trace.
Arguments:
Arg1: 0000009a,
Arg2: 0000e6b8
Arg3: 00000006
Arg4: 00000009

Debugging Details:

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0x4E

PROCESS_NAME: mssearch.exe

CURRENT_IRQL: 0

LAST_CONTROL_TRANSFER: from 80861a71 to 80826659

STACK_TEXT:
f74eea44 80861a71 0000004e 0000009a 0000e6b8 nt!KeBugCheckEx+0x1b
f74eea60 8088b751 81193c20 808a7bc0 00dede38 nt!MiBadRefCount+0x33
f74eea98 8088c5ad ff345000 813d1348 813d13fc nt!MiFreePoolPages+0x5cf
f74eeaec 8088cb25 6b725761 00000000 f74eeb24 nt!ExFreePoolWithTag+0x277
f74eeafc f84ada6e ff345000 00000000 00000000 nt!ExFreePool+0xf
f74eeb24 f820db83 813d13a4 f74eeb48 ff9f24e0 VShield!DirCtrlPostOp+0xce [f:\projects\alfaff\driver\afm_afp\dirctrl.c @ 198]
f74eeb8c f820ffe0 003d1348 00000000 813d1348 fltMgr!FltpPerformPostCallbacks+0x1c5
f74eeba0 f821050f 813d1348 814017f8 f74eebe0 fltMgr!FltpProcessIoCompletion+0x10
f74eebb0 f8210ba1 821b2840 814017f8 813d1348 fltMgr!FltpPassThroughCompletion+0x89
f74eebe0 f8210d03 f74eec00 80000006 00000000 fltMgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x269
f74eec18 8081d39d 821b2840 814017f8 81c3a8f8 fltMgr!FltpDispatch+0x11f
f74eec2c f72fdaec 814017f8 f74eeccc 808e6cb6 nt!IofCallDriver+0x45
WARNING: Stack unwind information not available. Following frames may be wrong.
f74eecac f730738e 81e9ed00 814017f8 f74eece4 mfehidk+0x6aec
f74eecbc f73073de f74eeccc 81ede678 81821cf0 mfehidk+0x1038e
f74eece4 8081d39d 81821cf0 814017f8 00bbf9ac mfehidk!DEVICEDISPATCH::DispatchPassThrough+0x48
f74eecf8 808ec789 f74eed64 00bbf9ac 808e6cb6 nt!IofCallDriver+0x45
f74eed0c 808e6d13 81821cf0 814017f8 81e9ed00 nt!IopSynchronousServiceTail+0x10b
f74eed30 80882fa8 00000284 00000000 00000000 nt!NtQueryDirectoryFile+0x5d
f74eed30 7c82ed54 00000284 00000000 00000000 nt!KiFastCallEntry+0xf8
00bbf9f4 00000000 00000000 00000000 00000000 0x7c82ed54

STACK_COMMAND: kb

FOLLOWUP_IP:
VShield!DirCtrlPostOp+ce [f:\projects\alfaff\driver\afm_afp\dirctrl.c @ 198]
f84ada6e 8b4dfc mov ecx,dword ptr [ebp-4]

FAULTING_SOURCE_CODE:
194: if(p2pCtx)
195: {
196: AFF_ReleaseContext(lpContext, lpFsCtx);
197: ExFreePool(p2pCtx->workBuf);

198: ExFreeToNPagedLookasideList(&Pre2Post_List, p2pCtx);
199: }
200: return retValue;
201: }
(The error occurs at the ExFreePool line. The value of p2pCtx->workBuf is
ff345000, as is the first parameter of ExFreePool. This is DirCtrlPostOp,
and this code is executed directly, i.e. it is the only code in PostOp.)

SYMBOL_STACK_INDEX: 5

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: VShield

IMAGE_NAME: VShield.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4623e42e

SYMBOL_NAME: VShield!DirCtrlPostOp+ce

FAILURE_BUCKET_ID: 0x4E_VShield!DirCtrlPostOp+ce

BUCKET_ID: 0x4E_VShield!DirCtrlPostOp+ce

Followup: MachineOwner

–
Kind regards, Dejan
http://www.alfasp.com
File system audit, security and encryption kits.

OSR_Community_User · April 17, 2007, 8:37am

Unfortunately, you have your work cut out for you. It certainly does
look like someone/something has been either mapping or locking the pages
for this address (is it possible these pages are still undergoing DMA?)

You have nine outstanding references (4th parameter). What does !pfn on
the page address (parameter 2) tell you? It might point back to a PTE.

This type of problem (page frame reference count bug) is very
challenging to track down because by this point the error has already
occurred. My suggestion would be to look at any exception handlers you
have and make sure you are unlocking/unmapping any pages you have
locked/mapped in this case. Alternatively, you might also wish to set
the debugger to break on exceptions and see if you are taking exceptions
in other drivers along this path where they might not be releasing the
references they have made to this pool.

While you didn’t mention it, I assumed this issue is occurring on Vista.

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

OSR_Community_User · April 17, 2007, 1:54pm

Another possibility could be that either your driver or some other
driver is writing beyond the allocated memory range.
You can build some kind of bounds checker (fencing the allocs) in your
code to detect such cases.

I have suffered through corrupted PFN lists and used the technique to
figure out the errors.
Harish
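
For readers unfamiliar with the "fencing" idea mentioned above, here is a minimal user-mode sketch in plain C. It is not the code Harish used; the names (fenced_alloc, fenced_check, fenced_free), the fence size and the fill byte are all made up for illustration. In a driver, the same layout would presumably wrap the pool allocation and free routines instead of malloc/free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FENCE_SIZE 16u   /* guard bytes on each side (arbitrary choice) */
#define FENCE_BYTE 0xFDu /* fill pattern (arbitrary choice) */

/* Small header kept in front of the leading fence so the size can be
 * recovered at check/free time. */
typedef struct fence_hdr {
    size_t size; /* caller-visible size of the allocation */
} fence_hdr;

/* Layout: [fence_hdr][leading fence][user bytes][trailing fence].
 * Note: the user pointer is only as aligned as the header and fence
 * sizes allow; a real implementation may need to pad. */
void *fenced_alloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(fence_hdr) + 2u * FENCE_SIZE + size);
    if (raw == NULL)
        return NULL;

    ((fence_hdr *)raw)->size = size;
    unsigned char *user = raw + sizeof(fence_hdr) + FENCE_SIZE;
    memset(user - FENCE_SIZE, FENCE_BYTE, FENCE_SIZE); /* leading fence  */
    memset(user + size, FENCE_BYTE, FENCE_SIZE);       /* trailing fence */
    return user;
}

/* Returns 0 if both fences are intact, -1 if the leading fence was
 * overwritten (underrun), 1 if the trailing fence was (overrun). */
int fenced_check(const void *ptr)
{
    const unsigned char *user = ptr;
    const unsigned char *lead = user - FENCE_SIZE;
    const fence_hdr *hdr = (const fence_hdr *)(lead - sizeof(fence_hdr));
    const unsigned char *trail = user + hdr->size;

    for (size_t i = 0; i < FENCE_SIZE; i++) {
        if (lead[i] != FENCE_BYTE)
            return -1;
    }
    for (size_t i = 0; i < FENCE_SIZE; i++) {
        if (trail[i] != FENCE_BYTE)
            return 1;
    }
    return 0;
}

/* Checks the fences, releases the block, and reports what the check saw. */
int fenced_free(void *ptr)
{
    if (ptr == NULL)
        return 0;
    int status = fenced_check(ptr);
    free((unsigned char *)ptr - FENCE_SIZE - sizeof(fence_hdr));
    return status;
}
```

Calling fenced_check periodically, or at least in fenced_free, narrows down which allocation was overrun. It only catches writes that land in the fences, so it would not help if the corruption comes from a stray pointer into unrelated memory.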

Questions? First check the IFS FAQ at
https://www.osronline.com/article.cfm?id=17

You are currently subscribed to ntfsd as: unknown lmsubst tag argument: ‘’
To unsubscribe send a blank email to xxxxx@lists.osr.com

Duane_Souder · April 17, 2007, 2:00pm

Or you can get the OS to help you… use the special pool feature to
isolate pool damage:
http://support.microsoft.com/default.aspx?scid=kb;EN-US;188831

Add these two registry values under the key
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management

Value Name: PoolTag
Data Type: REG_DWORD
Data: 0x2A (this means all memory)

Value Name: PoolTagOverruns
Data Type: REG_DWORD
Data: 1

Duane Souder
CSA Driver Development Team
Cisco Systems, Inc.

Harish Arora wrote:

>Another possibility could be that either your driver or some other
>driver is writing beyond the allocated memory range.
>You can build some kind of bounds checker (fencing the allocs) in your
>code to detect such cases.
>
>I have suffered through corrupted PFN lists and used the technique to
>figure out the errors.
>Harish

OSR_Community_User · April 17, 2007, 2:06pm

I recall having had this problem when playing with IoAllocateMdl/MmProbeAndLockPages for some time.
I got the PFN_LIST_CORRUPT error from time to time, after thousands of iterations.

My code was something like the following:

lockedMdl = FALSE;
allocatedMdl = FALSE;

Mdl = IoAllocateMdl( (unsigned char*)ptr, (ULONG)len, FALSE, FALSE, NULL );
if (Mdl != NULL)
{
    allocatedMdl = TRUE;
    __try {
        /* MmProbeAndLockPages raises an exception on failure, hence the __try */
        MmProbeAndLockPages( Mdl, KernelMode, IoReadAccess );
        lockedMdl = TRUE;
        …
    } __except (EXCEPTION_EXECUTE_HANDLER)
    {
        …
    }
}

/* Unlock only if the lock succeeded, then free the MDL */
if (lockedMdl)
    MmUnlockPages( Mdl );
if (allocatedMdl)
    IoFreeMdl( Mdl );

OSR_Community_User · April 17, 2007, 2:21pm

Hi, Tony,

> Unfortunately, you have your work cut out for you. It certainly does look like
> someone/something has been either mapping or locking the pages for this address (is
> it possible these pages are still undergoing DMA?)

There are no IRPs with an MDL that describes this particular address, so no.

> You have nine outstanding references (4th parameter). What does !pfn on the page
> address (parameter 2) tell you? It might point back to a PTE.

kd> !pfn e6b8
PFN 0000E6B8 at address 81193C20
flink 00000000 blink / share count 00000001 pteaddress C07F9A28
reference count 0009 Cached color 0
restore pte 00000080 containing page 000AD2 Active R
ReadInProgress
kd> !pte C07F9A28
VA ff345000
PDE at 00000000C0603FC8 PTE at 00000000C07F9A28
contains 0000000000AD2063 contains 000000000E6B8163
pfn ad2 ---DA--KWEV pfn e6b8 -G-DA--KWEV

Other than “ReadInProgress” I don’t see how it helps?
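
The !pfn and !pte output above can be cross-checked by hand. The sketch below is illustrative only and assumes 32-bit x86 Windows with PAE, which is what the PTE/PDE addresses and 64-bit entry contents in this dump suggest: the PTE array is assumed to be mapped at 0xC0000000 with 8-byte entries. The helper names are made up. With those assumptions, PTE address C07F9A28 maps back to VA ff345000 (the buffer passed to ExFreePool), and the PTE contents decode to PFN e6b8 with the Valid, Write, Accessed, Dirty and Global bits set.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 32-bit x86 with PAE. The PTE array starts at
 * 0xC0000000 and each PTE is 8 bytes wide. */
#define PTE_BASE   0xC0000000u
#define PTE_SIZE   8u
#define PAGE_SHIFT 12u

/* Hardware flag bits in the low 12 bits of an x86 PTE. */
enum {
    PTE_VALID    = 1u << 0,
    PTE_WRITE    = 1u << 1,
    PTE_USER     = 1u << 2,
    PTE_ACCESSED = 1u << 5,
    PTE_DIRTY    = 1u << 6,
    PTE_GLOBAL   = 1u << 8
};

/* Virtual address described by the PTE stored at pte_address. */
uint32_t pte_address_to_va(uint32_t pte_address)
{
    return ((pte_address - PTE_BASE) / PTE_SIZE) << PAGE_SHIFT;
}

/* Page frame number held in bits 12 and up of the PTE contents
 * (the top 12 bits, which include NX, are masked off). */
uint64_t pte_to_pfn(uint64_t pte)
{
    return (pte & 0x000FFFFFFFFFF000ull) >> PAGE_SHIFT;
}

/* Low 12 flag bits of the PTE contents. */
uint32_t pte_flags(uint64_t pte)
{
    return (uint32_t)(pte & 0xFFFu);
}
```

For example, pte_address_to_va(0xC07F9A28) gives 0xFF345000 and pte_to_pfn(0x0E6B8163) gives 0xE6B8, matching the debugger output. This only confirms that the page in Arg2 really is the one backing the freed buffer; it does not say who took the extra references.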

> My suggestion would be to look at any exception handlers you have and make sure you
> are unlocking/unmapping any pages you have locked/mapped in this case.

The driver does no page locking/unlocking, except calling FltLockUserBuffer from
QUERY_DIRECTORY completion. Also, it changes no buffers for IRP_MJ_READ/WRITE.

> Alternatively, you might also wish to set the debugger to break on exceptions and
> see if you are taking exceptions in other drivers along this path where they might
> not be releasing the references they have made to this pool.

Will try that.

> While you didn’t mention it, I assumed this issue is occurring on Vista.

Srv03 SP1, UK. I omitted that part of the WinDBG output to make the e-mail less
bulky.

–
Kind regards, Dejan
http://www.alfasp.com
File system audit, security and encryption kits.

OSR_Community_User · April 17, 2007, 2:39pm

> Another possibility could be that either your driver or some other driver is
> writing beyond the allocated memory range. You can build some kind of bounds
> checker (fencing the allocs) in your code to detect such cases.

The first thing we tried was enabling Driver Verifier. Regardless of the
options selected, it either doesn’t blue screen at all or BSODs with the same
bugcheck (PFN_LIST_CORRUPT), only after a longer time. It takes a REALLY fast
machine to blue screen: for example, enabling Special Pool on all drivers slows
the OS down enough to “never” get a BSOD (never = several days without one).
Without DV it takes <20 minutes to BSOD.
I wish it were that trivial.

There was ONE time that I was able to get a different bugcheck. But it
doesn’t show much more, as it only confirms that it’s a memory corruption.
BugCheck D1, {80931008, ff, 1, ba22b396}

*** ERROR: Symbol file could not be found. Defaulted to export symbols for mfehidk.sys -
*** ERROR: Module load completed but symbols could not be loaded for MfeBOPK.sys
Probably caused by : mfehidk.sys

Followup: MachineOwner

0: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 80931008, memory referenced
Arg2: 000000ff, IRQL
Arg3: 00000001, value 0 = read operation, 1 = write operation
Arg4: ba22b396, address which referenced memory

Debugging Details:

WRITE_ADDRESS: 80931008

CURRENT_IRQL: 0

FAULTING_IP:
mfehidk+e396
ba22b396 a4 movs byte ptr es:[edi],byte ptr [esi]

DEFAULT_BUCKET_ID: CODE_CORRUPTION

BUGCHECK_STR: 0xD1

PROCESS_NAME: McShield.exe

TRAP_FRAME: ba4ec884 -- (.trap ffffffffba4ec884)
ErrCode = 00000002
eax=8000003b ebx=ba22c9d5 ecx=00000005 edx=82e3ea00 esi=ba4eca34 edi=80931008
eip=ba22b396 esp=ba4ec8f8 ebp=ba4ec90c iopl=0 nv up di pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010002
mfehidk+0xe396:
ba22b396 a4 movs byte ptr es:[edi],byte ptr [esi] es:0023:80931008=43 ds:0023:ba4eca34=39
Resetting default scope

LAST_CONTROL_TRANSFER: from ba22b396 to 8088bdd3

STACK_TEXT:
ba4ec884 ba22b396 badb0d00 82e3ea00 00000000 nt!KiTrap0E+0x2a7
WARNING: Stack unwind information not available. Following frames may be wrong.
ba4ec90c ba22ad01 80931004 ba4eca30 00000005 mfehidk+0xe396
ba4ecac4 ba22ad8f 80931004 00000007 00000000 mfehidk+0xdd01
ba4ecaf0 ba22a86e 006ffc40 ba22a6b8 82a03830 mfehidk+0xdd8f
ba4ecb1c ba22a8c0 006ffc40 ba4ecb63 00000000 mfehidk+0xd86e
ba4ecb38 f781fb76 006ffc40 822e72cc 822e72d8 mfehidk+0xd8c0
ba4ecb64 f782051f 006ffc40 824c8de8 823fbed0 MfeBOPK+0xb76
ba4ecb84 f782059c 006ffc20 0000005a ba4ecbf4 MfeBOPK+0x151f
ba4ecb94 ba21eda8 00000002 006ffc20 0000005a MfeBOPK+0x159c
ba4ecbf4 ba21f3ca 00000402 006ffc20 0000005a mfehidk+0x1da8
ba4ecc18 ba22d97b 00000000 ba4ecc01 006ffc20 mfehidk+0x23ca
ba4ecc58 808f518f 823a7890 00000001 006ffc20 mfehidk!DEVICEDISPATCH::DispatchPassThrough+0x5e5
ba4ecd00 808ee0e4 0000014c 00000000 00000000 nt!IopXxxControlFile+0x255
ba4ecd34 80888c6c 0000014c 00000000 00000000 nt!NtDeviceIoControlFile+0x2a
ba4ecd34 7c82ed54 0000014c 00000000 00000000 nt!KiFastCallEntry+0xfc
0097f7d4 00000000 00000000 00000000 00000000 0x7c82ed54

STACK_COMMAND: kb

CHKIMG_EXTENSION: !chkimg -lo 50 -d !nt
808edefa-808edefe 5 bytes - nt!NtCreateFile
[8b ff 55 8b ec:e9 ac ea 93 39]
80931004-80931007 4 bytes - nt!NtProtectVirtualMemory (+0x4310a)
[6a 44 68 d8:e9 b6 b9 8f]
9 errors : !nt (808edefa-80931007)

MODULE_NAME: mfehidk

IMAGE_NAME: mfehidk.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4456568b

FOLLOWUP_NAME: MachineOwner

MEMORY_CORRUPTOR: PATCH_mfehidk

FAILURE_BUCKET_ID: MEMORY_CORRUPTION_PATCH_mfehidk

BUCKET_ID: MEMORY_CORRUPTION_PATCH_mfehidk

Followup: MachineOwner
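
A note on the !chkimg lines in this dump: the replacement bytes at nt!NtCreateFile start with e9, the x86 opcode for a near JMP with a 32-bit relative displacement. The sketch below (plain C, helper name made up for illustration) decodes where such a jump would go. For the five patched bytes at 808edefa it yields ba22c9ab, which is in the same address range as the mfehidk frames in the stack trace. That is consistent with the PATCH_mfehidk bucket the debugger reports, though the dump alone does not show why the patch was made.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* If code[] begins with an x86 near JMP rel32 (opcode 0xE9), stores the
 * branch target in *target and returns 1; otherwise returns 0.
 * address is the virtual address of the first byte of code[]. */
int decode_jmp_rel32(uint32_t address, const uint8_t code[5], uint32_t *target)
{
    assert(code != NULL && target != NULL);

    if (code[0] != 0xE9)
        return 0;

    /* rel32 follows the opcode in little-endian byte order. */
    uint32_t rel = (uint32_t)code[1]
                 | ((uint32_t)code[2] << 8)
                 | ((uint32_t)code[3] << 16)
                 | ((uint32_t)code[4] << 24);

    /* The displacement is relative to the next instruction, 5 bytes
     * further on. Unsigned wraparound also handles negative values. */
    *target = address + 5u + rel;
    return 1;
}
```

The second patch, at 80931004, shows only four changed bytes (e9 b6 b9 8f), and the faulting write address in the bugcheck is 80931008, so that jump looks like it was still being written when the fault occurred.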

–
Kind regards, Dejan
http://www.alfasp.com
File system audit, security and encryption kits.

Duane_Souder · April 17, 2007, 3:14pm

DV comes with too much OS intervention, and enabling Special Pool for all
of memory is overkill for many memory issues.
If the memory that is overwritten always has the same pool tag, or
neighboring allocations always have the same tag, then enabling Special
Pool for the corrupted memory's tag (and its immediate neighbors) will
have significantly less impact on system performance and can produce
quick results.
Duane Souder
CSA Driver Development Team
Cisco Systems, Inc.
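
As a small aid for the per-tag approach: as far as I recall from the special pool documentation, the PoolTag registry value takes the four tag characters as a DWORD in reverse (little-endian) order, so a tag written as Tag1 is entered as 0x31676154. Please verify against the KB article before relying on it. The helper below is a plain C sketch with a made-up name, and "Tag1" is just an example tag, not one used by any driver in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Packs a four-character pool tag into the DWORD form in which it is
 * stored in memory on a little-endian machine: the first character
 * ends up in the lowest byte. */
uint32_t pool_tag_to_dword(const char *tag)
{
    assert(tag != NULL && strlen(tag) == 4);

    return (uint32_t)(unsigned char)tag[0]
         | ((uint32_t)(unsigned char)tag[1] << 8)
         | ((uint32_t)(unsigned char)tag[2] << 16)
         | ((uint32_t)(unsigned char)tag[3] << 24);
}
```

For example, pool_tag_to_dword("Tag1") returns 0x31676154.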

OSR_Community_User · April 17, 2007, 3:21pm

I checked that before anything else just in case - but we do not allocate/free any MDLs in
this product (it does not process read/write except for audit, and it does not replace buffers
for QUERY_DIRECTORY).

Iñaki Castillo wrote:

> I recall to have had this problem when playing with AllocateMdl/ProbeAndLock for some time.
> I got the PFN_LIST_CORRUPT error from time to time, after thousands of iterations.

My code was something like the following:

lockedMdl = FALSE;
allocatedMdl = FALSE;

Mdl = IoAllocateMdl( (unsigned char*)ptr, (ULONG)len, FALSE, FALSE, NULL );
if (Mdl != NULL)
{
allocatedMdl = TRUE;
__try {
MmProbeAndLockPages( Mdl, KernelMode, IoReadAccess );
lockedMdl = TRUE;
…
}__except (EXCEPTION_EXECUTE_HANDLER)
{
…
}

if(lockedMdl)
MmUnlockPages( Mdl );
if(allocatedMdl)
IoFreeMdl( Mdl );

–
Kind regards, Dejan
http://www.alfasp.com
File system audit, security and encryption kits.

OSR_Community_User · April 17, 2007, 3:28pm

I had just woken up when I replied to that, so I forgot: this is a mini-filter,
so… we aren’t catching any exceptions from below for sure ;-)

> My suggestion would be to look at any exception handlers you have and make sure you
> are unlocking/unmapping any pages you have locked/mapped in this case.
> Alternatively, you might also wish to set the debugger to break on exceptions and
> see if you are taking exceptions in other drivers along this path where they might
> not be releasing the references they have made to this pool.

–
Kind regards, Dejan
http://www.alfasp.com
File system audit, security and encryption kits.

OSR_Community_User · April 18, 2007, 4:52am

“Dejan Maksimovic” wrote:

> The first thing we tried was enabling Driver Verifier, but regardless of the
> options selected, it either doesn’t blue screen (it requires a REALLY fast
> machine to blue screen, for example, enabling Special Pool on all drivers
> slows the OS down enough to “never” get a BSOD (never = several days no
> BSOD)) or BSODs with the same bugcheck (PFN_LIST_CORRUPT) only takes longer.
> Without DV it takes <20 minutes to BSOD.

Try enabling pool tracking but not special pool. When you hit the
bugcheck, do !verifier 80 and see if anything interesting comes up.

I think that starting with Vista !verifier 80 also shows MmMapLockedPages
traces so you might get more data on Vista/Longhorn.

–
This posting is provided “AS IS” with no warranties, and confers no
rights.

OSR_Community_User · April 18, 2007, 6:45am

We’ve tried almost all logical combinations (SP only, MT only, SP/MT, SP/IO,
MT/IO, IO, SP/MT/IO) - all result in the same error (PFN_LIST_CORRUPT).
I really wish it were something as stupid as a double free, but it seems it’s
muuuuch bigger.

Pavel Lebedinsky wrote:

> Try enabling pool tracking but not special pool. When you hit the
> bugcheck do !verifier 80 and see if anything interesting
> comes up.
>
> I think that starting with Vista !verifier 80 also shows MmMapLockedPages
> traces so you might get more data on Vista/Longhorn.

–
Kind regards, Dejan
http://www.alfasp.com
File system audit, security and encryption kits.