PFN_LIST_CORRUPT MiBadRefCount

Can anyone tell me how to troubleshoot an issue like this?

It looks like an MDL had already been released by the time the process exited and released its resources.

How do I find this MDL and who released it early?

I tried turning on Driver Verifier, but I didn’t see any useful output.

PFN_LIST_CORRUPT (4e)
Typically caused by drivers passing bad memory descriptor lists (ie: calling
MmUnlockPages twice with the same list, etc).  If a kernel debugger is
available get the stack trace.
Arguments:
Arg1: 000000000000009a, 
Arg2: 00000000003cb003
Arg3: 0000000000000003
Arg4: 0000000000000000

nt!KeBugCheckEx
nt!MiBadRefCount+0x4f
nt!MiRemoveLockedPageChargeAndDecRef+0x251
nt!MiDeletePerSessionProtos+0x12ad4c
nt!MiFreeSubsectionProtos+0x1c
nt!MiDereferencePerSessionProtos+0x8c
nt!MiDeleteVad+0x9f6
nt!MiUnmapVad+0x49
nt!MiCleanVad+0x30
nt!MmCleanProcessAddressSpace+0x137
nt!PspRundownSingleProcess+0x20c
nt!PspExitThread+0x5f6
nt!KiSchedulerApcTerminate+0x38
nt!KiDeliverApc+0x487
nt!KiInitiateUserApc+0x70
nt!KiSystemServiceExit+0x9f
0x00007ffd`e54f9014

3: kd> !pfn 3cb003
    PFN 003CB003 at address FFFFF8800B610090
    flink       FFFFFFFFF  blink / share count 0042DD14  pteaddress FFFF9A0ED25FFE50
    reference count 0000    used entry count  0080      Cached    color 0   Priority 5
    restore pte DC8D412C0C800480  containing page 3D39FD  Modified   MP      
    Modified Shared   

What’s the context here? Is this just some random system or is this while testing your driver?

@“Scott_Noone_(OSR)” said:
What’s the context here? Is this just some random system or is this while testing your driver?

Honestly, I’m not sure whether this issue is caused by my driver, because I can’t see anything related to non-Microsoft drivers in this stack and the problem is not consistently reproducible. At the moment, I can only suspect that there may be some flaws in my own driver.

@“Scott_Noone_(OSR)” said:
What’s the context here? Is this just some random system or is this while testing your driver?

Moreover, I cannot reproduce this issue in my own environment. If I could, I might try to locate the problem through more cumbersome methods, such as disabling my drivers one at a time or recording all the MDL operations within them.

You really have to assume it is your code that is causing this. It’s the default hypothesis. Corruption issues generally are not kind enough to crap out with your driver in the stack. So the approach I generally take is to use logging to understand what my crappy code was doing when the system crashed. Either the full ETW nonsense or the much less awful IFR can be useful. Both are lightweight enough to not massively disturb the runtime characteristics of your code.
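
To make the logging idea concrete, here is a minimal user-mode sketch of an in-memory ring buffer that records MDL operations. It is not IFR or ETW, just the same idea in miniature, so that a crash dump would still hold the last few events. Every name in it (`trace_mdl`, `count_ops`, `dump_trace`, `TRACE_SLOTS`, the `MdlOp` values) is hypothetical, and a real driver would need an interlocked sequence counter and nonpaged storage.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TRACE_SLOTS 8u  /* kept tiny for illustration; hypothetical size */

typedef enum { OP_LOCK, OP_UNLOCK, OP_FREE } MdlOp;

typedef struct {
    uint64_t    seq;   /* global order of the event */
    MdlOp       op;
    const void *mdl;   /* MDL pointer value; only compared, never dereferenced */
    const char *site;  /* string literal naming the call site */
} TraceEntry;

static TraceEntry g_trace[TRACE_SLOTS];
static uint64_t   g_next_seq;  /* a driver would use an interlocked increment here */

/* Record one MDL operation, overwriting the oldest entry once the ring is full. */
static void trace_mdl(MdlOp op, const void *mdl, const char *site)
{
    assert(site != NULL);
    uint64_t seq = g_next_seq++;
    TraceEntry *e = &g_trace[seq % TRACE_SLOTS];
    e->seq  = seq;
    e->op   = op;
    e->mdl  = mdl;
    e->site = site;
}

/* Count the entries still in the ring that match one MDL and one operation. */
static unsigned count_ops(const void *mdl, MdlOp op)
{
    unsigned n = 0;
    uint64_t live = g_next_seq < TRACE_SLOTS ? g_next_seq : TRACE_SLOTS;
    for (uint64_t i = 0; i < live; i++) {
        if (g_trace[i].mdl == mdl && g_trace[i].op == op)
            n++;
    }
    return n;
}

/* Print the surviving entries, oldest first. */
static void dump_trace(void)
{
    static const char *names[] = { "LOCK", "UNLOCK", "FREE" };
    uint64_t start = g_next_seq > TRACE_SLOTS ? g_next_seq - TRACE_SLOTS : 0;
    for (uint64_t s = start; s < g_next_seq; s++) {
        const TraceEntry *e = &g_trace[s % TRACE_SLOTS];
        printf("%llu %-6s mdl=%p at %s\n",
               (unsigned long long)e->seq, names[e->op],
               (void *)e->mdl, e->site);
    }
}
```

Calling `trace_mdl` next to every lock, unlock, and free in the driver would leave a short history that can be read out of the dump after the fact, for example two `OP_UNLOCK` entries for the same MDL pointer with only one `OP_LOCK` between them.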

Does your driver do anything funky with MDLs? If yes now would also be a good opportunity to review that code.

@“Scott_Noone_(OSR)” said:
Does your driver do anything funky with MDLs? If yes now would also be a good opportunity to review that code.

I’m not entirely sure, because this may involve seven or eight drivers. I only know some of them, and I cannot reproduce the issue in my own environment (the problem also occurs randomly, so it’s hard to pinpoint which specific driver might be causing it). That’s why I’m wondering whether it’s possible to extract some clues from the dump, even if they only narrow the scope of the investigation a bit.

Not really…A process is exiting and tearing down its virtual address space. In doing so it finds an in use page with a reference count of zero. I’d look at anywhere in the code that calls MmUnlockPages or modifies Irp->MdlAddress. Other than that you could try enabling Verifier on ntoskrnl.exe and see if that gets you anywhere.
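
While reviewing the unlock call sites, one way to catch an unbalanced unlock is to keep a per-MDL count of outstanding locks and check it before each unlock. The following is only a user-mode sketch of that bookkeeping. `note_lock`, `note_unlock`, and `MAX_TRACKED` are hypothetical names, and the assumption is that a driver would call them around its own lock and unlock calls, guard the table with a lock, and break into the debugger instead of printing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_TRACKED 64  /* hypothetical table size */

typedef struct {
    const void *mdl;    /* NULL marks an unused slot */
    int         locks;  /* outstanding locks recorded by the wrappers */
} Tracked;

static Tracked g_tracked[MAX_TRACKED];

/* Find the slot for an MDL; optionally claim a free slot if none exists. */
static Tracked *find_slot(const void *mdl, int create)
{
    Tracked *free_slot = NULL;
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (g_tracked[i].mdl == mdl)
            return &g_tracked[i];
        if (g_tracked[i].mdl == NULL && free_slot == NULL)
            free_slot = &g_tracked[i];
    }
    if (create && free_slot != NULL) {
        free_slot->mdl = mdl;
        free_slot->locks = 0;
        return free_slot;
    }
    return NULL;
}

/* Call right after a successful lock. Returns 0, or -1 if the table is full. */
static int note_lock(const void *mdl)
{
    assert(mdl != NULL);
    Tracked *t = find_slot(mdl, 1);
    if (t == NULL)
        return -1;
    t->locks++;
    return 0;
}

/* Call right before an unlock. Returns 0 if a lock was outstanding,
 * -1 if this MDL has no recorded lock (the double-unlock case). */
static int note_unlock(const void *mdl, const char *site)
{
    assert(mdl != NULL);
    Tracked *t = find_slot(mdl, 0);
    if (t == NULL) {
        fprintf(stderr, "unbalanced unlock of %p at %s\n", (void *)mdl, site);
        return -1;
    }
    if (--t->locks == 0)
        t->mdl = NULL;  /* release the slot for reuse */
    return 0;
}
```

The point of failing at the unlock site is that the offending driver is then on the stack, which is exactly what the crash at process exit does not give you.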

@“Scott_Noone_(OSR)” said:
Not really…A process is exiting and tearing down its virtual address space. In doing so it finds an in use page with a reference count of zero. I’d look at anywhere in the code that calls MmUnlockPages or modifies Irp->MdlAddress. Other than that you could try enabling Verifier on ntoskrnl.exe and see if that gets you anywhere.

Thanks a lot for your suggestion. It seems that I’ll need to set aside some time to check each driver individually.