PFN_LIST_CORRUPT from Error Reporting Service on W2K3?

Gentlefolk

I have been given a crash dump from a PFN_LIST_CORRUPT bugcheck on W2K3 for
analysis. I find that at least at the !analyze -v level the faulting thread
appears to be in the Error Reporting Service ersvc inside svchost.exe, and
well I am a little bit stuck at the moment. I was hoping some more
experienced types might be able to give me some pointers as to how I can
start to really analyze this dump, so suggestions would be very much
appreciated. Here is a WinDbg session up to the completion of !analyze -v
in case that helps.

Thanks in advance
Lyndon

Windows Server 2003 Kernel Version 3790 UP Free x86 compatible
Product: Server, suite: TerminalServer SingleUserTS
Built by: 3790.srv03_rtm.030324-2048
Kernel base = 0x804de000 PsLoadedModuleList = 0x80568c08
Debug session time: Thu Sep 30 19:47:24 2004
System Uptime: 9 days 2:12:36.830
Loading Kernel Symbols
............................................................................
................................
Loading unloaded module list
..........
Loading User Symbols
.....................
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 4E, {7, 15e26, 5c77, 0}

Probably caused by : ntoskrnl.exe ( nt!MiDecrementReferenceCount+47 )

kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

PFN_LIST_CORRUPT (4e)
Typically caused by drivers passing bad memory descriptor lists (ie: calling
MmUnlockPages twice with the same list, etc). If a kernel debugger is
available get the stack trace.
Arguments:
Arg1: 00000007, A driver has unlocked a page more times than it locked it
Arg2: 00015e26, page frame number
Arg3: 00005c77, current share count
Arg4: 00000000, 0

Debugging Details:

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0x4E

CURRENT_IRQL: 2

LAST_CONTROL_TRANSFER: from 805307c5 to 8053eec8

STACK_TEXT:
f4fe5ba0 805307c5 0000004e 00000007 00015e26 nt!KeBugCheckEx+0x19
f4fe5bc0 80529e23 006af020 f7b9f000 f4fe5c00 nt!MiDecrementReferenceCount+0x47
f4fe5be8 80585019 f4fe5c00 f4fe5d20 825eac90 nt!MmUnlockPages+0x2f9
f4fe5cbc 80584c2c 825eac10 1086e000 82baf768 nt!MiDoMappedCopy+0x175
f4fe5cec 8058629c 825eac10 1086e000 82baf768 nt!MmCopyVirtualMemory+0x73
f4fe5d48 804e7a8c 0000011c 1086e000 006ae020 nt!NtReadVirtualMemory+0xd0
f4fe5d48 7ffe0304 0000011c 1086e000 006ae020 nt!KiSystemService+0xcb
0058b13c 77f43077 77e5a214 0000011c 1086e000 SharedUserData!SystemCallStub+0x4
0058b140 77e5a214 0000011c 1086e000 006ae020 ntdll!ZwReadVirtualMemory+0xc
0058b15c 6d5b3af3 0000011c 1086e000 006ae020 kernel32!ReadProcessMemory+0x19
0058b184 6d5b3b68 0000011c 1086e000 00000000 dbghelp!Win32LiveSystemProvider::ReadVirtual+0x3b
0058b1a4 6d5b0ea3 0000011c 10850000 00000000 dbghelp!Win32LiveSystemProvider::ReadAllVirtual+0x1b
0058b1cc 6d5b10b7 0058b2e8 0058b24c 00090120 dbghelp!WriteMemoryFromProcess+0x33
0058b200 6d5b1338 0058b2e8 0058b24c 00090120 dbghelp!WriteMemoryBlocks+0x31
0058b220 6d5b1560 0058b2e8 0058b24c 00090120 dbghelp!WriteDumpData+0x6f
0058b360 6d5b165c 0000011c 00000900 00264908 dbghelp!MiniDumpProvideDump+0x16f
0058b3c0 6950bc55 0000011c 00000900 00000118 dbghelp!MiniDumpWriteDump+0xc6
0058bd80 6950bd1e 0000011c 00000900 00000118 faultrep!InternalGenerateMinidumpEx+0x6be
0058bdb4 6950be90 0000011c 00000900 0058bdd0 faultrep!InternalGenerateMinidump+0x9a
0058c714 69506ded 0000011c 00000900 0058cbc8 faultrep!InternalGenFullAndTriageMinidumps+0x149
0058d654 74da2db7 0058d670 00002000 00000000 faultrep!ReportFaultToQueue+0x461
0058df24 74da3175 000000d4 0058df7c 0058df70 ersvc!ProcessFaultRequest+0x779
0058ff80 74da344b 0008de28 00000000 00000000 ersvc!ExecServer+0x110
0058ffb8 77e4a990 0008de28 00000000 00000000 ersvc!threadExecServer+0x51
0058ffec 00000000 74da33fa 0008de28 00000000 kernel32!BaseThreadStart+0x34

FOLLOWUP_IP:
nt!MiDecrementReferenceCount+47
805307c5 ff0564845680 inc dword ptr [nt!MiFlushForNonCached (80568464)]

FOLLOWUP_NAME: MachineOwner

SYMBOL_NAME: nt!MiDecrementReferenceCount+47

MODULE_NAME: nt

IMAGE_NAME: ntoskrnl.exe

DEBUG_FLR_IMAGE_TIMESTAMP: 3e800a79

STACK_COMMAND: kb

BUCKET_ID: 0x4E_nt!MiDecrementReferenceCount+47

Followup: MachineOwner
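
For anyone following along, the bugcheck arguments above can be decoded mechanically. The following is only a minimal sketch in C, not something taken from the dump: `bugcheck_record`, `pfn_to_physical` and `is_over_unlock` are hypothetical names made up for illustration, and the 4 KB page size is an assumption that holds for ordinary x86 pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KB pages, as on x86. */
#define TOY_PAGE_SHIFT 12u

/* Hypothetical container for the values printed on the "BugCheck" line. */
typedef struct {
    uint32_t code;    /* 0x4E for PFN_LIST_CORRUPT */
    uint32_t arg[4];  /* Arg1..Arg4 as shown by !analyze -v */
} bugcheck_record;

/* Convert a page frame number to the physical address where the page starts. */
uint64_t pfn_to_physical(uint64_t pfn)
{
    return pfn << TOY_PAGE_SHIFT;
}

/* Nonzero when the record is 0x4E with Arg1 == 7, which !analyze -v describes
   as "A driver has unlocked a page more times than it locked it". */
int is_over_unlock(const bugcheck_record *rec)
{
    assert(rec != NULL);
    return rec->code == 0x4Eu && rec->arg[0] == 7u;
}
```

With Arg2 = 0x15e26 the conversion gives physical address 0x15e26000, roughly 350 MB into physical memory.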

I suspect the issue is that a driver (or other OS component) has
previously unlocked this memory location. Unfortunately, you are seeing
the SECOND unlock, but the bug is likely in the FIRST unlock. These are
annoying to find and fix, because you will need to identify the driver
that was handling this memory location before this call - and that is
unlikely to be in the context of the current crash.

Actually, from the stack you are on, my guess is that the previous call
was through a file system driver. Do you happen to have some file
system filters installed on this machine? If so, that'd be the logical
place to start looking...
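
To make the "you are seeing the SECOND unlock" point concrete, here is a toy user-mode model in C. It is a sketch under loud assumptions: `toy_page`, `toy_lock` and `toy_unlock` are hypothetical names, and a single counter stands in for the much richer accounting the memory manager really does. It only shows why the component that trips the check need not be the one with the bug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one physical page's lock accounting (hypothetical). */
typedef struct {
    unsigned long pfn;        /* page frame number, e.g. 0x15e26 */
    unsigned int lock_count;  /* outstanding locks on this page */
} toy_page;

/* Models taking a lock on the page, as a probe-and-lock would. */
void toy_lock(toy_page *page)
{
    assert(page != NULL);
    page->lock_count++;
}

/* Models an unlock. Returns false at the point where the real system would
   bugcheck, and records which caller happened to be the one that was caught. */
bool toy_unlock(toy_page *page, const char *who, const char **blamed)
{
    assert(page != NULL && blamed != NULL);
    if (page->lock_count == 0) {
        *blamed = who;   /* the caller that trips the check, not the culprit */
        return false;
    }
    page->lock_count--;
    return true;
}
```

In this model, if an innocent caller holds one lock and a buggy driver locks once but unlocks twice, both of the driver's unlocks succeed silently. The underflow is only detected when the innocent caller releases its own lock, which is the situation described above.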

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.

Looking forward to seeing you at the Next OSR File Systems Class October
18, 2004 in Silicon Valley!

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Lyndon J Clarke
Sent: Friday, October 01, 2004 5:05 AM
To: Kernel Debugging Interest List
Subject: [windbg] PFN_LIST_CORRUPT from Error Reporting Service on W2K3?


You are currently subscribed to windbg as: xxxxx@osr.com
To unsubscribe send a blank email to xxxxx@lists.osr.com

Hi Tony

Thanks for the advice. I was wondering how on earth that stack led you to
guess that the previous call was through a file system driver. I don't know
if the first call was from a file system or filter … however there are
indeed a couple of file system filters in the file device stacks on this
machine.

Cheers
Lyndon


Hi Lyndon,

Well, I suppose I’m just suspicious of all file systems (and
particularly filter drivers) by nature. But the rationale I used was:

  • I suspect something previously has manipulated this memory region and
    unlocked it when it should not have done so.
  • I see the function name in user mode called
    “faultrep!InternalGenerateMinidump” and this suggested to me a function
    that would generate a minidump (quite a leap there!)
  • I see a function that says “WriteMemoryFromProcess” and this suggests
    that data from the memory of this process is being written into a
    minidump.
  • User dumps are written into files.

And voila! I’m into the storage stack. Thus, my “hunch” was that the
previous operation did something with the memory block from this now
dead process. Since we know that this would involve reading from the
process and writing to the file system and I know that this works on a
stock system, the logical guess is that something added to this sequence
would cause the problem. Most people don’t add things that modify the
VM system, but most do add things that modify the file system stack.

Thus, my guess that this is related to an MDL bungling error in a file
system filter driver. Is it a stretch? Yes. But it would give you a
place to look.

I’m guessing if you look at that call in dbghelp!WriteMemoryFromProcess
that you will find it is moving data from the process address space to
the mini dump file. From that (or from the handle table) you can
probably figure out which file.

If you can reproduce this problem, it won’t be so hard to debug - you
can walk the WriteMemoryFromProcess and watch to see what is happening.

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com


-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Lyndon J Clarke
Sent: Friday, October 01, 2004 6:18 AM
To: Kernel Debugging Interest List
Subject: Re:[windbg] PFN_LIST_CORRUPT from Error Reporting Service on
W2K3?


I see. Thanks for the explanation. Excellent. The value of a lot of
experience!

I know for a fact there are two filter drivers in the file system device
stack: Symantec a/v and something homegrown. I also know for a fact that
the homegrown filter will not be doing anything at all with the data which
is being written to the minidump file in this scenario (assuming no memory
scribbling, of course).

I looked at the Mdl passed into MmUnlockPages. The Process field happened to
be the process which is the companion service of the homegrown filter, which
I find rather interesting indeed. I guess this means it was a fault in the
companion process which caused the Error Reporting Service to attempt the
minidump.

Now I know that the homegrown filter does indeed do MmProbeAndLockPages()
… MmUnlockPages(); however, every one of these calls is for MDLs which
reference the virtual addresses of a single contiguous shared memory area
in the user address space of the companion process (again assuming no
memory scribbling, of course). I looked at the Mdl field StartVa, which I
guess contains the user address, and compared this with the user address of
that shared memory area; the StartVa does not fall inside or anywhere near
(it is hundreds of megabytes smaller than) the user address of that shared
memory area. I also looked at the ByteCount (4096), the ByteOffset (zero)
and the Next (NULL). I am thinking that if my guess about StartVa is correct
then all of this in essence means (assuming no scribbling …) that the
homegrown filter cannot have been the first MmUnlockPages() in the
scenario you described.
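
The comparison described above amounts to a range-overlap test between the bytes an MDL covers and the shared memory area. A minimal sketch in C follows, with several assumptions: `toy_mdl` carries only the three fields mentioned and is not the real nt!_MDL layout, `toy_region` and `mdl_overlaps_region` are hypothetical names, and the start of the described range is taken to be StartVa plus ByteOffset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Only the MDL fields discussed above; a hypothetical stand-in, not nt!_MDL. */
typedef struct {
    uint32_t start_va;     /* page-aligned base virtual address (StartVa) */
    uint32_t byte_offset;  /* offset into the first page (ByteOffset) */
    uint32_t byte_count;   /* length of the described buffer (ByteCount) */
} toy_mdl;

/* Hypothetical description of the shared memory area in the companion process. */
typedef struct {
    uint32_t base;  /* user-mode base address of the area */
    uint32_t size;  /* size of the area in bytes */
} toy_region;

/* True when the byte range described by the MDL intersects the region.
   Ranges are half-open, and 64-bit arithmetic avoids 32-bit wraparound. */
bool mdl_overlaps_region(const toy_mdl *mdl, const toy_region *region)
{
    assert(mdl != NULL && region != NULL);
    if (mdl->byte_count == 0 || region->size == 0) {
        return false;  /* an empty range overlaps nothing */
    }
    uint64_t mdl_begin = (uint64_t)mdl->start_va + mdl->byte_offset;
    uint64_t mdl_end = mdl_begin + mdl->byte_count;
    uint64_t reg_begin = region->base;
    uint64_t reg_end = reg_begin + region->size;
    return mdl_begin < reg_end && reg_begin < mdl_end;
}
```

As an illustration only: an MDL with start_va 0x1086e000 (assumed here from the address passed to NtReadVirtualMemory in the stack), byte_offset 0 and byte_count 4096 does not overlap a made-up region based at 0x30000000, which sits about 500 MB higher.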
