How did this happen?

Hi all - This is a long-winded question, so please bear with me.

My FSD is trying to MDL-write a large file. The FSD uses a user-mode
service to actually write the data somewhere, (so I must make a copy of
the IRP and copy the buffer to my own MDL and re-process the write
request, so I don’t hold up the MPW thread). To prevent the application
trying to write though Cache Manager from consuming all memory, I’ve tried
to limit the amount of pages that CC will maintain for each file stream by
using this call:

CcSetDirtyPageThreshold( FileObject, 1000);

When I copy a big file, everything is ok for a while, then I get the
following assert:

SRV: Assertion Failed at line 924 in D:\nt\private\ntos\srv\io.c
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPoint:
804a9bc0 cc int 3

This happens around this location:

srv!SrvIssueMdlCompleteRequest+e9:

If I allow it to continue, I get another assert immediately:

SRV: IRP FF92D3E8 already in use at 2510!
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPoint:
804a9bc0 cc int 3

Thread info is:

kd> !thread
THREAD 87736580 Cid 8.33c Teb: 00000000 Win32Thread: 00000000
RUNNING
Not impersonating
Owning Process 8188b7a0
WaitTime (seconds) 6694288
Context Switch Count 2100
UserTime 0:00:00.0000
KernelTime 0:00:08.0362
Start Address srv!WorkerThread (0xbb2398de)
Stack Init ed488000 Current ed4877ac Base ed488000 Limit ed485000 Call
0
Priority 9 BasePriority 9 PriorityDecrement 0 DecrementCount 0

ChildEBP RetAddr Args to Child
ed487d2c bb222855 ff92b058 ff92d2b8 ff92b0ff nt!DbgBreakPoint
ed487d4c bb296dc5 ff92d690 bb2136a3 ff92d2b8 srv!SrvStartSend2+0x93
ed487d80 bb239ac9 ff92d2ff 00000000 00000000
srv!SrvFsdRestartWriteAndX+0x68a
ed487da8 804a91e6 ff92d2b8 00000000 00000000 srv!WorkerThread+0x1eb
ed487ddc 804c3d4a bb2398de 87a5c3e0 00000000
nt!PspSystemThreadStartup+0x54
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

Then I let it run some more, and it asserts with this:

*** Assertion failed: (MemoryDescriptorList->MdlFlags & MDL_PAGES_LOCKED)
!= 0
*** Source File: F:\nt\private\ntos\mm\iosup.c, line 1354

Break, Ignore, Terminate Process or Terminate Thread (bipt)?

kd> b

Execute ‘!cxr ED427918’ to dump context
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPoint:
804a9bc0 cc int 3

kd> .cxr ED427918
eax=ed427918 ebx=ff8df808 ecx=009bb0c8 edx=ff92d65c esi=00000000
edi=ff92d690
eip=80451b47 esp=ed427bf0 ebp=ed427c14 iopl=0 nv up ei ng nz na po
nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000
efl=00000286
nt!MmUnlockPages+28:
80451b47 eb02 jmp nt!MmUnlockPages+0x2c (80451b4b)
kd> kv
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr Args to Child
ed427c14 80414712 ff92d690 82548ec8 804d4200 nt!MmUnlockPages+0x28 (FPO:
[Non-Fpo])
ed427c40 804146d7 00000000 ff92d65c ff92d690 nt!CcMdlWriteComplete2+0x36
(FPO: [Non-Fpo])
ed427c54 bae4b5c6 ff8df808 ff92d65c ff92d690 nt!CcMdlWriteComplete+0x3f
(FPO: [Non-Fpo])
ed427c90 bae50201 ff92d3e8 ff92d650 00000000 ntsfsd!FsdMdlComplete+0xb5
(FPO: [Non-Fpo]) (CONV: stdcall) [c:\current\driver\fsdread.c @ 810]
ed427ce8 bae4867f ff92d3e8 ff92d650 82548ec8 ntsfsd!FsdWriteRequest+0xd8
(FPO: [Non-Fpo]) (CONV: stdcall) [c:\current\driver\fsdwrite.c @ 144]
ed427d40 8041c1ec 82548ec8 00000000 00000000 ntsfsd!FspDispatch+0xb6 (FPO:
[Non-Fpo]) (CONV: stdcall) [c:\current\driver\fsddisp.c @ 125]
ed427da8 804a91e6 00000000 00000000 00000000 nt!ExpWorkerThread+0x106
(FPO: [Non-Fpo])
ed427ddc 804c3d4a 8041c0e6 00000000 00000000
nt!PspSystemThreadStartup+0x54 (FPO: [Non-Fpo])
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

Then I let it run again:

*** Assertion failed: (MemoryDescriptorList->MdlFlags & MDL_PAGES_LOCKED)
!= 0
*** Source File: F:\nt\private\ntos\mm\iosup.c, line 1354

Break, Ignore, Terminate Process or Terminate Thread (bipt)?

*** Assertion failed: (MemoryDescriptorList->MdlFlags &
MDL_SOURCE_IS_NONPAGED_POOL) == 0
*** Source File: F:\nt\private\ntos\mm\iosup.c, line 1355

Break, Ignore, Terminate Process or Terminate Thread (bipt)?

*** Assertion failed: Pfn1->u3.e1.PageLocation != ActiveAndValid
*** Source File: F:\nt\private\ntos\mm\iosup.c, line 1454

Break, Ignore, Terminate Process or Terminate Thread (bipt)?

*** Assertion failed: Pfn1->u2.ShareCount == 0
*** Source File: F:\nt\private\ntos\mm\iosup.c, line 1454

Break, Ignore, Terminate Process or Terminate Thread (bipt)?

*** Assertion failed: Pfn1->u3.e1.LockCharged == 1
*** Source File: F:\nt\private\ntos\mm\iosup.c, line 1454

Break, Ignore, Terminate Process or Terminate Thread (bipt)?

*** Fatal System Error: 0x0000004e
(0x00000007,0x00007C87,0x00000001,0x00000000)

Break instruction exception - code 80000003 (first chance)

A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.

A fatal system error has occurred.

*******************************************************************************

*
*
* Bugcheck Analysis
*
*
*
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 4E, {7, 7c87, 1, 0}

Probably caused by : ntsfsd.sys ( ntsfsd!FsdMdlComplete+b5 )

Followup: MachineOwner

nt!RtlpBreakWithStatusInstruction:
804a9bcc cc int 3

A “4E” bugcheck means:

*******************************************************************************

*
*
* Bugcheck Analysis
*
*
*
*******************************************************************************

PFN_LIST_CORRUPT (4e)
Typically caused by drivers passing bad memory descriptor lists (ie:
calling
MmUnlockPages twice with the same list, etc). If a kernel debugger is
available get the stack trace.
Arguments:
Arg1: 00000007, A driver has unlocked a page more times than it locked it
Arg2: 00007c87, page frame number
Arg3: 00000001, current share count
Arg4: 00000000, 0

Debugging Details:

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0x4E

LAST_CONTROL_TRANSFER: from 80438bab to 804a9bcc

STACK_TEXT:
ed427818 80438bab 00000003 ed427860 00000007
nt!RtlpBreakWithStatusInstruction
ed427848 8043919e 00000003 00007c87 81985ca8 nt!KiBugCheckDebugBreak+0x31
ed427bd4 80499666 0000004e 00000007 00007c87 nt!KeBugCheckEx+0x390
ed427bf8 80451df4 ff92d690 8169e4c8 ff8df808
nt!MiDecrementReferenceCount+0x95
ed427c14 80414712 ff92d600 82548ec8 804d4200 nt!MmUnlockPages+0x2d5
ed427c40 804146d7 00000000 ff92d65c ff92d690 nt!CcMdlWriteComplete2+0x36
ed427c54 bae4b5c6 ff8df808 ff92d65c ff92d690 nt!CcMdlWriteComplete+0x3f
ed427c90 bae50201 ff92d3e8 ff92d650 00000000 ntsfsd!FsdMdlComplete+0xb5
[c:\current\driver\fsdread.c @ 810]
ed427ce8 bae4867f ff92d3e8 ff92d650 82548ec8 ntsfsd!FsdWriteRequest+0xd8
[c:\current\driver\fsdwrite.c @ 144]
ed427d40 8041c1ec 82548ec8 00000000 00000000 ntsfsd!FspDispatch+0xb6
[c:\current\driver\fsddisp.c @ 125]
ed427da8 804a91e6 00000000 00000000 00000000 nt!ExpWorkerThread+0x106
ed427ddc 804c3d4a 8041c0e6 00000000 00000000
nt!PspSystemThreadStartup+0x54
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

FOLLOWUP_IP:
ntsfsd!FsdMdlComplete+b5
bae4b5c6 8b5508 mov edx,[ebp+0x8]

FOLLOWUP_NAME: MachineOwner

SYMBOL_NAME: ntsfsd!FsdMdlComplete+b5

MODULE_NAME: ntsfsd

IMAGE_NAME: ntsfsd.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 3da32c7e

STACK_COMMAND: kb

BUCKET_ID: 0x4E_ntsfsd!FsdMdlComplete+b5

Followup: MachineOwner

Now as far as I can tell, I’m not doing a MmUnlockPages, unless that
happens in here:

CcMdlWriteComplete( IrpSp->FileObject,
&(IrpSp->Parameters.Write.ByteOffset), Irp->MdlAddress);

I also can’t see where this function is getting called more than once per
MDL Write IRP.

My questions are:

  • What caused the first assert?

  • Why does this work (albeit with very poor performance) when I specify
    CcSetDirtyPageThreshold( FileObject, 10); // instead of 1000

Thanks in advance for any help you may be able to give, and again, sorry
for the large post.

Greg

Greg,

I believe I’ve previously reported a problem in this area to the SRV team
(at a plugfest, actually.) The problem I reported was when returning
STATUS_PENDING from a filter. This is ALLOWED in the I/O model on Windows,
but SRV does not handle it properly (it should block and wait if it wants
synchronous I/O, or it should use IoForwardIrpSynchronously.)

Thus, SRV is complaining because you haven’t returned STATUS_SUCCESS (you
can see this in the disassembly - it is looking to ensure the neither of the
upper two bits are set, since if they are you don’t have a success code.)

The subsequent problems are likely related to that, as they are complaining
that the tear-down of an MDL is unlocking pages that aren’t locked - and
this is a very bad thing. You are dropping the reference count on the page,
but the page isn’t locked by this MDL. THAT is ultimately what leads to the
PFN list corrupt problem.

Short-term fix: block and wait in your filter for IRP_MN_MDL|IRP_MN_COMPLETE
(it isn’t a big deal, since this is nothing more than an MDL tear-down.)
That will get rid of your initial assert. I suspect that may resolve your
secondary problem, but there is no guarantee of this (it could be some other
problem in your driver.)

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

-----Original Message-----
From: Greg Pearce [mailto:xxxxx@filetek.com]
Sent: Wednesday, October 09, 2002 10:52 AM
To: File Systems Developers
Subject: [ntfsd] How did this happen?

Hi all - This is a long-winded question, so please bear with me.

My FSD is trying to MDL-write a large file. The FSD uses a user-mode
service to actually write the data somewhere, (so I must make a copy of
the IRP and copy the buffer to my own MDL and re-process the write
request, so I don’t hold up the MPW thread). To prevent the application
trying to write though Cache Manager from consuming all memory, I’ve tried
to limit the amount of pages that CC will maintain for each file stream by
using this call:

CcSetDirtyPageThreshold( FileObject, 1000);

When I copy a big file, everything is ok for a while, then I get the
following assert:

SRV: Assertion Failed at line 924 in D:\nt\private\ntos\srv\io.c
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPoint:
804a9bc0 cc int 3

This happens around this location:

srv!SrvIssueMdlCompleteRequest+e9:

If I allow it to continue, I get another assert immediately:

SRV: IRP FF92D3E8 already in use at 2510!
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPoint:
804a9bc0 cc int 3

Thread info is:

kd> !thread
THREAD 87736580 Cid 8.33c Teb: 00000000 Win32Thread: 00000000
RUNNING
Not impersonating
Owning Process 8188b7a0
WaitTime (seconds) 6694288
Context Switch Count 2100
UserTime 0:00:00.0000
KernelTime 0:00:08.0362
Start Address srv!WorkerThread (0xbb2398de)
Stack Init ed488000 Current ed4877ac Base ed488000 Limit ed485000 Call
0
Priority 9 BasePriority 9 PriorityDecrement 0 DecrementCount 0

ChildEBP RetAddr Args to Child
ed487d2c bb222855 ff92b058 ff92d2b8 ff92b0ff nt!DbgBreakPoint
ed487d4c bb296dc5 ff92d690 bb2136a3 ff92d2b8 srv!SrvStartSend2+0x93
ed487d80 bb239ac9 ff92d2ff 00000000 00000000
srv!SrvFsdRestartWriteAndX+0x68a
ed487da8 804a91e6 ff92d2b8 00000000 00000000 srv!WorkerThread+0x1eb
ed487ddc 804c3d4a bb2398de 87a5c3e0 00000000
nt!PspSystemThreadStartup+0x54
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

Then I let it run some more, and it asserts with this:

*** Assertion failed: (MemoryDescriptorList->MdlFlags & MDL_PAGES_LOCKED)
!= 0
*** Source File: F:\nt\private\ntos\mm\iosup.c, line 1354

Break, Ignore, Terminate Process or Terminate Thread (bipt)?

kd> b

Execute ‘!cxr ED427918’ to dump context
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPoint:
804a9bc0 cc int 3

kd> .cxr ED427918
eax=ed427918 ebx=ff8df808 ecx=009bb0c8 edx=ff92d65c esi=00000000
edi=ff92d690
eip=80451b47 esp=ed427bf0 ebp=ed427c14 iopl=0 nv up ei ng nz na po
nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000
efl=00000286
nt!MmUnlockPages+28:
80451b47 eb02 jmp nt!MmUnlockPages+0x2c (80451b4b)
kd> kv
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr Args to Child
ed427c14 80414712 ff92d690 82548ec8 804d4200 nt!MmUnlockPages+0x28 (FPO:
[Non-Fpo])
ed427c40 804146d7 00000000 ff92d65c ff92d690 nt!CcMdlWriteComplete2+0x36
(FPO: [Non-Fpo])
ed427c54 bae4b5c6 ff8df808 ff92d65c ff92d690 nt!CcMdlWriteComplete+0x3f
(FPO: [Non-Fpo])
ed427c90 bae50201 ff92d3e8 ff92d650 00000000 ntsfsd!FsdMdlComplete+0xb5
(FPO: [Non-Fpo]) (CONV: stdcall) [c:\current\driver\fsdread.c @ 810]
ed427ce8 bae4867f ff92d3e8 ff92d650 82548ec8 ntsfsd!FsdWriteRequest+0xd8
(FPO: [Non-Fpo]) (CONV: stdcall) [c:\current\driver\fsdwrite.c @ 144]
ed427d40 8041c1ec 82548ec8 00000000 00000000 ntsfsd!FspDispatch+0xb6 (FPO:
[Non-Fpo]) (CONV: stdcall) [c:\current\driver\fsddisp.c @ 125]
ed427da8 804a91e6 00000000 00000000 00000000 nt!ExpWorkerThread+0x106
(FPO: [Non-Fpo])
ed427ddc 804c3d4a 8041c0e6 00000000 00000000
nt!PspSystemThreadStartup+0x54 (FPO: [Non-Fpo])
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

Then I let it run again:

*** Assertion failed: (MemoryDescriptorList->MdlFlags & MDL_PAGES_LOCKED)
!= 0
*** Source File: F:\nt\private\ntos\mm\iosup.c, line 1354

Break, Ignore, Terminate Process or Terminate Thread (bipt)?

*** Assertion failed: (MemoryDescriptorList->MdlFlags &
MDL_SOURCE_IS_NONPAGED_POOL) == 0
*** Source File: F:\nt\private\ntos\mm\iosup.c, line 1355

Break, Ignore, Terminate Process or Terminate Thread (bipt)?

*** Assertion failed: Pfn1->u3.e1.PageLocation != ActiveAndValid
*** Source File: F:\nt\private\ntos\mm\iosup.c, line 1454

Break, Ignore, Terminate Process or Terminate Thread (bipt)?

*** Assertion failed: Pfn1->u2.ShareCount == 0
*** Source File: F:\nt\private\ntos\mm\iosup.c, line 1454

Break, Ignore, Terminate Process or Terminate Thread (bipt)?

*** Assertion failed: Pfn1->u3.e1.LockCharged == 1
*** Source File: F:\nt\private\ntos\mm\iosup.c, line 1454

Break, Ignore, Terminate Process or Terminate Thread (bipt)?

*** Fatal System Error: 0x0000004e
(0x00000007,0x00007C87,0x00000001,0x00000000)

Break instruction exception - code 80000003 (first chance)

A fatal system error has occurred.
Debugger entered on first try; Bugcheck callbacks have not been invoked.

A fatal system error has occurred.

****************************************************************************
***

*
*
* Bugcheck Analysis
*
*
*
****************************************************************************
***

Use !analyze -v to get detailed debugging information.

BugCheck 4E, {7, 7c87, 1, 0}

Probably caused by : ntsfsd.sys ( ntsfsd!FsdMdlComplete+b5 )

Followup: MachineOwner

nt!RtlpBreakWithStatusInstruction:
804a9bcc cc int 3

A “4E” bugcheck means:

****************************************************************************
***

*
*
* Bugcheck Analysis
*
*
*
****************************************************************************
***

PFN_LIST_CORRUPT (4e)
Typically caused by drivers passing bad memory descriptor lists (ie:
calling
MmUnlockPages twice with the same list, etc). If a kernel debugger is
available get the stack trace.
Arguments:
Arg1: 00000007, A driver has unlocked a page more times than it locked it
Arg2: 00007c87, page frame number
Arg3: 00000001, current share count
Arg4: 00000000, 0

Debugging Details:

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0x4E

LAST_CONTROL_TRANSFER: from 80438bab to 804a9bcc

STACK_TEXT:
ed427818 80438bab 00000003 ed427860 00000007
nt!RtlpBreakWithStatusInstruction
ed427848 8043919e 00000003 00007c87 81985ca8 nt!KiBugCheckDebugBreak+0x31
ed427bd4 80499666 0000004e 00000007 00007c87 nt!KeBugCheckEx+0x390
ed427bf8 80451df4 ff92d690 8169e4c8 ff8df808
nt!MiDecrementReferenceCount+0x95
ed427c14 80414712 ff92d600 82548ec8 804d4200 nt!MmUnlockPages+0x2d5
ed427c40 804146d7 00000000 ff92d65c ff92d690 nt!CcMdlWriteComplete2+0x36
ed427c54 bae4b5c6 ff8df808 ff92d65c ff92d690 nt!CcMdlWriteComplete+0x3f
ed427c90 bae50201 ff92d3e8 ff92d650 00000000 ntsfsd!FsdMdlComplete+0xb5
[c:\current\driver\fsdread.c @ 810]
ed427ce8 bae4867f ff92d3e8 ff92d650 82548ec8 ntsfsd!FsdWriteRequest+0xd8
[c:\current\driver\fsdwrite.c @ 144]
ed427d40 8041c1ec 82548ec8 00000000 00000000 ntsfsd!FspDispatch+0xb6
[c:\current\driver\fsddisp.c @ 125]
ed427da8 804a91e6 00000000 00000000 00000000 nt!ExpWorkerThread+0x106
ed427ddc 804c3d4a 8041c0e6 00000000 00000000
nt!PspSystemThreadStartup+0x54
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

FOLLOWUP_IP:
ntsfsd!FsdMdlComplete+b5
bae4b5c6 8b5508 mov edx,[ebp+0x8]

FOLLOWUP_NAME: MachineOwner

SYMBOL_NAME: ntsfsd!FsdMdlComplete+b5

MODULE_NAME: ntsfsd

IMAGE_NAME: ntsfsd.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 3da32c7e

STACK_COMMAND: kb

BUCKET_ID: 0x4E_ntsfsd!FsdMdlComplete+b5

Followup: MachineOwner

Now as far as I can tell, I’m not doing a MmUnlockPages, unless that
happens in here:

CcMdlWriteComplete( IrpSp->FileObject,
&(IrpSp->Parameters.Write.ByteOffset), Irp->MdlAddress);

I also can’t see where this function is getting called more than once per
MDL Write IRP.

My questions are:

  • What caused the first assert?

  • Why does this work (albeit with very poor performance) when I specify
    CcSetDirtyPageThreshold( FileObject, 10); // instead of 1000

Thanks in advance for any help you may be able to give, and again, sorry
for the large post.

Greg


You are currently subscribed to ntfsd as: xxxxx@osr.com
To unsubscribe send a blank email to %%email.unsub%%

Correction: I do see in the stack trace, where CcMdlWriteComplete is
calling MmUnlockPages. But my questions still remain.

Thanks -

Greg

Thanks Tony - Does it matter that this isn’t a filter driver?

I hate sounding really stupid, but how did you determine that I was
returning STATUS_PENDING in the trace I posted?

Thanks - Greg

Greg,

Actually, it doesn’t matter that it isn’t a filter driver. Sorry, I forgot
you are doing a file system (so many people end up doing filter drivers.
Lots of people are masochists.)

If the HIGHEST driver returns STATUS_PENDING (filter or file system) SRV
doesn’t handle it properly. If you do this on XP (or .NET) it is fixed (I
just looked at the disassembly for XP SP0).

Just disassemble srv!SrvIssueMdlCompleteRequest and you will see the
relevant code (it is pretty linear code, so the check is just after the
IoCallDriver call, which of course is a call through the import table…)
And actually, since I answered from memory, I got it wrong. It doesn’t look
for a success code - STATUS_PENDING is a success code after all. Instead,
it looks for STATUS_SUCCESS (hence the check on the EAX register, as that’s
the return value from the underlying driver.)

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

-----Original Message-----
From: Greg Pearce [mailto:xxxxx@filetek.com]
Sent: Wednesday, October 09, 2002 11:29 AM
To: File Systems Developers
Subject: [ntfsd] RE: How did this happen?

Thanks Tony - Does it matter that this isn’t a filter driver?

I hate sounding really stupid, but how did you determine that I was
returning STATUS_PENDING in the trace I posted?

Thanks - Greg


You are currently subscribed to ntfsd as: xxxxx@osr.com
To unsubscribe send a blank email to %%email.unsub%%

Tony -

As usual, your the man! Thanks for clarifying this - everything works as
it should now! I found a place where it was possible for me to return
STATUS_PENDING - and alas! That was it!

Thanks again -

When is your next debugging course, preferably in Washington DC?

Greg