Lazy writer thread hang in MiWaitForPageWriteCompletion

I have an FSD on which I am now performing some general use testing. I am experiencing a strange issue where the LazyWriter thread hangs in MiWaitForPageWriteCompletion and never completes.

My understanding of this problem is as follows:

  1. The Lazy Writer calls CcWriteBehind.
  2. CcWriteBehind executes AcquireForLazyWrite on my FSD. My FSD responds by acquiring the Main and PagingIo resources for the file in question.
  3. CcWriteBehind calls MmFlushSection/MiFlushSectionInternal, which results in a WRITE PAGING_IO IRP to be sent to my FSD.
  4. After the IRP completes, MiFlushSectionInternal calls MiWaitForPageWriteCompletion which then blocks in KeWaitForGate *FOREVER*!
  5. CcWriteBehind never completes and thus the file remains locked forever and cannot have any additional operations posted on it.

This problem happens intermittently after executing the command below a few times. Usually it takes just 2-3 executions of the command for the problem to appear:
copy /y LARGE-DIR* z:\

An additional detail: my FSD acts as a “proxy” for a user-mode file system and uses a zero copy technique to map read/write buffers in the virtual address space of the user-mode file system process. It uses the MmMapLockedPagesSpecifyCache(UserMode) call to do the mapping (in the correct process context). It uses MmUnmapLockedPages (again in the correct process context) during READ/WRITE IRP completion.

I am not certain if this detail is relevant but I am mentioning it because disassembly of MiWaitForPageWriteCompletion reveals a potential relationship to MDL/page handling (MiUnlockProtoPoolPage).

I attach the call stack of the Lazy Writer thread and the results of !irpfind at the time of this hang (the results of !irpfind confirm that I am not doing something stupid like not completing a WRITE IRP – there is only a CREATE and IRP_MN_NOTIFY_CHANGE_DIRECTORY posted to my device).

Any insights greatly appreciated.

Bill

kd> !thread fffffa8001a82b00
THREAD fffffa8001a82b00 Cid 0004.0078 Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (WrVirtualMemory) KernelMode Non-Alertable
fffff88002fcc620 Gate
Not impersonating
DeviceMap fffff8a000009770
Owning Process fffffa8001a70040 Image: System
Attached Process N/A Image: N/A
Wait Start TickCount 16901 Ticks: 330 (0:00:00:05.156)
Context Switch Count 2094 IdealProcessor: 0
UserTime 00:00:00.000
KernelTime 00:00:00.250
Win32 Start Address nt!ExpWorkerThread (0xfffff800c1ebc250)
Stack Init fffff88002fccc90 Current fffff88002fcc360
Base fffff88002fcd000 Limit fffff88002fc7000 Call 0
Priority 15 BasePriority 13 UnusualBoost 0 ForegroundBoost 0 IoPriority 0 PagePriority 5
Child-SP RetAddr : Args to Child : Call Site
fffff88002fcc3a0 fffff800c1eabf7c : 0000000000000001 0000000000000000 0000000000000024 fffffa8004b4d560 : nt!KiSwapContext+0x76
fffff88002fcc4e0 fffff800c1e64d2f : fffff98000000002 fffff80000000000 0000000000000012 0000000000000000 : nt!KiCommitThreadWait+0x23c
fffff88002fcc5a0 fffff800c1e1ac5a : fffffa8003252198 fffffa800172e0f0 fffffa8003252150 fffff800c1ef6f9e : nt!KeWaitForGate+0x10f
fffff88002fcc5f0 fffff800c1efb2ba : 000000007ba058c0 fffffa800172e0f0 fffff8a002124110 fffff8a002124110 : nt!MiWaitForPageWriteCompletion+0xa6
fffff88002fcc660 fffff800c1efd639 : fffff88002fccaf0 0000000000000000 0000000000000000 fffff800c1ee66b4 : nt!MiFlushSectionInternal+0xb5f
fffff88002fcc870 fffff800c1efb98b : 000000000000420b 0000000000004200 0000000000000000 fffffa8002ac0550 : nt!MmFlushSection+0x205
fffff88002fcc930 fffff800c1efe515 : fffff980031d6fd8 0000000000000000 0000000000000001 0000000000000000 : nt!CcFlushCachePriv+0x3fb
fffff88002fcca30 fffff800c1ef6073 : 0000000000000000 0000000000000000 0000000000020000 7fffffffffffffff : nt!CcWriteBehind+0x225
fffff88002fccab0 fffff800c1ebc391 : fffff800c208d080 fffffa8001a82b00 fffffa8001a973f0 fffff800c208d080 : nt!CcWorkerThread+0x233
fffff88002fccb80 fffff800c1e2b521 : 0000000000000000 0000000000000080 fffff800c1ebc250 fffffa8001a82b00 : nt!ExpWorkerThread+0x142
fffff88002fccc10 fffff800c1e69dd6 : fffff800c20ff180 fffffa8001a82b00 fffffa8001a6db00 fffffa8001a70040 : nt!PspSystemThreadStartup+0x59
fffff88002fccc60 0000000000000000 : fffff88002fcd000 fffff88002fc7000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x16

kd> !irpfind 0 0 device 0xfffffa80`0321f060
Looking for IRPs with device object == fffffa800321f060

*** CacheSize too low - increasing to 102 MB

Max cache size is : 107347968 bytes (0x19980 KB)
Total memory in cache : 5576 bytes (0x6 KB)
Number of regions cached: 28
251 full reads broken into 264 partial reads
counts: 230 cached/34 uncached, 87.12% cached
bytes : 2784 cached/3768 uncached, 42.49% cached
** Transition PTEs are implicitly decoded
** Prototype PTEs are implicitly decoded

Scanning large pool allocation table for tag 0x3f707249 (Irp?) (fffffa8004c05000 : fffffa8004d85000)

Searching nonpaged pool (fffffa8001a00000 : fffffa80f5400000) for tag 0x3f707249 (Irp?)

Searching special pool (fffff98000000000 : fffffa8000000000) for tag 0x3f707249 (Irp?)

Irp [Thread] irpStack: (Mj,Mn) DevObj [Driver] MDL Process
fffff98009882ea0 [fffffa8002ada440] irpStack: ( 0, 0) fffffa800321f060 [\FileSystem\WinFsp]
fffff980098f2ea0 [fffffa8004818080] irpStack: ( c, 2) fffffa800321f060 [\FileSystem\WinFsp]

Can you show us also the 2 IRPs as well, where are they hanging ?

Regards,
Gabriel
Windows Driver Consulting
www.kasardia.com

On Wed, Apr 13, 2016 at 8:14 AM, wrote:

> I have an FSD on which I am now performing some general use testing. I am
> experiencing a strange issue where the LazyWriter thread hangs in
> MiWaitForPageWriteCompletion and never completes.
>
> My understanding of this problem is as follows:
> 1. The Lazy Writer calls CcWriteBehind.
> 2. CcWriteBehind executes AcquireForLazyWrite on my FSD. My FSD
> responds by acquiring the Main and PagingIo resources for the file in
> question.
> 3. CcWriteBehind calls MmFlushSection/MiFlushSectionInternal, which
> results in a WRITE PAGING_IO IRP to be sent to my FSD.
> 4. After the IRP completes, MiFlushSectionInternal calls
> MiWaitForPageWriteCompletion which then blocks in KeWaitForGate FOREVER!
> 5. CcWriteBehind never completes and thus the file remains locked
> forever and cannot have any additional operations posted on it.
>
> This problem happens intermittently after executing the command below a
> few times. Usually it takes just 2-3 executions of the command for the
> problem to appear:
> copy /y LARGE-DIR* z:<br>>
> An additional detail: my FSD acts as a “proxy” for a user-mode file system
> and uses a zero copy technique to map read/write buffers in the virtual
> address space of the user-mode file system process. It uses the
> MmMapLockedPagesSpecifyCache(UserMode) call to do the mapping (in the
> correct process context). It uses MmUnmapLockedPages (again in the correct
> process context) during READ/WRITE IRP completion.
>
> I am not certain if this detail is relevant but I am mentioning it because
> disassembly of MiWaitForPageWriteCompletion reveals a potential
> relationship to MDL/page handling (MiUnlockProtoPoolPage).
>
> I attach the call stack of the Lazy Writer thread and the results of
> !irpfind at the time of this hang (the results of !irpfind confirm that I
> am not doing something stupid like not completing a WRITE IRP – there is
> only a CREATE and IRP_MN_NOTIFY_CHANGE_DIRECTORY posted to my device).
>
> Any insights greatly appreciated.
>
> Bill
>
> kd> !thread fffffa8001a82b00
> THREAD fffffa8001a82b00 Cid 0004.0078 Teb: 0000000000000000 Win32Thread:
> 0000000000000000 WAIT: (WrVirtualMemory) KernelMode Non-Alertable
> fffff88002fcc620 Gate
> Not impersonating
> DeviceMap fffff8a000009770
> Owning Process fffffa8001a70040 Image: System
> Attached Process N/A Image: N/A
> Wait Start TickCount 16901 Ticks: 330 (0:00:00:05.156)
> Context Switch Count 2094 IdealProcessor: 0
> UserTime 00:00:00.000
> KernelTime 00:00:00.250
> Win32 Start Address nt!ExpWorkerThread (0xfffff800c1ebc250)
> Stack Init fffff88002fccc90 Current fffff88002fcc360
> Base fffff88002fcd000 Limit fffff88002fc7000 Call 0
> Priority 15 BasePriority 13 UnusualBoost 0 ForegroundBoost 0 IoPriority 0
> PagePriority 5
> Child-SP RetAddr : Args to Child
> : Call Site
> fffff88002fcc3a0 fffff800c1eabf7c : 0000000000000001 0000000000000000
> 0000000000000024 fffffa8004b4d560 : nt!KiSwapContext+0x76
> fffff88002fcc4e0 fffff800c1e64d2f : fffff98000000002 fffff80000000000
> 0000000000000012 0000000000000000 : nt!KiCommitThreadWait+0x23c
> fffff88002fcc5a0 fffff800c1e1ac5a : fffffa8003252198 fffffa800172e0f0
> fffffa8003252150 fffff800c1ef6f9e : nt!KeWaitForGate+0x10f
> fffff88002fcc5f0 fffff800c1efb2ba : 000000007ba058c0 fffffa800172e0f0
> fffff8a002124110 fffff8a002124110 : nt!MiWaitForPageWriteCompletion+0xa6
> fffff88002fcc660 fffff800c1efd639 : fffff88002fccaf0 0000000000000000
> 0000000000000000 fffff800c1ee66b4 : nt!MiFlushSectionInternal+0xb5f
> fffff88002fcc870 fffff800c1efb98b : 000000000000420b 0000000000004200
> 0000000000000000 fffffa8002ac0550 : nt!MmFlushSection+0x205
> fffff88002fcc930 fffff800c1efe515 : fffff980031d6fd8 0000000000000000
> 0000000000000001 0000000000000000 : nt!CcFlushCachePriv+0x3fb
> fffff88002fcca30 fffff800c1ef6073 : 0000000000000000 0000000000000000
> 0000000000020000 7fffffffffffffff : nt!CcWriteBehind+0x225
> fffff88002fccab0 fffff800c1ebc391 : fffff800c208d080 fffffa8001a82b00
> fffffa8001a973f0 fffff800c208d080 : nt!CcWorkerThread+0x233
> fffff88002fccb80 fffff800c1e2b521 : 0000000000000000 0000000000000080
> fffff800c1ebc250 fffffa8001a82b00 : nt!ExpWorkerThread+0x142
> fffff88002fccc10 fffff800c1e69dd6 : fffff800c20ff180 fffffa8001a82b00
> fffffa8001a6db00 fffffa8001a70040 : nt!PspSystemThreadStartup+0x59
> fffff88002fccc60 0000000000000000 : fffff88002fcd000 fffff88002fc7000
> 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x16
>
> kd> !irpfind 0 0 device 0xfffffa80`0321f060
> Looking for IRPs with device object == fffffa800321f060
>
> *CacheSize too low - increasing to 102 MB
>
> Max cache size is : 107347968 bytes (0x19980 KB)
> Total memory in cache : 5576 bytes (0x6 KB)
> Number of regions cached: 28
> 251 full reads broken into 264 partial reads
> counts: 230 cached/34 uncached, 87.12% cached
> bytes : 2784 cached/3768 uncached, 42.49% cached
>
Transition PTEs are implicitly decoded
> ** Prototype PTEs are implicitly decoded
>
> Scanning large pool allocation table for tag 0x3f707249 (Irp?)
> (fffffa8004c05000 : fffffa8004d85000)
>
>
> Searching nonpaged pool (fffffa8001a00000 : fffffa80f5400000) for tag
> 0x3f707249 (Irp?)
>
>
> Searching special pool (fffff98000000000 : fffffa8000000000) for tag
> 0x3f707249 (Irp?)
>
> Irp [Thread] irpStack: (Mj,Mn) DevObj
> [Driver] MDL Process
> fffff98009882ea0 [fffffa8002ada440] irpStack: ( 0, 0) fffffa800321f060 [
> \FileSystem\WinFsp]
> fffff980098f2ea0 [fffffa8004818080] irpStack: ( c, 2) fffffa800321f060 [
> \FileSystem\WinFsp]
>
>
> —
> NTFSD is sponsored by OSR
>
>
> MONTHLY seminars on crash dump analysis, WDF, Windows internals and
> software drivers!
> Details at http:
>
> To unsubscribe, visit the List Server section of OSR Online at <
> http://www.osronline.com/page.cfm?name=ListServer&gt;
>


Bercea. G.</http:>

Thanks for your answer. The 2 IRP’s are:

  1. IRP_MJ_CREATE. This IRP is hanging trying to acquire the Main resource of the file acquired and held by the hang Lazy Writer thread.

kd> !irp fffff98009882ea0
Irp is active with 2 stacks 1 is current (= 0xfffff98009882f70)
No Mdl: No System Buffer: Thread fffffa8002ada440: Irp stack trace.
cmd flg cl Device File Completion-Context

[IRP_MJ_CREATE(0), N/A(0)]
0 e1 fffffa800321f060 fffffa8003290450 fffff88001506260-fffffa8002a987f0 Success Error Cancel pending
\FileSystem\WinFsp
Args: fffff880062315f0 01200000 00070000 00000000
[IRP_MJ_CREATE(0), N/A(0)]
0 0 fffffa8004add940 fffffa8003290450 00000000-00000000
\FileSystem\FltMgr
Args: fffff880062315f0 01200000 00070000 00000000

  1. IRP_MJ_DIRECTORY_CONTROL/IRP_MN_NOTIFY_CHANGE_DIRECTORY. This IRP is posted in FsRtlNotifyFullChangeDirectory (by Explorer) to be completed if the FSD makes any changes to the watched directory.

kd> !irp fffff980098f2ea0
Irp is active with 2 stacks 2 is current (= 0xfffff980098f2fb8)
No Mdl: No System Buffer: Thread fffffa8004818080: Irp stack trace.
cmd flg cl Device File Completion-Context
[N/A(0), N/A(0)]
0 10 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000

[IRP_MJ_DIRECTORY_CONTROL(c), N/A(2)]
1 e1 fffffa800321f060 fffffa8004b4df20 fffff88005c36570-fffff980098b2fe0 Success Error Cancel pending
\FileSystem\WinFsp winfsp_x64!FspFsvolNotifyChangeDirectoryCompletion
Args: 00000020 00000003 00000000 00000000

does !locks tell you anything?

A not-uncommon problem here (although it doesn’t sound like it’s your case)
is that Mm detects that the page is in transit already and wait for that to
happen. But the thread that has started the IO is waiting for a lock that
you hold (often the paging resource which someone else is queueing for
exclusive).

The wait is on a gate, so it looks like a pushlock or something. I don’t think !locks would provide too much but would be interesting to see.

Is there anything specific about the flags passed to CreateFile when this happens ? Something that would change the caching strategy . just wondering


Gabriel Bercea

Windows Kernel Driver Consulting

www.kasardia.com

On Wed, Apr 13, 2016 at 1:45 AM -0700, “Rod Widdowson” wrote:

does !locks tell you anything?

A not-uncommon problem here (although it doesn’t sound like it’s your case)
is that Mm detects that the page is in transit already and wait for that to
happen. But the thread that has started the IO is waiting for a lock that
you hold (often the paging resource which someone else is queueing for
exclusive).


NTFSD is sponsored by OSR

MONTHLY seminars on crash dump analysis, WDF, Windows internals and software drivers!
Details at

To unsubscribe, visit the List Server section of OSR Online at

This looks like, “gee, there’s already a write in progress to this page, so
I should wait for it.” It would seem to me then that you’ve either lost a
write or you have a deadlock that is preventing the original write from
proceeding.

Have you tried looking at the other threads in the system? Is there any
activity in your driver? !stacks 2 is useful.

-scott
OSR
@OSRDrivers

wrote in message news:xxxxx@ntfsd…

I have an FSD on which I am now performing some general use testing. I am
experiencing a strange issue where the LazyWriter thread hangs in
MiWaitForPageWriteCompletion and never completes.

My understanding of this problem is as follows:

  1. The Lazy Writer calls CcWriteBehind.
  2. CcWriteBehind executes AcquireForLazyWrite on my FSD. My FSD responds
    by acquiring the Main and PagingIo resources for the file in question.
  3. CcWriteBehind calls MmFlushSection/MiFlushSectionInternal, which
    results in a WRITE PAGING_IO IRP to be sent to my FSD.
  4. After the IRP completes, MiFlushSectionInternal calls
    MiWaitForPageWriteCompletion which then blocks in KeWaitForGate *FOREVER*!
  5. CcWriteBehind never completes and thus the file remains locked
    forever and cannot have any additional operations posted on it.

This problem happens intermittently after executing the command below a few
times. Usually it takes just 2-3 executions of the command for the problem
to appear:
copy /y LARGE-DIR* z:\

An additional detail: my FSD acts as a “proxy” for a user-mode file system
and uses a zero copy technique to map read/write buffers in the virtual
address space of the user-mode file system process. It uses the
MmMapLockedPagesSpecifyCache(UserMode) call to do the mapping (in the
correct process context). It uses MmUnmapLockedPages (again in the correct
process context) during READ/WRITE IRP completion.

I am not certain if this detail is relevant but I am mentioning it because
disassembly of MiWaitForPageWriteCompletion reveals a potential relationship
to MDL/page handling (MiUnlockProtoPoolPage).

I attach the call stack of the Lazy Writer thread and the results of
!irpfind at the time of this hang (the results of !irpfind confirm that I am
not doing something stupid like not completing a WRITE IRP – there is only
a CREATE and IRP_MN_NOTIFY_CHANGE_DIRECTORY posted to my device).

Any insights greatly appreciated.

Bill

kd> !thread fffffa8001a82b00
THREAD fffffa8001a82b00 Cid 0004.0078 Teb: 0000000000000000 Win32Thread:
0000000000000000 WAIT: (WrVirtualMemory) KernelMode Non-Alertable
fffff88002fcc620 Gate
Not impersonating
DeviceMap fffff8a000009770
Owning Process fffffa8001a70040 Image: System
Attached Process N/A Image: N/A
Wait Start TickCount 16901 Ticks: 330 (0:00:00:05.156)
Context Switch Count 2094 IdealProcessor: 0
UserTime 00:00:00.000
KernelTime 00:00:00.250
Win32 Start Address nt!ExpWorkerThread (0xfffff800c1ebc250)
Stack Init fffff88002fccc90 Current fffff88002fcc360
Base fffff88002fcd000 Limit fffff88002fc7000 Call 0
Priority 15 BasePriority 13 UnusualBoost 0 ForegroundBoost 0 IoPriority 0
PagePriority 5
Child-SP RetAddr : Args to Child
: Call Site
fffff88002fcc3a0 fffff800c1eabf7c : 0000000000000001 0000000000000000
0000000000000024 fffffa8004b4d560 : nt!KiSwapContext+0x76
fffff88002fcc4e0 fffff800c1e64d2f : fffff98000000002 fffff80000000000
0000000000000012 0000000000000000 : nt!KiCommitThreadWait+0x23c
fffff88002fcc5a0 fffff800c1e1ac5a : fffffa8003252198 fffffa800172e0f0
fffffa8003252150 fffff800c1ef6f9e : nt!KeWaitForGate+0x10f
fffff88002fcc5f0 fffff800c1efb2ba : 000000007ba058c0 fffffa800172e0f0
fffff8a002124110 fffff8a002124110 : nt!MiWaitForPageWriteCompletion+0xa6
fffff88002fcc660 fffff800c1efd639 : fffff88002fccaf0 0000000000000000
0000000000000000 fffff800c1ee66b4 : nt!MiFlushSectionInternal+0xb5f
fffff88002fcc870 fffff800c1efb98b : 000000000000420b 0000000000004200
0000000000000000 fffffa8002ac0550 : nt!MmFlushSection+0x205
fffff88002fcc930 fffff800c1efe515 : fffff980031d6fd8 0000000000000000
0000000000000001 0000000000000000 : nt!CcFlushCachePriv+0x3fb
fffff88002fcca30 fffff800c1ef6073 : 0000000000000000 0000000000000000
0000000000020000 7fffffffffffffff : nt!CcWriteBehind+0x225
fffff88002fccab0 fffff800c1ebc391 : fffff800c208d080 fffffa8001a82b00
fffffa8001a973f0 fffff800c208d080 : nt!CcWorkerThread+0x233
fffff88002fccb80 fffff800c1e2b521 : 0000000000000000 0000000000000080
fffff800c1ebc250 fffffa8001a82b00 : nt!ExpWorkerThread+0x142
fffff88002fccc10 fffff800c1e69dd6 : fffff800c20ff180 fffffa8001a82b00
fffffa8001a6db00 fffffa8001a70040 : nt!PspSystemThreadStartup+0x59
fffff88002fccc60 0000000000000000 : fffff88002fcd000 fffff88002fc7000
0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x16

kd> !irpfind 0 0 device 0xfffffa80`0321f060
Looking for IRPs with device object == fffffa800321f060

*** CacheSize too low - increasing to 102 MB

Max cache size is : 107347968 bytes (0x19980 KB)
Total memory in cache : 5576 bytes (0x6 KB)
Number of regions cached: 28
251 full reads broken into 264 partial reads
counts: 230 cached/34 uncached, 87.12% cached
bytes : 2784 cached/3768 uncached, 42.49% cached
** Transition PTEs are implicitly decoded
** Prototype PTEs are implicitly decoded

Scanning large pool allocation table for tag 0x3f707249 (Irp?)
(fffffa8004c05000 : fffffa8004d85000)

Searching nonpaged pool (fffffa8001a00000 : fffffa80f5400000) for tag
0x3f707249 (Irp?)

Searching special pool (fffff98000000000 : fffffa8000000000) for tag
0x3f707249 (Irp?)

Irp [Thread] irpStack: (Mj,Mn) DevObj
[Driver] MDL Process
fffff98009882ea0 [fffffa8002ada440] irpStack: ( 0, 0) fffffa800321f060 [
\FileSystem\WinFsp]
fffff980098f2ea0 [fffffa8004818080] irpStack: ( c, 2) fffffa800321f060 [
\FileSystem\WinFsp]

Good morning, everyone. Thanks for your suggestions.

The following windbg dumps are from the same WinDbg session as I have the issue frozen in a VM.

Recall that the scenario is:

  1. The LazyWriter thread has run and is now blocked in MiWaitForPageWriteCompletion.
  2. An IRP_MJ_CREATE for the file currently locked by the LazyWriter is currently pending and cannot complete because it cannot acquire the file’s Main resource.

From Rod Widdowson:

does !locks tell you anything?

<<
kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks…

Resource @ 0xfffffa800321f3f0 Shared 1 owning threads
Threads: fffff98001456f93-01<*> *** Unknown owner, possibly FileSystem
KD: Scanning for held locks…

Resource @ 0xfffff980031d6ed0 Exclusively owned
Contention Count = 1
NumberOfExclusiveWaiters = 1
Threads: fffffa8001a82b00-01<*>
Threads Waiting On Exclusive Access:
fffffa8001ab2b00

Resource @ 0xfffff980031d6f38 Exclusively owned
Threads: fffffa8001a82b00-01<*>
KD: Scanning for held locks…
7795 total locks, 3 locks currently held

>

There are 3 locks held here. All are my FSD locks.

The first resource (0xfffffa800321f3f0) is a special “rename” resource that my FSD holds shared when processing an IRP_MJ_CREATE (to disallow concurrent renames). The locking hierarchy in my FSD is Rename -> Main -> PagingIo, but Rename is only taken on IRP_MJ_CREATE (shared) and renames (exclusive).

The second resource (0xfffff980031d6ed0) is the file Main resource currently held by the Lazy Writer. The “thread waiting on exclusive access” is the Mod Writer (and is currently my foremost suspect in that it is somehow causing this problem). Here is the call stack:
<<
4.0000ac fffffa8001ab2b00 fffd49b1 Blocked nt!KiSwapContext+0x76
nt!KiCommitThreadWait+0x23c
nt!KeWaitForSingleObject+0x1cf
nt!ExpWaitForResource+0xbd
nt!ExAcquireResourceExclusiveLite+0x1e1
nt!VerifierExAcquireResourceExclusiveLite+0x35
winfsp_x64!FspFileNodeAcquireExclusiveF+0x150
winfsp_x64!FspAcquireForModWrite+0xfd
nt!FsRtlAcquireFileForModWriteEx+0x162
nt!MiGatherMappedPages+0x2a8
nt!MiMappedPageWriter+0x13e
nt!PspSystemThreadStartup+0x59
nt!KiStartSystemThread+0x16

>

Can it be that the Lazy Writer is somehow waiting on the Mod Writer on a gate, but the Mod Writer is waiting on the file Main resource acquired by the Lazy Writer (thus deadlock)?

The third resource (0xfffff980031d6f38) is the file PagingIo resource currently held by the Lazy Writer.

This looks to me like a proper locking scenario (within my FSD) where things cannot progress because the Lazy Writer does not release my file’s lock (because it is waiting in KeWaitForGate for an unspecified reason).

From Gabriel Bercea:

Is there anything specific about the flags passed to CreateFile when this
happens ? Something that would change the caching strategy . just wondering

The CreateFile is issued by cmd.exe within the command copy /y LARGE-DIR* z:\ which I execute repeatedly to get this scenario to happen.

From Scott Noone:

This looks like, “gee, there’s already a write in progress to this page, so
I should wait for it.” It would seem to me then that you’ve either lost a
write or you have a deadlock that is preventing the original write from
proceeding.

I am not 100% certain that I have not lost a WRITE, but shouldn’t in this case see the WRITE IRP when I do !irpfind?

Furthermore note that the Lazy Writer thread is not blocked in IoSynchronousPageWrite, but in MiWaitForPageWriteCompletion. The name is suggestive of waiting for an IRP completion, but disassembly seems to suggest a different role (and one that is not clear to me).

[ASIDE: MiWaitForPageWriteCompletion calls the following functions:
KeInitializeGate
MiUnlockProtoPoolPage (immediately followed by)
KeWaitForGate
ExpWaitForSpinLockExclusiveAndAcquire
ExpAcquireSpinLockExclusiveAtDpcLevelInstrumented
HvlNotifyLongSpinWait
ExpReleaseSpinLockExclusiveFromDpcLevelInstrumented
]

Does anyone know what MiWaitForPageWriteCompletion does?

Bill

Hmmm… the more I think about it the more I am wondering if I should just return STATUS_CANT_WAIT or other error code in my AcquireForModWrite callback when I cannot get the resource. Unfortunately fastfat does not have this callback so no clear guidance here.

SOLVED!

Returning STATUS_CANT_WAIT from AcquireForModWrite fixed the problem. It was indeed a deadlock involving my file’s resources and a gate object presumably shared by the Lazy Writer and Mod Writer.

Thanks everyone for your suggestions which actually pointed me to the right direction. I had run !locks last night and saw the blocked ModWriter thread, but dismissed it as irrelevant. Your suggestions made me look into this again.

Bill