I have an FSD on which I am now performing some general use testing. I am experiencing a strange issue where the LazyWriter thread hangs in MiWaitForPageWriteCompletion and never completes.
My understanding of this problem is as follows:
- The Lazy Writer calls CcWriteBehind.
- CcWriteBehind executes AcquireForLazyWrite on my FSD. My FSD responds by acquiring the Main and PagingIo resources for the file in question.
- CcWriteBehind calls MmFlushSection/MiFlushSectionInternal, which results in a WRITE PAGING_IO IRP to be sent to my FSD.
- After the IRP completes, MiFlushSectionInternal calls MiWaitForPageWriteCompletion which then blocks in KeWaitForGate *FOREVER*!
- CcWriteBehind never completes and thus the file remains locked forever and cannot have any additional operations posted on it.
This problem happens intermittently after executing the command below a few times. Usually it takes just 2-3 executions of the command for the problem to appear:
copy /y LARGE-DIR* z:\
An additional detail: my FSD acts as a “proxy” for a user-mode file system and uses a zero copy technique to map read/write buffers in the virtual address space of the user-mode file system process. It uses the MmMapLockedPagesSpecifyCache(UserMode) call to do the mapping (in the correct process context). It uses MmUnmapLockedPages (again in the correct process context) during READ/WRITE IRP completion.
I am not certain if this detail is relevant but I am mentioning it because disassembly of MiWaitForPageWriteCompletion reveals a potential relationship to MDL/page handling (MiUnlockProtoPoolPage).
I attach the call stack of the Lazy Writer thread and the results of !irpfind at the time of this hang (the results of !irpfind confirm that I am not doing something stupid like not completing a WRITE IRP – there is only a CREATE and IRP_MN_NOTIFY_CHANGE_DIRECTORY posted to my device).
Any insights greatly appreciated.
Bill
kd> !thread fffffa8001a82b00
THREAD fffffa8001a82b00 Cid 0004.0078 Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (WrVirtualMemory) KernelMode Non-Alertable
fffff88002fcc620 Gate
Not impersonating
DeviceMap fffff8a000009770
Owning Process fffffa8001a70040 Image: System
Attached Process N/A Image: N/A
Wait Start TickCount 16901 Ticks: 330 (0:00:00:05.156)
Context Switch Count 2094 IdealProcessor: 0
UserTime 00:00:00.000
KernelTime 00:00:00.250
Win32 Start Address nt!ExpWorkerThread (0xfffff800c1ebc250)
Stack Init fffff88002fccc90 Current fffff88002fcc360
Base fffff88002fcd000 Limit fffff88002fc7000 Call 0
Priority 15 BasePriority 13 UnusualBoost 0 ForegroundBoost 0 IoPriority 0 PagePriority 5
Child-SP RetAddr : Args to Child : Call Site
fffff88002fcc3a0 fffff800
c1eabf7c : 0000000000000001 00000000
00000000 0000000000000024 fffffa80
04b4d560 : nt!KiSwapContext+0x76
fffff88002fcc4e0 fffff800
c1e64d2f : fffff98000000002 fffff800
00000000 0000000000000012 00000000
00000000 : nt!KiCommitThreadWait+0x23c
fffff88002fcc5a0 fffff800
c1e1ac5a : fffffa8003252198 fffffa80
0172e0f0 fffffa8003252150 fffff800
c1ef6f9e : nt!KeWaitForGate+0x10f
fffff88002fcc5f0 fffff800
c1efb2ba : 000000007ba058c0 fffffa80
0172e0f0 fffff8a002124110 fffff8a0
02124110 : nt!MiWaitForPageWriteCompletion+0xa6
fffff88002fcc660 fffff800
c1efd639 : fffff88002fccaf0 00000000
00000000 0000000000000000 fffff800
c1ee66b4 : nt!MiFlushSectionInternal+0xb5f
fffff88002fcc870 fffff800
c1efb98b : 000000000000420b 00000000
00004200 0000000000000000 fffffa80
02ac0550 : nt!MmFlushSection+0x205
fffff88002fcc930 fffff800
c1efe515 : fffff980031d6fd8 00000000
00000000 0000000000000001 00000000
00000000 : nt!CcFlushCachePriv+0x3fb
fffff88002fcca30 fffff800
c1ef6073 : 0000000000000000 00000000
00000000 0000000000020000 7fffffff
ffffffff : nt!CcWriteBehind+0x225
fffff88002fccab0 fffff800
c1ebc391 : fffff800c208d080 fffffa80
01a82b00 fffffa8001a973f0 fffff800
c208d080 : nt!CcWorkerThread+0x233
fffff88002fccb80 fffff800
c1e2b521 : 0000000000000000 00000000
00000080 fffff800c1ebc250 fffffa80
01a82b00 : nt!ExpWorkerThread+0x142
fffff88002fccc10 fffff800
c1e69dd6 : fffff800c20ff180 fffffa80
01a82b00 fffffa8001a6db00 fffffa80
01a70040 : nt!PspSystemThreadStartup+0x59
fffff88002fccc60 00000000
00000000 : fffff88002fcd000 fffff880
02fc7000 0000000000000000 00000000
00000000 : nt!KiStartSystemThread+0x16
kd> !irpfind 0 0 device 0xfffffa80`0321f060
Looking for IRPs with device object == fffffa800321f060
*** CacheSize too low - increasing to 102 MB
Max cache size is : 107347968 bytes (0x19980 KB)
Total memory in cache : 5576 bytes (0x6 KB)
Number of regions cached: 28
251 full reads broken into 264 partial reads
counts: 230 cached/34 uncached, 87.12% cached
bytes : 2784 cached/3768 uncached, 42.49% cached
** Transition PTEs are implicitly decoded
** Prototype PTEs are implicitly decoded
Scanning large pool allocation table for tag 0x3f707249 (Irp?) (fffffa8004c05000 : fffffa8004d85000)
Searching nonpaged pool (fffffa8001a00000 : fffffa80f5400000) for tag 0x3f707249 (Irp?)
Searching special pool (fffff98000000000 : fffffa8000000000) for tag 0x3f707249 (Irp?)
Irp [Thread] irpStack: (Mj,Mn) DevObj [Driver] MDL Process
fffff98009882ea0 [fffffa8002ada440] irpStack: ( 0, 0) fffffa800321f060 [\FileSystem\WinFsp]
fffff980098f2ea0 [fffffa8004818080] irpStack: ( c, 2) fffffa800321f060 [\FileSystem\WinFsp]