Hi All,
I have written an FSD (not a filter driver) and am having some problems
with Cache Manager’s lazy writer.
Scenario -
From cmd.exe, copy fileA (on my volume) to fileB (on my volume)
From my logs (which are sequence numbered with an interlocked variable,
so the ordering here is correct), I see the following -
- cmd.exe opens fileA & fileB
- cmd.exe reads bit of fileA, then issues buffered write to fileB
(ie. a standard (while(read) { write(read_bytes) })
- the cached write acquires & releases fileB FCB
mainResource exclusive
- cmd.exe updates timestamps of fileB (IRP_MJ_SET_INFO / basic info)
- cmd.exe sends cleanup & close of fileB (no more userhandles now)
… time passes …
- AcquireForLazyWrite handler called for fileB FCB
(only cache manager has ref to FCB here, no user handles open)
- picks up fcb mainResource exclusive
- series of direct to disk (non-buffered) writes from lazy writer
occur for fileB
- each write picks up / releases fcb pagingIoResource exclusive
- ReleaseFromLazyWrite handler caller for fileB
- releases fcb mainResource
… time passes - lazy writer hasn’t written entire file …
- *asynchronous* paging I/O write occurs for fileB (lazy writer?)
- IRP_MJ_WRITE pended (as I do some fancy stuff that blocks)
- pended IRP from (8) picked up in system worker thread, IRP_MJ_WRITE
tries to pick up fcb pagingIoResource and BLOCKS
- examination with debugger at this point reveals that
pagingIoResource is acquired SHARED by someone
- AcquireForLazyWrite for fileB fcb called
- fcb mainResource picked up exclusive
- synchronous paging I/O write occurs for fileB (lazy writer)
- we block again trying to get pagingIoResource
Steps 8 through 10 are shown below -
115: *** IRP_MJ_WRITE *** FO(864ED1D8)
116: WRITING 65536 bytes from 983040 FO(864ED1D8)
117: PENDING write of PFS file that can’t be blocked. FO(864ED1D8)
118: IRP_MJ_WRITE returning STATUS_PENDING with info 0 FO(864ED1D8)
119: *** IRP_MJ_WRITE *** FO(864ED1D8)
120: WRITING 65536 bytes from 983040 FO(864ED1D8)
121: ===> PAGING I/O (RE-ENTRANT / VMM) <=== FO(864ED1D8)
122: ACQUIRE EXCL 8550171C (fcb.pagingIoResource) for
PBoxWriteNormal (CAN block) … FO(864ED1D8)
123: ===> ACQUIRE FOR LAZY-WRITE (\fileB for System) <===
124: ACQUIRE EXCL 855016E4 (fcb.mainResource) for
PBoxAcquireForLazyWrite (CAN block) …
125: ACQUIRED EXCL 855016E4 (fcb.mainResource) for
PBoxAcquireForLazyWrite
126: *** IRP_MJ_WRITE *** FO(864ED1D8)
127: WRITING 65536 bytes from 786432 FO(864ED1D8)
128: ===> PAGING I/O (LAZY-WRITER) <=== FO(864ED1D8)
129: ACQUIRE EXCL 8550171C (fcb.pagingIoResource) for
PBoxWriteNormal (CAN block) … FO(864ED1D8)
So, stuff starts going wrong at Step 8. Some questions -
- Why does this paging write (assumingly lazy writer) at Step 8 occur
BEFORE AcquireForLazyWrite is called at Step 10? - Why is paging write at Step 8 asynchronous, but paging write at Step
11 synchronous?
If I break in at line 129 the status of the two resources for the FCB
are as follows -
mainResource:
0: kd> dt -y -r ERESOURCE 0x855016e4
+0x000 SystemResourcesList : _LIST_ENTRY [0x8550171c - 0x8551d71c]
+0x000 Flink : 0x8550171c [0x8648ea68 - 0x855016e4]
+0x000 Flink : 0x8648ea68 [0x864e5290 - 0x8550171c]
+0x004 Blink : 0x855016e4 [0x8550171c - 0x8551d71c]
+0x004 Blink : 0x8551d71c [0x855016e4 - 0x8551d6e4]
+0x000 Flink : 0x855016e4 [0x8550171c - 0x8551d71c]
+0x004 Blink : 0x8551d6e4 [0x8551d71c - 0x8657c71c]
+0x008 OwnerTable : (null)
+0x00c ActiveCount : 1
+0x00e Flag : 0x80
+0x010 SharedWaiters : 0x862b4350
+0x000 Header : _DISPATCHER_HEADER
+0x000 Type : 0x5 ‘’
+0x001 Absolute : 0 ‘’
+0x002 Size : 0x5 ‘’
+0x003 Inserted : 0 ‘’
+0x004 SignalState : 0
+0x008 WaitListHead : _LIST_ENTRY [0x862b4358 - 0x862b4358]
+0x010 Limit : 2147483647
+0x014 ExclusiveWaiters : 0x863d1530
+0x000 Header : _DISPATCHER_HEADER
+0x000 Type : 0x1 ‘’
+0x001 Absolute : 0 ‘’
+0x002 Size : 0x4 ‘’
+0x003 Inserted : 0 ‘’
+0x004 SignalState : 0
+0x008 WaitListHead : _LIST_ENTRY [0x863d1538 - 0x863d1538]
+0x018 OwnerThreads : [2] _OWNER_ENTRY
+0x000 OwnerThread : 0x867cb340
+0x004 OwnerCount : 1
+0x004 TableSize : 1
+0x028 ContentionCount : 2
+0x02c NumberOfSharedWaiters : 0
+0x02e NumberOfExclusiveWaiters : 0
+0x030 Address : (null)
+0x030 CreatorBackTraceIndex : 0
+0x034 SpinLock : 0
pagingIoResource:
0: kd> dt -y -r ERESOURCE 0x8550171C
+0x000 SystemResourcesList : _LIST_ENTRY [0x8648ea68 - 0x855016e4]
+0x000 Flink : 0x8648ea68 [0x864e5290 - 0x8550171c]
+0x000 Flink : 0x864e5290 [0x86492b48 - 0x8648ea68]
+0x004 Blink : 0x8550171c [0x8648ea68 - 0x855016e4]
+0x004 Blink : 0x855016e4 [0x8550171c - 0x8551d71c]
+0x000 Flink : 0x8550171c [0x8648ea68 - 0x855016e4]
+0x004 Blink : 0x8551d71c [0x855016e4 - 0x8551d6e4]
+0x008 OwnerTable : (null)
+0x00c ActiveCount : 1
+0x00e Flag : 0
+0x010 SharedWaiters : (null)
+0x014 ExclusiveWaiters : 0x8664a148
+0x000 Header : _DISPATCHER_HEADER
+0x000 Type : 0x1 ‘’
+0x001 Absolute : 0 ‘’
+0x002 Size : 0x4 ‘’
+0x003 Inserted : 0 ‘’
+0x004 SignalState : 0
+0x008 WaitListHead : _LIST_ENTRY [0x867cb3b0 - 0x867cb8a0]
+0x018 OwnerThreads : [2] _OWNER_ENTRY
+0x000 OwnerThread : 0
+0x004 OwnerCount : 0
+0x004 TableSize : 0
+0x028 ContentionCount : 2
+0x02c NumberOfSharedWaiters : 0
+0x02e NumberOfExclusiveWaiters : 2
+0x030 Address : (null)
+0x030 CreatorBackTraceIndex : 0
+0x034 SpinLock : 0
pagingIoResource, which is what I am blocking on, is last acquired
shared by an IRP_MJ_READ, which just happens to be a recursive call into
my FSD on the *last* cached write to fileB -
187: *** IRP_MJ_WRITE *** FO(864ED1D8)
188: WRITING 22528 bytes from 1638400 FO(864ED1D8)
189: ACQUIRE EXCL 855016E4 (fcb.mainResource) for
PBoxWriteNormal (CAN block) … FO(864ED1D8)
190: ACQUIRED EXCL 855016E4 (fcb.mainResource) for
PBoxWriteNormal
191: ===> Using CACHE MANAGER to satisfy write <=== FO(864ED1D8)
–> start recursive FSD entry
192: *** IRP_MJ_READ *** FO(864ED1D8)
193: READING 4096 bytes from 1658880 FO(864ED1D8)
194: PENDING read of PFS file that can’t be blocked. FO(864ED1D8)
195: IRP_MJ_READ returning STATUS_PENDING with Information 0
FO(864ED1D8)
196: *** IRP_MJ_READ *** FO(864ED1D8)
197: READING 4096 bytes from 1658880 FO(864ED1D8)
198: ===> REQUEST IS FROM VMM (PAGING I/O) <=== FO(864ED1D8)
199: ACQUIRE SHRD 8550171C (fcb.pagingIoResource) for
PBoxReadNormal (CAN block) … FO(864ED1D8)
200: ACQUIRED SHRD 8550171C (fcb.pagingIoResource) for
PBoxReadNormal
201: ===> Using DISK DIRECT to satisfy read <=== FO(864ED1D8)
202: Read would exceed EOF, truncated. Reading 2049, returning
2048 bytes FO(864ED1D8)
203: RELEASED 8550171C (fcb.pagingIoResource) for (PBoxReadNormal)
204: IRP_MJ_READ returning STATUS_SUCCESS with Information 2048
FO(864ED1D8)
<– end recursive FSD entry
205: CcCopyWrite() returned SUCCESS FO(864ED1D8)
206: RELEASED 855016E4 (fcb.mainResource) for (PBoxWriteNormal)
207: IRP_MJ_WRITE returning STATUS_SUCCESS with Information 22528
FO(864ED1D8)
So, CcCopyWrite() is triggering a direct read (presumably to ensure an
overlapping page is valid in the cache, or something), which recursively
enters my filesystem. Whilst IRP_MJ_READ acquires pagingIoResource
shared (line 199), it DOES release it (line 203). I’ve actually checked
this with the debugger and ActiveCount drops from 1 to 0 after the release.
So, is Cache Manager using the resources in my FCB common header itself,
without calling my callbacks, and hence I am trying to acquire them when
it already has? Perhaps my implementation of
AcquireForLazyWrite/ReleaseFromLazyWrite is incorrect. I noticed in
FASTFAT they do something like -
IoSetTopLevelIrp((PIRP)FSRTL_CACHE_TOP_LEVEL_IRP);
In AcquireForLazyWrite(). They call it a kludge. I note also that they
rarely, if ever, check WHO is the Top Level Component, in terms of the
FSRTL constants – does Cache Manager use the TLC values?
I would really appreciate it if anyone has any ideas, I’ve been looking
at this for a day or so now.
Furthermore, all this stuff is protected by FsRtlEnterFileSystem()
EXCEPT the AcquireForLazyWrite() / ReleaseFromLazyWrite() callbacks –
they don’t appear to be protected by these in FastFat either.
Cheers,
Josh