On 11/07/2016 10:26 AM, xxxxx@billz.fastmail.fm wrote:
Do you mean that we should retry when we get STATUS_CACHE_PAGE_LOCKED? Should we also throw a KeDelayExecutionThread in there? Something along the lines of (not tested):
static const LONG Delays = { … };
NTSTATUS FspCcCoherencyFlushAndPurgeCache(…)
{
PVOID Result;
LARGE_INTEGER Delay;
for (ULONG i = 0, n = sizeof(Delays) / sizeof(Delays[0]);; i++)
{
try
{
CcCoherencyFlushAndPurgeCache(…);
if (STATUS_CACHE_PAGE_LOCKED != IoStatus.Status)
return IoStatus.Status;
}
except (EXCEPTION_EXECUTE_HANDLER)
{
return GetExceptionCode();
}
Delay.QuadPart = n > i ? Delays[i] : Delays[n - 1];
KeDelayExecutionThread(KernelMode, FALSE, &Delay);
}
}
Also should we limit the retries?
Yes, yes, and yes. The reason for limiting retries is the other reason
for purge failure - if a caller uses VirtualLock or MmProbeAndLockPages
to keep them around. IIRC the current behavior of the system is to
succeed the purge and divorce the page from the section when it’s
locked, but I’m not confident this was always the behavior.
My biggest gripe with the Windows file system is that there are a couple of cases where its behavior seems to be almost non-deterministic. My favorite one being that DeleteFile tells you success, but the file can be still there.
DeleteFile returns success to say that it successfully told the file
system to delete the file at some unknowable point in the future.
What’s more frustrating to me is that IRP_MJ_CLEANUP can still fail
delete and has no way to communicate that fact to anyone.
I must admit that I am not completely following you here? Are you saying that we should be doing CcPurgeCacheSection from the old SectionSize to the newly extended SectionSize, and abort the extension operation if CcPurgeCacheSection fails?
Yes, I’m saying that’s what I would do if I were writing a new FSD
today. Note it’s not what existing FSDs do, so this is only a
suggestion, and since it’s not used it might not work as well as I’d
hope. It just means that truncation becomes possible in the presence of
mapped views, at the price of making extension potentially fail later.
Hopefully the risk is much reduced though, because this failure mode
would require more steps (truncate plus extend) and give time for any
racing views to resolve themselves, so the purge in extend has a good
chance of fixing the issue.
–
http://www.malsmith.net