CcSetFileSizes failure handling

anatoly_mikhailov · November 7, 2016, 2:15pm

> This is some great advice right there. I also think that wrapping Cc, Mm
> (and FsRtl) calls and localizing exception handling is mandatory.

If a routine is documented as raising exceptions, then you must wrap it
with try/except. If the documentation says nothing about a routine raising
and it raises anyway, you can be sure the routine is being used incorrectly.
Bugs of this kind are trivial.

> I was trying to see if I understand Malcolm’s advice on how to use
> CcPurgeCacheSection during extension rather than MmCanFileBeTruncated.

MmCanFileBeTruncated protects well. The only problem is that page
truncation can be delayed even when MmCanFileBeTruncated returns TRUE. The
CcCoherencyFlushAndPurgeCache routine was supposed to solve that, and it
does. Races still remain, though they are reduced. For that reason Malcolm
advises truncating the file as early as possible, before any manipulation
of on-disk structures. That leaves more time for the delayed page
truncation to complete.

Bill_Zissimopoulos · November 7, 2016, 2:38pm

>> This is some great advice right there. I also think that wrapping Cc, Mm
>> (and FsRtl) calls and localizing exception handling is mandatory.

> If a routine is documented as raising exceptions, then you must wrap it
> with try/except. If the documentation says nothing about a routine raising
> and it raises anyway, you can be sure the routine is being used incorrectly.

I don’t think I agree. There are a number of routines that may raise, but the fact is not mentioned in the docs. For example, FsRtlIsNameInExpression.

https://msdn.microsoft.com/en-us/library/windows/hardware/ff546850(v=vs.85).aspx

Bill

anatoly_mikhailov · November 7, 2016, 3:03pm

For me that was the only routine whose raising wasn’t documented on MSDN.
So you are right. As for me, I always research routines before using them:
WRK, reverse engineering and so on. I also intentionally didn’t wrap a call
with try/except when the docs had no note about raising, for one reason: to
know exactly how the routine works. You can always see ExRaiseStatus in the
call stack if you didn’t catch an exception. Happily there weren’t many
such routines. I really can’t stand developing for an environment without
exact knowledge of how it works.

Bill_Zissimopoulos · November 7, 2016, 3:18pm

> I really can’t stand developing for an environment without exact knowledge of how it works.

I agree with that sentiment.

BTW, a lot of FsRtl routines are suspect. They tend to call FsRtlAllocatePoolWithTag which can raise (according to its documentation). I am not sure that the documentation is meticulous enough to mention all the cases.

Bill

anatoly_mikhailov · November 7, 2016, 3:29pm

> BTW, a lot of FsRtl routines are suspect. They tend to call
> FsRtlAllocatePoolWithTag which can raise (according to its documentation).

It can.

> I am not sure that the documentation is meticulous enough to mention all
> the cases.

On average MSDN documents things well.

Malcolm_Smith · November 7, 2016, 3:57pm

On 11/07/2016 10:26 AM, xxxxx@billz.fastmail.fm wrote:

> Do you mean that we should retry when we get STATUS_CACHE_PAGE_LOCKED? Should we also throw a KeDelayExecutionThread in there? Something along the lines of (not tested):
>
> static const LONG Delays[] = { … };
> NTSTATUS FspCcCoherencyFlushAndPurgeCache(…)
> {
>     IO_STATUS_BLOCK IoStatus; /* filled in by CcCoherencyFlushAndPurgeCache */
>     LARGE_INTEGER Delay;
>
>     for (ULONG i = 0, n = sizeof(Delays) / sizeof(Delays[0]);; i++)
>     {
>         try
>         {
>             CcCoherencyFlushAndPurgeCache(…);
>             if (STATUS_CACHE_PAGE_LOCKED != IoStatus.Status)
>                 return IoStatus.Status;
>         }
>         except (EXCEPTION_EXECUTE_HANDLER)
>         {
>             return GetExceptionCode();
>         }
>
>         /* once the schedule is exhausted, keep reusing its last entry */
>         Delay.QuadPart = n > i ? Delays[i] : Delays[n - 1];
>         KeDelayExecutionThread(KernelMode, FALSE, &Delay);
>     }
> }
>
> Also should we limit the retries?

Yes, yes, and yes. The reason for limiting retries is the other reason
for purge failure - if a caller uses VirtualLock or MmProbeAndLockPages
to keep them around. IIRC the current behavior of the system is to
succeed the purge and divorce the page from the section when it’s
locked, but I’m not confident this was always the behavior.
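
The three answers above (retry, delay, cap the retries) can be sketched as a bounded loop. This is a portable C illustration only, not kernel code: purge_fn and delay_fn are hypothetical stand-ins for CcCoherencyFlushAndPurgeCache and KeDelayExecutionThread, the SKETCH_* codes are placeholders for the real NTSTATUS values, the delay units are left abstract, and fake_purge/fake_delay exist only to exercise the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder status codes (assumption); the real ones live in ntstatus.h. */
typedef long status_t;
#define SKETCH_SUCCESS           0L
#define SKETCH_CACHE_PAGE_LOCKED 1L /* stands in for STATUS_CACHE_PAGE_LOCKED */

/* Hypothetical callbacks standing in for the purge call and the delay call. */
typedef status_t (*purge_fn)(void *ctx);
typedef void (*delay_fn)(long delay, void *ctx);

/*
 * Call purge() at most max_attempts times. After every "page locked" result
 * except the last one, wait using the next entry of delays[], reusing the
 * final entry once the schedule runs out. Any other status returns at once.
 */
status_t purge_with_retries(purge_fn purge, delay_fn delay, void *ctx,
                            const long *delays, size_t n_delays,
                            size_t max_attempts)
{
    status_t status = SKETCH_CACHE_PAGE_LOCKED;

    assert(n_delays > 0 && max_attempts > 0);

    for (size_t i = 0; i < max_attempts; i++) {
        status = purge(ctx);
        if (status != SKETCH_CACHE_PAGE_LOCKED)
            return status;
        if (i + 1 < max_attempts)
            delay(i < n_delays ? delays[i] : delays[n_delays - 1], ctx);
    }
    return status; /* still locked after max_attempts tries */
}

/* Test double: reports "locked" for the first locked_calls_left calls. */
struct fake_cache {
    int  locked_calls_left;
    int  purge_calls;
    long total_delay;
};

status_t fake_purge(void *ctx)
{
    struct fake_cache *c = ctx;
    c->purge_calls++;
    if (c->locked_calls_left > 0) {
        c->locked_calls_left--;
        return SKETCH_CACHE_PAGE_LOCKED;
    }
    return SKETCH_SUCCESS;
}

void fake_delay(long delay, void *ctx)
{
    struct fake_cache *c = ctx;
    c->total_delay += delay;
}
```

With a two-entry schedule and a cap of four attempts, a page that stays locked costs four purge calls and three waits, the last wait reusing the final schedule entry. The caller then sees the "locked" status and can fail the operation instead of spinning forever.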

> My biggest gripe with the Windows file system is that there are a couple of cases where its behavior seems to be almost non-deterministic. My favorite one being that DeleteFile tells you success, but the file can be still there.

DeleteFile returns success to say that it successfully told the file
system to delete the file at some unknowable point in the future.
What’s more frustrating to me is that IRP_MJ_CLEANUP can still fail
delete and has no way to communicate that fact to anyone.

> I must admit that I am not completely following you here? Are you saying that we should be doing CcPurgeCacheSection from the old SectionSize to the newly extended SectionSize, and abort the extension operation if CcPurgeCacheSection fails?

Yes, I’m saying that’s what I would do if I were writing a new FSD
today. Note it’s not what existing FSDs do, so this is only a
suggestion, and since it’s not used it might not work as well as I’d
hope. It just means that truncation becomes possible in the presence of
mapped views, at the price of making extension potentially fail later.
Hopefully the risk is much reduced though, because this failure mode
would require more steps (truncate plus extend) and give time for any
racing views to resolve themselves, so the purge in extend has a good
chance of fixing the issue.
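
The extension path suggested here can be sketched as follows. This is a user-mode C model under stated assumptions, not a tested FSD implementation: sketch_file, sketch_purge_range and sketch_extend are hypothetical names, and sketch_purge_range merely stands in for a CcPurgeCacheSection call over the range from the old size to the new size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-file state an FSD would keep. */
struct sketch_file {
    uint64_t size;        /* current file size */
    bool     stale_view;  /* a racing mapped view still pins pages past size */
    int      purge_calls; /* how often the purge stand-in was invoked */
};

/*
 * Stands in for purging [offset, offset + length) from the cache.
 * Returns true when the purge succeeded.
 */
bool sketch_purge_range(struct sketch_file *f, uint64_t offset, uint64_t length)
{
    (void)offset;
    (void)length;
    f->purge_calls++;
    return !f->stale_view;
}

/*
 * Extend the file: purge the range between the old and the new size first,
 * and abort the extension if that purge fails.
 */
bool sketch_extend(struct sketch_file *f, uint64_t new_size)
{
    if (new_size <= f->size)
        return true; /* not an extension, nothing to purge */
    if (!sketch_purge_range(f, f->size, new_size - f->size))
        return false; /* leave the size untouched */
    f->size = new_size;
    return true;
}
```

The trade-off described in the post shows up directly: truncation no longer has to be refused up front, but an extension that hits a still-racing view fails and leaves the size unchanged.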

M

–
http://www.malsmith.net

Bill_Zissimopoulos · November 8, 2016, 1:37pm

Malcolm Smith wrote:

> Do you mean that we should retry when we get STATUS_CACHE_PAGE_LOCKED?
[snip]

> Yes, yes, and yes. The reason for limiting retries is the other reason
> for purge failure - if a caller uses VirtualLock or MmProbeAndLockPages
> to keep them around. IIRC the current behavior of the system is to
> succeed the purge and divorce the page from the section when it’s
> locked, but I’m not confident this was always the behavior.

Thank you for the explanation. I will change my FSD to perform CcCoherencyFlushAndPurgeCache in the discussed manner.

> Yes, I’m saying that’s what I would do if I were writing a new FSD
> today. Note it’s not what existing FSDs do, so this is only a
> suggestion, and since it’s not used it might not work as well as I’d
> hope. It just means that truncation becomes possible in the presence of
> mapped views, at the price of making extension potentially fail later.
> Hopefully the risk is much reduced though, because this failure mode
> would require more steps (truncate plus extend) and give time for any
> racing views to resolve themselves, so the purge in extend has a good
> chance of fixing the issue.

I think I understand what you propose.

At the risk of revealing that I am still missing an important piece in my mental model of how this all works, here is a related question. If we are willing to change behavior of how this works (by eliminating calls to MmCanFileBeTruncated and replacing them with later calls to CcPurgeCacheSection), why not allow both truncation *and* extension when there is a mapped view?

This also assumes that the file system can cope. I believe I have tried the following experiment with NTFS: open a file with FILE_FLAG_DELETE_ON_CLOSE, create a file mapping and map a view then close the file handle but keep the view open. This of course works without FILE_FLAG_DELETE_ON_CLOSE, but with it I found that NTFS cannot reliably maintain data in the section (it’s been a while, but I think I was getting zeroes instead of data in this case).

Bill

anatoly_mikhailov · November 9, 2016, 3:22am

> by eliminating calls to MmCanFileBeTruncated and replacing them with later
> calls to CcPurgeCacheSection

If you eliminate the call to MmCanFileBeTruncated, mapped views that are
backed by nothing will appear. That is exactly what you saw with NTFS.

> This of course works without FILE_FLAG_DELETE_ON_CLOSE

It depends on the FSD implementation. If it was an image file, NTFS didn’t
delete it at all, but FAT did. The only question is the system model. When
a file isn’t deleted because it was mapped before the handle was closed,
neither the user nor the application knows that. The only way to find out
is to check whether the just-deleted file still exists. But a new file may
also have been created with the same name right after the previous one was
deleted. You can of course get directory change notifications, but that is
a lot of complication for a simple file deletion.

anatoly_mikhailov · November 9, 2016, 4:31am

I forgot to mention something.

> If you eliminate the call to MmCanFileBeTruncated, mapped views that are
> backed by nothing will appear. That is exactly what you saw with NTFS.

If you eliminated a call to MmCanFileBeTruncated and you want to see side
effects like mapped views without any backing, you should at least use
CcPurgeCacheSection and evaluate its return value. If it returns FALSE, you
can’t truncate the file, otherwise a mapped view without backing will
appear. CcPurgeCacheSection calls MmPurgeSection, which in turn can call
MmCanFileBeTruncatedInternal. So if MmCanFileBeTruncatedInternal returns
FALSE, CcPurgeCacheSection will also return FALSE. The only side effect is
that CcPurgeCacheSection calls MmPurgeSection after freeing and unmapping
the VACBs. So you gain nothing by using CcPurgeCacheSection instead of
MmCanFileBeTruncated before truncating.

anatoly_mikhailov · November 9, 2016, 4:32am

> if you eliminated a call to MmCanFileBeTruncated and you want to see side
> effects like mapped views

and you *don’t* want to see side effects