Get Normalized name when FltGetFileNameInformation fails

bkmz · March 11, 2019, 2:02pm

Thank you. I was afraid I’d have to do that, but you’re probably right that it’s the only way.

Is it guaranteed that post-create is absolutely, 100% always called at PASSIVE_LEVEL?

Dejan_Maksimovic · March 11, 2019, 2:16pm

Yep. PostCreate is also guaranteed to run in the same process context
as the pre-create. That’s not the case for other operations.

Note that stream file objects will often be created without you
seeing any IRP_MJ_CREATE. This is, unfortunately, "normal".
IIRC, you would have a stream/file context associated with such a
file, but not a handle context (that is, provided you associated a
stream/file context with the same file in an earlier IRP_MJ_CREATE).
I.e.:

1. IRP_MJ_CREATE arrives for a file.
2. You create and associate a stream, file and handle context with it.
3. At this point, it is possible that the FS will create stream file objects (not to be confused with file objects for the file’s named streams!), for which you see no IRP_MJ_CREATE.
4. In read/write/close, you check for your contexts, but only the stream/file context would be present, not the handle context.


bkmz · March 13, 2019, 7:21am

I was thinking about this approach (get the name in Create and save it in the context), and I’m still wondering if it’s the most efficient way. The reason for my doubts is some comments by Alex Carp (ex-Microsoft, the author of http://fsfilters.blogspot.com) that I read before.

Basically, he’s saying that calling FltGetFileNameInformation every time is better because it handles all possible cases of name invalidation, and internally it does the same thing as manually storing the name in a stream handle context. So I’m still confused: should I really do it, or could it make things worse?

Besides, I would still have to defer completion of some operations in cases when the name hasn’t been obtained yet. Wouldn’t that be the same as delaying completion with FltDoCompletionProcessingWhenSafe and then calling FltGetFileNameInformation in the delayed post-op? (In theory, I’d have to do that just once, since the name is cached.)

https://community.osr.com/discussion/comment/214149/#Comment_214149

Knowing when the name has become invalid is not that easy. You listed postCreate and postRename but in my opinion that’s an oversimplification. What about overwriting renames? Hardlink creation? Overwriting hardlink creation? Directory rename (which invalidates all the names in all the files under that directory)? What about setting a short name on a file or directory which might invalidate some names (opened names) but not others (normalized names)? FltMgr’s name caching is implemented to take these cases into account, but keeping a reference to a name obtained at some point in the past doesn’t guarantee that the name is still accurate. Basically, calling FltGetFileNameInformation() every time is the right way to do it because if FltMgr has seen a condition that required invalidation of the name then you’ll get a new name. Also, the name is cached so it won’t cost you much more in terms of performance than keeping your own reference.

https://community.osr.com/discussion/comment/163696/#Comment_163696

Also, storing the name of the file in a stream context does not help in this case. It only works if you don’t care if it’s accurate at all or if you only want to reflect the name of the file when it was created. However, if you want to report the name of the file as best you can at the time operations happen then you have the issues you’ve already found and IMO you are better off using filter manager’s APIs all the time, because the name cache in filter manager is nothing but a stream handle context for the name, with a lot of optimizations.
So now we’ve come full circle :). FLT_FILE_NAME_QUERY_ALWAYS_ALLOW_CACHE_LOOKUP means ALWAYS look in the cache first, and if the name isn’t there then try to get it from the file system, which is exactly what you would end up implementing if you tried to have your own caching scheme.

Curiously, in one of his comments he links to his own blog post, where he says the exact opposite:

http://fsfilters.blogspot.com/2010/02/names-and-file-systems-filters.html

It is a bad idea to check if the file is interesting by querying and parsing the file name every single time the filter needs to know this. A better design is to cache the information about the file somewhere and then update it only when it changes. Since we are talking about name based policy here, the only place where it can change is in the rename path. Stream contexts are particularly suited for this task and what filters normally do is attach a stream context if the file is interesting. Then, when they need to decide whether the file is interesting or not they can simply get the stream context and if one is present then it is interesting.
The stream context is initialized at create time and is potentially changed at rename. Both these operations happen at PASSIVE_LEVEL. Some filters prefer to query the name when the operation they care about happens, but this approach usually generates more problems than it solves.

bkmz · March 13, 2019, 1:05pm

Wait, there’s another problem. Actually, the same problem.

Regardless of whether I use a stream handle context to cache names, the original problem still persists. At least once, I still need to get a normalized name outside of Create, when I didn’t see the Create and FGFNI fails. How do I do it? I thought about posting a work item or using FltDoCompletionProcessingWhenSafe, but they have the same issues:

- FltQueueDeferredIoWorkItem - can return STATUS_FLT_NOT_SAFE_TO_POST_OPERATION.
- FltQueueGenericWorkItem - doesn’t perform the safety checks, but another comment says “even if you use FltQueueGenericWorkItem for pending IO you should implement same parameter checking as FltQueueDeferredIoWorkItem perform.”
- Return FLT_PREOP_SYNCHRONIZE - “Minifilter drivers must never return FLT_PREOP_SYNCHRONIZE for asynchronous read or write operations. Doing so can … cause deadlocks”
- FltDoCompletionProcessingWhenSafe - only checks for IRQL <= APC_LEVEL and doesn’t check TopLevelIrp, so it’s possible that calling FGFNI in it can still fail.

What do I do?

bkmz · March 13, 2019, 1:57pm

Wait, there’s another problem. Actually, the same problem.

Regardless of whether I use a stream handle context for caching names, the same problem persists. At least once, I still need to get a normalized name outside of Create, when I didn’t see the Create and FGFNI fails. I thought about using work items or FltDoCompletionProcessingWhenSafe, but they have the same issues:

- FltQueueDeferredIoWorkItem - can return STATUS_FLT_NOT_SAFE_TO_POST_OPERATION, just like FGFNI when it’s not safe.
- FltQueueGenericWorkItem - doesn’t perform the safety checks, but “even if you use FltQueueGenericWorkItem for pending IO you should implement same parameter checking as FltQueueDeferredIoWorkItem perform”.
- FLT_PREOP_SYNCHRONIZE - “Minifilter drivers must never return FLT_PREOP_SYNCHRONIZE for asynchronous read or write operations. Doing so can … cause deadlocks”.
- FltDoCompletionProcessingWhenSafe - only checks for IRQL, but doesn’t check TopLevelIrp, so calling FGFNI in it probably can still fail.

What do I do?

bkmz · March 13, 2019, 2:23pm

Maybe FLT_PREOP_SYNCHRONIZE is the way? I’m only worried about possible deadlocks in case of asynchronous writes.
Maybe it’s not an issue since I don’t filter paging operations? MSDN says “FLT_PREOP_SYNCHRONIZE … can cause deadlocks if, for example, the modified page writer thread is blocked”.
Can the modified page writer thread be blocked during non-paging IO?

Dejan_Maksimovic · March 13, 2019, 2:37pm

It won’t fail in pre- or post-create if the name can be retrieved at all.
So it does remove the need to post the call.

Invalidation can occur in post-create and rename, IIRC?
In which case you just replace your context.

If FltMgr were sure to keep a cache entry, you could query it all the time,
by specifying ONLY ALLOW CACHE LOOKUP during non-Create events.


bkmz · March 13, 2019, 2:43pm

@Dejan_Maksimovic said:
> It won’t fail in pre- or post-create if the name can be retrieved at all.
> So it does remove the need to post the call.

Yes, but I’m still trying to figure out what to do if I need to get the name outside of CREATE. As I said before, some files are opened before my driver starts.

bkmz · March 13, 2019, 5:24pm

@Dejan_Maksimovic said:
> If FltMgr were sure to keep a cache entry, you could query it all the time,
> by specifying ONLY ALLOW CACHE LOOKUP during non-Create events.

My idea was that when FGFNI fails outside of Create, I just delay the IO and call FGFNI later when it’s safe, so that the name cache gets updated and FGFNI won’t fail next time.

And I will have to implement something like that anyway, even if I use a stream context (because, as I said, some creates happen before my driver starts).

bkmz · March 13, 2019, 6:37pm

Here’s the problem as I see it: FltGetFileNameInformation() can fail if IoGetTopLevelIrp() != 0 (among other reasons). In that case, I can’t use work items because FltQueueDeferredIoWorkItem() will also fail, and FltQueueGenericWorkItem() is not a good idea for the same reason. FltDoCompletionProcessingWhenSafe() is useless, because it doesn’t check for IoGetTopLevelIrp().

MSDN says: “If your minifilter driver cannot handle such failures, you should consider using FLT_PREOP_SYNCHRONIZE instead of pending the I/O operation”.

So I’m left with FLT_PREOP_SYNCHRONIZE, but the question is will it work? Synchronized postop is executed at IRQL <= APC_LEVEL, but MSDN says nothing about IoGetTopLevelIrp().

I found an MS presentation that says “Calling FltGetFileNameInformation will work if the operation is synchronized”, which is good, but it’s not clear if synchronization helps when IoGetTopLevelIrp() != 0.

Another clue from Alex Carp’s blog:

http://fsfilters.blogspot.com/2010/12/more-thoughts-on-fltdocompletionprocess.html

To avoid deadlocks, minifilters should not perform synchronous requests from a postOp callback and should instead either:
- queue the operation and return FLT_POSTOP_MORE_PROCESSING_REQUIRED from the postOp callback, or
- return FLT_PREOP_SYNCHRONIZE from the preOp

So I hope FLT_PREOP_SYNCHRONIZE is the way to get FltGetFileNameInformation() to succeed, but I’m not really sure. Any thoughts?

Dejan_Maksimovic · March 14, 2019, 4:14am

You must not synchronize READ/WRITE requests.

Try a test version which DbgPrints exact locations, cases and error
codes returned when FGFNI fails.

I doubt FltDoCompletionProcessingWhenSafe would not post to a queue
where FGFNI call is safe in cases where the call itself fails.

Mind you, I have seen FGFNI fail often enough, that there is no good
answer to the problem, other than being a boot driver. In which case,
you can propagate the failures (if you keep the name in your context!)

bkmz · March 14, 2019, 5:32am

@Dejan_Maksimovic said:
> You must not synchronize READ/WRITE requests.

Can’t I synchronize just one request? The name will get cached and I won’t need to synchronize next time. And I’m not going to synchronize paging IO.

> Try a test version which DbgPrints exact locations, cases and error
> codes returned when FGFNI fails.

That’s a good idea and I will try it, but it’s not that easy because the errors occur on numerous clients’ machines that I don’t always have access to.

> I doubt FltDoCompletionProcessingWhenSafe would not post to a queue
> where FGFNI call is safe in cases where the call itself fails.

Why would it fail if it’s safe?

rod_widdowson · March 14, 2019, 9:44am

> That’s a good idea and I will try it, but it’s not that easy because the errors occur on numerous clients’ machines that I don’t always have access to.

WPP might help you there.

Alex_Carp · March 15, 2019, 3:28pm

First let me address a point of confusion. When I said **It is a bad idea to check if the file is interesting by querying and parsing the file name every single time**, what I meant was that if you have the kind of driver that does something path based (say, tracks the number of times files that are in the subdirectory of …/blah/… are accessed) then instead of calling FGFNI every time and then running a regular expression that checks that the path matches `.*/blah/.*`, one should do this ONCE and then remember whether the file was IN or OUT (and update when the path can change). Read that paragraph again, you’ll see that it discusses using a StreamContext to see if you should track the file or not (and not even anything IN the StreamContext, just the mere presence of the StreamContext).
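
The "decide once, then only check for presence" design described above can be modelled outside a driver. What follows is a minimal user-mode sketch in C, not minifilter code: `stream_state`, `on_create`, `on_rename`, `on_access` and the `\blah\` component are all hypothetical names chosen for illustration. In a real minifilter the `tracked` flag would correspond to whether a stream context is attached at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy: a file is "interesting" if its path contains this
 * directory component. The component name is made up for illustration. */
#define TRACKED_COMPONENT "\\blah\\"

/* User-mode stand-in for per-stream state. In a minifilter, `tracked`
 * would be the presence or absence of a stream context. */
typedef struct {
    bool tracked;
    unsigned long access_count;
} stream_state;

/* The only place where the path string is parsed. */
static bool path_is_interesting(const char *path) {
    assert(path != NULL);
    return strstr(path, TRACKED_COMPONENT) != NULL;
}

/* Create time: classify once and remember the verdict. */
static void on_create(stream_state *s, const char *path) {
    s->tracked = path_is_interesting(path);
    s->access_count = 0;
}

/* Rename time: the path may have changed, so re-evaluate the verdict.
 * Renames of parent directories are not modelled here. */
static void on_rename(stream_state *s, const char *new_path) {
    s->tracked = path_is_interesting(new_path);
}

/* Hot path (read/write): no name query and no parsing, just a flag check. */
static void on_access(stream_state *s) {
    if (s->tracked) {
        s->access_count++;
    }
}
```

The point of the split is that `path_is_interesting` runs only from the create and rename handlers, which in a driver are the paths where querying the name is safe, while the frequent operations never touch the name at all.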

As to whether you should cache the name yourself or just call FGFNI, I’ve already stated my preference for just calling FGFNI and using its built-in cache. The cache is already designed to only be invalidated when necessary and it is, in my view, difficult to do better.

Finally, synchronizing the kinds of operations where FGFNI will fail because it’s not safe to query the name is probably a bad idea.

I think it’s helpful to think about FGFNI as something that’s trying to help you make the right decisions and not something that randomly fails. So instead of trying to “just get the name on every single operation” realize that it’s not always possible to do so in a safe way and design your driver to handle that case. For example, in such cases, do something else like take a reference on the FileObject and query the name asynchronously later or something. It’s hard to come up with a generic approach because it depends on the type of product the driver is for.

I’d like to use an analogy to explain the point here: If when driving you need to pass but you’re in a NO PASSING zone, you can still pass, but it’s not safe to do so and if you ALWAYS do it, sooner or later you’ll be in trouble. You can certainly complain that you don’t think it should be a NO PASSING zone and that the people that designed the road system were not helpful but the reality is that if you choose to do it, you’ll run into something sooner or later. And in the world of filesystems, you’re designing an algorithm that does the driving and customer machines have different sets of drivers of different versions and so if it works on your test track it doesn’t mean it will work safely on every single road out there.

My advice here is to simply try to find a safe work-around for when FGFNI fails.

Best,
Alex

bkmz · March 19, 2019, 9:33am

Alex, thank you for the detailed response and clarification.

I realized I can still call FltGetFileNameInformation every time, but in addition to that, also call it in post-create to ensure the name is cached from the start, so that the filter manager doesn’t have to query the filesystem when it’s not safe.

And those rare cases when I don’t see Create can be handled asynchronously, as you suggest.

bkmz · March 19, 2019, 9:46am

A few days ago, while testing my driver, I saw FGFNI (normalized + always allow cache) return STATUS_ACCESS_DENIED in pre-Set FileBasicInformation (issued by the System process against many DLLs). I didn’t call FGFNI in postCreate at that time. The error was consistent and occurred regularly, but it disappeared after a reboot (because of a Windows update, maybe?), and I can’t reproduce it anymore. It was Windows 10.

It seemed like the file wasn’t opened with enough rights to query the name, but I’m not 100% sure. I’ve tried to test such a case manually without success.

Now I’m wondering if FGFNI can fail for reasons besides TopLevelIrp or IRQL, and if it can, can it also fail in postCreate (assuming the create has succeeded)? Or maybe it’s not supposed to and it was just a bug?

Dejan_Maksimovic · March 19, 2019, 1:39pm

As a rule, FGFNI can only fail in post create if the caller does not
have Traverse rights. But I think that does not happen on W8/W10, as
the FltMgr can query the FS for a file name then.
If the create was not successful though, it can fail due to traverse rights.

You will often see FGFNI fail in SetFileInfo for a lot of files, due
to either top level IRP being nonNULL or it being a paging I/O call
(which should set the TLI field).

You CANNOT post a query for the latter case.
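
Pulling together the advice given so far in this thread, the choice of what to do on a given operation can be written down as a small decision function. This is a user-mode C sketch of that reasoning only, not driver code and not an authoritative rule: `op_facts`, `name_plan` and `plan_name_query` are hypothetical names, and the boolean fields stand in for checks a driver would make itself (for example `IoGetTopLevelIrp() != NULL`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What to do about the file name on this operation. */
typedef enum {
    NAME_QUERY_NOW,    /* query the name inline */
    NAME_QUERY_LATER,  /* keep a reference to the file, query asynchronously */
    NAME_QUERY_SKIP    /* neither query nor post; use what was recorded before */
} name_plan;

/* Facts about the current operation, as a driver would observe them. */
typedef struct {
    bool is_create;          /* pre- or post-create */
    bool is_paging_io;       /* paging read/write or similar */
    bool top_level_irp_set;  /* models IoGetTopLevelIrp() != NULL */
    bool irql_above_apc;     /* models IRQL > APC_LEVEL */
} op_facts;

static name_plan plan_name_query(const op_facts *op) {
    assert(op != NULL);
    if (op->is_create) {
        /* Per the thread: the query works here if the name is retrievable. */
        return NAME_QUERY_NOW;
    }
    if (op->is_paging_io) {
        /* Per the thread: a query must not be posted for paging I/O. */
        return NAME_QUERY_SKIP;
    }
    if (op->top_level_irp_set || op->irql_above_apc) {
        /* Not safe inline: defer the query rather than force it. */
        return NAME_QUERY_LATER;
    }
    return NAME_QUERY_NOW;
}
```

Under these assumptions a create always queries inline (which also primes the name cache), a paging operation is never queried or posted, and a non-paging operation with a top-level IRP set is deferred.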


rod_widdowson · March 19, 2019, 1:49pm

Remember too that FltGetFile… can require a transit through every lower filter (on the stack, not via a callback). This is via the name provider API.

Each name provider is at liberty to fail the request in any way it feels is suitable.

Dejan_Maksimovic · March 19, 2019, 1:59pm

Coop is a beast in its own right. You will need lorazepam when you
first start testing with antivirus filters.
But other name providers are fairly rare. Antiviruses are present everywhere.


Scott_Noone_OSR · March 19, 2019, 3:28pm

I’ve seen FGFNI for normalized names fail on system files (e.g. $Extend\Bloop\Blortz) because the file system won’t let FltMgr open some part of the path. Our fallback is usually to query for the opened name.

That’s usually in the paging read/write path, though, and not in the SetBasicInformation path. I’d be interested to see the flags and top level IRP values when you get this error.
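
The normalized-then-opened fallback mentioned above can be sketched as a small ordered lookup. This is a user-mode C model, not minifilter code: `fake_file`, `query_name` and `get_best_name` are hypothetical, and `query_name` merely stands in for a name query that can fail. In a driver the two attempts would presumably be FltGetFileNameInformation calls with FLT_FILE_NAME_NORMALIZED and then FLT_FILE_NAME_OPENED.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NAME_NORMALIZED, NAME_OPENED, NAME_NONE } name_kind;

/* Hypothetical stand-in for a file: a NULL member means that the
 * corresponding name query fails for this file. */
typedef struct {
    const char *normalized;
    const char *opened;
} fake_file;

/* Which kind of name was obtained, and the name itself. */
typedef struct {
    name_kind kind;
    const char *name;
} best_name;

/* Stand-in for a single name query; returns NULL on failure. */
static const char *query_name(const fake_file *f, name_kind kind) {
    assert(f != NULL);
    switch (kind) {
    case NAME_NORMALIZED: return f->normalized;
    case NAME_OPENED:     return f->opened;
    default:              return NULL;
    }
}

/* Try the preferred format first, then fall back. The result records
 * which format was obtained so callers know how far to trust it. */
static best_name get_best_name(const fake_file *f) {
    static const name_kind order[] = { NAME_NORMALIZED, NAME_OPENED };
    best_name result = { NAME_NONE, NULL };
    for (size_t i = 0; i < sizeof order / sizeof order[0]; i++) {
        const char *name = query_name(f, order[i]);
        if (name != NULL) {
            result.kind = order[i];
            result.name = name;
            break;
        }
    }
    return result;
}
```

Returning the kind alongside the name matters because an opened name is not equivalent to a normalized one, so any path comparison downstream has to account for which of the two it was given.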