Handling driver unload in an FS mini-filter that manages shadow file objects

OSR_Community_User · September 28, 2010, 4:49pm

I have an FS mini-filter that, in some cases, takes ownership of file objects that NTFS would normally handle (i.e., a shadow file object architecture).

One thing we’ve had trouble with is figuring out how best to handle the case where our mini-filter is unloaded while under load and there is I/O pending for file objects that we “own”.

We keep track of the file objects we’ve taken over, and we’ve tried a few things to clean up gracefully, but we still sometimes see problems: one of our stray file objects is left floating around the system after we’ve unloaded, then gets passed down to NTFS, which gets all bent out of shape at seeing the “foreign” file object, and we bluescreen.

What is the correct procedure for a mini-filter, during unload, to force cancellation of pending I/O on file objects we’ve taken ownership of? I’ve been looking through the driver docs for an appropriate API to call, but I can’t find one, or anything that covers this scenario.

Any suggestions and guidance are greatly appreciated. Thanks!

Alex_Carp · September 28, 2010, 5:14pm

Well, as far as I know FltMgr will prevent unloading of any minifilter while
there are still FILE_OBJECTs that it created. So in an SFO scheme you don’t
really need to do anything. FltMgr should wait for all the FILE_OBJECTs you
created to go away, which should happen only when all the FILE_OBJECTs you
own (that you created SFOs for) go away… At least, that’s the theory. Are
you seeing something else?

Are you closing all your SFOs without waiting for the FO to go away? That
would be a problem.

Thanks,
Alex.

OSR_Community_User · September 28, 2010, 5:48pm

Thanks, Alex. I guess what we’re doing isn’t quite a typical SFO scheme; maybe we are “borrowing” FOs instead of creating them. What we do is this:

During PostCreate, in certain cases where NTFS failed the create (usually with STATUS_OBJECT_NAME_NOT_FOUND or similar), our mini-filter takes the failed create, turns it into STATUS_SUCCESS and sets up all the fields in the original FO. We also attach a context to the FO so we can identify these “borrowed” FOs later on.
From then on, we intercept any I/O (or any other operation) on an FO we took over and complete it ourselves, making sure the FO never accidentally gets handed down to NTFS.
We normally clean up everything associated with the borrowed FO in PreClose.
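To make the lifecycle above concrete, here is a minimal user-mode model of it, a sketch rather than kernel code. FileObj, BorrowCtx, the MODEL_* statuses and the model_* functions are all hypothetical stand-ins; in a real minifilter the per-FO tag would plausibly be a stream handle context (FltSetStreamHandleContext), and the routing decision would be made in every preOp. The model shows why the scheme is fragile: correctness depends on model_pre_io running for every operation on a borrowed FO, so anything that stops the filter seeing those operations lets the FO reach the file system below.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-mode model; none of these names are FltMgr APIs. */
typedef enum { MODEL_SUCCESS, MODEL_NAME_NOT_FOUND } ModelStatus;
typedef enum { ROUTE_PASS_DOWN, ROUTE_COMPLETE_HERE } Route;

typedef struct { int dummy; } BorrowCtx;     /* our per-FO state */
typedef struct { BorrowCtx *ctx; } FileObj;  /* stands in for FILE_OBJECT + its context */

static unsigned borrowed_count = 0;          /* FOs we currently own */

/* PostCreate: the lower FS failed the open; take the FO over. */
static ModelStatus model_post_create(FileObj *fo, ModelStatus lower_status)
{
    if (lower_status != MODEL_NAME_NOT_FOUND)
        return lower_status;                 /* not a case we take over */
    BorrowCtx *ctx = calloc(1, sizeof *ctx);
    if (ctx == NULL)
        return lower_status;                 /* cannot track it, so do not take it */
    fo->ctx = ctx;                           /* tag the FO as borrowed */
    borrowed_count++;
    return MODEL_SUCCESS;                    /* failure turned into success */
}

/* Any later preOp: a borrowed FO must never reach the lower FS. */
static Route model_pre_io(const FileObj *fo)
{
    return fo->ctx != NULL ? ROUTE_COMPLETE_HERE : ROUTE_PASS_DOWN;
}

/* PreClose: release everything tied to a borrowed FO. */
static void model_pre_close(FileObj *fo)
{
    if (fo->ctx == NULL)
        return;                              /* not one of ours */
    free(fo->ctx);
    fo->ctx = NULL;
    assert(borrowed_count > 0);
    borrowed_count--;
}
```

borrowed_count is the number the unload path would have to wait on; it only drops in PreClose, so it stays non-zero for as long as any client keeps a borrowed FO open.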

Would this explain why FltMgr isn’t cleaning up for us when we unload?

On a side note: even when we aren’t doing any of this “borrowing” and are just using STATUS_REPARSE to redirect I/O, we see a problem with remote clients over SMB2. When we are unloaded while files we have redirected are still open, we do NOT see those files get closed, or any FLTFL_POST_OPERATION_DRAINING activity, as we always do with SMB1 clients. This frequently makes our unload appear stuck or hung, leaving us in a kind of limbo. I’m not sure whether anyone has seen this before. With Driver Verifier turned on, we hit a break in FltMgr complaining about the open FOs.

Alex_Carp · September 28, 2010, 5:59pm

I see… So in this case FltMgr can’t do anything about it. It becomes your
filter’s responsibility to either not allow unloads or enter a state where
it’s just waiting for the number of files it owns to go to 0, while not
accepting new requests.
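The “wait for the owned count to reach 0 while not accepting new requests” state can be sketched as a small gate. This is a hypothetical user-mode model (OwnedFileGate and the gate_* functions are illustrative names, not FltMgr APIs). It packs a “draining” flag into bit 0 and the owned-file count into the remaining bits, so that checking the flag and taking a new reference happen in one compare-and-swap; checking the flag and incrementing a separate counter would let a takeover slip in after draining has begun.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model, not a FltMgr API. Bit 0 = "draining" flag;
 * the remaining bits hold (owned file count * 2). */
#define GATE_DRAINING 1u
#define GATE_ONE      2u

typedef struct {
    atomic_uint state;
} OwnedFileGate;

static void gate_init(OwnedFileGate *g) { atomic_init(&g->state, 0u); }

/* Call before taking over a FILE_OBJECT; fails once draining has begun. */
static bool gate_try_acquire(OwnedFileGate *g)
{
    unsigned s = atomic_load(&g->state);
    for (;;) {
        if (s & GATE_DRAINING)
            return false;                   /* unload in progress: do not take over */
        if (atomic_compare_exchange_weak(&g->state, &s, s + GATE_ONE))
            return true;                    /* on failure, s holds the fresh value */
    }
}

/* Call when an owned FILE_OBJECT is finally closed (e.g. in PreClose). */
static void gate_release(OwnedFileGate *g)
{
    unsigned prev = atomic_fetch_sub(&g->state, GATE_ONE);
    assert(prev >= GATE_ONE);               /* catch a release with no matching acquire */
}

/* Call from the unload path: stop accepting new takeovers. */
static void gate_begin_drain(OwnedFileGate *g)
{
    atomic_fetch_or(&g->state, GATE_DRAINING);
}

/* True once draining has started and no owned FILE_OBJECTs remain. */
static bool gate_drained(OwnedFileGate *g)
{
    return atomic_load(&g->state) == GATE_DRAINING;
}
```

In kernel mode, run-down protection (EX_RUNDOWN_REF with ExAcquireRundownProtection, ExReleaseRundownProtection and ExWaitForRundownProtectionRelease) has much the same shape and would be a natural thing to evaluate instead of hand-rolled atomics. Either way, if a client holds an owned file open indefinitely, the wait never finishes, which is why refusing the unload is the other option.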

STATUS_REPARSE should not matter at all for unload.

FLTFL_POST_OPERATION_DRAINING is a way for Filter Manager to notify a filter
about all the operations that it has not yet seen a postOp for (so they are
stuck someplace below your filter), so that your filter can clean up any
resources it allocated for those operations. This has nothing to do with
reparses either; it is a function of the number of operations for which
you’ve seen a preOp and requested a postOp but haven’t yet seen the postOp.
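A minimal user-mode model of that bookkeeping, assuming the filter allocates a per-operation context in its preOp: the real postOp callback receives FLT_POST_OPERATION_FLAGS and would test FLTFL_POST_OPERATION_DRAINING, which MODEL_POST_OPERATION_DRAINING stands in for here. OpContext, the counters and the model_* functions are hypothetical. The point it illustrates is that draining only concerns operations with an outstanding postOp; a FILE_OBJECT the filter owns but has no operation in flight on is never visited by it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-mode model of preOp/postOp pairing. */
#define MODEL_POST_OPERATION_DRAINING 0x1u  /* stands in for FLTFL_POST_OPERATION_DRAINING */

typedef struct {
    int op_id;                     /* per-operation state allocated in the preOp */
} OpContext;

static int outstanding = 0;        /* preOps that requested a postOp not yet seen */
static int completed_normally = 0; /* postOps that did full post-processing */

/* preOp: allocate a completion context and request a postOp. */
static OpContext *model_pre_op(int op_id)
{
    OpContext *ctx = malloc(sizeof *ctx);
    if (ctx == NULL)
        return NULL;               /* a real filter might return FLT_PREOP_SUCCESS_NO_CALLBACK */
    ctx->op_id = op_id;
    outstanding++;
    return ctx;
}

/* postOp: always free the context; do real work only when not draining. */
static void model_post_op(OpContext *ctx, unsigned flags)
{
    assert(outstanding > 0);
    if (!(flags & MODEL_POST_OPERATION_DRAINING)) {
        completed_normally++;      /* normal post-processing would go here */
    }
    /* When draining, the I/O is still below us: touch only our own resources. */
    free(ctx);
    outstanding--;
}
```

In the real callback the draining branch would typically free the completion context and return FLT_POSTOP_FINISHED_PROCESSING without inspecting the operation's results.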

Can you please copy & paste that verifier break ?

Thanks,
Alex.

OSR_Community_User · September 28, 2010, 6:13pm

Thanks, Alex. What you’ve described is exactly what we currently do, i.e. wait for the pending-file count to go to 0 and reject new requests. Unfortunately, under load, and particularly with SMB2 (even without much load), we often see open files (more commonly folders) associated with a remote Windows 7 client kept open seemingly indefinitely, so our counter never goes to 0. It may just be that with this type of scheme we need to make the filter refuse unloads.

I’ll post the verifier break as well as details of what happens leading up to it later today or tomorrow.

Thanks again,
Tom

rod_widdowson · September 29, 2010, 3:48am

> Well, as far as I know FltMgr will prevent unloading of any minifilter
> while there still are FILE_OBJECTs that it created.

This is true, but it’s not the problem (which really has to do with detach
rather than unload). The issue is that once InstanceTeardownStartCallback
has been called, FltMgr will carefully not call your “swap the file objects”
function, so any unclosed file objects are time bombs in waiting.

You can arrange to stall in InstanceQueryTeardownCallback, but that’s not
always called, so your safest bet is to make the filter non-unloadable.

A riskier option (because things may change from release to release) is to
seek out all the places where the query callback won’t be called, make sure
you get in the way there, and call FltDetachVolume (which will then call
InstanceQueryTeardownCallback) first. Dismount and unload are the major
cases; I’d probably also plug PnP while I was at it. I’d then add a
bugcheck in InstanceQueryTeardownCallback if my count was non-zero: you
*know* that you are going to get a crash, so you might as well learn
something from it.
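The decision described above can be sketched as a pure function, a hypothetical user-mode model rather than a real callback. TeardownDecision and query_teardown_decision are illustrative names. In a real InstanceQueryTeardownCallback, the “refuse” outcome would correspond to returning STATUS_FLT_DO_NOT_DETACH, and the “fatal” outcome to a deliberate KeBugCheckEx carrying diagnostic data. The detach_is_forced parameter is an assumption standing for “we issued this detach ourselves ahead of a dismount, unload or PnP removal that will go ahead regardless”.

```c
#include <stdbool.h>

/* Hypothetical decision helper; names are illustrative, not FltMgr APIs. */
typedef enum {
    TEARDOWN_ALLOW,   /* no owned FILE_OBJECTs: let the detach proceed */
    TEARDOWN_REFUSE,  /* kernel analogue: return STATUS_FLT_DO_NOT_DETACH */
    TEARDOWN_FATAL    /* kernel analogue: deliberate KeBugCheckEx with diagnostics */
} TeardownDecision;

/* owned_count: FILE_OBJECTs this filter has taken over and not yet closed.
 * detach_is_forced: true when the detach precedes a teardown that will
 * happen even if we refuse, so a refusal would only postpone the crash. */
static TeardownDecision query_teardown_decision(unsigned owned_count,
                                                bool detach_is_forced)
{
    if (owned_count == 0)
        return TEARDOWN_ALLOW;
    return detach_is_forced ? TEARDOWN_FATAL : TEARDOWN_REFUSE;
}
```

Crashing at this point, rather than later in NTFS, puts the bugcheck next to the state that explains it (the non-zero count and the list of owned FOs).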

Rod

–
Rod Widdowson
Consulting Partner
Steading System Software LLP
+44 1368 850217 +1 508 915 4790
www.steadingsoftware.com

OSR_Community_User · September 29, 2010, 5:48pm

Thanks for the suggestions, Rod.

Alex, here is the verifier break information. What would help me greatly is if the !fltkd.filter FFFFFA800AB9B710 8 1 command actually worked and gave me the list of leaked references. I always get the “TreeLink” error message no matter what version of anything I’m running. Is there any way to get this to work? Or some other way to figure out which file object is the leaked one? If I knew that, I could get a lot more mileage out of my driver debug traces.

FILTER VERIFIER ERROR: A filter (Filter = FFFFFA800AB9B710 (DSWFlt)) leaked references to the following resources:
00000000 FLT_CONTEXT structures
00000000 FLT_CALLBACK_DATA structures
00000000 FLT_DEFERRED_IO_WORKITEM structures
00000000 FLT_GENERIC_WORKITEM structures
00000000 FLT_FILE_NAME_INFORMATION structures
00000001 FILE_OBJECT structures
00000000 FLT_OBJECT structures
Type “!fltkd.filter FFFFFA800AB9B710 8 1” in the debugger for a list of leaked references

2: kd> !fltkd.filter FFFFFA800AB9B710 8 1

FLT_FILTER: fffffa800ab9b710 “DSWFlt” “385100”
InstanceList : (fffffa800ab9b768)
Resource (fffffa800ab9b7d0) List [fffffa800ab9b7d0-fffffa800ab9b7d0] rCount=0
Object usage/reference information:
References to FLT_CONTEXT : 0
Allocations of FLT_CALLBACK_DATA : 0
Allocations of FLT_DEFERRED_IO_WORKITEM : 0
Allocations of FLT_GENERIC_WORKITEM : 0
References to FLT_FILE_NAME_INFORMATION : 0
Open files : 1
References to FLT_OBJECT : 0
List of objects used/referenced::
Could not read offset of field “TreeLink” from type fltmgr!_FLT_VERIFIER_OBJECT