I wrote a small minifilter that changes the TargetInstance of \Device\HarddiskVolume2 to \Device\HarddiskVolume1. It works well, but it fails to unload.
When I remove the code that changes the target instance, the unload succeeds.
I spent some time debugging this and I think it’s a bug in fltmgr.sys, but I hope someone here can shed some light on it.
I found these threads (but there are no answers there):
http://www.winvistatips.com/minifilter-unload-hang-if-targetinstance-has-been-changed-t193891.html
http://www.osronline.com/showThread.CFM?link=106792
When I look at the instances that refuse to go down, I see that their OperationRundownRef is not 0 (which means there are operations pending, right?).
Anyway, I debugged it and it seems that fltmgr!FltpPerformPreCallbacks calls fltmgr!FltpExAquireRundownProtectionCacheAwareEx and raises OperationRundownRef by 4 before the PreOperation. In a normal operation it seems to call FltpExReleaseRundownProtectionCacheAwareEx twice, so that OperationRundownRef returns to 0 (if there’s only one operation).
But the bug here is that if I change the target instance in the PreOperation, it decreases the ref-count of the old instance once and of the new instance once.
So before a PreOperation the RefCount is 4, and when the FltpPerformPreCallbacks loop ends the first instance’s OperationRefCount is 2 and the new instance’s OperationRefCount is -2 (negative = fffffffe).
So when unloading, the ref-count is usually not 0.
Anyway, I think there are 2 bugs here. When changing the target instance, the new instance’s ref-count should go up, and I didn’t see that anywhere in the assembly.
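The arithmetic described above can be modeled in user mode. This is only a sketch of the counts seen in the debugger, not FltMgr's actual code: the names instance_model, acquire_rundown, release_rundown and simulate_operation are made up for illustration, and the step sizes (+4 on acquire, -2 per release) are simply the values observed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-instance OperationRundownRef counter. */
typedef struct {
    int32_t operation_rundown_ref;
} instance_model;

/* Step sizes as observed in the debugger (assumed, not documented). */
enum { ACQUIRE_STEP = 4, RELEASE_STEP = 2 };

static void acquire_rundown(instance_model *inst)
{
    inst->operation_rundown_ref += ACQUIRE_STEP;
}

static void release_rundown(instance_model *inst)
{
    inst->operation_rundown_ref -= RELEASE_STEP;
}

/*
 * Model one operation going through the pre-callback path.
 * The reference is always taken on the original instance. If the
 * PreOperation swapped the target (redirected != 0), one of the two
 * releases lands on the new instance instead of the original one.
 */
static void simulate_operation(instance_model *original,
                               instance_model *new_target,
                               int redirected)
{
    assert(original != NULL && new_target != NULL);

    acquire_rundown(original);
    release_rundown(original);
    release_rundown(redirected ? new_target : original);
}
```

Starting from zero, one redirected operation leaves the original instance at 2 and the new instance at -2, which reads as fffffffe when viewed as an unsigned 32-bit value. A non-redirected operation leaves both at 0.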
Tested on Windows XP SP3. I haven’t tested it on other machines yet.
Experts, here are my questions:
* Am I right?
* If I’m right, is it still safe to change the CBD target instance if I don’t support minifilter unload?
Thanks!
Hi Danny,
It certainly seems likely that there is a bug in FltMgr in this path. The
way this is supposed to work is by simply changing the TargetInstance to the
new instance and then calling FltSetCallbackDataDirty().
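A minimal sketch of that sequence follows. The helper name RedirectToInstance and the BUILDING_WITH_WDK macro are hypothetical; the macro is assumed to be defined by whoever builds this as a real driver. Without it, the block uses small stand-in types (assumptions, not the real FltMgr definitions) so the logic can be compiled and checked outside the WDK.

```c
#include <stddef.h>

#ifdef BUILDING_WITH_WDK            /* hypothetical macro, set for a real driver build */
#include <fltKernel.h>
#else
#include <assert.h>                 /* only for user-mode checks of this sketch */
/* Stand-ins for illustration only; NOT the real FltMgr definitions. */
typedef struct FLT_INSTANCE_STANDIN *PFLT_INSTANCE;
typedef struct {
    PFLT_INSTANCE TargetInstance;
} FLT_IO_PARAMETER_BLOCK, *PFLT_IO_PARAMETER_BLOCK;
typedef struct {
    PFLT_IO_PARAMETER_BLOCK Iopb;
    int Dirty;                      /* stand-in for the "dirty" state */
} FLT_CALLBACK_DATA, *PFLT_CALLBACK_DATA;
static void FltSetCallbackDataDirty(PFLT_CALLBACK_DATA Data) { Data->Dirty = 1; }
#endif

/*
 * Hypothetical helper meant to be called from a pre-operation callback.
 * Points the operation at NewTarget and marks the callback data dirty so
 * that FltMgr picks up the modified parameter block.
 * Returns 1 if the target was changed, 0 if nothing was done.
 */
static int RedirectToInstance(PFLT_CALLBACK_DATA Data, PFLT_INSTANCE NewTarget)
{
    if (NewTarget == NULL || Data->Iopb->TargetInstance == NewTarget) {
        return 0;                   /* nothing to redirect */
    }

    Data->Iopb->TargetInstance = NewTarget;
    FltSetCallbackDataDirty(Data);
    return 1;
}
```

How the driver obtains and keeps a valid PFLT_INSTANCE for the other volume is left out here. The sketch also does nothing about the rundown reference imbalance reported in this thread.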
Please note that as indicated in the OSR thread from your post, there is
another pretty serious issue with using this approach for redirection, which
is that the IRP stack might be too small for the other volume you are
redirecting to and you might run out of stack locations. There is almost
nothing you can do to protect yourself against that and FltMgr is supposed
to transparently make it work (though it still doesn’t even in Win7).
Now, FltMgr implementation bugs aside, one potential problem this
architecture has is that depending on your altitude, this redirection might
be problematic because you are injecting data in the middle of the IO stack.
For example, if you redirect some IO from C: to D: and there is an
encryption filter on D: (but not on C:) that encrypts every file on that
volume and if your altitude is below that encryption filter then it is
possible that you will end up writing unencrypted data to disk. Then if the
data is read from the top of the IO stack on D: (and thus going through the
encryption filter) then the file or just the data inside might appear to be
corrupted to the encryption filter and who knows what will happen.
For IRP_MJ_CREATE the more supported/tested approach is to reparse to the
new volume. For other operations (if you want to just redirect some IO to
the different volume) then one option is to issue your own operation at the
top of the IO stack on that volume (ZwCreateFile on that volume and then
ZwWriteFile or whatever operation you want to perform). Nevertheless, if you
simply must inject the operation in the middle of the other IO stack just
like changing the TargetInstance would, then you achieve the same result
(less the dereferencing and IRP stack size issues) by calling FltCreateFile
with your instance on the target volume and then use that handle to perform
the operation (either ZwWriteFile or FltWriteFile would work the same way in
this case).
Does this make sense ? Does it answer both your questions ?
Thanks,
Alex.
Thanks Ales, I think it does.
What you describe in the last paragraph requires me to translate every IRP to an appropriate FltXXX call (if I want to inject operations in the middle of the stack), which seems non-trivial.
If I overcome the stack size issue (if I know it’s large enough), if my altitude is high enough on both volumes (above everything that “changes” data, such as encryption/compression), and if I do not support unloading, is it safe to use this?
If I understand correctly, my OperationRefCount on both instances will be wrong (and will move up and down constantly). But does a buggy OperationRefCount field affect anything else besides the MinifilterUnload routines (which I won’t have)?
*I meant Alex of course:)
Well, if you really need to redirect every IRP to a different volume then
perhaps you need a different approach ? Could you explain what you are
trying to achieve and why STATUS_REPARSE won’t work for you?
Those were the limitations I was aware of, but there may be more. I haven’t
actually tried to make such a solution work so if you overcome these
problems you might find additional issues (but maybe not).
The OperationRefCount is used when instances are torn down. Unloading a
minifilter will also tear down the instances so that’s why it blocks. But it
is possible to detach an instance even without unloading a minifilter
(fltmc.exe can do it if you want to test this scenario). It is fairly easy
to make a minifilter not unloadable but not so easy to control the life span
of instances (mainly because their life span also depends on the life span
of the underlying volume). I’m not sure what will happen but you should try
it out and see whether it’s something you can live with.
Thanks,
Alex.
Thanks a lot for your help Alex!
I’ll test instance detaching and volume un-mounting (not sure if that’s the correct phrase), since this solution seems easier than translating every request to the FltXXX equivalent, and the known issues we talked about are non-issues in my case.
Is there an email address where I can report these kinds of bugs to Microsoft?
Update:
As you suggested, I tested fltmc detach on instances with a bad OperationRefCount and it doesn’t work (it hangs).
I tested my driver with a USB device (mounted as "E:") and “Safely Remove Hardware” fails.
Then I forcibly removed it, and it seems that my driver is no longer attached to "E:" but is attached to \Device\Harddisk1(WIERD_GARBAGE_VALUES) or something like that. The same is true for other minifilters.
If I attach the USB device again, my minifilter does not attach to it (but other minifilters do).
Damn… testing detach was a good idea.