If I have a context that contains a pointer to an ERESOURCE, and my
context cleanup callback gets called by the fltmgr while the resource is
still held, what should the appropriate result be?
Obviously, if the callback is being called, nobody holds a reference to
the context, so the resource is being held erroneously. That is,
somewhere in my code I acquired it, never released it, and then called
FltReleaseContext to decrement the reference count.
I suppose the options are:
1. Try to acquire the lock and wait. This is probably a lost cause:
   things are already screwed up at this point, there's no reason to
   expect the resource ever to be released, and it would make a kernel
   thread I don't own wait indefinitely. This sounds like an all-around
   terrible idea.
2. Pull the rug out from under the holder by freeing the memory the
   resource occupies. If the holder never tries to release the resource,
   this wouldn't cause problems for the holder, but I don't know what
   happens inside the kernel. If the kernel keeps a record of the held
   ERESOURCE somewhere and we destroy it, what happens? And if the holder
   does eventually try to release the resource, it will probably bugcheck
   the system.
3. Raise an exception with the expectation that it probably won't be
   handled and will bring down the system. Things could be unrecoverably
   screwed up anyway if I'm in this situation to begin with. Could further
   execution corrupt data? Probably not, but it could cause more erroneous
   behavior. This seems like a reasonable approach for a checked build.
4. Leak the memory. No immediate harm, but eventually things will grind
   to a halt as NonPagedPool is used up. On a free build this might be
   preferable to option 3.
5. ASSERT that the resource isn't held. Hopefully, by the time the code
   goes to production, anything that could cause the situation in the
   first place has been ironed out.
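To make options 4 and 5 concrete, here is a small user-mode C model. It is not real minifilter code, and every name in it (MODEL_RESOURCE, MODEL_CONTEXT, context_lock, context_unlock, context_cleanup) is hypothetical. It assumes the driver routes every acquire and release of the lock through its own wrappers that keep a hold count, because the documented ExIsResourceAcquiredExclusiveLite/ExIsResourceAcquiredSharedLite routines only report on the calling thread. In a real driver the counter would need interlocked operations, the assert would be something like NT_ASSERT, and an ERESOURCE must go through ExDeleteResourceLite before its memory is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the separately allocated lock. In a real
 * minifilter this would be an ERESOURCE in nonpaged pool; here HoldCount
 * records acquisitions that have not yet been released. */
typedef struct MODEL_RESOURCE {
    int HoldCount;
} MODEL_RESOURCE;

/* Hypothetical context that, like the one described above, holds a
 * pointer to its lock rather than embedding it. */
typedef struct MODEL_CONTEXT {
    MODEL_RESOURCE *Resource;
} MODEL_CONTEXT;

typedef enum CLEANUP_RESULT {
    CLEANUP_FREED,  /* lock was free; its memory was released */
    CLEANUP_LEAKED  /* lock was still held; memory deliberately leaked */
} CLEANUP_RESULT;

/* Acquire/release wrappers: all lock use goes through these so the
 * cleanup path can tell whether the lock is still held. */
static void context_lock(MODEL_CONTEXT *ctx)
{
    ctx->Resource->HoldCount++;
}

static void context_unlock(MODEL_CONTEXT *ctx)
{
    assert(ctx->Resource->HoldCount > 0); /* catch unbalanced releases */
    ctx->Resource->HoldCount--;
}

/* Model of the context cleanup callback. It runs only after the last
 * reference is gone, so a nonzero HoldCount means some path acquired the
 * lock and never released it before dropping its reference. */
static CLEANUP_RESULT context_cleanup(MODEL_CONTEXT *ctx)
{
    MODEL_RESOURCE *res = ctx->Resource;
    ctx->Resource = NULL;

    if (res->HoldCount != 0) {
        /* Option 5 would assert here on a checked build; option 4 leaks
         * rather than freeing memory a holder may still release (option 2). */
        return CLEANUP_LEAKED;
    }

    /* A real driver would call ExDeleteResourceLite before freeing. */
    free(res);
    return CLEANUP_FREED;
}
```

Usage: a context whose lock was acquired and released cleans up normally, while one whose lock is still held comes back as CLEANUP_LEAKED, leaving the memory in place for any late release.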
Any thoughts or suggestions as to which of these is the most correct
thing to do? Obviously, if somebody holds a pointer to memory that I
free in the callback, the system will likely bugcheck when they
dereference it. Options 2 and 3 seem closest to that existing behavior,
with option 3 potentially being easier to track down in testing (it's
guaranteed to fail immediately). Option 4 seems like the least
destructive choice for an end user in the short term.
~Eric