I remember discussing the “open without the OBJ_KERNEL_HANDLE flag” with
OSR a couple of plugfests ago. I do not believe this is a verifier false
positive.
Actually, we did follow up internally with the IO team and verifier team
and then got back to someone from OSR (Rod, I think) laying out why this
is a real bug in the filter. I have laid out that scenario in response
to Scott’s comment to this thread.
Let me know if you still think that the verifier warning is a false
positive and we can discuss this further.
Regards,
Sarosh.
File System Filter Lead
Microsoft Corp
This posting is provided “AS IS” with no warranties, and confers no Rights
Tony Mason wrote:
Sarosh,
The two issues that my team discussed with you at prior plug fests of
which I am aware:(1) The kernel handle issue, where using a kernel handle is now enforced
by verifier, but this leads to a serious performance issue due to the
added context switch (of course, only in VERY narrow cases.) Even when
we could demonstrate that our usage was not introducing the bug for
which the verifier check was being made, we were told that we MUST use
kernel handles. So we “fixed it” - when verifier is active, we use
kernel handles. That makes verifier happy and nobody is going to do
performance studies with verifier running.(2) This APC index issue. Let me quote from the documentation for
ExAcquireResourceExclusiveLite:“Normal kernel APC delivery must be disabled before calling this
routine. Disable normal kernel APC delivery by calling
KeEnterCriticalRegion. Delivery must remain disabled until the resource
is released, at which point it can be reenabled by calling
KeLeaveCriticalRegion.”When the verifier check for this was FIRST introduced, we changed to a
model in which our code always acquired the lock via an indirect
mechanism - a stub/wrapper around the acquire that called
KeEnterCriticalRegion (a/k/a FsRtlEnterFileSystem.) Since that time
we’ve added additional infrastructure around that, including a
comprehensive way to check that locks are acquired in order and that we
never exit without releasing them - all by watching the apc index
disable field.What I find fascinating about this now is that we’re being told “your
simple abstraction is broken, you really should NOT do what the
documentation says [call it a “doc bug” if you’d like] and instead
should special case this.” It seems small wonder why file system
filters become hideously complicated rat’s nests of special cases and
exceptions. Follow the rules, make the code work, enhance the system
(the apc index is very useful when combined with a wrapper model this
way because it allows us to find unreleased locks) and then have someone
on your end decide that “the way the OS file systems do this is the
correct way” and discard consideration of any other model is galling.Of course, ultimately it is your OS. We changed our abstraction - we
vary the behavior based upon whether or not verifier is present. In
that fashion, we can still find resource leaks and we can still enforce
our lock hierarchy using our standard primitives. But the code is more
complicated and not quite as easy to understand.Dan - I long ago gave up trying to discuss many of these cases very
actively because I found that it was seldom a fruitful exercise - once
you (Microsoft) have implemented and released a behavior, I have to
support it, even if you change that behavior in a future version of the
OS. I have a code base replete with exactly these types of changes.I agree that the pattern of locking here is complicated, but it is
inherent in the nature of the random callback from other components
model of the OS with respect to file systems. When I receive a
callback, I don’t actually know where it originated - “most of the time”
it will have originated with the Cc/Mm components, but “once in a while”
we’ll find some other layered component that is clever and calls back
into our entry points. The teams at Microsoft are in a superior
position here - if a 3rd party product is installed and causes a
verifier failure or a crash, the 3rd party product will be (by
definition) “at fault”. However, when it comes to two 3rd party
products, it will normally be the fault of the last component installed
on the box. We actually both have the same perspective - we don’t want
a BSOD to point back to our code. You guys have the advantage of being
able to build the tools, so “!analyze -v” can walk the stack and find
the first 3rd party driver and blame it for the crash, thus diverting
attention away. I do not have that luxury, so I try to always do
everything I can to make sure I’m going to work right. If you send my
file object to NTFS, NTFS will crash. If you send NTFS’s file object to
me, I’ll return an error - because I don’t want to crash in my
code/driver.So I have a different way of looking at Verifier than others do: to me,
you are defining what you decide is correct behavior of the OS. Since
you’re with Microsoft, your definition is all that matters. I’ll
“change my abstractions” to work the way you define they should work.
Sometimes that might not be in a fashion of which you approve, but odds
are you’ll be gone from the scene in another few years and someone else
will be in your shoes telling me how THEY think it should be done (and
it’ll be the opposite of what you told me.)Tony
OSR