Server 2008 + verifier + FSD issue

Sarosh_Havewala · March 31, 2009, 7:17pm

I remember discussing the “open without the OBJ_KERNEL_HANDLE flag” issue
with OSR a couple of plugfests ago. I do not believe this is a verifier false
positive.

Actually, we did follow up internally with the IO team and verifier team
and then got back to someone from OSR (Rod, I think) laying out why this
is a real bug in the filter. I have laid out that scenario in response
to Scott’s comment to this thread.

Let me know if you still think that the verifier warning is a false
positive and we can discuss this further.

Regards,
Sarosh.
File System Filter Lead
Microsoft Corp

This posting is provided “AS IS” with no warranties, and confers no Rights

Tony Mason wrote:

Sarosh,

The two issues that my team discussed with you at prior plug fests of
which I am aware:

(1) The kernel handle issue, where using a kernel handle is now enforced
by verifier, but this leads to a serious performance issue due to the
added context switch (of course, only in VERY narrow cases.) Even when
we could demonstrate that our usage was not introducing the bug for
which the verifier check was being made, we were told that we MUST use
kernel handles. So we “fixed it” - when verifier is active, we use
kernel handles. That makes verifier happy and nobody is going to do
performance studies with verifier running.

(2) This APC index issue. Let me quote from the documentation for
ExAcquireResourceExclusiveLite:

“Normal kernel APC delivery must be disabled before calling this
routine. Disable normal kernel APC delivery by calling
KeEnterCriticalRegion. Delivery must remain disabled until the resource
is released, at which point it can be reenabled by calling
KeLeaveCriticalRegion.”

When the verifier check for this was FIRST introduced, we changed to a
model in which our code always acquired the lock via an indirect
mechanism - a stub/wrapper around the acquire that called
KeEnterCriticalRegion (a/k/a FsRtlEnterFileSystem.) Since that time
we’ve added additional infrastructure around that, including a
comprehensive way to check that locks are acquired in order and that we
never exit without releasing them - all by watching the apc index
disable field.
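
A rough user-mode sketch of the wrapper model described above may help picture it. Everything in it is a hypothetical stand-in, not WDK or OSR code: `model_thread.apc_disable` models the per-thread APC disable count (the sign convention of the real kernel counter may differ), `model_lock` models an ERESOURCE with a made-up `rank` for the lock hierarchy, and the function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* User-mode model only: hypothetical types standing in for a thread's
   kernel-APC disable count and for an ERESOURCE. */
typedef struct {
    int apc_disable;   /* models the thread's APC disable count */
} model_thread;

typedef struct {
    int  rank;         /* position in the lock hierarchy (higher = taken later) */
    bool held;
} model_lock;

/* Wrapper: always "enter the critical region" before taking the lock, and
   refuse an acquire that is recursive or violates the hierarchy. */
static bool model_acquire(model_thread *t, model_lock *l, int highest_rank_held)
{
    if (l->held || l->rank <= highest_rank_held)
        return false;          /* out-of-order or recursive acquire */
    t->apc_disable++;          /* models KeEnterCriticalRegion */
    l->held = true;
    return true;
}

/* Release the lock, then "leave the critical region". */
static void model_release(model_thread *t, model_lock *l)
{
    assert(l->held && t->apc_disable > 0);
    l->held = false;
    t->apc_disable--;          /* models KeLeaveCriticalRegion */
}

/* Leak check at the exit of a routine: if the count differs from the value
   seen on entry, some lock acquired through the wrapper was not released. */
static bool model_exit_is_clean(const model_thread *t, int count_on_entry)
{
    return t->apc_disable == count_on_entry;
}
```

In this model a routine would record the count on entry and call `model_exit_is_clean` on the way out; a mismatch points at a lock that was acquired through the wrapper and never released.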

What I find fascinating about this now is that we’re being told “your
simple abstraction is broken, you really should NOT do what the
documentation says [call it a “doc bug” if you’d like] and instead
should special case this.” It seems small wonder why file system
filters become hideously complicated rat’s nests of special cases and
exceptions. Follow the rules, make the code work, enhance the system
(the apc index is very useful when combined with a wrapper model this
way because it allows us to find unreleased locks) and then have someone
on your end decide that “the way the OS file systems do this is the
correct way” and discard consideration of any other model is galling.

Of course, ultimately it is your OS. We changed our abstraction - we
vary the behavior based upon whether or not verifier is present. In
that fashion, we can still find resource leaks and we can still enforce
our lock hierarchy using our standard primitives. But the code is more
complicated and not quite as easy to understand.

Dan - I long ago gave up trying to discuss many of these cases very
actively because I found that it was seldom a fruitful exercise - once
you (Microsoft) have implemented and released a behavior, I have to
support it, even if you change that behavior in a future version of the
OS. I have a code base replete with exactly these types of changes.

I agree that the pattern of locking here is complicated, but it is
inherent in the nature of the “random callback from other components”
model of the OS with respect to file systems. When I receive a
callback, I don’t actually know where it originated - “most of the time”
it will have originated with the Cc/Mm components, but “once in a while”
we’ll find some other layered component that is clever and calls back
into our entry points. The teams at Microsoft are in a superior
position here - if a 3rd party product is installed and causes a
verifier failure or a crash, the 3rd party product will be (by
definition) “at fault”. However, when it comes to two 3rd party
products, it will normally be the fault of the last component installed
on the box. We actually both have the same perspective - we don’t want
a BSOD to point back to our code. You guys have the advantage of being
able to build the tools, so “!analyze -v” can walk the stack and find
the first 3rd party driver and blame it for the crash, thus diverting
attention away. I do not have that luxury, so I try to always do
everything I can to make sure I’m going to work right. If you send my
file object to NTFS, NTFS will crash. If you send NTFS’s file object to
me, I’ll return an error - because I don’t want to crash in my
code/driver.

So I have a different way of looking at Verifier than others do: to me,
you are defining what you decide is correct behavior of the OS. Since
you’re with Microsoft, your definition is all that matters. I’ll
“change my abstractions” to work the way you define they should work.
Sometimes that might not be in a fashion of which you approve, but odds
are you’ll be gone from the scene in another few years and someone else
will be in your shoes telling me how THEY think it should be done (and
it’ll be the opposite of what you told me.)

Tony
OSR

OSR_Community_User · March 31, 2009, 7:33pm

Sarosh,

The specific case in point is one where we opened a file with a user
handle and then acquired the file object for that file. The I/O was
done using the file object, not the handle, since the handle has no
stability guarantee in that particular case. No changes to data were
made, no handle was used, other than to acquire the file object. While
it was potentially possible for someone to switch the handle, the worst
case would have been for us to return the wrong file size in a directory
listing.
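
The stability point can be illustrated with a small user-mode model. It is only a sketch under stated assumptions, not how the object manager is implemented: `model_table` is a hypothetical per-process handle table that hands out the lowest free slot, `model_object.refs` stands in for an object's reference count, and all function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SLOTS 4

typedef struct {
    int id;     /* which "file" this object represents */
    int refs;   /* models the object's reference count */
} model_object;

typedef struct {
    model_object *slot[MODEL_SLOTS];   /* handle value = slot index */
} model_table;

/* Open: place the object in the lowest free slot; return the handle or -1. */
static int model_open(model_table *t, model_object *o)
{
    for (int h = 0; h < MODEL_SLOTS; h++) {
        if (t->slot[h] == NULL) {
            t->slot[h] = o;
            o->refs++;          /* the handle itself keeps the object alive */
            return h;
        }
    }
    return -1;
}

/* Reference by handle: returns whatever object is in the slot right now,
   with an extra reference, or NULL if the handle is not valid. */
static model_object *model_reference_by_handle(model_table *t, int h)
{
    if (h < 0 || h >= MODEL_SLOTS || t->slot[h] == NULL)
        return NULL;
    t->slot[h]->refs++;
    return t->slot[h];
}

/* Close: returns 0 on success, -1 if the handle is not valid. */
static int model_close(model_table *t, int h)
{
    if (h < 0 || h >= MODEL_SLOTS || t->slot[h] == NULL)
        return -1;
    t->slot[h]->refs--;
    t->slot[h] = NULL;
    return 0;
}
```

In the model, once the application closes the handle and opens something else, the same handle value resolves to a different object, while a pointer obtained earlier through `model_reference_by_handle` still refers to the original one. That is the sense in which the handle has no stability guarantee but the referenced object does.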

We started off by using kernel handles, only to find that for certain
applications we would saturate the CPU. This is because the creation of
a kernel handle - on at least ONE version of Windows - forces a
context switch. We decided that consuming 100% CPU in a
common case to protect against a relatively uncommon and narrow attack
vector (the impact of which would be to return erroneous size
information in a directory listing) was not the right trade-off. While you
can disagree with our determination, we had a rationale and we had
determined that the handle swap security attack was not an issue in our
specific case (ergo, it cannot be used for an information exposure, DoS,
privilege elevation, etc.) Of course, if there is some specific
scenario we’ve missed, I’m happy to revisit this decision.

I am not suggesting (nor did I suggest at that point) that the check
should be removed from verifier. Even when this occurred I demanded
that the development team work around this specific issue. I view
adding special case behavior for Verifier as distasteful, but then
again, I sometimes ignore prefast warnings as well; I try to justify the
reason for these decisions at the time I make them.

Tony
OSR

Ken_Johnson · March 31, 2009, 10:09pm

How do you dispose of the handle afterwards? ZwClose?

S

-----Original Message-----
From: Scott Noone
Sent: Tuesday, March 31, 2009 11:57
To: Windows File Systems Devs Interest List
Subject: Re:[ntfsd] Server 2008 + verifier + FSD issue

>What safeguards you from the local app altering the handle, though? Are
>you only running during early process init?

Well, the good news is that in this particular usage case we only need to be
crash proof (and not give the user access to anything they shouldn’t have
access to, of course).

For example, one is part of our, “is this particular file encrypted” logic
(open it, get a file object, look for our signature) and if the app does
something nasty the worst that happens is that we tell him the wrong answer.
Not saying it’s pretty and it adds hidden complexity to the code (don’t let
the intern update it), but it makes a measurable difference in CPU
utilization.

-scott

–
Scott Noone
Consulting Associate
OSR Open Systems Resources, Inc.
http://www.osronline.com

“Skywing” wrote in message news:xxxxx@ntfsd…
Okay, that makes sense. What safeguards you from the local app altering the
handle, though? Are you only running during early process init?

(Sorry if you’ve already explained this before.)

- S

-----Original Message-----
From: Scott Noone
Sent: Tuesday, March 31, 2009 10:55
To: Windows File Systems Devs Interest List
Subject: Re:[ntfsd] Server 2008 + verifier + FSD issue

>I’d be curious to understand the circumstances where you were seeing
>significant differences here performance-wise.

On older revs of the O/S (XP for sure), if you’re not in the system process
context when you create a kernel handle Ob does a context switch to the
system process (KeStackAttachProcess). Another switch happens when you close
the handle.

If you do enough rapid opens/closes the overhead can become large.

-scott

–
Scott Noone
Consulting Associate
OSR Open Systems Resources, Inc.
http://www.osronline.com

“Skywing” wrote in message news:xxxxx@ntfsd…
Out of curiosity, what was the kernel handle issue that you were alluding
to, or can you not share? I’d be curious to understand the circumstances
where you were seeing significant differences here performance-wise.

- S

—
NTFSD is sponsored by OSR

For our schedule of debugging and file system seminars
(including our new fs mini-filter seminar) visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Sarosh_Havewala · March 31, 2009, 10:15pm

Here is the sequence I am referring to:

Filter opens handle (in user’s handle table) and references file object
User closes handle
Filter closes handle

Two cleanup IRPs go to the file system and the file system crashes.

Am I missing something here?

Regards,
Sarosh.
File System Filter Lead
Microsoft Corp

This posting is provided “AS IS” with no warranties, and confers no Rights

Ken_Johnson · March 31, 2009, 10:19pm

Once cleanup has been issued, aren’t further handle opens blocked in general? Or do I misremember?

(IIRC, there was a discussion that determined this in a thread awhile ago here.)

S

OSR_Community_User · March 31, 2009, 11:12pm

When tearing down a user handle, you DO have to be very careful - if you
call ZwClose, you will bug check if the user has already closed the
handle. If you call NtClose, you’re ok, because then the close is being
done by/on behalf of the application and the OS must already handle the
close of an invalid handle case.

Since this is going into the archive, please note that as a general rule
it is NOT good practice to create user handles from a kernel driver. We
did it - in a very specific case - after analyzing it and determining
that we had no viable alternative (sadly, “upgrade to a newer OS” is
often not viable.) I’m not suggesting it as a general technique, nor am
I even suggesting it.

The most effective approach to triggering incorrect behavior here would
be for a filter to call ObReferenceObject and then ObDereferenceObject
on a file object once it’s been cleaned up - that causes the
IRP_MJ_CLOSE to be sent before the IRP_MJ_CLEANUP arrives in the FSD.

I’ve had to actually deal with this case a number of times. I’m very
glad that verifier flags the bumped reference count here, although there
are times when it would be nice to defer the IRP_MJ_CLOSE.

I know there are verifier tests that others don’t like that I do like
(e.g., “always return STATUS_PENDING”. I’m now trying to figure out how
we get an “always return STATUS_REPARSE” test in there.) Verifier is a
tool - a highly useful tool - but it is not a panacea, and it cannot
serve as a blind replacement for understanding.

Tony
OSR

OSR_Community_User · April 1, 2009, 6:54am

>Once cleanup has been issued, aren’t further handle opens blocked in general?

Yes, by FO_CLEANUP_COMPLETE flag.

–
Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

OSR_Community_User · April 1, 2009, 9:19am

I have followed the advice of Dan Mihai:

“I have been discussing these issues with folks who work on the Cache
Manager and Memory Manager. We concluded that AcquireForLazyWrite and
AcquireForModWrite routines don’t need to disable normal kernel
APCs…” and “So we recommend removing the KeEnterCriticalRegion calls
from AcquireForLazyWrite or AcquireForModWrite routines, and their
corresponding KeLeaveCriticalRegion from the Release routines…”

Server 2008 verifier no longer complains about APCs being disabled at
the beginning of IRP_MJ_WRITE.

I’m still chasing lock issues with the OpenAFS redirector so I took the
position that any time dispatch is entered with an FCB resource lock
already held, I at least need to understand why.

At the moment, I have a situation where, after a multi-megabyte write, I
sometimes get a deadlock situation. The write operation does not
complete nor is it possible to terminate it. When this happens, I note
that IRP_MJ_CLOSE is being sent with the FCB resource lock already held
and that it is owned by the code in a worker thread calling
CcPurgeCacheSection.

My knowledge of cache operations is limited to say the least and an
attempt to discover when and how this function (CcPurgeCacheSection)
should be called has not been successful. As I’m writing this, I don’t
know if cleanup and close have already been called on the file.

Is there a set of rules for using CcPurgeCacheSection that will fit in a
paragraph or two?

Is there a circumstance when IRP_MJ_CLOSE should arrive with the FCB
resource already locked?

Thanks,
Mickey.

OSR_Community_User · April 1, 2009, 10:49am

Mickey Lane wrote:

I have followed the advice of Dan Mihai:

“I have been discussing these issues with folks who work on the Cache
Manager and Memory Manager. We concluded that AcquireForLazyWrite and
AcquireForModWrite routines don’t need to disable normal kernel
APCs…” and “So we recommend removing the KeEnterCriticalRegion calls
from AcquireForLazyWrite or AcquireForModWrite routines, and their
corresponding KeLeaveCriticalRegion from the Release routines…”

This seems like a bit of a hack to get around something in verifier …
can it be guaranteed that an APC will ‘never’ be delivered to a Cm
worker thread causing some sort of re-entrancy back into the file system
stack and potentially leading to a deadlock?

Pete

–
Kernel Drivers
Windows File System and Device Driver Consulting
www.KernelDrivers.com
866.263.9295

OSR_Community_User · April 1, 2009, 1:22pm

The interesting assumption here is that some folks believe these calls
are only made by the OS; in my experience, this is not the case.

The APIs for calling an FSD to perform this locking are public - these
are nothing more than Fast I/O entry points after all (at least in some
OS versions) and they are the only alternative for filters attempting to
do locking in ugly/nasty sorts of ways (think “I want to modify the
behavior of the cache, but I don’t want to assume the layout of the
FSD’s locks. So I just call the fast I/O entry points for these
functions.”) I’ve seen it. Some are more industrious and find the
actual FsRtl APIs (some are in ntifs.h.)

So, recall that in the special case code we are not able to base our
decision on the API invoked, we must do it based upon identifying the
specific thread. This is nothing if not fragile.

For Microsoft, this is easily handled - “nobody else may use these APIs,
except the OS.” Of course, that would be “except in the case of passing
them through a legacy filter driver.” But anyone that actually does it,
finds things work ok… then they install my file system or my filter
driver and things crash. It’s clearly MY fault as the last driver
installed.

If I tightly constrain the environment I make the problem vastly easier.
For those of you writing for embedded systems, it is much easier to “bend
the rules”. For those of you that can define “we install
exactly THIS stuff on the box” your lives are much easier. For those
that write general purpose code that ships on a broad range of systems,
everything on that system must “play by the rules” (as if there were a
conformance test one could run to validate this is the case) or it will
break.

Ironically, the lower the cost of the individual system, the more likely it
will be loaded with a random assortment of drivers. This in turn means
that those writing the drivers have less incentive to figure out the
rules, since the actual cost of production is the big driver for them.

Microsoft is in the most enviable of positions - they have a very
tightly controlled box. They test it to death in a broad range of
configurations and circumstances, but in the end they have the ultimate
in controlled environments. Please forgive those of us on the opposite
end of the spectrum (“this must work with every version of every
anti-virus product on the market”) that sometimes must do things that
are unseemly to you in order to make sure we work properly.

Tony
OSR

OSR_Community_User · April 2, 2009, 10:16am

Hi everyone,

Thanks for your honest feedback about Verifier - please keep it coming
because it is very useful.

Clearly my experiences have been different from some of yours, so it
is expected that some of my opinions are different from yours.
I don’t intend to criticize your views - just to present my view too.

Here are my thoughts about some of the topics that came out lately:

Avoiding the use of OBJ_KERNEL_HANDLE:

Tony/Scott, it sounds like you guys analyzed that scenario pretty
thoroughly. Can you also reference that handle just as UserMode,
never as KernelMode? That would be the proper way to reference such a
handle, and there wouldn’t be any performance overhead from a process
attach.

BTW, if anyone follows this discussion but is not aware of the issues
related to OBJ_KERNEL_HANDLE, we tried to describe them at the
beginning of the doc from
http://www.microsoft.com/whdc/devtools/tools/Win7DriverVer.mspx

“Normal kernel APC delivery must be disabled before calling this
routine. Disable normal kernel APC delivery by calling
KeEnterCriticalRegion. Delivery must remain disabled until the
resource is released, at which point it can be reenabled by calling
KeLeaveCriticalRegion.”

Perhaps this doc could use some clarifications. However, even in its
current format, I don’t read it as “it is always a good idea to call
KeEnterCriticalRegion immediately before calling
ExAcquireResourceSharedLite”. If the caller of AcquireForLazyWrite or
AcquireForModWrite promised that APCs were already disabled as
appropriate, it seems reasonable to skip an additional
KeEnterCriticalRegion call.

As I said before, I think this whole model of ntoskrnl.exe dictating a
lock acquire in another driver (by calling AcquireForLazyWrite or
AcquireForModWrite) is very unfortunate, but I’m trying to find ways to
live with it.

“This seems like a bit of a hack to get around something in
verifier … can it be guaranteed that an APC will ‘never’ be delivered
to a Cm worker thread causing some sort of re-entrancy back into the file
system stack and potentially leading to a deadlock?”

Pete, I see this differently, as: removing some useless calls to
KeEnterCriticalRegion, and behaving similarly to all the Microsoft FSDs
and the samples. That also helps drivers cope with the fact that Verifier
is missing a hack that would work around a hacky locking design that’s
probably as old as Windows.

I believe the KeEnterCriticalRegion call before acquiring Filesystem locks
is intended as a defence mechanism against suspending threads. It is not
intended to avoid re-entrance into Filesystem from some random APC routine.
Nobody is supposed to send willy-nilly APCs to Memory Manager or Cache
Manager threads, and also be evil enough to enter into the Filesystem
from their APC routine.

About spending a bunch of energy to avoid a crash in your driver, or
other problems, when your driver is not at fault:

All of us have to make such compromises every once in a while, but I
would try to stay away from them as much as possible. My experience has
been that after working around the bad behavior of someone else’s driver,
that other driver will still find a way to screw up the entire system,
and then we go back to square one.

Thanks,
Dan

Apurva_Doshi · April 8, 2009, 1:55pm

Mickey Lane wrote:

I have followed the advice of Dan Mihai:

“I have been discussing these issues with folks who work on the Cache
Manager and Memory Manager. We concluded that AcquireForLazyWrite and
AcquireForModWrite routines don’t need to disable normal kernel
APCs…” and “So we recommend removing the KeEnterCriticalRegion calls
from AcquireForLazyWrite or AcquireForModWrite routines, and their
corresponding KeLeaveCriticalRegion from the Release routines…”

Server 2008 verifier no longer complains about APCs being disabled at
the beginning of IRP_MJ_WRITE.

I’m still chasing lock issues with the OpenAFS redirector so I took the
position that any time dispatch is entered with an FCB resource lock
already held, I at least need to understand why.

At the moment, I have a situation where, after a multi-megabyte write, I
sometimes get a deadlock situation. The write operation does not
complete nor is it possible to terminate it. When this happens, I note
that IRP_MJ_CLOSE is being sent with the FCB resource lock already held
and that it is owned by the code in a worker thread calling
CcPurgeCacheSection.

My knowledge of cache operations is limited to say the least and an
attempt to discover when and how this function (CcPurgeCacheSection)
should be called has not been successful. As I’m writing this, I don’t
know if cleanup and close have already been called on the file.

Is there a set of rules for using CcPurgeCacheSection that will fit in a
paragraph or two?

Is there a circumstance when IRP_MJ_CLOSE should arrive with the FCB
resource already locked?

Thanks,
Mickey.

Hi Mickey,

The rules around CcPurgeCacheSection are that the file must be held
exclusively by the caller - this means no cached reads/writes should be
in-flight while purge is executing. If you are seeing
CcPurgeCacheSection being called in a CcWorker thread, it is most likely
happening as part of cache map teardown for a file that has been truncated.

An example of when close would come with the FCB Lock held is if the
filesystem calls CcPurgeCacheSection itself (and thus owns some
resource) on a stream that is no longer cached by Cc but is still cached by
Mm. The purge could take away all of the references to the control area
which includes (potentially the last) dereference of the representative
fileobject thus triggering an inline close.
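
The inline-close scenario described above can be modelled in a few lines of user-mode C. This is a sketch under stated assumptions rather than real Cc/Mm behaviour: MockThread, MockFcb and every mock_* / fsd_* function are hypothetical names, the FCB resource is reduced to an owner pointer, and the control area's references are reduced to a single counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int id;
} MockThread;

/* Hypothetical FCB model: one exclusive lock plus a file object reference count. */
typedef struct {
    MockThread *lock_owner;      /* NULL when the FCB resource is free */
    int file_object_refs;        /* references keeping the file object alive */
    bool close_seen;             /* set once the modelled close has run */
    bool close_saw_lock_held;    /* did close arrive with the caller owning the lock? */
} MockFcb;

/* Models the close dispatch routine; it only records the lock state on entry. */
static void mock_close(MockFcb *fcb, MockThread *caller)
{
    fcb->close_seen = true;
    fcb->close_saw_lock_held = (fcb->lock_owner == caller);
}

/* Dropping the last reference runs close inline, on the caller's thread. */
static void mock_dereference(MockFcb *fcb, MockThread *caller)
{
    assert(fcb->file_object_refs > 0);
    fcb->file_object_refs--;
    if (fcb->file_object_refs == 0) {
        mock_close(fcb, caller);
    }
}

/* Models the purge: the caller must already hold the file exclusively. */
static void mock_purge(MockFcb *fcb, MockThread *caller)
{
    assert(fcb->lock_owner == caller);
    mock_dereference(fcb, caller);   /* purge gives up one reference */
}

/* Models the filesystem purging a stream while owning its own FCB lock. */
static void fsd_purge_stream(MockFcb *fcb, MockThread *caller)
{
    assert(fcb->lock_owner == NULL);
    fcb->lock_owner = caller;
    mock_purge(fcb, caller);
    fcb->lock_owner = NULL;
}
```

When the purge drops the last reference, the modelled close runs before fsd_purge_stream releases the lock, so a close routine that assumes it is entered with no FCB lock held would be wrong in this case.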

For the deadlock you are seeing:

1. Is the multi-megabyte write a top-level, cached write? Is it
extending? Large cached writes end up performing inline flushes which
would recurse back into your FSD.

2. Is CcPurgeCacheSection being called for the same stream as the
multi-megabyte write? Is it being called by one of your FSP worker
threads or by Cc?

Thanks
apurva

OSR_Community_User · April 9, 2009, 11:17am

Hi Apurva,

For reference, this is about the OpenAFS redirector FSD.

Moving your 2 questions to the top of the quoted messages…

Apurva Doshi asked:

> For the deadlock you are seeing:
>
> 1. Is the multi-megabyte write a top-level, cached write? Is it
> extending? Large cached writes end up performing inline flushes which
> would recurse back into your FSD.

I have 1, 5 & 25 Mbyte test files on the local machine.

I “net use Y: \afs<cell>”
I “copy /y Y:<path><file>”

The large file normally takes 5 minutes or so. When it locks, I see
the error reported below after something like 10 minutes. The small
file always works.

>
> 2. Is CcPurgeCacheSection being called for the same stream as the
> multi-megabyte write? Is it being called by one of your FSP worker
> threads or by Cc?

Worker thread.

After working on this for a while, I’ve discovered that if I
turn on massive numbers of trace messages, the operation
completes successfully, no doubt due to changes in timing.

Looking at trace logs, I’ve also noted that a (the root?)
failure occurs very early in the process - perhaps after
1 second or so.

I’ve also noted that a utility program for the subsystem
(fs.exe if you’re familiar with AFS) also locks up when
this problem occurs. (‘Locks up’ wrt fs.exe is probably
incorrect. It sends a request and never gets an answer…)

In sum, I think the problems reported below are symptoms of
another issue that occurs quite a bit earlier. I’m working
on that now.

Thanks for your help here. It’s good information.

Mickey.
