MmForceSectionClosed when both DataSectionObject and ImageSectionObject exist

If both DataSectionObject and ImageSectionObject exist, it appears
that the intentional behavior of MmForceSectionClosed is to “only act
against ImageSectionObject.” How can I most intelligently add the
DeleteOnClose flag to the DataSectionObject, at a time when the
ImageSectionObject also exists?

The intention is as you might expect: to get IRP_MJ_CLOSE as soon as
the reference counts from the mappings hit zero. A call to
MmForceSectionClosed achieves this when only one or the other of the
sections exists, but not when both exist at the same time.

Alan Adams
Client for Open Enterprise Server
Micro Focus
xxxxx@microfocus.com

> If both DataSectionObject and ImageSectionObject exist, it appears
> that the intentional behavior of MmForceSectionClosed is to “only act
> against ImageSectionObject.”

I wrote that backwards; when both sections are present, it’s
DataSectionObject that is acted on by MmForceSectionClosed, not
ImageSectionObject.

Alan Adams
Client for Open Enterprise Server
Micro Focus
xxxxx@microfocus.com

Will there be a race condition? If you start when you see that the
reference counts are zero, can an intervening event occur that raises a
count before your operation finishes?

On Mon, Feb 5, 2018 at 12:04 PM, Alan Adams <xxxxx@lists.osr.com> wrote:

> > If both DataSectionObject and ImageSectionObject exist, it appears
> > that the intentional behavior of MmForceSectionClosed is to “only act
> > against ImageSectionObject.”
>
> I wrote that backwards; when both sections are present, it’s
> DataSectionObject that is acted on by MmForceSectionClosed, not
> ImageSectionObject.
>
> Alan Adams
> Client for Open Enterprise Server
> Micro Focus
> xxxxx@microfocus.com

> Will there be a race condition? If you start when you see that the
> reference counts are zero, can an intervening event occur that raises a
> count before your operation finishes?

Thanks for jumping in.

I expect the simple answer to your question is “Yes”, but I believe
that’s MmForceSectionClosed’s problem to deal with and not mine. He
has his own set of locks that he takes when deciding whether or not to
clean up and remove a pointer from the SECTION_OBJECT_POINTERS that
was passed, which I expect addresses the race condition described.

From the file system driver’s perspective (actually a network
redirector in our case), we don’t “know” what the reference counts are
to begin with. That’s opaque nt!_CONTROL_AREA business, which you
might know if debugging and using !ca. And whether the reference
count is zero isn’t really the key; we just care that the DeleteOnClose
flag is added even if the reference count is not yet zero.

After continued debugging today, although the question perhaps still
stands for “how could I cause DeleteOnClose to be added to both
sections when both sections are present”, I think the root cause of my
issue is actually more about “when” the sections become populated,
rather than “I needed MmForceSectionClosed to mark both sections with
DeleteOnClose.”

What currently happens is this:

  1. User mode application calls LoadLibrary.

  2. Windows loader opens FILE_OBJECT to DLL.

  3. Windows loader maps ImageSectionObject for this FILE_OBJECT/FCB.

  4. IRP_MJ_CLEANUP issued for this FILE_OBJECT, as Windows loader
    closes handle (presuming he’s the only/last one).

  5. During IRP_MJ_CLEANUP our redirector invokes MmForceSectionClosed,
    which just marks the ImageSectionObject as DeleteOnClose, since the
    image mapping references are still non-zero at this time.

  6. A third-party kernel-mode caller invokes
    FsRtlCreateSectionForDataScan using the same FILE_OBJECT as step 3.

  7. DataSectionObject is now established on this FILE_OBJECT/FCB, in
    addition to the ImageSectionObject that already existed.

  8. A series of IRP_MJ_READs are issued as paging I/O, as the
    FsRtlCreateSectionForDataScan caller page-faults their way through the
    mapped file.

  9. The FsRtlCreateSectionForDataScan caller correctly closes the
    Section object handle, and ObDereferenceObject’s the Section object
    returned. (Verified via !obtrace.)

  10. At some point later, the user mode application calls FreeLibrary.

  11. The ImageSectionObject reaches zero references, and is destroyed
    due to presence of the DeleteOnClose flag previously applied.

  12. Although the FsRtlCreateSectionForDataScan caller has
    de-referenced the Section they created, the DataSectionObject control
    area remains, because DeleteOnClose was never applied to that section.

  13. IRP_MJ_CLOSE for this FILE_OBJECT “never” comes (until memory
    manager is ready to destroy on his own terms, presumably), because the
    outstanding DataSectionObject keeps the FILE_OBJECT referenced.
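
The thirteen steps above can be modelled in a few lines of user-mode C. This is only a sketch of the behavior described in this thread, not of the memory manager's actual implementation. Every `Model*` and `model_*` name is hypothetical, each section is simplified to a single reference, and the rule that the force-close call acts on the data section when both sections exist is taken from the correction earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a control area: just the facts the thread discusses. */
typedef struct {
    bool exists;          /* section pointer is populated */
    int  refs;            /* outstanding section/mapping references */
    bool delete_on_close; /* set by the modelled force-close call */
} ModelControlArea;

/* Toy stand-in for SECTION_OBJECT_POINTERS (no SharedCacheMap modelled). */
typedef struct {
    ModelControlArea data;
    ModelControlArea image;
} ModelSectionPointers;

/* Modelled force-close: acts on the data section if present, otherwise on
   the image section. Tears the section down if unreferenced, else marks it. */
bool model_force_section_closed(ModelSectionPointers *sop)
{
    ModelControlArea *ca = sop->data.exists  ? &sop->data
                         : sop->image.exists ? &sop->image
                         : NULL;
    if (ca == NULL) return true;
    if (ca->refs == 0) { ca->exists = false; return true; }
    ca->delete_on_close = true;
    return false;
}

/* Drop one reference; a marked section goes away when it reaches zero. */
void model_unreference(ModelControlArea *ca)
{
    assert(ca->exists && ca->refs > 0);
    if (--ca->refs == 0 && ca->delete_on_close) ca->exists = false;
}

/* Any surviving section keeps the FILE_OBJECT referenced (no IRP_MJ_CLOSE). */
bool model_file_still_referenced(const ModelSectionPointers *sop)
{
    return sop->data.exists || sop->image.exists;
}

/* Replays the timeline; returns true if the file is left open at the end. */
bool model_run_timeline(bool force_close_after_data_scan)
{
    ModelSectionPointers sop = {0};

    sop.image.exists = true; sop.image.refs = 1;  /* step 3: image mapped */
    model_force_section_closed(&sop);             /* step 5: marks image only */
    sop.data.exists = true; sop.data.refs = 1;    /* steps 6-7: data section */
    if (force_close_after_data_scan)
        model_force_section_closed(&sop);         /* the missing second call */
    model_unreference(&sop.data);                 /* step 9: scanner done */
    model_unreference(&sop.image);                /* steps 10-11: FreeLibrary */
    return model_file_still_referenced(&sop);     /* steps 12-13 */
}
```

In this model, `model_run_timeline(false)` leaves the data section behind and so reproduces the reported symptom, while `model_run_timeline(true)` lets both sections go away once their references drop.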

So it seems like really, I need to start calling MmForceSectionClosed
in the IRP_MJ_READ paging I/O path, because that’s the first and
“only” opportunity I have to see that DataSectionObject on this same
FILE_OBJECT has become populated by FsRtlCreateSectionForDataScan.

Some more context might be appropriate: We are a legacy file system
redirector. Not a filter, and not mini-/RDBSS-based. The customer
context is Windows 7 SP1, but we see the same in Windows 10 1703. For
what it’s worth we also have no Windows Cache Manager interaction;
SharedCacheMap is always NULL in our SECTION_OBJECT_POINTERS.

Finally, the customer issue is that the network-based .DLL file is
being held open “indefinitely” even though the application process has
completely terminated.

We now understand this to be a side effect of the kernel-mode
FsRtlCreateSectionForDataScan caller that established the
DataSectionObject: this additional section is not marked as
DeleteOnClose by the MmForceSectionClosed work our redirector was
already doing in IRP_MJ_CLEANUP, because the DataSectionObject was not
yet established at that time.

Alan Adams
Client for Open Enterprise Server
Micro Focus
xxxxx@microfocus.com

Attempting to do the purge in the paging read path isn’t really a fix.
Nothing says that you’re even going to receive paging reads in this case as
the data could already be present in memory.

There’s a vague warning about calling FsRtlCreateSectionForDataScan on file
objects post-cleanup:

“Important The FsRtlCreateSectionForDataScan routine should only be used in
cases where a handle to the file object specified in the FileObject
parameter has not yet been created (typically while processing a post-create
operation). If the driver has a handle to the file object or can obtain a
handle to the file object, the driver should use the ZwCreateSection routine
instead.”

Can you determine the context in which FsRtlCreateSectionForDataScan is
being called? A write access breakpoint on the control area should tell you
(assuming this is reproducible enough).

Two other questions:

  1. What failure are you ultimately seeing from this? I’m just curious what
    the manifestation is that would cause your users/customers to complain

  2. Can you reproduce the behavior with the third party product and another
    file system?

-scott
OSR
@OSRDrivers

Hello Scott. Thanks for the additional input.

> Attempting to do the purge in the paging read path isn’t really a fix.
> Nothing says that you’re even going to receive paging reads in this case as
> the data could already be present in memory.

Completely agreed. If the FsRtlCreateSectionForDataScan caller never
actually faults in any file content, the scenario “1 through 13” I
described would be back to not being able to force the
DataSectionObject closed.

The IRP_MJ_READ path just appears to be the only synchronous execution
opportunity our redirector currently receives in the customer
scenario, where we even /could/ call MmForceSectionClosed at a time
when the DataSectionObject is populated. We are /sometimes/ seeing
FileBasicInformation being queried against that same FILE_OBJECT, but
not as consistently as the IRP_MJ_READ.

> Can you determine the context in which FsRtlCreateSectionForDataScan is
> being called?

We believe that the FsRtlCreateSectionForDataScan caller’s premise is
correct and legitimate. We expect they are filtering IRP_MJ_CREATE
and want to perform data inspection at a time when handles cannot yet
be created, because the IRP_MJ_CREATE processing has not completed
from the Windows I/O manager perspective.

> 1. What failure are you ultimately seeing from this? I’m just curious what
> the manifestation is that would cause your users/customers to complain

So far, just the symptom of “the application-specific .DLL file is
being held open across the network, even though we have completely
exited the application and no process using that DLL remains running.”

The customer is also having a sporadic “attempting to re-open the
application fails”, but that’s not been proven to be related to this
DataSectionObject behavior (yet).

The “files are being held open 100% of the time after we exit the
application” is just the first behavior they noticed while attempting
to investigate the failure. Which seems like a reasonable expectation
and cause for the customer’s concern, given that this behavior didn’t
happen until & unless the third-party FsRtlCreateSectionForDataScan
caller is also present.

But no, within just the “DataSectionObject control area remains unless
we’re able to force it closed with MmForceSectionClosed”, there is no
overt “failure” occurring from that scenario. Only the underlying
network file handle(s) being left open, which are subjectively “not
supposed to still be open.”

If some other client workstation wanted exclusive access to those
files (as opposed to just shared read-only-execute), the fact that a
workstation “holds those files open indefinitely, until the local
workstation’s memory manager decides it’s appropriate to free up the
control area” is probably the best way to cast this current behavior
in the light of “being a problem.”

> 2. Can you reproduce the behavior with the third party product and
> another file system?

I haven’t proven that yet; it’s a scenario I was going to look at if
we needed to tell the customer “this is just the way it’s going to be”
and assuage their concerns by demonstrating how it’s happening with
other redirectors, too. For a local file system, probably nobody
cares if this is happening.

Not sure what I would see from MRxSMB handling this situation across
the wire. One thought I have along those lines is that any file
system that uses Windows Cache Manager would have “an extra excuse”
for the file still being open even after the application exited.
Which, since we don’t ever CcInitializeCacheMap in our file system,
ostensibly doesn’t apply in our case.

Alan Adams
Client for Open Enterprise Server
Micro Focus
xxxxx@microfocus.com

There was also a good suggestion to try and use
FsRtlRegisterFileSystemFilterCallbacks and the
PostReleaseForSectionSynchronization callback, to provide a more
definitive opportunity to use MmForceSectionClosed to mark the created
section with DeleteOnClose, rather than depending on the IRP_MJ_READ
paging I/O path getting invoked.

But what I’ve encountered when registering for
FsRtlRegisterFileSystemFilterCallbacks from our legacy file system
driver (network redirector) is that although I do receive the “Pre”
callbacks for AcquireForSectionSync and ReleaseForSectionSync, we do
not receive the corresponding “Post” callbacks.

If I register for FsRtlRegisterFileSystemFilterCallbacks from a legacy
FILTER driver attached to the same stack, I receive both the “Pre” and
the “Post” filters for the same FILE_OBJECTs that the underlying
network redirector driver only receives “Pre” callbacks for.

As though nt!FsFilter* has some reason to think that the “Post”
callbacks shouldn’t be sent to the registrant who had a DEVICE_OBJECT
that wasn’t for a filter. I have in fact NULL’d out the
FAST_IO_DISPATCH entries that correspond to the callbacks (e.g.
ReleaseFileForNtCreateSection), so that there shouldn’t be duplication
or confusion on that front.

Just wanted to mention the “no Post callbacks are received when
registered for FsRtlRegisterFileSystemFilterCallbacks from the actual
file system driver” behavior, in case someone has any experience with
that, or knows why it would actually be by design, etc.

Alan Adams
Client for Open Enterprise Server
Micro Focus
xxxxx@microfocus.com

I’ve never registered this post callback from a file system, so I can’t
comment on the behavior you’re seeing from experience.

However, the idea with the PostReleaseForSectionSynchronization callback is
that it’s called immediately after the file system returns from
AcquireFileForNtCreateSection. This doesn’t actually do you any good because
you already know when you’re at the end of the AcquireFileForNtCreateSection
(it’s your code :)). Also note that all of this happens before the section
is actually created, so I don’t think this actually helps you at all.

Thinking back to your earlier description, if the filter above you is really
calling the data scan API from PostCreate then you should see another
IRP_MJ_CLEANUP happen at some point when the corresponding HANDLE is closed.
Presumably you’re not seeing this?

-scott
OSR
@OSRDrivers

> However, the idea with the PostReleaseForSectionSynchronization callback is
> that it’s called immediately after the file system returns from
> AcquireFileForNtCreateSection.

Sorry, disregard this…I read “PostReleaseForSectionSynchronization” as
“PostAcquireFor…”, was clearly not paying close enough attention.

However, the point of you being a file system stands. The filter callbacks
wrap around the calls into the file system, if you’re the file system then
you already know when these things are happening. The filter callbacks can
be used and are useful to file systems for other reasons, but they don’t
tell you anything you don’t already know.

-scott
OSR
@OSRDrivers

> However, the point of you being a file system stands. The filter callbacks
> wrap around the calls into the file system, if you’re the file system then
> you already know when these things are happening. The filter callbacks can
> be used and are useful to file systems for other reasons, but they don’t
> tell you anything you don’t already know.

Oh no. I guess this means my next question is going to sound very
dumb: Why would I know? The fact “the file system should already
know” has come up in other discussion too, but I have yet to
understand that point.

To my knowledge, I’m not the one wanting to create the section, nor
the one who does create the section. Aside from the fact our FCB must
provide the SECTION_OBJECT_POINTERS storage which will be /used/ for
managing the sections (by code that is NOT my file system driver), I’m
not specifically aware that the FILE_OBJECT is being used for that
purpose, except to infer it by inspecting the current state of the
SECTION_OBJECT_POINTERS.

So apparently I’m missing a big and apparently basic piece of the
puzzle as to why the file system driver will already know when a
section is being created. But that certainly would fit with why
nt!FsFilter* may be intentionally thinking the file system would not need
to receive this callback.

> Thinking back to your earlier description, if the filter above you is really
> calling the data scan API from PostCreate then you should see another
> IRP_MJ_CLEANUP happen at some point when the corresponding HANDLE is closed.
> Presumably you’re not seeing this?

We do see an IRP_MJ_CLEANUP when the handle the Windows loader had
opened is closed. At that point only the ImageSectionObject exists.
So you’re right, that probably does mean “it’s not a straight-forward
blocking operation in PostCreate”, else the DataSectionObject would
have already been visible at the time of the IRP_MJ_CLEANUP, too.

They do end up performing the FsRtlCreateSectionForDataScan with the
same FILE_OBJECT we received IRP_MJ_CLEANUP for. But I suppose it’s
possible they’re just scheduling something from PostCreate, rather
than actually blocking and waiting there. Since the intention of the
third-party relates to malware detection, I was just assuming they
would want the ability to fail that create.

Alan Adams
Client for Open Enterprise Server
Micro Focus
xxxxx@microfocus.com

FsRtlCreateSectionForDataScan calls the file system at the
AcquireFileForNtCreateSection Fast I/O entry point. It then calls
MmCreateSection to create the section, then calls the file system at the
ReleaseFileForNtCreateSection Fast I/O entry point. There are filter
callbacks around these as well, so the full sequence would be:

FsRtlCreateSectionForDataScan

->FsFilter Callbacks for PreAcquireForSectionSynchronization

->File system’s AcquireFileForNtCreateSection

->FsFilter Callbacks for PostAcquireForSectionSynchronization

->MmCreateSection

->FsFilter Callbacks for PreReleaseForSectionSynchronization

->File system’s ReleaseFileForNtCreateSection

->FsFilter Callbacks for PostReleaseForSectionSynchronization

You can see the result with an example.

NTFS uses the FsFilter callback for PreAcquire and uses the Fast I/O entry
point for release. Stopped at a call to FsRtlCreateSectionForDataScan, we
have a SectionObjectPointer with just an ImageSectionObject:

0: kd> r
nt!FsRtlCreateSectionForDataScan:
fffff800`02bb3480 488bc4 mov rax,rsp

0: kd> ?? ((nt!_file_object *)@r9)->SectionObjectPointer
struct _SECTION_OBJECT_POINTERS * 0xfffffa801b528ea8
+0x000 DataSectionObject : (null)
+0x008 SharedCacheMap : (null)
+0x010 ImageSectionObject : 0xfffffa801bd23280 Void

// Set some breakpoints and go
0: kd> bp Ntfs!NtfsFilterCallbackAcquireForCreateSection
0: kd> bp Ntfs!NtfsReleaseForCreateSection
0: kd> g

Breakpoint 1 hit
Ntfs!NtfsFilterCallbackAcquireForCreateSection:
fffff880`010d05d0 48895c2408 mov qword ptr [rsp+8],rbx

// Hit acquire, still no section…
0: kd> ?? ((nt!_fs_filter_callback_data *)@rcx)->FileObject->SectionObjectPointer
struct _SECTION_OBJECT_POINTERS * 0xfffffa801b528ea8
+0x000 DataSectionObject : (null)
+0x008 SharedCacheMap : (null)
+0x010 ImageSectionObject : 0xfffffa801bd23280 Void

0: kd> g
Breakpoint 2 hit
Ntfs!NtfsReleaseForCreateSection:
fffff880`01042e00 fff3 push rbx

// DataSectionObject populated by the time we hit the release
0: kd> ?? ((nt!_file_object *)@rcx)->SectionObjectPointer
struct _SECTION_OBJECT_POINTERS * 0xfffffa801b528ea8
+0x000 DataSectionObject : 0xfffffa801b587530 Void
+0x008 SharedCacheMap : (null)
+0x010 ImageSectionObject : 0xfffffa80`1bd23280 Void

So, the file system knows that someone is trying to create a section to the
stream specified by the file object. This file object might not end up
backing the section (e.g. if a section already existed there would already
be a file object), but the file system is involved in the operation.
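
The ordering shown in the debugger session (no data section at acquire, data section populated by release) can be sketched as a small user-mode model. This is an illustration under assumptions rather than driver code: `ModelFile`, `model_fs_acquire`, `model_fs_release`, and `model_create_section_for_data_scan` are hypothetical names standing in for the FILE_OBJECT, the file system's AcquireFileForNtCreateSection and ReleaseFileForNtCreateSection entry points, and the section-creating caller. The filter callbacks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for SECTION_OBJECT_POINTERS; only the two section fields. */
typedef struct {
    const void *data_section;
    const void *image_section;
} ModelSectionPointers;

/* Toy stand-in for a FILE_OBJECT plus what the file system observed. */
typedef struct {
    ModelSectionPointers sop;
    bool data_seen_at_acquire;
    bool data_seen_at_release;
} ModelFile;

/* Models the file system's acquire entry point: the section is not
   created yet, so only pre-existing sections are visible here. */
void model_fs_acquire(ModelFile *file)
{
    file->data_seen_at_acquire = (file->sop.data_section != NULL);
}

/* Models the file system's release entry point: the section has been
   created by now, so this is the first point at which the file system
   can observe the new data section. */
void model_fs_release(ModelFile *file)
{
    file->data_seen_at_release = (file->sop.data_section != NULL);
}

/* Models the sequence: acquire -> create section -> release. */
void model_create_section_for_data_scan(ModelFile *file)
{
    static const int control_area = 0; /* any non-NULL address will do */

    assert(file != NULL);
    model_fs_acquire(file);
    if (file->sop.data_section == NULL)
        file->sop.data_section = &control_area;
    model_fs_release(file);
}
```

Starting from a file that has only an image section, the model records no data section at acquire and a populated one at release, which matches the two breakpoint hits above.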

-scott
OSR
@OSRDrivers

> FsRtlCreateSectionForDataScan calls the file system at the
> AcquireFileForNtCreateSection Fast I/O entry point. It then calls
> MmCreateSection to create the section, then calls the file system at the
> ReleaseFileForNtCreateSection Fast I/O entry point.

Thanks Scott, got it. I was incorrectly interpreting that there was
some actual active participation in the section creation process.

As part of registering for FsRtlRegisterFileSystemFilterCallbacks, I
was explicitly NULLing out the AcquireFileForNtCreateSection and
ReleaseFileForNtCreateSection Fast I/O entry points. The legacy
FILTER (not our file system / redirector) had code notes advising to
do this, because asking for both the FsFilter callback and the Fast
I/O callback had led to trouble.

It was really a moot point for our file system / redirector driver,
because we didn’t provide Fast I/O support anyway.

So ultimately the bottom line here is “your full file system driver /
redirector already had access to a callback notification that would
have told you about section creation, without needing to register for
the more granular FsFilter callbacks instead.” Not that the FsFilter
callbacks “should” have been problematic, but apparently as a full
file system driver, they’re probably not the right choice.

Thanks for clarifying, and I’ll take the approach of implementing the
Fast I/O callbacks instead.

Alan Adams
Client for Open Enterprise Server
Micro Focus
xxxxx@microfocus.com