(unexpected?) teardown of caching for some file

Gentlefolk

I have come across a curious ntfs/cc/vm behavior on W2K SP4 which I am
hoping someone can help me to understand. It also has an impact on context
tracking and might be of more general interest.

To make it concrete I will describe a specific although a little convoluted
scenario.

I have a 1 Gbyte file in my file system, lucky me. I have a user application
which runs as follows: open the file FILE_OVERWRITE_IF and (over)write the
whole file in serial order in 64 Kbyte pieces using non-cached writes, then
snooze sort of forever before rewriting the first 1 Mbyte of data in the file
and then closing the file handle. I have a thread in a filter driver which
runs as follows: open the file for SYNCHRONIZE access and read the first
16 Mbytes of data with roll-your-own irps to the fsd for cached reads, then
close the file, then nap for a couple of tenths of a second, then open the
file likewise, read the next 16 Mbytes of data likewise, and close the
file, then nap for a couple of tenths of a second, and so on and so forth
until the Gbyte has been read. I start the thread in the filter driver, then
a few seconds later start the user application.

For the first hundred or so megabytes of action in each thread I am seeing
that FileObject->SectionObjectPointer->DataSectionObject is non-NULL in
pre-close [aka dispatch of irp_mj_close] for all file objects which refer to
the example file. However, I then start to see that
FileObject->SectionObjectPointer->DataSectionObject is NULL in pre-close for
all file objects which refer to the example file. In addition I see that
FileObject->SectionObjectPointer->SharedCacheMap is also NULL. Looks to my
inexperienced eyes like, at some level of looking, there must have been
(unexpected?) teardown of caching for the file.

I am wondering if anyone else has seen this sort of thing and can illuminate
as to what is really happening.

Trouble is, if your filter driver is using the simplified context tracking
algorithm as described in the OSR NT Insider in 2002 [i.e. the method using
SectionObjectPointer->{Data,Image}SectionObject as a guard against early
discard of tracking structures] then, well, you are a bit out of luck,
because your guard just disappeared.

Cheers
Lyndon

Lyndon,

The behavior you describe is certainly possible given the scenario that
you have listed. It does not indicate a flaw in the ref. counting
scheme, however (this is one of the cases we considered for it.)

By mixing cached and non-cached I/O on the file, you are forcing the
file system to do some pretty crazy things in order to keep things in
synch - a write via the non-cached path forces a purge on the cache.

On a “typical” x86 box, the cache has a total of 2048 separate cached
views it can be using at one time. Each view is 256KB (which gives us
the 512MB cache space). Each view references the control area (“section
object”) that Mm uses to track the VM state (e.g. prototype PTEs and the
associated physical memory backing those PTEs.)

So, SOP->DataSectionObject is the Mm state for the file.
SOP->SharedCacheMap is the Cc state for the file. As the application
performs non-cached writes through, neither of these is set up. When
you perform cached I/O, both are set up. If the application performs
non-cached I/O through a region that is cached, the cache manager will
be asked to tear down the cache in that region of the file. Here is the
comment snippet from FastFat (write.c):

//
//  If this is a noncached transfer and is not a paging I/O, and
//  the file has been opened cached, then we will do a flush here
//  to avoid stale data problems.  Note that we must flush before
//  acquiring the Fcb shared since the write may try to acquire
//  it exclusive.
//
//  The Purge following the flush will guarantee cache coherency.
//

if (NonCachedIo && !PagingIo &&
    (FileObject->SectionObjectPointer->DataSectionObject != NULL)) {

And it does a cache flush (first) followed by a Purge. This (of course)
will force it to tear down that region in the cache and could lead to a
tear-down of the SharedCacheMap.

Similarly, there are other forces at work constantly reclaiming memory
(via Mm) and if you are writing large quantities of data, it is quite
likely that there is a large amount of memory in use, further thrashing
things about.

But as to your concern about reference counting: the algorithm only
guarantees that you will not prematurely discard FCB tracking
information. The point is that if Mm/Cc tear down their state, they
don’t have references so it is safe for you to discard that state - in
this case, you’d still have two file objects (one for the app that has
the file open, one for your file object you were using for I/O) so you
haven’t discarded the state. When both of those go away, you can safely
discard your FCB state because you won’t see any Mm/Cc based I/O because
they don’t have any references (if they did, it would have been
associated via SOP).

Indeed, if you look when you DO have Mm/Cc state, you will probably find
that it is *your* file object being used to back them (use
CcGetFileObjectFromSectionPtrs to retrieve this programmatically, or
!ca to dump the section object within the debugger.)

I hope this clarifies things.

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Lyndon J Clarke
Sent: Tuesday, May 11, 2004 7:55 PM
To: ntfsd redirect
Subject: [ntfsd] (unexpected?) teardown of caching for some file


Questions? First check the IFS FAQ at
https://www.osronline.com/article.cfm?id=17

You are currently subscribed to ntfsd as: xxxxx@osr.com To unsubscribe
send a blank email to xxxxx@lists.osr.com

Hi Tony

Gosh, thank you for such a clear and helpful answer. Regarding the reference
counting and context tracking I seem to be still a bit stuck, however. I need
to add one more little piece to the scenario before there is enough
information. Here we go …

I have seen for this FsContext one irp_mj_create from the user application
and one irp_mj_create outstanding from the filter driver thread. I then see a
*non*-stream irp_mj_close for neither of these file objects, but rather for a
'file object from nowhere' (I haven't seen an irp_mj_create for this file
object) which refers to the FsContext of interest. This decrements the
reference count to one. I then see another irp_mj_close, for the file object
in the filter driver thread. This decrements the reference count to zero. The
SectionObjectPointers are NULL, so the state is now discarded. However, there
is still an open file object (handle) for this file in the user application,
and the next thing I see is irp_mj_write for some FsContext unknown to me.
The context state discard was somewhat premature.

Just to be clear, of course the reference counting and context tracking
code: (i) cannot have knowledge about the user application and filter
driver thread here, it just deals with FsContexts and counts in the abstract;
(ii) does not remember file objects, since it uses the simplified
method from the NT Insider article.

In this slightly expanded scenario it is that irp_mj_close for the 'file
object from nowhere' which causes the reference count to drop to zero early.
I am not sure where that file object came from: maybe a filter driver below
me (for there is NAV below) uses a shadow device so my filter driver doesn't
see the irp_mj_create; maybe a filter driver above me rolls an irp_mj_create
and sends it straight to the fsd; maybe somewhere else altogether.

Kind regards
Lyndon
