Page write in transaction

Lyndon_J_Clarke-2 · August 23, 2008, 5:31pm

Gentlefolk

We’re aware that page write in transaction might be a bit of a challenge for
(mini) filter drivers at the moment. So I thought I’d share this
observation. The OS version is Windows Server 2008 (haven’t tried with
Vista).

1. Create a transaction.
2. Transacted create of some file, then write non-cached to a size of, oh
   say, 1MB.
3. Transacted open of the created file, map a view, update, oh say, half of
   the pages, flush the view, unmap, close.
4. Commit the transaction.
5. Close the transaction.

The page writes due to step 3 are delivered using a stream file object where
FltObjects->Transaction is NULL, for your convenience.

Nice eh?

Cheers,
Lyndon

rod_widdowson · August 25, 2008, 6:47am

Oh Unremitted joy. Another wonderful side effect of TxF that we have to try
to work around.

As a matter of interest, where are the writes going? And what happens to
undo them if the transaction is aborted?

Rod

OSR_Community_User · August 25, 2008, 1:24pm

This sounds like a potential TxF bug - after all, if I commit the
transaction and then the paging write fails, my durability guarantee has
been violated.

Of course, this would only show up as a bad recovery - which tends to
reinforce my observation that transactional systems are easy to build
and difficult to get right.

Tony
OSR

Lyndon_J_Clarke-2 · August 26, 2008, 8:21am

In the reduced test case …

1. Create a transaction.
2. Transacted create of some file, then write non-cached to a size of, oh
   say, 1MB.
3. Commit the transaction.
4. Close the transaction.

… I also see paging writes delivered to a non-transacted stream file object
for the file created above, for the first 1MB of data, and containing the
exact same distinctive data. It looks such a lot like TxF doesn’t honour
IRP_NOCACHE … did I miss some signpost somewhere along the road here?

OSR_Community_User · August 27, 2008, 1:40pm

Hello Lyndon,

This is a known problem and it was fixed in Win7. Of course, this doesn’t quite help you yet, so if you need this you can contact MS support and tell them you need a backport (tell them to ping me and it should speed things up a bit).

As you might have guessed, the problem stems from the fact that paging writes to a stream can happen on any FO (file object) for that stream, and as it happens, in that particular case the FO that was used was not enlisted in any transaction. It can also happen that the FO used for paging writes belongs to a completely different transaction, so FltObjects->Transaction might point to a different transaction than the one the write is going to.

This particular topic was covered in great detail in a talk Sarosh gave at IFS #18. The suggested workaround is to keep your own trans-locked state for files. I don’t know what you need the transaction for but my belief is that for any meaningful work that involves transactions any filter probably needs to keep trans-locked state for files anyway (but then of course I can’t imagine all possible types of minifilters so maybe you wouldn’t need trans-locked state otherwise … if it’s not sensitive information I’d love to hear about what your minifilter does and how it interacts with transactions).

Of course there is an FSCTL that tells you what the trans-locked state of a file is on the filesystem below, FSCTL_TXFS_GET_METADATA_INFO, but I’m not sure how useful it can be if you’re not monitoring all transaction states on the volume (which brings us to maintain your own trans-locked state …).

Regards,
Alex.
This posting is provided “AS IS” with no warranties, and confers no rights.
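
[Editor's sketch] The "keep your own trans-locked state" workaround mentioned above could look roughly like the following. This is a minimal, portable C model under stated assumptions, not Filter Manager code: `StreamKey`, `TxnId`, `TransLockTable` and the `tl_*` functions are all hypothetical names invented for illustration. In a real minifilter the state would presumably live in stream and transaction contexts and would need locking that is safe on the paging I/O path, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifiers: a stream is an opaque key, a transaction an
   opaque pointer. Neither is a real Filter Manager type. */
typedef uintptr_t   StreamKey;
typedef const void *TxnId;

#define MAX_LOCKED_STREAMS 64

typedef struct {
    StreamKey stream;
    TxnId     txn;      /* NULL marks a free slot */
} TransLockEntry;

typedef struct {
    TransLockEntry entries[MAX_LOCKED_STREAMS];
} TransLockTable;

void tl_init(TransLockTable *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_LOCKED_STREAMS; i++) {
        t->entries[i].stream = 0;
        t->entries[i].txn = NULL;
    }
}

/* Which transaction, if any, currently has this stream trans-locked? */
TxnId tl_owner(const TransLockTable *t, StreamKey stream)
{
    for (size_t i = 0; i < MAX_LOCKED_STREAMS; i++) {
        if (t->entries[i].txn != NULL && t->entries[i].stream == stream)
            return t->entries[i].txn;
    }
    return NULL;
}

/* Record a transacted open for write. Returns 0 on success (including a
   repeat lock by the same transaction), -1 if another transaction already
   holds the stream, the table is full, or txn is NULL. */
int tl_lock(TransLockTable *t, StreamKey stream, TxnId txn)
{
    if (txn == NULL)
        return -1;
    TxnId owner = tl_owner(t, stream);
    if (owner != NULL)
        return owner == txn ? 0 : -1;
    for (size_t i = 0; i < MAX_LOCKED_STREAMS; i++) {
        if (t->entries[i].txn == NULL) {
            t->entries[i].stream = stream;
            t->entries[i].txn = txn;
            return 0;
        }
    }
    return -1;
}

/* On commit or rollback, drop every stream the transaction had locked.
   Returns the number of entries released. */
size_t tl_release_txn(TransLockTable *t, TxnId txn)
{
    size_t released = 0;
    if (txn == NULL)
        return 0;
    for (size_t i = 0; i < MAX_LOCKED_STREAMS; i++) {
        if (t->entries[i].txn == txn) {
            t->entries[i].txn = NULL;
            released++;
        }
    }
    return released;
}

/* For paging I/O, ignore whatever transaction the file object carries
   (it may be NULL or belong to another transaction) and trust the table.
   For other I/O, use the file object's transaction as given. */
TxnId tl_effective_txn(const TransLockTable *t, StreamKey stream,
                       TxnId from_file_object, int is_paging_io)
{
    if (is_paging_io)
        return tl_owner(t, stream);
    return from_file_object;
}
```

The idea is that the filter calls something like `tl_lock` when it sees a transacted open for write, `tl_release_txn` when it is notified of commit or rollback, and `tl_effective_txn` in its write path, so that a paging write arriving on an unrelated file object is still attributed to the transaction that owns the stream.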

Lyndon_J_Clarke-2 · August 27, 2008, 2:42pm

Hi Alex

Thanks for your post. It is really appreciated!

I understand the point about the mm/cc captured file object and bound
transaction. It doesn’t seem to be too hard to ‘work around’ that feature. I
had been thinking along the lines you describe. If you don’t mind, I’ll mail
you off-line as follow up to some of your other comments/questions.

The second point, however, was that it is looking to me like TxF is not
altogether honouring the IRP_NOCACHE flag in IRP_MJ_WRITE (non paging). It
would be really helpful, to me at least, to know whether (a) TxF always
honours IRP_NOCACHE, and I’ve gone snafu in my experiments; (b) TxF never
honours IRP_NOCACHE, and I’ve not gone snafu in my experiments; (c) TxF
sometimes does and sometimes doesn’t honour IRP_NOCACHE (yikes); (d)
something else altogether.

Thanks again
Lyndon

OSR_Community_User · August 27, 2008, 5:15pm

Hi Lyndon,

So it turns out that ALL IO that goes on in a transaction is cached, regardless of what the user requests. And, after it was explained to me by the brilliant people that actually thought it out, it makes perfect sense. If the transaction commits you are guaranteed the IO made it to the disk by the transaction semantics and if it doesn’t (i.e. it rolls back) then it’s less work to undo the changes if they never made it to the disk. I have to say this is a pretty cool trick…

Please note that this might change in the future (but currently there are no plans to do so).

I hope this answers your question.

Regards,
Alex.
This posting is provided “AS IS” with no warranties, and confers no rights.

OSR_Community_User · August 27, 2008, 7:04pm

As long as there is a guarantee that the data will make it to disk at
some point, there’s no requirement that the data go to the location
specified by the user. The data is stored persistently in the log,
though.

Note that I’ve been warning people for a very long time that an
application requesting “non cached” behavior on files is making an
optional request to the underlying FSD (e.g., compressed files in NTFS).
Honestly, in our own data modification kit we don’t respect the
non-cached request either, although we fall back to write-through, since
that guarantees the write semantics.

The logical way to think about this is probably that when you do a
“flush buffers” call, you want a guarantee that the data really IS on
the disk. Whether it is in its final repository isn’t the requirement -
merely that when you try to read it back (even after a crash) you’ll get
the right data back. Provided that this guarantee is preserved, how it
is achieved isn’t really important.

Tony
OSR

Lyndon_J_Clarke-2 · August 27, 2008, 7:22pm

Hi Alex - Thanks, you’re a star! Cheers, Lyndon

Lyndon_J_Clarke-2 · August 27, 2008, 7:30pm

Hi Tony

Yes, of course, here is much wisdom. But, we know there are a few out there
writing, oh say, crypto filters, who will have some surprises when apparent
non cached i/o is in fact handled as cached. Yeah, perhaps they should be
using your kit, well if I was doing a crypto filter at the moment, well then
for sure I’d be having a look at your kit.

Hope to bump into you at DDC

Best Wishes
Lyndon

OSR_Community_User · August 27, 2008, 8:52pm

At this point, anyone that relies on understanding the underlying
caching state of the file system beneath them is broken if they sit on
top of RDR anyway. There’s a very ugly hack for pre-Vista, but that
hack does not work in Vista and beyond. This observation about
transactions merely underscores that good design does not rely upon the
underlying cache state of the file system.

Of course, as I opined in a talk that I gave recently, it is a serious
deficiency in certain network file systems that they won’t tell us about
state changes in their cache policy - for us, it is a matter of
performance, not correctness. I’m actually looking at approaches in
which we build a server side assist so we can exploit oplocks via RDR:
send an FSCTL to a filter on the server and have the server side take out
the oplock on our behalf. We’ll end up rewriting a fair chunk of code
that already exists in RDR and SRV. All is fair in performance
optimization.

I routinely now observe to people that one thing we are very good at in
our business is lying - we tell people what they want to hear, they
interpret it in the way they believe it is true and we do something they
never even envisioned. What’s fun about it is that engineers in general
are *terrible* liars, but we’re very, very good at rationalizing things.

Thanks for the plug Lyndon. Now to convince you that you need to do
data modifications for your product.

Tony
OSR