Oplock break on IRP_MJ_READ (Paging IO)/NTFS

Hello again,

I had some more questions related to oplock processing.

Does NTFS initiate an oplock break on IRP_MJ_READ for PAGING_IO? I ask
because the fastfat sources in winddk seem to suggest that PAGING_IO
reads do not lead to an oplock break. Could this be a potential
workaround, if my filter wants to read contents of a file before
denying it in IRP_MJ_CREATE which was received with
FILE_COMPLETE_IF_OPLOCKED? Otherwise, it seems, we have no option but
to let this create through.

I have also noticed that Srv sends this create (SrvNtCreateFile) from a
thread that holds some resource (with tag LSpm) exclusively, and then
tries to acquire the same resource in a different thread for oplock
break processing (SrvCompleteRfcbClose). So there doesn’t seem to be
any option other than letting the create through; blocking it would
cause a deadlock.

How does Srv select a thread to perform the oplock break processing
in? It seems to be a random one, rather than the one in which the FSD
initiated the completion of the original oplock IRP.

Faras

> Does NTFS initiate an oplock break on IRP_MJ_READ for PAGING_IO?

No…

> I ask
> because the fastfat sources in winddk seem to suggest that PAGING_IO
> reads do not lead to an oplock break. Could this be a potential
> workaround, if my filter wants to read contents of a file before
> denying it in IRP_MJ_CREATE which was received with
> FILE_COMPLETE_IF_OPLOCKED?

So long as you don’t mind getting stale data, yes.

Just asking - are you happy that you have your mind around the implications
of FILE_COMPLETE_IF_OPLOCKED and SRV and filters?

Thanks for your answer.

> Just asking - are you happy that you have your mind around the implications
> of FILE_COMPLETE_IF_OPLOCKED and SRV and filters?

I still have plenty of unanswered questions (my last 2 posts). Even
though oplocks seem to be well documented in MSDN and other resources,
there aren’t enough good examples of how to deal with them in filters,
especially filters that look at file content.

Could you please explain (at a high level) what your filter does ? Why does
it need to read the data of the file ?

In general the FILE_COMPLETE_IF_OPLOCKED flag should be treated by filters
as a “do not block this create” flag, as you have noticed.
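
A minimal sketch of that rule, modelled in plain user-mode C so it can be compiled outside the kernel. The flag value is the one from the WDK headers; CreateParams, PreCreateAction and decide_pre_create are hypothetical names used only for illustration, and a real minifilter would read the options from Data->Iopb->Parameters.Create.Options in its pre-create callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value as defined in the WDK headers. */
#define FILE_COMPLETE_IF_OPLOCKED 0x00000100u

/* Hypothetical stand-in for the Create parameters of the I/O request. */
typedef struct {
    uint32_t Options; /* high 8 bits: disposition, low 24 bits: create options */
} CreateParams;

/* Hypothetical outcome of the filter's pre-create decision. */
typedef enum {
    CHECK_INLINE,        /* blocking work (e.g. reading the file) is acceptable */
    DEFER_TO_POSTCREATE  /* must not block: let the create through first */
} PreCreateAction;

static PreCreateAction decide_pre_create(const CreateParams *params)
{
    assert(params != NULL);

    /* Mask off the disposition so only the create option flags remain. */
    uint32_t options = params->Options & 0x00FFFFFFu;

    return (options & FILE_COMPLETE_IF_OPLOCKED) ? DEFER_TO_POSTCREATE
                                                 : CHECK_INLINE;
}
```

The point of returning an action rather than doing the work inline is that the "do not block" decision is made once, before any code path that might wait on the oplock break.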

Reading your comments it seems that you’re trying to read the data in the
file before the oplock is even broken. There might be no data at all. Rod
hinted at this in his comment that you can read using PAGING_IO (or any
other read method for that matter) “So long as you don’t mind getting stale
data…”. Take this one step further and imagine a client that while it has
an oplock on the file it removes all the data in the file and writes all the
(potentially new) data before acknowledging the break. Even if your
minifilter could find a way to read the data, what would it see ?

There are some ways some types of products work around different OPLOCK
issues, which is why I’m asking what it is you are trying to do…

Thanks,
Alex.

In the ideal case, for every execution attempt (our detection of these
is naive and relies on access permissions), we want to read the contents
of the file and check whether the hash of the contents matches our
whitelist. If yes, we allow the create; otherwise we deny the execution.
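
To make the check concrete, here is a rough user-mode sketch of "hash the contents, look the hash up in a whitelist". Everything in it is an assumption for illustration: the function names are made up, the whitelist is a plain array, and FNV-1a is used only because it fits in a few lines. It is not a cryptographic hash, and a real product would presumably use something like SHA-256.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a, used here purely as a compact stand-in for a real digest. */
static uint64_t fnv1a64(const unsigned char *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

/* Linear scan over a hypothetical in-memory whitelist of digests. */
static bool whitelist_contains(const uint64_t *whitelist, size_t count,
                               uint64_t digest)
{
    for (size_t i = 0; i < count; i++) {
        if (whitelist[i] == digest) {
            return true;
        }
    }
    return false;
}

/* Allow execution only if the digest of the contents is whitelisted. */
static bool allow_execute(const unsigned char *contents, size_t len,
                          const uint64_t *whitelist, size_t count)
{
    assert(contents != NULL || len == 0);
    return whitelist_contains(whitelist, count, fnv1a64(contents, len));
}
```

Note that the answer is only as good as the bytes that were read, which is exactly the stale-data concern raised earlier in the thread.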

For various reasons, we don’t want to defer this validation to usermode.

In some cases we see these CREATES in the context of SRV but we still
want to validate them against the whitelist.

How can we do these content-based checks in CREATEs which have the
FILE_COMPLETE_IF_OPLOCKED flag set?

I agree, doing non-cache-coherent reads will have the issues you
mentioned i.e. seeing stale or no data.

Do you have ideas on how we can work around these issues?

Faras

On Fri, Sep 10, 2010 at 8:55 PM, Alex Carp wrote:
> Could you please explain (at a high level) what your filter does ? Why does
> it need to read the data of the file ?
>
> —
> NTFSD is sponsored by OSR
>
> For our schedule of debugging and file system seminars
> (including our new fs mini-filter seminar) visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer
>

So do you really see execution attempts where the IRP_MJ_CREATE is issued by
SRV ? I didn’t think this would ever happen on the local machine… Or are
these execution requests on a different machine and the file resides on the
system you’re running on ? How does this happen ?

I thought (but please correct me if I’m wrong) that when a file is opened
for execution, it does not share write. Did you find otherwise? If this is
true, then a successful IRP_MJ_CREATE for execution tells you for sure
that there are no other writable handles to the file, so if the file is
valid at that time, it is not going to change. This is a way to
work around the fundamental problem that the file contents might not be
up to date at the time when the IRP_MJ_CREATE reaches you…
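
The share-mode argument can be illustrated with a simplified user-mode model of the bookkeeping that IoCheckShareAccess performs. This is only a sketch: ShareState, OpenRequest and try_open are hypothetical names, execute access is folded into "read", and opens that request none of read/write/delete access are not treated specially.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-file counters, loosely modelled on the SHARE_ACCESS structure. */
typedef struct {
    unsigned open_count, readers, writers, deleters;
    unsigned shared_read, shared_write, shared_delete;
} ShareState;

/* What a new opener wants, and what it is willing to share. */
typedef struct {
    bool read, write, del;
    bool share_read, share_write, share_delete;
} OpenRequest;

/* Returns true and records the open if it is compatible with existing opens. */
static bool try_open(ShareState *s, const OpenRequest *r)
{
    assert(s != NULL && r != NULL);

    bool conflict =
        /* the new opener wants access that some existing opener does not share */
        (r->read  && s->shared_read   < s->open_count) ||
        (r->write && s->shared_write  < s->open_count) ||
        (r->del   && s->shared_delete < s->open_count) ||
        /* an existing opener holds access that the new opener does not share */
        (s->readers  != 0 && !r->share_read)  ||
        (s->writers  != 0 && !r->share_write) ||
        (s->deleters != 0 && !r->share_delete);

    if (conflict) {
        return false;
    }

    s->open_count++;
    s->readers       += r->read         ? 1u : 0u;
    s->writers       += r->write        ? 1u : 0u;
    s->deleters      += r->del          ? 1u : 0u;
    s->shared_read   += r->share_read   ? 1u : 0u;
    s->shared_write  += r->share_write  ? 1u : 0u;
    s->shared_delete += r->share_delete ? 1u : 0u;
    return true;
}
```

In this model, an execute-style open (read access, sharing read and delete but not write) fails while a writer exists, and once it has succeeded any later open for write fails. That is the property the argument above relies on.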

Can you please comment on these two questions ?

Thanks,
Alex.

So assuming that when a file is opened for execution it does not share
write, you can try something like this.

  1. on each normal IRP_MJ_CREATE (no FILE_COMPLETE_IF_OPLOCKED) check if the
    file matches the whitelist and store that information in a stream context
    (context->Matches = 1).
  2. on a write operation mark that the file no longer matches the whitelist
    (since it was just written to) (context->Matches = 0)
  3. on an IRP_MJ_CREATE with FILE_COMPLETE_IF_OPLOCKED that you care about
    (i.e. that you might want to not allow the open to proceed if the file
    doesn’t match the whitelist), you must let the create through and on
    postCreate, get the context and see if context->Matches == 1. If it does,
    allow the create. If it doesn’t, fail the create and schedule a different
    thread to close the newly opened file (I don’t think you can do it inline
    because I think it will be blocked behind the oplock… but you need to
    investigate this further).

I believe that this might work: since a file being opened for
execution does not share write, you are guaranteed that if the FS
succeeds the create then there is no other writer. So you know that the file
contents at that time won’t change while the newly created handle is around.
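
The three steps above can be sketched as a small state machine. This is a user-mode model, not minifilter code: StreamContext mirrors the context->Matches flag from the steps, while the function names and the CreateVerdict type are invented for illustration. It assumes a context already exists for the stream and does not cover the deferred close of a denied handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Models the per-stream context holding the cached whitelist result. */
typedef struct {
    bool Matches;
} StreamContext;

typedef enum { CREATE_ALLOW, CREATE_DENY } CreateVerdict;

/* Step 1: a normal create may validate the contents now and cache the result. */
static CreateVerdict on_normal_create(StreamContext *ctx,
                                      bool contents_match_whitelist)
{
    assert(ctx != NULL);
    ctx->Matches = contents_match_whitelist;
    return ctx->Matches ? CREATE_ALLOW : CREATE_DENY;
}

/* Step 2: any write invalidates the cached result. */
static void on_write(StreamContext *ctx)
{
    assert(ctx != NULL);
    ctx->Matches = false;
}

/*
 * Step 3: post-create for a FILE_COMPLETE_IF_OPLOCKED create.
 * The file is never read here; only the cached flag is consulted.
 */
static CreateVerdict on_oplocked_post_create(const StreamContext *ctx)
{
    assert(ctx != NULL); /* the "no context yet" case is not handled here */
    return ctx->Matches ? CREATE_ALLOW : CREATE_DENY;
}
```

For example, a file validated by a normal create is allowed by a later oplocked create, and is denied again as soon as on_write has run for it.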

Some limitations in this approach are:

  1. if a modified file would still match a whitelist entry
    (probably a different entry), you still might fail access because step 2
    doesn’t actually attempt to revalidate the file after the write. Perhaps
    you could work around this by ‘remembering’ (in a StreamHandleContext) to
    revalidate the file in Cleanup if there was a write to it?
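
A possible shape for that workaround, again as a user-mode model with hypothetical names: the per-handle context remembers that this handle wrote, and cleanup uses that to decide whether the file has to be rehashed. The rehash itself is left to the caller and passed in as a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-stream cached whitelist result (context->Matches in the steps above). */
typedef struct {
    bool Matches;
} StreamContext;

/* Per-handle state: did this particular handle write to the file? */
typedef struct {
    bool WroteData;
} StreamHandleContext;

/* A write invalidates the cached result and marks the handle as a writer. */
static void on_write(StreamContext *stream, StreamHandleContext *handle)
{
    assert(stream != NULL && handle != NULL);
    stream->Matches = false;
    handle->WroteData = true;
}

/* In cleanup: only handles that actually wrote need a rehash. */
static bool needs_revalidation_on_cleanup(const StreamHandleContext *handle)
{
    assert(handle != NULL);
    return handle->WroteData;
}

/* Record the outcome of the rehash performed by the caller. */
static void complete_revalidation(StreamContext *stream,
                                  StreamHandleContext *handle,
                                  bool contents_match_whitelist)
{
    assert(stream != NULL && handle != NULL);
    stream->Matches = contents_match_whitelist;
    handle->WroteData = false;
}
```

Splitting the decision from the rehash keeps the expensive read out of the common case where a handle is closed without ever having written.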

Anyway, this is what I could come up with without knowing more about your
solution (such as why is SRV trying to open a file for execution on the
local system ?)… Hope it helps a bit.

Thanks,
Alex.

Alex,

  1. Yes, these are remote executions which show up in the context of Srv.
  2. Yes, Image executions i.e. CreateProcessW… opens the file with
    FILE_SHARE_READ | FILE_SHARE_DELETE.

I get the general idea of what you are suggesting, but there are other
cases (e.g. execution of non PE files which is totally up to the whims
of the corresponding interpreter) which violate these assumptions.

There is also a case where (due to loading our filter on an already
running system) the first CREATE we see will be with
FILE_COMPLETE_IF_OPLOCKED. In this case we won’t have a stream context
set for the file. How do we do the content check to validate this
CREATE?

Also, do you know if there are other users of oplocks (apart from SRV),
and in particular, do they use FILE_COMPLETE_IF_OPLOCKED? Past posts
seem to suggest that the search indexer etc. do use oplocks, but do they
use this specific flag for deadlock avoidance?

Thanks for your suggestions.

On Sat, Sep 11, 2010 at 12:00 AM, Alex Carp
wrote:
> So assuming that when a file is opened for execution it does not share
> write, you can try something like this.