ANNOUNCEMENT: File System Filters and SMB2 Leasing (enhanced oplocks)

Folks,

In Windows 7, there have been significant enhancements to the existing oplock model. These enhancements were primarily
geared towards improving the performance of access to files over remote file systems (SMB-RDR).

At the past several IFS Plugfests, the file system team has talked about these enhancements to the existing oplock
model. In spite of this, we have found that some file system filters are not interoperating correctly with the model. This
is causing serious performance degradation in scenarios that involve access to files over remote file systems. I thought
it would be a good idea to give all of you some context behind these changes and explain how filters can coexist with
oplocks in Windows 7.

This will be especially relevant to AV (anti-virus) applications that often scan content when it is opened or modified by
remote clients over the SMB protocol.

Overview of Oplocks & Leasing

The SMB 2.1 revision of the SMB2 protocol introduced with Windows 7 includes a new feature called “leasing”. Leasing
can be thought of as an enhancement to the traditional oplock model.

Oplocks are used by SMB clients and servers as a mechanism to enforce distributed cache coherency. Conceptually, oplocks
allow SMB clients to cache reads, writes and open file handles. The traditional oplock model defines 3 oplock states:

a. Batch oplock - allows clients to cache reads, writes and open handles. Batch oplocks are exclusive - i.e. can be
granted only if there is 1 open handle to a given file.

b. Exclusive oplock – allows clients to cache reads and writes.

c. Level2 oplock – allows clients to cache reads. Level2 oplocks are shared, i.e. they can be granted to multiple handles.
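To make the three states concrete, here is a minimal sketch in portable C that maps each traditional oplock state to the caching it permits. The enum and function names are made up for illustration and are not taken from the WDK headers.

```c
#include <stdbool.h>

/* Caching rights modeled as bit flags (illustrative names, not WDK ones). */
enum {
    CACHE_READ   = 1 << 0,
    CACHE_WRITE  = 1 << 1,
    CACHE_HANDLE = 1 << 2
};

/* The three traditional oplock states, plus "no oplock". */
typedef enum {
    OPLOCK_NONE,
    OPLOCK_LEVEL2,
    OPLOCK_EXCLUSIVE,
    OPLOCK_BATCH
} oplock_state;

/* What a client holding the given oplock may cache. */
static unsigned oplock_cache_rights(oplock_state s)
{
    switch (s) {
    case OPLOCK_BATCH:     return CACHE_READ | CACHE_WRITE | CACHE_HANDLE;
    case OPLOCK_EXCLUSIVE: return CACHE_READ | CACHE_WRITE;
    case OPLOCK_LEVEL2:    return CACHE_READ;
    default:               return 0;
    }
}

/* Only Level2 can be held on several handles at once. */
static bool oplock_is_shared(oplock_state s)
{
    return s == OPLOCK_LEVEL2;
}
```

Note that nothing in this table lets several holders share read and handle caching at the same time.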

However, oplocks have 2 significant limitations which result in suboptimal caching behavior.

  1. Oplocks are maintained per open handle. This means that a second open handle to the same file, even one from the
    same SMB client, will break the oplock.

  2. Oplocks do not allow granular specification of caching intents. For example, there is no way for multiple clients to
    cache reads and open handles.

To work around some of these limitations, Windows 7 introduces leases (also referred to as enhanced oplocks). Leases
are different from traditional oplocks in the following ways:

  1. Leases allow clients to request any combination of (R)ead, (W)rite and (H)andle caching intentions. (We currently
    support only the RWH, RW, RH and R combinations.)

  2. Leases are associated with a “lease key” instead of a handle. This means that multiple handles can share the same
    lease as long as they share the same lease key. (In other words, a second open to the same file using the same
    “lease key” will not break the lease held on the first open.)
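The two differences can be sketched as a toy model in portable C. Everything here (the type names, the 16-byte key, the helper functions) is hypothetical and only mirrors the description above. In particular, the model treats any open with a different key as a potential lease break, which is a simplification: the real break rules also depend on the lease state and on the access being requested.

```c
#include <stdbool.h>
#include <string.h>

/* Caching intentions a lease can carry (illustrative names). */
enum { LEASE_R = 1, LEASE_W = 2, LEASE_H = 4 };

/* Only the R, RH, RW and RWH combinations are supported. */
static bool lease_combo_supported(unsigned state)
{
    return state == LEASE_R ||
           state == (LEASE_R | LEASE_H) ||
           state == (LEASE_R | LEASE_W) ||
           state == (LEASE_R | LEASE_W | LEASE_H);
}

/* A lease key is modeled as an opaque 16-byte value. */
typedef struct { unsigned char bytes[16]; } lease_key;

/* Toy per-file lease record: at most one lease, identified by its key. */
typedef struct {
    bool      held;
    unsigned  state;
    lease_key key;
} file_lease;

/* Grant a lease if none is held and the combination is supported. */
static bool lease_grant(file_lease *l, const lease_key *key, unsigned state)
{
    if (l->held || !lease_combo_supported(state))
        return false;
    l->held  = true;
    l->state = state;
    l->key   = *key;
    return true;
}

/* Simplified rule: an open using the same key never breaks the lease,
   an open using a different key may. */
static bool open_may_break_lease(const file_lease *l, const lease_key *opener)
{
    if (!l->held)
        return false;
    return memcmp(l->key.bytes, opener->bytes, sizeof l->key.bytes) != 0;
}
```

With a per-handle oplock the second check would have no key to compare, so every additional open would count as a conflict.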

Leasing provides a 30-35% reduction in network traffic as well as a similar increase in server scalability under a
variety of workloads. It is therefore critical that file system filters running on the server understand the semantics
associated with the leasing model and do not cause unexpected behavior or negate the performance improvements.

Filters & Leasing

Filters which run on the client side (i.e. filtering \Device\Mup above the remote file system stack) will not be affected
by the leasing changes. However, filters running on the server which monitor and take action on file opens need to be
leasing-aware so that they don’t negate the benefits of leasing. Specifically:

  1. Filters must handle the new FSCTL_REQUEST_OPLOCK fsctl similar to how they handle the traditional oplock fsctls.
    Filters can watch this FSCTL to observe lease requests issued by the SMB server, and the completion routines can be
    used to observe lease breaks.

  2. Filters which piggyback on a handle opened by srv2.sys to perform I/O to a file will continue to work without
    changes. This is the recommended way to avoid any interference with existing open handles and oplocks.

  3. Filters which open a new handle in response to an open from srv2.sys must associate their handle with the same
    “lease key” as the srv2 open. The lease key (or oplock key) is associated with a handle via an ECP attached to the
    create. The format of the ECP is as defined below:

typedef struct _OPLOCK_KEY_ECP_CONTEXT {

    //
    // The caller places a GUID of their own devising here to serve as
    // the oplock key.
    //

    GUID OplockKey;

    //
    // This must be set to zero.
    //

    ULONG Reserved;

} OPLOCK_KEY_ECP_CONTEXT, *POPLOCK_KEY_ECP_CONTEXT;

The ECP is identified by the following GUID and can be queried using the routines available to manipulate ECPs:

GUID_ECP_OPLOCK_KEY

The definitions for these structures are available in ntifs.h in the Windows 7 WDK.

The leasing-aware SMB2 server (srv2.sys) serving a leasing-aware client will attach the above ECP. The filter must
capture the oplock key from the ECP and use the same key when opening another handle to the file. Since ECPs are not
available from user mode, the recommendation for AVs that issue opens from user mode for scanning is that the AV filter
intercept the create request from its user-mode process and attach the oplock key ECP to it.

Note that a filter that blocks a create from a client while waiting for a lease break from the same client for the same
file will cause a deadlock on the client. (The SMB2 server will break the deadlock after 35 seconds to allow forward
progress, but a 35-second hang should be treated as a real deadlock and must be avoided.)
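The capture-and-reuse step for the oplock key can be illustrated with a small sketch. It is plain C with stand-in types (the SKETCH_ names are hypothetical) so that it builds outside the WDK. A real minifilter would use the GUID and OPLOCK_KEY_ECP_CONTEXT definitions from ntifs.h. To my understanding it would locate the ECP on the srv2 create with routines such as FltGetEcpListFromCallbackData and FltFindExtraCreateParameter, then attach a new ECP to its own create before calling FltCreateFileEx2. Please check the WDK documentation for the exact usage.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the Windows GUID type so the sketch builds anywhere. */
typedef struct {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} SKETCH_GUID;

/* Mirrors the layout of OPLOCK_KEY_ECP_CONTEXT described in the text. */
typedef struct {
    SKETCH_GUID OplockKey;
    uint32_t    Reserved;   /* must be set to zero */
} SKETCH_OPLOCK_KEY_ECP_CONTEXT;

/*
 * Fill the context for the filter's own open from the context captured
 * on the srv2 create. Returns 0 on success, -1 if an argument is missing.
 */
static int sketch_reuse_oplock_key(const SKETCH_OPLOCK_KEY_ECP_CONTEXT *from_srv2,
                                   SKETCH_OPLOCK_KEY_ECP_CONTEXT *for_own_open)
{
    if (from_srv2 == NULL || for_own_open == NULL)
        return -1;

    memset(for_own_open, 0, sizeof *for_own_open);
    for_own_open->OplockKey = from_srv2->OplockKey;  /* same key, same lease */
    for_own_open->Reserved  = 0;
    return 0;
}
```

The point of the sketch is only that the key is copied unchanged: an open carrying a different key would not share the lease with the srv2 open.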

  4. Filters must handle the FILE_COMPLETE_IF_OPLOCKED flag the same way as with traditional oplocks. In particular, they
    should not block a create which is issued with this flag.
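As a rough illustration of the rule about the flag named above (FILE_COMPLETE_IF_OPLOCKED in the WDK headers), the check a filter makes before deciding to hold a create could look like the sketch below. The helper name is hypothetical. The flag value is redefined only so the snippet builds outside the WDK. In a real filter the create options are the low 24 bits of the Options field in the create parameters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Redefined here only so the sketch compiles without the WDK headers. */
#ifndef FILE_COMPLETE_IF_OPLOCKED
#define FILE_COMPLETE_IF_OPLOCKED 0x00000100u
#endif

/*
 * Hypothetical policy helper: may the filter hold this create while it
 * waits for an oplock or lease break? Creates issued with
 * FILE_COMPLETE_IF_OPLOCKED must not be blocked.
 */
static bool sketch_may_block_create(uint32_t create_options)
{
    return (create_options & FILE_COMPLETE_IF_OPLOCKED) == 0;
}
```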

If you have questions or concerns surrounding this please send e-mail to: xxxxx@microsoft.com

The file system filter team plans to make available a comprehensive “Filter Driver Test Suite” that file system filters
can use to detect if they are regressing the functionality of the Windows platform in any way. This test suite will
include tests that will help validate that your filter is functioning correctly with respect to leasing as well. I
encourage you to download the suite and test your filter products with the test suite when it becomes available.

Also, at the upcoming IFS Plugfest in August, the remote file systems team will be on hand to help with any issues that
you may find or questions that you may have.

Regards,

File System Filter Team
Microsoft Corp.

This posting is provided “AS IS” with no warranties, and confers no Rights

As long as we are on the topic of oplocks and performance, I would like to bring up an issue that I have seen the OSR folks complain about multiple times:

MS needs to make available a way for a filter/layered FSD on the client to know what the underlying redirector is doing with regard to oplocks.

Because my filter is now handling all caching, the only way for me to get correct behavior (without writing a network filter to sniff SMB, which I assume is what OSR has done) is to open the shadow file with exclusive non-cached access to ensure that no oplock is taken out.

Much better behavior could be achieved if there was a callback mechanism as has been asked for previously by OSR and I’m sure others (add me too now!).

Thanks,
Matt

Matt,

We (File System Team and OSR) discussed this at the last IFS Plugfest. The conclusion we came to was that there is no
clean solution for this. We discussed the option of using write-through caching in the filter on the client (RDR-side)
and always flushing on close.

The other option would be to monitor oplocks on the server-side, and communicate breaks to the filter layer on the
client. This would be non-trivial to achieve.

Regards,
Sarosh.
File System Filter Lead
Microsoft Corp

This posting is provided “AS IS” with no warranties, and confers no Rights
