Monitoring Oplocks Activity

Hi all,

I’ve been working on a layered file system; as a full file system, it owns the FCB and CCB of the file objects opened on it. It interacts with the cache manager directly, calling CcInitializeCacheMap, and it receives paging I/O just like a regular file system.
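For context, the per-file-object cache setup described above typically looks like the sketch below. This is WDK kernel code and won’t compile outside a driver project; the MY_FCB layout, its size fields, and the MyFsCachedRead name are illustrative assumptions, not the actual implementation.

```c
/* Sketch of a cached read path in a file system that owns its FCBs.
 * Assumes an FCB that carries the file sizes and the lazy-write
 * callbacks registered with the cache manager. */
NTSTATUS MyFsCachedRead(PFILE_OBJECT FileObject, PMY_FCB Fcb,
                        PLARGE_INTEGER Offset, ULONG Length,
                        PVOID Buffer, PIO_STATUS_BLOCK IoStatus)
{
    /* Lazily connect this file object to the cache manager. */
    if (FileObject->PrivateCacheMap == NULL) {
        CC_FILE_SIZES sizes;

        sizes.AllocationSize  = Fcb->AllocationSize;
        sizes.FileSize        = Fcb->FileSize;
        sizes.ValidDataLength = Fcb->ValidDataLength;

        CcInitializeCacheMap(FileObject,
                             &sizes,
                             FALSE,                       /* no pin access */
                             &Fcb->CacheManagerCallbacks, /* lazy-write callbacks */
                             Fcb);
    }

    /* Satisfy the read from the cache; misses come back to the
     * file system later as paging I/O on the same file object. */
    if (!CcCopyRead(FileObject, Offset, Length,
                    TRUE /* can wait */, Buffer, IoStatus)) {
        return STATUS_PENDING; /* would need posting if Wait were FALSE */
    }
    return IoStatus->Status;
}
```

The paging reads and writes generated by Mm/Cc for this cache are then completed by forwarding the I/O to the lower file object, as described later in the thread.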

As a layered file system, it gets the actual data from a different file system using a different file object. In this scenario there are two file objects: the upper one, which belongs to my file system, and the lower one, which belongs to the underlying file system. These two file objects have different FCBs and different SectionObjectPointers as well.

Everything seems OK so far, but now I’m trying to use a network redirector as the underlying file system. The issue I’m facing is that when the underlying file is changed remotely, the cached data of the upper file is not updated.

I’m wondering if I’d need to implement something like oplocks between the local and the remote node myself to keep my cached data as consistent as the underlying file system’s.

Can I filter the oplock activity with a filter attached to a network redirector on the client side or are these IRPs only seen on the server side?

Regards,

Fernando Roberto da Silva
DriverEntry Kernel Development
http://www.driverentry.com.br

This is a long-standing complaint I have with the RDR implementation - it won’t share its oplock knowledge with anything above it. Thus, the only way to do this on the client side is to build a filter in the *network* stack and try to track the oplock state from there.

Otherwise, the only solution I’m aware of that works reasonably well is to use exclusive access to the file (we actually use a read+write, shared-read scheme and then fall back to read-only, shared read+write to improve compatibility with the MS Office products).
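The open-mode fallback Tony describes might be sketched as below. This is kernel code under stated assumptions: the OBJECT_ATTRIBUTES setup is elided, and the exact access/share combinations are just one plausible reading of the scheme.

```c
/* Sketch: first try read+write with shared read (denies remote
 * writers, so our cache stays coherent); on a sharing violation,
 * fall back to read-only with shared read+write. */
NTSTATUS status;
HANDLE handle;
IO_STATUS_BLOCK iosb;

status = ZwCreateFile(&handle,
                      GENERIC_READ | GENERIC_WRITE,
                      &objectAttributes,            /* initialized elsewhere */
                      &iosb,
                      NULL,                         /* AllocationSize */
                      FILE_ATTRIBUTE_NORMAL,
                      FILE_SHARE_READ,              /* no other writers */
                      FILE_OPEN,
                      FILE_SYNCHRONOUS_IO_NONALERT,
                      NULL, 0);

if (status == STATUS_SHARING_VIOLATION) {
    /* Fall back: read-only access, but share both read and write
     * so applications holding the file open (e.g. MS Office) still work. */
    status = ZwCreateFile(&handle,
                          GENERIC_READ,
                          &objectAttributes,
                          &iosb,
                          NULL,
                          FILE_ATTRIBUTE_NORMAL,
                          FILE_SHARE_READ | FILE_SHARE_WRITE,
                          FILE_OPEN,
                          FILE_SYNCHRONOUS_IO_NONALERT,
                          NULL, 0);
}
```

In the read-only fallback case the file can change under you, which is why this approach is not transparent, as noted below.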

This isn’t transparent, but it does provide correct behavior.

Tony
OSR

Hi Tony,

Would filtering oplock activity on the network layer require different implementations for different redirectors? I mean, filtering oplocks on SMB, NFS and so on…

Additionally, I’ve been thinking of changing the way the actual data is acquired. Instead of having two different file objects in a layered model, I’d just return STATUS_REPARSE from my virtual volume to the actual one. By doing so, the cached data would already be maintained under the existing oplock. But after reading some of the great posts from Alex Carp about the problems STATUS_REPARSE can cause, I’m afraid of fixing one bug and introducing several others.

http://fsfilters.blogspot.com.br/2012/02/problems-with-statusreparse-part-i.html
http://fsfilters.blogspot.com.br/2012/02/problems-with-statusreparse-part-ii.html
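For reference, the reparse technique in IRP_MJ_CREATE looks roughly like the sketch below. MyFsBuildTargetPath is a hypothetical helper, and the name-buffer management is simplified; a real implementation must allocate the new name from paged pool and handle failures.

```c
/* Sketch of returning STATUS_REPARSE from IRP_MJ_CREATE so the
 * I/O manager reissues the create against the real volume. */
NTSTATUS MyFsCreate(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    PIO_STACK_LOCATION irpSp = IoGetCurrentIrpStackLocation(Irp);
    PFILE_OBJECT fileObject = irpSp->FileObject;
    UNICODE_STRING target;

    /* Build the equivalent name on the actual volume
     * (hypothetical helper; allocates target.Buffer). */
    MyFsBuildTargetPath(&fileObject->FileName, &target);

    /* Hand the new name back to the I/O manager... */
    ExFreePool(fileObject->FileName.Buffer);
    fileObject->FileName = target;

    /* ...and ask it to restart the open against the real path. */
    Irp->IoStatus.Status = STATUS_REPARSE;
    Irp->IoStatus.Information = IO_REPARSE;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return STATUS_REPARSE;
}
```

After the reparse, all caching happens on the lower file system’s file object, which is what makes the coherency problem go away for that open.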

Thanks for your help.

Regards,
Fernando.

Assuming the redirector had a cache coherency scheme, yes, it would require filtering each one. For example, NFS traditionally used a timeout-based scheme (NFSv4 is different in this regard), and I’m not sure what DAV uses at all.

The STATUS_REPARSE technique does work and overcomes issues in older OS platforms. When I first started building filters, we used the layered filter approach; we moved to the parallel file system approach (STATUS_REPARSE) because there were too many broken Windows components that would bypass our filter and pass our file object to the underlying FSD.

For example, we’ve seen at least one instance of this as recently as XP - from Filter Manager - where it passed the primary file object to NTFS (or FAT) rather than calling the filter so the file object could be properly translated. That caused the system to crash. My favorite part of this bug was that it generally only showed up when someone else’s filter called a particular API.

But in newer versions of the OS Microsoft has cleaned this up. While we STILL see components passing our file object down, it’s becoming far less common, and in general we can convince the owner to fix it. However, I’ve seen CUSTOMERS who refuse to update the other broken component, and this becomes a fire drill that requires “fixing” the problem in some ugly fashion.

But STATUS_REPARSE won’t fix any issues with cache coherency.

Tony
OSR

As far as I could understand, by using the layered model I have to deal with the cache manager myself, calling CcInitializeCacheMap and completing the paging I/O sent by Mm/Cc by reading from and writing to the underlying file system.

The underlying file system has its own cache, which is different from mine. If a user of my file system mapped the file’s contents into its address space, it would map different pages from those used by the underlying file system’s cache.

Supposing the underlying file system is a redirector, its cache would be flushed/purged whenever an oplock break requires it. Since this activity is not visible to my file system, my file object would still keep the old data in its cached pages, and subsequent read requests would still return the stale data.
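If the oplock break were visible to the upper file system, the invalidation it would need to perform on its own cache is roughly the standard flush-then-purge pattern sketched below. The Fcb naming is an assumption; the SectionObjectPointers here belong to the upper FCB, not to the redirector’s file object.

```c
/* Sketch: invalidate the upper file system's cached pages so the
 * next read goes back to the underlying (remote) file system. */
IO_STATUS_BLOCK iosb;

/* Write back any dirty cached pages first... */
CcFlushCache(&Fcb->SectionObjectPointers,
             NULL,    /* whole file */
             0,
             &iosb);

/* ...then drop the now-stale pages. Note this can fail (returns
 * FALSE) if the file is currently mapped by a user process. */
CcPurgeCacheSection(&Fcb->SectionObjectPointers,
                    NULL,    /* whole file */
                    0,
                    FALSE);  /* keep the cache map initialized */
```

The mapped-file case is the hard part: pages already mapped into a user address space cannot be purged out from under the application, which is exactly the coherency gap described above.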

By using STATUS_REPARSE from my virtual volume to the actual one, there would be only one file object and only one cache map. The file object used by an application that opened it from my namespace would actually belong to the underlying file system.

If the underlying file system were a redirector and a flush/purge occurred due to an oplock break, then whether or not my filter was aware of that activity, the application would get the most recent data on subsequent read requests, fetched from the server.

Am I missing anything?

Thanks for your help,
Fernando.

I believe that the semantics for remote files are different enough from local files that trying to implement the approach you’ve suggested might not work. I haven’t tried this myself but I’ve heard some of the stories…

However, I agree with the general concept that you could address the cache coherency by using this scheme, though I’m afraid other things would stop working.

Thanks,
Alex.