Hi Everyone,
I have a layered file system (implemented as a minifilter) which provides cache manager integration. I’m struggling to understand if there is a workable solution for receiving oplock breaks for the underlying CIFS files. Years ago I understood the answer to be “no” going off of info provided in posts such as these: https://community.osr.com/discussion/171306/intercepting-oplock-requests-in-a-minifilter
But recently, while stepping through rdbss internals I came across FsRtlUpperOplockFsctrl() for the first time (apparently introduced with Win 8.1). When I open the CIFS target file and then send an FSCTL_REQUEST_OPLOCK down (using FltPerformAsynchronousIo so I can register a completion callback) I always get a status of STATUS_OPLOCK_NOT_GRANTED in my callback. Stepping through rdbss!RxOplockRequest() I see that it is passing my IRP along with its own oplock to FsRtlUpperOplockFsctrl(). That function is returning the STATUS_OPLOCK_NOT_GRANTED which is getting passed along to my callback.
The documentation for FsRtlUpperOplockFsctrl and FsRtlCheckUpperOplock both explicitly say they are designed to allow oplock checking in layered file systems. But will these work in my scenario and with CIFS specifically? I cannot find any mention of the APIs on these forums, or in github examples, or anywhere other than the two MSDN pages.
I think it is clear, but for the inevitable “what are you trying to do?” question, I’m needing to invalidate my caching of a CIFS file on host 1 when an update on host 2 breaks the oplock. Rdbss itself would know about this, but it isn’t the layer providing the cache manager integration.
Thanks!
-JT
I’ve never been able to get anything to work and I’ve been needing this facility for 25 years (yes, before nt4) but it would be awesome if someone could point the way.
Layered oplocks appear to have been de-emphasised of late.
Thanks for weighing in, Rod. Knowing your depth of knowledge I’m not feeling confident that there is a solution. Have you explored the APIs I mentioned to rule them out as a possible means to a solution?
-JT
Not recently. Things dragged me in other directions so I have lost touch with this particular thread. Seeing this post re-ignited my interest and I’d be fascinated to hear what you discover, or what others have to say. The fact that rdbss is in the loop is promising, but the lack of filter manager support might make for a lot of pain for many.
I’ve continued to dig into this but have not made any real progress. The more I re-read the MSDN blurbs and look at the rdbss code it feels almost like rdbss is the upper FS in this arrangement. Originally I was hoping that my layered FS could be the upper and would coordinate the oplock break with the lower (rdbss). But if rdbss is the upper who is the lower? And is it possible to add another layer? I.e. could you have A and B acting as lower and upper, and B and C acting as lower and upper, thus B is both an upper and a lower?
Is there anyone from MSFT who can weigh in on the intended usage of these APIs and whether CIFS can play with an upper layered file system?
Nice find! I didn’t know about these APIs so I did my own digging around with the debugger and some test code. I don’t have any definitive answers, but I can say that the way this currently works would not support the use case that you (and we) need.
Here’s the behavior I’m seeing:
- RDBSS now handles oplock FSCTLs on the Client (which is new).
- The oplock FSCTLs are only supported if they’re coming from kernel mode. I patched this check out in the debugger so that I could play with it using FileTest.
- As part of these changes RDBSS was also instrumented to make all the correct FsRtl oplock calls to break due to activity on the Client. I have confirmed this works by being granted an oplock on the Client and then breaking it with activity on the Client.
- I believe in this case the “Upper Filesystem” is the Client and the “Lower Filesystem” is the Server. In other words, FsRtlUpperOplockFsctrl will grant the oplock to the requestor on the Client if the oplock is compatible with the oplock RDBSS has on the Server. RDBSS “knows” the current oplock it holds on the Server via the FCB and passes it to the FsRtlUpperOplockFsctrl API.
- FsRtlCheckUpperOplock is designed to be called by RDBSS to break the Client oplock when the state of the oplock on the Server changes. This is a critical part we would need and I have been unsuccessful in making it happen. RxDowngradeOplockState executes on Server oplock breaks and there’s a path that calls FsRtlCheckUpperOplock but it never fires in my basic test cases.
So, as implemented, you can get oplocks on the Client and activity on the Client will break them, but not activity on the Server (at least in my simple test cases). If my research is accurate I’m not exactly sure what the use case is here.
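To make the compatibility idea from the list above concrete, here is a small user-mode C sketch. It is only a toy model, not the FsRtl implementation: the function name `upper_request_is_compatible` is made up, and the subset rule (every caching right requested on the Client must already be held on the Server) is my assumption about what “compatible” means. The three `OPLOCK_LEVEL_CACHE_*` values mirror the definitions in the Windows headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cache-level bits used with FSCTL_REQUEST_OPLOCK (values mirror the Windows headers). */
#define OPLOCK_LEVEL_CACHE_READ   0x00000001u
#define OPLOCK_LEVEL_CACHE_HANDLE 0x00000002u
#define OPLOCK_LEVEL_CACHE_WRITE  0x00000004u

/*
 * Hypothetical helper, not a real API.
 * ASSUMPTION: an upper (Client-side) request is compatible only if it asks
 * for no caching right that the lower (Server-side) oplock does not hold.
 */
static bool upper_request_is_compatible(uint32_t requested, uint32_t lower_held)
{
    /* Any bit set in 'requested' but clear in 'lower_held' makes the request incompatible. */
    return (requested & ~lower_held) == 0;
}
```

Under that assumption, asking for RH while the redirector holds RWH on the Server would be compatible, while asking for RW while it holds only R would not, which would line up with the STATUS_OPLOCK_NOT_GRANTED seen earlier in the thread.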
I’m not exactly sure what the use case is here
The usual reason people use oplocks these days (if they are not writing a remote caching subsystem) is to allow things like search indexers to take long lived handles and to “get out of the way seamlessly” when some other activity happens. There was a flurry of things doing that coming out of Redmond about 10 years ago or so.
So if I had to SWAG I’d say it was in support of something like that.
Or I suppose it might be explorer (which IIRC uses a combination of DCN and directory oplocks these days)
Thanks for joining in Scott and for doing some experimenting of your own.
@“Scott_Noone_(OSR)” said:
4. I believe in this case the “Upper Filesystem” is the Client and the “Lower Filesystem” is the Server. In other words, FsRtlUpperOplockFsctrl will grant the oplock to the requestor on the Client if the oplock is compatible with the oplock RDBSS has on the Server. RDBSS “knows” the current oplock it holds on the Server via the FCB and passes it to the FsRtlUpperOplockFsctrl API.
Yeah that has become my sense as well. That their usage of “layered file system” doesn’t seem to involve multiple layers on the same system, so definitely a bit different from what we typically call a layered file system around here.
- FsRtlCheckUpperOplock is designed to be called by RDBSS to break the Client oplock when the state of the oplock on the Server changes. This is a critical part we would need and I have been unsuccessful in making it happen. RxDowngradeOplockState executes on Server oplock breaks and there’s a path that calls FsRtlCheckUpperOplock but it never fires in my basic test cases.
That’s what I’m seeing as well. No changes by another CIFS client, or on the server directly, are triggering a break that I get notified of. Although I wasn’t completely surprised given the params passed to the APIs. For example, FsRtlCheckUpperOplock takes an oplock, which is rdbss’s internal oplock, but how would there be any association between that oplock and one I am holding? The OS knows that oplock 1 is associated with rdbss’s FCB, and it also knows that oplock 2 is associated with my FCB, but I cannot see any way to “bond” those two together such that a break of oplock 1 cascades to a break of oplock 2. Though the whole upper/lower/layered terminology in the MSDN pages sure makes it feel like that’s the purpose of these APIs!
For example, FsRtlCheckUpperOplock takes an oplock, which is rdbss’s internal oplock, but how would there be any association between that oplock and one I am holding?
The terminology is very confusing and appears to be unique to these APIs and doc pages, so that’s great…
My guess for the expected flow here:
- Driver on Client A opens a file on Server X. As part of opening the file, RDBSS does its on the wire thing to get an oplock on the server
- Driver wants to cache file data, so it sends down the FSCTL to request an oplock
- RDBSS calls FsRtlUpperOplockFsctrl to see if the oplock is compatible with what it currently has on the server. If not, you get an error. If yes, driver gets back STATUS_PENDING and the oplock is granted
- Driver now caches data and goes on its merry way
- Someone on Server X twiddles the file in such a way that it invalidates RDBSS’ oplock. This triggers the on the wire oplock break protocol
- As part of handling this oplock break, RDBSS could (but apparently does not) call FsRtlCheckUpperOplock to see if the driver requested oplock also needs to break.
- Let’s say in this case the break needs to happen…FsRtlCheckUpperOplock returns STATUS_PENDING and the FsRtl package proceeds to break the driver’s oplock by completing the IRP.
- The driver then does any processing necessary to handle the break and sends the break acknowledge. Once this happens the OPLOCK_WAIT_COMPLETE_ROUTINE passed to FsRtlCheckUpperOplock fires and RDBSS can finish acknowledging the break on the server
Basically the “upper oplock” is the one requested by the driver and handled by the FsRtl. The “lower oplock” is the on the wire one managed by RDBSS. This seems very specific to getting FsRtl oplocks layered on top of RDBSS and not a generic mechanism
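The guessed flow above can be sketched as a tiny user-mode state machine in C. This is purely an illustration of the upper/lower relationship as I understand it, not kernel code and not how RDBSS or the FsRtl package are actually written. Every name here (`layered_oplock`, `request_upper`, `lower_break`, `ack_upper`, `record_break_level`) is hypothetical, and the subset rule for compatibility is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cache-level bits (values mirror the Windows headers). */
#define OPLOCK_LEVEL_CACHE_READ   0x00000001u
#define OPLOCK_LEVEL_CACHE_HANDLE 0x00000002u
#define OPLOCK_LEVEL_CACHE_WRITE  0x00000004u

/* Hypothetical callback: stands in for the driver's pended oplock IRP completing. */
typedef void (*break_notify_fn)(void *ctx, uint32_t new_level);

/* Hypothetical model of one file's oplock state on the Client. */
typedef struct {
    uint32_t        lower_level;   /* on-the-wire oplock held by the redirector */
    uint32_t        upper_level;   /* oplock granted to the layered driver */
    bool            break_pending; /* upper break sent, acknowledge not yet received */
    break_notify_fn notify;
    void           *notify_ctx;
} layered_oplock;

/* Driver requests an oplock: granted only if no upper oplock is already
 * held and the request is a subset of the lower level (ASSUMPTION). */
static bool request_upper(layered_oplock *o, uint32_t requested,
                          break_notify_fn notify, void *ctx)
{
    if (o->upper_level != 0 || (requested & ~o->lower_level) != 0)
        return false;               /* models an error such as "not granted" */
    o->upper_level = requested;
    o->notify = notify;
    o->notify_ctx = ctx;
    return true;                    /* models the request staying pended */
}

/* The server breaks the lower oplock. Returns true if the caller has to
 * wait for the upper acknowledge before answering the server. */
static bool lower_break(layered_oplock *o, uint32_t new_lower)
{
    o->lower_level = new_lower;
    if ((o->upper_level & ~new_lower) == 0)
        return false;               /* upper oplock is still compatible */
    o->break_pending = true;
    if (o->notify != NULL)
        o->notify(o->notify_ctx, o->upper_level & new_lower);
    return true;
}

/* Driver has flushed or purged its cache and acknowledges the break. */
static void ack_upper(layered_oplock *o, uint32_t acked_level)
{
    o->upper_level = acked_level & o->lower_level;
    o->break_pending = false;       /* the lower break can now be acknowledged */
}

/* Example driver-side callback: just records the level it was broken to. */
static void record_break_level(void *ctx, uint32_t new_level)
{
    *(uint32_t *)ctx = new_level;
}
```

In this model the interesting step is `lower_break`: it is the point where the real system would need to call FsRtlCheckUpperOplock, and it is exactly the step that does not seem to fire in the tests described in this thread.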
@rod_widdowson said:
I’m not exactly sure what the use case is here
The usual reason people use oplocks these days (if they are not writing a remote caching subsystem) is to allow things like search indexers to take long lived handles and to “get out of the way seamlessly” when some other activity happens. There was a flurry of things doing that coming out of Redmond about 10 years ago or so.
So if I had to SWAG I’d say it was in support of something like that.
Or I suppose it might be explorer (which IIRC uses a combination of DCN and directory oplocks these days)
That’s what I thought but you can’t do it from user mode…I thought maybe Defender but that doesn’t appear to use this. It’s a mystery.
If they’d ever have a plugfest again maybe we could get some answers
Hi there.
We also have a layered file system over here, but as a legacy full file system driver, like FASTFAT is. We already use oplocks to implement cache integration and to handle potential changes to the underlying file system files.
Now, we are implementing oplocks on our file system, so that drivers using our file system could request oplocks on us, but that depends on the oplock we hold on the underlying file system.
My understanding is the same as Scott mentioned on his guess above. I’m finishing my changes here and I’ll start testing soon. I’ll share my findings when I get it done.
Please, keep updating this thread.
Regards,