Can FSD Behavior Affect Client Redirector Cache Behavior?

I’ve got an issue with a client application written in .NET (see https://social.msdn.microsoft.com/Forums/vstudio/en-US/0927a63e-b5c0-49c5-b68b-56f0a96cd81d/directorycreatedirectory-throws-ioexception-quotfile-or-directory-with-the-same-name-already?forum=netfxbcl) running against our FSD. When multiple threads all create the same directory structure, the app throws an exception on one or more threads when a directory already exists. The documentation for the .NET function Directory.CreateDirectory says it shouldn’t do that, and from reading the source code for the function I’ve pretty much confirmed as much (again, see the link for details).

So I monitored on the server side and can see STATUS_OBJECT_NAME_COLLISION being returned to the System process on behalf of lanmanserver, which makes sense. The odd thing, however, is that if I point the app at a remote NTFS drive on the same server instead of a drive backed by our FSD, I see the exact same error, BUT I also see the client ask for the attributes of the directory immediately afterwards. Again, looking at the source code for the function, that makes sense: it is trying to determine whether the existing entry is a directory.

Eventually I tried disabling the SMB2 redirector cache by setting FileInfoCacheLifetime to 0, and the problem went away. With the cache disabled, the client asks for the directory attributes immediately after the error, and once it determines the entry is a directory, the function succeeds.

The odd thing is that if you look at the source code of the function (https://referencesource.microsoft.com/#mscorlib/system/io/directory.cs,e3885fef9f6af9de), the only way it should return that error is if there is a FILE and not a DIRECTORY at that same path. I’ve looked at that code through and through, and that’s the only code path that would produce it. So it implies that our FSD is reporting the entry as a FILE. I’ve analyzed this through hundreds of test runs, and nowhere is our FSD ever returning attributes on these entries that say they are a file. So even if the redirector is caching attributes for that entry, it should be telling the client app that it’s a directory, right?
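To make that reasoning concrete, here is a minimal C model of the decision as I read it: a name collision is harmless only if the attributes the client sees carry the directory bit. This is a sketch of my understanding, not the actual .NET code; `create_result` and `create_directory_succeeds` are hypothetical names, and only the 0x10 bit value matches the real FILE_ATTRIBUTE_DIRECTORY.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit value as Win32 FILE_ATTRIBUTE_DIRECTORY. */
#define ATTR_DIRECTORY 0x10u

/* Hypothetical outcome of the underlying create call. */
typedef enum {
    CREATE_OK,
    CREATE_ALREADY_EXISTS,   /* server returned a name collision */
    CREATE_OTHER_ERROR
} create_result;

/*
 * Returns true when the caller should treat the directory create as a
 * success. attrs_available says whether the follow-up attribute query
 * worked; attrs is whatever that query reported, which may come from the
 * redirector's cache rather than from the server.
 */
bool create_directory_succeeds(create_result r, bool attrs_available,
                               uint32_t attrs)
{
    if (r == CREATE_OK)
        return true;
    if (r != CREATE_ALREADY_EXISTS)
        return false;
    /* Collision: fine only if the existing entry is a directory. */
    return attrs_available && (attrs & ATTR_DIRECTORY) != 0;
}
```

With this model, a collision followed by attributes 0x10 succeeds, while the same collision followed by attributes without that bit (for example 0x20, archive) fails, which is the outcome the app is seeing.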

There is one glaring difference between NTFS and our FSD: the former supports oplocks on directories and we do not. We took the path that CDFS takes and just return STATUS_INVALID_PARAMETER if the target is a directory. Could that be what is causing the client’s redirector to not refresh the metadata for that directory?

Finally, can someone explain why lanmanserver would use FSCTL_REQUEST_OPLOCK with NTFS and FSCTL_REQUEST_OPLOCK_LEVEL_2 with our FSD?

I’ll start by saying I don’t have answers, but this sounds weird so I’m
interested :)

Can you share the ProcMon logs from the server side? It’s unfortunate that
you can’t get a trace on the client side, that could be interesting.

Have you tried getting a Wireshark trace? Maybe the SMB traffic will point
to why the redirector is caching your attributes as a directory. Also,
according to these slides the cache is based on a FID returned in the create
response:

https://www.snia.org/sites/default/orig/sdc_archives/2010_presentations/wednesday/DavidKruse_Analyzing_Metadata_Caching_Windows_SMB2_Client.pdf

Do you see a FileInternalInformation query as part of the sequence?

I’m also very curious as to why you’re not seeing FSCTL_REQUEST_OPLOCK. My
target system is Win10 16229 and when I access an NTFS share from a Win10
workstation I see FSCTL_REQUEST_OPLOCK coming from SRV to get a lease on the
directory (srv2!Smb2LeaseAcquire). There are a bunch of paths here that
would result in SRV alternatively requesting an oplock
(srv2!Smb2RequestOplock), which looks like you could potentially see as a
FSCTL_REQUEST_OPLOCK_LEVEL_2*.

I don’t have it in me at the moment to try to figure out why the lease path
would be circumvented, though I’m interested…Do you EVER see an
FSCTL_REQUEST_OPLOCK from SRV?

*More about oplock versus lease here:
https://blogs.msdn.microsoft.com/openspecification/2009/05/22/client-caching-features-oplock-vs-lease/.
In the end, it’s oplocks all the way down…

-scott
OSR
@OSRDrivers

THAT’S IT! Looking at FileInternalInformation, the values returned are identical for a short period of time, including for a FILE that is created just after the directory. So the cache must be keying on that value, holding the file’s attributes for the directory, and not going to the server!

It turns out the reason is that we refactored our app recently and our file IDs grew from 8 to 12 bytes. When responding to FileInternalInformation, we return only the first 8 bytes of the file ID, and for files created within the same second those first 8 bytes can be identical.

The solution turns out to be rejiggering which 8 bytes we return so that uniqueness is guaranteed. With the cache turned on, we no longer get this issue!
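For anyone hitting the same thing, here is a rough C sketch of the failure and the kind of fix described. The 12-byte layout is purely an assumption for illustration (4 bytes of creation time in seconds, a 4-byte volume tag, a 4-byte counter); our real layout and the bytes we ended up choosing are not shown here, and `file_id12`, `make_id`, `index_first8` and `index_time_and_counter` are hypothetical names. The only fixed point is that FileInternalInformation carries a single 64-bit index number, so 12 bytes have to be squeezed into 8 somehow.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 12-byte on-disk file ID. */
typedef struct { uint8_t bytes[12]; } file_id12;

/* Assumed layout: [0..3] creation seconds, [4..7] volume tag, [8..11] counter. */
file_id12 make_id(uint32_t seconds, uint32_t volume_tag, uint32_t counter)
{
    file_id12 id;
    memcpy(id.bytes + 0, &seconds, 4);
    memcpy(id.bytes + 4, &volume_tag, 4);
    memcpy(id.bytes + 8, &counter, 4);
    return id;
}

/* The buggy mapping: take the first 8 bytes and drop the counter. */
uint64_t index_first8(const file_id12 *id)
{
    uint64_t v;
    memcpy(&v, id->bytes, 8);
    return v;
}

/* One possible fix under the assumed layout: keep seconds and counter. */
uint64_t index_time_and_counter(const file_id12 *id)
{
    uint8_t picked[8];
    uint64_t v;
    memcpy(picked, id->bytes, 4);          /* creation seconds */
    memcpy(picked + 4, id->bytes + 8, 4);  /* per-volume counter */
    memcpy(&v, picked, 8);
    return v;
}
```

Under this assumed layout, a directory and a file created in the same second on the same volume get the same value from `index_first8` but different values from `index_time_and_counter`. Whether a given choice of 8 bytes really guarantees uniqueness depends entirely on how the full ID is generated.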

Regarding the oplock thing, my server is actually Win10E 1709 and my client is Server 2016 1607. Our driver is compiled against the Win7 DDK (7600.16385.1). Do you think this could be why we never see FSCTL_REQUEST_OPLOCK? In any case, that’s now immaterial to the topic at hand. If I ever need to track that down, I’ll post a new topic.

Thank you, thank you, thank you!