Volume UpperFilter driver - IRP_MJ_WRITE buffer contents change

I have a volume UpperFilter driver that captures IRP_MJ_WRITE buffer contents that are coming down from NTFS to the volume. I’m making a copy of the write data in my dispatch routine, using MmGetSystemAddressForMdlSafe(pIrp->MdlAddress, NormalPagePriority) to get a pointer to the data.

Occasionally, under heavy load, I see that the system buffer returned by MmGetSystemAddressForMdlSafe contains different data than what actually gets committed to the volume.

I know that, with Direct I/O, it’s possible that the user can change the buffer contents while it’s being sent. I’m assuming that’s what is happening here - NTFS is constructing the buffer after initiating the write request, and we’re seeing the data before it is finalized. If that’s true, then there must be some mechanism that NTFS uses to notify the disk driver that the data is complete and ready to be committed.

Assuming that’s all correct, is there some way that our filter driver can use the same mechanism to wait for the buffer to be complete? Or am I off base with my analysis?

NTFS is constructing the buffer after initiating the write request, and we’re seeing the data before it is finalized.

Clever, but there is no such mechanism in Windows. What you see is what gets written… depending, of course, on any other filters and drivers that are below you. It is also possible that the user is changing the contents of the buffer… but that would result in essentially random data being written, precisely because the sync mechanism you asked about does not exist.
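
One way to test the "buffer changes in flight" theory is to compare the private copy taken at dispatch time against the live buffer again later, for instance when the request completes. The following is only a user-mode sketch of that comparison; first_mismatch is a hypothetical helper, not a Windows API, and in a driver RtlCompareMemory (which returns the number of leading bytes that match) would do the same job.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns the offset of the first byte at which the
 * snapshot and the live buffer differ, or len if they are identical.
 */
size_t first_mismatch(const uint8_t *snapshot, const uint8_t *live, size_t len)
{
    size_t i = 0;

    /* Walk forward while the two buffers still agree. */
    while (i < len && snapshot[i] == live[i]) {
        i++;
    }
    return i;
}
```

A return value smaller than len would mean the source buffer was modified after the copy was made, which would be worth logging together with the offset.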

So, let’s back up. Are you witnessing data corruption? Or do you have “some test” that you’re using to determine the data is different? How are you doing this, exactly?

Peter

Yes, we are seeing corruption. I’ve verified the corruption by introducing debug code into the driver - when we handle a write, I log a debug message that contains a few of the bytes in the system buffer so that we can compare that information to what we see on disk later.

When corruption occurs, the bytes that I logged don’t match what’s on disk.

I have also considered the possibility that NTFS is buffering the data and that maybe we haven’t seen the latest changes yet. But if that’s the case, NTFS has to be buffering for more than two hours. We logged all writes over a two-hour period after the data block was modified, and that block was never updated again in that time. That seems far too long to be explained by NTFS write buffering.

NTFS isn’t buffering anything, unless you mean the cache manager. Even then, the data will be coherent with what’s on disk in a matter of a few seconds.

You didn’t really explain what your test does, or how you’re determining that you have corruption. I’ve done some significant work in the storage branch, including multiple upper volume filters, and I might be able to help you. But you have to explain what you’re doing, and how you’ve come to the conclusion that the data on disk is corrupt.

Are you changing the data you write? If so, are you making a copy of the buffer before you write it? (You cannot modify it in place.)

Peter

We don’t change the data. But yes, we do make a copy of the data and use that later for replication.

The test I do is to log a debug message that contains a small subset of the data in the buffer that we got back from MmGetSystemAddressForMdlSafe. For example, if the write length is 8K, I grab 4 bytes from the buffer at 0, 2K, 4K, and 6K and just include those 4 bytes in the message (hex value).
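
A minimal user-mode sketch of that sampling, assuming one byte is taken at each quarter of the buffer (offsets 0, 2K, 4K, and 6K for an 8K write). The name format_write_samples and the space-separated hex layout are illustrative assumptions, not the actual driver code; a driver would format with a kernel-safe routine rather than snprintf.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SAMPLE_COUNT 4  /* assumption: one byte at each quarter of the buffer */

/*
 * Hypothetical helper: formats the sampled bytes as space-separated hex.
 * Returns the number of characters written (excluding the terminating NUL),
 * or -1 if the buffer is too short to sample or the output does not fit.
 */
int format_write_samples(const uint8_t *buf, size_t len,
                         char *out, size_t out_size)
{
    size_t stride = len / SAMPLE_COUNT;  /* 2048 for an 8K write */
    size_t used = 0;

    if (buf == NULL || out == NULL || stride == 0) {
        return -1;
    }

    for (size_t i = 0; i < SAMPLE_COUNT; i++) {
        int n = snprintf(out + used, out_size - used, "%s%02x",
                         i ? " " : "", (unsigned)buf[i * stride]);
        if (n < 0 || (size_t)n >= out_size - used) {
            return -1;  /* formatting error or output truncated */
        }
        used += (size_t)n;
    }
    return (int)used;
}
```

Sampling from the private copy rather than from the live mapping keeps the log consistent with what was actually replicated, so a later mismatch against the disk points at the copy itself.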

Later, if we detect corruption at that block, I can look at the debug logs to see what data we saw. In this case we saw different data than what is actually stored on the volume at that location.

So, I know for a fact that replication of write data at this level works correctly. My working assumption is therefore that your code is doing something wrong.

When are you copying the data?

Mark Roddy

Dummy Pages?

https://blogs.msmvps.com/kernelmustard/2005/05/04/dummy-pages/