CcPinRead interface ...

Hello experts,

Can someone please explain why my FSD’s IRP_MJ_WRITE is called twice for the same range when I use the CcPinRead interface? While processing IRP_MJ_WRITE paging requests, my driver pins the write buffers so that it can mark them dirty if it does not receive an acknowledgement from the server that the data was flushed to disk.

I pin and lock the write buffers using the following code:

CcPinRead ( FO, &mapFileOffset, length, MAP_WAIT, &mapBcb, &mapBuffer );
pMdl = IoAllocateMdl( mapBuffer, (ULONG)length, FALSE, FALSE, NULL );
MmProbeAndLockPages( pMdl, KernelMode, IoReadAccess );

Thanks in advance for your help!

-Ilya.

Hello Ilya,

You don’t mention which OS Version you are on, but my guess is some really crufty old downlevel system like XP (oh, sorry, Plugfest will do that to you. If you aren’t running Vista SP1, you are running ancient crufty old downlevel code.)

Generally, the duplicate write issue is an artifact of several distinct factors: first, that you have two different locations for dirty page information (one in the page table entry that is set by the hardware, and one in the page frame database that is managed by the memory manager.) Second, you have two different background work threads that try to flush out memory (lazy writer + modified page writer.) MPW looks at the PFN entry. Lazy Writer “looks” (albeit indirectly) at the PTE entry.

When LW wants to clean a set of pages he flushes them; this process takes the contents of the PTE dirty bit, transfers it to the PFN entry and clears the dirty bit in the PTE, then asks MM to write out the dirty pages. When MPW wants to clean a set of pages he clears the dirty bit in the PFN and writes them back to disk.

Thus, if MPW gets to the pages first, he does the first I/O. But since the PTE still ALSO shows the pages are dirty, when LW gets there, he flushes them again. Voila, two writes of the same page without any intermediate changes. Thus, you actually are not guaranteed of seeing two writes, but you MIGHT see two writes.
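The ordering argument above can be sketched as a small user-mode toy model. This is not kernel code and does not reflect how Mm or Cc are actually implemented; the `ToyPage` type and the `toy_*` functions are hypothetical names invented for illustration, and the model assumes both dirty flags are already set when the scenario starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: one page with the two independent dirty flags. */
typedef struct {
    bool pte_dirty; /* dirty bit in the page table entry */
    bool pfn_dirty; /* dirty state in the page frame database entry */
    int  writes;    /* paging writes issued for this page */
} ToyPage;

/* Modified page writer: looks only at the PFN entry. */
static void toy_mpw_pass(ToyPage *p)
{
    assert(p != NULL);
    if (p->pfn_dirty) {
        p->pfn_dirty = false;
        p->writes++;
    }
}

/* Lazy writer flush: moves the PTE dirty bit into the PFN entry,
   then writes whatever the PFN entry shows as dirty. */
static void toy_lw_flush(ToyPage *p)
{
    assert(p != NULL);
    if (p->pte_dirty) {
        p->pfn_dirty = true;
        p->pte_dirty = false;
    }
    if (p->pfn_dirty) {
        p->pfn_dirty = false;
        p->writes++;
    }
}

/* Returns how many writes one modification of the page produces,
   depending on which writer reaches the page first. */
static int toy_writes_for_order(bool mpw_first)
{
    ToyPage page = { true, true, 0 }; /* assumption: both flags start dirty */
    if (mpw_first) {
        toy_mpw_pass(&page);
        toy_lw_flush(&page);
    } else {
        toy_lw_flush(&page);
        toy_mpw_pass(&page);
    }
    return page.writes;
}
```

In this model `toy_writes_for_order(true)` yields two writes and `toy_writes_for_order(false)` yields one, which is why the duplicate write is possible but not guaranteed.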

The problem can also be exacerbated by creating additional mappings (e.g., if you have a UM program that memory maps the file, it gets another set of PTEs, with another set of dirty bits.) Play your cards right and you can actually see the SAME page written once for EACH of those mappings.
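The effect of extra mappings can be sketched with the same kind of toy model. Again, this is a user-mode illustration with hypothetical names (`ToySharedPage`, `toy_flush_mapping`), not real Mm behavior; it assumes the page was dirtied through every mapping before any flush and that each mapping is flushed separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_MAPPINGS 8

/* Toy model only: one physical page seen through several mappings. */
typedef struct {
    bool   pte_dirty[TOY_MAX_MAPPINGS]; /* one dirty bit per mapping */
    size_t mappings;                    /* number of mappings in use */
    bool   pfn_dirty;                   /* single PFN-level dirty state */
    int    writes;                      /* paging writes issued */
} ToySharedPage;

/* Flush through mapping i: transfer that mapping's PTE dirty bit to
   the PFN entry, then write the page if the PFN entry is dirty. */
static void toy_flush_mapping(ToySharedPage *p, size_t i)
{
    assert(p != NULL && i < p->mappings);
    if (p->pte_dirty[i]) {
        p->pfn_dirty = true;
        p->pte_dirty[i] = false;
    }
    if (p->pfn_dirty) {
        p->pfn_dirty = false;
        p->writes++;
    }
}

/* Dirty the page through every mapping, then flush each mapping in turn. */
static int toy_writes_for_mappings(size_t mappings)
{
    ToySharedPage page = { { false }, 0, false, 0 };
    size_t i;

    assert(mappings <= TOY_MAX_MAPPINGS);
    page.mappings = mappings;
    for (i = 0; i < mappings; i++)
        page.pte_dirty[i] = true;
    for (i = 0; i < mappings; i++)
        toy_flush_mapping(&page, i);
    return page.writes;
}
```

Under these assumptions the write count equals the number of mappings, because each flush finds only its own PTE dirty bit cleared by the previous ones.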

The claim is that this situation is vastly improved in Vista (think of the uber simple solution - at least have MPW clear the dirty bit in the ONE PTE it can already find. That would eliminate almost all the duplicate write cases between Cc and Mm. I don’t know if that’s what they implemented, but for years I’ve wondered why nobody ever did this, and while it adds code, eliminating an extra I/O in exchange for some extra instructions seems like a good trade-off to me.)
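The "uber simple solution" can be sketched as a toy model too. This only illustrates the idea described above; nothing here is known to match what Vista actually implements, and `ToyFixPage` and the `toy_fix_*` functions are hypothetical names. The `clear_pte` flag switches between the old behavior and the proposed one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: one page, one PTE, one PFN entry. */
typedef struct {
    bool pte_dirty;
    bool pfn_dirty;
    int  writes;
} ToyFixPage;

/* Modified page writer. With clear_pte set, it also clears the dirty
   bit in the one PTE it can find (the proposed change). */
static void toy_fix_mpw_pass(ToyFixPage *p, bool clear_pte)
{
    assert(p != NULL);
    if (p->pfn_dirty) {
        p->pfn_dirty = false;
        if (clear_pte)
            p->pte_dirty = false;
        p->writes++;
    }
}

/* Lazy writer flush: transfer the PTE dirty bit, then write if dirty. */
static void toy_fix_lw_flush(ToyFixPage *p)
{
    assert(p != NULL);
    if (p->pte_dirty) {
        p->pfn_dirty = true;
        p->pte_dirty = false;
    }
    if (p->pfn_dirty) {
        p->pfn_dirty = false;
        p->writes++;
    }
}

/* MPW reaches the page first, then the lazy writer flushes it. */
static int toy_fix_writes_mpw_first(bool clear_pte)
{
    ToyFixPage page = { true, true, 0 }; /* assumption: both flags start dirty */
    toy_fix_mpw_pass(&page, clear_pte);
    toy_fix_lw_flush(&page);
    return page.writes;
}
```

With `clear_pte` false the model reproduces the duplicate write; with it true, the lazy writer finds nothing left to flush and the page is written once.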

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

Hello Tony,

Thanks for the reply. The OS is a “crufty old downlevel” W2K3 SP2 system.

Can you please explain to me why calling the following APIs exacerbates the issue:
CcPinRead ( FO, &mapFileOffset, length, MAP_WAIT, &mapBcb, &mapBuffer );
pMdl = IoAllocateMdl( mapBuffer, (ULONG)length, FALSE, FALSE, NULL );
MmProbeAndLockPages( pMdl, KernelMode, IoReadAccess );

Without calling the above APIs I do not see the “duplication” issue.
Also, any suggestions how to fix it?

Thanks,
Ilya.

> IRP_MJ_WRITE paging requests I pin the write buffers so that I could mark them
> dirty if I did not receive an acknowledgement from the server that the data was
> flushed to disk.

Just fail the write in this case; Cc will do the rest for you.


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com