Paged I/O and CcFlushCache questions

Hi all,

I have two independent, though related, questions. Any help, suggestions,
or pointers would be most appreciated.

  1. When CcFlushCache() is called by the VMM or my FSD, multiple paged I/O
    requests are passed to the page handler of my IRP_MJ_WRITE routine. My FSD
    synchronously writes each request to disk. However, in some cases
    CcFlushCache() performs two writes for every 64KB request.

For instance, the following output shows four writes from an Explorer file
copy from a different. Process 4 then performs a sync operation which calls
CcFlushCache(). CcFlushCache() triggers two paged writes for each offset
request. This trace is repeatable and I know for certain that the first
write (offset=0) makes it to disk.

[864,644] sfs_write(1694) ino=114, offset=0, size=65536, PageIo=0, SyncIo=1, NonBufferedIo=0, WriteThrough=0
[864,644] sfs_write(1694) ino=114, offset=65536, size=65536, PageIo=0, SyncIo=1, NonBufferedIo=0, WriteThrough=0
[864,644] sfs_write(1694) ino=114, offset=131072, size=65536, PageIo=0, SyncIo=1, NonBufferedIo=0, WriteThrough=0
[864,644] sfs_write(1694) ino=114, offset=196608, size=21707, PageIo=0, SyncIo=1, NonBufferedIo=0, WriteThrough=0
[4,1408] sfs_fsync(645) ino=114, difl=0x506e1, size=218315, vn=2, cr=1076002386,669875000, line=1744
[4,1408] sfs_pagecache(1602) flush 114, difl=0x506e1, l=750
CcFlushCache() called here…
[4,1408] sfs_write(1694) ino=114, offset=0, size=65536, PageIo=1, SyncIo=1, NonBufferedIo=1, WriteThrough=0
[4,1408] sfs_write(1694) ino=114, offset=65536, size=65536, PageIo=1, SyncIo=1, NonBufferedIo=1, WriteThrough=0
[4,1408] sfs_write(1694) ino=114, offset=131072, size=65536, PageIo=1, SyncIo=1, NonBufferedIo=1, WriteThrough=0
[4,1408] sfs_write(1694) ino=114, offset=196608, size=24576, PageIo=1, SyncIo=1, NonBufferedIo=1, WriteThrough=0
[4,1408] sfs_write(1694) ino=114, offset=0, size=65536, PageIo=1, SyncIo=1, NonBufferedIo=1, WriteThrough=0
[4,1408] sfs_write(1694) ino=114, offset=65536, size=65536, PageIo=1, SyncIo=1, NonBufferedIo=1, WriteThrough=0
[4,1408] sfs_write(1694) ino=114, offset=131072, size=65536, PageIo=1, SyncIo=1, NonBufferedIo=1, WriteThrough=0
[4,1408] sfs_write(1694) ino=114, offset=196608, size=24576, PageIo=1, SyncIo=1, NonBufferedIo=1, WriteThrough=0
CcFlushCache() ends here
[4,1408] sfs_pagecache(1615) e=0
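
(For reference: a minimal sketch of how flag fields like the ones in this trace are typically derived inside an IRP_MJ_WRITE dispatch routine. The flag names are the standard ones from ntifs.h/wdm.h; the function body and DbgPrint are illustrative, not the actual sfs_write code.)

#include <ntifs.h>

NTSTATUS SfsWriteSketch(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    PIO_STACK_LOCATION IrpSp      = IoGetCurrentIrpStackLocation(Irp);
    PFILE_OBJECT       FileObject = IrpSp->FileObject;

    // Classify the write request the way the trace fields suggest.
    BOOLEAN PagingIo      = (BOOLEAN)((Irp->Flags & IRP_PAGING_IO) != 0);
    BOOLEAN NonBufferedIo = (BOOLEAN)((Irp->Flags & IRP_NOCACHE) != 0);
    BOOLEAN SyncIo        = (BOOLEAN)((FileObject->Flags & FO_SYNCHRONOUS_IO) != 0);
    BOOLEAN WriteThrough  = (BOOLEAN)((FileObject->Flags & FO_WRITE_THROUGH) != 0);

    UNREFERENCED_PARAMETER(DeviceObject);

    DbgPrint("sfs_write: offset=%I64d, size=%lu, PageIo=%d, SyncIo=%d, "
             "NonBufferedIo=%d, WriteThrough=%d\n",
             IrpSp->Parameters.Write.ByteOffset.QuadPart,
             IrpSp->Parameters.Write.Length,
             PagingIo, SyncIo, NonBufferedIo, WriteThrough);

    if (PagingIo && NonBufferedIo) {
        // Paging write issued by Mm/Cc: the data must go straight to disk.
    } else {
        // Ordinary cached write: hand the data to the cache (CcCopyWrite).
    }

    // ... actual write handling and IRP completion elided ...
    return STATUS_SUCCESS;
}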

However, this behavior does not occur when using “copy” from a command
prompt or from a homegrown application that performs simple WriteFile() calls.
As far as I can tell, all FileObject, Irp, and create flags are the same.
Any ideas what could be causing the double write from CcFlushCache()? Any
ideas where else to look? Is it possible to tell within my IRP_MJ_WRITE
routine which pages are dirty?

  2. Is there any way to tune the page write size of the VMM? Currently, my
    IRP_MJ_WRITE routine only receives 64KB or smaller requests during paged
    I/O. The FSD is very good at sequentially allocating disk blocks, so
    increasing request sizes would reduce the number of disk I/Os.

Thanks in advance. Best regards,

Steve Soltis

For question 1 –

CcFlushCache() turns into a call to MmFlushSection so that the memory
manager will flush all dirty pages for the data section in the range
specified. The memory manager will only issue a paging write for pages
that the memory manager is tracking as being dirty.
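
(As a point of reference, a minimal sketch of the FSD side of such a flush, assuming a hypothetical per-file FCB that holds the file’s SECTION_OBJECT_POINTERS; only CcFlushCache() itself is the real ntifs.h routine.)

#include <ntifs.h>

typedef struct _SFS_FCB {                       // hypothetical per-file structure
    SECTION_OBJECT_POINTERS SectionObjectPointers;
} SFS_FCB, *PSFS_FCB;

VOID SfsFlushFileSketch(PSFS_FCB Fcb)
{
    IO_STATUS_BLOCK IoStatus;

    // A NULL FileOffset means "flush the whole file".  Internally this ends
    // up in MmFlushSection, and the memory manager then sends paging writes
    // (IRP_PAGING_IO | IRP_NOCACHE) back down to the FSD, but only for the
    // pages it is tracking as dirty.
    CcFlushCache(&Fcb->SectionObjectPointers, NULL, 0, &IoStatus);

    if (!NT_SUCCESS(IoStatus.Status)) {
        // Propagate or log the flush failure.
    }
}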

The first set of writes in your trace coming from [864, 644] look like
cached writes – PagingIo flag is not set, NonBufferedIo flag is not
set.

The second set of writes look like writes from the cache manager’s lazy
writer thread – they are from the system process (process 4), in
sequential order, and the PagingIo and NonBufferedIo flags are set.

For question 2 –

No, there is no way for you to tune the memory manager’s maximum paging
write size – it’s 64K.

Thanks,
Molly Brown
Microsoft Corporation

This posting is provided “AS IS” with no warranties and confers no
rights.


Steve,

This is normal behavior - I actually talk about this in file systems
class because it hits WORM media people particularly hard.

Essentially, there are two places where Windows tracks if a given page
is dirty - one is in the Page Table Entry that references the physical
memory and the other is in the Page Frame Database the Memory Manager
maintains to track the state of the page.

Combine this with two background threads: one from the Cache Manager
(the “lazy writer”) and one from the Memory Manager (the “modified page
writer” along with its sidekick the “mapped page writer”). When the
Cache Manager wants to write data out to a file, it flushes the state of
the dirty bit in the PTE back to the PFN and then asks the Memory
Manager to write out the page (or range of pages generally). So this is
what happens when the lazy writer scribbles it out.

When the Memory Manager writes out the page, it clears the dirty bit in
the PFN entry. It does not change, nor does it even know about, the state
of the PTE dirty bit.

When a file is extended, the pages are zero-filled (generally they are
demand zero) and the initial write on the memory causes an allocation of
a page of zeros, marks the page as dirty in the PFN and of course the
hardware marks the page as dirty in the PTE.

The race: if the lazy writer gets to the page FIRST, then it flushes out
the PTE dirty bit (clears it) and then asks the Memory Manager to write
back the dirty page. If the modified page writer gets there first, it
writes the page, clears the PFN bit, and THEN when the lazy writer gets
there it flushes out the PTE dirty bit (clears it) and asks the Memory
Manager to write back the “dirty” page.
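
(A toy user-mode model of that race, just to make the double write concrete; this is an illustration of the ordering described above, not actual Mm code.)

/* Toy model of the PTE/PFN dirty-bit race -- not actual Mm code. */
#include <stdbool.h>
#include <stdio.h>

static bool pteDirty = true;   /* hardware set this on the first store      */
static bool pfnDirty = true;   /* Mm marked the page dirty when it was touched */
static int  pagingWrites = 0;

static void write_page(const char *who)
{
    pagingWrites++;
    printf("%s: paging write #%d issued\n", who, pagingWrites);
}

/* Modified page writer: writes the page and clears only the PFN bit. */
static void modified_page_writer(void)
{
    if (pfnDirty) {
        write_page("MPW");
        pfnDirty = false;      /* the PTE dirty bit is left untouched */
    }
}

/* Lazy writer: folds the PTE bit into the PFN, then asks Mm to write. */
static void lazy_writer_flush(void)
{
    if (pteDirty) {
        pteDirty = false;
        pfnDirty = true;       /* page now looks dirty to Mm again */
    }
    if (pfnDirty) {
        write_page("lazy writer");
        pfnDirty = false;
    }
}

int main(void)
{
    /* MPW wins the race, then CcFlushCache()/the lazy writer flushes:
       the same page gets written twice.  Swap the two calls and only
       one write is issued. */
    modified_page_writer();
    lazy_writer_flush();
    return 0;
}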

Thus, the reason you are seeing different behavior is because - for
whatever reason - you are getting different ordering of events. Instead
of focusing on the write events, look at the file size modification
events and the write-through bits, things that would definitely affect
the order in which I/O operations occur.

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com


And for the 2nd question…

64KB is the largest I/O size supported. From ntifs.h:

//
// Define maximum disk transfer size to be used by MM and Cache Manager,
// so that packet-oriented disk drivers can optimize their packet allocation
// to this size.
//

#define MM_MAXIMUM_DISK_IO_SIZE (0x10000)

So there you have it!
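
(Aside: this 64KB cap, together with 4KB pages, also explains the shape of the trace above. The 218,315-byte file rounds up to 54 pages, which the memory manager issues as three 64KB paging writes plus one 24,576-byte write. A quick user-mode check of that arithmetic, purely for illustration:)

/* Back-of-the-envelope check (not kernel code): how a flush of the
 * 218,315-byte file from the trace splits into paging writes, given the
 * 64KB cap and 4KB pages. */
#include <stdio.h>

#define PAGE_SIZE               0x1000      /* 4KB  */
#define MM_MAXIMUM_DISK_IO_SIZE 0x10000     /* 64KB */

int main(void)
{
    unsigned long fileSize = 218315;   /* size reported in the trace */
    unsigned long rounded  = ((fileSize + PAGE_SIZE - 1) / PAGE_SIZE) * PAGE_SIZE;

    for (unsigned long offset = 0; offset < rounded; offset += MM_MAXIMUM_DISK_IO_SIZE) {
        unsigned long size = rounded - offset;
        if (size > MM_MAXIMUM_DISK_IO_SIZE)
            size = MM_MAXIMUM_DISK_IO_SIZE;
        /* prints 65536, 65536, 65536, 24576 -- matching the trace */
        printf("offset=%lu, size=%lu\n", offset, size);
    }
    return 0;
}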

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com


When the Lazy Writer flushes PTEs that are marked dirty but are backed
by non-dirty PFNs (i.e. MPW got there first), will Mm simply ignore the
request, or will there be a performance hit?


  • Nick Ryan
  • Microsoft MVP for DDK

Nick,

By the time Mm receives the request to flush the pages, the PFN entries
ARE marked as dirty. The dirty bit in the PTE is cleared, the bit in
the PFN is set, and then the page is written out. We’ve observed this
behavior consistently over the years and I always remark on it in file
systems class because it hits WORM people particularly hard.

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com


When the MPW flushes a page, can’t it backtrack through the PFN to the
PTE to clear the PTE’s dirty flag before clearing the PFN’s dirty flag
and flushing it? From what I understand, a PFN holds a back reference to
the PTE pointing to it (for unshared pages) or to the PPTE, which
back-references multiple PTEs (for shared pages).


  • Nick Ryan
  • Microsoft MVP for DDK

All mapped files are, by their nature, shared. Unless something has
changed again (which is certainly possible) there are no mappings from
the PFN to the PTEs that reference it (what is called an “inverted page
table”.) Thus, there’s really no way to clear the dirty bits.

Of course, in the final analysis, I suspect *I* overthought the original
question. Molly nailed it on the head when she pointed out that the
first round of his writes was user I/O (cached, not paging) and the
second paging I/O.

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com


Tony,

>64KB is the largest I/O size supported. From ntifs.h:

Is there any clever (or not so clever) way to combine multiple 64KB requests
into larger ones before passing them to lower level drivers? Is there any
particular reason for this size?

I’m concerned that sending small, 64KB requests to the device drivers will
result in slow streaming performance. One idea is to buffer multiple writes
within the file system; however, this would require some large memory
buffers and extra data copies.
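
(If it helps frame the idea: the core of such buffering is just detecting that successive paging writes land on adjacent disk locations and accumulating them until a threshold is reached. A rough sketch of that check follows; all names are hypothetical, and the actual buffering and IRP plumbing are omitted.)

#include <ntifs.h>

typedef struct _SFS_PENDING_RUN {        // hypothetical accumulator
    ULONGLONG DiskOffset;                // byte offset on disk where the run starts
    ULONG     Length;                    // bytes accumulated so far
    ULONG     MaxLength;                 // e.g. 1 MB: flush once the run reaches this
} SFS_PENDING_RUN, *PSFS_PENDING_RUN;

// Returns TRUE if the new 64KB (or smaller) paging write extends the current
// run; FALSE means the caller should issue the accumulated run to the disk
// driver as one large transfer and start a new run with this request.
BOOLEAN SfsTryAppendToRun(PSFS_PENDING_RUN Run, ULONGLONG DiskOffset, ULONG Length)
{
    if (Run->Length != 0 &&
        DiskOffset == Run->DiskOffset + Run->Length &&   // physically contiguous
        Run->Length + Length <= Run->MaxLength) {
        Run->Length += Length;
        return TRUE;
    }
    return FALSE;
}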

Thanks,

Steve


Molly,

Thanks for the input. The trace in the original posting is difficult to read
because the lines wrapped; however, the double writes I refer to are all
after the first three non-PagingIo writes. See the “CcFlushCache() called
here…” message. I trust Tony’s explanation for the cause, though I am
still unsure if there is anything I can do to limit or eliminate the double
writes.

Best regards,

Steve


>Thus, the reason you are seeing different behavior is because - for
>whatever reason - you are getting different ordering of events. Instead
>of focusing on the write events, look at the file size modification
>events and the write-through bits, things that would definitely affect
>the order in which I/O operations occur.

Tony,

Thank you for the explanation. Just so I have it clear, within a single
call to CcFlushCache(), the race condition can cause the same pages to be
written twice.

Is there any workaround to avoid this situation? Or will I always have
double paged I/O writes on occasion? You
mention “look at file size modification events” instead of the write events.
What do you mean by this? I would prefer to avoid write-through caching, if
possible, and keep the I/O in the background.

Cheers,

Steve


Hmm… the source I took this information from was “Inside Memory
Management, Part 2” by Russinovich:

http://www.winntmag.com/Articles/ArticleID/3774/pg/3/3.html

He says:

“In most cases, a PFN Database entry contains a pointer to a PTE that
references a page. However, if two or more processes share the same
page, multiple PTEs reference the page: one PTE in the virtual address
map of each process sharing the page. Instead of pointing the PFN
Database entry at one of these PTEs, the Memory Manager points the PFN
Database entry at a data structure the Memory Manager allocates, called
a Prototype PTE (PPTE), as Figure 3 shows. I’ll describe the way the
Memory Manager uses PPTEs to manage shared and mapped memory.”

Maybe the prototype PTEs contain no back pointers, then. In WinDbg I
see a back pointer in _MMPFN but not in _MMPTE_PROTOTYPE.

Now that I think about it, you can allocate multiple MDLs for the same
physical pages and map those MDLs to different virtual addresses. In
this case there is no PPTE mechanism being used and thus the back
pointer in the PFN entry would point to only one of the PTEs mapping
that page.
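
(For what it’s worth, a minimal sketch of that scenario using standard Io/Mm routines: two MDLs built over the same non-paged buffer describe the same PFNs, and mapping the second one yields a second virtual address, and hence a second PTE, for the same physical page. Illustrative only; the pool tag and function name are made up.)

#include <ntifs.h>

NTSTATUS MapSamePageTwiceSketch(VOID)
{
    PVOID buffer = ExAllocatePoolWithTag(NonPagedPool, PAGE_SIZE, 'ldMx');
    PMDL  mdl1, mdl2;
    PVOID alias;

    if (buffer == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;

    mdl1 = IoAllocateMdl(buffer, PAGE_SIZE, FALSE, FALSE, NULL);
    mdl2 = IoAllocateMdl(buffer, PAGE_SIZE, FALSE, FALSE, NULL);
    if (mdl1 == NULL || mdl2 == NULL) {
        if (mdl1) IoFreeMdl(mdl1);
        if (mdl2) IoFreeMdl(mdl2);
        ExFreePoolWithTag(buffer, 'ldMx');
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    // Both MDLs now describe the same physical page(s).
    MmBuildMdlForNonPagedPool(mdl1);
    MmBuildMdlForNonPagedPool(mdl2);

    // Map the second MDL: 'alias' and 'buffer' are two different virtual
    // addresses (two PTEs) backed by the same PFN.
    alias = MmMapLockedPagesSpecifyCache(mdl2, KernelMode, MmCached,
                                         NULL, FALSE, NormalPagePriority);
    if (alias != NULL)
        MmUnmapLockedPages(alias, mdl2);

    IoFreeMdl(mdl2);
    IoFreeMdl(mdl1);
    ExFreePoolWithTag(buffer, 'ldMx');
    return STATUS_SUCCESS;
}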

Tony Mason wrote:

All mapped files are, by their nature, shared. Unless something has
changed again (which is certainly possible) there are no mappings from
the PFN to the PTEs that reference it (what is called an “inverted page
table”.) Thus, there’s really no way to clear the dirty bits.

Of course, in the final analysis, I suspect *I* overthought the original
question. Molly nailed it on the head when she pointed out that the
first round of his writes were user I/O (cached, not paging) and the
second paging I/O.

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Nick Ryan
Sent: Friday, February 06, 2004 11:36 PM
To: ntfsd redirect
Subject: Re:[ntfsd] Paged I/O and CcFlushCache questions

When the MPW flushes a page, can’t it backtrack through the PFN to the
PTE to clear the PTE’s dirty flag before clearing the PFN’s dirty flag
and flushing it? From what I understand a PFN holds a back reference to
the PTE pointing to it (for unshared pages) or to the PPTE which back
references multiple PTEs (for shared pages).

Tony Mason wrote:

>Nick,
>
>By the time Mm receives the request to flush the pages, the PFN

entries

>ARE marked as dirty. The dirty bit in the PTE is cleared, the bit in
>the PFN is set, and then the page is written out. We’ve observed this
>behavior consistently over they years and I always remark on it in

file

>systems class because it hits WORM people particularly hard.
>
>Regards,
>
>Tony
>
>Tony Mason
>Consulting Partner
>OSR Open Systems Resources, Inc.
>http://www.osr.com
>
>
>-----Original Message-----
>From: xxxxx@lists.osr.com
>[mailto:xxxxx@lists.osr.com] On Behalf Of Nick Ryan
>Sent: Friday, February 06, 2004 12:05 AM
>To: ntfsd redirect
>Subject: Re:[ntfsd] Paged I/O and CcFlushCache questions
>
>When the Lazy Writer flushes PTEs that are marked dirty but are backed
>by non-dirty PFNs (i.e. MPW got there first), will Mm simply ignore the
>request, or will there be a performance hit?
>
>Tony Mason wrote:
>
>>Steve,
>>
>>This is normal behavior - I actually talk about this in file systems
>>class because it hits WORM media people particularly hard.
>>
>>Essentially, there are two places where Windows tracks if a given page
>>is dirty - one is in the Page Table Entry that references the physical
>>memory and the other is in the Page Frame Database the Memory Manager
>>maintains to track the state of the page.
>>
>>Combine this with two background threads: one from the Cache Manager
>>(the “lazy writer”) and one from the Memory Manager (the “modified
>>page writer” along with its sidekick the “mapped page writer”). When
>>the Cache Manager wants to write data out to a file, it flushes the
>>state of the dirty bit in the PTE back to the PFN and then asks the
>>Memory Manager to write out the page (or range of pages generally).
>>So this is what happens when the lazy writer scribbles it out.
>>
>>When the Memory Manager writes out the page, it clears the dirty bit
>>in the PFN entry. It does not change, nor does it know about, the
>>state of the PTE bit.
>>
>>When a file is extended, the pages are zero-filled (generally they are
>>demand zero) and the initial write on the memory causes an allocation
>>of a page of zeros, marks the page as dirty in the PFN and of course
>>the hardware marks the page as dirty in the PTE.
>>
>>The race: if the lazy writer gets to the page FIRST, then it flushes
>>out the PTE dirty bit (clears it) and then asks the Memory Manager to
>>write back the dirty page. If the modified page writer gets there
>>first, it writes the page, clears the PFN bit, and THEN when the lazy
>>writer gets there it flushes out the PTE dirty bit (clears it) and
>>asks the Memory Manager to write back the “dirty” page.
>>
>>Thus, the reason you are seeing different behavior is that - for
>>whatever reason - you are getting a different ordering of events.
>>Instead of focusing on the write events, look at the file size
>>modification events and the write-through bits, things that would
>>definitely affect the order in which I/O operations occur.
>>
>>Regards,
>>
>>Tony
>>
>>Tony Mason
>>Consulting Partner
>>OSR Open Systems Resources, Inc.
>>http://www.osr.com

  • Nick Ryan
  • Microsoft MVP for DDK

Nick,

I would disagree with Mark on this point. Shared memory is far more
common than this would suggest, and any shared (or potentially shared)
memory is going to run through a prototype page table entry. Indeed, if
you look at the control area structure (WinDBG: “dt nt!_CONTROL_AREA”)
you will notice that part of what it tracks are the prototype page table
entries associated with the given section.

For file systems, essentially everything we deal with is going to be
described via prototype PTEs associated with a control area (the control
area is used to back the section object.) I actually submitted an
article for The NT Insider recently on using the debugger to track from
file object, to control area and shared cache map and then to the actual
virtual mapping into the cache. A nice debugging trick, but also useful
for file systems people to understand.
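
A rough sketch of the start of that walk in code rather than the debugger.
The SECTION_OBJECT_POINTERS fields are the documented part; the control
area and shared cache map they point at are opaque and are best examined
with dt in the debugger:

#include <ntifs.h>

/* Sketch: from a FILE_OBJECT, the section object pointers show whether a
 * data section (control area) and a shared cache map exist for the stream,
 * i.e. whether the file is mapped and/or currently cached.  The structures
 * they point at are not documented; in the debugger they can be displayed
 * with "dt nt!_CONTROL_AREA <address>" and similar commands. */
VOID DescribeCachingState(PFILE_OBJECT FileObject)
{
    PSECTION_OBJECT_POINTERS sop = FileObject->SectionObjectPointer;

    if (sop == NULL) {
        return;       /* the FSD never set one up: cannot be cached or mapped */
    }
    if (sop->DataSectionObject != NULL) {
        /* A data section (control area) exists: the stream has been cached
         * or mapped, so paging I/O against it is possible. */
    }
    if (sop->SharedCacheMap != NULL) {
        /* Cc currently has a shared cache map for the stream. */
    }
    if (sop->ImageSectionObject != NULL) {
        /* The file is also mapped as an executable image. */
    }
}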

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com


It cannot ignore the request.
Exactly this picture can result if the data was modified through a user
mapping by pointer. Such a modification must be flushed.
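
For instance, a user-mode program along these lines (ordinary Win32 calls,
hypothetical routine name) dirties pages purely through a mapped view; no
WriteFile is issued, so the only record of the change is the dirty state of
the pages until Mm writes them back or FlushViewOfFile forces it:

#include <windows.h>

/* Sketch: modify a file "by pointer" through a mapped view. */
int ModifyByPointer(const char *path)
{
    HANDLE file, section;
    unsigned char *view;

    file = CreateFileA(path, GENERIC_READ | GENERIC_WRITE, 0, NULL,
                       OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE) {
        return 1;
    }
    section = CreateFileMappingA(file, NULL, PAGE_READWRITE, 0, 0, NULL);
    if (section != NULL) {
        view = (unsigned char *)MapViewOfFile(section, FILE_MAP_WRITE, 0, 0, 0);
        if (view != NULL) {
            view[0] ^= 0xFF;              /* dirty the first page by pointer     */
            FlushViewOfFile(view, 4096);  /* ask Mm to write the dirty page back */
            UnmapViewOfFile(view);
        }
        CloseHandle(section);
    }
    CloseHandle(file);
    return 0;
}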

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

----- Original Message -----
From: “Nick Ryan”
Newsgroups: ntfsd
To: “Windows File Systems Devs Interest List”
Sent: Friday, February 06, 2004 8:05 AM
Subject: Re:[ntfsd] Paged I/O and CcFlushCache questions

> When the Lazy Writer flushes PTEs that are marked dirty but are backed
> by non-dirty PFNs (i.e. MPW got there first), will Mm simply ignore the
> request, or will there be a performance hit?