I have found very good article on OSRONLINE that explains “rename file”
operation. Is there any kernel mode support to “copy file”?
Thank you
Leonid
There is no “copy” in the NT kernel world. Copying a
file is nothing more than reading one file and writing to
another.
L.
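To make the "read one, write another" idea concrete, here is a minimal sketch in portable user-mode C. It is only an analogy: a driver would use calls such as ZwReadFile and ZwWriteFile instead of stdio, and the function name `copy_file_simple` and the 64KB chunk size are assumptions made for illustration, not anything from the NT kernel.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative chunk size (an assumption, not a kernel constant). */
#define COPY_CHUNK_SIZE (64 * 1024)

/* Hypothetical helper: copies src_path to dst_path by reading a chunk
 * and writing it out, until the source is exhausted.
 * Returns 0 on success and -1 on any error. */
int copy_file_simple(const char *src_path, const char *dst_path)
{
    if (src_path == NULL || dst_path == NULL)
        return -1;

    FILE *src = fopen(src_path, "rb");
    if (src == NULL)
        return -1;

    FILE *dst = fopen(dst_path, "wb");
    if (dst == NULL) {
        fclose(src);
        return -1;
    }

    char *buf = malloc(COPY_CHUNK_SIZE);
    if (buf == NULL) {
        fclose(src);
        fclose(dst);
        return -1;
    }

    int status = 0;
    size_t n;
    /* Read up to one chunk, then write exactly what was read. */
    while ((n = fread(buf, 1, COPY_CHUNK_SIZE, src)) > 0) {
        if (fwrite(buf, 1, n, dst) != n) {
            status = -1;
            break;
        }
    }
    if (ferror(src))
        status = -1;

    free(buf);
    fclose(src);
    /* fclose flushes buffered data, so its result matters for dst. */
    if (fclose(dst) != 0)
        status = -1;
    return status;
}
```

Note that this copies only the main data stream. It ignores timestamps, attributes, security and alternate streams.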
On Fri, 21 May 2004 07:08:27 +0200, Ladislav Zezula
wrote:
> There is no “copy” in the NT kernel world. Copying a
> file is nothing more than reading one file and writing to
> another.
>
> L.
>
Not exactly, the CopyFile operation in KERNEL32.DLL (UserMode) uses some
quite intelligent tricks to indirectly optimize the kernel mode activity,
although it is all done through regular Zw system calls. As far as I
recall, the tricks include:
1. Map the source file (or parts of it) with ZwMapViewOfSection, then pass
the mapped (user mode) address to ZwWriteFile, thus causing each write to
get its read input directly from the file cache as paging I/O. Large
files are probably mapped in multiple lumps of 256KB to avoid running out
of Virtual Address space and to match the internal behaviour of the cache
manager (But that is just a guess).
2. Carefully sequence various calls to make sure timestamps and attributes
are copied too. The “Created” timestamp is not copied, but at least the
“Modified” timestamp is. Also the A attribute is set in the copy, and the
R attribute should be cleared (but sometimes isn’t).
3. Probably optimize disk allocation by passing the source file size as
the allocation arg to ZwCreateFile for the copy.
4. And then some fun with overlapped I/O, FILE_FLAG_SEQUENTIAL_SCAN, etc.
So a good kernel mode copy routine should try to use all those same
tricks. It would also need to run partially in the System process to
avoid application interference with the buffers (security requirement).
J
–
#include <disclaimer.h>
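The "multiple lumps of 256KB" in item 1 is easy to get wrong at the file tail, so here is a small sketch of just that arithmetic. It illustrates the guess made above, not how CopyFile is known to behave. The names `view_range`, `view_count` and `view_at` are hypothetical, and 256KB is taken to mean 256 * 1024 bytes. A real caller would pass each offset and length to a mapping call such as ZwMapViewOfSection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed view size: 256KB, read as 256 * 1024 bytes. */
#define VIEW_SIZE (256u * 1024u)

/* Hypothetical description of one mapped window of the source file. */
typedef struct {
    uint64_t offset;  /* byte offset of the view within the file */
    uint32_t length;  /* number of bytes covered by the view */
} view_range;

/* Number of views needed to cover file_size bytes. */
uint64_t view_count(uint64_t file_size)
{
    return file_size / VIEW_SIZE + (file_size % VIEW_SIZE != 0 ? 1 : 0);
}

/* Fills *out with the index-th view. Returns 1 on success, or 0 when
 * index is past the end of the file (out is then left untouched). */
int view_at(uint64_t file_size, uint64_t index, view_range *out)
{
    assert(out != NULL);
    if (index >= view_count(file_size))
        return 0;

    uint64_t offset = index * VIEW_SIZE;
    uint64_t remaining = file_size - offset;

    out->offset = offset;
    /* Every view is full-sized except possibly the last one. */
    out->length = (uint32_t)(remaining < VIEW_SIZE ? remaining : VIEW_SIZE);
    return 1;
}
```

For a 600KB file this yields views of 256KB, 256KB and 88KB. Each offset is a multiple of 256KB, and therefore also of the 64KB allocation granularity that Windows requires for view offsets.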
Jakob is actually right. There are a number of tricks to implementing a
file copy. Keep in mind that a good copy routine has to optimize for
transferring large files (as Jakob points out), handle multiple file
streams (think NTFS here) and EAs, and define the semantics for other
attributes (e.g., security).
There are additional tricks that one could play in the kernel
environment (e.g., using the MDL routines to minimize or even eliminate
memory-to-memory copy.)
Years ago we published a simple copy routine, but that was more to
demonstrate building IRPs within a driver rather than copying a file,
but even then it was clear that copying a file in a clean fashion is not
a trivial exercise!
Regards,
Tony
Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com
Looking forward to seeing you at the Next OSR File Systems Class October
18, 2004 in Silicon Valley!
---
Questions? First check the IFS FAQ at
https://www.osronline.com/article.cfm?id=17
You are currently subscribed to ntfsd as: xxxxx@osr.com
To unsubscribe send a blank email to xxxxx@lists.osr.com
Thanks everybody for detailed explanation.
Leonid
On Fri, 21 May 2004 07:09:43 -0400, Tony Mason wrote:
> Jakob is actually right. There are a number of tricks to implementing a
> file copy. Keep in mind that a good copy routine has to optimize for
> transferring large files (as Jakob points out) as well as handle
> multiple file streams (think NTFS here), EAs, as well as define the
> semantics for other attributes (e.g., security).
>
> There are additional tricks that one could play in the kernel
> environment (e.g., using the MDL routines to minimize or even eliminate
> memory-to-memory copy.)
>
Note that the user-mode trick with the section already does that: the
input is a section, which the memory manager aliases to the cache; the
output is opened with FILE_FLAG_NO_BUFFERING and is already aligned
because of the way the memory manager maps sections to virtual addresses.
And of course the actual read/write does not transition between kernel
and user mode, because the write Irp “simply” triggers a page fault,
which triggers a page-in read Irp, all from kernel mode.
P.S. I think it was Tony who first told me about this trick during an OSR
seminar near Stockholm. We talked a lot about sections that day.
Jakob
No.
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com
----- Original Message -----
From: “Leonid Meyerovich”
To: “Windows File Systems Devs Interest List”
Sent: Friday, May 21, 2004 3:40 AM
Subject: [ntfsd] Copy file operation
>
> I have found very good article on OSRONLINE that explains “rename file”
> operation. Is there any kernel mode support to “copy file”?
>
> Thank you
> Leonid
>
> 4. And then some fun with overlapped I/O, FILE_FLAG_SEQUENTIAL_SCAN, etc.
Why overlapped IO? Open the destination as uncached, and write in 64KB chunks.
With a file copy, the cache will not provide any benefit.
The cache matters more for UNIX-style text apps with redirected stdout, which
may write the file 1 byte per syscall.
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com
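One detail worth spelling out for the uncached approach: as far as I know, unbuffered (FILE_FLAG_NO_BUFFERING) transfers must be a multiple of the volume sector size, so the last chunk of a file usually has to be rounded up and the file length corrected afterwards. The sketch below shows only the size arithmetic. The helper names are hypothetical, the sector size is a parameter the caller must query from the volume, and the 64KB chunk size is taken from the suggestion above.

```c
#include <assert.h>
#include <stdint.h>

/* Chunk size from the suggestion above: 64KB, read as 64 * 1024 bytes. */
#define CHUNK_SIZE (64u * 1024u)

/* Rounds length up to the next multiple of sector_size.
 * sector_size must be a non-zero power of two (e.g. 512 or 4096). */
uint32_t round_up_to_sector(uint32_t length, uint32_t sector_size)
{
    assert(sector_size != 0 && (sector_size & (sector_size - 1)) == 0);
    return (length + sector_size - 1) & ~(sector_size - 1);
}

/* Hypothetical helper: size of the unbuffered write to issue for the
 * chunk starting at offset. Returns 0 once offset reaches file_size.
 * The final chunk is padded up to a whole number of sectors. */
uint32_t unbuffered_write_size(uint64_t file_size, uint64_t offset,
                               uint32_t sector_size)
{
    if (offset >= file_size)
        return 0;

    uint64_t remaining = file_size - offset;
    uint32_t payload =
        (uint32_t)(remaining < CHUNK_SIZE ? remaining : CHUNK_SIZE);
    return round_up_to_sector(payload, sector_size);
}
```

After the padded final write, the copy routine would still have to set the destination's end-of-file back to the true source length.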
On Sun, 23 May 2004 03:49:10 +0400, Maxim S. Shatskih
wrote:
>> 4. And then some fun with overlapped I/O, FILE_FLAG_SEQUENTIAL_SCAN, etc.
>
> Why overlapped IO? Open the destination as uncached, and write in 64KB
> chunks.
>
> With a file copy, cache will not provide any benefit.
>
> It is more necessary for UNIX-style text apps with redirected stdout
> which possibly write the file 1 byte per syscall.
>
CopyFile does not use the cache to do caching.
CopyFile uses the memory mapping aspect of the cache to allow the write
Irps to access the same physical pages as the read Irps, thus saving the
cost of a memcpy and doing everything by DMA. It also saves a lot of user
mode / kernel mode transitions because the read Irps are generated by
kernel mode page faults, which are in turn triggered by the write Irps.
It also uses the read-ahead aspect of the Cache to increase the chance
that the Disk level I/O scheduler can do a good job of reducing disk seeks
and other latency.
Overlapped I/O can allow the next cache slot (256KB or 64KB depending on
memory configuration) to enter the disk level I/O scheduler before the
previous one is completed, thus further increasing the chance of using
hardware level parallelism and reordering.
These benefits really shine in two cases:
A) Source and destination are on two different SCSI controllers or two
spindles on the same high end controller.
B) The hardware supports tagged command queueing and disconnects, and
either the on-disk firmware or the volume level driver stack knows how to
do elevator seeking or similar I/O scheduling optimizations.
Jakob, if you were referring to the Win32 CopyFile API, you have been
misinformed as to how CopyFile works.
For small files (<256K), CopyFile() uses memory mapped IO. For large
files, CopyFile() does buffered IO, which means it uses the system file
cache. Note that the system file cache is essentially memory mapped IO
as well, which is why the reads come through looking like page faults.
By using the system file cache, CopyFile() gets the benefit of the
system cache’s read-ahead for reading the source file.
CopyFile() does not explicitly use asynchronous IO to write to the
target file. It relies on Mm’s mapped page writer thread (which will
issue asynchronous writes) or the lazy writer (which will issue
synchronous writes) to actually write the data to disk.
Thanks,
Molly Brown
Microsoft Corporation
This posting is provided “AS IS” with no warranties and confers no
rights.
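The size-based split described above can be summarized as a small selection function. This is only a sketch of the rule as stated in the message, not CopyFile's actual code. The enum and function names are hypothetical, and "256K" is assumed to mean 256 * 1024 bytes.

```c
#include <stdint.h>

/* Assumed threshold: "256K" read as 256 * 1024 bytes. */
#define SMALL_FILE_LIMIT (256u * 1024u)

/* Hypothetical names for the two paths described in the message. */
typedef enum {
    COPY_VIA_MAPPED_VIEW, /* small files: memory mapped IO */
    COPY_VIA_BUFFERED_IO  /* large files: buffered IO via the system cache */
} copy_strategy;

/* Picks the copy path from the source file size. Files strictly
 * smaller than the limit take the mapped path; everything else,
 * including a file of exactly 256K, takes the buffered path. */
copy_strategy choose_copy_strategy(uint64_t file_size)
{
    return file_size < SMALL_FILE_LIMIT ? COPY_VIA_MAPPED_VIEW
                                        : COPY_VIA_BUFFERED_IO;
}
```

Either way the data is reached through mapped cache pages, which is why a filter driver sees the reads as paging I/O in both cases.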
On Tue, 25 May 2004 16:14:50 -0700, Molly Brown
wrote:
> Jakob, if you were referring to the Win32 CopyFile API, you have been
> misinformed as to how CopyFile works.
>
Thanks, I don’t have a source license, so I was going by things I heard
combined with some thinking on how write-from-memmapped-read would be most
optimal, allowing for fragmented disks, virus scanners etc. I didn’t mind
being wrong on details, as the initial thread question was how to
optimally reimplement CopyFile in a driver.
My guess presumed that mapping in 2 or 3 views of a file at a time and
having an overlapped write pending for each would be faster than buffered
I/O, because of the saved memcpy operation between two sets of cache
pages, and because it could allow one write to a second spindle to start
before the pages for the next write Irp were ready to satisfy the Mdl
lock. Of course there are also the SIS considerations, direct SCSI
transfers and the desire to avoid seeks if many megabytes of write-behind
can be in the cache, so I am sure that the current implementation is the
result of a lot of tuning and research.
But it is actually quite interesting to know the real story of what load
CopyFile places on filter drivers, just for future reference. So Thanks!
Jakob
The current Win32 CopyFile() implementation works ok most of the time on
general purpose hardware :). If you are able to make more specific
assumptions about your hardware, you can definitely write a more
efficient CopyFile routine where the strategies you suggest provide a
big performance win.
Back to the original poster's question: I had to implement file copy in
the System Restore filter driver for Windows XP. I ended up using
memory mapped IO to read the source file so that I could bypass any
byte-range locks on the file. Then I used buffered IO to write to the
target. And, of course, you have to deal with multiple data streams,
ACLs, EAs, etc., as others have previously pointed out.
Our answer on performance was “do what you can to avoid the copy”
because no matter how much we optimized the copy code, at the end of the
day copy is expensive. This is especially true if your filter runs on
server operating systems, which tend to have a much higher IO load
than your typical client operating system (System Restore only runs on
client OSes).
Thanks,
Molly Brown
Microsoft Corporation
This posting is provided “AS IS” with no warranties and confers no
rights.