how to read a disk at high speed?

hello,
I want to back up the data in one partition, and this is the way I do it:
I use CreateFile() to open the disk, then read one block of data and write it
into a file. But when I tested it, the performance was very low, and I have
no better idea. Who can help me?



> I want to back up the data in one partition, and this is the way I do it:
> I use CreateFile() to open the disk, then read one block of data and write it
> into a file. But when I tested it, the performance was very low, and I have
> no better idea. Who can help me?

Your description is rather limited. You should NOT do the I/O one
block at a time, but rather in larger multi-block segments.

Is the file you are backing up to on a different disk???

Maximizing speed when doing disk backups involves a lot of
things including optimizing data reads around the location
on disk, buffering, etc… The output location/methodology
also matters.

Doing the I/O asynchronously and keeping several requests
queued up will make a difference, but if each I/O is causing
significant head movement, then you have problems.

Without a much more detailed description of what you are doing
(not why, just the mechanics) it’s hard to give any concrete
advice.

Rick Cadruvi…

Read larger blocks of data at a time - say 64 KB instead of 512 bytes.

You should overlap the reads and the writes. Issue one or more reads to the source disk and, as they complete, issue writes to the destination drive. A good way to do this is with overlapped (or asynchronous) I/O and I/O completion ports. You can find these documented in MSDN.

-p
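To make the overlap idea concrete, here is a minimal, portable sketch of the read/write pipeline Peter describes, using two threads and a queue of chunks in place of Win32 overlapped I/O. A real Windows implementation would instead issue ReadFile/WriteFile calls with OVERLAPPED structures and reap completions from an I/O completion port; the standard-library file streams and thread here are stand-ins so the structure is runnable anywhere.

```cpp
#include <condition_variable>
#include <fstream>
#include <iterator>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Copy `src` to `dst` with reads and writes overlapped: one thread keeps
// reading fixed-size chunks while the caller's thread drains completed
// chunks to the destination. (Portable stand-in for overlapped I/O + IOCP.)
inline bool overlapped_copy(const std::string& src, const std::string& dst,
                            std::size_t chunk = 64 * 1024) {
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary);
    if (!in || !out) return false;

    std::queue<std::vector<char>> ready;  // chunks waiting to be written
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    std::thread reader([&] {
        for (;;) {
            std::vector<char> buf(chunk);
            in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
            buf.resize(static_cast<std::size_t>(in.gcount()));
            if (buf.empty()) break;  // end of input
            { std::lock_guard<std::mutex> lk(m); ready.push(std::move(buf)); }
            cv.notify_one();
        }
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_one();
    });

    for (;;) {  // writer loop: runs concurrently with the reads
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return !ready.empty() || done; });
        if (ready.empty() && done) break;
        std::vector<char> buf = std::move(ready.front());
        ready.pop();
        lk.unlock();  // do the write outside the lock
        out.write(buf.data(), static_cast<std::streamsize>(buf.size()));
    }
    reader.join();
    return static_cast<bool>(out);
}

// Helper used only for verification: slurp a whole file into a string.
inline std::string read_all(const std::string& path) {
    std::ifstream f(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(f), {});
}
```

The bounded structure matters: the reader never waits on the writer here, but in a same-disk scenario you would cap the queue depth so memory stays bounded and the drive is not thrashed.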

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of shark mouse
Sent: Wednesday, August 03, 2005 8:07 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] how to read disk high speed?




Thanks, and sorry for not giving detailed information.
This time I will make up for it:
the backup file is not on the same volume as the source, but it may be on the
same physical disk, for example in a different partition.
By the way, where is the documentation Peter mentioned in MSDN? I cannot find it.



> Thanks, and sorry for not giving detailed information.
> This time I will make up for it:
> the backup file is not on the same volume as the source, but it may be on
> the same physical disk, such as in a different partition.

Still not much info.

If the file you are writing to is on the same disk (different partition),
then the strategy I would use would be to read lots of large
buffers (at least 64 KB, but perhaps even 2 MB at a time) from disk
locations as contiguous as possible into a huge memory buffer. When the
memory buffer is basically full, stop your reads and start sending out
pieces to the disk in writes until you have written it all, and then go
back and do the reads again.
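The two-phase strategy above can be sketched as follows. This is a portable illustration using standard file streams, not the Win32 raw-partition reads the thread is about; the point is the shape of the loop: a read phase that fills one large staging buffer with big sequential reads, then a write phase that drains it completely before reading again, so the heads are not bouncing between the source and destination regions of the disk.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Same-disk strategy: batch many large reads into one staging buffer,
// then write it all out in one burst, then repeat.
inline bool staged_copy(const std::string& src, const std::string& dst,
                        std::size_t chunk = 2 * 1024 * 1024,    // per-read size
                        std::size_t staging = 8 * 1024 * 1024)  // buffer capacity
{
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary);
    if (!in || !out) return false;

    std::vector<char> buf(staging);
    for (;;) {
        // Read phase: keep issuing large reads until the buffer is full.
        std::size_t filled = 0;
        while (filled < staging) {
            std::size_t want = std::min(chunk, staging - filled);
            in.read(buf.data() + filled, static_cast<std::streamsize>(want));
            std::size_t got = static_cast<std::size_t>(in.gcount());
            filled += got;
            if (got < want) break;  // hit end of input
        }
        if (filled == 0) break;
        // Write phase: drain the whole buffer before reading again.
        out.write(buf.data(), static_cast<std::streamsize>(filled));
        if (filled < staging) break;  // partial fill means we reached EOF
    }
    return static_cast<bool>(out);
}

// Verification helper: read a whole file into a string.
inline std::string slurp(const std::string& path) {
    std::ifstream f(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(f), {});
}
```

In a real backup tool the reads would come from the raw partition handle (CreateFile on \\.\D: with sector-aligned offsets) rather than an ifstream, but the phase structure is the same.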

The reason for this is that you don’t want to be moving the heads. If the
output is on the same disk it can cause a lot of head movement if you are
interleaving reads and writes.

The best approach is to make the output file on a different disk and then
interleave reads and writes. The reason 64 KB comes up as a buffer
size is that the SCSI standard only used to allow 64 KB
transfers at a time. I don’t know if that is still true.

However, my own studies have shown that reading on 2 MB boundaries
does move data faster than 64 KB does. I suspect the reason is all the
smarts in the OS and disk controller/drive firmware below your I/Os.
If I wanted to make it REALLY fast, I would empirically determine
a good buffer size while the application was running by testing different
buffer sizes during the early reads/writes, and once the software had
determined a good size, leave it alone.

Be sure to pre-allocate as much space in the output file as you know you
will need, and extend it by large amounts when you need to extend it.
This will tend to allow for more contiguous extents and therefore
MUCH better performance. On the read side, use a scatter/gather
algorithm so that the data you read is as contiguous as
possible. File system fragmentation WILL definitely slow your
performance. I know that years ago, Digital Equipment Corporation
got huge performance gains on their backup product by doing read
scatter/gather operations, reading from different files at
the same time so they could read large contiguous chunks of disk
space. Disks are faster now, but the problem is still the same.
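The pre-allocation advice can be sketched like this. On Windows one would grow the file with SetFilePointerEx + SetEndOfFile (or SetFileValidData); `std::filesystem::resize_file` is used here as a portable stand-in, and the 64 MB growth step is an illustrative choice, not a figure from the thread:

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

// Grow the output file in large steps so the file system can hand out big
// contiguous extents, instead of extending it a few clusters at a time on
// every write.
inline void ensure_capacity(const std::string& path,
                            std::uintmax_t needed,
                            std::uintmax_t step = 64 * 1024 * 1024) {
    namespace fs = std::filesystem;
    if (!fs::exists(path)) {
        std::ofstream create(path, std::ios::binary);  // create empty file
    }
    if (fs::file_size(path) >= needed) return;  // already big enough
    // Round the new size up to the next multiple of `step`.
    std::uintmax_t target = ((needed + step - 1) / step) * step;
    fs::resize_file(path, target);  // extends with zeros
}
```

At the end of the backup you would truncate the file back down to the number of bytes actually written, again with resize_file (or SetEndOfFile on Windows).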

I would agree with the recommendations that you should probably queue
at least 2-4 asynchronous requests at a time (reads first, then writes,
unless they are on different drives, in which case 2-4 of each, interleaved).

Be sure to heed the tip to use page-aligned buffers for the I/O, and
if you do a good job of managing the I/O, turning off buffering
(FILE_FLAG_NO_BUFFERING, i.e. bypassing the cache) would also be a good
idea. This is ESPECIALLY true if you intend to read and write a LOT of
data, since it will just flush the cache for other uses and total system
performance will suffer. Besides, this kind of application isn’t as well
suited to caching as what you can do managing the I/O yourself.
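Unbuffered I/O on Windows requires buffers aligned to the volume's sector size (and transfer sizes that are multiples of it); allocating on a page boundary satisfies any typical sector size. A hedged, portable sketch, assuming a 4 KB page (on Windows itself you would use VirtualAlloc or _aligned_malloc instead of `std::aligned_alloc`):

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>

// aligned_alloc'd memory must be released with free(), so pair the pointer
// with a custom deleter.
struct AlignedDeleter {
    void operator()(void* p) const { std::free(p); }
};

// Allocate a buffer suitable for unbuffered (cache-bypassing) I/O:
// start address aligned to `alignment`, size rounded up to a multiple of it.
inline std::unique_ptr<char, AlignedDeleter>
make_io_buffer(std::size_t bytes, std::size_t alignment = 4096) {
    // std::aligned_alloc requires the size to be a multiple of the alignment.
    std::size_t rounded = ((bytes + alignment - 1) / alignment) * alignment;
    return std::unique_ptr<char, AlignedDeleter>(
        static_cast<char*>(std::aligned_alloc(alignment, rounded)));
}
```

The rounded-up size also matters: with FILE_FLAG_NO_BUFFERING, each ReadFile/WriteFile length must itself be a sector-size multiple, so allocate and transfer in those units and trim the final partial block separately.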

I can’t speak to the documentation Peter pointed to because
I no longer have that email. When you reference something like
that, it wouldn’t hurt to include the details in your email.

Hope this helps.

Rick Cadruvi…

Hello Rick,

Thanks for the detailed message - it makes interesting reading. Can you
please clarify what you mean by scatter / gather reads? I understand OP
was not interfacing with h/w directly but instead using the existing
disk device driver.

Thanks
Udas
