I defer to Ravi on the point of setting valid data-length.
-p
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Ravisankar
Pudipeddi
Sent: Wednesday, December 14, 2005 11:26 AM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] Writing faster to disks
I agree with most of the recommendations below from Peter: use
asynchronous (i.e. overlapped in Win32 API terms), non-buffered i/o, if
you want to talk to the metal and get close to raw disk bandwidth. You
do not necessarily need completion ports unless you are doing
multi-threaded i/o: note that you don’t need multi-threaded i/o to saturate
disk bandwidth. You just need a deep enough pipeline (i.e. enough
i/o requests pending). Use large buffers, and post as much as you can
afford without tanking the system, and keep the pipeline going. You can
do this all from user mode, no need for kernel drivers and fancier
miniports.
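How deep is "deep enough"? One rough way to reason about it is the bandwidth-delay product (Little's law): the bytes that must be in flight equal the target bandwidth times the per-request latency. The sketch below is a back-of-the-envelope model only and does not come from this thread. The function name and the 10 ms latency are assumptions for illustration; the 70 MB/s figure is the per-disk rate reported by the original poster.

```python
import math

def min_outstanding_requests(bandwidth_mb_s, latency_ms, request_kb):
    """Estimate how many requests must be pending to sustain a bandwidth.

    Little's law: bytes in flight = bandwidth * per-request latency.
    All inputs are assumptions supplied by the caller, not measured values.
    """
    bytes_in_flight = bandwidth_mb_s * 1024 * 1024 * (latency_ms / 1000.0)
    return max(1, math.ceil(bytes_in_flight / (request_kb * 1024)))

# 70 MB/s per disk, assumed 10 ms per request:
small = min_outstanding_requests(70, 10, 64)    # 64 KB requests
large = min_outstanding_requests(70, 10, 1024)  # 1 MB requests
print(small, large)
```

Under these assumptions, larger requests need far fewer pending i/os, which is one way to read the "use large buffers" advice. Treat the result as a lower bound: keeping at least two requests in flight is probably needed so the disk does not idle between a completion and the next submission.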
Setting end-of-file ahead is imperative, as Peter points out. However,
please do not set the valid data length: when you call
SetFileValidData(), NTFS simply updates the valid data length (VDL),
which means that if there was a crash before the file was completely
overwritten, users can read uninitialized data, which has bad security
implications (disclosure). SetFileValidData() is also a privileged operation.
If this is a file you repeatedly do i/o to/from (such as a database
file), you can benefit by simply zeroing out the file to begin with - a
one-time cost, but afterwards i/o to the file will incur minimal FS overhead.
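As a rough illustration of "set end-of-file ahead, then zero the file once", here is a portable sketch. It is a stand-in, not the Win32 code path: on Windows the equivalent calls would be SetFilePointerEx()/SetEndOfFile() followed by ordinary writes, and whether Python's truncate() behaves identically on NTFS is an assumption. The function name and the 1 MB chunk size are hypothetical.

```python
import os
import tempfile

def preallocate_zeroed(path, size_bytes, chunk_bytes=1024 * 1024):
    """Create a file of size_bytes and overwrite every byte with zeros."""
    zeros = bytes(chunk_bytes)
    with open(path, "wb") as f:
        f.truncate(size_bytes)   # move end-of-file out first
        f.seek(0)
        remaining = size_bytes
        while remaining > 0:
            n = min(chunk_bytes, remaining)
            f.write(zeros[:n])   # real writes, so no uninitialized data is exposed
            remaining -= n
        f.flush()
        os.fsync(f.fileno())     # make sure the zeros reach the disk

if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "prealloc.bin")
    preallocate_zeroed(demo_path, 5 * 1024 * 1024)
    print(os.path.getsize(demo_path))
```

Because the zeros are actually written, later writes stay within the already-valid region of the file, which avoids the disclosure problem described above without needing any privilege.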
Ravi
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Peter Wieland
Sent: Wednesday, December 14, 2005 10:43 AM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] Writing faster to disks
Gary, I’m not sure where you got the idea that in SCSIPORT transfers to
a LUN are synchronous. SCSIPORT can handle up to 250ish requests
outstanding at a time (device and memory conditions permitting) spread
across all LUNs. Most modern controllers can handle more than one
request at a time to a particular ITL nexus (there was a time when this
wasn’t true) and most drives have a reasonable queue depth before they
start fending off requests.
You are correct about request splitting - requests are split up based on
the maximum transfer size reported by the port driver, which is based
more on the number of SG list breaks the device says it can support than
on any transfer size limitations. There are some registry settings the
admin can use to up the number of breaks allowed to the maximum that the
controller can support, but pushing this limit up costs more in
pre-allocated memory (SRB extension sizes go up with this count). 68KB
ends up being the default size (to allow for a 64KB transfer buffer
that is sector-aligned but not page-aligned).
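A possible reading of the 68KB figure, assuming 4 KB pages and one scatter/gather break per page touched: a 64KB buffer that starts partway into a page spills onto a 17th page. The helper name, the page size, and the 512-byte sector offset are assumptions for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages, as on x86

def pages_spanned(offset_in_page, length, page_size=PAGE_SIZE):
    """Number of pages touched by a buffer starting offset_in_page bytes into a page."""
    return (offset_in_page + length + page_size - 1) // page_size

aligned = pages_spanned(0, 64 * 1024)      # page-aligned 64 KB buffer
unaligned = pages_spanned(512, 64 * 1024)  # sector-aligned only (assumed 512-byte sector)
print(aligned, unaligned, unaligned * PAGE_SIZE // 1024)
```

Under these assumptions the unaligned case needs one more page than the aligned case, and that many pages multiplied by the page size gives the 68KB mentioned above.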
The OP will get the best performance benefit first by switching to an
asynchronous I/O model that uses completion ports. This will let them
send the most I/O with the least number of threads.
He should also be pre-allocating the space for files by creating them
and then setting the valid data length (not just the file size) out in
large chunks. Otherwise the writes through the file system will result
in a lot of file extensions, which are synchronizing operations.
And he should examine their SCSI configurations to be sure they aren’t
saturating the PCI bus or the SCSI bus - SCSI starts to degrade (if I
recall) once you go past 3 devices on a chain.
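To make the bus-saturation concern concrete, the sketch below compares nominal peak bus rates with the roughly 70 MB/s per disk reported by the original poster. Which bus the OP's system actually uses is unknown, so the table entries are candidates rather than a description of that system, and sustained throughput is lower than these nominal figures because of protocol overhead. The function name is hypothetical.

```python
# Nominal peak rates in MB/s; which of these applies to the OP is unknown.
NOMINAL_BUS_MB_S = {
    "PCI 32-bit/33 MHz": 133,
    "PCI 64-bit/66 MHz": 533,
    "Ultra160 SCSI": 160,
    "Ultra320 SCSI": 320,
}

def disks_before_saturation(bus_mb_s, per_disk_mb_s):
    """How many disks streaming at per_disk_mb_s fit within a bus's nominal rate."""
    return int(bus_mb_s // per_disk_mb_s)

for name, rate in NOMINAL_BUS_MB_S.items():
    print(name, disks_before_saturation(rate, 70))
```

Even on paper, only a handful of disks streaming at this rate fit on a single SCSI chain, which is in line with the rule of thumb above.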
Finally he should look into the configuration settings the SCSI adapters
provide - including the number of physical breaks that are allowed - and
then adjust their I/O size accordingly to avoid the need for the driver
to split them.
There’s plenty that can be done before writing your own miniport or
trying to redesign the way NT I/O works.
-p
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@seagate.com
Sent: Wednesday, December 14, 2005 7:36 AM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] Writing faster to disks
Is this a SCSIPORT or STORPORT mini-port? If it’s SCSIPORT, transfers to
each LUN are basically synchronous, so it won’t matter how many threads
send how much data to a given LUN. How large is your SCSI block transfer
size? Unless you have changed it, I believe that 64K is the standard, so
every 5MB block you send from the application threads gets broken up
into lots of 64K chunks, and each chunk gets transferred synchronously.
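The splitting arithmetic is easy to check (whether the pieces are then issued synchronously is a separate question, disputed elsewhere in this thread). The function name is hypothetical, and the 1 MB limit is only a made-up example of a raised maximum transfer size.

```python
import math

def split_count(request_bytes, max_transfer_bytes):
    """Number of pieces a request is split into at a given maximum transfer size."""
    return math.ceil(request_bytes / max_transfer_bytes)

five_mb = 5 * 1024 * 1024
print(split_count(five_mb, 64 * 1024))    # 64 KB maximum transfer
print(split_count(five_mb, 1024 * 1024))  # hypothetical 1 MB maximum transfer
```

Raising the maximum transfer size, or sizing application writes to match it, reduces the number of pieces the port driver has to create.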
Now consider map registers. When your mini-port allocates its DMA
adapter you may have a delay as the mini-port waits for resources
because not enough map registers were allocated.
The point is LOTS of things in the system cause overhead that slows down
throughput. The application layer has limited control, but
there are tweaks that can be done. If you own the mini-port you have a
little more control, but when all is said and done, you have a storage
stack consisting of many device drivers and services that have to have
their say. The only way you can do what you want is to write a PCI
driver for a given SCSI HBA, but even that will have its limitations.
Gary G. Little
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@googlemail.com
Sent: Wednesday, December 14, 2005 6:57 AM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] Writing faster to disks
Hi
I check the writing speed with files of around 5 MB for one disk. If
this is x, then writing in parallel (using threads) to, let’s say, 8 disks
should be around 8*x. This is of course a very simple assumption (and
probably incorrect). The number of logical CPUs is a constraint. I am
reaching a speed of around 6*x and would like to extract more juice.
To measure the speed, I have used the high-performance counter and a buffer
overhead of approx. 5 MB. In my case x is around 70 MBps. I am using no
intermediate buffering. Any suggestions?
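For reference, a measurement along these lines could look like the sketch below. It is a portable approximation, not the poster's actual harness: time.perf_counter() stands in for the high-performance counter, buffering=0 only disables Python's own buffer (the OS file cache is still involved, unlike true non-buffered i/o on Windows), and 1 MB is taken as 2**20 bytes. Both function names are hypothetical.

```python
import os
import tempfile
import time

def measure_write_mb_s(path, total_bytes, chunk_bytes=1024 * 1024):
    """Write total_bytes to path and return throughput in MB/s (1 MB = 2**20 bytes)."""
    buf = bytes(chunk_bytes)
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:  # no Python-level buffering
        written = 0
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            written += f.write(buf[:n])
        os.fsync(f.fileno())                  # include the flush to disk in the timing
    elapsed = time.perf_counter() - start
    return (total_bytes / (1024 * 1024)) / elapsed

def scaling_efficiency(aggregate_mb_s, single_disk_mb_s, disks):
    """Fraction of ideal linear scaling achieved across several disks."""
    return aggregate_mb_s / (single_disk_mb_s * disks)

if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "speed.bin")
    print(measure_write_mb_s(demo_path, 5 * 1024 * 1024))
    print(scaling_efficiency(6 * 70, 70, 8))  # 6*x achieved on 8 disks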
regards
Gary
PS: Valeriy, I would check the use of overlapped structure and see if I
can get better performance with WriteFileEx().
On 12/14/05, Mark Roddy wrote:
WriteFile translates into IRP_MJ_WRITE using direct IO, which would make
it more efficient than putting your driver in the middle of the
operation. What exactly do you mean by ‘write data faster to SCSI disks’
and how exactly are you measuring this?
=====================
Mark Roddy DDK MVP
Windows 2003/XP/2000 Consulting
Hollis Technology Solutions 603-321-1032
www.hollistech.com
________________________________
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Gary Leonne
Sent: Wednesday, December 14, 2005 6:20 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] Writing faster to disks
Hi all
I am toying with different possibilities to write data faster to
SCSI disks other than using the standard WriteFile() API call. My
constraint is that my data must be recognisable to NTFS so I cannot
write at sector level. One way that I could gather was:
Share memory between kernel and user level and call the NTFS
driver with IRP_MJ_WRITE, which contains the buffer to write.
My question is: Is it going to give me an appreciable gain in
speed? Only if the gain over the normal WriteFile() API is large enough
would I take the pain to write the kernel driver. Has anyone had
experience with this? Is there some other way as well?
regards
Gary
— Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256 You are currently subscribed
to ntdev as: unknown lmsubst tag argument: ‘’ To unsubscribe send a
blank email to xxxxx@lists.osr.com