Your application sends some number of requests to your PCI driver via an
IOCTL (or a read, for that matter) that uses direct I/O. These requests are
each marked pending and queued in your driver, and they are used as the DMA
sink for data transfers from your device. As each buffer is filled, you
complete the corresponding I/O request, which notifies the app that the data
is available for writing to disk. No copies are involved other than the
original DMA operation from the PCI device to system memory and the DMA
operation from system memory to the HBA. Your app has to feed your driver at
an appropriate rate, but you certainly ought to be able to do better than
8 Mbit/sec. Your app provides ‘big enough’ buffers; I don’t know the details
of your data format, so you will have to figure that part out. The idea is
that your app keeps the ‘pump primed’ so that there is always another buffer
waiting to be filled in the queue. With a bit of work, you should easily be
able to keep up with the disk, which ought to be the bottleneck.
Of course, if your device doesn’t do scatter/gather DMA, this probably won’t
help, as the extra copy operation is going to happen anyhow.
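A rough user-mode sketch of that ‘pump primed’ pattern follows. It is
illustrative only: the device name, the IOCTL code, the buffer count and
size are made up, error handling is trimmed, and it assumes the driver
completes queued requests in FIFO order.

/* Keep several direct-I/O requests pending in the driver so there is
 * always an empty buffer waiting to be the next DMA sink. */
#include <windows.h>
#include <winioctl.h>

#define IOCTL_MYDEV_READ_DMA CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, \
                                      METHOD_OUT_DIRECT, FILE_READ_ACCESS)
#define NUM_BUFFERS 8
#define BUFFER_SIZE (1024 * 1024)          /* "big enough" buffers */

int main(void)
{
    HANDLE dev  = CreateFileW(L"\\\\.\\MyPciDevice", GENERIC_READ, 0, NULL,
                              OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    HANDLE disk = CreateFileW(L"capture.dat", GENERIC_WRITE, 0, NULL,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    BYTE *buf[NUM_BUFFERS];
    OVERLAPPED ov[NUM_BUFFERS] = {0};
    int i;

    if (dev == INVALID_HANDLE_VALUE || disk == INVALID_HANDLE_VALUE)
        return 1;

    /* Prime the pump: queue every buffer in the driver up front. */
    for (i = 0; i < NUM_BUFFERS; i++) {
        buf[i] = (BYTE *)VirtualAlloc(NULL, BUFFER_SIZE, MEM_COMMIT,
                                      PAGE_READWRITE);
        ov[i].hEvent = CreateEventW(NULL, FALSE, FALSE, NULL);
        DeviceIoControl(dev, IOCTL_MYDEV_READ_DMA, NULL, 0,
                        buf[i], BUFFER_SIZE, NULL, &ov[i]);
    }

    /* As each buffer completes (in submission order, assuming a FIFO
     * queue in the driver), write it to disk and resubmit it. */
    for (i = 0; ; i = (i + 1) % NUM_BUFFERS) {
        DWORD bytes = 0;
        if (!GetOverlappedResult(dev, &ov[i], &bytes, TRUE))
            break;
        WriteFile(disk, buf[i], bytes, &bytes, NULL);
        DeviceIoControl(dev, IOCTL_MYDEV_READ_DMA, NULL, 0,
                        buf[i], BUFFER_SIZE, NULL, &ov[i]);
    }
    return 0;
}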
=====================
Mark Roddy
-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Nikolas Stylianides
Sent: Monday, May 16, 2005 1:07 PM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] DMA + common buffer ARC
How does this scheme work? How does it eliminate the buffer copy?
You mean that before the transfer my device interrupts and lets me know how
much data it has for me. After that I prepare a buffer in user mode and give
it to the device to fill. After completion my device interrupts again, and
then I flush the data to the disk. Is this the scheme?
-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Roddy, Mark
Sent: Monday, May 16, 2005 6:00 PM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] DMA + common buffer ARC
Perhaps you should consider using direct I/O and MDL-based transfers from
your PCI device into your user-mode writer app, instead of using a common
buffer scheme that forces you into a buffer copy.
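A rough sketch of the driver side of that (WDM style, illustrative only:
the IOCTL code and the queue helper are made up, and cancellation handling
is omitted):

#include <ntddk.h>

#define IOCTL_MYDEV_READ_DMA CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, \
                                      METHOD_OUT_DIRECT, FILE_READ_ACCESS)

/* Hypothetical helper: inserts the IRP into a FIFO of buffers waiting
 * to be used as DMA sinks. */
VOID MyDevQueuePendingIrp(PVOID DeviceExtension, PIRP Irp);

NTSTATUS MyDevDispatchDeviceControl(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    PIO_STACK_LOCATION stack = IoGetCurrentIrpStackLocation(Irp);

    if (stack->Parameters.DeviceIoControl.IoControlCode ==
        IOCTL_MYDEV_READ_DMA) {
        /* METHOD_OUT_DIRECT: Irp->MdlAddress already describes the
         * caller's locked-down output buffer, so it can be handed to the
         * DMA adapter (e.g. via GetScatterGatherList) with no copy. */
        IoMarkIrpPending(Irp);
        MyDevQueuePendingIrp(DeviceObject->DeviceExtension, Irp);
        return STATUS_PENDING;
    }

    /* Other IOCTLs: fail them, for brevity in this sketch. */
    Irp->IoStatus.Status = STATUS_INVALID_DEVICE_REQUEST;
    Irp->IoStatus.Information = 0;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return STATUS_INVALID_DEVICE_REQUEST;
}

/* Called (typically from the DPC that fires when the device finishes a
 * transfer) to complete the oldest queued request and wake the app. */
VOID MyDevCompleteTransfer(PIRP Irp, ULONG BytesTransferred)
{
    Irp->IoStatus.Status = STATUS_SUCCESS;
    Irp->IoStatus.Information = BytesTransferred;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
}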
=====================
Mark Roddy
-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Nikolas Stylianides
Sent: Monday, May 16, 2005 3:26 AM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] DMA + common buffer ARC
Tim, I have already tested it in user mode. I am able to flush about
8 Mbit/sec and that’s it. I want to eliminate the step of copying the buffer
from the DMA transaction into a user-space buffer before flushing it to
disk. Have you ever tested it? How can you be sure that it is a waste of
time to try?
P.S. I even thought of using disk.sys directly and staying sector aligned in
order to be faster.
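(A rough sketch of sector-aligned, unbuffered user-mode writes, for
illustration: this uses FILE_FLAG_NO_BUFFERING rather than going to
disk.sys directly, and the 512-byte sector size and file name are
assumptions; real code should query the geometry, e.g. with
IOCTL_DISK_GET_DRIVE_GEOMETRY.)

/* FILE_FLAG_NO_BUFFERING bypasses the file-system cache, but requires
 * the buffer address, transfer size, and file offset to be multiples
 * of the sector size. */
#include <windows.h>

#define SECTOR_SIZE 512
#define CHUNK_SIZE  (256 * 1024)        /* multiple of SECTOR_SIZE */

int main(void)
{
    HANDLE f = CreateFileW(L"capture.dat", GENERIC_WRITE, 0, NULL,
                           CREATE_ALWAYS,
                           FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH,
                           NULL);
    /* VirtualAlloc returns page-aligned memory, which satisfies the
     * sector-alignment requirement for the buffer address. */
    BYTE *buf = (BYTE *)VirtualAlloc(NULL, CHUNK_SIZE, MEM_COMMIT,
                                     PAGE_READWRITE);
    DWORD written;

    if (f == INVALID_HANDLE_VALUE || buf == NULL)
        return 1;

    /* ... fill buf with CHUNK_SIZE bytes of captured data ... */
    WriteFile(f, buf, CHUNK_SIZE, &written, NULL);

    CloseHandle(f);
    return 0;
}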
-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Tim Roberts
Sent: Friday, May 13, 2005 8:02 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] DMA + common buffer ARC
Nikolas Stylianides wrote:
I have already begun to design the kernel-mode version. I want to test
it and see if there is significant improvement. I will send my results
back to the newsgroup.
How are you going to judge “improvement”? Are you already doing it in user
mode and falling behind? If not, why go to the trouble?
Some people seem to think that the CPU somehow runs its cycles faster in
kernel mode than it does in user mode. It ain’t so. The only thing you
would gain by doing your disk writes in the kernel is a reduction in the
kernel/user transitions. For the most part, that overhead is going to be
completely lost in the overhead of writing to the disk itself.
There is more at stake here than raw performance. If a user-mode scheme
handles your data and runs the CPU at 35% load, and a kernel-mode scheme
would run at 30% load, it would be a huge mistake to release the kernel
scheme: the extra complexity and risk of kernel-mode code isn’t worth a
five-point drop in CPU load.
Now, if your user-mode scheme runs the CPU at 110%, so that you drop data,
and you have already exhausted the other performance options (like buffer
sizes, buffer counts, and batching up work), then it might be worth an
experiment.
–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.