RE: How to make PCI BAR addresses usable in the user mode?

Well, I think it’s an excellent approach, just one that is rather unusual.
Unusual only because most hardware doesn’t need such a high I/O rate. Or,
let me be precise – such a high rate of *individual* I/O requests. A lot
of other devices do move a lot of data between system RAM and device, but
they usually do it in batches – disk I/O, video, 3D pipelines, etc. So the
1us delay is usually amortized over many hundreds of thousands of requests
that are all carried in the same single IRP.

So, to the original poster: Consider using batched I/O, if it makes sense to
do so. Then when your driver receives the IRP, you can lock down the MDL,
read the commands directly from the user buffer in a tight loop, and write
them to your hardware in kernel mode.

If you MUST have direct access to hardware from your user-mode app, then I
have a question for Russell: Can you elaborate on how your design works?
Can you guarantee safety – that is, that no user-mode app can corrupt
system or device state?

– arlie

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Russell Poffenberger
Sent: Thursday, October 27, 2005 9:43 AM
To: Windows System Software Devs Interest List
Cc: Windows System Software Devs Interest List
Subject: Re:[ntdev] How to make PCI BAR addresses usable in the user mode?

I do the same thing in my driver. I am sure I will get flamed for it, but I
have no choice. In my application (a dedicated interface to a
high-performance semiconductor tester), we make millions of calls to the
tester hardware through the PCI interface board. If I had to route every
call through the driver, the product would be so slow we might as well not
even put it on the market: because of the overhead, an I/O through an IOCTL
to the driver takes at least 10x longer than simply having the address
space mapped into the user process space. The 64-bit/66 MHz PCI card that
we designed in-house can do a PCI transaction in about 120ns; it seems a
shame to wrap 1us or more of overhead around it.

Now, I don’t do ALL the I/O through a memory-mapped space. About 80% of our
access to the hardware is writes, and these are done through the memory
map. 5% is reads, which DO use an IOCTL, simply because our hardware may
take several microseconds to return a read result, and we can’t have the PC
wait around that long; instead, a delayed-delivery mechanism interrupts the
driver when the result is available, and the IRP is completed then. In fact,
our tester hardware has a “threading” concept where we can have up to 256
separate outstanding reads queued to the tester hardware at any one time.

Oh, the remaining 15% is bus master scatter/gather DMA.

Having said that, if the OP is interested in how I performed this mapping,
email me directly, since I doubt you will get any help otherwise.

Hi Arthur,

Glad I could be of help.

Sometimes we just need to do what we need to do.

Later,

At 10:25 PM 10/27/2005, you wrote:

Thanks to Tim Roberts and Russell Poffenberger, who pointed me in the
right direction to correct my mistake. Now the whole thing works
beautifully, and I am leaving it to run overnight. My customer is also
testing it now overseas, and they are very happy with the flexibility and
performance provided by the new driver. I really appreciate you guys’ help!

Russ Poffenberger
Credence Systems Corp.
xxxxx@credence.com