At 02:17 PM 1/20/2004 +0100, you wrote:
Thanks to all who took the time to answer.

After understanding the page-size problem, we decided to try splitting the
DMA into 4 KB chunks. I changed the DPC and the AdapterControl routine and
tested it with a 100M DMA transfer (actually 245 pages). We measured a
speed of about 20 MB/s. The theoretical maximum of 32-bit, 33 MHz PCI is
about 133 MB/s. The measured speed is not astonishing, but we know how to
increase it: scatter/gather. The hardware designer is now thinking about a
redesign (a bigger FPGA). We chose page-sized DMA because a contiguous
kernel allocation is not guaranteed to succeed, or to be big enough,
especially on a system with little free memory. So I thought it more
reliable to split the transfer into small DMA chunks, write directly to
user memory, and spare the processor an extra copy from kernel to user
memory.
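[For reference, the page-at-a-time scheme described above, programmed
straight from the locked-down MDL, might look roughly like this. It is a
minimal sketch only: MmGetMdlPfnArray and MmGetMdlByteOffset are real DDK
macros, but the WriteDma* helpers are hypothetical stand-ins for the card's
registers, and the HAL map-register path (IoGetDmaAdapter/MapTransfer) that
a portable bus-master driver should normally use is omitted.]

#include <wdm.h>

// Context carried between the StartIo path and the DPC.
typedef struct _DMA_CTX {
    PMDL  Mdl;            // MDL for the locked-down user buffer
    ULONG PageIndex;      // next entry in the MDL's PFN array
    ULONG BytesRemaining; // bytes of the request still to program
} DMA_CTX;

// Hypothetical hardware-access helpers for the card's DMA engine.
VOID WriteDmaAddress(ULONG PciAddress);
VOID WriteDmaLength(ULONG Bytes);
VOID WriteDmaStart(VOID);

// Program one page-sized chunk: called first from StartIo, then from
// the DPC each time the "chunk done" interrupt fires.
VOID StartNextChunk(DMA_CTX* Ctx)
{
    if (Ctx->BytesRemaining == 0) {
        return;                       // all chunks done; complete the IRP
    }

    PPFN_NUMBER Pfns = MmGetMdlPfnArray(Ctx->Mdl);

    // Only the first page can start at a non-zero offset.
    ULONG Offset = (Ctx->PageIndex == 0) ? MmGetMdlByteOffset(Ctx->Mdl) : 0;
    ULONG Length = PAGE_SIZE - Offset;
    if (Length > Ctx->BytesRemaining) {
        Length = Ctx->BytesRemaining;
    }

    PHYSICAL_ADDRESS Pa;
    Pa.QuadPart = ((ULONGLONG)Pfns[Ctx->PageIndex] << PAGE_SHIFT) + Offset;

    WriteDmaAddress(Pa.LowPart);      // assumes a 32-bit-addressing device
    WriteDmaLength(Length);
    WriteDmaStart();

    Ctx->PageIndex++;
    Ctx->BytesRemaining -= Length;
}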
Here is what I do for our custom PCI card, which also does not support
scatter/gather. I am using Compuware's DriverWorks, but its classes are for
the most part thin wrappers around the standard kernel routines, so there
should be a DDK equivalent for everything below.
The DW DMA concept has you set up a KDmaAdapter object. In it you specify
attributes of the DMA transfer such as scatter/gather support (true/false),
interface type (PCI), width, etc. When it comes time to perform the DMA, a
KDmaTransfer object is initialized (it takes the KDmaAdapter instance as an
argument), and you call the Initiate method on it (which takes an MDL and a
callback as arguments). This method sets up the mapping and invokes the
callback. My callback gets (from the KDmaTransfer object) the number of
bytes remaining and, if it is non-zero, calls GetTransferDescriptors (on
KDmaTransfer) to get a DMA segment pair: a PCI physical address and a
length. The card's DMA engine is programmed with the pair and the DMA is
started. When the ISR gets the interrupt, it calls the Continue method on
KDmaTransfer. This sets up the next contiguous segment (from the original
MDL) and invokes the callback again.

If the card had implemented scatter/gather, the first invocation of the
callback would get enough transfer descriptors to program however many
scatter/gather registers the hardware supported. If there were more
segments than registers, you would use the method above to program and
start another sequence of segments. So in a way, not having scatter/gather
is just the special case where only one register exists.
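In outline, that flow looks like this. Treat it as a sketch only:
KDmaAdapter, KDmaTransfer, Initiate, GetTransferDescriptors and Continue
are the DriverWorks names from the description above, but the argument
lists here are paraphrased rather than the library's exact prototypes, and
DmaSegment and ProgramDmaEngine are hypothetical stand-ins.

#include <vdw.h>               // DriverWorks master include

struct DmaSegment {            // stand-in for the DW descriptor type
    PHYSICAL_ADDRESS Address;  // PCI physical address of the segment
    ULONG            Length;   // segment length in bytes
};

void ProgramDmaEngine(PHYSICAL_ADDRESS Pa, ULONG Len); // hypothetical

class MyDevice
{
    KDmaAdapter  m_Adapter;    // set up with sg=false, PCIBus, width, etc.
    KDmaTransfer m_Transfer;   // initialized with m_Adapter

public:
    void StartDma(PMDL Mdl)
    {
        // Sets up the mapping, then invokes OnDmaReady.
        m_Transfer.Initiate(Mdl, LinkTo(OnDmaReady));
    }

    // Invoked once per contiguous segment of the original MDL.
    void OnDmaReady(KDmaTransfer* Transfer)
    {
        if (Transfer->BytesRemaining() == 0) {
            return;            // whole MDL consumed; complete the request
        }

        // Without scatter/gather, ask for exactly one descriptor,
        // i.e. one (PCI physical address, length) pair.
        DmaSegment Seg;
        ULONG Count = 1;
        Transfer->GetTransferDescriptors(&Seg, &Count);

        ProgramDmaEngine(Seg.Address, Seg.Length);
    }

    void OnInterrupt()         // simplified; a real ISR must ack the card
    {
        // Maps the next contiguous segment and re-invokes OnDmaReady.
        m_Transfer.Continue();
    }
};

With scatter/gather hardware you would ask for as many descriptors as you
have registers in the one call and chain them; one register really is just
the special case.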
In this way, the DMA is broken up into contiguous segments taken directly
from the MDL. In my experience, the first segment is typically smaller than
the rest, but usually 6 or 7 KB; after that, segments are usually 32 KB or
more.
Our card is a 64-bit/66 MHz device, and we routinely achieve 250 Mbytes/s
on DMA writes (PCI device to memory) and 90 Mbytes/s on DMA reads (memory
to PCI device). It took a bit of tuning on the hardware side as well; a
good PCI analyzer is a must (we have a VMetro). What is very important to
getting good DMA performance is making the burst size as large as possible.
When bursting, each data transfer takes 15 ns (66 MHz PCI bus; 30 ns at
33 MHz). However, if a burst is short (a burst can be terminated by either
the PCI device or the host), it can take several hundred nanoseconds to
re-negotiate the bus and for the PCI device to re-acquire bus mastership.
That is the killer for DMA performance.
You need to keep burst sizes above 128 transfers to start reaping
performance benefits.
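To put rough numbers on that, here is a back-of-the-envelope model. It
assumes a 64-bit/66 MHz bus (8 bytes per 15 ns data phase), and the 300 ns
re-negotiation cost is an assumed example within the "several hundred
nanoseconds" above.

#include <cstdio>

int main()
{
    const double bytesPerXfer = 8.0;    // 64-bit bus width
    const double nsPerXfer    = 15.0;   // one data phase at 66 MHz
    const double nsOverhead   = 300.0;  // assumed re-arbitration cost/burst

    const int bursts[] = { 8, 32, 128, 512 };
    for (int burst : bursts) {
        // bytes per nanosecond, scaled to Mbytes/s (1 byte/ns = 1000 MB/s)
        double mbps = burst * bytesPerXfer
                      / (burst * nsPerXfer + nsOverhead) * 1000.0;
        std::printf("burst of %4d transfers -> %4.0f Mbytes/s\n",
                    burst, mbps);
    }
    return 0;
}

This prints roughly 152, 328, 461 and 513 Mbytes/s; the ceiling with no
re-arbitration at all is 533 Mbytes/s, so bursts of 128 transfers and up
recover most of the available bandwidth, matching the rule of thumb above.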
I have also noticed that different motherboard/chipset combinations perform
differently, mainly because of their ability to sustain a burst, which is
probably a measure of their PCI bridge and memory bandwidth. The best
results I have seen are on an Intel motherboard using the E7501 chipset. A
close second is a SuperMicro board using the Intel E7505 chipset (both with
P4 Xeon processors). An AMD Athlon MP based board comes in third (at about
150 Mbytes/s DMA write), and last was a SuperMicro motherboard (P4 Xeon)
using the ServerWorks GC-LE chipset, at 75 Mbytes/s DMA write.
Russ Poffenberger
NPTest, Inc.
xxxxx@NPTest.com