Hi all,
In the WDM/WDF DMA model, the recommended way to do continuous DMA is to
use a common buffer. But what if a driver needed to perform continuous
DMA to an externally-supplied circular buffer? Copying to a common buffer
could be unnecessarily expensive. One obvious solution would be to use a
repeating sequence of packet-based transfers, i.e. break the circular
buffer into segments and repeatedly call
GetScatterGatherList()/PutScatterGatherList() on each segment. The
problem with this approach is that the logical addresses returned by
GetScatterGatherList() (or MapTransfer() if you’re old-school) aren’t
guaranteed to be the same for successive calls on the same buffer segment.
This means you’ll have to reprogram your device for each buffer segment,
which can be expensive. And it can be even more expensive if the call to
GetScatterGatherList() involves programming physical->logical address
translations into an IOMMU.
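
To put a rough number on the reprogramming cost, here is a small user-mode C model of the segment-by-segment approach. It is only a sketch: mock_map_segment() and mock_unmap_segment() are made-up stand-ins, not GetScatterGatherList()/PutScatterGatherList(), and the model assumes the worst case where every mapping call returns a different logical address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SEGMENTS 4

/* Hypothetical mock of a packet-based map call; NOT a real WDM routine.
   Assumption: worst case, every call hands back a fresh logical address. */
static uint64_t next_logical = 0x10000;

uint64_t mock_map_segment(size_t seg)
{
    (void)seg;
    uint64_t addr = next_logical;
    next_logical += 0x1000;
    return addr;
}

/* Hypothetical mock of the matching unmap call; nothing to release here. */
void mock_unmap_segment(uint64_t logical)
{
    (void)logical;
}

/* Walks the ring `laps` times and returns how many times the device's
   address registers had to be rewritten. */
int count_device_reprograms(int laps)
{
    uint64_t programmed[RING_SEGMENTS] = {0};
    int reprograms = 0;

    for (int lap = 0; lap < laps; lap++) {
        for (size_t seg = 0; seg < RING_SEGMENTS; seg++) {
            uint64_t logical = mock_map_segment(seg);
            if (programmed[seg] != logical) {
                programmed[seg] = logical;  /* device must be reprogrammed */
                reprograms++;
            }
            /* ... the device would transfer the segment here ... */
            mock_unmap_segment(logical);
        }
    }
    return reprograms;
}
```

Under that assumption the device is reprogrammed once per segment per lap, so the cost grows with the length of the transfer rather than being paid once up front.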
It seems like one thing the WDM/WDF DMA APIs could really use is a
separate routine that *only* does DMA synchronization on a buffer, without
regenerating the logical address mapping. Depending on when it’s called
relative to when the buffer segment is transferred, it could do any
necessary cache-flushing or bounce-buffer copying. So for a continuous
transfer, a driver could call GetScatterGatherList() only once to
establish the logical address mapping, and then just repeatedly call this
new synchronization routine on each buffer segment during the transfer,
finally calling PutScatterGatherList() to tear down the mapping when the
transfer is stopped.
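
To make the idea concrete, here is a rough user-mode C sketch of the map-once / sync-per-segment pattern. None of these names exist in WDM/WDF: DmaMapping, map_once(), sync_segment() and unmap() are hypothetical, and a plain memcpy() to and from a bounce buffer stands in for whatever cache flushing or bounce copying a real implementation would do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEG_COUNT 4
#define SEG_SIZE  8

typedef enum { SYNC_FOR_DEVICE, SYNC_FOR_CPU } SyncDir;

/* Hypothetical long-lived mapping; not a WDM/WDF type. */
typedef struct {
    unsigned char *client;                       /* caller's circular buffer */
    unsigned char  bounce[SEG_COUNT * SEG_SIZE]; /* what the "device" sees   */
    int map_calls;                               /* times mapping was built  */
    int sync_calls;                              /* times a segment was synced */
} DmaMapping;

/* Stand-in for the single GetScatterGatherList() call. */
void map_once(DmaMapping *m, unsigned char *client)
{
    m->client = client;
    memset(m->bounce, 0, sizeof m->bounce);
    m->map_calls = 1;
    m->sync_calls = 0;
}

/* Stand-in for the final PutScatterGatherList() call. */
void unmap(DmaMapping *m)
{
    m->client = NULL;
}

/* Hypothetical sync-only routine: copies one segment in the requested
   direction without touching the address mapping. */
void sync_segment(DmaMapping *m, size_t seg, SyncDir dir)
{
    assert(seg < SEG_COUNT);
    unsigned char *c = m->client + seg * SEG_SIZE;
    unsigned char *b = m->bounce + seg * SEG_SIZE;
    if (dir == SYNC_FOR_CPU)
        memcpy(c, b, SEG_SIZE);   /* device -> caller's buffer */
    else
        memcpy(b, c, SEG_SIZE);   /* caller's buffer -> device */
    m->sync_calls++;
}

/* Simulates the device filling one segment through the mapping. */
void device_write_segment(DmaMapping *m, size_t seg, unsigned char value)
{
    assert(seg < SEG_COUNT);
    memset(m->bounce + seg * SEG_SIZE, value, SEG_SIZE);
}

/* Runs `laps` trips around the ring. Returns the number of mapping setups
   and stores the number of sync calls in *sync_calls. */
int run_continuous_transfer(unsigned char *ring, int laps, int *sync_calls)
{
    DmaMapping m;
    map_once(&m, ring);
    for (int lap = 0; lap < laps; lap++) {
        for (size_t seg = 0; seg < SEG_COUNT; seg++) {
            device_write_segment(&m, seg,
                                 (unsigned char)(lap * SEG_COUNT + seg + 1));
            sync_segment(&m, seg, SYNC_FOR_CPU);
        }
    }
    unmap(&m);
    *sync_calls = m.sync_calls;
    return m.map_calls;
}
```

In this model the mapping is built exactly once no matter how many laps the transfer makes, and only the per-segment sync cost repeats.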
I understand that the packet-based APIs were designed to enforce equal
distribution of map registers among devices, and this approach could allow
a driver to reserve a set of map registers for a long time. But in my
experience map registers really aren’t *that* limited, at least with
32-bit HW on a 64-bit OS (32-bit HW on a 32-bit OS w/ PAE is a different
story, but I’m choosing to ignore that case here:). It’s also interesting
to note that other OSes already have this kind of de-coupled
synchronization routine:
Mac OS X: IODMACommand::synchronize()
*BSD: bus_dmamap_sync()
Linux: dma_sync_sg_for_cpu() / dma_sync_sg_for_device()
Solaris: ddi_dma_sync()
So is this a limitation that other people have encountered w/ WDM? If so,
is it just something I have to live with, or are there tricks I’m not
aware of for working around it?
Thanks,
Jason