QUESTION:
Why is writing to a DirectDraw overlay buffer slower from a driver DPC
than it is from an application?
DETAILS:
I am writing a specialized PCI video capture driver for an industrial
vision product. No DirectShow or Kernel Streaming here, just capture
frames to contiguous buffers that are permanently allocated when the
driver loads. These buffers are also mapped into the app’s user virtual
space. It’s really pretty simple. (Good, 'cause I’m kinda new at this.)
Sometimes we want to DMA the incoming video to one of these buffers
continuously, and copy each frame to a DirectDraw overlay too, so live
video can be seen. The driver provides a way for an app thread to
synchronize with the end of each frame grab. The app thread takes a
certain amount of CPU time to copy each frame from the RAM buffer to the
overlay. (The app created the overlay.) This works fine.
Sometimes we want the driver to DMA directly to the overlay. So the app
passes the overlay buffer pointer to the driver. The driver creates an MDL
from that, then does a dirty trick, looking at the physical page addresses
in the MDL to get a hardware DMA address so the frame grabber can DMA
directly to the overlay. This works fine. (Please no lectures about how
this won’t work on a platform that really needs an adapter to utilize
mapping hardware. I KNOW I’m bad! That’s another discussion…)
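
For anyone who hasn't seen the trick, here is a minimal user-mode sketch of the arithmetic involved. It is only a simulation: the sim_mdl struct and the sim_* functions are hypothetical stand-ins, and it assumes 4 KB pages. In the real driver the page frame numbers and the byte offset come from the MDL via MmGetMdlPfnArray and MmGetMdlByteOffset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGE_SHIFT 12u                     /* assumption: 4 KB pages */
#define SIM_PAGE_SIZE  (1u << SIM_PAGE_SHIFT)

/* Hypothetical stand-in for the parts of an MDL that matter here. */
typedef struct {
    const uint64_t *pfns;    /* page frame numbers, one per locked page */
    size_t          page_count;
    uint32_t        byte_offset; /* buffer offset within the first page */
} sim_mdl;

/* Physical address of the first byte of the buffer. */
uint64_t sim_first_phys_addr(const sim_mdl *m)
{
    assert(m->page_count > 0 && m->byte_offset < SIM_PAGE_SIZE);
    return (m->pfns[0] << SIM_PAGE_SHIFT) + m->byte_offset;
}

/* Returns 1 if each page physically follows the previous one, so a
 * single base address is enough for the DMA engine; 0 otherwise. */
int sim_is_contiguous(const sim_mdl *m)
{
    for (size_t i = 1; i < m->page_count; i++) {
        if (m->pfns[i] != m->pfns[i - 1] + 1)
            return 0;
    }
    return 1;
}
```

If the pages are not contiguous, a single base address is not enough and the frame grabber would need a scatter/gather list instead.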
The problem came when we tried to DMA to a RAM buffer and then have a DPC
(queued from the interrupt at the end of each frame capture) copy the frame
to the overlay, instead of having the app copy it. The driver uses
MmMapLockedPagesSpecifyCache (one time, when the app told the driver about
the overlay) to get its own kernel-mode pointer to the overlay. I do
specify MmCached here. Yet it takes considerably more CPU time for the DPC
to copy data from the RAM buffer to the overlay than it takes the app to
do it. Experimentation shows the time is mainly eaten up by the writes to
the overlay, not the reads from the RAM.
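
To show what I mean by separating the writes from the reads, here is a rough user-mode sketch of that kind of measurement. It is not the driver code: the function names are made up for illustration, and in a DPC the timestamps would come from something like KeQueryPerformanceCounter rather than clock().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Time a write-only pass: fill the destination without reading the
 * source at all. Returns elapsed CPU time in seconds. */
double time_write_only(uint8_t *dst, size_t len, int passes)
{
    clock_t start = clock();
    for (int p = 0; p < passes; p++)
        memset(dst, p & 0xFF, len);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Time a read-only pass: sum the source without touching the
 * destination. The checksum is returned through checksum_out so the
 * compiler cannot drop the reads. */
double time_read_only(const uint8_t *src, size_t len, int passes,
                      uint32_t *checksum_out)
{
    uint32_t sum = 0;
    clock_t start = clock();
    for (int p = 0; p < passes; p++) {
        for (size_t i = 0; i < len; i++)
            sum += src[i];
    }
    clock_t end = clock();
    *checksum_out = sum;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Pointing dst at the overlay mapping and src at the RAM buffer, then comparing the two times, gives a rough idea of which side of the copy is expensive.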
Why is the DPC copy so much slower than the same copy done from the app?
BTW, READING from the overlay is excruciatingly slow; I think the hardware
just isn’t designed to do that fast.
Thanks for any ideas,
Paul Braun