Hi,
I have a DMA problem that looks rather spooky, so I would be grateful for any ideas on how to overcome it.
Summary: DMA operations on WinXP x64 introduce artifacts into the data.
Background: a 1394 OHCI adapter using our drivers (not the MS drivers) that receives large images from a 1394 digital camera. The adapter can only do DMA to 32-bit addresses. The target system is WinXP x64 on exotic hardware (NUMA, from what I have come to understand).
Images can be quite large, 5-6 MB, and we generally write our software so that one frame corresponds to “one” DMA operation. The whole buffer gets locked down and mapped, the required number of map registers is calculated, a call to DmaOperations->AllocateAdapterChannel is made, and then the physical addresses are obtained with successive calls to DmaOperations->MapTransfer, and so on (roughly as in the sketch below).
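To make the flow concrete, here is a minimal sketch of the per-frame setup path as described above. It assumes 4KB pages and packet-based (map-register) DMA through the DmaOperations table; the FRAME_CONTEXT structure and the StartFrameDma name are illustrative only, not our actual code.

    #include <wdm.h>

    typedef struct _FRAME_CONTEXT {        /* illustrative only */
        PDMA_ADAPTER DmaAdapter;
        PMDL         Mdl;
        ULONG        MapRegisterCount;
        PVOID        MapRegisterBase;
    } FRAME_CONTEXT, *PFRAME_CONTEXT;

    /* Defined in the next sketch. */
    IO_ALLOCATION_ACTION AdapterControl(PDEVICE_OBJECT DeviceObject, PIRP Irp,
                                        PVOID MapRegisterBase, PVOID Context);

    NTSTATUS StartFrameDma(PDEVICE_OBJECT DeviceObject, PDMA_ADAPTER DmaAdapter,
                           PVOID UserBuffer, ULONG Length, PFRAME_CONTEXT Ctx)
    {
        KIRQL    oldIrql;
        NTSTATUS status;

        /* Lock down the user buffer; the camera will DMA the image into it. */
        Ctx->DmaAdapter = DmaAdapter;
        Ctx->Mdl = IoAllocateMdl(UserBuffer, Length, FALSE, FALSE, NULL);
        if (Ctx->Mdl == NULL) {
            return STATUS_INSUFFICIENT_RESOURCES;
        }
        __try {
            MmProbeAndLockPages(Ctx->Mdl, UserMode, IoWriteAccess);
        } __except (EXCEPTION_EXECUTE_HANDLER) {
            IoFreeMdl(Ctx->Mdl);
            return GetExceptionCode();
        }

        /* Map registers needed to span the buffer (one per 4KB page touched). */
        Ctx->MapRegisterCount = ADDRESS_AND_SIZE_TO_SPAN_PAGES(
            MmGetMdlVirtualAddress(Ctx->Mdl), Length);

        /* AllocateAdapterChannel must be called at DISPATCH_LEVEL; the system
           calls AdapterControl back with the map register base when the
           registers become available. */
        KeRaiseIrql(DISPATCH_LEVEL, &oldIrql);
        status = DmaAdapter->DmaOperations->AllocateAdapterChannel(
            DmaAdapter, DeviceObject, Ctx->MapRegisterCount,
            AdapterControl, Ctx);
        KeLowerIrql(oldIrql);
        return status;
    }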
Everything works like a breeze on x86 systems and on normal x64 systems. Then we met the exotic system, which as far as I have understood is a NUMA creature. On this system the call to IoGetDmaAdapter returns a maximum of only 256 map registers, no matter what value I specify in DEVICE_DESCRIPTION::MaximumLength. Of course if I specify less, I get fewer; for example, specifying a MaximumLength of 262144 yields 65 map registers (64 pages plus one extra in case the buffer does not start on a page boundary), which also leads me to understand that the page size is 4KB.
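For reference, this is roughly how we obtain the adapter object; the field values reflect the 32-bit-only bus-master adapter described above, and the function name and exact MaximumLength shown are just for illustration.

    #include <wdm.h>

    PDMA_ADAPTER GetFrameDmaAdapter(PDEVICE_OBJECT Pdo, PULONG MapRegsAvailable)
    {
        DEVICE_DESCRIPTION dd;

        RtlZeroMemory(&dd, sizeof(dd));
        dd.Version           = DEVICE_DESCRIPTION_VERSION;
        dd.Master            = TRUE;               /* bus-master OHCI adapter */
        dd.ScatterGather     = TRUE;
        dd.Dma32BitAddresses = TRUE;               /* cannot address above 4GB */
        dd.InterfaceType     = PCIBus;
        dd.MaximumLength     = 6 * 1024 * 1024;    /* one full frame */

        /* On the NUMA box *MapRegsAvailable comes back as at most 256 no
           matter what MaximumLength says; asking for 262144 bytes yields 65,
           i.e. 262144 / 4096 = 64 pages plus one extra register. */
        return IoGetDmaAdapter(Pdo, &dd, MapRegsAvailable);
    }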
Although a maximum of 256 map registers should be available, we found out that we could only DMA buffers of up to 56KB at a time. So of course we queued several of these smaller DMA requests, with the AdapterControl routine returning DeallocateObjectKeepRegisters each time. At some point there were no more map registers, so the IRPs were queued internally.
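The AdapterControl callback in the sketch above looks roughly like the following; AddDescriptor and StartOhciContext are illustrative placeholders for programming and starting the OHCI receive DMA context, not real functions of ours.

    /* Illustrative placeholders. */
    VOID AddDescriptor(PFRAME_CONTEXT Ctx, PHYSICAL_ADDRESS BusAddr, ULONG Length);
    VOID StartOhciContext(PFRAME_CONTEXT Ctx);

    IO_ALLOCATION_ACTION AdapterControl(PDEVICE_OBJECT DeviceObject, PIRP Irp,
                                        PVOID MapRegisterBase, PVOID Context)
    {
        PFRAME_CONTEXT ctx       = (PFRAME_CONTEXT)Context;
        PVOID          currentVa = MmGetMdlVirtualAddress(ctx->Mdl);
        ULONG          remaining = MmGetMdlByteCount(ctx->Mdl);

        UNREFERENCED_PARAMETER(DeviceObject);
        UNREFERENCED_PARAMETER(Irp);

        ctx->MapRegisterBase = MapRegisterBase;

        /* MapTransfer is called until the whole locked buffer is covered;
           each call returns a device-visible (32-bit) address for one
           contiguous run. */
        while (remaining != 0) {
            ULONG length = remaining;
            PHYSICAL_ADDRESS busAddr = ctx->DmaAdapter->DmaOperations->MapTransfer(
                ctx->DmaAdapter, ctx->Mdl, MapRegisterBase,
                currentVa, &length, FALSE /* device writes to memory */);

            AddDescriptor(ctx, busAddr, length);   /* program one OHCI descriptor */

            currentVa  = (PUCHAR)currentVa + length;
            remaining -= length;
        }

        StartOhciContext(ctx);   /* kick off the receive DMA program */

        /* Keep the map registers for the transfer in flight, but release the
           adapter object so further requests can queue behind this one. */
        return DeallocateObjectKeepRegisters;
    }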
So we modified the software to split the single user-mode buffer over multiple DMA requests. The size of each DMA request is not a multiple of the page size, but a multiple of the 1394 image packet size. This means that a single physical page might get partially mapped in one DMA operation, with the rest of it mapped in the next DMA operation (see the toy arithmetic below). This made me skeptical initially, but the solution seemed to work fine on the first 8-CPU/8GB machine where we tested it.
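Just to illustrate the page-straddling point (the 4000-byte packet payload and 14-packet chunk here are made-up numbers, not our actual figures):

    #include <stdio.h>

    int main(void)
    {
        const unsigned pageSize   = 4096;
        const unsigned packetSize = 4000;             /* hypothetical payload */
        const unsigned chunkSize  = 14 * packetSize;  /* 56000-byte DMA request */

        /* The boundary between the first and second DMA request falls mid-page,
           so that page is partially mapped by one MapTransfer and the rest of
           it by the next one. */
        printf("chunk boundary at byte %u of page %u\n",
               chunkSize % pageSize, chunkSize / pageSize);  /* byte 2752 of page 13 */
        return 0;
    }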
However, after moving the solution to a different WinXP x64 machine, the client started seeing a weird phenomenon: every second or third frame, artifacts would appear in the image. We had the camera transmit in test mode, where all quadlets are 0xBCBCBCBC, and we captured the frames at the lowest possible level; the artifacts were still there.
The artifacts were in all cases exactly 56 bytes long, except for exactly one case where an artifact was 32 bytes. All 56-byte artifacts contained exactly the same byte pattern. Moreover, since the application recycles the user-mode buffers it uses to receive frames, the offsets of the artifacts were exactly the same whenever the same user-mode buffer was being used. There might have been a small difference, such as one artifact being present in one frame and absent in the next capture, but other than that the artifacts in the same user buffer were at exactly the same offsets.
To make matters even more interesting, the client told me that if he reduces memory from 4GB to 2GB then the artifacts are gone.
Does anyone have an idea why this is happening? My main suspect is the fact that the same page is often used in two independent and consecutive DMA transfers, and that we might be facing some weird cache coherency issue on these machines.
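For completeness, here is a sketch of the documented teardown sequence for each DMA request (FlushAdapterBuffers for each mapped range before the data is consumed, then FreeMapRegisters at DISPATCH_LEVEL, then unlocking); if stale map-register/system-buffer contents were involved, I would expect this to be the point where they get flushed back. Names and fields continue the earlier sketches and are illustrative only.

    VOID CompleteFrameDma(PFRAME_CONTEXT Ctx)
    {
        KIRQL oldIrql;

        /* One flush per MapTransfer range; shown here for a single range only. */
        Ctx->DmaAdapter->DmaOperations->FlushAdapterBuffers(
            Ctx->DmaAdapter, Ctx->Mdl, Ctx->MapRegisterBase,
            MmGetMdlVirtualAddress(Ctx->Mdl), MmGetMdlByteCount(Ctx->Mdl),
            FALSE /* device wrote to memory */);

        /* Return the map registers kept by DeallocateObjectKeepRegisters. */
        KeRaiseIrql(DISPATCH_LEVEL, &oldIrql);
        Ctx->DmaAdapter->DmaOperations->FreeMapRegisters(
            Ctx->DmaAdapter, Ctx->MapRegisterBase, Ctx->MapRegisterCount);
        KeLowerIrql(oldIrql);

        MmUnlockPages(Ctx->Mdl);
        IoFreeMdl(Ctx->Mdl);
    }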
The only option I can currently consider is to use separately allocated user-mode buffers for each of the DMA operations and then reconstruct the image from the pieces, doing an extra memory copy that is not currently needed.
Any advice will be greatly appreciated.
Dimitris Staikos
Unibrain