Guys… there’s GOT to be a simple explanation here. I can pretty much guarantee you that, whatever your problem is, it has nothing to do with cache, memory barriers, fences, neutrinos, left-spin vs right-spin, or anything else similarly esoteric.
If you were experienced Windows devs, and all of a sudden, you were seeing a problem… or if you were seeing a problem SOMEtimes… or you were on ARM64, which probably hasn’t been nearly as well tested (in terms of the Windows abstractions)… then MAYBE I’d buy that this is a memory barrier problem.
Even worrying about KeFlushIoBuffers is a bit of a stretch. Until just a few years ago, this function was a no-op on x86 and x64 systems: it was a macro that expanded to nothing. From wdm.h:
#if (NTDDI_VERSION >= NTDDI_WINTHRESHOLD)
VOID
KeFlushIoBuffers (
    _In_ PMDL Mdl,
    _In_ BOOLEAN ReadOperation,
    _In_ BOOLEAN DmaOperation
    );
#else
#define KeFlushIoBuffers(Mdl, ReadOperation, DmaOperation)
#endif
Soooo… isn’t it much more likely that (a) there’s a bug in your FPGA, or (b) you’re making some simple error in your Windows API calls?
I just did this with an FPGA dev, four weeks ago. He SWORE he was doing a DMA to the memory segment I provided… but on closer inspection of his code, he saw… oooops… wrong address. So he was doing a DMA to some random place in physical memory. Ooopsie!
And the presence or absence of an IOMMU has no bearing on anything. This is all baked into calling WdfCommonBufferGetAlignedLogicalAddress… the “Logical Address” is provided by the HAL and takes the IOMMU into account. This is why we call WdfCommonBufferGetAlignedLogicalAddress and not MmGetPhysicalAddress.
SO, let’s go back to first principles, shall we?
- Let’s be sure you’re programming your registers with the LogicalAddress – all 64 bits of it.
- Let’s be sure the rest of the registers are set up right… For the guy using AXI CDMA… did you ever tell me whether this was simple mode or not? Regardless, see if you can get things working first with simple mode… then if you need to worry about S/G and descriptors you can.
- Let’s be sure you’re looking at the data after a device-to-host memory transfer in the debugger, and NOT from some program you’ve written (too many chances for errors)
- Let’s make sure that when you look at the data in the debugger, you look at it in the memory window, first using the kernel virtual address (that you get back from WdfCommonBufferGetAlignedVirtualAddress) and then using the “physical memory” address you get back from WdfCommonBufferGetAlignedLogicalAddress.
- Set up ChipScope or SignalTap or whatever… and see if you can monitor that DMA operation (easy for ME to say, never having actually used either one of these tools… I’m a host-side software guy, not an FPGA guy… though I sometimes masquerade as one).
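
On the first point in that list, the classic mistake is dropping the upper half of the address. Here’s a minimal sketch in plain C of what I mean. It assumes a made-up device with two 32-bit address registers; the register names and layout are hypothetical, not from any real IP core, and a plain array stands in for the mapped hardware registers so you can run it anywhere:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL register layout, for illustration only.
 * Your device's register map will differ. */
enum {
    REG_DMA_ADDR_LO = 0,   /* low 32 bits of the DMA target address  */
    REG_DMA_ADDR_HI = 1,   /* high 32 bits of the DMA target address */
    REG_COUNT       = 2
};

/* Write all 64 bits of the logical address, split across two
 * 32-bit registers. */
static void program_dma_address(volatile uint32_t *regs,
                                uint64_t logical_address)
{
    assert(regs != NULL);
    regs[REG_DMA_ADDR_LO] = (uint32_t)(logical_address & 0xFFFFFFFFu);
    regs[REG_DMA_ADDR_HI] = (uint32_t)(logical_address >> 32);
}

/* Read both registers back and compare against the address we
 * intended to program. */
static bool dma_address_matches(const volatile uint32_t *regs,
                                uint64_t logical_address)
{
    uint64_t readback = ((uint64_t)regs[REG_DMA_ADDR_HI] << 32)
                      | (uint64_t)regs[REG_DMA_ADDR_LO];
    return readback == logical_address;
}
```

In a real KMDF driver, logical_address would be the QuadPart of the PHYSICAL_ADDRESS you get back from WdfCommonBufferGetAlignedLogicalAddress, and the writes would go through something like WRITE_REGISTER_ULONG against your mapped BAR instead of an array. The read-back check is cheap and will catch a forgotten high-half write right away.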
Peter