In the driver, contiguous memory is allocated with the following code snippet. The target system is x64 (Asus Z170 Deluxe).
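PHYSICAL_ADDRESS LowestAcceptableAddress = { 0 }, HighestAcceptableAddress = { 0 }, BoundaryAddressMultiple = { 0 };
PVOID MappedVA = NULL;
HighestAcceptableAddress.QuadPart = 0xFFFFFFFFFFFFFFFF;  // no upper limit on the physical address
Protect = PAGE_READWRITE | PAGE_NOCACHE;                 // uncached read/write kernel mapping
ChunkSize = 32 * 1024 * 1024;                            // 32 MB
NumaNode = 0;                                            // preferred NUMA node
SystemVA = MmAllocateContiguousNodeMemory(ChunkSize,
                                          LowestAcceptableAddress,
                                          HighestAcceptableAddress,
                                          BoundaryAddressMultiple,
                                          Protect,
                                          NumaNode);
if (SystemVA == NULL) return STATUS_INSUFFICIENT_RESOURCES;
pMdl = IoAllocateMdl(SystemVA, (ULONG)ChunkSize, FALSE, FALSE, NULL);
if (pMdl == NULL) { MmFreeContiguousMemory(SystemVA); return STATUS_INSUFFICIENT_RESOURCES; }
MmBuildMdlForNonPagedPool(pMdl);
// UserMode mapping: must run in the requesting process's context (e.g. the IOCTL handler); failure raises an exception.
__try {
    MappedVA = MmMapLockedPagesSpecifyCache(pMdl, UserMode, MmNonCached, NULL, FALSE, HighPagePriority);
} __except (EXCEPTION_EXECUTE_HANDLER) {
    MappedVA = NULL;
}
if (MappedVA == NULL) {
    IoFreeMdl(pMdl);
    MmFreeContiguousMemory(SystemVA);
    return STATUS_INSUFFICIENT_RESOURCES;
}
UserVA = ((ULONG_PTR)PAGE_ALIGN(MappedVA)) + MmGetMdlByteOffset(pMdl);
MappedPhyAddr = MmGetPhysicalAddress(SystemVA);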
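The UserVA expression adds MmGetMdlByteOffset(pMdl) to the page-aligned mapping address. A minimal portable sketch of that arithmetic follows; page_align, byte_offset, span_pages and rebuild_user_va are hypothetical local stand-ins modeled on the WDK macros PAGE_ALIGN, BYTE_OFFSET and ADDRESS_AND_SIZE_TO_SPAN_PAGES, assuming 4 KB pages. It shows that a page-aligned 32 MB chunk spans 8192 pages, and that when SystemVA starts on a page boundary the byte offset is 0, so the adjustment leaves the address unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local stand-ins for the WDK macros PAGE_ALIGN, BYTE_OFFSET
 * and ADDRESS_AND_SIZE_TO_SPAN_PAGES, assuming 4 KB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE ((uintptr_t)1 << SKETCH_PAGE_SHIFT)

/* Round an address down to the start of its page (like PAGE_ALIGN). */
static uintptr_t page_align(uintptr_t va)
{
    return va & ~(SKETCH_PAGE_SIZE - 1);
}

/* Offset of an address within its page (like BYTE_OFFSET). */
static uintptr_t byte_offset(uintptr_t va)
{
    return va & (SKETCH_PAGE_SIZE - 1);
}

/* Pages touched by [va, va + size) (like ADDRESS_AND_SIZE_TO_SPAN_PAGES). */
static size_t span_pages(uintptr_t va, size_t size)
{
    return (size_t)((byte_offset(va) + size + (SKETCH_PAGE_SIZE - 1)) >> SKETCH_PAGE_SHIFT);
}

/* The driver's UserVA expression: page-aligned mapping base plus MDL byte offset. */
static uintptr_t rebuild_user_va(uintptr_t mapped_va, uintptr_t mdl_byte_offset)
{
    assert(mdl_byte_offset < SKETCH_PAGE_SIZE);  /* an offset always lies within one page */
    return page_align(mapped_va) + mdl_byte_offset;
}
```

Because the MDL carries one page frame entry per page, the 32 MB chunk is described by 8192 entries under the 4 KB page assumption.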
The 64-bit application pseudocode is as follows:
- Get the UserVA from the driver through an IOCTL call.
- Allocate a 32 MB temporary buffer and initialize it with zeros.
- Copy the full 32 MB with memcpy() in chunks of the chosen size (16 bytes or 1024 bytes).
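The allocation and copy steps above can be sketched in portable C. In this sketch a second calloc() buffer stands in for the UserVA mapping returned by the IOCTL (an assumption so it runs outside the driver), copying from the temporary buffer into the shared buffer is an assumed direction, and alloc_temp_buffer and copy_in_chunks are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Step 2: allocate the temporary buffer; calloc() returns zero-filled memory. */
static unsigned char *alloc_temp_buffer(size_t total)
{
    return calloc(total, 1);
}

/* Step 3: copy `total` bytes from src to dst with one memcpy() per `chunk`
 * bytes (16 or 1024 in the post). Returns the number of memcpy() calls. */
static size_t copy_in_chunks(void *dst, const void *src, size_t total, size_t chunk)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t calls = 0;

    /* Assumes the chunk size divides the buffer size evenly, as 16 and 1024 do for 32 MB. */
    assert(chunk > 0 && total % chunk == 0);

    for (size_t off = 0; off < total; off += chunk) {
        memcpy(d + off, s + off, chunk);
        calls++;
    }
    return calls;
}
```

Swap the dst and src arguments to measure reads from the shared buffer instead. For 32 MB, 16-byte chunks mean 2,097,152 memcpy() calls versus 32,768 calls with 1024-byte chunks, so per-call overhead weighs much more heavily in the 16-byte case.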
Profiling is done using QueryPerformanceCounter() wrapped around memcpy().
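To turn the QueryPerformanceCounter() readings into a throughput figure, divide the tick delta by QueryPerformanceFrequency() to get seconds. A minimal sketch follows; read_ticks, tick_frequency and throughput_mb_per_s are hypothetical helper names, and on non-Windows builds timespec_get() (a wall-clock substitute) stands in for the performance counter only so the sketch compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Current tick count. On Windows this is QueryPerformanceCounter(), as in
 * the original profiling; elsewhere timespec_get() (nanoseconds, wall clock)
 * is a stand-in so the sketch builds on other platforms. */
static int64_t read_ticks(void)
{
#ifdef _WIN32
    LARGE_INTEGER t;
    QueryPerformanceCounter(&t);
    return t.QuadPart;
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
#endif
}

/* Ticks per second for read_ticks(). */
static int64_t tick_frequency(void)
{
#ifdef _WIN32
    LARGE_INTEGER f;
    QueryPerformanceFrequency(&f);
    return f.QuadPart;
#else
    return 1000000000; /* timespec_get() reports nanoseconds */
#endif
}

/* Throughput in MB/s (1 MB = 1024 * 1024 bytes) for `bytes` copied in `ticks`. */
static double throughput_mb_per_s(int64_t ticks, int64_t freq, size_t bytes)
{
    assert(ticks > 0 && freq > 0);
    double seconds = (double)ticks / (double)freq;
    return ((double)bytes / (1024.0 * 1024.0)) / seconds;
}
```

Reading the counter once before and once after the whole 32 MB chunk loop, rather than around each 16-byte memcpy(), keeps the counter's own overhead and resolution from dominating the measurement.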
As you mentioned, since the system is x64, I would definitely want to take advantage of the cache if the platform is cache-coherent for DMA operations. I will need to test the system to confirm this.
The contiguous buffer I allocated is read and written concurrently by both the user-space application and the device. However, at any given time, only one of them accesses a given memory range within the buffer.