@Tim_Roberts said:
When you call IoAllocateMdl, are you passing the user-mode virtual address? How are you getting the address in kernel mode? I assume you must have done this in the user process context, otherwise it wouldn’t work at all. Are you quite sure you are mapping the entire buffer? The operating system won’t change virtual-to-physical mappings of a locked buffer, so there must be something strange going on.
Thanks @Tim_Roberts & @Peter_Viscarola_(OSR) for your replies!
Tim, below are more direct answers to your questions; after this I will write a fuller description to provide the context Peter suggested:
- Yes, we do pass the user-mode virtual address when calling IoAllocateMdl in kernel mode
- Not sure whether your second question referred to the physical (or logical) address or the virtual address. If it's the latter, I'll explain in the description below; for logical addresses, we use PMAP_TRANSFER. These logical addresses are read by the card via DMA in batches from memory and used as destination addresses for DMA writes performed by a DMA engine on the card. This part is confirmed to be working and is not affected by the bug we are targeting here
- Yes, we allocate the buffers in user mode and pass the base virtual address and the buffer sizes to a DeviceIoControl call, in which we do IoAllocateMdl and MmProbeAndLockPages
- No, we are not mapping the entire buffer in one shot, but we do map the entire buffer. The circular buffer can be 512MB or 1GB in size, but depending on the size of system memory, we allocate it in user mode as multiple 132MB segments. Each segment's base virtual address is passed to our driver function to build the MDLs and lock the pages in kernel mode. The driver function uses IoAllocateMdl to map and lock in 32MB blocks until the whole segment is mapped and locked. The MDLs, as well as the logical addresses, are passed back to the app in user mode. We do this for every segment until the whole circular buffer is done. Note that the logical addresses passed back are pointers to 4KB physical pages, from which we build even finer-grained 1KB DMA descriptors in user space, for the bus-master DMA engine on the card to use later for 1KB DMA writes.
- For your question earlier about the NUMA node: initially (call it version 0) we didn't use CreateFileMappingNumaA; we used CreateFileMappingA. We changed to CreateFileMappingNumaA to remove one variable, but it still failed after the change. Your question got us looking a little deeper into this, and we realized it may be more complicated to properly map the memory to a NUMA node: do we need to find out which NUMA node the current app is running on and pin the app there first, then map the memory to the corresponding node? If you could provide some guidance on how to do this properly, that would be great.
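For concreteness, the per-segment lock loop in our driver looks roughly like the sketch below (this is illustrative, not our literal code; `LockSegment` and the error handling are simplified, and it must run in the context of the allocating process):

```c
#include <wdm.h>

#define LOCK_BLOCK_BYTES (32 * 1024 * 1024)  /* 32MB lock granularity */

/* Sketch: build one MDL per 32MB block of a user-mode segment and
   lock the pages for DMA writes by the device. */
NTSTATUS LockSegment(PVOID UserBase, SIZE_T SegLen, PMDL *Mdls, ULONG *Count)
{
    ULONG i = 0;
    for (SIZE_T off = 0; off < SegLen; off += LOCK_BLOCK_BYTES, i++) {
        SIZE_T remain = SegLen - off;
        ULONG  len = (remain < LOCK_BLOCK_BYTES) ? (ULONG)remain
                                                 : LOCK_BLOCK_BYTES;
        PMDL mdl = IoAllocateMdl((PUCHAR)UserBase + off, len,
                                 FALSE, FALSE, NULL);
        if (!mdl)
            return STATUS_INSUFFICIENT_RESOURCES;
        __try {
            /* Device DMA-writes into this memory, so request write access. */
            MmProbeAndLockPages(mdl, UserMode, IoWriteAccess);
        } __except (EXCEPTION_EXECUTE_HANDLER) {
            IoFreeMdl(mdl);
            return GetExceptionCode();
        }
        Mdls[i] = mdl;
    }
    *Count = i;
    return STATUS_SUCCESS;
}
```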
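On the NUMA point, the flow we're considering (just a guess on our part, not tested; happy to be corrected) would be to query the calling thread's current node and pass it as the preferred node:

```c
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Which node is this thread on right now? Without pinning the thread
       (e.g. SetThreadAffinityMask) it could migrate afterwards, which is
       exactly the part we're unsure about. */
    PROCESSOR_NUMBER pn;
    USHORT node = 0;
    GetCurrentProcessorNumberEx(&pn);
    GetNumaProcessorNodeEx(&pn, &node);

    /* One 132MB segment of the circular buffer, preferred on that node. */
    HANDLE h = CreateFileMappingNumaA(
        INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
        0, 132 * 1024 * 1024, NULL, node);
    if (!h) {
        printf("CreateFileMappingNumaA failed: %lu\n", GetLastError());
        return 1;
    }
    void *base = MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    printf("node=%u base=%p\n", node, base);
    return 0;
}
```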