How does MDL chain passed to a DMA transaction work with common buffer?

arielle_Deng · January 6, 2025, 8:40am

Hi experts in community!

I am implementing DMA transactions based purely on a common buffer for a dedicated PCIe device, expecting that individual read and write IRP requests will no longer be limited by the size of my common buffer.

Previously, I called WdfCommonBufferCreate to create a 64KB common buffer and provided the device with the logical address of this buffer. For user write requests, the driver copied data from the user buffer to the common buffer using RtlCopyMemory, then notify the device; read requests were handled similarly. Everything works fine.

However, this implementation does not support user read/write requests larger than common buffer length. Therefore, I believe I need to implement DMA transactions that break down requests into multiple transfers. My code is like:


WdfRequestRetrieveOutputWdmMdl(Request, &mdl);
virtualAddress = MmGetMdlVirtualAddress(mdl);
length = MmGetMdlByteCount(mdl);
WdfDmaTransactionInitialize(dmaTransaction, 
                            PLxEvtProgramReadDma,
                            WdfDmaDirectionReadFromDevice,
                            mdl,
                            virtualAddress,
                            length);

I'm not quite clear on how the MDL and common buffer work together to perform DMA transfers. Therefore, I tried to inform the hardware of the data it needs for each ProgramReadDma callback and, in the DPC, copy the data from the common buffer to the corresponding offset in the user's buffer.
Obviously, this does not work at all; I noticed that after a successful copy to user's buffer in each DPC, the data in the user's buffer was changed in the subsequent transfer.

So how does MDL chain passed to the WdfDmaTransactionInitialize works with a DMA common buffer?

I've looked through PLX9x6x's code, MSDN docs and "Developing Derivers with Microsoft WDF" but did not find answers. Thanks in advance!

prasad_bm · January 7, 2025, 10:09pm

DMA transfer transaction is created based on the Request; did you try using the WdfDmaTransactionInitializeUsingRequest method?

arielle_Deng · January 8, 2025, 1:18am

Hi

Yes I've tried but failed.
I guess it's becauce I'm using DMA based on pure common-buffer design. And I read (starting from the 5th paragraph below) this from the book "Developing Drivers with the microsoft WDF". So I use WdfDmaTransactionInitialize instead.

One of the parameters for API WdfDmaTrasactionInitialize is a pointer to MDL, so what should it be? A pointer to the mdl of the user's buffer or to my DMA common buffer?

I tried passing the mdl retrieving from WdfRequestRetrieveOutputWdmMdl to this API, but it did not work. I guess it's something stupid and I'm missing something really basic here.

Tim_Roberts · January 8, 2025, 8:14am

Where do you want the device to put the data? Presumably, in the common buffer. The DMA abstraction doesn't need to know where you will eventually copy the data.

prasad_bm · January 9, 2025, 9:50pm

If you are using Common Buffer DMA, you should be using the function
PMDL IoAllocateMdl(

[in, optional] __drv_aliasesMem PVOID VirtualAddress,*
[in] ULONG Length,*
[in] BOOLEAN SecondaryBuffer,*
[in] BOOLEAN ChargeQuota,*
[in, out, optional] PIRP Irp*
);
For getting/mapping the MDL for buffer?

Mark_Roddy · January 9, 2025, 11:39pm

I don't understand why you don't just use WdfDmaProfilePacket64 (assuming the device supports 64 bit addresses) instead, and let the OS deal with your non-scatter-gather device using map registers.

Otherwise, for each write IRP, get the VA for the MDL, copy common buffer sized blocks of data to the common buffer, notify your device to copy from the buffer, have your device notify you when the copy is complete, repeat until all data is transferred. (Similar for reads).

This is what is going to happen with the packet profiles, except you can use the WDF dma facility and let it allocate 'map registers' and do the copy to or from them.

arielle_Deng · January 17, 2025, 9:21am

Thank you for your advice; I apologize for my delayed response.

@Mark_Roddy I have implemented a version based on your suggestion. I'm using RtlCopyMemory to copy data from user's buffer to DMA common buffer. And my device will read and copy it to device's local memory. Now, my driver works fine, achieving an overall data transfer(from device local memory to pc user app) rate of up to about 350MB/s (PCIe Gen2 ×1).

I am puzzled as to why the WDF framework does not have an API for DMA transactions designed for a common buffer. I understand that if it's a purely common buffer DMA-designed device, the driver needs to manually split (just ad @Mark_Roddy suggested) the data according to the size of the common buffer. Is the WDF API missing a function similar to WdfDmaTransactionInitializeWithCommonBuffer? (Split the user's request into several transfers)

Additionally, my colleague and I have a disagreement regarding this issue: For purely common buffer designed PCIe devices, at what stage does DMA transfer between the RC (host PC) and EP (device) occur? Is it when the device accesses the common-buffer located in PC memory, or is it when the device initiates its own DMA to copy data from the common-buffer to its local memory?

I believe that the DMA transfer occurs when the device accesses the data in the common buffer; however, she thinks that this access is just a PCIe data transfer (a PCIe outbound transaction), and the DMA operation only happens when the device initiates a DMA to copy data from the common buffer to the device.

arielle_Deng · January 17, 2025, 9:44am

Put the data to device's local memory. According to the system design, the user data transfer should involve two parts (or requests): one to specify where the data should be placed in the device's memory, and the data to transmit.

I guess it's unneccessary to use DMA transaction API from WDF for my pure common-buffer designed DMA devcie. Is my understanding correct?

Tim_Roberts · January 17, 2025, 7:16pm

I am puzzled as to why the WDF framework does not have an API for DMA transactions designed for a common buffer.

What would such an API do? There are WDF APIs for allocating a common buffer that can return the virtual and physical addresses. That's really all the assistance you need, right? It's physically contiguous, so there is no need for an MDL. The method of copying is all up to you. There's no need for a "transaction", because there's nothing to clean up.

That's quite different from an on-demand DMA scheme, where there is a lot of overhead in locking down the memory, creating a scatter/gather list, and remembering to unlock it after the request is complete. That's the kind of error-prone overhead that WDF is very good at handling for you.

Tim_Roberts · January 17, 2025, 7:21pm

I'm not sure what distinction you're trying to draw here. You have used two phrases that mean exactly the same thing. "DMA" is really a phrase left over from the early days of the PC. What's happening here is not particularly special. All we're talking about are PCIe transactions. These transactions happen to be initiated by the device, instead of by the root complex, but otherwise they are identical. "DMA" in this sense is just a normal PCIe transaction initiated by the device.

Put another way, the way that the "device accesses the common-buffer" is by initiating "its own DMA to copy data". That's not two separate things.

MBond2 · January 18, 2025, 12:54am

DMA stands for direct (or device) memory access. All it means is that there are some memory accesses (reads or writes) that are initiated by your device instead of by a CPU.

From the point of view of the OS and your driver, what it needs to know is that this particular block of physical memory is being used by your device, so don't touch it now.

Your device can use the data in that block of memory in any way that makes sense to your device and is understood by your driver. usually that something is to copy from system memory to device memory or the other way around

But before your device can do whatever it needs to, it needs to be told what block of memory to use and what to do. Usually your driver does that by writing to a register etc. and then waiting for an interrupt from the device to indicate that the device is finished whatever it was told to do. That's more complex if your device has special limitations on the addresses that it understands, but that's the basic sequence

arielle_Deng · January 20, 2025, 2:15am

@Tim_Roberts @MBond2 Thanks a lot! It really helps us. I understand your meanings now, and the debate within our group should be able to come to an end.

system · March 21, 2025, 2:16am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.