Is it OK to implement DMA in a KMDF driver without using WdfDmaTransactionInitialize and related APIs?

I have a rather simple PCIe device (it is experimental, built for learning FPGA development).

The device does not use interrupts for any DMA completions nor does it support Scatter/gather.
The maximum transfer length is 4k (in both directions).

Instead, the driver polls a memory-mapped PCIe control register.
For now, DMA is sequential (one transfer after another).

My KMDF driver has allocated a common DMA buffer using WdfCommonBufferCreate() and obtained its logical address and system virtual address.

Normally, I would create DMA transactions for both directions, as I have done in the past.
But I am wondering whether I have to do all of that, and why the following would not work.

If it is relevant, I have an x64 system with Windows 10 and the IOMMU (VT-d) enabled.

My question is:
a) If I just take the logical address, program it into my device's DMA control registers, and start the DMA (from device to system memory) by writing to my device’s control registers, why would the DMA not work?
Can I not assume that, since I have received the logical address, the IOMMU will let the access (in this case a bus-master write) go through once the DMA starts?
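
To make the register programming in question (a) concrete, here is a minimal sketch in portable C. The register layout (`struct dma_regs`), the `DMA_CTRL_START_D2H` bit, and the function name are all hypothetical, since the thread never describes the FPGA's actual register map. In a real KMDF driver the 64-bit value would be the logical address obtained for the common buffer, and the stores would go through the register access macros rather than plain pointer writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_MAX_LEN        4096u  /* 4K per-transfer limit stated in the post */
#define DMA_CTRL_START_D2H 0x1u   /* hypothetical "start device-to-host" bit */

/* Hypothetical register block in the memory-mapped BAR. */
struct dma_regs {
    volatile uint32_t addr_lo;    /* logical address, bits 31:0  */
    volatile uint32_t addr_hi;    /* logical address, bits 63:32 */
    volatile uint32_t length;     /* transfer length in bytes    */
    volatile uint32_t control;    /* start bit                   */
};

/* Program one device-to-host transfer. Returns 0 on success, -1 if the
 * length is outside what the device supports. */
static int dma_start_d2h(struct dma_regs *regs, uint64_t logical, uint32_t len)
{
    assert(regs != NULL);
    if (len == 0 || len > DMA_MAX_LEN)
        return -1;

    /* The device sees the logical (bus) address, never the virtual one. */
    regs->addr_lo = (uint32_t)(logical & 0xFFFFFFFFu);
    regs->addr_hi = (uint32_t)(logical >> 32);
    regs->length  = len;

    /* Write the start bit last so the device never sees a half-programmed
     * descriptor. */
    regs->control = DMA_CTRL_START_D2H;
    return 0;
}
```

A frequent mistake this layout makes easy to spot is writing the system virtual address, or only the low 32 bits of the logical address, into the device.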

The data does not show up in my common buffer and the internal counter in my device showing number of bytes transferred has not moved.
Since these are memory writes (posted writes), if the IOMMU fails the transactions, I do not think the device would see any PCIe AER error (I can see why memory reads might not work).

In this particular case, how would the device know that the IOMMU is failing the request (if the IOMMU has not been programmed by the HAL to set up the page tables)?

Before I get lambasted, I am working on doing this the “KMDF” way of creating DMA transactions and the kosher method but I am just curious.

Thanks,
RK

If you’re getting the Logical Address of the Common Buffer, and programming your DMA Controller with that, you already are doing it the KMDF way.

Nothing wrong with that at all. It’s the way it’s designed to work, and DMA remapping (DMAR) works properly via the IOMMU.

I am presently working on a driver that works exactly this same way (well, minus the polling thing that you’re doing, which is a bit unusual; just please tell me you’re doing that polling in a thread you create with PsCreateSystemThread and NOT any other way)…

Peter

Peter, thank you very much for the reply. Actually, I have a user-mode thread that calls in frequently via an ioctl to poll the device control register that indicates bytes transferred.

I was thinking about a PsCreateSystemThread-based routine that would poll it, but since this is just test code for my FPGA PCIe device, I punted on that and am just doing the ioctl from user mode.

So the fact that I do not see the bytes transferred counter move means I am probably not programming the DMA controller properly.
All the DMA control registers are in PCIe memory space (mapped to a 64-bit PCIe BAR). There are some DMA error registers, and they are not getting flagged, which means that as far as I can see there are no errors.
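
The check described here (watch the bytes-transferred counter, watch the error registers) can be sketched as a bounded polling loop. This is only an illustration in portable C: the `dma_status_ops` callbacks, the `fake_dev` simulated device, and all names are hypothetical stand-ins for real register reads and for whatever delay the polling thread would use between reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum poll_result { POLL_DONE, POLL_ERROR, POLL_TIMEOUT };

/* Hypothetical accessors; a driver would read the mapped BAR here. */
struct dma_status_ops {
    uint32_t (*read_bytes_done)(void *ctx);
    uint32_t (*read_error)(void *ctx);
    void     (*pause)(void *ctx);   /* delay between polls */
};

/* Poll until the counter reaches `expected`, an error is flagged, or
 * `max_polls` reads have been made. */
static enum poll_result dma_poll(const struct dma_status_ops *ops, void *ctx,
                                 uint32_t expected, unsigned max_polls)
{
    assert(ops != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        if (ops->read_error(ctx) != 0)
            return POLL_ERROR;
        if (ops->read_bytes_done(ctx) >= expected)
            return POLL_DONE;
        ops->pause(ctx);
    }
    return POLL_TIMEOUT;
}

/* Simulated device used only to exercise the loop: each pause advances
 * the counter by `step` bytes. */
struct fake_dev { uint32_t bytes_done; uint32_t step; uint32_t error; };

static uint32_t fake_bytes_done(void *ctx) { return ((struct fake_dev *)ctx)->bytes_done; }
static uint32_t fake_error(void *ctx)      { return ((struct fake_dev *)ctx)->error; }
static void     fake_pause(void *ctx)
{
    struct fake_dev *d = ctx;
    d->bytes_done += d->step;
}

static const struct dma_status_ops fake_ops = {
    fake_bytes_done, fake_error, fake_pause
};
```

A device whose counter never moves and whose error register stays clear, which is the symptom described in this thread, ends in `POLL_TIMEOUT`. That outcome points at the transfer never having started rather than at a failed transfer.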

Still need to debug but thanks for confirming my method will work.

Thanks,
RK

When I look at the PLX PCI9x5x sample in the Windows driver samples, it does use common buffer DMA, but it also creates DMA transactions for both DMA read and write and schedules them separately. The DMA is done when the callback gets called. I know the PLX device supports scatter/gather and probably that is the reason they have to use this (because scatter/gather probably needs map registers).

Sounds reasonable … I might make a few small changes to the design, though … the idea is that you want to keep data as close to code as possible; so, if the data lives in kernel mode then the “worker” should also live there. The other idea is that you want to keep things impossible to fail; so if you’ve got a DMA transaction that’s in the works then everything needed to complete that call should be as isolated as possible from the rest of the system …

So what I would do would be to first have a system DMA thread to set up and manage the DMA (remembering to synchronize that with the ISR/ DMA return path) into two common buffers … and that’s all the thread does, pulls data off the FPGA into the common buffers. You can make this as simple/ elaborate as needed/ desired, but the thread DMA’s into buffer A, then into buffer B, then into buffer A, etc.

The usermode IOCTL I would make into an inverted call (there are many examples here of that) which would simply present a usermode buffer of common buffer size and wait until the system thread filled one of the buffers (A or B) and marked it “dirty”, at which time the usermode IOCTL handler would copy the contents of the “dirty” common buffer into the IOCTL buffer, mark the buffer “clean” and complete the call. You would need to synchronize the copy of the system common buffer into the usermode buffer, and you could make this more elaborate (a timeout return, or an error code), but the idea is that when there’s a “dirty” common buffer from the DMA thread it’s available to be copied into the inverted call buffer by the usermode IOCTL thread and marked “clean” for the DMA.

By doing it this way you decouple the usermode IOCTL thread (which could get cancelled, or not have a large enough buffer, or have multiple callers, or not be able to keep up with the DMA throughput) from the system thread actually doing the DMA. You are free to add whatever flow control logic between the two without them constraining each other …
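
The A/B buffer scheme with "dirty" and "clean" marks described above can be sketched as a small state machine. This is a single-threaded illustration in portable C with hypothetical names (`struct pingpong`, `pp_begin_fill`, `pp_end_fill`, `pp_drain`). It deliberately leaves out the locking between the DMA thread and the IOCTL handler, which a real driver would need, and the arrays stand in for the two common buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 4096u   /* one 4K transfer per buffer */
#define NUM_BUFS 2u      /* buffer A and buffer B */

struct pingpong {
    uint8_t  data[NUM_BUFS][BUF_SIZE]; /* stand-ins for the common buffers */
    bool     dirty[NUM_BUFS];          /* true: filled, not yet copied out */
    unsigned next_fill;                /* buffer the DMA side uses next    */
    unsigned next_drain;               /* buffer the IOCTL side reads next */
};

/* DMA side: index of the buffer to DMA into, or -1 if it is still dirty
 * (the consumer has fallen behind, so the DMA side must wait). */
static int pp_begin_fill(const struct pingpong *pp)
{
    assert(pp != NULL);
    return pp->dirty[pp->next_fill] ? -1 : (int)pp->next_fill;
}

/* DMA side: the transfer completed, so mark the buffer dirty and move on. */
static void pp_end_fill(struct pingpong *pp)
{
    pp->dirty[pp->next_fill] = true;
    pp->next_fill = (pp->next_fill + 1) % NUM_BUFS;
}

/* IOCTL side: copy the oldest dirty buffer into the caller's buffer and
 * mark it clean. Returns bytes copied, or 0 if nothing is ready or the
 * caller's buffer is too small. */
static size_t pp_drain(struct pingpong *pp, void *out, size_t out_len)
{
    if (out_len < BUF_SIZE || !pp->dirty[pp->next_drain])
        return 0;
    memcpy(out, pp->data[pp->next_drain], BUF_SIZE);
    pp->dirty[pp->next_drain] = false;
    pp->next_drain = (pp->next_drain + 1) % NUM_BUFS;
    return BUF_SIZE;
}
```

In the inverted-call design, a return of 0 from `pp_drain` corresponds to leaving the request pending until the DMA thread marks a buffer dirty, and a return of -1 from `pp_begin_fill` is the point where flow control between the two sides would go.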

@craig_howard thank you very much. I like the suggestions and I plan to do that. This is eventually going to become a full-blown bus-master driver, scatter/gather and all. It is just that I am teaching myself some FPGA programming and I want to do this stepwise and not introduce a whole bunch of variables. For now, I am not even doing interrupts but plan to do MSI-X later.

"I know the PLX device supports scatter/gather and probably that is the reason they have to use this (because Scatter/gather probably needs map registers)."

Nope. The PLX sample (not one of my favorites, and based on a sample driver OSR wrote many, many years ago) creates a common buffer where a circular buffer is shared between it and the DMA Controller. The Transaction is used to handle individual user Requests. So, a very different design to what you are using.

Map Registers, DMAR, the IOMMU…. All that works with Common Buffers. The only reason we don’t use Common Buffers more is that we want to DMA data directly into and out of the users data buffer. Using a Common Buffer you have to memcpy the data to/from the Common Buffer first.

Peter

@“Peter_Viscarola_(OSR)” Thank you very much. Now I see what you are saying. Yes, the PLX driver handles reads and writes from user mode using the Direct method. For now, common buffer DMA is OK because my DMA is not in the fast path. I DMA 4K of data to the device, the device does some processing and then returns some status. That is the model now, but that will change.