Implement DMA in packet-mode in a PCI/VME Bridge

As your device doesn’t support scatter/gather, you have to use a common buffer. You are going to copy from there into your user’s buffer, so there is no point in trying to share anything explicitly.

Thank you Mark for your quick answer.
Is it possible to do this copy from the common buffer into the user’s buffer using an IOCTL call?

https://docs.microsoft.com/en-us/windows-hardware/drivers/wdf/handling-i-o-requests-in-a-kmdf-driver-for-a-bus-master-dma-device

WDF is going to handle the copy details for you if your dma is connected to io requests… The general practice is to either initiate the DMA based on a user ioctl arriving at your driver, or to have the user pend a set of such ioctls to be used by your driver, with the driver independently initiating dma requests.

Is it possible to do this copy from the common buffer into the user’s buffer using an IOCTL call?

That is, in fact, the ONLY way to do it.

Thanks again.
What I do not understand in this model is the link between the PSCATTER_GATHER_LIST SgList passed to EvtProgramDma
and what I get from the IOCTL request in the call:
WdfDmaTransactionInitializeUsingRequest( dmaTransaction,
Request,
EvtProgramDma,
direction );
In DMA packet mode, I know that I have to use a single pair
SgList->Elements[0].Address.LowPart;
SgList->Elements[0].Length;
but I do not understand which address I get in this structure.
Thanks.

If you are using a common buffer, then you won’t be using WdfDmaTransactionInitializeUsingRequest. You have the physical address of the common buffer, and copying into the user’s request when the DMA is done is just a simple RtlMoveMemory call.

Thank you Tim for your answer. So I understand that I must not use WdfDmaTransactionInitializeUsingRequest. Do I use the WdfDmaTransactionInitialize method instead?
Also, in my EvtProgramDma, do I use the SgList parameter?

This talk of “common buffers” has, in fact, confused you I think. Let me see if I can clear that up.

Use the standard KMDF DMA APIs. When you create your DMA Enabler, specify one of the packet-based profiles but NOT one that indicates that you support scatter/gather (because you don’t).

When your EvtProgramDma callback is called, the s/g list passed to you will have exactly one element. You use this as the base address and length to program your hardware. Done!
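To make that concrete, here is a minimal sketch of what the body of such a callback boils down to. It is written in portable C so it can be compiled outside the WDK, which means several things are assumptions: `SgElement` and `SgList` are simplified stand-ins for the real SCATTER_GATHER_ELEMENT and SCATTER_GATHER_LIST (where the address is a PHYSICAL_ADDRESS with LowPart and HighPart), and `DmaRegs`, its fields and `DMA_CTRL_START` are a hypothetical register layout, not the OP’s hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the WDK scatter/gather types (assumption). */
typedef struct {
    uint64_t Address;   /* device bus logical address of the fragment */
    uint32_t Length;    /* size of the fragment in bytes */
} SgElement;

typedef struct {
    uint32_t  NumberOfElements;
    SgElement Elements[1];
} SgList;

/* Hypothetical register block of a 32-bit-only DMA engine. */
typedef struct {
    uint32_t PciAddress;
    uint32_t ByteCount;
    uint32_t Control;
} DmaRegs;

#define DMA_CTRL_START 0x1u   /* hypothetical "go" bit */

/* Mirrors what a packet-mode EvtProgramDma would do: take the single
 * element as base address and length. Returns false if the list is not
 * something a 32-bit, non-scatter/gather device can accept. */
bool program_dma(DmaRegs *regs, const SgList *sg)
{
    assert(regs != NULL && sg != NULL);

    if (sg->NumberOfElements != 1)
        return false;

    const SgElement *e = &sg->Elements[0];
    if ((e->Address >> 32) != 0 || e->Length == 0)
        return false;                       /* HighPart must be zero */
    if (e->Address + e->Length > UINT64_C(0x100000000))
        return false;                       /* must end below 4 GB */

    regs->PciAddress = (uint32_t)e->Address; /* the LowPart */
    regs->ByteCount  = e->Length;
    regs->Control    = DMA_CTRL_START;
    return true;
}
```

In a real driver the registers would be mapped device memory written through the register access routines rather than a plain struct, and the framework, not the driver, guarantees the single element and the address range.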

Ignore all this talk of Common Buffers… which could provide you an alternative way to do what you want, but does not really directly answer your question.

Peter

Thank you Peter for your reply. You are right that the common buffer discussion confused me. I will follow your suggestion about the s/g list and keep you informed if I have any further questions.

Just one point: the start address for starting DMA on both PCI bus and VME bus of the Device must be programmed by user app. For example, for a Write access (PCI to VME direction), I have to program one register for the PCI address and one register for the VME address. How is the s/g list base address programmed in this case? In other words, what is the value of the first s/g list element?

Just one point: the start address for starting DMA on both PCI bus and VME bus of the Device must be programmed by user app.

I’m not sure I understand: The user app calls (for example) WriteFile, providing a pointer to the data buffer that they want to write, and the length of that buffer. You then create a DMA Transaction and initialize it using the Request (WdfDmaTransactionInitializeUsingRequest). Your EvtProgramDma Event Processing Callback gets called with the scatter/gather list (with exactly one element). You program your hardware.

Again… see this: https://docs.microsoft.com/en-us/windows-hardware/drivers/wdf/handling-i-o-requests-in-a-kmdf-driver-for-a-bus-master-dma-device which I think explains the process pretty clearly.

Peter

Just one point: the start address for starting DMA on both PCI bus and VME bus of the Device must be programmed by user app.

No. The programming of the DMA hardware must be entirely under the control of the driver. User mode code is naturally insecure; it’s way too easy for another app to interfere, and cause your hardware to do transfers for immoral purposes. The address of the user’s buffer is passed in the IRP, and is validated by the I/O subsystem. The address on the device side should be passed to the driver as a parameter, where the driver can validate it.
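As a rough illustration of what "the driver can validate it" might look like, here is a small sketch of a range check on the device-side address. Everything specific in it is an assumption: the window base and size, the 4-byte alignment rule, and the function name `vme_range_is_valid` are made up for the example and would come from the actual bridge’s data sheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical VME window the driver is willing to expose to DMA. */
#define VME_WINDOW_BASE 0x08000000u
#define VME_WINDOW_SIZE 0x01000000u   /* 16 MB */

static_assert(VME_WINDOW_SIZE > 0, "window must not be empty");
static_assert((uint64_t)VME_WINDOW_BASE + VME_WINDOW_SIZE <= UINT64_C(0x100000000),
              "window must fit in 32 bits");

/* Returns true only if [vme_addr, vme_addr + length) lies entirely inside
 * the window. Written so that no intermediate sum can wrap around. */
bool vme_range_is_valid(uint32_t vme_addr, uint32_t length)
{
    if (length == 0)
        return false;
    if ((vme_addr & 3u) != 0 || (length & 3u) != 0)
        return false;                    /* assumed 32-bit alignment rule */
    if (vme_addr < VME_WINDOW_BASE)
        return false;

    uint32_t offset = vme_addr - VME_WINDOW_BASE;
    if (offset >= VME_WINDOW_SIZE)
        return false;

    return length <= VME_WINDOW_SIZE - offset;
}
```

The driver would run a check like this on the value the app supplied before touching any register, and fail the request with an error status if it does not pass.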

Thank you Peter and Tim for your reply.
I am trying to understand: the s/g list provides one element, sglist.Elements[0].Address, that I have to use to program my device in EvtProgramDma, but I do not see the link between this address and the address of the user’s buffer passed through the IRP.
Could you please explain which address sglist.Elements[0].Address corresponds to?

Hmmmm… I’m not sure what you don’t understand.

The first byte of the buffer to which the s/g list points is the first byte of the user data buffer, as passed by the user in WriteFile or ReadFile.

Peter

sglist.Elements[0].Address is the physical address of the page whose virtual address is in the IRP. Are you familiar with virtual memory and physical memory? None of the addresses you use in your code, either in user mode or kernel mode, are actually the addresses of the bytes in memory. Those are virtual addresses, and they have to be passed through the page tables in order to get the address that goes out on the memory bus.

@Peter,
OK, so if I use WriteFile( handle, buf, bufsize, NULL, NULL ), or an IOCTL call like DeviceIoControl( handle, IOCTL_WRITE_DMA, NULL, 0, buf, bufsize, NULL, NULL ), the s/g list points to the first byte of “buf”, correct?
@Tim,
Yes, I have some knowledge of the virtual/physical memory address concept, and I recently read the Microsoft documentation about contiguous and non-contiguous blocks of memory associated with the s/g list concept, because in fact it was new to me; you are right.
And so, if I have to program my 32-bit only device, I have to use sglist.Elements[0].Address.LowPart, which corresponds to the physical address translated by the framework, correct?

Yes.

It would probably be most helpful to not refer to the contents of sglist.Elements[0].Address as the “physical address” of the user’s data buffer. Because it may be or it may not be. It is the device bus logical address of the user’s data buffer, suitable for use with DMA. In the OP’s case, it almost certainly will not be the user data buffer physical address.

Peter

Thank you Peter for your reply.
Also, if I use an IOCTL like in my example with the Direct I/O type, is it possible to pass a structure instead of a buffer? In this case, if the device bus logical address is the first element of this structure, does it mean that sglist.Elements[0].Address points to this element?
In my case, I want to perform the DMA transfer using a single IOCTL from the user app to program my device, and I wonder whether this is a good approach, because all the PCI examples use the WriteFile() method.

And so, if I have to program my 32-bit only device, I have to use sglist.Elements[0].Address.LowPart, which corresponds to the physical address translated by the framework, correct?

You’ve raised a couple of brand-new issues here. Is this a very old device? Because any modern hardware designer who creates a PCIe device that is limited to 32-bit addressing is guilty of malpractice. Windows has had 64-bit physical addresses since the very beginning, clear back in the 20th Century. ALL current PCIe IP blocks support 64-bit physical addresses. There’s no excuse.

When your DMA is limited to 32 bits, the system has to take an extra step. You can’t control where the user’s buffer lives, and since modern systems often have 16GB or 32GB or more of RAM, the average user-mode buffer these days is statistically going to be above the 4GB mark. In that case, the operating system has to allocate special space below the physical 4GB limit. These are called “bounce buffers”. When the user submits a request, the I/O system will copy his buffer into a “bounce buffer” before calling your EvtProgramDma callback. There are a limited number of “bounce buffers”, which means your user request might be chopped into several pieces, with each piece getting another call to EvtProgramDma. This is mostly done without your knowledge, but since you have to maintain a destination address, this may be something you need to know.
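One way to picture "maintaining a destination address" across several EvtProgramDma calls is a small per-transfer context that remembers how far the device-side address has advanced. The sketch below is plain C with hypothetical names (`TransferCtx`, `next_vme_address`, `chunk_completed`); in a real KMDF driver the running byte count can likely be taken from the framework (for instance via WdfDmaTransactionGetBytesTransferred) rather than tracked by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-request state kept by the driver. */
typedef struct {
    uint32_t vme_base;      /* device-side start address from the request */
    uint32_t total_length;  /* total bytes the user asked to transfer */
    uint32_t bytes_done;    /* bytes completed by earlier chunks */
} TransferCtx;

/* Device-side address to program for the next chunk. */
uint32_t next_vme_address(const TransferCtx *ctx)
{
    return ctx->vme_base + ctx->bytes_done;
}

/* Record a finished chunk. Returns 1 when the whole request is done,
 * 0 when another chunk (another EvtProgramDma call) is still expected. */
int chunk_completed(TransferCtx *ctx, uint32_t chunk_length)
{
    assert(chunk_length <= ctx->total_length - ctx->bytes_done);
    ctx->bytes_done += chunk_length;
    return ctx->bytes_done == ctx->total_length;
}
```

With a 12 KB request split into three 4 KB pieces, the three calls would program the device-side address at the base, base + 0x1000 and base + 0x2000, while the PCI-side address comes fresh from the s/g element each time.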

Peter is quite right to point out the difference between “physical” and “logical” addresses. A bus address is not necessarily the same as a physical address. I admit to being lax with this terminology, because in my career I have never encountered a system where the two were not identical.

In my case, I want to perform the DMA transfer using a single IOCTL from the user app to program my device, and I wonder whether this is a good approach, because all the PCI examples use the WriteFile() method.

There’s almost no difference. It may not be obvious from above, but from a driver standpoint, ReadFile, WriteFile and DeviceIoControl are all virtually identical. The driver just gets an IRP, and the buffers are stored in the same places. In YOUR case, there is an additional consideration, because you need to specify a destination address. With DeviceIoControl, you have the opportunity to send two buffers. You can put the address in buffer 1, and the data in buffer 2. Without that, you have to invent some other scheme, and it doesn’t seem as natural.
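A sketch of the driver side of that two-buffer arrangement, under stated assumptions: `DmaHeader` is a hypothetical layout for "buffer 1", and `read_dma_header` is an illustrative helper, not a framework call. The point it tries to show is that the driver checks the size of the first buffer, copies the header once, and only then trusts the values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical contents of the first (small) buffer of the ioctl. */
typedef struct {
    uint32_t VmeAddress;   /* device-side destination address */
    uint32_t Length;       /* bytes to transfer from the data buffer */
} DmaHeader;

static_assert(sizeof(DmaHeader) == 8, "header layout must stay fixed");

/* in_buf/in_len describe buffer 1, data_len is the size of buffer 2.
 * Returns true and fills *out only if the header is complete and the
 * requested length fits inside the data buffer. */
bool read_dma_header(const void *in_buf, size_t in_len,
                     size_t data_len, DmaHeader *out)
{
    if (in_buf == NULL || in_len < sizeof(*out))
        return false;

    memcpy(out, in_buf, sizeof(*out));   /* copy once, validate the copy */

    return out->Length != 0 && out->Length <= data_len;
}
```

The device-side address in the header would still need its own range check before it is written to a register.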

Also, if I use an IOCTL like in my example with the Direct I/O type, is it possible to pass a structure instead of a buffer?

It’s just a block of bytes. No one in the system knows or cares how it is interpreted. That’s between the driver and its client.

Having said that, let me offer a couple of just-in-case cautions. Do not pass a structure that contains pointers. It is tricky (although not impossible) to handle user-mode pointers in a kernel mode driver. As long as you follow the rules, the I/O system will make sure all addresses are kernel-mode addresses by the time the request gets to you, but as I said, it can’t know what’s inside your buffers.

Secondly, remember to consider field sizes and packing. Remember that your 64-bit driver can be called by 32-bit and 64-bit applications, and the structure packing rules are different. It is a pain in the butt for a driver to have to translate structures coming in from a 32-bit app. The best plan is to design your ioctl structures so they are independent of the app bitness.
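For what it’s worth, a bitness-independent ioctl structure usually just means fixed-width fields, no pointers, and no implicit padding. The sketch below uses a hypothetical `VME_DMA_PARAMS` layout (the field names, including `AddressModifier`, are invented for illustration) and compile-time checks so that a layout change breaks the build in both the 32-bit app and the 64-bit driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ioctl parameter block shared by app and driver.
 * Only fixed-width integers: no pointers, no size_t, no long. */
typedef struct {
    uint32_t VmeAddress;       /* device-side address */
    uint32_t Length;           /* transfer length in bytes */
    uint32_t AddressModifier;  /* hypothetical VME AM code */
    uint32_t Flags;            /* reserved for future use */
} VME_DMA_PARAMS;

/* The same numbers must hold whether this is compiled 32-bit or 64-bit. */
static_assert(sizeof(VME_DMA_PARAMS) == 16, "unexpected padding");
static_assert(offsetof(VME_DMA_PARAMS, Length) == 4, "layout changed");
static_assert(offsetof(VME_DMA_PARAMS, AddressModifier) == 8, "layout changed");
static_assert(offsetof(VME_DMA_PARAMS, Flags) == 12, "layout changed");
```

By contrast, a structure holding a `void *` or a `size_t` would be 4 bytes wide in a 32-bit app and 8 bytes wide in the 64-bit driver, which is exactly the translation problem described above.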