Implement DMA in packet-mode in a PCI/VME Bridge

Thank you Peter for your reply. You are right that the common buffer was causing confusion. I will follow your suggestion about the s/g list and keep you informed if I have any further questions.

Just one point: the start address for starting DMA on both PCI bus and VME bus of the Device must be programmed by user app. For example, for a write access (PCI to VME direction), I have to program one register for the PCI address and one register for the VME address. How is the s/g list base address programmed in this case? In other words, what is the value of the first s/g list element?

Just one point: the start address for starting DMA on both PCI bus and VME bus of the Device must be programmed by user app.

I’m not sure I understand: The user app calls (for example) WriteFile, providing a pointer to the data buffer that they want to write, and the length of that buffer. You then create a DMA Transaction and initialize it using the Request (WdfDmaTransactionInitializeUsingRequest). Your EvtProgramDma Event Processing Callback gets called with the scatter/gather list (with exactly one element). You program your hardware.

Again… see this: https://docs.microsoft.com/en-us/windows-hardware/drivers/wdf/handling-i-o-requests-in-a-kmdf-driver-for-a-bus-master-dma-device which I think explains the process pretty clearly.

Peter

Just one point: the start address for starting DMA on both PCI bus and VME bus of the Device must be programmed by user app.

No. The programming of the DMA hardware must be entirely under the control of the driver. User mode code is naturally insecure; it’s way too easy for another app to interfere, and cause your hardware to do transfers for immoral purposes. The address of the user’s buffer is passed in the IRP, and is validated by the I/O subsystem. The address on the device side should be passed to the driver as a parameter, where the driver can validate it.
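To make the validation step concrete, here is a minimal sketch of the kind of range check a driver could apply to a device-side (VME) address received as a parameter. The window base, window size, and alignment are assumed values chosen for illustration and are not taken from any real bridge; `vme_range_is_valid` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed VME window reachable through the bridge (illustrative values only). */
#define VME_WINDOW_BASE 0x08000000u
#define VME_WINDOW_SIZE 0x01000000u   /* 16 MB */
#define VME_ALIGNMENT   4u            /* assumed alignment requirement */

/* Returns true when [vme_addr, vme_addr + length) lies inside the window. */
static bool vme_range_is_valid(uint32_t vme_addr, uint32_t length)
{
    if (length == 0)
        return false;
    if (vme_addr % VME_ALIGNMENT != 0)
        return false;
    if (vme_addr < VME_WINDOW_BASE)
        return false;

    uint32_t offset = vme_addr - VME_WINDOW_BASE;
    if (offset >= VME_WINDOW_SIZE)
        return false;

    /* Compare against the space left in the window instead of computing
       vme_addr + length, which could wrap around for hostile input. */
    return length <= VME_WINDOW_SIZE - offset;
}
```

A driver would run a check like this on the caller-supplied address before touching any hardware register, and fail the request if it returns false.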

Thank you Peter and Tim for your reply.
I am trying to understand: the s/g list provides one element, sglist.Elements[0].Address, that I have to use to program my device in EvtProgramDma, but I do not see the link between this address and the address of the user’s buffer passed through the IRP.
Could you please explain which address this sglist.Elements[0].Address corresponds to?

Hmmmm… I’m not sure what you don’t understand.

The first byte of the buffer to which the s/g list points is the first byte of the user data buffer, as passed by the user in WriteFile or ReadFile.

Peter

sglist.Elements[0].Address is the physical address of the page whose virtual address is in the IRP. Are you familiar with virtual memory and physical memory? None of the addresses you use in your code, either in user mode or kernel mode, are actually the addresses of the bytes in memory. Those are virtual addresses, and they have to be passed through the page tables in order to get the address that goes out on the memory bus.

@Peter,
OK, so if I use WriteFile( handle, buf, bufsize, NULL, NULL ), or an IOCTL call like DeviceIoControl( handle, IOCTL_WRITE_DMA, NULL, 0, buf, bufsize, NULL, NULL ), the s/g list points to the first byte of “buf”, correct?
@Tim,
Yes, I have some knowledge of the virtual/physical memory address concept. I recently read the Microsoft documentation about contiguous and non-contiguous blocks of memory in relation to the s/g list concept because, you are right, it was in fact new for me.
And so, if I have to program my 32-bit only device, I have to use sglist.Elements[0].Address.LowPart, which corresponds to the physical address translated by the framework, correct?

Yes.

It would probably be most helpful to not refer to the contents of sglist.Elements[0].Address as the “physical address” of the user’s data buffer. Because it may be or it may not be. It is the device bus logical address of the user’s data buffer, suitable for use with DMA. In the OP’s case, it almost certainly will not be the user data buffer physical address.
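For a 32-bit only device, a defensive check before truncating the 64-bit logical address to its low part might look like the sketch below. It uses a plain `uint64_t` as a stand-in for the WDK's PHYSICAL_ADDRESS, and `dma_range_fits_32bit` is a hypothetical helper. When the DMA enabler is created with a 32-bit profile the framework should never hand out an address above 4 GB, so this is only a sanity check under that assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when every byte of [logical_addr, logical_addr + length)
   is addressable with 32 bits, so truncating to the low part is safe. */
static bool dma_range_fits_32bit(uint64_t logical_addr, uint32_t length)
{
    if (length == 0)
        return false;
    if (logical_addr > UINT32_MAX)
        return false;

    /* The last byte (logical_addr + length - 1) must not cross 4 GB. */
    return (uint64_t)length - 1 <= UINT32_MAX - logical_addr;
}
```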

Peter

Thank you Peter for your reply.
And if I use an IOCTL like in my example, with the Direct I/O type, is it possible to pass a structure instead of a buffer? In this case, if the device bus logical address is the first element of this structure, does it mean that sglist.Elements[0].Address points to this element?
In my case, I want to perform the DMA transfer using one IOCTL in the user app to program my device, and I wonder if this is a good method, because all PCI examples use the WriteFile() method.

And so, if I have to program my 32-bit only device, I have to use sglist.Elements[0].Address.LowPart, which corresponds to the physical address translated by the framework, correct?

You’ve raised a couple of brand-new issues here. Is this a very old device? Because any modern hardware designer who creates a PCIe device that is limited to 32-bit addressing is guilty of malpractice. Windows has had 64-bit physical addresses since the very beginning, clear back in the 20th Century. ALL current PCIe IP blocks support 64-bit physical addresses. There’s no excuse.

When your DMA is limited to 32 bits, the system has to take an extra step. You can’t control where the user’s buffer lives, and since modern systems often have 16GB or 32GB or more of RAM, the average user-mode buffer these days is statistically going to be above the 4GB mark. In that case, the operating system has to allocate special space below the physical 4GB limit. These are called “bounce buffers”. When the user submits a request, the I/O system will copy his buffer into a “bounce buffer” before calling your EvtProgramDma callback. There are a limited number of “bounce buffers”, which means your user request might be chopped into several pieces, with each piece getting another call to EvtProgramDma. This is mostly done without your knowledge, but since you have to maintain a destination address, this may be something you need to know.
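The bookkeeping this implies can be sketched in plain C. The model below is a simulation, not KMDF code: `dma_context_t`, `next_vme_address`, `fragment_completed`, and `simulate_transfer` are hypothetical names, and the fragment size stands in for whatever limit the system imposes. The point is only that the device-side address for each fragment is the caller's base address plus the bytes already transferred.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-transfer state a driver might keep in its context. */
typedef struct {
    uint32_t vme_base;    /* destination address supplied by the caller */
    size_t   bytes_done;  /* bytes completed by earlier fragments */
} dma_context_t;

typedef struct {
    size_t   fragments;      /* how many times the hardware was programmed */
    uint32_t last_vme_addr;  /* VME address used for the final fragment */
    size_t   bytes_done;     /* total bytes moved */
} transfer_summary_t;

/* VME address to program for the fragment that is about to start. */
static uint32_t next_vme_address(const dma_context_t *ctx)
{
    return ctx->vme_base + (uint32_t)ctx->bytes_done;
}

/* Record that a fragment of the given length has finished. */
static void fragment_completed(dma_context_t *ctx, size_t fragment_length)
{
    ctx->bytes_done += fragment_length;
}

/* Walk a transfer of `total` bytes split into fragments of at most
   `max_fragment` bytes, as if the program-DMA callback ran once each. */
static transfer_summary_t simulate_transfer(uint32_t vme_base, size_t total,
                                            size_t max_fragment)
{
    dma_context_t ctx = { vme_base, 0 };
    transfer_summary_t summary = { 0, vme_base, 0 };

    if (max_fragment == 0)
        return summary;

    while (ctx.bytes_done < total) {
        size_t remaining = total - ctx.bytes_done;
        size_t len = remaining < max_fragment ? remaining : max_fragment;

        /* This is the value that would go into the VME address register. */
        summary.last_vme_addr = next_vme_address(&ctx);
        fragment_completed(&ctx, len);
        summary.fragments++;
    }
    summary.bytes_done = ctx.bytes_done;
    return summary;
}
```

For example, a 10000-byte write starting at VME address 0x08000000 with a 4096-byte fragment limit is programmed three times, and the last fragment targets 0x08002000.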

Peter is quite right to point out the difference between “physical” and “logical” addresses. A bus address is not necessarily the same as a physical address. I admit to being lax with this terminology, because in my career I have never encountered a system where the two were not identical.

In my case, I want to perform the DMA transfer using one IOCTL in the user app to program my device, and I wonder if this is a good method, because all PCI examples use the WriteFile() method.

There’s almost no difference. It may not be obvious from above, but from a driver standpoint, ReadFile, WriteFile and DeviceIoControl are all virtually identical. The driver just gets an IRP, and the buffers are stored in the same places. In YOUR case, there is an additional consideration, because you need to specify a destination address. With DeviceIoControl, you have the opportunity to send two buffers. You can put the address in buffer 1, and the data in buffer 2. Without that, you have to invent some other scheme, and it doesn’t seem as natural.

And if I use an IOCTL like in my example, with the Direct I/O type, is it possible to pass a structure instead of a buffer?

It’s just a block of bytes. No one in the system knows or cares how it is interpreted. That’s between the driver and its client.

Having said that, let me offer a couple of just-in-case cautions. Do not pass a structure that contains pointers. It is tricky (although not impossible) to handle user-mode pointers in a kernel mode driver. As long as you follow the rules, the I/O system will make sure all addresses are kernel-mode addresses by the time the request gets to you, but as I said, it can’t know what’s inside your buffers.

Secondly, remember to consider field sizes and packing. Remember that your 64-bit driver can be called by 32-bit and 64-bit applications, and the structure packing rules are different. It is a pain in the butt for a driver to have to translate structures coming in from a 32-bit app. The best plan is to design your ioctl structures so they are independent of the app bitness.
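One way to follow both cautions is to use only fixed-width integer fields, with no pointers and no types whose size depends on the compiler target, and to pin the layout with compile-time checks. The structure below is a hypothetical example; the field names, including the VME address modifier, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter block for a write-DMA ioctl. Every field has a
   fixed width and natural alignment, so 32-bit and 64-bit callers produce
   the same layout. No pointers, no size_t, no long. */
typedef struct {
    uint32_t vme_address;       /* destination on the VME side */
    uint32_t address_modifier;  /* assumed VME AM code field */
    uint32_t data_width;        /* transfer width in bits */
    uint32_t flags;
    uint64_t reserved;          /* 8-byte field placed at an 8-byte offset */
} dma_write_params_t;

/* Compile-time layout checks: the build fails if packing ever changes. */
static_assert(sizeof(dma_write_params_t) == 24, "unexpected struct size");
static_assert(offsetof(dma_write_params_t, reserved) == 16,
              "unexpected offset for reserved");
```

The same header can then be shared between the driver and its clients, and a layout mismatch shows up at build time rather than as corrupted parameters at run time.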

Thank you Tim for your reply and your explanations.
I am just a little bit confused about the fact that I can put the address in buffer 1 and the data in buffer 2. So, which buffer is used for the s/g list in the DMA?

You need to read more about IRPs. DeviceIoControl has two buffers. When you’re using direct I/O, as you would here, the first buffer is copied into and/or out of kernel memory. The second buffer is mapped into kernel memory. That’s the one you will pass to WdfDmaTransactionInitializeUsingRequest.

Thank you very much Tim.

I have another question about the end of DMA: my device triggers a PCI interrupt if the DMA ends normally or if an error occurs on either the PCI or the VME bus during a DMA transfer.
If the DMA ends due to an error, should I deal with this error in my ISR or in my DPC routine after calling WdfDmaTransactionDmaCompleted? In this case, I have to stop the DMA transaction at once and cancel the associated DMA I/O request before the framework schedules another DMA transfer.

You do it in your DPC.

Peter

Thank you Peter.

I have a WDF_VIOLATION bugcheck with Arg = 0x00000003: the Windows Driver Framework Verifier has encountered a fatal error.
In particular, an I/O request has been completed, but a framework request object cannot be deleted because there are outstanding references to the input buffer or the output buffer, or both. The driver’s IFR will include details on the outstanding references.

I checked the log in the dump file, and the last lines related to the buffers in the IOCTL say:
35: FxRequest::CompleteInternal - WDFREQUEST 0x0000567C11539B38, PIRP 0xFFFF8F8FC58E6D80, Major Function 0xe, completed with outstanding references on WDFMEMORY 0x0000567C11539A61 or 0x0000567C11539A51 retrieved from this request
36: FxRequest::CompleteInternal - WDFMEMORY 0x0000567C11539A61, buffer FFFF8F8FC617AFE0, PMDL 0000000000000000, length 20 bytes
37: FxRequest::CompleteInternal - IOCTL output WDFMEMORY 0x0000567C11539A51, buffer FFFFE481CAE12010, PMDL FFFFA983EE8C6750, length 4096 bytes
---- end of log ----

I do not see any particular error, but I have identified in the code that the WDF method that causes the bugcheck is WdfDmaTransactionExecute().
Do you have any suggestions to help me debug the driver? I am a newbie with the WinDbg debugger. Thanks.

Like the log says, a WDFMEMORY object has a reference when the Request is completed.

How are your input and output buffers for this Request created? Do you build them? Are you making a copy of either of those buffers using a WDFMEMORY object?

Peter