DMA Scatter Gather List

I am currently trying to write a Windows driver using KMDF that programs common-buffer-based scatter/gather DMA for a PCIe endpoint device.

So far I have

  • Created a DMA enabler and DMA transaction objects
  • Set the appropriate alignment requirement
  • Created common buffers that hold the SGL elements in the format the device expects
  • Written an EVT_WDF_PROGRAM_DMA callback that copies the addresses and lengths from the provided scatter/gather list into the common buffers
  • Started the DMA operation by calling WdfDmaTransactionInitializeUsingRequest and WdfDmaTransactionExecute
  • Handled the interrupt that arrives after the transfers complete in ISR and DPC functions
  • Used device I/O control with METHOD_IN_DIRECT or METHOD_OUT_DIRECT for write and read DMA operations, respectively

My current issue is that when WdfDmaTransactionExecute is called and EVT_WDF_PROGRAM_DMA subsequently runs, the DMA operations start and finish as expected, but the data the device receives and sends is usually out of order, at the wrong offsets, or repeated. I used MmMapIoSpace to map the addresses given by the scatter/gather list in EVT_WDF_PROGRAM_DMA and printed the contents, which confirmed that the data behind the SGL did not exactly match the output buffer I had passed to the driver.

Any help regarding the cause of this behavior, or any steps I have missed that could cause it, would be greatly appreciated.

So, the common buffers are only used to hold your scatter/gather list, in your hardware’s format? The most likely cause, of course, is that you have created your descriptors incorrectly. Have you examined the descriptors by hand after you create them and before you send them to your hardware?

The descriptors hold the addresses and lengths that I get from the SGL, plus flags that set the direction and the location of the buffer (on the endpoint or on the host system). Since the operation runs and the data itself is correct, just slightly out of order, I assumed the descriptors were correct. Also, the locations where the data starts to misbehave are always the same.

Here is the code for how I set up my descriptors. In my implementation, the DMA hardware already has the control bit set, and so I only update my buffers with new elements and update the LIMIT register which specifies the last transfer in the buffer.

    PDMA_DESCRIPTOR srcList = (PDMA_DESCRIPTOR)qContext->channel.srcBufferVA;
    PDMA_DESCRIPTOR dstList = (PDMA_DESCRIPTOR)qContext->channel.dstBufferVA;

    for (unsigned int i = 0;i < sgList->NumberOfElements;i++,k++)
    {
        //Circle back to beginning of buffer if needed
        if (k > MAX_SGL - 1)k = 0;

        srcList[k].address.u.lower = sgList->Elements[i].Address.LowPart;
        srcList[k].address.u.upper = sgList->Elements[i].Address.HighPart;
        // KdPrint(("SGL Address: 0x%llx, Length: %lu\n", sgList->Elements[i].Address.QuadPart, sgList->Elements[i].Length));
        srcList[k].control.u.byteCount = sgList->Elements[i].Length;
        srcList[k].control.u.flags = 0b00000000;
        srcList[k].control.u.userHandle = 1;
        srcList[k].control.u.userID = 1;

        //SETS EOP BIT ON LAST ELEMENT TO GENERATE INTERRUPT
        if (i == sgList->NumberOfElements - 1)
        {
            KdPrint(("EOP set\n"));
            srcList[k].control.u.flags |= 0b110;
        }

        dstList[k].address.u.lower = *(qContext->channel.address) + offset;
        dstList[k].address.u.upper = 0x0;
        dstList[k].control.u.byteCount = sgList->Elements[i].Length;
        dstList[k].control.u.flags = 0b11;
        dstList[k].control.u.userHandle = 1;

        offset += sgList->Elements->Length;
    }

Interesting. The typical DMA design has the source and destination addresses in different fields of a single descriptor. Do you really have separate lists? That seems like an error-prone design. Which DMA IP is this?

Are you checking that your transfer fits within the buffer? You do wraparound, but you aren’t checking whether you are overwriting a currently active transfer. Could a previous transfer be underway while you are filling out the next one?
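The overwrite check described above can be sketched as a ring-occupancy test done before any descriptors are written. This is only a minimal sketch, not the poster's driver code: the value of MAX_SGL, the helper names, and the assumption that the driver can learn the index of the oldest descriptor the hardware has not yet finished (`tail`) are all hypothetical, and how that index is obtained depends on the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SGL 64u /* assumed ring capacity; the real value is not given in the thread */

/*
 * head: next slot the driver will write (the loop variable k in the posted code).
 * tail: oldest slot the hardware has not finished yet (hypothetical; hardware specific).
 * One slot is always left empty so that head == tail means "ring empty".
 */
static uint32_t ring_free_slots(uint32_t head, uint32_t tail)
{
    assert(head < MAX_SGL && tail < MAX_SGL);
    return (tail + MAX_SGL - head - 1u) % MAX_SGL;
}

/* True if 'needed' descriptors can be written without touching active ones. */
static bool ring_can_accept(uint32_t head, uint32_t tail, uint32_t needed)
{
    return needed <= ring_free_slots(head, tail);
}
```

In a program-DMA callback, a check like `ring_can_accept(k, tail, sgList->NumberOfElements)` would run before the fill loop, and the transfer would be deferred or split if it failed.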

This DMA IP is on the Xilinx Zynq MPSoC, and it uses separate lists for source and destination addresses. Thanks for your help. It turns out there was a minor bug in my code that, at times, failed to increment the destination addresses correctly.
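The thread does not say which line was at fault. One candidate visible in the posted loop is `offset += sgList->Elements->Length;`, which always adds the length of the first element rather than the current one, so destination addresses drift whenever element lengths differ. The sketch below shows the per-element accumulation in isolation. The `SgElement` type and `compute_dst_offsets` function are hypothetical stand-ins for illustration, not the WDM structures or the poster's actual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one scatter/gather element (address + length). */
typedef struct {
    uint64_t address;
    uint32_t length;
} SgElement;

/*
 * Writes into dst_offsets[i] the device-side offset at which element i
 * should land, and returns the total number of bytes described.
 */
static uint32_t compute_dst_offsets(const SgElement *elems, size_t count,
                                    uint32_t *dst_offsets)
{
    uint32_t offset = 0;

    assert(elems != NULL && dst_offsets != NULL);

    for (size_t i = 0; i < count; i++) {
        dst_offsets[i] = offset;
        /* Advance by the CURRENT element's length (elems[i]),
         * not by elems->length, which is always element 0. */
        offset += elems[i].length;
    }
    return offset;
}
```

With elements of 4096, 512, and 1024 bytes, the offsets come out as 0, 4096, and 4608. Adding the first element's length each time would instead place the third element at 8192.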

I want to thank you for following up. So often, we never get closure when an unsolved thread fades away.
