Quad DMA transfer

We are trying to optimize our DMA transfers by creating 4 independent DMA transfers.
A WDFREQUEST is passed in from an ioctl, through the queue created with WdfIoQueueCreate in DeviceAdd.
The idea is to create 4 different WDFREQUEST structures that are compatible
with WdfObjectAllocateContext, WdfDmaTransactionCreate, WdfDmaTransactionExecute
and the scatter/gather lists.

So, you’re talking about taking a single buffer from a single ioctl, and spreading it across four DMA transfers? Why would you do that? What’s the point? I would hope it is clear that transferring N bytes with 4 simultaneous transfers takes exactly the same amount of time as transferring N bytes with 1 transfer. The limitation is the bus.

Having said that, once you are in your EvtProgramDma function, you can divide up the duties however you want. You can subdivide the scatter/gather lists into separate parameter blocks and program 4 engines to act on them. When you’re completing each part of the transfer, the kernel neither knows nor cares how many DMA engines were involved in the process. When the last block completes, the transfer ends.
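To make the subdivision idea concrete, here is a minimal user-mode sketch in plain C; it is not driver code. The SgElement and EngineBlock types, the ENGINE_COUNT constant, and both function names are hypothetical stand-ins invented for illustration. A real EvtProgramDma callback would walk the SCATTER_GATHER_LIST it is handed and write hardware-specific descriptors, and would likely use an interlocked operation where this sketch uses a C11 atomic.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define ENGINE_COUNT 4  /* assumption: four DMA engines, as in the thread */

/* Simplified stand-in for one scatter/gather element. */
typedef struct {
    uint64_t address;   /* bus address of the fragment */
    uint32_t length;    /* fragment length in bytes */
} SgElement;

/* Hypothetical per-engine parameter block: a contiguous run of elements. */
typedef struct {
    const SgElement *first;  /* NULL when the engine has no work */
    size_t count;            /* number of elements in the run */
    uint64_t bytes;          /* total bytes the engine will move */
} EngineBlock;

/* Hands the elements out in order, giving each engine a contiguous run
 * of roughly equal element count. Returns how many engines got work. */
static size_t split_sg_list(const SgElement *list, size_t n,
                            EngineBlock blocks[ENGINE_COUNT])
{
    size_t next = 0;
    size_t active = 0;

    for (size_t e = 0; e < ENGINE_COUNT; e++) {
        size_t engines_left = ENGINE_COUNT - e;
        size_t elements_left = n - next;
        /* Ceiling division keeps the runs as even as possible. */
        size_t take = (elements_left + engines_left - 1) / engines_left;

        blocks[e].first = take ? &list[next] : NULL;
        blocks[e].count = take;
        blocks[e].bytes = 0;
        for (size_t i = 0; i < take; i++)
            blocks[e].bytes += list[next + i].length;

        next += take;
        if (take)
            active++;
    }
    return active;
}

/* Called once per engine completion. Returns 1 only for the completion
 * that brings the outstanding count to zero, i.e. the last block. */
static int engine_done(atomic_int *outstanding)
{
    return atomic_fetch_sub(outstanding, 1) == 1;
}
```

The caller would set the outstanding count to the value returned by split_sg_list before starting the engines, and finish the transfer only when engine_done returns 1. Note that with a single-element list only one engine gets any work, which is worth keeping in mind given the NumberOfElements discussion further down.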

However, it’s silly.

I don’t know why you would want to do this… it makes no sense on its face. But you must create multiple, separate, DMA Enablers for the parallel operations.

Peter

The idea is to try to optimize DMA transfers. This is what we are trying. We randomly get lost frames in DMA transfers. We have multiple video sources. The driver sorts each frame from the WDFREQUEST structures when adding to the DMA transfers, and discards frames sent by the application programmer that are too old or duplicates.

I am experiencing two problems. The scatter/gather list NumberOfElements is always 1 in these transfers. The WdfObjectDelete() call for the WDFREQUEST I made crashes the driver. I am assuming I am not setting up the WDFREQUEST properly.

I am using WdfRequestCreate() to make the request. Then WdfObjectAllocateContext() on that request for my instance data. After that WdfDmaTransactionCreate(). I tried both using and copying the MDL list. Then WdfDmaTransactionInitialize() with the MDL list. After that WdfDmaTransactionExecute() is called. The entire engine runs as expected at that point, except that NumberOfElements in the scatter/gather list is always 1 and I get a crash calling WdfObjectDelete() on the request.

optimize DMA transfers

Artificially forcing multiple DMA transfers simultaneously, IF your device can really even do this, isn’t going to help.

We randomly get lost frames in DMA transfers

So, your current DMA scheme isn’t working and you want to make it MORE complex by using a unique design? With all due respect, I don’t think that’s the best possible plan.

using WdfRequestCreate() to make the request. Then WdfObjectAllocateContext() on that request for my instance data

Hmmm… you know that you could allocate the context as part of WdfRequestCreate, right? Using the WDF_OBJECT_ATTRIBUTES structure?

The scatter gather list NumberOfElements is always 1 in these transfers

Hmmm… very suspicious.

Does that 1 element map the entire requested I/O operation (read or write)? How are you setting up your DMA Enabler? What profile did you choose??
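One way to answer the first question while debugging is to total the element lengths and compare the sum with the length passed to the transaction. The following is a plain C sketch, not driver code; SgElement is a simplified stand-in for the real scatter/gather element, and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one scatter/gather element. */
typedef struct {
    uint64_t address;   /* bus address of the fragment */
    uint32_t length;    /* fragment length in bytes */
} SgElement;

/* Sums the lengths of all elements in the list. */
static uint64_t sg_total_bytes(const SgElement *list, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += list[i].length;
    return total;
}

/* Returns 1 when the list describes exactly the requested byte count. */
static int sg_covers_transfer(const SgElement *list, size_t n,
                              uint64_t requested)
{
    return sg_total_bytes(list, n) == requested;
}
```

If a single element covers the whole request, the list is complete and the buffer simply maps to one contiguous range from the device's point of view. If it covers only part of the request, the transaction is probably being split into several smaller DMA transfers.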

I get a crash calling WdfObjectDelete()

Have you enabled WDF Verifier? Looked at the !WDFLOGDUMP?

BTW, far better to save the Request someplace and then reuse it later (rather than create a new one every time).

Peter

Some background:
The hardware device has multiple independent channels and can simultaneously handle DMA requests for every channel.
The single large buffer supplied by the application represents DMA data for several channels
The hardware HAS to DMA each channel independently
The application wishes to interact with the driver with the large unified buffer representing multiple channels
The driver needs to glue the two together. :slight_smile:
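As a rough illustration of the "glue" step, the sketch below computes which part of the unified buffer belongs to each channel. It is plain C rather than driver code, and it assumes the buffer is laid out as equal-sized contiguous regions, one per channel, which the thread does not actually state. ChannelSlice and slice_for_channel are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical description of one channel's region in the big buffer. */
typedef struct {
    size_t offset;  /* byte offset from the start of the unified buffer */
    size_t length;  /* number of bytes belonging to this channel */
} ChannelSlice;

/* Computes the region for one channel, assuming equal contiguous regions.
 * Returns 0 on success, -1 if the arguments do not describe a valid split. */
static int slice_for_channel(size_t total_bytes, size_t channel_count,
                             size_t channel, ChannelSlice *out)
{
    if (channel_count == 0 || channel >= channel_count)
        return -1;
    if (total_bytes % channel_count != 0)
        return -1;  /* the buffer does not divide evenly */

    out->length = total_bytes / channel_count;
    out->offset = channel * out->length;
    return 0;
}
```

In a KMDF driver, each slice's offset and length (rather than the start and length of the whole buffer) would be what gets handed to that channel's DMA transaction, for example via a partial MDL built with IoBuildPartialMdl. Whether that fits depends on the real buffer layout.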

@cstein OK… that reads like a bit of a different story from the one Mr. @Jamie_Finch was telling us… but it’s entirely reasonable.

I think a lot of us, me included, got stuck on Mr. @Jamie_Finch’s comment that initiating multiple parallel DMA transfers was somehow more optimal. Which, from the system’s standpoint at least, it is not.

Sounds like you have a reasonably challenging DMA device to deal with.

Peter

Thank you for your comments. With those comments I did get the 4 requests to execute properly.
The issue I am having now is that I cannot shut down / unload the driver. If you try to shut down
the system, it will hang on shutdown. I cannot find any entry point into the driver that is
being called on shutdown. Something has the driver locked, waiting for something to be released.
I get this message in WinDbg:
Thread 0xFFFFE00B213D1040 is waiting for all inflight requests to be acknowledged on WDFQUEUE 0x00001FF4DCD5D6B8.
I have 2 fragments of code below, showing where I release the requests and where I create them.
I am hoping that someone will spot something I am missing that is hanging the driver on
shutdown. Any hints on how to debug this would be greatly appreciated.

The previous problem was because the reference count was out of sync, so I no longer call
WdfObjectDereference( Request );
When I remove the requests I allocate from the active queue, I do not call dereference now.
I am hoping someone can spot something simple like this for the shutdown problem.

This is how we complete the request.

WDFREQUEST Request_ = (WDFREQUEST) WdfCollectionGetFirstItem( devContext->Collection_DpcDelayRequestComplete );
while(Request_)
{
    PREQUEST_CONTEXT_DMA ReqContext_ = GetRequestContext_DMA( Request_ );
    WdfCollectionRemoveItem( devContext->Collection_DpcDelayRequestComplete, 0 );
    if( ( ReqContext_ != NULL ) && ( ReqContext_->AllocRequest ) )
    {
        // Was made by WdfRequestCreate
        //
        WdfObjectDelete( Request_ );
        HdFreeMdlList( ReqContext_->mdl );
        ReqContext_->mdl = NULL;
    }
    else
    {
        WdfRequestCompleteWithPriorityBoost( Request_, STATUS_SUCCESS, IO_NEWTEK_INCREMENT );
    }
    Request_ = (WDFREQUEST) WdfCollectionGetFirstItem( devContext->Collection_DpcDelayRequestComplete );
}

This is how we create the 4 local requests.
Request (declared as IN WDFREQUEST Request) is the original request passed in to the driver.

WDFIOTARGET Target_ = WdfDeviceGetIoTarget( devContext->WdfDevice );
for( int Quad_ = 0; NT_SUCCESS( ntStatus_ ) && ( Quad_ < 4 ); Quad_++ )
{
    WDFREQUEST Request_;
    ntStatus_ = WdfRequestCreate( WDF_NO_OBJECT_ATTRIBUTES, Target_, &Request_ );

    if( NT_SUCCESS( ntStatus_ ) )
    {
        PREQUEST_CONTEXT_DMA ReqContext_ = NULL;
        {
            WDF_OBJECT_ATTRIBUTES RequestAttributes_;
            WDF_OBJECT_ATTRIBUTES_INIT_CONTEXT_TYPE( &RequestAttributes_, REQUEST_CONTEXT_DMA );

            // Setup a Cleanup callback for this request to release the resources we
            // allocate, no matter how the request completes. Without it, a close or
            // termination could leak resources.
            //
            RequestAttributes_.EvtCleanupCallback = HdRequestCleanup;
            ntStatus_ = WdfObjectAllocateContext( Request_, &RequestAttributes_, (PVOID *) &ReqContext_ );
        }

        // We *NEED* a request context to hold the resources we allocate.
        //
        if( NT_SUCCESS( ntStatus_ ) )
        {
            ReqContext_->AllocRequest  = true;
            ReqContext_->DevContext    = devContext;

            WDFDMATRANSACTION DmaTransaction_ = HdAcquireDmaTransaction( devContext, ReqContext_->Channel );
            if( DmaTransaction_ != NULL )
            {
                ReqContext_->DmaTransaction = DmaTransaction_;

                PMDL mdl;
                ntStatus_ = WdfRequestRetrieveOutputWdmMdl( Request, &mdl );
                if( NT_SUCCESS( ntStatus_ ) )
                {
                    ReqContext_->mdl = HdCopyMdlList( mdl );
                    if(ReqContext_->mdl == NULL )
                    {
                        WdfObjectDelete( Request_ );
                        return STATUS_NO_MEMORY;
                    }

                    PVOID virtualAddress;
                    ULONG length;

                    virtualAddress = MmGetMdlVirtualAddress( ReqContext_->mdl );
                    length         = MmGetMdlByteCount(      ReqContext_->mdl );

                    ntStatus_ = WdfDmaTransactionInitialize( ReqContext_->DmaTransaction, HdEvtProgramDma, WdfDmaDirectionWriteToDevice, ReqContext_->mdl, virtualAddress, length );
                    if( NT_SUCCESS( ntStatus_ ) )
                    {
                        ntStatus_ = WdfDmaTransactionExecute( ReqContext_->DmaTransaction, (PVOID) ReqContext_ );
                    }
                }
            }
        }
    }
}

The address of the mdl list after the copy is the same address as the original mdl list.

Hmmmm… I have a lot of random observations, but I don’t know if they’re going to be helpful:

  • Why are you getting the MDL and calling WdfDmaTransactionInitialize instead of WdfDmaTransactionInitializeUsingRequest?
  • Is there any chance of a race between the time you call WdfCollectionGetFirstItem and WdfCollectionRemoveItem? I don’t see a lock protecting this.
  • It is extremely, extremely, rare to ever call WdfObjectDereference in a WDF driver. I think I’ve called it in one driver… ever. So…
  • Are you handling cancel?
  • Have you used the WDFKD extensions to see what’s going on in your driver when you can’t unload it?
  • Why do you need the Request cleanup callback? What resources will you leak?
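On the race question, the usual remedy is to make "fetch the first item" and "remove the first item" a single step under one lock. Below is a minimal user-mode sketch of that pattern in plain C11, not driver code. PendingQueue and its functions are hypothetical, and the atomic_flag spin loop only stands in for a real lock; in a KMDF driver the same bracketing would more likely be done with a WDFSPINLOCK via WdfSpinLockAcquire and WdfSpinLockRelease around the collection calls.

```c
#include <stdatomic.h>
#include <stddef.h>

#define QUEUE_CAPACITY 16  /* arbitrary size for the sketch */

/* Hypothetical stand-in for the collection of requests awaiting completion. */
typedef struct {
    atomic_flag lock;              /* stand-in for a spin lock */
    void *items[QUEUE_CAPACITY];
    size_t count;
} PendingQueue;

static void queue_init(PendingQueue *q)
{
    atomic_flag_clear(&q->lock);
    q->count = 0;
}

static void queue_lock(PendingQueue *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;  /* spin until the flag was previously clear */
}

static void queue_unlock(PendingQueue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Appends an item. Returns 1 on success, 0 if the queue is full. */
static int queue_push(PendingQueue *q, void *item)
{
    int ok = 0;
    queue_lock(q);
    if (q->count < QUEUE_CAPACITY) {
        q->items[q->count++] = item;
        ok = 1;
    }
    queue_unlock(q);
    return ok;
}

/* Fetches AND removes the first item while holding the lock, so no other
 * thread can observe or remove the same item in between.
 * Returns NULL when the queue is empty. */
static void *queue_pop_first(PendingQueue *q)
{
    void *item = NULL;
    queue_lock(q);
    if (q->count > 0) {
        item = q->items[0];
        for (size_t i = 1; i < q->count; i++)
            q->items[i - 1] = q->items[i];
        q->count--;
    }
    queue_unlock(q);
    return item;
}
```

The completion loop then becomes "pop one item under the lock, process it outside the lock, repeat until NULL", which avoids completing or deleting a request while the lock is held.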

It’s really hard to know what’s going on, or even make legit comments or suggestions, without doing a proper code review.

Peter