I need to know what I am doing wrong. We have a legacy driver that takes IOCTL requests and performs DMA transfers on a single frame.
An optimization that was asked for was quad-frame DMA transfers. The hardware supports this.
After some help from the OSR group, I got that running. Now it seems I have a memory leak in the 4 queues that handle the video DMA transfers.
I take the original request from the IOCTL and create 4 driver-allocated requests.
My allocated requests have an AllocRequest boolean in their context, so I can tell the request was allocated.
I reference count the allocated requests, and I know I am not losing track of any of them.
Normally, to complete a request, WdfRequestCompletexxxx is called, but the allocated requests are special: you must call WdfObjectDelete.
This is how I complete the request. This is how I got it running and was able to shut down normally.
WdfRequestStopAcknowledge and WdfRequestCancelSentRequest allow the driver to shut down on power down.
The need to call those 2 functions may be a symptom of the problem.
One of my thoughts is that, since the driver does not call a completion routine and just calls WdfObjectDelete, maybe WDF does not send some sort of message to remove the request from the queue.
The memory leak is that all my allocated requests are stuck in the 4 queues, even though I think I have removed them.
The basic scheme is to remove the allocated requests from the DMA queues and place them into a collection to be freed later.
This is the delete request subroutine.
static VOID HdDeleteRequest(PHD_DEVICE_CONTEXT devContext, WDFREQUEST Request_)
{
    PREQUEST_CONTEXT_DMA ReqContext_ = GetRequestContext_DMA( Request_ );
    if ( ( ReqContext_ != NULL ) && ( ReqContext_->AllocRequest ) )
    {
        // Was made by WdfRequestCreate
        //
        WdfRequestStopAcknowledge( Request_, FALSE ); /* Don't requeue. */
        WdfRequestCancelSentRequest( Request_ );
        WdfObjectDelete( Request_ );
    }
    else
    {
        WdfRequestCompleteWithPriorityBoost( Request_, STATUS_SUCCESS, 0 );
    }
}
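One way to cross-check that every driver-allocated request really reaches this delete path is to count creations and deletions per DMA channel. The following is only a user-mode model of that bookkeeping, not KMDF code: `alloc_tracker`, `tracker_init`, `tracker_on_create`, `tracker_on_delete`, and `tracker_outstanding` are hypothetical names, and the four-channel layout is an assumption taken from the quad-frame description. In the driver itself the counters would presumably need interlocked updates or the existing spin lock.

```c
#include <assert.h>
#include <stddef.h>

#define QUAD_CHANNELS 4  /* assumption: one counter pair per quad-frame channel */

/* Hypothetical bookkeeping: requests created and deleted on each channel. */
typedef struct alloc_tracker {
    unsigned long created[QUAD_CHANNELS];
    unsigned long deleted[QUAD_CHANNELS];
} alloc_tracker;

/* Reset all counters to zero. */
static void tracker_init(alloc_tracker *t)
{
    for (size_t i = 0; i < QUAD_CHANNELS; i++) {
        t->created[i] = 0;
        t->deleted[i] = 0;
    }
}

/* Call where the request is created (the WdfRequestCreate site in the driver). */
static void tracker_on_create(alloc_tracker *t, size_t channel)
{
    assert(channel < QUAD_CHANNELS);
    t->created[channel]++;
}

/* Call where the request is deleted (the WdfObjectDelete site in the driver). */
static void tracker_on_delete(alloc_tracker *t, size_t channel)
{
    assert(channel < QUAD_CHANNELS);
    /* Deleting more than was created means the accounting itself is broken. */
    assert(t->deleted[channel] < t->created[channel]);
    t->deleted[channel]++;
}

/* Requests created on this channel that have not been deleted yet. */
static unsigned long tracker_outstanding(const alloc_tracker *t, size_t channel)
{
    assert(channel < QUAD_CHANNELS);
    return t->created[channel] - t->deleted[channel];
}
```

If the outstanding count returns to zero on every channel while the debugger still shows driver-owned requests on the queue, the requests are being deleted but the queue has not let go of them, which narrows down where to look.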
This is the loop used to go through the delay request completion collection.
The queued data is moved to the collection to be disposed of later.
WDFREQUEST Request_ = (WDFREQUEST) WdfCollectionGetFirstItem( devContext->Collection_DpcDelayRequestComplete );
while(Request_)
{
    WdfCollectionRemoveItem( devContext->Collection_DpcDelayRequestComplete, 0 );
    HdDeleteRequest( devContext, Request_ );
    Request_ = (WDFREQUEST) WdfCollectionGetFirstItem( devContext->Collection_DpcDelayRequestComplete );
}
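The drain pattern in that loop (take the first item, remove index 0, dispose of it, repeat until empty) can be modeled outside the kernel to confirm that it visits every item exactly once and leaves the collection empty. This is a sketch only: `item_collection` and its helper functions are hypothetical stand-ins for the WDF collection calls, plain integers stand in for request handles, and the fixed capacity is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ITEMS 16  /* arbitrary capacity for the model */

/* Hypothetical array-backed stand-in for a WDFCOLLECTION. */
typedef struct item_collection {
    int items[MAX_ITEMS];  /* stand-ins for request handles; 0 means "none" */
    size_t count;
} item_collection;

/* Append an item; returns 1 on success, 0 if full or the item is 0. */
static int collection_add(item_collection *c, int item)
{
    if (c->count >= MAX_ITEMS || item == 0) {
        return 0;
    }
    c->items[c->count++] = item;
    return 1;
}

/* Return the first item, or 0 when the collection is empty. */
static int collection_first(const item_collection *c)
{
    return c->count ? c->items[0] : 0;
}

/* Remove the item at index, shifting later items down by one. */
static void collection_remove_at(item_collection *c, size_t index)
{
    assert(index < c->count);
    for (size_t i = index + 1; i < c->count; i++) {
        c->items[i - 1] = c->items[i];
    }
    c->count--;
}

/* Sample dispose callback: adds each disposed item to a running sum. */
static void sum_dispose(int item, void *ctx)
{
    *(long *)ctx += item;
}

/* Same shape as the driver loop: first item, remove index 0, dispose, repeat. */
static size_t collection_drain(item_collection *c,
                               void (*dispose)(int item, void *ctx),
                               void *ctx)
{
    size_t disposed = 0;
    int item = collection_first(c);
    while (item != 0) {
        collection_remove_at(c, 0);
        dispose(item, ctx);
        disposed++;
        item = collection_first(c);
    }
    return disposed;
}
```

Because the item is removed from the collection before it is disposed of, the loop cannot hand the same entry to the dispose step twice, so a leak is more likely to come from what the dispose step does than from the loop itself.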
This is, I think, where the problem of the allocated requests being stuck in the queue first shows up.
Normally in the FindRequest / RetrieveFoundRequest logic, you need to call WdfObjectDereference after the RetrieveFoundRequest.
But if I do, I BSOD. So I have to check for allocated requests after WdfIoQueueRetrieveFoundRequest and not call WdfObjectDereference.
This is the subroutine to Dereference the request.
static VOID HdObjectDereference(WDFREQUEST Request)
{
    PREQUEST_CONTEXT_DMA ReqContext_ = GetRequestContext_DMA(Request);
    if ((ReqContext_ != NULL) && (ReqContext_->AllocRequest))
    {
        // Driver-allocated request: skip the dereference, calling it here bugchecks.
    }
    else
    {
        WdfObjectDereference(Request);
    }
}
This is the code block with FindRequest / RetrieveFoundRequest.
Note the call to the HdObjectDereference subroutine.
I added code to check for invalid ntStatus_ results; there are only successful removals.
WDFREQUEST Request_ = NULL;
WDF_REQUEST_PARAMETERS RequestParams_;
WDF_REQUEST_PARAMETERS_INIT(&RequestParams_);
ntStatus_ = WdfIoQueueFindRequest( Queue_, RequestPrev_, NULL, &RequestParams_, &Request_ );
ntStatus_ = WdfIoQueueRetrieveFoundRequest( Queue_, Request_, &Request );
HdObjectDereference(Request_);
The code that does this comes from this article, almost exactly.
To shut down / power down, I had to add a QueueIOStop routine.
The QueueIOStop routine is never called. If I don’t add the function, the system will not shut down.
This is the startup code that sets up the queues, starting with the empty EvtIoStop callback.
static VOID HdIODMAQueueIOStop( _In_ WDFQUEUE Queue, _In_ WDFREQUEST Request, _In_ ULONG ActionFlags )
{
}
This is how I build the DMA queues. The quad video frames are in DMA channels 32 through 35.
// - - - - - - - - - - - - - - - - - - - - - - - - - - - - MANUAL QUEUE DMA
//
for (int Channel_ = 0; Channel_ < MAX_DMA_CHANNELS; Channel_++)
{
    WDF_IO_QUEUE_CONFIG ioQueueConfig;
    WDF_IO_QUEUE_CONFIG_INIT( &ioQueueConfig, WdfIoQueueDispatchManual );
    ioQueueConfig.PowerManaged = WdfTrue;
    if( Channel_ >= 32 && Channel_ <= 35 )
    {
        ioQueueConfig.EvtIoStop = HdIODMAQueueIOStop;
    }
    ntStatus_ = WdfIoQueueCreate ( Device, &ioQueueConfig, WDF_NO_OBJECT_ATTRIBUTES, &devContext_->QueueDMA[Channel_] );
}
This is how I create the delayed request collection on startup.
// - - - - - - - - - - - - - - - - - - - - - DPC DELAYED REQUEST COLLECTION
//
WDF_OBJECT_ATTRIBUTES CollectionAttributes;
WDF_OBJECT_ATTRIBUTES_INIT(&CollectionAttributes);
CollectionAttributes.ParentObject = Device;
ntStatus_ = WdfCollectionCreate( &CollectionAttributes, &devContext_->Collection_DpcDelayRequestComplete );
When the IOCTL is called with the original request, the MDL list contains the 4 video frames.
This is the loop that creates the 4 Request_'s from the original Request.
WDFIOTARGET Target_ = WdfDeviceGetIoTarget( devContext->WdfDevice );
for( int Quad_ = 0; NT_SUCCESS( ntStatus_ ) && ( Quad_ < 4 ); Quad_++ )
{
    WDFREQUEST Request_ = NULL;
    ntStatus_ = WdfRequestCreate( WDF_NO_OBJECT_ATTRIBUTES, Target_, &Request_ );
    if( NT_SUCCESS( ntStatus_ ) )
    {
        PREQUEST_CONTEXT_DMA ReqContext_ = NULL;
        {
            WDF_OBJECT_ATTRIBUTES RequestAttributes_;
            WDF_OBJECT_ATTRIBUTES_INIT_CONTEXT_TYPE( &RequestAttributes_, REQUEST_CONTEXT_DMA );
            RequestAttributes_.EvtCleanupCallback = HdRequestCleanup;
            ntStatus_ = WdfObjectAllocateContext( Request_, &RequestAttributes_, (PVOID *) &ReqContext_ );
        }
        if( NT_SUCCESS( ntStatus_ ) )
        {
            ReqContext_->AllocRequest = true;
            ReqContext_->Channel = (UCHAR) pHdioDma_->Q[ Quad_ ].Channel;
            ....
            WDFDMATRANSACTION DmaTransaction_ = HdAcquireDmaTransaction( devContext, ReqContext_->Channel );
            if( DmaTransaction_ != NULL )
            {
                ReqContext_->DmaTransaction = DmaTransaction_;
                PMDL mdl;
                ntStatus_ = WdfRequestRetrieveOutputWdmMdl( Request, &mdl );
                if( NT_SUCCESS( ntStatus_ ) )
                {
                    PVOID virtualAddress = MmGetMdlVirtualAddress( mdl );
                    ULONG length = MmGetMdlByteCount( mdl );
                    ntStatus_ = WdfDmaTransactionInitialize( ReqContext_->DmaTransaction, HdEvtProgramDma_QuadFrame, WdfDmaDirectionWriteToDevice, mdl, virtualAddress, length );
                    if( NT_SUCCESS( ntStatus_ ) )
                    {
                        ntStatus_ = WdfDmaTransactionExecute( ReqContext_->DmaTransaction, (PVOID) ReqContext_ );
                        if( NT_SUCCESS( ntStatus_ ) )
                        {
                            KIRQL oldIrql_;
                            KeAcquireSpinLock( &devContext->DpcSpinLock, &oldIrql_ );
                            HdFrameBuf_OnAdd( devContext, Request_, ReqContext_ );
                            KeReleaseSpinLock( &devContext->DpcSpinLock, oldIrql_ );
                        }
                    }
                }
            }
        }
    }
}
Or you can create the 4 requests from an IRP. That does not change the behavior of the memory leak at all.
PIRP irp = IoAllocateIrp( IoGetRemainingStackSize()>>3, FALSE );
ntStatus_ = WdfRequestCreateFromIrp( WDF_NO_OBJECT_ATTRIBUTES, irp, TRUE, &Request_);
This is how the original request is terminated.
The original request is normally terminated before the quad-request DMA transfers occur.
ULONG CompletionInformation_ = 0;
WdfRequestCompleteWithInformation( Request, ntStatus_, CompletionInformation_ );
When running the driver normally, it performs excellently.
The driver does shut down when the system is turned off.
But if you try to replace the driver, release hardware is called and the driver hangs with the buffers in flight message.
If you execute !wdfkd.wdfqueue 0xxxxx in the debugger, you get a huge number of buffers with the message "Request is marked cancelled, EvtIoStop may not have been called for this request".
kd> !wdfkd.wdfqueue 0x00002AF937218588
Treating handle as a KMDF handle!
Dumping WDFQUEUE 0x00002af937218588
=========================
Manual, Power-managed, PowerPurgeDriverNotified, Shut down, Cannot accept, Can dispatch, ExecutionLevelDispatch, SynchronizationScopeNone
Number of driver owned requests: 8212
Power transition in progress
Number of waiting requests: 0
Abort: list count of 200 entries exceeded, could be a corrupted list
Number of requests notified about power change: 201
!wdfrequest 0x00002af92f7fb648 !irp 0xffffd506bd90b550
(Request is marked cancelled, EvtIoStop may not have been called for this request)
...
Abort: list count of 20 entries exceeded
Use 0x10 flag to view unlimited number of requests
EvtIoStop: (0xfffff8048cb893d0) NewTekHD
0: kd> g
Thread 0xFFFFD506BD5A3080 is waiting for all inflight requests to be acknowledged on WDFQUEUE 0x00002AF937218588