System freeze when aborting multiple asynchronous transfers

Francois_Dimitri · September 17, 2015, 12:19pm

I’m encountering a system freeze in my WDF USB driver that seems to happen as a result of aborting multiple asynchronous requests. Can anyone suggest possible causes? It looks like a race condition, but it’s not clear to me why. The freeze does not happen immediately after an abort, but only when I send the next batch of asynchronous requests. It is most likely an effect of the abort, though: if I don’t abort (and just wait for all outstanding requests to complete) before retransmitting 16 asynchronous requests, there is no issue.

Test application
My test application sends 16 asynchronous write transfers to the driver using DeviceIoControl(WRITE) with an OVERLAPPED parameter. When one completes, it sends another one, so there are always 16 pending requests. When the user clicks the STOP button, it calls DeviceIoControl(ABORT), which cancels all 16 requests.

Driver (IOCTL WRITE)
The DeviceIoControl WRITE handler simply forwards the WDF request to a power-managed WDF I/O queue with synchronous dispatching, whose request handler is a function called WriteHandler.

Driver (WriteHandler)
WriteHandler creates 2 WDF requests for each request it receives from the WDF I/O queue. The 2 WDF requests send the data in stages. Before sending them down the stack, it inserts both WDF requests into a WDF collection for cancellation purposes. Once the 2 requests complete, it removes them from the collection, deletes them with WdfObjectDelete, and then completes the original request received from the queue.

Driver (IOCTL ABORT)
The DeviceIoControl ABORT handler does 3 things. First, it purges the queued requests in the WDF I/O queue asynchronously using WdfIoQueuePurge (whose purge-complete callback calls WdfIoQueueStart). Second, it cancels the WDF requests in the WDF collection (2 per write request). Third, it sends a stop command to the device through another pipe.

By checking the WDF I/O queue state, I verified that all requests are cancelled after the abort. But when I restart streaming the 16 asynchronous requests, the system freezes. The freeze always happens after all 16 requests have been queued immediately following a previous abort.

Francois_Dimitri · September 17, 2015, 12:34pm

Below is my handling of the WDF collection for the 2 sub-requests created for each request dispatched from the WDF I/O queue. Is there a race condition in how I handle the collection?

Inserting (called by WriteHandler)
WdfSpinLockAcquire
WdfObjectReference
WdfCollectionAdd
WdfSpinLockRelease

Removing (called by the WriteHandler completion routine)
WdfSpinLockAcquire
Find the request in the collection
WdfCollectionRemove
WdfObjectDereference
WdfSpinLockRelease

Canceling (called by the ABORT IOCTL)
WdfSpinLockAcquire
For each request in the collection
  WdfObjectReference
  WdfRequestCancelSentRequest
  WdfObjectDereference
WdfSpinLockRelease
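The canceling loop above calls WdfRequestCancelSentRequest while still holding the spinlock. If a lower driver completes a canceled request synchronously, the completion routine can run on the same thread before the cancel call returns, and its attempt to acquire the same spinlock would spin forever. That is one possible interlock, not a confirmed diagnosis (it would tend to hang at abort time rather than on the next batch). A defensive pattern is to take a reference on each request under the lock, release the lock, and only then cancel. The sketch below is a user-mode C11 model of that pattern, not KMDF code: `Request`, `track`, `on_complete`, `cancel_sent_request` and `cancel_all` are hypothetical stand-ins for WDFREQUEST, WdfCollectionAdd, the completion routine, WdfRequestCancelSentRequest and the abort path, and the model assumes the worst case where cancellation completes the request inline.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* User-mode model only: hypothetical stand-ins for WDF objects. */
#define MAX_REQS 32

typedef struct Request {
    int refs;        /* models WdfObjectReference/WdfObjectDereference */
    bool completed;  /* set by the completion routine */
} Request;

static atomic_flag g_lock = ATOMIC_FLAG_INIT; /* models the WDFSPINLOCK */
static Request *g_coll[MAX_REQS];             /* models the WDFCOLLECTION */
static int g_count;

/* Non-reentrant: spins forever if this thread already holds the lock. */
static void lock_acquire(void) {
    while (atomic_flag_test_and_set(&g_lock)) { }
}
static void lock_release(void) { atomic_flag_clear(&g_lock); }

/* Inserting: reference the request and add it to the collection. */
static void track(Request *r) {
    lock_acquire();
    assert(g_count < MAX_REQS);
    r->refs++;
    g_coll[g_count++] = r;
    lock_release();
}

/* Completion routine: remove from the collection under the lock. */
static void on_complete(Request *r) {
    lock_acquire();
    for (int i = 0; i < g_count; i++) {
        if (g_coll[i] == r) {
            g_coll[i] = g_coll[--g_count]; /* swap-remove */
            r->refs--;
            break;
        }
    }
    lock_release();
    r->completed = true;
}

/* Models a lower driver that completes the request inline on cancel. */
static void cancel_sent_request(Request *r) {
    if (!r->completed)
        on_complete(r); /* would deadlock if the caller held g_lock */
}

/* Canceling: snapshot and reference under the lock, cancel without it. */
static void cancel_all(void) {
    Request *snap[MAX_REQS];
    int n;

    lock_acquire();
    n = g_count;
    for (int i = 0; i < n; i++) {
        snap[i] = g_coll[i];
        snap[i]->refs++; /* keeps the request alive after the lock drops */
    }
    lock_release();

    for (int i = 0; i < n; i++) {
        cancel_sent_request(snap[i]);
        assert(snap[i]->refs > 0);
        snap[i]->refs--; /* drop the reference taken for the snapshot */
    }
}
```

Calling `track()` for each request and then `cancel_all()` leaves every model request completed, with a reference count of zero and an empty collection, even though each cancellation runs the completion routine inline.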

Tim_Roberts · September 17, 2015, 12:55pm

xxxxx@yahoo.com wrote:

Below is my handling of WDF collection for the 2 subrequests created for each request dispatched from the WDF Io Queue. Is there a race condition with my handling of collections?

Do you have a completion routine for the requests you submitted?
Remember that you’ll get a completion callback after you cancel the
requests. Is it possible you have an interlock issue in the completion
routine?

Have you attached a debugger to see what state things are in? If you
see all of your processors waiting for your spinlock, that would be a
big clue.

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Francois_Dimitri · September 17, 2015, 1:16pm

Yes, the WriteHandler completion routine is called. When a request is cancelled, the completion routine reports a failure status. Since I have 2 WDF requests for each dispatched request, I only complete the main request once there is only 1 left in the collection (meaning 1 of the 2 requests has already been cancelled or completed).

I haven’t set up remote debugging yet; I will do that. But is my approach correct? Is there a more efficient way of handling this?

Tim_Roberts · September 17, 2015, 1:26pm

xxxxx@yahoo.com wrote:

Yes, the writehandler completion routine is called. When the request is cancelled, the completion routine indicates a failed status. Since i have 2 wdf request for each dispatched request, i only complete the main request once the there is only 1 left in the collection (meaning 1 of the 2 requests already cancelled or completed).

I havent setup remote debugging yet. Will do that. But is my approach correct? Is there a more efficient way of handling the same thing?

Efficiency is not really the driving force here. I can think of other
implementations, but I couldn’t argue that they were better. For
example, you could store the two created requests in the context
structure of the original request. That way, you’d eliminate the need
for the collection, and rely on the I/O queue. However, you’d still
have interlock issues.
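Storing the two created requests in the original request’s context can be paired with a per-parent count of outstanding sub-requests, so the parent is completed only when its last sub-request finishes. If the collection is shared by all 16 parents, its item count alone does not tell a completion routine whether that particular parent’s sibling has finished, which a per-parent counter avoids. The sketch below is a user-mode C11 model, not KMDF code: `ParentContext`, `parent_init` and `subrequest_done` are hypothetical names, statuses are plain ints (0 for success, nonzero for a failure such as cancellation), and the point where a driver would complete the parent is only marked with a flag. The two sub-request handles would still need to be reachable from the abort path for cancellation.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of the parent request's context (not WDF types). */
typedef struct ParentContext {
    atomic_int pending;     /* sub-requests still outstanding */
    atomic_int first_error; /* 0 = no failure yet, else first failing status */
    int completed;          /* models completing the parent request */
    int final_status;       /* status the parent would be completed with */
} ParentContext;

static void parent_init(ParentContext *p, int subrequests) {
    assert(subrequests > 0);
    atomic_init(&p->pending, subrequests);
    atomic_init(&p->first_error, 0);
    p->completed = 0;
    p->final_status = 0;
}

/* Called from each sub-request's completion routine, in any order. */
static void subrequest_done(ParentContext *p, int status) {
    if (status != 0) {
        int expected = 0;
        /* Record only the first failure, e.g. a cancellation. */
        atomic_compare_exchange_strong(&p->first_error, &expected, status);
    }
    /* atomic_fetch_sub returns the old value: 1 means this was the last. */
    if (atomic_fetch_sub(&p->pending, 1) == 1) {
        p->final_status = atomic_load(&p->first_error);
        p->completed = 1; /* a driver would complete the parent here */
    }
}
```

With `pending` initialized to 2, the first call leaves the parent outstanding and the second completes it, carrying the first failure status if either sub-request failed. Only the thread that drops the count to zero touches the parent, so no spinlock is needed for that decision.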
