Call to DeviceIoControl() doesn't always return

I’m brand new to Windows and Windows Driver development so please forgive any oversights…

I’m working with a few other developers on an XP embedded design where developing a driver and application. We’ve been occasionally seeing our DeviceIoControl() calls to our driver never return, although all our debugging seems to show the driver doing everything it needs to complete the call. I’ve accidentally stumbled upon reproduceable sequence that causes the DeviceIoControl() call to always hang.

I’m including the WDF trace with my explaination of events as I can understand them…

My application thread sends a request to our driver via DeviceIoControl().
During this request, the driver sends messages to custom hardware that will eventually create an interrupt, causing the driver to send a response buffer up to the application message thread.

Here is the driver request fromt he application thread…
00002818 io 860 872 0 1458 07\31\2007-10:22:03:432 WDFDEVICE 0x7B2B23C8 !devobj 0x84DB1630 0x0000000e(IRP_MJ_DEVICE_CONTROL), IRP_MN 0, IRP 0x85FB0F48
00002819 core 860 872 0 1459 07\31\2007-10:22:03:432 Allocating FxRequest* 84ACBE78, WDFREQUEST 7B534180
00002820 io 860 872 0 1460 07\31\2007-10:22:03:432 Queuing WDFREQUEST 0x7B534180 on WDFQUEUE 0x7B2BC8D8
00002821 io 860 872 0 1461 07\31\2007-10:22:03:432 Thread 84C0C980 is processing WDFQUEUE 0x7B2BC8D8
00002822 io 860 872 0 1462 07\31\2007-10:22:03:432 Calling driver EvtIoDeviceControl for WDFREQUEST 0x7B534180
00002823 core 860 872 0 1463 07\31\2007-10:22:03:443 Enter: WDFREQUEST 0x7B534180
00002824 core 860 872 0 1464 07\31\2007-10:22:03:464 Enter: WDFREQUEST 0x7B534180, Information 0x00000000
00002825 core 860 872 0 1465 07\31\2007-10:22:03:464 Completing WDFREQUEST 0x7B534180, STATUS_SUCCESS
00002826 core 860 872 0 1466 07\31\2007-10:22:03:464 Completing WDFREQUEST 0x7B534180 for IRP 0x85FB0F48 with Information 0x0, STATUS_SUCCESS
00002827 io 860 872 0 1467 07\31\2007-10:22:03:464 Enter: WDFQUEUE 0x7B2BC8D8, WDFREQUEST 0x7B534180

Now the custom hardware interrupts the driver, which creates a DPC to handle it. The DPC sends a response buffer up to the application’s message thread, and receives a replacement response buffer.
00002828 core 860 872 0 1468 07\31\2007-10:22:03:474 Enter: WDFREQUEST 0x7B429E38
00002829 core 860 872 0 1469 07\31\2007-10:22:03:505 Completing WDFREQUEST 0x7B429E38, STATUS_SUCCESS
00002830 core 860 872 0 1470 07\31\2007-10:22:03:505 Completing WDFREQUEST 0x7B429E38 for IRP 0x85538F48 with Information 0x1, STATUS_SUCCESS
00002831 io 860 872 0 1471 07\31\2007-10:22:03:505 Enter: WDFQUEUE 0x7B25F448, WDFREQUEST 0x7B429E38
00002832 io 860 872 0 1472 07\31\2007-10:22:03:505 Presentation lock for WDFQUEUE 0x7B25F448 is already held, deferring to dpc or workitem
00002833 core 860 872 0 1473 07\31\2007-10:22:03:505 Free FxRequest* 84BD61C0 memory
00002834 io 860 872 0 1474 07\31\2007-10:22:03:505 Dispatching requests from DPC
00002835 io 860 872 0 1475 07\31\2007-10:22:03:505 Thread 84C0C980 is processing WDFQUEUE 0x7B25F448
00002836 io 860 872 0 1476 07\31\2007-10:22:03:505 Driver has configured WDFQUEUE 0x7B25F448 for WdfIoQueueDispatchManual
00002837 io 860 1480 0 1477 07\31\2007-10:22:03:505 WDFDEVICE 0x7B2B23C8 !devobj 0x84DB1630 0x0000000e(IRP_MJ_DEVICE_CONTROL), IRP_MN 0, IRP 0x856AAF48
00002838 core 860 1480 0 1478 07\31\2007-10:22:03:505 Allocating FxRequest* 84BD61C0, WDFREQUEST 7B429E38
00002839 io 860 1480 0 1479 07\31\2007-10:22:03:505 Queuing WDFREQUEST 0x7B429E38 on WDFQUEUE 0x7B2BC8D8
00002840 io 860 1480 0 1480 07\31\2007-10:22:03:505 Thread 845A7A90 is processing WDFQUEUE 0x7B2BC8D8
00002841 io 860 1480 0 1481 07\31\2007-10:22:03:505 Calling driver EvtIoDeviceControl for WDFREQUEST 0x7B429E38
00002842 core 860 1480 0 1482 07\31\2007-10:22:03:515 Enter: WDFREQUEST 0x7B429E38
00002843 io 860 1480 0 1483 07\31\2007-10:22:03:515 Queuing WDFREQUEST 0x7B429E38 on WDFQUEUE 0x7B25F448
00002844 io 860 1480 0 1484 07\31\2007-10:22:03:515 Presentation lock for WDFQUEUE 0x7B25F448 is already held, deferring to dpc or workitem
00002845 io 860 1480 0 1485 07\31\2007-10:22:03:515 Dispatching requests from DPC
00002846 io 860 1480 0 1486 07\31\2007-10:22:03:515 Thread 845A7A90 is processing WDFQUEUE 0x7B25F448
00002847 io 860 1480 0 1487 07\31\2007-10:22:03:515 Driver has configured WDFQUEUE 0x7B25F448 for WdfIoQueueDispatchManual
00002848 io 860 1480 0 1488 07\31\2007-10:22:03:515 WDFREQUEST 0x7B429E38 dispatched to driver
00002849 io 860 1480 0 1489 07\31\2007-10:22:03:515 No requests on WDFQUEUE 0x7B2BC8D8

Now there seems to be some cleanup from my original DeviceIoControl() call.
00002850 io 860 872 0 1490 07\31\2007-10:22:03:515 WDFREQUEST 0x7B534180 dispatched to driver
00002851 core 860 872 0 1491 07\31\2007-10:22:03:515 Free FxRequest* 84ACBE78 memory
00002852 io 860 872 0 1492 07\31\2007-10:22:03:515 No requests on WDFQUEUE 0x7B2BC8D8

After this, my application thread is stopped, waiting for DeviceIoControl() to return, although the driver and message threads are still active.

If I alter the system hardware to not respond with the interrupt, the DeviceIoControl() completes.

Obviously the interrupt and DPC play some part in the problem, but I’m not sure I what is actually going wrong.

TIA
BK

xxxxx@teleco.com wrote:

I’m brand new to Windows and Windows Driver development so please forgive any oversights…

I’m working with a few other developers on an XP embedded design where developing a driver and application. We’ve been occasionally seeing our DeviceIoControl() calls to our driver never return, although all our debugging seems to show the driver doing everything it needs to complete the call. I’ve accidentally stumbled upon reproduceable sequence that causes the DeviceIoControl() call to always hang.

I’m including the WDF trace with my explaination of events as I can understand them…

Is this a multiprocessor machine? The lines in your trace don’t seems
to be in the right order.

So, you get a WDFREQUEST in your EvtDeviceIoControl handler. You put
that request on a manual queue, fire off some hardware magic, but you do
NOT call WdfRequestComplete. Right?

Then, in your DPC, you pull the top request from the manual queue, fill
in some data, and finally call WdfRequestComplete. The app then starts
the whole process over again. Is that right?

That should work. It’s a classic inverted call scheme. But that
doesn’t seem to be what the trace describes. Can you post pieces of the
code?


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

I agree some better insight (such as source) is needed, but:

The trace order is all right (admittedly not easy to interpret)- but it doesn’t quite match an inverted call.

Some call compeltes in the dispatch routine, a previous inverted call completes (presumably from DPC if OP is correct), and a third (the “replacement buffer”) is received, forwarded to the manual queue, and that’s where things end.

A high level view of the trace is:

1st request (78534180)comes in
1st request dispatched to driver
1st request completed in EvtDeviceIoControl Information 0)- it is NOT forwarded to the manual queue, but completed in the dispatch routine.
(presumably our DPC occurs here, but why?)
**2nd request (7B329E38) (origin not clear from this trace, but it is completed after being presented from a manual queue which fits the description) is completed- Information 1
**2nd request is cleaned up
****3rd request received (presumably because 2nd is completed)- it happens to get the same WDFEQUEST handle as the second, because we use lookasides
****3rd request dispatched to EvtDeviceIoControl handler
****3nd request placed in manual queue
1st request cleaned up

No * = one call to EvtDeviceIoControl
Two * = DPC that completed some previous request
Four * = Second call to EvtDeviceIoControl

Note that this is a parallel execution of the DeviceIoControl handler (1st and 3rd are in it at “the same time”). So I’m wondering if there is any non-local state used in that routine? Such as one governing whether you have sent a message to the underlying device? Or perhaps there is some assumption that each WDFREQUEST will have a unique handle, and since this is the same as the one just completed, the synchronization “scheme” broke?

Anyway, according to the outlined scheme- that 3rd request should have caused something to happen that would eventually result in an interrupt from the device, and it appears that isn’t happening…

Or, since the 1st request was never queued, some error condition occurred of which OP is not aware (since the request completed STATUS_SUCCESS)? And we are seeing a “Dropped” case because of it?

A framework trace can only tell you so much- in this case it leaves lots of questions, but no clear answers [to me, anyway].

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Tim Roberts
Sent: Tuesday, July 31, 2007 3:09 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] Call to DeviceIoControl() doesn’t always return

xxxxx@teleco.com wrote:

I’m brand new to Windows and Windows Driver development so please forgive any oversights…

I’m working with a few other developers on an XP embedded design where developing a driver and application. We’ve been occasionally seeing our DeviceIoControl() calls to our driver never return, although all our debugging seems to show the driver doing everything it needs to complete the call. I’ve accidentally stumbled upon reproduceable sequence that causes the DeviceIoControl() call to always hang.

I’m including the WDF trace with my explaination of events as I can understand them…

Is this a multiprocessor machine? The lines in your trace don’t seems
to be in the right order.

So, you get a WDFREQUEST in your EvtDeviceIoControl handler. You put
that request on a manual queue, fire off some hardware magic, but you do
NOT call WdfRequestComplete. Right?

Then, in your DPC, you pull the top request from the manual queue, fill
in some data, and finally call WdfRequestComplete. The app then starts
the whole process over again. Is that right?

That should work. It’s a classic inverted call scheme. But that
doesn’t seem to be what the trace describes. Can you post pieces of the
code?


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

I started this reply before the last post, so some of this may be redundant.

We’re not running a multiprocessor system. We’re using an AMD LX800, so it doesn’t even have hyperthreading.

For the WDFREQUEST in question, that’s not quite the way it works. The handler gets the request from the command thread, fires off some magic hardware, and then completes the request.

VOID
DioEvtDeviceControl(WDFQUEUE Queue,
WDFREQUEST Request,
size_t OutputBufferLength,
size_t InputBufferLength,
ULONG IoControlCode)
{

case IOCTL_APOLLO_DSP_COMMAND:
APOLLO_DSP_COMMAND *command;

status = WdfRequestRetrieveInputBuffer(Request,
sizeof(APOLLO_DSP_COMMAND), (PVOID*)&command, &IOBufferLength);

if (NT_SUCCESS(status))
{
status = dsp_command(command->command_id,
command->param1,
command->param2,
command->dsp_command_data);

if (status == DSP_ERR_OK)
{
status = STATUS_SUCCESS;
}
else
{
tracef( -1, “Apollo DSP command error %d\n”, status);
status = STATUS_UNSUCCESSFUL;
}
}
else
{
tracef( -1, “Apollo Could not get DSP_COMMAND input buffer %x\n”,
status);
}
break;

//
// Set the information field to the number of bytes read/written
//
WdfRequestSetInformation(Request, len);

//
// Complete the request with “status”
//
WdfRequestComplete(Request, status);
}

Some time later, the magic hardware will respond with an interrupt. The DPC will take the top request off of the manual queue (which was sent down at application start), fill in some data, and call WdfrequestComplete for that request. The message thread will then send another request down to the manual queue. Here’s the response that’s called by the DPC

VOID AddToHostMessageQ( WDFDPC Dpc,
APOLLO_DRIVER_EVENT *data,
ULONG len)
{

//
// Get a pointer to our device extension
//
devContext = DioGetContextFromDevice(WdfDpcGetParentObject(Dpc));

//
// Might as well count 'em – Gives us something to do.
//
InterlockedIncrement(&devContext->InterruptsServed);

// get forwarded request
status = WdfIoQueueRetrieveNextRequest( devContext->AppQueue,
&req );

if(NT_SUCCESS(status))
{
#if DBG
//tracef( -1, “Got forwarded request\n”);
#endif
status = WdfRequestRetrieveOutputBuffer(req, 0, (PVOID*)&dataBuffer,
&length);

//tracef( -1, “Request length %d\n”, length);

if(NT_SUCCESS(status) && length > 0)
{
event = (APOLLO_DRIVER_EVENT *)dataBuffer;

tracef( -1, “DSP buffer number %d\n”, (ULONG)(event->type));

RtlCopyMemory(event, data, len);

tracef( -1, “Sending Event %02x\n”, (ULONG)event->type);

//
// Complete the request with success, pass back 1 in the information
// field.
//
WdfRequestCompleteWithInformation(req, STATUS_SUCCESS, 1);

This seems to work in general, but the specific hardware command that’s being sent causes the hardware to respond with an interrupt very quickly. From the trace, it would appear that the DPC is occuring before the the initial request IRP was released. The end result is that the command thread is stuck waiting for the request to complete.

While I was trying to debug this, I learned a few things. I was trying to see what the IRP list looked like for the command thread, since that thread was stuck, and noticed that there were a lot of pending IRP’s. I realized that this was because the command thread send the requests down to the driver for the manual queue before is starts the other threads.

The debugger won’t show more than 17 IRP’s, so I moved the requests to the message thread. After I did this, the command thread wouldn’t hang anymore.

I tried another test with the original application where the request for the manual queue were sent from the command thread as before. This time, instead of sending the request that hangs immediately, I had the hardware create driver events that would empty the manual queue of all request from the command thread. Since the message queue refill’s the manual queue as it recieves the completions, the manual queue is eventually filled with request from the message thread.

I tried the supect request again, and it didn’t hang. I verified that the DPC was interrupting the cleanup as before. My undestanding was that it didn’t matter what thread sent the original request. The message thread will recieve the completion for them all.

Again, TIA
BK