Hung system

I’ve got a KMDF (1.7) driver which is getting hung up. I’m running on
XP SP3. I’ve got 5 instances of a test app running; the first four are
all going the same thing: issuing an IOCTL which tells the driver to
wait for a specific (hardware) event to occur. The requests get moved
to a manual queue, and (in this case) all should get completed when the
event occurs.

The 5th instance of the test app is actually causing the event to occur.
When that happens, the machine hangs; no mouse movement, no keyboard,
nothing. I can break into the debugger; the output of !running is at
the end of this message.

Processor 0 is trying to enqueue the next wait request (when a wait is
satisfied, the app just turns around and issues another one). All
requests get preprocessed in an EvtIoInCallerContext callback; it’s hung
up trying to enqueue the request.

Processor 1 is in a DPC (fired from the ISR, which was triggered by app
#5), going through the list of queued wait requests, trying to complete
each one.

I’ve got both verifier and wdfverifier turned on, and neither one has
complained. It appears I’ve got a deadlock, but I don’t really see why
these two things are stepping on each other. Can someone please fill in
the missing gaps in my understanding?

TIA,

–mkj

0: kd> !running -it

System Processors 3 (affinity mask)
Idle Processors 0

Prcbs Current Next
0 ffdff120 8623a540 …

ChildEBP RetAddr
ed9999b4 80545191 nt!RtlpBreakWithStatusInstruction
ed9999b4 806e785e nt!KeUpdateSystemTime+0x175
ed999a38 ba3a0356 hal!KfAcquireSpinLock+0x2e
ed999a60 ba3a3e8f wdf01000!FxVerifierLock::Lock+0x113
ed999a78 ba3ae502 wdf01000!FxCallbackSpinLock::Lock+0x75
ed999a8c ba3af924 wdf01000!FxIoQueueIoDeviceControl::Invoke+0x1e
ed999abc ba3b1fb8 wdf01000!FxIoQueue::DispatchRequestToDriver+0x445
ed999adc ba3b3929 wdf01000!FxIoQueue::DispatchEvents+0x4e9
ed999af8 ba3b4b4b wdf01000!FxIoQueue::QueueRequestFromForward+0x1e3
ed999b1c ba396972 wdf01000!FxPkgIo::EnqueueRequest+0x21d
ed999b30 ba3ef50a wdf01000!imp_WdfDeviceEnqueueRequest+0x49
ed999b44 ba3ef350 vmebus5!WdfDeviceEnqueueRequest+0x1a
[c:\ddk\6001.18002\inc\wdf\kmdf\1.7\wdfdevice.h @ 3181]
ed999ba0 ba3b472f vmebus5!EvtIoInCallerContext+0x550
[c:\clients\xembedded\src\trunk\vmebus5\deviceio.cpp @ 384]
ed999bc8 ba3a3665 wdf01000!FxPkgIo::Dispatch+0x249
ed999bd4 804ef19f wdf01000!FxDevice::Dispatch+0x7f
ed999be4 80659128 nt!IopfCallDriver+0x31
ed999c08 80663b28 nt!IovCallDriver+0xa0
ed999c1c 804ef19f nt!ViDriverDispatchGeneric+0x2a
ed999c2c 80659128 nt!IopfCallDriver+0x31
ed999c50 8057f98e nt!IovCallDriver+0xa0

1 f7717120 8627b020 86a10818 …

ChildEBP RetAddr
f78b6e38 ba3a07fb hal!KfReleaseSpinLock+0x1a
f78b6e60 ba3af27b wdf01000!FxVerifierLock::Unlock+0x403
f78b6e84 ba3ae407 wdf01000!FxIoQueue::PeekRequest+0x145
f78b6eb4 ba3f7916 wdf01000!imp_WdfIoQueueFindRequest+0xe8
f78b6ed4 ba3f769d vmebus5!WdfIoQueueFindRequest+0x26
[c:\ddk\6001.18002\inc\wdf\kmdf\1.7\wdfio.h @ 653]
f78b6f18 ba3f7461 vmebus5!vmeintWaitDequeueRequest+0x4d
[c:\clients\xembedded\src\trunk\vmebus5\vmeint.cpp @ 1538]
f78b6fa4 ba3aaf60 vmebus5!vmeintWaitComplete+0x3b1
[c:\clients\xembedded\src\trunk\vmebus5\vmeint.cpp @ 1436]
f78b6fb8 ba3aaf9e wdf01000!FxDpc::DpcHandler+0x36
f78b6fcc 80545eef wdf01000!FxDpc::FxDpcThunk+0x16
f78b6ff4 80545a5b nt!KiRetireDpcList+0x61
f78b6ff8 ed730c9c nt!KiDispatchInterrupt+0x2b
WARNING: Frame IP not in any known module. Following frames may be wrong.
80545a5b 00000000 0xed730c9c


//
// Michael K. Jones
// Stone Hill Consulting, LLC
// http://www.stonehill.com
//_______________________________________________

Any chance you have an infinite loop running on processor 1? The fact that
you’re in Unlock on that processor makes me think that maybe you’re looping
over and over and we’re just looking at some random state (typically tough
to get stuck in an Unlock routine!).

I’d put a breakpoint in your driver where WdfIoQueueFindRequest returns in
the DPC, see if it returns, and then step through the code if it does. If it
doesn’t return then we have another issue.

-scott


Scott Noone
Consulting Associate and Chief System Problem Analyst
OSR Open Systems Resources, Inc.
http://www.osronline.com

“Michael Jones” wrote in message news:xxxxx@ntdev…

I’ve got a KMDF (1.7) driver which is getting hung up. I’m running on
XP SP3. I’ve got 5 instances of a test app running; the first four are
all going the same thing: issuing an IOCTL which tells the driver to
wait for a specific (hardware) event to occur. The requests get moved
to a manual queue, and (in this case) all should get completed when the
event occurs.

The 5th instance of the test app is actually causing the event to occur.
When that happens, the machine hangs; no mouse movement, no keyboard,
nothing. I can break into the debugger; the output of !running is at
the end of this message.

Processor 0 is trying to enqueue the next wait request (when a wait is
satisfied, the app just turns around and issues another one). All
requests get preprocessed in an EvtIoInCallerContext callback; it’s hung
up trying to enqueue the request.

Processor 1 is in a DPC (fired from the ISR, which was triggered by app
#5), going through the list of queued wait requests, trying to complete
each one.

I’ve got both verifier and wdfverifier turned on, and neither one has
complained. It appears I’ve got a deadlock, but I don’t really see why
these two things are stepping on each other. Can someone please fill in
the missing gaps in my understanding?

TIA,

–mkj

0: kd> !running -it

System Processors 3 (affinity mask)
Idle Processors 0

Prcbs Current Next
0 ffdff120 8623a540 …

ChildEBP RetAddr
ed9999b4 80545191 nt!RtlpBreakWithStatusInstruction
ed9999b4 806e785e nt!KeUpdateSystemTime+0x175
ed999a38 ba3a0356 hal!KfAcquireSpinLock+0x2e
ed999a60 ba3a3e8f wdf01000!FxVerifierLock::Lock+0x113
ed999a78 ba3ae502 wdf01000!FxCallbackSpinLock::Lock+0x75
ed999a8c ba3af924 wdf01000!FxIoQueueIoDeviceControl::Invoke+0x1e
ed999abc ba3b1fb8 wdf01000!FxIoQueue::DispatchRequestToDriver+0x445
ed999adc ba3b3929 wdf01000!FxIoQueue::DispatchEvents+0x4e9
ed999af8 ba3b4b4b wdf01000!FxIoQueue::QueueRequestFromForward+0x1e3
ed999b1c ba396972 wdf01000!FxPkgIo::EnqueueRequest+0x21d
ed999b30 ba3ef50a wdf01000!imp_WdfDeviceEnqueueRequest+0x49
ed999b44 ba3ef350 vmebus5!WdfDeviceEnqueueRequest+0x1a
[c:\ddk\6001.18002\inc\wdf\kmdf\1.7\wdfdevice.h @ 3181]
ed999ba0 ba3b472f vmebus5!EvtIoInCallerContext+0x550
[c:\clients\xembedded\src\trunk\vmebus5\deviceio.cpp @ 384]
ed999bc8 ba3a3665 wdf01000!FxPkgIo::Dispatch+0x249
ed999bd4 804ef19f wdf01000!FxDevice::Dispatch+0x7f
ed999be4 80659128 nt!IopfCallDriver+0x31
ed999c08 80663b28 nt!IovCallDriver+0xa0
ed999c1c 804ef19f nt!ViDriverDispatchGeneric+0x2a
ed999c2c 80659128 nt!IopfCallDriver+0x31
ed999c50 8057f98e nt!IovCallDriver+0xa0

1 f7717120 8627b020 86a10818 …

ChildEBP RetAddr
f78b6e38 ba3a07fb hal!KfReleaseSpinLock+0x1a
f78b6e60 ba3af27b wdf01000!FxVerifierLock::Unlock+0x403
f78b6e84 ba3ae407 wdf01000!FxIoQueue::PeekRequest+0x145
f78b6eb4 ba3f7916 wdf01000!imp_WdfIoQueueFindRequest+0xe8
f78b6ed4 ba3f769d vmebus5!WdfIoQueueFindRequest+0x26
[c:\ddk\6001.18002\inc\wdf\kmdf\1.7\wdfio.h @ 653]
f78b6f18 ba3f7461 vmebus5!vmeintWaitDequeueRequest+0x4d
[c:\clients\xembedded\src\trunk\vmebus5\vmeint.cpp @ 1538]
f78b6fa4 ba3aaf60 vmebus5!vmeintWaitComplete+0x3b1
[c:\clients\xembedded\src\trunk\vmebus5\vmeint.cpp @ 1436]
f78b6fb8 ba3aaf9e wdf01000!FxDpc::DpcHandler+0x36
f78b6fcc 80545eef wdf01000!FxDpc::FxDpcThunk+0x16
f78b6ff4 80545a5b nt!KiRetireDpcList+0x61
f78b6ff8 ed730c9c nt!KiDispatchInterrupt+0x2b
WARNING: Frame IP not in any known module. Following frames may be wrong.
80545a5b 00000000 0xed730c9c


//
// Michael K. Jones
// Stone Hill Consulting, LLC
// http://www.stonehill.com
//_______________________________________________

I feel sheepish; I should have thought to check that. I guess I haven’t
quite earned all of those grey hairs in my beard :slight_smile:

When I reproduced this three times in a row with essentially the same
stack, I leapt to the conclusion that there must be some KMDF locking
thing that I’m not doing (or doing wrong).

I will check this and report back; thanks for the pointer!

Cheers,

–mkj

On 8/29/2011 10:25 AM, Scott Noone wrote:

Any chance you have an infinite loop running on processor 1? The fact
that you’re in Unlock on that processor makes me think that maybe you’re
looping over and over and we’re just looking at some random state
(typically tough to get stuck in an Unlock routine!).

I’d put a breakpoint in your driver where WdfIoQueueFindRequest returns
in the DPC, see if it returns, and then step through the code if it
does. If it doesn’t return then we have another issue.

-scott


//
// Michael K. Jones
// Stone Hill Consulting, LLC
// http://www.stonehill.com
//_______________________________________________

Yep, that was it–I had bollocksed up the loop which searched for and
removed the request from my manual queue. End result was an infinite
loop under some circumstances.

I wish I could install the SNoone-verifier on all my systems :slight_smile:

Thanks a bunch!

–mkj

On 8/29/2011 10:25 AM, Scott Noone wrote:

Any chance you have an infinite loop running on processor 1? The fact
that you’re in Unlock on that processor makes me think that maybe you’re
looping over and over and we’re just looking at some random state
(typically tough to get stuck in an Unlock routine!).

I’d put a breakpoint in your driver where WdfIoQueueFindRequest returns
in the DPC, see if it returns, and then step through the code if it
does. If it doesn’t return then we have another issue.

-scott


//
// Michael K. Jones
// Stone Hill Consulting, LLC
// http://www.stonehill.com
//_______________________________________________

Nice! Glad to hear that was it.

-scott


Scott Noone
Consulting Associate and Chief System Problem Analyst
OSR Open Systems Resources, Inc.
http://www.osronline.com

“Michael Jones” wrote in message news:xxxxx@ntdev…

Yep, that was it–I had bollocksed up the loop which searched for and
removed the request from my manual queue. End result was an infinite
loop under some circumstances.

I wish I could install the SNoone-verifier on all my systems :slight_smile:

Thanks a bunch!

–mkj

On 8/29/2011 10:25 AM, Scott Noone wrote:

Any chance you have an infinite loop running on processor 1? The fact
that you’re in Unlock on that processor makes me think that maybe you’re
looping over and over and we’re just looking at some random state
(typically tough to get stuck in an Unlock routine!).

I’d put a breakpoint in your driver where WdfIoQueueFindRequest returns
in the DPC, see if it returns, and then step through the code if it
does. If it doesn’t return then we have another issue.

-scott


//
// Michael K. Jones
// Stone Hill Consulting, LLC
// http://www.stonehill.com
//_______________________________________________