Our client reported a lockup during shutdown. After analysing the traces
(I can’t imagine using a debugger here ;-)) the problem is obvious: our
driver waits for something which can’t happen. What isn’t obvious is who
causes it. I see the following events there:
1. our driver sends IDLE IRP
2. IDLE callback routine is called
3. IDLE callback routine sends D2 IRP and waits for completion
4. IDLE IRP completion routine is called with cancelled status
5. IRP_MJ_POWER handler of our driver is called with D2 IRP, passing it
   down
6. D2 IRP completion routine is called
7. IDLE callback wait is satisfied and callback finishes processing
Point 4 can occur anytime between points 2 and 6, while the IDLE
callback is running. These fast quad-core CPUs…
I’d say the OS USB driver, which completes the IDLE IRP while the
callback is running, breaks the contract. I haven’t found it stated
explicitly in the docs, but it is unexpected (which is why our driver
doesn’t handle it) and unreasonable. I’d like to know whether this
situation is handled in WDF. Doron?
Anyway, I don’t see how to handle this situation without breaking the
rules. IDLE completion should repower the device, but the device must
not be repowered while the callback is running (this is stated
explicitly). The driver can’t wait in the completion routine. Should it
queue a work item, wait until the callback finishes and then try to
repower the device? I’d bet it’d lead to another lockup, this time
within the OS driver. I’m almost sure because I have one dump where #4
occurred just after #7, a D0 IRP was sent and it was never completed.
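The bookkeeping behind the work-item idea above can be sketched as a
small single-threaded model. This is only an illustration, not driver
code: the type idle_model and all function names are hypothetical and no
kernel API is used. A real driver would have to guard the flags with a
spin lock or interlocked operations, because the callback and the
completion routine run on different cores. The sketch also says nothing
about whether the OS driver would then complete the D0 IRP.

```c
#include <stdbool.h>

/* Hypothetical user-mode model of the driver-side bookkeeping.
 * Single-threaded on purpose; a real driver needs a lock around this. */
typedef struct {
    bool in_idle_callback; /* true while the IDLE callback runs (points 2-7) */
    bool repower_pending;  /* IDLE IRP completed while the callback was running */
    int  d0_requests;      /* how many D0 requests the model would have issued */
} idle_model;

/* Point 2: the IDLE callback starts. */
void idle_callback_enter(idle_model *m)
{
    m->in_idle_callback = true;
}

/* Point 4: the IDLE IRP completion routine runs.
 * It must not wait, and it must not repower during the callback,
 * so in that case it only records that a repower is needed. */
void idle_irp_completed(idle_model *m)
{
    if (m->in_idle_callback) {
        m->repower_pending = true;  /* defer until the callback finishes */
    } else {
        m->d0_requests++;           /* callback not running: request D0 now */
    }
}

/* Point 7: the IDLE callback finishes.
 * A deferred repower is issued here; a real driver would probably
 * do this from a work item rather than inline. */
void idle_callback_leave(idle_model *m)
{
    m->in_idle_callback = false;
    if (m->repower_pending) {
        m->repower_pending = false;
        m->d0_requests++;
    }
}
```

In this model, a completion that arrives during the callback produces
exactly one D0 request, and only after the callback has finished.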
So I’d see it as an OS (XP SP3) bug. Do you agree?
BTW, the IDLE callback seems to be a horrible design failure, causing a
never-ending chain of problems.
Best regards,
Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]
Forgot to say: the IDLE IRP isn’t cancelled by our driver. It is
probably the OS driver that does it because of shutdown.
Best regards,
Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]
-----Original Message-----
From: Michal Vodicka
Sent: Tuesday, January 19, 2010 12:15 AM
To: ‘Windows System Software Devs Interest List’
Subject: USB selective suspend problem during shutdown
The solution may be simple, Michal: disable selective suspend.
That’s in fact what some of our customers have been doing when faced with similar issues. Personally, I wake up suspended devices upon the PowerSystemShutdown IRP_MN_QUERY_POWER on XP to avoid these races.
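As a rough illustration of the check described above, the decision can be modelled in plain C. The enum and struct names below are hypothetical stand-ins, not the WDK definitions; a real driver would read the minor function code and Parameters.Power from the current IRP stack location instead.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the values a driver sees in a power IRP. */
enum model_minor { MODEL_MN_SET_POWER, MODEL_MN_QUERY_POWER };
enum model_type  { MODEL_SYSTEM_POWER_STATE, MODEL_DEVICE_POWER_STATE };
enum model_sys   { MODEL_SYSTEM_WORKING, MODEL_SYSTEM_SLEEPING, MODEL_SYSTEM_SHUTDOWN };

typedef struct {
    enum model_minor minor;        /* query or set */
    enum model_type  type;         /* system or device power IRP */
    enum model_sys   system_state; /* meaningful only for system power IRPs */
} model_power_irp;

/* Returns true when a selectively suspended device should be woken:
 * the IRP is a system-power query whose target state is shutdown. */
bool should_wake_before_shutdown(const model_power_irp *irp, bool device_suspended)
{
    return device_suspended
        && irp->minor == MODEL_MN_QUERY_POWER
        && irp->type == MODEL_SYSTEM_POWER_STATE
        && irp->system_state == MODEL_SYSTEM_SHUTDOWN;
}
```

The model reacts only to the query IRP, so it does nothing on a system where shutdown arrives as a set-power IRP without a preceding query.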
Just out of curiosity: does it happen with Intel 5 series / 3400 series motherboards? These have exhibited a bunch of USB shutdown issues for us, both with inbox and our own drivers. USB IRP cancellation logic does not seem to work reliably at shutdown on these platforms.
Regards,
Ilya Faenson
Rockville, MD USA
Ilya,
sure, it is the ultimate solution. Unfortunately, it is unacceptable for
most of our customers. We already suggested it as one possibility (the
customer’s tech people found it themselves). The second is to open a
support case with MS and ask for a QFE. I hope they’ll do the second.
I wake devices in PowerStateCallback(…, PO_CB_SYSTEM_STATE_LOCK, …)
and until now it was usually sufficient (if I leave aside the occasional
hung D0 and D2 IRPs). In this case the problem occurs just before the
callback runs. I’m not sure why. IRP_MN_QUERY_POWER isn’t even sent for
shutdown there, only IRP_MN_SET_POWER.
I don’t know what motherboard it is; I only analyze data from the
customer. I only know there are 4 cores, which is why these races occur.
Of course, the two parallel tasks which should be serialized run on
different cores.
All the evil comes from the IDLE callback and I’m afraid it is
impossible to solve these problems completely. The wrong design leads to
requirements which can’t be fulfilled.
Best regards,
Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]
-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@hotmail.com
Sent: Tuesday, January 19, 2010 4:30 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] USB selective suspend problem during shutdown
NTDEV is sponsored by OSR