Last week I posted a few messages about my UMDF driver suddenly
disappearing during the DTM machine test. Now I know where the problem
lies. There is nothing wrong with the driver; it is just a clash between
DTM and UMDF. And UMDF is the loser.
It occurs during the DTM Common I/O stress test. Among other things, the
test repeatedly disables and re-enables all devices in the system,
including ours, which causes a driver restart. During start, the UMDF
kernel part receives the PnP IRP IRP_MN_START_DEVICE. In turn, it sends
a message to the driver and the framework calls the OnPrepareHardware()
callback. There we configure USB and prepare the device for use.
Normally this takes a few hundred ms, but under the test it takes much
longer. The whole machine is so overloaded that the mouse cursor is
usually frozen. It slows our driver down by three orders of magnitude.
And guess what: the message sent to the driver times out and the UMDF
driver manager kills the driver…
The timeout is 60 seconds by default. Normally that is more than enough,
but not in this case. I extended the timeout to 1 hour and left the test
running for the whole weekend. The driver was restarted more than 300
times and the average OnPrepareHardware() callback processing time was
about 260 seconds (!). The worst time was 360 seconds, i.e. 6 times
longer than the default timeout.
I’m going to file a bug report. This test is run by all OEMs, and with
the default configuration a UMDF driver doesn’t have much chance of
passing it. As a short-term solution, we’re going to release the driver
with the timeout extended. The way to do it is, of course,
undocumented. However, I believe this is a problem for all UMDF drivers
and should be solved inside the framework. In other cases the framework
extends the timeout when the driver seems to be making progress; not in
this case. In addition, the worker threads calling the callbacks seem to
run at default priority. Raising it could help, but that should be done
inside the framework, too.
The worst part was diagnostics. When a driver is intentionally murdered
like this, I’d expect some reasonable message about it in the Event Log.
There was only a generic one suggesting that the driver had crashed.
There are silent WER reports, if one knows where to find them and how to
configure WER to report at least something (it was disabled on the DTM
machine). WER created short descriptions but failed when copying WDF
traces and creating dumps, probably because of the stress tests running.
It was able to copy traces in about 10% of cases, and in just one case
it created dumps; in the other cases it just reported a dump creation
failure. In many cases it simply didn’t do anything. There were about
10x more messages in the event log than WER reports.
Actually, even the small WER report file contained the useful info,
which can only be deciphered when one knows where the problem is and/or
has access to the UMDF sources. For future reference (only the important
fields):
Sig[1].Name=Problem
Sig[1].Value=HostTimeout
It means the host didn’t process a message within the timeout and was
killed. This is the value I saw; maybe there are others.
Sig[4].Name=ExitCode
Sig[4].Value=103
It is STATUS_PENDING (0x103), as we all know: the status of the IRP at
the time of the timeout. Quite misleading as an exit code.
Sig[6].Name=Message
Sig[6].Value=11b00
The most interesting part. It is an OR of:
- 1 << 16 : IRP-related message
- 0x1b << 8 : IRP_MJ_PNP
- 0x0 : IRP_MN_START_DEVICE
There were also messages with the value 11602, when the first timed-out
message found in the queue was a power-IRP-related one.
It is beyond my understanding how driver developers are expected to
decode this. Maybe it is documented somewhere, but Google can’t find it.
Looking back, I made one mistake. I thought the host timeout problem was
a consequence of a driver crash. So instead of trying to decipher what
the WER report fields really mean, I tried to find out why the driver
crashes. Finally I found it and discovered that the timeout was the
cause and the crash/termination the consequence. Who with kernel
experience would expect that? Killing a driver because of a timeout
seems like a really strange idea… If it is documented somewhere, I’d
like to know.
Best regards,
Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]