Michal,
Your argument for a host controller device driver bug seems reasonable.
I have never stressed full speed bulk transfers in the way that you
described.
You wrote that you have observed the failure on WinXP and later
versions. The Win2K implementation of the USB stack differs
significantly from WinXP and later versions. This is documented in the
DDK, “USB Driver Stack for Windows XP and Later”.
Did you try to reproduce the problem under Win2K? It certainly would
not be conclusive. But, it might be interesting to see if the failure
occurred on the same hardware while running Win2K.
Regards,
Jim Allred
EVI Technology, LLC
Michal Vodicka wrote:
Jim,
thanks for the ideas. Comments inline:
> 1) If the USB peripheral device mishandles the data toggle
> (DATA0/DATA1) it may send two DATA0 or two DATA1 packets in a row.
> Per the spec, the host controller will ACK the second packet and
> discard the contents. It assumes that the packet is a duplicate.
>
I checked, and it never occurred, at least for the lost frames.
DATA0/DATA1 always toggle.
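
For illustration, the toggle check described above can be sketched as follows. This is only a sketch: the input format (a list of data PID names for acknowledged packets on a single endpoint, in bus order) and the function name are assumptions, not the analyser's real export format.

```python
def find_toggle_errors(pids):
    """Return the indices of packets whose data PID repeats the previous one.

    `pids` is a hypothetical list such as ["DATA0", "DATA1", ...] holding
    only acknowledged data packets from one endpoint, in bus order.
    """
    errors = []
    for i in range(1, len(pids)):
        if pids[i] == pids[i - 1]:
            errors.append(i)
    return errors


trace = ["DATA0", "DATA1", "DATA0", "DATA0", "DATA1"]
print(find_toggle_errors(trace))  # [3]
```

An empty result corresponds to the observation that DATA0/DATA1 always toggle.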
> 2) The host controller may react this way if your USB peripheral
> fails to implement bit stuffing properly. I witnessed a case where
> a full speed device failed to transmit a stuffed 0-bit when it was
> the last bit of a packet. Literally, if the last six bits of the
> data CRC are ones then the device must transmit a stuffed 0-bit
> before the EOP. If I remember correctly, VIA host controllers were
> the most sensitive to this problem. Most other manufacturers’
> controllers would ignore it. But, the VIA chips would ACK the
> packet and discard the data.
>
I hoped the analyser would catch it, but I checked to be sure. In most
cases, the CRC of a lost frame didn’t contain six consecutive ones. The
frame data, apart from the first two bytes, didn’t contain them either.
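
As a rough illustration of the bit stuffing rule under discussion (a 0 bit is inserted after six consecutive 1 bits, even when the sixth 1 is the last bit before the EOP), here is a minimal sketch. The function names are hypothetical, the payload is assumed to be 62 bytes of 0x0f, and a real check would have to run over the whole packet including the CRC, not over the payload alone.

```python
def bytes_to_bits(data):
    """Expand bytes into a list of bits, least significant bit first."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]


def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1 bits."""
    out = []
    run = 0
    for bit in bits:
        out.append(bit)
        run = run + 1 if bit else 0
        if run == 6:
            out.append(0)  # stuffed bit, needed even at the very end
            run = 0
    return out


def needs_stuffing(bits):
    """True if the bit sequence contains six consecutive 1 bits."""
    return len(bit_stuff(bits)) != len(bits)


payload = bytes([0x0F] * 62)  # assumed constant test payload
print(needs_stuffing(bytes_to_bits(payload)))  # False
print(bit_stuff([1, 1, 1, 1, 1, 1]))  # [1, 1, 1, 1, 1, 1, 0]
```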
> Regarding your periodic measurements: a problem such as number two
> is dependent on the data. Thus, periodicity in the data could
> potentially yield such measurements.
>
I’m not sure I understand exactly what you mean. Let me describe
how the device under test works. It generates a stream of data at a
specified interval. The host’s job is to read the data as fast as
possible because the device’s circular buffer is very limited (8
frames). Test frames are numbered: the first two bytes contain a
number from 0 to 0x3fff, so I can detect data loss. The remaining 62
bytes hold a constant value such as 0xf. The problem can be
reproduced if the device is faster than the host, i.e. it always has
data to send when the host asks. There is no NAK. As the device speed
decreases, the probability of the problem drops, and eventually the
problem disappears.
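
The loss detection based on frame numbers can be sketched roughly like this. It is not the actual test tool: the function names are hypothetical, and the little-endian byte order of the two-byte counter is an assumption.

```python
SEQ_MODULO = 0x4000  # frame numbers run from 0 to 0x3fff, then wrap to 0


def frame_number(frame):
    """Extract the counter from the first two bytes (byte order assumed)."""
    return int.from_bytes(frame[:2], "little") % SEQ_MODULO


def find_gaps(numbers):
    """Return (index, expected, received) for every break in the sequence."""
    gaps = []
    for i in range(1, len(numbers)):
        expected = (numbers[i - 1] + 1) % SEQ_MODULO
        if numbers[i] != expected:
            gaps.append((i, expected, numbers[i]))
    return gaps


received = [0x3FFE, 0x3FFF, 0, 1, 3]  # frame 2 is missing
print(find_gaps(received))  # [(4, 2, 3)]
```

The wrap from 0x3fff to 0 is treated as a normal step, so only real gaps are reported.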
As for the times, it is the only pattern I see. I wasn’t quite clear
before: the 56.xxx usec time is measured from the start of the lost
frame’s transfer to the start of the next frame’s transfer. I tried
setting the device to generate data faster and used overlapped reads
on the PC side, and although the data stream was very regular (17
transfers per time frame), I never noticed a time difference shorter
than 57 usec. It seems that these slightly shorter times are unique
to lost frames.
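
A minimal sketch of how such short intervals could be picked out of a capture. It assumes a hypothetical list of transfer start timestamps in microseconds; the 57 usec threshold comes from the observation above, everything else is illustrative.

```python
def short_intervals(starts_us, threshold_us=57.0):
    """Return indices of transfers followed by the next one too quickly.

    `starts_us` is a hypothetical list of transfer start times in
    microseconds, sorted in ascending order.
    """
    flagged = []
    for i in range(len(starts_us) - 1):
        if starts_us[i + 1] - starts_us[i] < threshold_us:
            flagged.append(i)
    return flagged


starts = [0.0, 58.8, 115.3, 174.1]  # made-up sample timestamps
print(short_intervals(starts))  # [1]
```

Comparing the flagged indices with the frames reported as lost would show whether the short intervals really occur only for lost frames.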
Well, I believe that in this case our device doesn’t cause the
problem. It could be an HC problem, but we reproduced it on HCs from
two different manufacturers. It also seems to be related to CPU
speed, as we reproduced it on slower CPUs only. This leads to the
conclusion that there is a bug in the OS drivers, probably a very
tight race condition. It was reproduced on XP SP0, SP2 and W2K3,
always on a USB UHCI controller. Some of the computers had several
HCs of different types, and the problem was reproducible on UHCI
only.
Best regards,
Michal Vodicka UPEK, Inc. [xxxxx@upek.com,
http://www.upek.com]