FW: USB mysteries

> ----------

From: Dmitriy Budko[SMTP:xxxxx@vmware.com]
Sent: Thursday, February 24, 2005 5:53 AM
To: Windows System Software Devs Interest List
Cc: Michal Vodicka
Subject: RE: [ntdev] USB mysteries

Do you see any pattern in the timing of missing frames
relative to the periodic 1ms SOF packets?

Dmitriy,

No, the positions of missing packets relative to SOFs differ. The only pattern: if frame N is missing, there is never an SOF before frame N + 1. That makes sense if my timing theory is correct. Also, I don’t think I ever saw a missing packet just after an SOF.

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]

Michal,

Your argument for a host controller device driver bug seems reasonable.
I have never stressed full speed bulk transfers in the way that you
described.

You wrote that you have observed the failure on WinXP and later
versions. The Win2K implementation of the USB stack differs
significantly from WinXP and later versions. This is documented in the
DDK, “USB Driver Stack for Windows XP and Later”.

Did you try to reproduce the problem under Win2K? It certainly would
not be conclusive. But, it might be interesting to see if the failure
occurred on the same hardware while running Win2K.

Regards,
Jim Allred
EVI Technology, LLC

Michal Vodicka wrote:

Jim,

thanks for the ideas. Comments inline:

> 1) If the USB peripheral device gets the data toggle (DATA0/DATA1)
> wrong, it may send two DATA0 or two DATA1 packets in a row. Per the
> spec, the host controller will ACK the second packet and discard
> the contents. It assumes that the packet is a duplicate.
>

I checked it and it never occurred, at least for lost frames. DATA0/1
always toggle.
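
A minimal sketch of the toggle check discussed above, assuming the analyser trace has been exported as a list of PID names in bus order. The list format and the function name `find_toggle_errors` are hypothetical, not taken from any analyser tool.

```python
def find_toggle_errors(pids):
    """Return indices of data packets whose PID repeats the previous data PID."""
    errors = []
    prev = None
    for i, pid in enumerate(pids):
        if pid not in ("DATA0", "DATA1"):
            continue  # skip tokens and handshakes, only data PIDs toggle
        if pid == prev:
            errors.append(i)  # two DATA0 or two DATA1 in a row
        prev = pid
    return errors


trace = ["DATA0", "DATA1", "DATA0", "DATA0", "DATA1"]
print(find_toggle_errors(trace))  # the packet at index 3 repeats DATA0
```

A repeated PID is also what a legitimate retransmission looks like, so any hit still has to be read against the handshakes around it.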

> 2) The host controller may react this way if your USB peripheral
> fails to implement bit stuffing properly. I witnessed a case where
> a full speed device failed to transmit a stuffed 0-bit when it was
> the last bit of a packet. Literally, if the last six bits of the
> data CRC are ones then the device must transmit a stuffed 0-bit
> before the EOP. If I remember correctly, VIA host controllers were
> the most sensitive to this problem. Most other manufacturers’
> controllers would ignore it. But, the VIA chips would ACK the
> packet and discard the data.
>

I would hope the analyser would catch it, but I checked to be sure. In
most cases, the lost frame’s CRC didn’t contain six consecutive ones.
The frame data, apart from the first two bytes, didn’t contain them either.
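
The bit-stuffing rule in question can be sketched as follows: after six consecutive 1 bits the transmitter inserts a 0, including when those six ones are the last bits before the EOP. This is only an illustration. It assumes the packet is given as a list of 0/1 values already in wire order, and the helper names are hypothetical.

```python
def bit_stuff(bits):
    """Insert a 0 after every run of six consecutive 1 bits."""
    out = []
    run = 0
    for b in bits:
        out.append(b)
        if b == 1:
            run += 1
            if run == 6:
                out.append(0)  # stuffed bit, also required at the very end
                run = 0
        else:
            run = 0
    return out


def ends_with_six_ones(bits):
    """True if the unstuffed sequence ends in six 1 bits (the risky case)."""
    return len(bits) >= 6 and all(b == 1 for b in bits[-6:])


tail = [0, 1, 1, 1, 1, 1, 1]
print(bit_stuff(tail), ends_with_six_ones(tail))
```

A device with the defect described above would send `tail` unchanged instead of appending the final stuffed 0.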

> Regarding your periodic measurements. A problem such as number two
> is dependent on the data. Thus, periodicity in the data could
> potentially yield such measurements.
>

I’m not sure I understand exactly what you mean. Let me describe
how the device under test works. It generates a stream of data
at a specified interval. The host’s job is to read the data as fast as
possible because the device’s circular buffer is very limited (8 frames).
Test frames are numbered: the first two bytes contain a number from 0
to 0x3fff so I can detect data loss. The remaining 62 bytes are a
constant value such as 0xf. The problem can be reproduced if the device
is faster than the host, i.e. it always has data to send when the host
asks. There is no NAK. As the device speed is decreased, the probability
of the problem drops and it finally disappears.
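
The loss detection described above can be sketched like this. The byte order of the two-byte counter and the exact filler byte are not stated in the thread, so the little-endian counter and the 0x0F filler below are assumptions, and all function names are hypothetical.

```python
SEQ_MODULUS = 0x4000  # frame numbers run from 0 to 0x3fff, then wrap


def make_frame(n):
    """Build a 64-byte test frame: 2-byte counter plus 62 filler bytes."""
    return (n % SEQ_MODULUS).to_bytes(2, "little") + bytes([0x0F]) * 62


def frame_number(frame):
    """Extract the counter (assumed little-endian) from the first two bytes."""
    return int.from_bytes(frame[:2], "little") & (SEQ_MODULUS - 1)


def find_missing(frames):
    """Return the frame numbers skipped in a received stream."""
    missing = []
    expected = None
    for frame in frames:
        n = frame_number(frame)
        if expected is not None and n != expected:
            gap = (n - expected) % SEQ_MODULUS  # handles wrap at 0x3fff
            missing.extend((expected + k) % SEQ_MODULUS for k in range(gap))
        expected = (n + 1) % SEQ_MODULUS
    return missing


received = [make_frame(n) for n in (0x3FFE, 0x3FFF, 1, 2)]
print(find_missing(received))  # frame 0 was lost across the wrap
```

The sketch assumes frames are only ever dropped. A duplicated or reordered frame would show up as a very large gap and needs separate handling.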

As for the times, that is the only pattern I see. I wasn’t quite clear
before: the 56.xxx usec time is from the start of the lost frame’s
transfer to the start of the next frame’s transfer. I tried setting the
device to generate data faster and used overlapped reads on the PC side,
and although the data stream was very regular (17 transfers per time
frame), I never saw a time difference shorter than 57 usec. It seems
that slightly shorter times are unique to lost frames.
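
As a rough check of these numbers, assuming "time frame" means the 1 ms full-speed USB frame, 17 evenly spaced transfers per frame are about 58.8 usec apart, which fits the observation that normal transfers never start less than 57 usec apart. The sketch below also flags start-to-start intervals under that threshold. The timestamp list and function names are hypothetical.

```python
FRAME_US = 1000.0      # assumed: one full-speed USB frame is 1 ms
THRESHOLD_US = 57.0    # shortest normal interval reported in the thread


def mean_spacing_us(transfers_per_frame):
    """Average start-to-start spacing for evenly spread transfers."""
    return FRAME_US / transfers_per_frame


def short_intervals(start_times_us, threshold_us=THRESHOLD_US):
    """Return indices of transfers followed by a too-short interval."""
    return [
        i
        for i in range(len(start_times_us) - 1)
        if start_times_us[i + 1] - start_times_us[i] < threshold_us
    ]


print(round(mean_spacing_us(17), 1))               # 58.8
print(short_intervals([0.0, 58.8, 115.3, 174.1]))  # [1]: a 56.5 usec gap
```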

Well, I believe that in this case our device doesn’t cause the problem.
It could be an HC problem, but we reproduced it on HCs from two
different manufacturers. It also seems to be related to CPU speed, as we
reproduced it on slower CPUs only. That leads to the conclusion that
there is a bug in the OS drivers, probably a very tight race condition.
It was reproduced on XP SP0, SP2 and w2k3, always on a USB UHCI
controller. Some of the computers had several HCs of different types,
and the problem was reproducible on UHCI only.

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]

— Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256

You are currently subscribed to ntdev as: unknown lmsubst tag
argument: ‘’ To unsubscribe send a blank email to
xxxxx@lists.osr.com

Jim,

I haven’t tried it on w2k yet. It could be interesting, although neither possible result will tell us much (if the problem doesn’t occur, it could be because of a timing change). Anyway, I’ll try.

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]


From: xxxxx@lists.osr.com[SMTP:xxxxx@lists.osr.com] on behalf of Jim Allred[SMTP:xxxxx@evitechnology.com]
Reply To: Windows System Software Devs Interest List
Sent: Friday, February 25, 2005 9:13 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] USB mysteries

Michal,

Your argument for a host controller device driver bug seems reasonable.
I have never stressed full speed bulk transfers in the way that you
described.

You wrote that you have observed the failure on WinXP and later
versions. The Win2K implementation of the USB stack differs
significantly from WinXP and later versions. This is documented in the
DDK, “USB Driver Stack for Windows XP and Later”.

Did you try to reproduce the problem under Win2K? It certainly would
not be conclusive. But, it might be interesting to see if the failure
occurred on the same hardware while running Win2K.

Regards,
Jim Allred
EVI Technology, LLC

