Error Recovery in WinUSB.

nyryder · June 25, 2010, 9:36am

My USB device has a stringent requirement for error recovery. When a “3 strike” condition occurs (because of a bus error), I need to stop all transmission, reset the device, reinitialize its state machine through certain register operations, and resend the commands.
I use WinUSB for data communication with the device.
When a 3 strike happens with PING/ACK, my host software (USB stack + WinUSB) does not repeat the current packet; it sends the next buffered packet from the queue. From many USB traces, what I see is that the host auto-unhalted the bulk endpoint after the three strikes but did not issue a ClearFeature(EndpointHalt) to resynchronize the data toggles, resulting in a data toggle error. What I want is that, when the 3 strike happens, all further transmission stops at that point so that I can do the full device recovery, clear the stall on both ends, and retransmit the data.

Another thing I observed is that even if I set AUTO_CLEAR_STALL to false (via WinUsb_SetPipePolicy), in other error cases where a stall happens the host keeps sending data automatically (after clearing the host-side stall condition). I would like to disable AUTO_CLEAR_STALL. My host machine runs XP (I will use the same software on Windows 7 as well).

Any guidance? Thanks in advance.

nyryder · June 28, 2010, 12:05pm

Experts,

Let me know if my question is not clear.

Tim_Roberts · June 29, 2010, 5:16pm

xxxxx@gmail.com wrote:

Experts,

Well, I’m not sure what you’re talking about when you say “3 strike
happens”. The host software should never have to retry a packet.
That’s one of the guarantees in a USB bulk pipe – the controller will
keep sending until the device acknowledges receipt. Similarly, your
device should not have to reset itself because of a bus error retry.
That’s just part of the protocol.

Perhaps you could describe the sequence of operations again, using the
USB terminology.

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

nyryder · June 29, 2010, 10:35pm

Thanks for the reply. By “3 strikes” I meant three retries of the same packet when the packet encounters an error. I am sending data to the device on a bulk OUT pipe asynchronously using WinUsb_WritePipe, and I used a LeCroy analyzer to trace the USB packets. In one particular case, my software queued three USB transfers of 4816968 bytes (A), 40 bytes (B), and 4827744 bytes (C). The host breaks the first transfer into 512-byte packets and sends them to the device. While it is sending the 8192nd 512-byte packet, the ACK from the device gets lost. The host then sends three PINGs in a row and the device ACKs them, but it looks like the host does not see the ACKs, and it drops the current transfer (A) without sending the rest. The last PID in transfer A is DATA1. WinUsb_GetOverlappedResult fails and reports 2097152 bytes transferred. The host controller then sends the second transfer (B) with the same PID (DATA1; the LeCroy flags it as a data toggle error), but the device acknowledges it, and the host software proceeds to the next transfer, C. My device has a FIFO, and it sees that the commands are malformed: the incomplete A is followed by B and then by C.
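
For readers following the data toggle part of this description, here is a toy model of USB 2.0 bulk toggle sequencing. It is only a sketch: the struct and function names are invented for illustration, it ignores PING and NAK entirely, and it makes no claim about how this particular device or host controller behaves. It encodes two rules from the USB 2.0 specification: the sender advances its toggle only when it sees an ACK, and a receiver that gets an unexpected PID still ACKs but discards the data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of bulk data-toggle sequencing: one toggle bit per side. */
typedef struct {
    int host_toggle;    /* PID the host sends next: 0 = DATA0, 1 = DATA1 */
    int device_toggle;  /* PID the device expects next */
    int accepted;       /* packets the device actually stored */
} toggle_link;

/* Send one packet. ack_lost simulates the handshake being corrupted on
 * its way back to the host. Returns true if the device stored the data. */
static bool send_packet(toggle_link *l, bool ack_lost)
{
    bool stored = false;
    assert(l != NULL);
    if (l->host_toggle == l->device_toggle) {
        /* Expected PID: the device stores the data and advances its bit. */
        l->device_toggle ^= 1;
        l->accepted++;
        stored = true;
    }
    /* On a mismatch the device still ACKs but discards the data,
     * treating it as a retry of a packet it already has. */
    if (!ack_lost)
        l->host_toggle ^= 1;  /* the host advances only when it sees the ACK */
    return stored;
}

/* Model of a full resynchronization: ClearFeature(ENDPOINT_HALT) puts the
 * device side back to DATA0, and a pipe reset does the same on the host. */
static void resync(toggle_link *l)
{
    assert(l != NULL);
    l->host_toggle = 0;
    l->device_toggle = 0;
}
```

In this model, a packet whose ACK is lost leaves the two sides out of step, so the first packet of whatever is sent next carries a PID the device does not expect. Calling resync() before sending anything else puts both sides back at DATA0.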

Having explained the problem, here is my first question: is there any way to stop the host from sending any more data after the first failed transfer (i.e., at A)?
Once the transmission fails, I call WinUsb_AbortPipe, and all the submitted transfers fail: WinUsb_GetOverlappedResult returns zero and GetLastError returns ERROR_OPERATION_ABORTED (995). My data structures show that these are the 40-byte transfer (B) and the 2097152-byte (C) transfer. But I do see those packets on the wire. The packets are not really aborted, yet Windows says they are. Why is that?

Tim_Roberts · June 30, 2010, 12:20pm

xxxxx@gmail.com wrote:

Thanks for the reply. By “3 strikes” I meant three retries of the same packet when the packet encounters an error. I am sending data to the device on a bulk OUT pipe asynchronously using WinUsb_WritePipe, and I used a LeCroy analyzer to trace the USB packets. In one particular case, my software queued three USB transfers of 4816968 bytes (A), 40 bytes (B), and 4827744 bytes (C). The host breaks the first transfer into 512-byte packets and sends them to the device. While it is sending the 8192nd 512-byte packet, the ACK from the device gets lost. The host then sends three PINGs in a row and the device ACKs them, but it looks like the host does not see the ACKs, and it drops the current transfer (A) without sending the rest. The last PID in transfer A is DATA1. WinUsb_GetOverlappedResult fails and reports 2097152 bytes transferred. The host controller then sends the second transfer (B) with the same PID (DATA1; the LeCroy flags it as a data toggle error), but the device acknowledges it, and the host software proceeds to the next transfer, C. My device has a FIFO, and it sees that the commands are malformed: the incomplete A is followed by B and then by C.

Your transfers are too large. Windows does not support transfers larger
than about 3MB on a high-speed device. Check this knowledge base
article, which is on my FAQ list:
http://support.microsoft.com/kb/832430

Try breaking your transfers up into chunks of 1048576 bytes (make sure
it’s an even multiple of 512). My guess is that will solve your problem.
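
A minimal sketch of the chunking suggested here, in portable C. The chunk_writer callback, write_log, and log_writer are hypothetical names introduced for illustration; in a real WinUSB program the callback would presumably wrap WinUsb_WritePipe and wait for completion, but that part is not shown.

```c
#include <assert.h>
#include <stddef.h>

#define PACKET_SIZE 512u             /* bulk packet size seen in the trace */
#define CHUNK_SIZE  (1024u * 1024u)  /* 1048576 bytes, the suggested chunk size */

/* Hypothetical callback type: sends one chunk, returns nonzero on success. */
typedef int (*chunk_writer)(const unsigned char *data, size_t len, void *ctx);

/* Stand-in writer for experiments: it only records what it was asked to send. */
struct write_log { size_t calls; size_t last_len; };

static int log_writer(const unsigned char *data, size_t len, void *ctx)
{
    struct write_log *rec = ctx;
    (void)data;
    rec->calls++;
    rec->last_len = len;
    return 1;
}

/* Sends buf in CHUNK_SIZE pieces and stops at the first failed chunk.
 * Returns the number of bytes handed to the writer successfully. */
static size_t write_chunked(const unsigned char *buf, size_t total,
                            chunk_writer writer, void *ctx)
{
    size_t sent = 0;
    assert(CHUNK_SIZE % PACKET_SIZE == 0);  /* whole packets per chunk */
    while (sent < total) {
        size_t n = total - sent;
        if (n > CHUNK_SIZE)
            n = CHUNK_SIZE;
        if (!writer(buf + sent, n, ctx))
            break;
        sent += n;
    }
    return sent;
}
```

With the 4816968-byte transfer from this thread, the loop makes five calls: four full 1048576-byte chunks and a final 622664-byte remainder. Only the last chunk may be shorter than CHUNK_SIZE.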

I’m not aware of any “three strikes and you’re out” policy in the USB
protocol at all. A bulk operation should retry until the sun burns out,
unless the device leaves the bus.

Having explained the problem, here is my first question: is there any way to stop the host from sending any more data after the first failed transfer (i.e., at A)?

A 5MB transfer is going to take more than 150ms to execute. As I see
it, there’s really no point in submitting multiple requests
simultaneously. Why don’t you just wait until the first one succeeds
before submitting the second? That would solve the problem, wouldn’t it?
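
The "wait for the first one before submitting the second" idea can be sketched as a simple loop. This is an illustration only: transfer_fn, fake_link, and fake_transfer are made-up names, and the fake simply fails on a chosen call so the stopping behavior can be observed. A real callback would submit one transfer and block until it completes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: performs one complete transfer of len bytes
 * and returns nonzero only if the whole transfer succeeded. */
typedef int (*transfer_fn)(size_t len, void *ctx);

/* Fake link for experiments: fails on one chosen call. */
struct fake_link {
    size_t fail_index;  /* 0-based call that should fail */
    size_t calls;       /* transfers attempted so far */
    size_t bytes_sent;  /* bytes from transfers that succeeded */
};

static int fake_transfer(size_t len, void *ctx)
{
    struct fake_link *dev = ctx;
    size_t index = dev->calls++;
    if (index == dev->fail_index)
        return 0;
    dev->bytes_sent += len;
    return 1;
}

/* Submits transfers strictly one at a time. Returns the index of the
 * first transfer that failed, or count if all of them succeeded. */
static size_t send_in_order(const size_t *lengths, size_t count,
                            transfer_fn transfer, void *ctx)
{
    size_t i;
    assert(lengths != NULL && transfer != NULL);
    for (i = 0; i < count; i++) {
        if (!transfer(lengths[i], ctx))
            break;  /* nothing after the failure is ever submitted */
    }
    return i;
}
```

Because nothing is queued behind the transfer in flight, a failure of A means B and C never reach the bus, and the caller can run its recovery sequence before resubmitting from the returned index.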

Once the transmission fails, I call WinUsb_AbortPipe, and all the submitted transfers fail: WinUsb_GetOverlappedResult returns zero and GetLastError returns ERROR_OPERATION_ABORTED (995). My data structures show that these are the 40-byte transfer (B) and the 2097152-byte (C) transfer. But I do see those packets on the wire. The packets are not really aborted, yet Windows says they are. Why is that?

“Abort pipe” is a low-level operation that aborts any requests
outstanding on the pipe. It’s possible that, because your first
transfer was so large, WinUSB hadn’t actually submitted the second and
third transfers yet, so they weren’t aborted. I’m making that up,
however. I’m not sure.

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

nyryder · June 30, 2010, 4:37pm

I submit 4 MB to WinUSB, but it looks like WinUSB is segmenting it into pieces under 3 MB.
If I submit 4816968 bytes, the transfers in the LeCroy trace look like 2097152, 2097152, and 622664. That’s the reason I did not bother about the 3 MB limit.
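
The observed split is consistent with cutting the request into 2 MiB pieces. The sketch below just reproduces that arithmetic; the 2097152-byte segment size is taken from the trace above, not from any documented WinUSB constant, and split_segments is a made-up helper.

```c
#include <assert.h>
#include <stddef.h>

#define OBSERVED_SEGMENT 2097152u  /* 2 MiB, the piece size seen in the trace */

/* Splits total into pieces of at most `segment` bytes, writes the piece
 * sizes to out (up to max_out entries), and returns how many were written. */
static size_t split_segments(size_t total, size_t segment,
                             size_t *out, size_t max_out)
{
    size_t n = 0;
    assert(segment > 0);
    while (total > 0 && n < max_out) {
        size_t piece = total < segment ? total : segment;
        out[n++] = piece;
        total -= piece;
    }
    return n;
}
```

For 4816968 bytes this yields 2097152, 2097152, and 622664. As a side note on the arithmetic, 8192 packets of 512 bytes is 4194304 bytes, which is exactly the end of the second 2 MiB piece.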
>>>>I’m not aware of any “three strikes and you’re out” policy in the USB
protocol at all. A bulk operation should retry until the sun burns out,
unless the device leaves the bus.

At the following link, http://msdn.microsoft.com/en-us/library/ff539286(VS.85).aspx, the sixth paragraph begins: “If an I/O request on a control, interrupt, or bulk pipe fails, the pipe halts.” But the pipe is not halting. WinUSB drops the current transfer and moves on to the next one.

Your idea of synchronous transmission will work, but I am worried about the latency of the acknowledgements through the various layers. That’s the reason I went to async mode.

Thanks for your reply.

Tim_Roberts · June 30, 2010, 5:21pm

xxxxx@gmail.com wrote:

I submit 4 MB to WinUSB, but it looks like WinUSB is segmenting it into pieces under 3 MB.
If I submit 4816968 bytes, the transfers in the LeCroy trace look like 2097152, 2097152, and 622664. That’s the reason I did not bother about the 3 MB limit.

I didn’t know this. It’s nice to have an actual observation to confirm it.

>>>>I’m not aware of any “three strikes and you’re out” policy in the USB
protocol at all. A bulk operation should retry until the sun burns out,
unless the device leaves the bus.

At the following link, http://msdn.microsoft.com/en-us/library/ff539286(VS.85).aspx, the sixth paragraph begins: “If an I/O request on a control, interrupt, or bulk pipe fails, the pipe halts.” But the pipe is not halting. WinUSB drops the current transfer and moves on to the next one.

Interesting. And you have not set AUTO_CLEAR_STALL in the pipe policy?

Your idea of synchronous transmission will work, but I am worried about the latency of the acknowledgements through the various layers. That’s the reason I went to async mode.

With transfers as large as yours, latency is irrelevant. At the worst
case, if you had to wait for a complete time slice to expire before you
got a shot, there’d be a 16ms delay. Since your transfers are going to
take at least ten times that long, I would say you are prematurely
optimizing.
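
To put rough numbers on this, here is a small sketch of the arithmetic. The 30 MB/s effective throughput is an assumption picked so that a transfer of about 5 MB takes a bit over 150 ms, in line with the estimate earlier in the thread; it is not a measured figure, and both helper names are made up.

```c
#include <assert.h>

/* Assumed effective bulk throughput in bytes per second (not measured). */
#define ASSUMED_BYTES_PER_SEC 30.0e6

/* Milliseconds needed to move `bytes` at `bytes_per_sec`. */
static double transfer_ms(double bytes, double bytes_per_sec)
{
    assert(bytes_per_sec > 0.0);
    return bytes / bytes_per_sec * 1000.0;
}

/* Fraction of the total elapsed time spent waiting rather than
 * transferring, for a fixed wait before the transfer starts. */
static double wait_fraction(double wait_ms, double xfer_ms)
{
    assert(wait_ms + xfer_ms > 0.0);
    return wait_ms / (wait_ms + xfer_ms);
}
```

Under that assumption, the 4816968-byte transfer takes about 160 ms, so a worst-case 16 ms wait is roughly 9% of the total. The same 16 ms wait in front of a 40-byte transfer would account for nearly all of the elapsed time, so the conclusion depends heavily on transfer size.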

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

nyryder · June 30, 2010, 5:33pm

I checked the traces; WinUSB is segmenting.
When I open the pipe I explicitly call
UCHAR flag = FALSE;
bResult = WinUsb_SetPipePolicy(devHandle, bulkOutPipe, AUTO_CLEAR_STALL, sizeof(UCHAR), &flag);

Is there any IOCTL I can send to the EHCI driver to make it halt the pipe when there is an error?

Sorry for not making it clear. The device supports small transfers. In this example I have shown big transfers, but the host can generate very small (40-byte), bursty transfers to the device; that’s the reason for the async transfer support.

OSR_Community_User · July 1, 2010, 3:29am

On 06/30/2010 06:20 PM, Tim Roberts wrote:

Your transfers are too large. Windows does not support transfers larger
than about 3MB on a high-speed device. Check this knowledge base
article, which is on my FAQ list:
http://support.microsoft.com/kb/832430

Note: Documentation feedback submitted for “WinUsb_WritePipe Function”
http://msdn.microsoft.com/en-us/library/ff540322(v=VS.85).aspx

Thanks again, Tim: For UMDF/KMDF/WinUSB, too small transfers are bad
(latency). It is very important that “too large ones” are an issue, too.

Michal_Vodicka-2 · July 1, 2010, 2:37pm

> -----Original Message-----

From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Hagen Patzke
Sent: Thursday, July 01, 2010 9:29 AM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] Error Recovery in WinUSB.

Thanks again, Tim: For UMDF/KMDF/WinUSB, too small transfers are bad
(latency). It is very important that “too large ones” are an
issue, too.

Large transfers are a problem the other way, too. Scheduling takes much
longer than for small ones, so if your protocol can accept an
occasional small gap in the data but can’t accept a large one, it breaks
at large transfer boundaries. Anyway, the best approach is to queue more,
smaller buffers if possible.

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]

nyryder · July 7, 2010, 3:19pm

“The EHCI bulk transfer request queuing logic was optimized more for the case of
a small number of large transfers rather than a large number of small transfers”. This is from the newsgroup discussion “http://www.osronline.com/showthread.cfm?link=176453”. Why do you say that scheduling large transfers takes more time? Can you give more insight into that? Thanks.

Tim_Roberts · July 7, 2010, 4:29pm

xxxxx@gmail.com wrote:

“The EHCI bulk transfer request queuing logic was optimized more for the case of
a small number of large transfers rather than a large number of small transfers”. This is from the newsgroup discussion “http://www.osronline.com/showthread.cfm?link=176453”. Why do you say that scheduling large transfers takes more time? Can you give more insight into that? Thanks.

That’s not what it says. It says that, when they optimized the code,
they assumed the common case would be a small number of large transfers,
so they made that case run best.

It doesn’t say that a large number of small transfers necessarily takes
more time, it merely says they didn’t target that case. Now, it’s not
hard to imagine that it WOULD take longer to manage a long list of small
transfers. It’s a shelf-packing problem – you’re trying to pack boxes
of various sizes onto a fixed-size shelf. Every time you have to run
through the list, it costs time.

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.