Re: USB KMDF driver performance /BULK or INT ?

srinivaskumar.r@in.bosch.com wrote:

I am trying to measure the performance of my USB driver. My driver sits on top of ACPI/usbhub in the device stack.
HW is configured for USB 2.0 high speed, with 2 BULK endpoints.
From a user space test app, the WriteFile API delivers 40 bytes of REQUEST data. Data is written correctly. For each 40 bytes of REQUEST, the HW gives back 15 bytes of ACK followed by 40 bytes of RESPONSE. (There is a constant delay of 60 usec between ACK and RESPONSE.)

This is simply not a scenario for which USB was optimized. No scenario
which requires call-and-response will ever approach maximum speed.
That’s just a natural side effect of the design of USB. Remember that
USB frames are all scheduled in advance. If you do not have a request
ready and waiting when a frame is scheduled, then you will miss that
frame. You won’t get another chance until the next frame is scheduled.
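
To picture the cost of a missed slot, here is a toy sketch and nothing more. At high speed the bus is divided into 125 usec microframes, and the model assumes a request can only start on a microframe boundary. The function name and the sample times are made up for illustration; real host controller scheduling is more involved than this.

```python
MICROFRAME_US = 125  # USB 2.0 high-speed microframe length, in microseconds


def wait_for_next_microframe(ready_at_us: int) -> int:
    """Microseconds a request waits if it may only start on a microframe boundary.

    Assumption (illustrative): a request that becomes ready exactly on a
    boundary goes out at once; anything later waits for the following boundary.
    """
    return -ready_at_us % MICROFRAME_US


for ready in (0, 1, 60, 124, 125):
    print(f"ready at {ready:>3} us -> waits {wait_for_next_microframe(ready)} us")
```

Under these assumptions, a request that shows up just after a boundary pays almost a full microframe before anything moves on the wire.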

My design goal is to have as little time as possible between consecutive REQUESTs. Currently I see that the round trip is anywhere between 450 and 700 usec in USBlyzer (elapsed time between 2 consecutive REQUESTs).

Yep.

Now I AM NOT CLEAR AS TO WHY USB IS TAKING MORE THAN ~400 usec. I suspect a serious design flaw in my use of BULK mode.
Before asking for a FIRMWARE code change to INTERRUPT, I need some pointers on how to improve this and on ways to measure the actual time.

The endpoint type won’t make a difference. Having an interrupt endpoint
guarantees you a slot in the frame, but once again if you don’t have a
request ready and waiting when the frame is scheduled, you’ll miss your
shot. You need to change your design so you can stream multiple packets
and ack/nak a bunch of packets at a time.
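
A back-of-the-envelope sketch of why batching helps, using the 40-byte REQUEST size and the 450 - 700 usec round trips quoted in this thread. It assumes the round trip time stays roughly the same when several messages share one transfer, which is only plausible while the batch fits in a single 512-byte high-speed bulk packet. The function name and the batch sizes are illustrative, not measurements.

```python
MESSAGE_BYTES = 40            # REQUEST size quoted in the thread
BULK_MAX_PACKET_BYTES = 512   # max packet size of a high-speed bulk endpoint


def messages_per_second(round_trip_us: float, batch: int = 1) -> float:
    """Messages completed per second when each round trip carries `batch` messages.

    Assumption (illustrative): the round trip time does not grow with the
    batch size, i.e. the whole batch still fits in one bulk packet.
    """
    return batch * 1_000_000 / round_trip_us


# Largest batch of 40-byte messages that fits in one 512-byte packet.
max_batch = BULK_MAX_PACKET_BYTES // MESSAGE_BYTES

for round_trip_us in (450, 700):
    single = messages_per_second(round_trip_us)
    batched = messages_per_second(round_trip_us, max_batch)
    print(f"{round_trip_us} us round trip: {single:,.0f} msg/s one at a time, "
          f"{batched:,.0f} msg/s with {max_batch} per transfer")
```

The per-message latency does not improve this way; only the number of messages moved per round trip does.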


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

srinivaskumar.r@in.bosch.com wrote:

My product ( HW device + WIN USB KMDF Driver ) is a measurement device which has to tunnel data from a proprietary Serial BUS to USB on a PC. Data ( max 40 bytes ) on the Serial BUS can arrive at any time, 50 - 100 usec apart.
Windows user mode applications can send data ( 40 bytes ) ANY TIME to the HW device over USB. The firmware takes care to transmit this asap, when the Serial BUS is free.
The HW device will send a REQUEST-OK-ACK to the application once the transmit data is received from USB. Sometimes client apps can also send a stream of data to the HW, which gets buffered. The device also sends the RESPONSE messages to the application asap when it receives 40 bytes from the SERIAL BUS.

The HW has two independent serial BUS channels and both are connected in loopback mode for performance tests. I am NOT considering the FIRMWARE execution time and the data transfer time on the loopback cable on the SERIAL BUS ( 80 usec and 60 usec respectively ) for performance measurement. This way, I plan to test the overall quality of our product.

Now my design has to fulfill the following:

  • Lowest round trip time
  • Higher data throughput

USB is great for throughput. USB sucks for round trip time. That’s
just the way it is.
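
To put rough numbers on the gap, the sketch below compares the payload rate implied by one 40-byte REQUEST per round trip (450 - 700 usec, as reported in this thread) with the raw 480 Mbit/s high-speed signaling rate. The raw rate ignores all protocol overhead, so treat it as a loose upper bound rather than an achievable figure; the function name is illustrative.

```python
# Raw high-speed signaling rate in bytes per second, before any protocol overhead.
RAW_HIGH_SPEED_BYTES_PER_S = 480_000_000 // 8


def payload_bytes_per_second(payload_bytes: int, round_trip_us: float) -> float:
    """Payload rate when exactly one payload is moved per round trip."""
    return payload_bytes * 1_000_000 / round_trip_us


for round_trip_us in (450, 700):
    rate = payload_bytes_per_second(40, round_trip_us)
    share = rate / RAW_HIGH_SPEED_BYTES_PER_S
    print(f"{round_trip_us} us round trip: {rate:,.0f} B/s, "
          f"{share:.3%} of the raw bus rate")
```

With one small message in flight at a time, the bus sits idle almost the whole time, which is the round trip penalty showing up as lost throughput.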

>>>> “. If you do not have a request ready and waiting when a frame is scheduled, then you will miss that frame. You won’t get another chance until the next frame is scheduled.”
I am aware that USB scheduling/transactions are host controller driven. But I am already using “WdfUsbTargetPipeConfigContinuousReader()”, which, I assume, takes care of scheduling a read operation continuously for the BULK ENDPOINT.

Yes. It submits a request which will retry continuously on the bus
until the device responds. It pretty much fills up your bus with traffic.

However, that doesn’t help your outgoing packet. You will enter the
scheduling queue, but your packet won’t get sent until that frame is
committed to the hardware. Even if you snag the response right away,
the host controller doesn’t get an interrupt at every byte. It won’t
get an interrupt until the end of the microframe, and only then can it
finally complete any requests that were fulfilled during that microframe.

By the time the notification gets back up to you, the next microframe is
well underway. By the time you are able to get a response queued up, a
fair amount of time has elapsed.
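
The chain of delays described above can be sketched as a toy timeline. Every stage is an assumption made for illustration: the outgoing packet leaves on a microframe boundary, the device answers after a fixed delay, the host only reports completion at the end of the microframe in which the answer arrived, and the software then needs some time to queue the next request. The 200 usec device delay and 50 usec software delay are hypothetical values, not measurements from this thread.

```python
MICROFRAME_US = 125  # USB 2.0 high-speed microframe length, in microseconds


def round_up_to_microframe(t_us: int) -> int:
    """Round a time up to the next microframe boundary (boundaries map to themselves)."""
    return -(-t_us // MICROFRAME_US) * MICROFRAME_US


def simulate_round_trips(device_delay_us: int, software_delay_us: int, count: int) -> list:
    """Return the times at which successive requests get queued (toy model)."""
    queued_at = 0
    times = [queued_at]
    for _ in range(count):
        sent_at = round_up_to_microframe(queued_at)         # request leaves on a boundary
        answered_at = sent_at + device_delay_us             # device puts its response on the bus
        completed_at = round_up_to_microframe(answered_at)  # completion reported at end of microframe
        queued_at = completed_at + software_delay_us        # driver/app queues the next request
        times.append(queued_at)
    return times


times = simulate_round_trips(device_delay_us=200, software_delay_us=50, count=4)
periods = [later - earlier for earlier, later in zip(times, times[1:])]
print("requests queued at:", times)
print("request-to-request periods:", periods)
```

In this model the device and software together account for 250 usec of work, yet the steady-state period between requests settles at 375 usec because each hop waits for a microframe boundary.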

How can I request in advance so that my IN transactions are fast and scheduled properly? Is there any way to prioritize requests from my USB driver?

Get real. If there were a way to ask for priority, then everyone would
ask for priority. Bulk endpoints get the lowest priority of all of the
endpoint types, although that’s only relevant if you are competing with
other devices.

Priority is not the problem. USB was simply not designed to support
your scenario.

Will it help if I have more bulk endpoints on my HW? Is it possible to do INTERRUPT and BULK in parallel?

How could that possibly help? You can’t increase the bandwidth of the
bus, and you can’t change the frame timing. You can have up to 31
endpoints at a time, all active, but they will all be competing for
bandwidth with all of the other devices.
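
For reference, one common way to arrive at the figure of 31 is to count the shared control endpoint 0 once, plus endpoint numbers 1 to 15 in each direction. The sketch below just enumerates the endpoint addresses under that counting convention (bit 7 set means IN); the helper name is made up for illustration.

```python
def endpoint_addresses() -> list:
    """List endpoint addresses: control endpoint 0, then 1..15 OUT and 1..15 IN."""
    control = [0x00]                                  # bidirectional control endpoint, counted once
    out_endpoints = list(range(1, 16))                # 0x01..0x0F, host to device
    in_endpoints = [0x80 | n for n in range(1, 16)]   # 0x81..0x8F, device to host
    return control + out_endpoints + in_endpoints


addresses = endpoint_addresses()
print(len(addresses), [hex(a) for a in addresses])
```

However many of these a device exposes, they all share the same bus time, so adding endpoints does not add bandwidth.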


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.