USB KMDF driver performance /BULK or INT ?

Hi,

I am trying to measure the performance of my USB driver. My driver sits on top of ACPI/usbhub in the device stack.
The HW is configured for USB 2.0 high speed, with 2 BULK endpoints.
From a user-space test app, the WriteFile API delivers 40 bytes of REQUEST data. The data is written correctly. For each 40-byte REQUEST, the HW returns a 15-byte ACK followed by a 40-byte RESPONSE. (There is a constant delay of 60 usec between ACK and RESPONSE.)

My design goal is to minimize the time between consecutive REQUESTs. Currently the round trip is anywhere between 450 and 700 usec in USBlyzer (elapsed time between 2 consecutive REQUESTs).

  • The measured time in USBlyzer of the URB for the OUT transaction is ~87 - 150 usec.

Inside the driver, “WdfUsbTargetPipeConfigContinuousReader()” is configured for reading.

  • The IN transaction of the ACK takes ~250 - 350 usec.
  • The IN transaction of the RESPONSE also takes ~200 usec.

All I know is that on the HW, the firmware processing time is around 80 usec + 60 usec (constant delay between ACK and RESPONSE), so total processing time is ~170 usec.
What I do not understand is why USB is taking more than ~400 usec. I suspect a serious design flaw in my use of BULK mode.

Before asking for a firmware change to switch to INTERRUPT endpoints, I need some pointers on how to improve this and on how to measure the actual time.

Please suggest.

As you are measuring performance, I assume you have a performance problem.

This problem is structural to your design. Requiring a complete round-trip before issuing another request will inevitably lead to poor performance even if the underlying transport was not USB. USB makes this worse by imposing scheduling delays that a PCIe connection would not incur.

If you tell us more about how your application is structured, we can likely provide some better advice, as there is very little that your driver can do to help or hinder this, assuming you don't have badly broken code.


From: srinivaskumar.r@in.bosch.com
Sent: July 11, 2016 11:12 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] USB KMDF driver performance /BULK or INT ?


Hi,
Thanks for the reply.

My product (HW device + Windows USB KMDF driver) is a measurement device that has to tunnel data from a proprietary serial BUS to USB on a PC. Data (max 40 bytes) can arrive on the serial BUS at intervals of 50 - 100 usec.
Windows user-mode applications can send data (40 bytes) ANY TIME to the HW device over USB. The firmware transmits this asap, when the serial BUS is free.
The HW device sends a REQUEST-OK-ACK to the application once the transmit data is received from USB. Sometimes client apps send a stream of data to the HW as well, which gets buffered. The device also sends the RESPONSE messages to the application asap when it receives 40 bytes from the serial BUS.

The HW has two independent serial BUS channels, and both are connected in loopback mode for performance tests. I am NOT counting the firmware execution time and the data transfer time over the serial BUS loopback cable (80 usec and 60 usec respectively) in the performance measurement. This way, I plan to test the overall quality of our product.

Now my design has to fulfill the following:

  • Lowest round trip time
  • Higher data throughput

My test setup has only a mouse, a keyboard, and my HW device connected to the PC.

>>> “If you do not have a request ready and waiting when a frame is scheduled, then you will miss that frame. You won’t get another chance until the next frame is scheduled.”

I am aware that USB scheduling/transactions are host-controller driven. But I am already using “WdfUsbTargetPipeConfigContinuousReader()”, which, I assume, takes care of continuously scheduling read operations for the BULK endpoint.

How can I submit requests in advance so that my IN transactions are fast and scheduled properly? Is there any way to prioritize requests from my USB driver?

Will it help if I have more bulk endpoints on my HW? Is it possible to use INTERRUPT and BULK in parallel?