KMDF high speed usb minimal latency

Hello, I have an FT2232H-based device, and so far it’s quite speedy. However, I now need to poll the device it’s connected to and, as soon as a bit flips, continue with some action… I am using the FTDI D2XX drivers, and the latency I get from issuing a read command to the device (a few bytes) and then reading back one byte (or a bit more) is 0.8ms-3ms according to QueryPerformanceCounter. This polling has to be done a lot, so these latencies are high and limit the operation to something like 100KB/s. To get a proper speed, the polling latency should be around 0.3ms, since the operation itself takes 0.8ms to 1.6ms to complete.

To get around this significant drawback, I am ready to write a KMDF USB driver. However, I am not sure whether it will allow me to send and receive individual microframes in USB high-speed mode… So, could I get latency granularity of less than 1ms when talking to the USB controller from KMDF? The USB device supports 512-byte max IN/OUT transfers in high-speed mode, and it reports itself as a BULK device.

xxxxx@rajko.info wrote:

Hello, I have an FT2232H-based device, and so far it’s quite speedy. However, I now need to poll the device it’s connected to and, as soon as a bit flips, continue with some action… I am using the FTDI D2XX drivers, and the latency I get from issuing a read command to the device (a few bytes) and then reading back one byte (or a bit more) is 0.8ms-3ms according to QueryPerformanceCounter. This polling has to be done a lot, so these latencies are high and limit the operation to something like 100KB/s. To get a proper speed, the polling latency should be around 0.3ms, since the operation itself takes 0.8ms to 1.6ms to complete.

So, you have to do a “write” followed by a “read”? The only way to make
that faster is to use overlapped I/O to submit the two requests at the
same time.

USB is not a real-time bus. It’s all scheduled in advance – the host
controller driver schedules all the transfers for a frame, then submits
it to the hardware. Once the frame is gone, it starts scheduling the
next frame. If you submit the write, then wait for that to complete,
and then submit the read, then the two requests cannot be handled in the
same frame. A frame is 1ms.
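
The frame argument above can be put into numbers with a toy model. This is only a sketch: it assumes a request is reported complete at the first 1 ms frame boundary after it is submitted, which is a deliberate simplification and not a description of the real Windows host controller driver. The function names are made up for the illustration.

```python
FRAME_US = 1000  # one USB frame is 1 ms, as stated in the reply


def completes_at(submit_us: int, frame_us: int = FRAME_US) -> int:
    """Assumed model: completion is reported at the next frame boundary."""
    return (submit_us // frame_us + 1) * frame_us


def sequential_round_trip(start_us: int) -> int:
    """Write, wait for it to complete, then submit the read."""
    write_done = completes_at(start_us)
    read_done = completes_at(write_done)  # the read misses the write's frame
    return read_done - start_us


def overlapped_round_trip(start_us: int) -> int:
    """Write and read are both queued before either one completes."""
    write_done = completes_at(start_us)
    read_done = completes_at(start_us)
    return max(write_done, read_done) - start_us


if __name__ == "__main__":
    print(sequential_round_trip(300), overlapped_round_trip(300))
```

In this model a poll started 300 us into a frame takes 1700 us when done write-then-read and 700 us when both requests are queued together. Real numbers depend on the host controller driver and on the device.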

So, in a very real sense, your device is simply a poor candidate for
USB. It wasn’t designed for this kind of operation. Streaming works
well, but round-trips are especially painful.

Are you talking to the FTDI driver directly, or are you using their
library? If you are making ioctl calls to the driver directly, you can
try using overlapped I/O to submit them both at once. But if you have
to go through their library, it might not support this.

To get around this significant drawback, I am ready to write a KMDF USB driver. However, I am not sure whether it will allow me to send and receive individual microframes in USB high-speed mode…

Well, not really; USB doesn’t work that way. The driver submits URBs,
and the host controller worries about the scheduling.

If you can figure out their packet format, you don’t actually need to
write a driver. You can use WinUSB. It supports overlapped I/O.

The USB device supports 512-byte max IN/OUT transfers in high-speed mode, and it reports itself as a BULK device.

All high-speed bulk pipes have 512 byte packets, although I’m not sure
the FT2232 devices actually support that much. They’re pretty
bare-bones devices.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

I have the confidential document that you use when you want to interface directly with the IN/OUT pipes without using their library.
Right now I am using their drivers and library.

A frame is 1ms in USB FULL SPEED mode, sure, but isn’t that divided into 0.1ms microframes for high-speed USB? So couldn’t I somehow put one write microframe on the bus (which will tell the device to sample some bytes), and then put a read microframe on the bus in which the device can output data? Can’t this be done in 0.2ms or something like that? When I looked at the USB driver functions, I saw something like getmicroframe number, so I thought that microframe granularity was possible when issuing requests…

xxxxx@rajko.info wrote:

I have the confidential document that you use when you want to interface directly with the IN/OUT pipes without using their library.
Right now I am using their drivers and library.

A frame is 1ms in USB FULL SPEED mode, sure, but isn’t that divided into 0.1ms microframes for high-speed USB?

There are 8 microframes in a frame, but I don’t believe the HCD does
scheduling at that granularity – too much overhead. Your timing
evidence certainly supports that.
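
For reference, 8 microframes per 1 ms frame works out to 125 us each, not 0.1 ms. A minimal sketch of the arithmetic (the `locate` helper is a made-up name, used only to show how a timestamp maps onto frame and microframe numbers):

```python
FRAME_US = 1000                 # one frame = 1 ms
MICROFRAMES_PER_FRAME = 8       # high-speed USB splits each frame into 8
MICROFRAME_US = FRAME_US // MICROFRAMES_PER_FRAME  # 125 us


def locate(t_us: int) -> tuple[int, int]:
    """Return (frame number, microframe index 0..7) for a time in microseconds."""
    frame, offset_us = divmod(t_us, FRAME_US)
    return frame, offset_us // MICROFRAME_US


if __name__ == "__main__":
    print(MICROFRAME_US, locate(1300))
```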

So couldn’t I somehow put one write microframe on the bus (which will tell the device to sample some bytes), and then put a read microframe on the bus in which the device can output data? Can’t this be done in 0.2ms or something like that? When I looked at the USB driver functions, I saw something like getmicroframe number, so I thought that microframe granularity was possible when issuing requests…

Client drivers don’t have that level of control (although it’s not
really important). Remember that the bus is a shared resource. You
don’t “own” a microframe. Your requests are being merged with all of
the other requests on the bus, including mice and keyboards, audio
devices, and so on. You submit a request, and the HCD adds it to its
schedule.

However, if you submit two requests with overlapped I/O, the HCD will
schedule them one right after the other, as much as possible. That’s
really what you are after.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

If the granularity is 1ms… then how did some operations (start time taken before I call write, end time after I get the bytes I need) complete in under 1ms, like 0.8ms? The average time was also 1.3ms, and since I did the write and read synchronously, wouldn’t that take 2ms minimum?

So if I move to WinUSB and submit the 2 requests one right after another in asynchronous mode, the entire write/read operation will still take at least 1ms until I get the bytes back? That doesn’t seem like much of an improvement over my current situation.

I can’t comment on the timing, but using WinUSB would save you from
having to write a driver.


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:

Yes, but using WinUSB there would still be a usermode->kernelmode switch occurring on every read/write call, which would mean the process gives up its scheduling timeslice, introducing a significant delay until it gets the timeslice back…

The essence of my question was: could I control packets going in and out on the USB bus at microframe (0.1ms) granularity from a kernel driver, or is that just not possible?

xxxxx@rajko.info wrote:

If the granularity is 1ms… then how did some operations (start time taken before I call write, end time after I get the bytes I need) complete in under 1ms, like 0.8ms? The average time was also 1.3ms, and since I did the write and read synchronously, wouldn’t that take 2ms minimum?

That depends on many things, including the mechanism you’re using to do
the timing. By default, a USB host controller will not interrupt more
often than once per frame. It’s possible that the Windows EHCI driver
changes that threshold so interrupts are generated more often, but that
increases the overhead, and provides very little benefit for the vast
majority of USB devices.

So if I move to WinUSB and submit the 2 requests one right after another in asynchronous mode, the entire write/read operation will still take at least 1ms until I get the bytes back? That doesn’t seem like much of an improvement over my current situation.

Nope. When you’re designing a USB device from scratch, you take great
pains to make sure it doesn’t need this kind of round-trip operation.
What YOU have is a square peg in a round hole. It’s a device that
wasn’t designed for USB, being shoe-horned into the USB model.

It’s still worth the experiment.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

The usermode-kernelmode switch is essentially a function call. You do not
lose your timeslice if the IO is asynchronous.

Bill Wandel


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

xxxxx@rajko.info wrote:

Yes, but using WinUSB there would still be a usermode->kernelmode switch occurring on every read/write call, which would mean the process gives up its scheduling timeslice, introducing a significant delay until it gets the timeslice back…

No, those are all trivial concerns. When an I/O request is completed,
the process gets a little temporary priority boost. The theory is that
a process that has been waiting for I/O is probably going to make
another I/O request right away, so its impact on overall CPU use is
quite low, and you might as well give it an extra shot.

The essence of my question was: could I control packets going in and out on the USB bus at microframe (0.1ms) granularity from a kernel driver, or is that just not possible?

Not possible. Your kernel driver would be subject to exactly the same
limitations as WinUSB. Both are clients of the USB host controller driver.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

I am using QueryPerformanceCounter/QueryPerformanceFrequency for timing.

Yes, I know that this chip is unsuitable for non-buffered/round-trip requests, and that for this reason most USB devices are MCU based. However, most MCU-based USB controllers are either too expensive, too slow (CPU-wise), or operate only in full-speed mode.

I just need a definite answer about the microframe granularity situation. I guess other high-speed devices (such as USB network adapters) would benefit greatly from being able to switch quickly between reads and writes, so the capability MIGHT be there.
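
Since the timing method came up, here is a minimal sketch of collecting a latency distribution rather than a single number. It uses Python’s `time.perf_counter_ns`, which as far as I know is backed by QueryPerformanceCounter on Windows. The `measure_us` helper and the empty `operation` placeholder are assumptions for the illustration; in a real test `operation` would be the write/read pair.

```python
import time
from statistics import mean


def measure_us(operation, repeats: int = 1000):
    """Run operation() repeatedly and return (min, mean, max) latency in microseconds."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter_ns()
        operation()
        samples.append((time.perf_counter_ns() - t0) / 1000.0)
    return min(samples), mean(samples), max(samples)


if __name__ == "__main__":
    # Placeholder operation: replace with the real write/read round trip.
    lo, avg, hi = measure_us(lambda: None, repeats=100)
    print(f"min {lo:.1f} us, mean {avg:.1f} us, max {hi:.1f} us")
```

Looking at the minimum and maximum separately, instead of only the average, makes it easier to see whether the samples cluster around multiples of some scheduling interval.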

xxxxx@rajko.info wrote:

Yes, I know that this chip is unsuitable for non-buffered/round-trip requests, and that for this reason most USB devices are MCU based. However, most MCU-based USB controllers are either too expensive, too slow (CPU-wise), or operate only in full-speed mode.

I just need a definite answer about the microframe granularity situation. I guess other high-speed devices (such as USB network adapters) would benefit greatly from being able to switch quickly between reads and writes, so the capability MIGHT be there.

Have you looked at the URB interface that USB client drivers use to
submit requests? That’s all there is. There is no back door. You can
have a read request outstanding at all times, waiting forever for the
device to respond, then submit a write request that triggers the read,
but you’ll still have to wait for the host controller interrupt before
the HCD can tell you that the read completed.
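
The ordering described above (read posted first, write submitted afterwards) can be sketched without any USB hardware. Everything here is hypothetical: `FakeDevice` is a stand-in that answers each write with one byte, and it says nothing about the FTDI protocol or about how WinUSB or the HCD behave internally. It only shows the pattern of keeping a read outstanding before the triggering write goes out.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class FakeDevice:
    """Hypothetical device: every write produces a one-byte response."""

    def __init__(self):
        self._responses = queue.Queue()

    def write(self, command: bytes) -> int:
        # The fake response is just the command length, truncated to one byte.
        self._responses.put(bytes([len(command) & 0xFF]))
        return len(command)

    def read(self, timeout_s: float = 1.0) -> bytes:
        # Blocks until a response is available, like a pending read request.
        return self._responses.get(timeout=timeout_s)


def poll_once(dev: FakeDevice, command: bytes) -> bytes:
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending_read = pool.submit(dev.read)    # read is outstanding first
        dev.write(command)                      # the write triggers the response
        return pending_read.result(timeout=2.0)


if __name__ == "__main__":
    print(poll_once(FakeDevice(), b"\x81\x00"))
```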


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Well, the default Linux EHCI driver’s interrupt rate is once per microframe. If the same is true for Windows, then you are saying that with asynchronous I/O and WinUSB I could achieve the same latency as I would by writing a kernel driver (which would be something like 0.4ms max if the interrupt is once per microframe, which is perfect)?

The answer is yes, but the “if” part is very probably false: Windows
most likely doesn’t do it. It is rather easy to run an experiment.
There is a common misperception that running in kernel mode makes
things faster than in user mode. Instead, raising thread priorities can
lead to better results.

With WinUsb you don’t need to write any driver, just an INF for your
device and a test app using the WinUsb API. Very good for verifying
concepts; later you can write your own kernel driver if necessary, or a
UMDF one, which also uses WinUsb.

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]

I know, I’ve used WinUSB before.
And this device will only be used by one application; it doesn’t have to conform to any standard at all, so if I could get away without writing any drivers, that would be great.
I just thought that the kernel-mode calls from a usermode app would take some microseconds/fractions of a millisecond, so a WinUSB application would have something like double the latency of a KMDF driver for that one write/read pair. But that seems to be wrong, and I could get the same latency from a WinUSB app (I’m fine with high priority as well) as from a kernel driver?

Very probably yes. Normally, when you can queue several URBs, this is
certain; in your suboptimal case there can be a small difference.
However, writing a kernel driver would be premature optimization. Try
WinUsb first, measure where the latency is coming from, and then you
can optimize if necessary. You’ll probably find what Tim already
described (1 ms scheduling), and a kernel driver won’t help with that
anyway.

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]

Well, when defining a USB endpoint descriptor, if you make it INTERRUPT, you have a field bInterval that, according to the USB specification, has the following unit:
High 125 usec
Full 1 msec
Low 1 msec
So, in order for Windows to respect the most frequent bInterval value (1) on high-speed INTERRUPT pipes, it would HAVE to communicate with the EHCI controller AT LEAST once per 125usec (1 microframe). Besides, during a microframe as many as 13-14 packets of data could be transferred, by more than one device (and when using “high bandwidth mode”, like for USB hard drives, it usually transfers something like 1024 bytes per packet * 10 times per microframe), so having the EHCI controller’s interrupt interval at once per microframe (125usec) might not be a bad idea, and this is probably why Linux has it at that value anyway. So why would Windows gimp itself and interrupt only every 1ms?
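
As a side note on the bInterval units listed above, this is a small sketch of how the field maps to a polling period for interrupt endpoints, as I understand the USB 2.0 endpoint descriptor definition: high-speed endpoints use an exponent (2^(bInterval-1) microframes, bInterval from 1 to 16), while full- and low-speed endpoints count whole frames (1 to 255). The function name is made up for the illustration.

```python
def interrupt_period_us(speed: str, b_interval: int) -> int:
    """Polling period in microseconds for an interrupt endpoint."""
    if speed == "high":
        if not 1 <= b_interval <= 16:
            raise ValueError("high-speed bInterval must be 1..16")
        # Exponent encoding: 2**(bInterval - 1) microframes of 125 us each.
        return 125 * 2 ** (b_interval - 1)
    if speed in ("full", "low"):
        if not 1 <= b_interval <= 255:
            raise ValueError("full/low-speed bInterval must be 1..255")
        # Linear encoding: bInterval frames of 1 ms each.
        return 1000 * b_interval
    raise ValueError(f"unknown speed: {speed!r}")


if __name__ == "__main__":
    for value in (1, 2, 4):
        print(value, interrupt_period_us("high", value))
```

So the shortest high-speed period is bInterval = 1 (125 us), and bInterval = 4 corresponds to one full 1 ms frame.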

Well, we can speculate about it, we can wait for an answer from the MS
devs here, or you can measure it. You can try to use ETW on Win7 for
the USB driver, use xperf, which could tell you the number of
interrupts, or just write an experimental app using WinUsb and use a
USB analyser to see what happens on the bus. Believe me, you’ll easily
see whether transfers are scheduled at the start of a 1 ms interval or
not.

BTW, I guess there is a difference between a transfer that is already
started (interrupt polling, data transfers), where all the necessary
time slots are already in use, and starting a new one, which is your
problem.

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]

xxxxx@rajko.info wrote:

Well, when defining a USB endpoint descriptor, if you make it INTERRUPT, you have a field bInterval that, according to the USB specification, has the following unit:
High 125 usec
Full 1 msec
Low 1 msec
So, in order for Windows to respect the most frequent bInterval value (1) on high-speed INTERRUPT pipes, it would HAVE to communicate with the EHCI controller AT LEAST once per 125usec (1 microframe).

No, it doesn’t. USB host controllers work with request chains. The HCD
sets up a (possibly long) list of requests to be handled in the next
frame, then hands the address of that list to the host controller. The
host controller hardware works down that list on its own, doing DMA
transfers in and out of memory, and fires an interrupt at some point to
let the HCD know that stuff is done. That’s all set up IN ADVANCE.
Nothing is inserted on the fly.

For isochronous pipes, Windows REQUIRES that your requests include
enough packets for an even multiple of whole frames.

Besides, during a microframe as many as 13-14 packets of data could be transferred, by more than one device (and when using “high bandwidth mode”, like for USB hard drives, it usually transfers something like 1024 bytes per packet * 10 times per microframe)…

Most hard drives use bulk pipes, 512 bytes per packet. USB cameras use
isochronous pipes, which can have 1024 byte packets, and up to 3
transfers per microframe.
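
To put those packet sizes into throughput terms, here is a rough sketch of the arithmetic. The 8000 microframes per second follows from 8 microframes per 1 ms frame. The figure of 13 bulk packets per microframe is taken from the lower end of the “13-14” mentioned in the question and should be read as a best-case assumption for an otherwise idle bus, not as something a real device will sustain.

```python
MICROFRAMES_PER_SECOND = 8 * 1000  # 8 microframes in each 1 ms frame


def bytes_per_second(packet_bytes: int, packets_per_microframe: int) -> int:
    """Upper bound on payload throughput for a given packet size and rate."""
    return packet_bytes * packets_per_microframe * MICROFRAMES_PER_SECOND


if __name__ == "__main__":
    # Isochronous, high bandwidth: 1024-byte packets, 3 per microframe.
    print(bytes_per_second(1024, 3))
    # Bulk: 512-byte packets, assumed best case of 13 per microframe.
    print(bytes_per_second(512, 13))
```

Both results are in the tens of megabytes per second, which is why streaming works well on this bus while the roughly 100KB/s from the original round-trip polling is limited by latency rather than by bandwidth.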

…so having the EHCI controller’s interrupt interval at once per microframe (125usec) might not be a bad idea, and this is probably why Linux has it at that value anyway. So why would Windows gimp itself and interrupt only every 1ms?

For streaming devices, like audio, video, and mass storage, the drivers
set up a largish buffer and submit that, so that a single request spans
many microframes, and possibly multiple frames. A faster interrupt rate
means higher overhead and less overall system responsiveness, and as I
said earlier in the day, it doesn’t provide any benefit for the vast
majority of devices.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

http://www.osronline.com/article.cfm?article=524, if you’re looking for info
about available usb analyzers.

Good luck,

mm
