USB Controller Question

The project I posted about before is moving along. Now that the
parallel and ISA drivers are working, the next step is to develop a new
USB interface to replace the old ones.

As a first step, my customer got an ISA-to-USB hardware interface as
well as another USB 1.1 compatible Full Speed interface. When tested,
these interfaces were glacially slow compared to the ISA or parallel
hardware.

The application’s architecture sends out a single byte at a time and
reads in data one byte per bus transaction. It’s not efficient, but
that’s how the system was originally architected. The external chassis
is little more than an extension of the ISA bus and the interface
hardware just transmits ISA signals to the external chassis.

I did some research and I can’t find it now, but apparently the bus
controllers for Full Speed have a per transaction delay. As I
understand it, UHCI hardware has a delay of 4 ms per bus transaction and
OHCI a 2 ms delay. I was trying to find more information today and all
I can find is some information saying the scheduler runs on a 1 ms
schedule. Nothing about the 4 ms or 2 ms timing. I don’t even know
what the 2 ms or 4 ms timing is called.

Running the initial test hardware and looking at it with a USB analyzer,
the timing is almost exactly 4 ms per transaction. The motherboard of
the test system is an Intel chipset, so it’s obviously a UHCI controller.

I’ve switched gears to the Cypress FX2 High Speed microcontroller,
which, of course, will be running with the EHCI controller. Will the
transaction timing be the same as with the UHCI, or will this improve?

I have several different firmware design options I can do, but the
transaction timing is a key to my decision. If I can get the throughput
timing down to about 1/10 the timing with the USB 1.1 controller, the
delays will be imperceptible to the user and it will be about as fast as
the current system with ISA. (The total data throughput on the system
is fairly low.)

I have figured out how to reduce the total number of transactions on the
bus by combining some data patterns into a single transaction, but
overall, I’m only going to be able to reduce the total number of
transactions by about 1/2 by doing this.

The easiest way is to do all the transactions as vendor functions on
EP0. I only need to create about 6 different functions which are all
similar. If the per transaction speed with High Speed is good enough,
then this is all I have to do.

If the per transaction speed is too slow, I’m going to have to do
something more complex. I thought the best way around this is with an
Isochronous endpoint that has a software FIFO in the Windows driver for
OUT transactions. The driver would spool up transactions as the
software sends them and then sends them out on the Isochronous schedule.
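
A minimal sketch of the spooling idea, written in Python rather than as driver code: writes are queued as the application issues them and drained once per scheduling tick. The class name OutSpooler, the two-byte (address, value) record layout, and the 64-byte default payload limit are all assumptions made for illustration; none of them come from the real driver.

```python
from collections import deque


class OutSpooler:
    """Toy software FIFO for OUT writes (hypothetical, not real driver code)."""

    def __init__(self, max_payload=64):
        # max_payload is an assumed per-tick payload limit in bytes.
        self.max_payload = max_payload
        self.fifo = deque()

    def write(self, address, value):
        # Queue one (address, value) pair; each pair costs two bytes.
        self.fifo.append((address & 0xFF, value & 0xFF))

    def drain(self):
        # Called once per scheduling tick: pack as many queued writes as fit.
        payload = bytearray()
        while self.fifo and len(payload) + 2 <= self.max_payload:
            payload.extend(self.fifo.popleft())
        return bytes(payload)
```

Anything that does not fit in one tick simply stays queued for the next one, so write ordering is preserved.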

This has its disadvantages though. Isochronous has no protection
against data loss and it makes everything more complex.

The best solution would probably be to move much of the application code
that accesses the hardware down to the firmware and just return the
results to the PC. The application does a lot more outs than ins.
Typically it sends a series of out bytes to set something up, then it
might read a status byte, or read a value from an A to D converter.

Doing this would likely break the schedule though. Rewriting all of
these functions would be a major task.

If the EP0 method works OK, but is a bit marginal, I can move one
application function to the firmware that should be fairly easy and does
make sense down there. At start up the application code now does a
search for installed hardware in the external chassis. There are 256
possible hardware addresses out in the chassis and it
tests them all by setting a card address, then testing the status byte
via an in call.

Moving this to the firmware and returning a block of 256 bytes via a
bulk transfer on call from the PC application would probably speed up
initialization dramatically. The FX2 would probably do the entire
hardware scan and have the results between the time that the technician
turned on the external chassis and started the application.
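
As a rough sketch of that scan, modeled in Python rather than FX2 firmware C: the loop sets each card address, reads the status byte, and collects a 256-byte block for the host to fetch in one transfer. The callbacks select_card and read_status, the FakeChassis stand-in, and the idea that an empty slot reads back as 0xFF are all assumptions for illustration.

```python
NO_CARD = 0xFF  # assumption: an empty slot reads back as 0xFF


def scan_chassis(select_card, read_status):
    """Probe all 256 card addresses; return one status byte per address."""
    results = bytearray(256)
    for address in range(256):
        select_card(address)                      # OUT: set the card address
        results[address] = read_status() & 0xFF   # IN: read the status byte
    return bytes(results)


def installed_cards(block):
    """Host side: addresses whose status is not the empty-slot value."""
    return [addr for addr, status in enumerate(block) if status != NO_CARD]


class FakeChassis:
    """Stand-in for the real bus so the sketch can run without hardware."""

    def __init__(self, cards):
        self.cards = cards        # maps address -> status byte
        self.selected = 0

    def select(self, address):
        self.selected = address

    def status(self):
        return self.cards.get(self.selected, NO_CARD)
```

The host then needs a single request to pull the whole block instead of 512 separate bus transactions.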

Anyway, what is critical for me to know is the transaction timing (I’m
probably using the wrong term for it) on the EHCI controller. If I’m
out to lunch on any of my other thoughts, please let me know. I’ve
worked with USB before, but due to the vagaries of R&D, the previous
projects ended up getting canceled before I got this far into them and
the last one was 10 years ago.

Bill

Bill Olson wrote:

> …my customer got an ISA to USB interface hardware as
> well as another USB 1.1 compatible Full Speed interface. When tested,
> these interfaces were glacially slow compared to the ISA or parallel
> hardware.
>
> The application’s architecture sends out a single byte at a time and
> reads in data one byte per bus transaction. It’s not efficient, but
> that’s how the system was originally architected.

Such a design is doomed on USB.

> I did some research and I can’t find it now, but apparently the bus
> controllers for Full Speed have a per transaction delay. As I
> understand it, UHCI hardware has a delay of 4 ms per bus transaction and
> OHCI a 2 ms delay. I was trying to find more information today and all
> I can find is some information saying the scheduler runs on a 1 ms
> schedule. Nothing about the 4 ms or 2 ms timing. I don’t even know
> what the 2 ms or 4 ms timing is called.

There is no such thing.

USB is a scheduled bus. The bus activity is divided up into “frames”,
each of which (for low-speed and full-speed) is 1 ms. The entire frame
is scheduled in advance, then loaded into the host controller to be
executed. If you do not have a transaction queued up in the host
controller driver waiting to be processed at the time the HCD is ready
to schedule the frame, you will miss that entire frame. The schedule is
not modified once submitted. The host controller then issues an
interrupt when the frame is complete.

So, if you have a driver design that sends a request, waits for it to be
completed, then does some processing and resubmits the request, the
performance will suck. By the time you get the first completion notice,
you have missed the next frame. That’s not a fault of the bus, that’s a
fault of the client driver, trying to misuse USB as something it is not.

You can submit read requests in advance. They will simply retry until
the device has data available. That would improve your latency.
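
To put rough numbers on the difference, here is a deliberately crude Python model. It assumes that a request rides in the first frame that starts after it was submitted and that completion is reported at the end of that frame. The function names, the 10 us of client processing, and the per_frame capacity are illustrative assumptions; real host controllers are more nuanced than this.

```python
def sync_loop_us(n, frame_us, think_us=10):
    """Submit, wait for completion, process, resubmit; repeated n times."""
    t = 0
    for _ in range(n):
        t += think_us                            # client processing before submit
        start = (t // frame_us + 1) * frame_us   # first frame not yet scheduled
        t = start + frame_us                     # completion reported at frame end
    return t


def queued_us(n, frame_us, per_frame):
    """All n requests queued up front; per_frame of them fit in each frame."""
    frames = -(-n // per_frame)                  # ceiling division
    return (1 + frames) * frame_us


print(sync_loop_us(100, 1000))    # one-at-a-time, 1 ms frames
print(sync_loop_us(100, 125))     # one-at-a-time, 125 us microframes
print(queued_us(100, 1000, 10))   # queued in advance, 1 ms frames
```

In this model the one-at-a-time loop costs two frame periods per request regardless of how little data moves, which is why queuing ahead matters more than raw bus speed.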

> I’ve switched gears to the Cypress FX2 High Speed microcontroller,
> which, of course, will be running with the EHCI controller. Will the
> transaction timing be the same as with the UHCI, or will this improve?

Probably. The bus is still all scheduled in advance, although now the
timing is done in units of microframes (125us) instead of frames (1ms).

> The easiest way is to do all the transactions as vendor functions on
> EP0. I only need to create about 6 different functions which are all
> similar. If the per transaction speed with High Speed is good enough,
> then this is all I have to do.

Control endpoint transactions are easier to handle in FX2 firmware, but
they are the slowest of all USB transactions. You should make an effort
to use bulk instead. Have you read the USB spec, or one of the books
like “USB Complete”? I wouldn’t want to consider building a USB device
without having a good background in how USB operates.

> If the per transaction speed is too slow, I’m going to have to do
> something more complex. I thought the best way around this is with an
> Isochronous endpoint that has a software FIFO in the Windows driver for
> OUT transactions. The driver would spool up transactions as the
> software sends them and then sends them out on the Isochronous schedule.

From a software standpoint, all of the (non-control) endpoints are
practically identical. They can all be queued up in advance. The only
difference is in how they are scheduled.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> Bill Olson wrote:

> …my customer got an ISA to USB interface hardware as
> well as another USB 1.1 compatible Full Speed interface. When tested,
> these interfaces were glacially slow compared to the ISA or parallel
> hardware.
>
> The application’s architecture sends out a single byte at a time and
> reads in data one byte per bus transaction. It’s not efficient, but
> that’s how the system was originally architected.

On 01/09/11 2:46 PM, Tim Roberts wrote:

Such a design is doomed on USB.

The data rate is low enough that I don’t think it’s doomed, but it does
present some engineering challenges.

> I did some research and I can’t find it now, but apparently the bus
> controllers for Full Speed have a per transaction delay. As I
> understand it, UHCI hardware has a delay of 4 ms per bus transaction and
> OHCI a 2 ms delay. I was trying to find more information today and all
> I can find is some information saying the scheduler runs on a 1 ms
> schedule. Nothing about the 4 ms or 2 ms timing. I don’t even know
> what the 2 ms or 4 ms timing is called.

There is no such thing.

I tracked down where I read it. It was with the owner of DeVaSys who
make a Full Speed USB test board. He called it the “per transfer
latency” and said it would be 2ms or 4ms depending on whether the
computer had an OHCI or UHCI controller. I saw exactly 4 ms timing with
the USB analyzer and his board. Since it was so precise, I figured it
was a Windows timing, but now I think his firmware or driver may have
had something that caused the longer delays. I find it odd that he
would have such a delay in his code.

USB is a scheduled bus. The bus activity is divided up into “frames”,
each of which (for low-speed and full-speed) is 1 ms. The entire frame
is scheduled in advance, then loaded into the host controller to be
executed. If you do not have a transaction queued up in the host
controller driver waiting to be processed at the time the HCD is ready
to schedule the frame, you will miss that entire frame. The schedule is
not modified once submitted. The host controller then issues an
interrupt when the frame is complete.

I know about the schedules and frames. I found a number of articles
that talked about the 1 ms frame rate with LS and FS buses.

So, if you have a driver design that sends a request, waits for it to be
completed, then does some processing and resubmits the request, the
performance will suck. By the time you get the first completion notice,
you have missed the next frame. That’s not a fault of the bus, that’s a
fault of the client driver, trying to misuse USB as something it is not.

I am combining calls where I can so for example what is done now as a
write to the driver, another write, and then a read status from the
driver will be done as one EP0 vendor call and the firmware will do the
two writes and read the result.
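
A sketch of what one such combined request could look like as a standard 8-byte USB SETUP packet, built in Python for illustration. The field layout (bmRequestType, bRequest, wValue, wIndex, wLength, little-endian) is the standard one; the request code 0xB0, the packing of the two write bytes into wValue, and the card address in wIndex are assumptions, not the actual firmware protocol.

```python
import struct

VENDOR_IN = 0xC0          # device-to-host, vendor type, device recipient
REQ_WRITE2_READ1 = 0xB0   # hypothetical vendor request code


def combined_setup(card_address, first_byte, second_byte):
    """Build a SETUP packet asking for two writes followed by one status read."""
    w_value = (first_byte & 0xFF) | ((second_byte & 0xFF) << 8)  # both write bytes
    w_index = card_address & 0xFF                                # target card
    w_length = 1                                                 # one status byte back
    return struct.pack("<BBHHH", VENDOR_IN, REQ_WRITE2_READ1,
                       w_value, w_index, w_length)


print(combined_setup(0x05, 0x12, 0x34).hex())
```

The firmware would unpack the two bytes, perform both writes, and return the status byte in the data stage, so three driver calls become one control transfer.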

Short of rearchitecting the entire application from scratch, that’s
about the best I can do.

You can submit read requests in advance. They will simply retry until
the device has data available. That would improve your latency.

I will submit them with the writes when I can.

> I’ve switched gears to the Cypress FX2 High Speed microcontroller,
> which, of course, will be running with the EHCI controller. Will the
> transaction timing be the same as with the UHCI, or will this improve?

Probably. The bus is still all scheduled in advance, although now the
timing is done in units of microframes (125us) instead of frames (1ms).

With the DeVaSys board, I had a 4 ms latency between each transaction.
If that latency with a HS bus is only 125us, that’s a 32X improvement
right off the top. I’m combining calls where I can, which will only be
about a 2X improvement on average, but that does raise the improvement
to around 64X.

With the DeVaSys board I was seeing it take about 10X more time for the
application to come up. If I only see an improvement of 20X, the USB
driver change will be invisible to the user for all practical purposes.

> The easiest way is to do all the transactions as vendor functions on
> EP0. I only need to create about 6 different functions which are all
> similar. If the per transaction speed with High Speed is good enough,
> then this is all I have to do.

Control endpoint transactions are easier to handle in FX2 firmware, but
they are the slowest of all USB transactions. You should make an effort
to use bulk instead. Have you read the USB spec, or one of the books
like “USB Complete”? I wouldn’t want to consider building a USB device
without having a good background in how USB operates.

Yes, I have read the spec, a few times in fact. I read it back when
I first worked with USB in 1998, refreshed my memory and learned about
USB 2 in 2001 when I did another USB project, and finally reread it for
this project.

I probably will implement a Bulk endpoint for one start up function
which will need to transfer 256 bytes to the host, but for most
transactions, I’d be sending 1-3 bytes across the Bulk endpoint. I
figured it was easier to implement and would not impact the system any
to do those transfers on the Control endpoint.

> If the per transaction speed is too slow, I’m going to have to do
> something more complex. I thought the best way around this is with an
> Isochronous endpoint that has a software FIFO in the Windows driver for
> OUT transactions. The driver would spool up transactions as the
> software sends them and then sends them out on the Isochronous schedule.

From a software standpoint, all of the (non-control) endpoints are
practically identical. They can all be queued up in advance. The only
difference is in how they are scheduled.

I was only going to implement an Isochronous endpoint if it bought me a
better frame rate than I could achieve with EP0 and it was necessary.
No need to build a super highway for an application that is still riding
a bicycle.

Thanks for your information,
Bill

> Moving this to the firmware and returning a block of 256 bytes via a
> bulk transfer on call from the PC application would probably speed up
> initialization dramatically.

I would say - this is the way to go.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com