USB Driver: Problem with performance in ISO transfer

Hello
I am writing a driver for a Cypress EZ-USB FX2 USB device.
The driver reads data from an isochronous pipe at about 8 megabytes per second (8184*1000 bytes/s).
It reads the data as a continuous isochronous stream: we submit 8 IRPs/URBs and then resubmit each URB from its completion routine. The completion routine does not copy the received data anywhere yet (in the future it will write the data to a cyclic buffer). The driver is already written and works fine.

I have a performance problem. On a Core 2 Duo computer I see a CPU load of about 6%, and about 12% on a notebook with an Intel Atom processor. Is this normal? I expected the CPU load to be about 1%.

=============================
Some extra information:

;; Endpoint Descriptor
db DSCR_ENDPNT_LEN ;; Descriptor length
db DSCR_ENDPNT ;; Descriptor type
db 82H ;; Endpoint number
db ET_ISO ;; Endpoint type
db 00H ;; Maximum packet size (LSB)
db 0CH ;; Max packet size (MSB) //2 Packets Per Microframe
db 01H ;; Polling interval // Every Microframe

I submit 8 IRPs/URBs. Each URB contains 128 packets. The packet size is 2048 bytes (because there are 2 transactions of 1024 bytes per microframe).
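
For reference, the two wMaxPacketSize bytes in the descriptor above can be decoded by hand. The sketch below assumes the USB 2.0 high-speed layout (bits 10..0 hold the bytes per transaction, bits 12..11 hold the number of additional transactions per microframe); the variable names are illustrative only and are not taken from the driver.

```python
# Decode the wMaxPacketSize bytes from the endpoint descriptor in the post.
W_MAX_PACKET_SIZE = (0x0C << 8) | 0x00   # MSB 0CH, LSB 00H -> 0x0C00

packet_bytes = W_MAX_PACKET_SIZE & 0x07FF              # bits 10..0
transactions = ((W_MAX_PACKET_SIZE >> 11) & 0x3) + 1   # bits 12..11, plus the first one

MICROFRAMES_PER_SECOND = 8000   # high speed: one microframe every 125 us
PACKETS_PER_URB = 128           # value stated in the post

bytes_per_microframe = packet_bytes * transactions
pipe_capacity = bytes_per_microframe * MICROFRAMES_PER_SECOND
urb_buffer_bytes = PACKETS_PER_URB * bytes_per_microframe

print(packet_bytes, transactions)   # 1024 2
print(bytes_per_microframe)         # 2048
print(pipe_capacity)                # 16384000
print(urb_buffer_bytes)             # 262144
```

Under these assumptions the pipe can carry 16 384 000 bytes/s, roughly twice the 8 184 000 bytes/s that the driver actually receives.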

xxxxx@spiritdsp.com wrote:

> I am writing a driver for a Cypress EZ-USB FX2 USB device.
> The driver reads data from an isochronous pipe at about 8 megabytes per second (8184*1000 bytes/s).

You should be getting 16 megabytes per second. Are half of your packets
coming back empty?

Why use isochronous for this? If you are submitting 128 packets per
URB, then clearly latency isn’t an issue, since that’s one URB every
16ms. You might do better with a bulk pipe.
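
The 16 ms figure follows from the microframe period. A minimal sketch, assuming one isochronous packet per 125 us microframe (polling interval 01H in the descriptor) and the 8 queued URBs mentioned in the original post; the function name is hypothetical:

```python
MICROFRAME_US = 125  # high-speed microframe period in microseconds

def urb_duration_ms(packets_per_urb, interval_microframes=1):
    """Time covered by one isochronous URB, in milliseconds."""
    return packets_per_urb * interval_microframes * MICROFRAME_US / 1000

one_urb_ms = urb_duration_ms(128)   # one URB of 128 packets
queued_ms = 8 * one_urb_ms          # all 8 URBs pending at once

print(one_urb_ms)   # 16.0
print(queued_ms)    # 128.0
```

So data from the first packet of an URB is not seen by the driver until about 16 ms after it was transferred.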

Are you configuring the FX2 for 2 transactions per microframe?

> I have a performance problem. On a Core 2 Duo computer I see a CPU load of about 6%, and about 12% on a notebook with an Intel Atom processor. Is this normal? I expected the CPU load to be about 1%.

That number seems a bit high, but it is not out of reason. On my 2.4GHz
Core 2 Quad, I can run my high-bandwidth web cam (which does 24 MB/s) in
preview mode with about 9% total CPU load.

The Atom number doesn’t surprise me. An Atom’s performance is about
half of a Celeron at the same speed. It was designed for minimal size
and power consumption, not for maximum performance.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Tim Roberts wrote:

> You should be getting 16 megabytes per second.
> Are half of your packets coming back empty?

Yes, half of my packets come back empty, with status XACT_ERROR.

> Are you configuring the FX2 for 2 transactions per microframe?

Yes. For this task I CAN use 1 transaction per microframe instead of 2, because the USB bandwidth with 1 transaction per microframe, 1024*8000 (8 192 000), is greater than 8184*1000 (8 184 000). BUT in an old topic on OSROnline you told me: [if we look at two adjacent microframes, the FIRST transaction can occur at the start of the FIRST microframe and the SECOND transaction can occur at the end of the SECOND microframe, and the internal buffer of the USB device can overflow].

> Why use isochronous for this?

I use ISO because my device is an analog-to-digital converter. The previous version of this driver used BULK, and I resubmitted the bulk URB from the completion routine. But I found that the internal buffer of my device sometimes overflowed with bulk.

> If you are submitting 128
> packets per URB, then clearly latency isn’t an issue,
> since that’s one URB every 16ms. You might do better with a bulk pipe.

Should I submit URBs with 512 or 1024 packets? Have I understood you correctly?

> That number seems a bit high, but it is not out of reason.
> On my 2.4GHz Core 2 Quad, I can run my high-bandwidth
> web cam (which does 24 MB/s) in preview mode with about 9% total CPU load.

What is preview mode?
What do you think the CPU load should be at 8 megabytes per second? About 3%?

xxxxx@spiritdsp.com wrote:

Tim Roberts wrote:

>> Are you configuring the FX2 for 2 transactions per microframe?
>>

> Yes. For this task I CAN use 1 transaction per microframe instead of 2, because the USB bandwidth with 1 transaction per microframe, 1024*8000 (8 192 000), is greater than 8184*1000 (8 184 000). BUT in an old topic on OSROnline you told me: [if we look at two adjacent microframes, the FIRST transaction can occur at the start of the FIRST microframe and the SECOND transaction can occur at the end of the SECOND microframe, and the internal buffer of the USB device can overflow].

It is theoretically possible. In real life, however, I doubt that this
is an issue. Most host controllers probably assign you the same spot in
every microframe.
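
To put numbers on that worst case: the sketch below assumes the device produces data at a constant 8184*1000 bytes/s and that, in the extreme, two consecutive transactions are separated by almost two microframes. Both are simplifying assumptions for illustration, not measurements from the FX2, and the names are hypothetical.

```python
DATA_RATE = 8184 * 1000          # bytes/s produced by the device (from the thread)
PACKET_BYTES = 1024              # one transaction per microframe
MICROFRAMES_PER_SECOND = 8000
MICROFRAME_US = 125

capacity = PACKET_BYTES * MICROFRAMES_PER_SECOND   # bytes/s the pipe can carry
headroom = capacity - DATA_RATE                    # spare bytes/s
headroom_pct = 100 * headroom / capacity

def bytes_produced(gap_us):
    """Bytes the device produces during gap_us microseconds (integer math)."""
    return DATA_RATE * gap_us // 1_000_000

regular = bytes_produced(MICROFRAME_US)        # same slot in every microframe
worst = bytes_produced(2 * MICROFRAME_US)      # start of one, end of the next

print(capacity, headroom, round(headroom_pct, 3))   # 8192000 8000 0.098
print(regular, worst)                               # 1023 2046
```

With one transaction per microframe the headroom is under 0.1%, and in the extreme case the device would have to buffer roughly 2046 bytes between transactions instead of 1023.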

Also, remember that the two transactions are related. If the first
transaction fails, because your device has no data, then the second
transaction will not be issued. You’ll wait until the next microframe.

> I use ISO because my device is an analog-to-digital converter. The previous version of this driver used BULK, and I resubmitted the bulk URB from the completion routine. But I found that the internal buffer of my device sometimes overflowed with bulk.

>> If you are submitting 128
>> packets per URB, then clearly latency isn’t an issue,
>> since that’s one URB every 16ms. You might do better with a bulk pipe.
>>

> Should I submit URBs with 512 or 1024 packets? Have I understood you correctly?

You cannot issue more than 255 packets in an URB. What I was saying was
this: Because you have so many packets in each URB, that means you won’t
get results back for a rather long time. With 128 packets, your URB
will not be completed for 16 milliseconds. That tells me that latency
must not be a critical concern for you. In that case a bulk pipe should
be fine, and you can get higher throughput with a bulk pipe.

>> That number seems a bit high, but it is not out of reason.
>> On my 2.4GHz Core 2 Quad, I can run my high-bandwidth
>> web cam (which does 24 MB/s) in preview mode with about 9% total CPU load.
>>

> What is preview mode?

It’s a web cam. Preview means I am viewing the images on the screen.

> What do you think the CPU load should be at 8 megabytes per second? About 3%?

It’s too hard to generalize. You could have other things going on, or
other devices running. Your processor could be a different clock
speed. Your memory speed could be different. Overall, your number is
within reason.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Tim Roberts wrote:

> Also, remember that the two transactions are related.
> If the first transaction fails, because your device has no data,
> then the second transaction will not be issued. You’ll wait until the next microframe.

I know about that.

> It is theoretically possible. In real life, however, I doubt that this is an issue.
> Most host controllers probably assign you the same spot in every microframe.

What would you advise, as a professional? Should I allow for the theoretically possible situation and set 2 transactions per microframe? Or should I set 1 transaction per microframe? Or is there another way?

And I repeat: in the earlier topic you pointed out a problem with time slots. Why have you changed your opinion?

> You cannot issue more than 255 packets in an URB.
> What I was saying was this: Because you have so many
> packets in each URB, that means you won’t get results
> back for a rather long time. With 128 packets, your URB
> will not be completed for 16 milliseconds. That tells me
> that latency must not be a critical concern for you. In that
> case a bulk pipe should be fine, and you can get higher
> throughput with a bulk pipe.

You are right! Latency is not critical for me. But I use an ISO pipe (instead of bulk) because:

  1. It is critical for me not to lose data (the device internal buffer overflow issue). I want a guarantee that the data is taken from the device every second.
  2. I do not want some flash disk (which may be connected to the same host controller) to disrupt the operation of my device.

Am I right?

>>> What do you think the CPU load should be at 8
>>> megabytes per second? About 3%?
> It’s too hard to generalize. You could have other things going on,
> or other devices running. Your processor could be a different clock
> speed. Your memory speed could be different.
> Overall, your number is within reason.

What CPU load would you see on your configuration (Core 2 Quad) if you removed the copying of the received video data and the display of the video on screen?