USB isochronous Super Speed max bandwidth

Hi there,

I have a USB 3.0 device that continuously streams data over an isochronous pipe (interval=1, imul=2, burst configurable on the device side).
I implemented a WDF USB kernel-mode driver that works with the device. The host runs Windows 10 x64 with the latest update.

My driver logic for starting data transmission is simple:

  1. The driver allocates 1000 URB requests; each request covers 8 microframes × packet size (the packet size depends on the current burst value). At interval=1 the driver completes 1000 requests per second, each containing 8 microframes, so 1000 URB requests should be enough for continuous reading from the isochronous pipe.
  2. Each time the completion callback is called, I just save the received data size and resubmit the same request (I reuse the request and its memory as MSDN describes). For now I don’t touch the payload; I only record the length and reuse the request.
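For reference, a sketch of the buffer sizing implied by the setup above, assuming the "burst" and "imul" knobs map to the SuperSpeed endpoint companion fields bMaxBurst and Mult (an assumption; the post doesn't say):

```c
/* A SuperSpeed isochronous endpoint moves up to wMaxPacketSize (1024)
   bytes per packet, (bMaxBurst + 1) packets per burst, and (Mult + 1)
   bursts per 125 us service interval (microframe). */
static unsigned bytes_per_microframe(unsigned max_burst, unsigned mult)
{
    return 1024u * (max_burst + 1) * (mult + 1);
}

/* One request as described above spans 8 microframes (= 1 ms at
   interval 1), so 1000 in-flight requests cover a full second of
   bus time. */
static unsigned bytes_per_request(unsigned max_burst, unsigned mult)
{
    return 8u * bytes_per_microframe(max_burst, mult);
}
```

At the endpoint maximum (bMaxBurst = 15, Mult = 2) this works out to 49,152 bytes per microframe and 393,216 bytes per request, which shows how quickly the buffers grow as burst rises.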

Everything looks pretty easy; however :smile:, I have an issue:

If the burst value is low, around 1-10, everything works fine.
If the burst value is 12-13, I see about 10 microframes per second with zero length, and I don’t know why. The device side reports the same: about 10 packets completed, but 0 bytes actually sent for them.
If the burst value is 14-15, I see about 100-200 zero-length microframes per second.

I think it might be a Windows driver issue, e.g. the driver not keeping enough requests queued for continuous transfer. However, increasing the number of URB requests does not help.
Unfortunately, I don’t see any errors on either the Windows or the device side.
The device sends the maximum data size allowed by the SuperSpeed configuration; it is test data.

My questions are:

  1. Why do I get those 0-length packets? Is that normal when the device side always sends full-size packets but the data is not actually transferred?
  2. I have not found any known issues with isochronous pipes on Windows. Does Windows really support the maximum SuperSpeed isochronous bandwidth, which is about 3 Gbps?
  3. Could you share any working sample that sustains continuous SuperSpeed isochronous transfer at 3 Gbps?

I suspect the Windows side, because if Windows did not read the data in time, this is exactly the behaviour I would expect.

Thanks in advance for any reply

1,000 outstanding requests is just silly. In fact, I would have expected some of them to fail; I thought the USB stack would not allow requests more than 255 microframes in the future. 50 would be better, and 10 would probably be more than enough. Remember, your completions are handled during an interrupt DPC. You don’t have to wait for a scheduler interval.

Those 0-byte packets are normal with an isochronous pipe. Isochronous requests work differently from all other USB request types. The stack doesn’t just fill the available buffer space sequentially. Instead, the packets are mapped 1:1 to microframes. When you submit a request with 8 packets, that request will span exactly 8 microframes. If your device doesn’t have anything to send during one of those microframes, then that packet will show 0 bytes. The buffer comes back discontiguous, and you have to pack the data yourself.
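The packing step Tim describes can be sketched like this; the struct is a simplified stand-in for the WDK’s USBD_ISO_PACKET_DESCRIPTOR (the real one also carries a Status field and lives in usb.h):

```c
#include <string.h>
#include <stddef.h>

/* Simplified stand-in for one per-packet descriptor in an isoch URB. */
struct iso_pkt {
    size_t offset;   /* offset of this packet's slot in the transfer buffer */
    size_t length;   /* bytes actually received for this packet */
};

/* Compact the per-packet payloads to the front of the buffer, skipping
   zero-length packets. Returns the total bytes of real data. Safe
   because descriptor offsets are ascending and the write position never
   passes the read position. */
static size_t pack_iso_buffer(unsigned char *buf,
                              const struct iso_pkt *pkts, size_t npkts)
{
    size_t out = 0;
    for (size_t i = 0; i < npkts; i++) {
        if (pkts[i].length == 0)
            continue;               /* device had nothing this microframe */
        memmove(buf + out, buf + pkts[i].offset, pkts[i].length);
        out += pkts[i].length;
    }
    return out;
}
```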

Thanks, Tim! Right, I used 1000 requests to investigate the issue; however, with just 10-50 requests I don’t get the expected bandwidth, i.e. the device side reports undelivered packets (the length of the sent data is 0). So far I don’t see any errors on the requests, and I don’t wait inside my completion callback (as you said, it runs at DPC level). The device side keeps a huge queue of requests so it can send data continuously; because of that, I think Windows should not receive 0-length packets at all. Is it possible to transfer 3 Gbps without losing data if both the Windows and device sides work well? In other words, is stable transmission at the maximum theoretical SuperSpeed bandwidth achievable?

What is your USB device? Are you using some standard USB chip? Remember that your device does not get to control the timing of the isochronous packets. When the host asks, either you are ready immediately, or you NAK. And if you NAK one transfer in a burst, you won’t get any more opportunities until the next microframe. An isochronous endpoint really has to be tuned to match the actual data availability from the device.

A maximum isochronous pipe can do 48 x 1024 x 8 bytes per millisecond, which is 393MB/s. I know people have achieved that on bulk pipes, but I haven’t heard a lot of isochronous performance quotes. It’s really hard to come up with a compelling use case for isochronous these days, since bulk performance is so good.
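Tim’s figure is easy to check: 48 packets of 1024 bytes per microframe, 8 microframes per millisecond.

```c
/* Maximum SuperSpeed isochronous payload: 48 packets x 1024 bytes per
   125 us microframe, 8 microframes per ms, 1000 ms per second. */
static unsigned long long iso_max_bytes_per_second(void)
{
    return 48ull * 1024 * 8 * 1000;   /* 393,216,000 bytes/s, ~393 MB/s */
}

/* In bits: ~3.15 Gbps, which is where the "about 3 Gbps" figure in the
   original question comes from. */
static double iso_max_gbps(void)
{
    return iso_max_bytes_per_second() * 8 / 1e9;
}
```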

Thank you, Tim!

I agree bulk performance is pretty good; however, bulk does not guarantee stable bandwidth. That is why I’m trying to get stable isochronous transmission; the best case for me is to achieve a stable maximum SuperSpeed bandwidth.
Do you think bulk can be used if I need a stable 3 Gbps every second?

Are these dedicated systems, where you control the set of peripherals, so you can be confident no one is trying to do a back up to a USB mass storage device, or trying to play a video to a USB monitor? If so, then you can be pretty confident of the bulk throughput. The maximum throughput for a USB 3 bulk pipe is about 20% higher than an isochronous pipe, which gives you a little bit of headroom.

However, isochronous should work, too. Here’s a document from Cypress that has a good discussion about isochronous performance on USB 3: https://www.cypress.com/file/125281/download .

@A_K
When you get zero length transfers, what is the status in the individual USBD_ISO_PACKET_DESCRIPTOR?
Is the ErrorCount in the isoch URB zero?
How much data is the device capable of generating each microframe?
What host controller are you using for testing? When it comes to USB 3 isoch, there can be a lot of variation in performance, as shown in the Cypress document that @Tim_Roberts linked above.

There are some subtle performance interactions with very high bandwidth isoch. Here are a couple of things that may help:

  1. Test first with all power saving features disabled: High Performance power plan, PCIe ASPM disabled, processor C-states disabled, etc. Most modern systems try very hard to lower power consumption but don’t seem to account for things like allocated isoch bandwidth. This isn’t a solution for an end product, but it can tell you whether the problem is you or the system.
  2. When selecting the burst and mult values for the isoch endpoints, don’t ask for more than you need.
  3. If possible, use physically contiguous memory.

I’ve tried to keep this “short” but if someone is interested in the why for #2 and 3, let me know.

Eric

@Eric_Wittmayer,

When you get zero length transfers, what is the status in the individual USBD_ISO_PACKET_DESCRIPTOR?
I thought I did not have errors for the 0-length packets, but actually the status is USBD_STATUS_ISO_TD_ERROR (0xC0030000). Also, some packets that are only partially delivered (less than a full packet) have an error status as well: USBD_STATUS_XACT_ERROR (0xC0000011). Do those errors indicate device-side issues?
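For anyone else checking the same thing, a sketch of how per-packet statuses can be tallied after a completed isoch URB; the constants are the values quoted above, and the sign-bit check mirrors the USBD_ERROR() macro from the WDK’s usb.h (the array here is a stand-in for walking the URB’s IsoPacket[] descriptors):

```c
#include <stdint.h>

/* Status values quoted in this thread; in a real driver they come
   from usb.h in the WDK. */
#define USBD_STATUS_ISO_TD_ERROR 0xC0030000u
#define USBD_STATUS_XACT_ERROR   0xC0000011u

/* USBD_STATUS error codes have the high bit set, so USBD_ERROR() in
   usb.h simply tests for a negative value. */
static int usbd_is_error(uint32_t status)
{
    return (int32_t)status < 0;
}

/* Count how many packets in one completed isoch URB failed, given the
   per-packet status codes. */
static unsigned count_failed_packets(const uint32_t *status, unsigned n)
{
    unsigned failed = 0;
    for (unsigned i = 0; i < n; i++)
        if (usbd_is_error(status[i]))
            failed++;
    return failed;
}
```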
How much data is the device capable of generating each microframe?
So far it is a test device; it generates the maximum packet size for each microframe according to the current burst and mult values.
I've tried to keep this "short" but if someone is interested in the why for #2 and 3, let me know.
It is clear to me, thanks a lot.

@Tim_Roberts thanks for the link.

Those errors mean device-side issues, right?

It means your device was not ready to send when the host sent the IN token, or it sent some of the burst but not all of it. You haven’t said anything about what chip you’re using or how you’re generating the data, but the timing in USB is very tight. When the host asks for data, you don’t get to loll around and send it when you’re ready. You have to be able to send RIGHT NOW. Otherwise, you miss the time slot and the rest of the microframe. So, for example, if you wait until you get the IN token to generate the data, you can’t possibly generate it fast enough to respond. You have to have the whole packet queued up in advance, so it can be transmitted as soon as the IN token arrives.

You haven't said anything about what chip you're using or how you're generating the data

I don’t know the details of the chip. I expect it should work fine as long as the chip supports USB 3.0, shouldn’t it?

So far, the device pre-allocates 128 packets with random data and sends those packets continuously.
Right after a packet has been sent, the device queues it again without changing the payload.
The size of one packet is the maximum allowed by the burst and mult parameters.
Because of that, the device-side queue always has packets to send.

@Tim_Roberts , thanks for your reply

I don’t know the details of the chip. I expect it should work fine as long as the chip supports USB 3.0, shouldn’t it?

Maybe you haven’t done a lot of hardware, but just because a chip supports USB 3.0 does not mean it supports it correctly or efficiently.

If you don’t know details about the chip, how have you configured it to generate data?

@Tim_Roberts, you are right, I have not done a lot of hardware; this is my first project.
I use the Linux gadget driver to develop a custom USB device. Everything else is handled by the Linux kernel and an RTOS.

Is it a Raspberry Pi?

It is custom hardware; it is not off-the-shelf. I see your point; I’m waiting for details to verify whether the chip supports the required bandwidth.

Solution:
The Linux USB gadget works well; it can achieve 3 Gbps for isochronous and a real 4 Gbps for bulk (USB 3.0).
The problem is in the host hardware: the xHCI controller, the RAM frequency, or something else.
Isochronous at 3 Gbps works perfectly on powerful PCs.