Revisiting USB Isoch, modifying buffer after submitting write URB

Andrew_V · August 17, 2009, 5:57pm

Hello all,

A few months back I posted a question about how to submit isoch USB write requests before the actual data was ready. In other words, I wanted to write to the TransferBuffer after invoking WdfRequestSend. I have had very poor success with this, which leads me to believe that there is a double buffering situation here. (In the earlier thread, I asked about a couple of very interesting-sounding, undocumented pipe constants – which Doron H. helpfully suggested not pursuing, since they’re unsupported in Vista.)

And as I’ve learned more about how DMA works, the double-buffering situation actually makes sense to me, because of “bounce buffering”. I’m guessing that the USB driver stack has to use “common buffer” memory in a specific range for the DMA transfer, and since my buffers are unlikely (?) to be in the “common buffer” range, the resulting mapping effectively double-buffers.

Is there any way that I can set the UrbIsochronousTransfer->TransferBuffer or TransferBufferMDL directly to a DMA common buffer so I can modify it closer to the time when the transfer is going on?

Here’s a comparison just for the sake of a sanity check: the Linux USB URB struct lets you specify a flag, URB_NO_TRANSFER_DMA_MAP, which causes it to use the “transfer_dma” URB member instead of “transfer_buffer” – i.e. it lets you directly specify a DMA common buffer so it can skip the step of mapping (and the double buffering that goes with that).

Does anybody know of anything like this on Windows for USB URBs? I’m doing low latency audio, and this could shave a couple of milliseconds off my latency (which is certainly a big deal). When I schedule my playback audio USB requests, I typically have to allow from a few to several milliseconds of scheduling delay to ensure that the requests get appropriately handled by the controller. It would be great if I could schedule further in advance, yet write the actual audio samples “just in time” to send.

Regards,
Andrew

Glen_Slick-1 · August 17, 2009, 6:51pm

Is your isoch device attached to a UHCI, OHCI, or EHCI host controller? If you are running a 64-bit system and have memory physically located above 4GB, that memory is not directly accessible to UHCI and OHCI host controllers, as they are limited to 32-bit address DMA. Some EHCI controllers support 64-bit address DMA, but that is not supported in Windows Vista, only Windows 7. Any transfer buffers that are not directly accessible by the host controller must be double-buffered by the OS.
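As a rough illustration of the 4GB limit, here is a minimal C sketch of the kind of check involved. The name `fits_32bit_dma` is hypothetical (it is not a Windows API; the OS makes this decision internally), and it assumes the range passed in is a single physically contiguous run, so a multi-page buffer would have to be checked page by page.

```c
#include <stdbool.h>
#include <stdint.h>

/* 4 GB: the first physical address a 32-bit DMA engine cannot reach. */
#define DMA32_LIMIT UINT64_C(0x100000000)

/* Hypothetical helper: returns true if every byte of the physically
   contiguous range [phys, phys + len) lies below 4 GB, i.e. a controller
   limited to 32-bit DMA addresses could reach it without the OS
   substituting a double buffer. */
bool fits_32bit_dma(uint64_t phys, uint64_t len)
{
    /* Compare via subtraction so that phys + len can never overflow. */
    return phys <= DMA32_LIMIT && len <= DMA32_LIMIT - phys;
}
```

For example, a 4 KB page at physical address 0xFFFFF000 ends exactly at 4 GB and passes, while the same page at 0x100000000 would not.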

Also, the transfer buffer for any single packet on a UHCI host controller cannot span a page break so any single packet transfer buffer which spans a page break must be double-buffered on a UHCI host controller.
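The UHCI restriction comes down to whether a packet's bytes fall within a single page. A minimal sketch, assuming 4 KB pages (the x86/x64 default) and a hypothetical helper name: since the offset within a page is the same in the virtual and physical address, the check can be made on the virtual address of the packet buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if [addr, addr + len) touches more than one
   page. On UHCI such a packet buffer cannot be described by a TD's single
   buffer pointer, so it would have to be double-buffered. */
bool spans_page_boundary(uintptr_t addr, size_t len, size_t page_size)
{
    if (len == 0)
        return false;
    uintptr_t first_page = addr / page_size;
    uintptr_t last_page = (addr + len - 1) / page_size;
    return first_page != last_page;
}
```

With 4096-byte pages, a 1024-byte packet at offset 3072 stays within one page, but a 1025-byte packet at the same offset crosses into the next.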

Also, if you enable Driver Verifier DMA verification it will double buffer all USB transfer buffers.

Andrew_V · August 17, 2009, 7:26pm

Thanks for the info Glen, that is valuable to know. This is a high speed only device (EHCI). It sounds like you are saying that the OS will automatically detect whether the buffer I provide is acceptable for the DMA operation and leave it alone if so.

So is it the case on 32 bit XP (or higher), that:

if I don’t have driver verifier DMA checking turned on, and
if I make sure none of my packets’ portion of the TransferBuffer spans a page boundary

… then double buffering should not occur?

And then perhaps for 64 bits I could use MmAllocatePagesForMdl specifying the LowAddress and HighAddress parameters to ensure I’m in a 32 bit address range?

Tim_Roberts · August 17, 2009, 8:43pm

You are hyper-micro-optimizing here. The cost of an additional copy, if
it does occur, is irrelevant. Making a complete copy of a maximum
bandwidth isochronous stream is well under 1% CPU utilization on today’s
processors.
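To put a number on "maximum bandwidth", here is the arithmetic as a small C sketch. It assumes a high-speed, high-bandwidth isochronous endpoint moving the USB 2.0 maximum of three 1024-byte transactions in each 125-microsecond microframe (8000 microframes per second); the macro and function names are illustrative only.

```c
#include <stdint.h>

/* USB 2.0 high speed: one microframe every 125 us. */
#define HS_MICROFRAMES_PER_SEC 8000u
/* Assumption: high-bandwidth isoch endpoint, 3 transactions x 1024 bytes. */
#define HS_MAX_ISOCH_BYTES_PER_UFRAME (3u * 1024u)

/* Bytes per second moved by an isoch stream that carries
   bytes_per_uframe bytes in every microframe. */
uint64_t isoch_bytes_per_second(uint32_t bytes_per_uframe)
{
    return (uint64_t)bytes_per_uframe * HS_MICROFRAMES_PER_SEC;
}
```

At the maximum this comes to 24,576,000 bytes per second, roughly 24.6 MB/s, which is a light load for a memory copy on a current CPU.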

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Andrew_V · August 17, 2009, 9:56pm

Hello Tim,

As you correctly point out, the CPU cost of an extra buffer copy is not very important here. What is important is the fact that I am required to schedule my write request 3-4 milliseconds before the first microframe / packet in the URB is actually written to the device, and that I don’t seem to have the opportunity to modify the contents of the buffer after it’s scheduled. In other words, I’m not worried about CPU cycles, I’m trying to minimize latency.

Currently I’m providing a user option in my driver to add an extra millisecond of latency to the write scheduling. I find that pretty lame, but I’m currently not sure how to get around it. I’ve found that some controllers have lots of errors unless I give them that little bit of extra time to handle the request. I wish I could schedule my writes the same way I do the reads – queue up a bunch of them well in advance. So I would like to rely on knowing that I had a DMA common buffer, and be able to write the data “just in time”, 1 or 2 ms before it was accessed by DMA.

My driver can currently achieve around 5 ms of round trip audio latency, using 1 ms requests. I’d love to take that down to about 3 ms (or better yet, convert my currently “wasted” time into more jitter allowance for the user space I/O).

Regards,
Andrew

Andrew_V · August 18, 2009, 2:46pm

I will plan to experiment more on this, and let others know what I find out. Of course it would be great to get more hints from those with access to the OS source. ;^)

Right away I can say that my packet buffers do span page boundaries, so there’s no question they don’t properly comply with DMA requirements. I currently allocate a pretty large, multi-page memory buffer on startup, and then divide it up for requests and then again for packets, ignoring page boundaries.
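One way to carve the big buffer up so that no packet straddles a page is sketched below. It is a minimal sketch under stated assumptions: the buffer starts on a page boundary (e.g. it came from a page-granular allocation), no single packet is larger than a page, and `layout_packets` is a hypothetical helper, not part of any Windows API. Per Glen's follow-up, this only matters on UHCI controllers.

```c
#include <stddef.h>

/* Hypothetical helper: assign byte offsets to `count` packets of
   `packet_size` bytes inside a page-aligned buffer so that no packet
   crosses a page boundary. Offsets are written to offsets[0..count-1].
   Returns the total buffer size required, or 0 if a packet cannot fit
   in a single page. */
size_t layout_packets(size_t packet_size, size_t count, size_t page_size,
                      size_t *offsets)
{
    if (packet_size == 0 || packet_size > page_size)
        return 0;

    size_t off = 0;
    for (size_t i = 0; i < count; i++) {
        size_t room = page_size - (off % page_size); /* bytes left in page */
        if (packet_size > room)
            off += room;        /* skip the page tail, start on next page */
        offsets[i] = off;
        off += packet_size;
    }
    return off;
}
```

With 1000-byte packets and 4096-byte pages, the first four packets sit at offsets 0, 1000, 2000 and 3000; the fifth would cross the boundary, so it moves to offset 4096 and 96 bytes of the first page go unused.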

Regards,
Andrew

Glen_Slick-1 · August 18, 2009, 5:03pm

Packet buffers spanning discontiguous page boundaries only cause packet buffer double buffering on UHCI controllers, as UHCI controller TDs (Transfer Descriptors) contain only a single physical address pointer, which cannot describe a buffer with discontiguous pages.

If your device is operating at high-speed downstream of an EHCI controller then transfer buffer page boundaries are not an issue. EHCI controller TDs have multiple physical address pointers and fully support transfer buffers which span discontiguous page boundaries. The same is true for OHCI controller TDs.
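To make the difference concrete: the number of physical page pointers a buffer needs is simply the number of pages it touches. The sketch below counts them (hypothetical helper name, 4 KB pages assumed). For reference, the EHCI specification gives an isochronous TD (iTD) seven buffer page pointers, and the most one iTD can carry (8 microframes x 3 x 1024 bytes = 24 KB) touches at most seven 4 KB pages at any starting offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of pages touched by [addr, addr + len),
   i.e. how many physical page pointers a TD would need to describe it. */
size_t pages_touched(uintptr_t addr, size_t len, size_t page_size)
{
    if (len == 0)
        return 0;
    return (size_t)((addr + len - 1) / page_size - addr / page_size) + 1;
}
```

A UHCI TD can describe a packet only when this returns 1; an EHCI TD can handle larger counts because each page gets its own pointer.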