USBXHCI In-flight Bulk URBs limit

Dear OSR community,
I’m trying to understand how the Cypress USB3 Windows device driver works. I’m still a beginner when it comes to the inner workings of Windows device drivers, so feel free to point me to any learning resource that can help.
My application enqueues multiple URB requests to a Bulk IN endpoint to keep the host driver always busy with transfers.
On some systems, high spikes in DPC latency cause the WDF framework or the USB host driver to stop processing completed URB requests for a while (sometimes for as long as 50-70 ms). If a sufficient number of URBs has already been submitted to the host controller, however, transfers should continue for at least some time. This does not seem to happen: the USBXHCI host controller driver apparently cannot have more than 5 in-flight URB requests per bulk endpoint. I can see this in WinDbg with the command !usb3kd.xhci_deviceslots:

[7] : dt USBXHCI!_ENDPOINT_DATA 0xffff8308439baa60 dt USBXHCI!_ENDPOINT_CONTEXT32 0xffff83083e3250e0 ES_RUNNING
    ------------------------------------------------------------------------------------------
        EndpointType_BulkIn Address: 0x83 PacketSize: 1024 Interval: 0
        !ucx_endpoint 0xffff83083519ec80 !rcdrlogdump USBXHCI -a 0xffff83083e5a18a0
        !xhci_esm 0xffff8308439baa60 

        dt USBXHCI!_TR_DATA 0xffff83083f760e10 Mapping State: MS_Paused
        [0] : dt USBXHCI!_BULK_DATA 0xffff83083f760e10
        ------------------------------------------------------------------------------
            WdfQueue: !wdfqueue 0x7cf7c089f5c8 (27 waiting)

            PendingTransferList:
            --------------------
            [0] dt USBXHCI!_BULK_TRANSFER_DATA 0xffff83083d3adcc0 !urb 0xffff830838004870 !wdfrequest 0x7cf7c2c524d8 
                [0] dt USBXHCI!_BULK_STAGE_DATA 0xffff83083d3add50 !xhci_transfertrbs 0xffff83083d3addb0 
            [1] dt USBXHCI!_BULK_TRANSFER_DATA 0xffff830844ccccd0 !urb 0xffff8308380049c0 !wdfrequest 0x7cf7bb3334c8 
                [0] dt USBXHCI!_BULK_STAGE_DATA 0xffff830844cccd60 !xhci_transfertrbs 0xffff830844cccdc0 
            [2] dt USBXHCI!_BULK_TRANSFER_DATA 0xffff83083e710a50 !urb 0xffff8308380027a0 !wdfrequest 0x7cf7c18ef748 
                [0] dt USBXHCI!_BULK_STAGE_DATA 0xffff83083e710ae0 !xhci_transfertrbs 0xffff83083e710b40 
            [3] dt USBXHCI!_BULK_TRANSFER_DATA 0xffff830841fe5ce0 !urb 0xffff830838002b90 !wdfrequest 0x7cf7be01a4b8 
                [0] dt USBXHCI!_BULK_STAGE_DATA 0xffff830841fe5d70 !xhci_transfertrbs 0xffff830841fe5dd0 
            [4] dt USBXHCI!_BULK_TRANSFER_DATA 0xffff83083fd621c0 !urb 0xffff830838002ce0 !wdfrequest 0x7cf7c029dfd8 
                [0] dt USBXHCI!_BULK_STAGE_DATA 0xffff83083fd62250 !xhci_transfertrbs 0xffff83083fd622b0

Out of the 32 URB entries in the WDF queue, 5 are pending and 27 are waiting. Although I don’t understand exactly what that means, the limit seems to be controlled by a variable called MaxPendingStages, which can be found in the USBXHCI!_BULK_DATA data structure associated with the endpoint.

dt USBXHCI!_BULK_DATA 0xffff83083f760e10
[...]
+0x154 AttemptMapping   : 0n0
+0x158 MaxPendingStages : 5
+0x15c PendingStageCount : 5
[...]

I tried changing this value manually in the debugger: it does change the number of pending requests, and data transmission still works.
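
To make the observation concrete, here is a toy model of what the dump seems to show. It is only a sketch of the observed behaviour, not the USBXHCI implementation: the class ToyBulkEndpoint and its methods are hypothetical names, and it assumes one stage per URB (as in the dump above, where every transfer has a single stage [0]).

```python
from collections import deque


class ToyBulkEndpoint:
    """Toy model of the observed behaviour; NOT the real USBXHCI logic."""

    def __init__(self, max_pending_stages):
        # Mirrors the MaxPendingStages field seen in _BULK_DATA (assumed meaning).
        self.max_pending_stages = max_pending_stages
        self.waiting = deque()   # requests still sitting in the WDF queue
        self.pending = []        # requests handed to the controller

    def submit(self, urb_id):
        self.waiting.append(urb_id)
        self._promote()

    def complete_oldest(self):
        # Completing a pending request frees a slot for a waiting one.
        done = self.pending.pop(0)
        self._promote()
        return done

    def _promote(self):
        # Move waiting requests to pending until the limit is reached.
        while self.waiting and len(self.pending) < self.max_pending_stages:
            self.pending.append(self.waiting.popleft())


def snapshot(max_pending_stages, submitted=32):
    """Return (pending, waiting) counts after submitting `submitted` URBs."""
    ep = ToyBulkEndpoint(max_pending_stages)
    for urb_id in range(submitted):
        ep.submit(urb_id)
    return len(ep.pending), len(ep.waiting)


print(snapshot(5))  # (5, 27), the split shown in the dump
print(snapshot(8))  # (8, 24), after raising the limit as I did in the debugger
```

With 32 submitted requests and a limit of 5, the model reproduces the 5 pending / 27 waiting split; raising the limit only changes how many requests move out of the queue.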

I found on this forum an old thread where a similar limit is mentioned, but I don’t know if these statements are still valid today.

Is there a reason why this limit is set to 5? If possible, can it be changed using the WDF API or the system configuration?
It’s really hard to get any information about this variable and the inner workings of the USBXHCI driver. Do you know where I can get more info about this?

Thank you for your help,
Kind regards

Well, the guy with the answers about EHCI (Glen Slick) was one of the USB devs at MSFT (and a regular contributor at USBIF). So he was in a position to know. And it seems likely that the XHCI driver has been optimized according to the same parameters as the EHCI driver was/is.

But that’s all a guess. And I miss Glen here on the forum.

How large are your URBs? USB is all scheduled in advance, and the XHCI driver only needs to pull from its queue enough requests to fill the currently running frame, and the frame it is scheduling next. If 5 of your transfers span 2 frames, then the XHCI driver can’t use any more than that.

Thank you for your kind answers.

@“Peter_Viscarola_(OSR)” said:
Well, the guy with the answers about EHCI (Glen Slick) was one of the USB devs at MSFT (and a regular contributor at USBIF). So he was in a position to know. And it seems likely that the XHCI driver has been optimized according to the same parameters as the EHCI driver was/is.

But that’s all a guess. And I miss Glen here on the forum.

That explains a lot. Does he no longer answer here?

@Tim_Roberts said:
How large are your URBs? USB is all scheduled in advance, and the XHCI driver only needs to pull from its queue enough requests to fill the currently running frame, and the frame it is scheduling next. If 5 of your transfers span 2 frames, then the XHCI driver can’t use any more than that.

My URB buffers are 48*8 KiB in size and I’m working with a 48 KiB transfer size per frame, so a single URB should cover more than 2 frames of transfers. I’ve already tried changing this size and the maximum number of pending stages does not change, so it seems to be some sort of hard-coded limit.
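
As a rough sanity check on these numbers, the sketch below computes how many frames one URB covers and how long 5 pending URBs would keep the bus busy. The frame duration is an assumption: the code tries both 1 ms (a frame) and 0.125 ms (a microframe), since the figures above do not say which one applies. The function name coverage is hypothetical.

```python
KIB = 1024


def coverage(urb_bytes, bytes_per_frame, pending_urbs, frame_ms):
    """Return (frames covered by one URB, ms covered by all pending URBs)."""
    frames_per_urb = urb_bytes / bytes_per_frame
    covered_ms = pending_urbs * frames_per_urb * frame_ms
    return frames_per_urb, covered_ms


urb_bytes = 48 * 8 * KIB      # URB buffer size quoted above
bytes_per_frame = 48 * KIB    # transfer size per frame quoted above
pending_urbs = 5              # MaxPendingStages seen in the dump

# frame_ms is an assumption: 1 ms frame or 125 us microframe.
for frame_ms in (1.0, 0.125):
    frames_per_urb, covered_ms = coverage(
        urb_bytes, bytes_per_frame, pending_urbs, frame_ms
    )
    print(f"frame={frame_ms} ms: {frames_per_urb:.0f} frames per URB, "
          f"{covered_ms:.0f} ms covered by {pending_urbs} pending URBs")
```

Under the 1 ms assumption, one URB covers 8 frames and 5 pending URBs cover 40 ms; under the 0.125 ms assumption they cover 5 ms.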

That explains a lot. Does he no longer answer here?

Mr. Slick has gone on to bigger and better things at MSFT.

The key point is that this is a total non-issue. The host controller drivers don’t schedule more than one frame ahead, so they CANNOT USE any requests that are more than 2ms out. Those requests will sit in the queue until they are needed to schedule a frame, and that’s exactly where they ought to be. You are trying to make a problem out of something that is simply not a problem.