64-bit address sent to an xHCI USB host that supports only 32-bit addresses

Hi all,

Recently we ran a third-party application/Windows driver package on 64-bit Windows 8.1 for a loopback test.

We found the following:
Our PCIe device (an xHCI USB host) claims only 32-bit support in its capability registers.

But the driver gives the device a 64-bit physical address (seemingly at random), for example
0x0000 0001 C845 3100, where bit 32 is not 0.

Our device therefore treats the address as 0x0000 0000 C845 3100, where bit 32 is 0. This inevitably causes the wrong behavior: the loopback data-integrity check fails.

My question is:
According to the xHCI spec (quoted below), our xHCI host implementation is correct.
What could cause the host driver to program a physical/bus address above 4 GB into an xHCI host that supports only 32-bit addressing?
************************************************************
64-bit Addressing Capability (AC64). This flag documents the addressing range capability
of this implementation. The value of this flag determines whether the xHC has implemented the
high order 32 bits of 64 bit register and data structure pointer fields. Values for this flag have
the following interpretation:
Value  Description
0      32-bit address memory pointers implemented
1      64-bit address memory pointers implemented
If 32-bit address memory pointers are implemented, the xHC shall ignore the high order 32 bits
of 64 bit data structure pointer fields, and system software shall ignore the high order 32 bits of
64 bit xHC registers.

*****************************************************
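A minimal C sketch of what the AC64 rule above implies for the example address. It assumes AC64 is bit 0 of the HCCPARAMS1 capability register, as in the xHCI register layout; the helper names are hypothetical and not taken from any driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed: AC64 is bit 0 of the HCCPARAMS1 capability register. */
#define XHCI_HCCPARAMS1_AC64 (1u << 0)

static bool xhci_supports_64bit(uint32_t hccparams1)
{
    return (hccparams1 & XHCI_HCCPARAMS1_AC64) != 0;
}

/* The address a controller actually dereferences for a 64-bit pointer
 * field. With AC64 = 0 the high 32 bits are ignored, so anything at or
 * above 4 GB silently aliases into the low 4 GB. */
static uint64_t xhci_effective_address(uint32_t hccparams1, uint64_t pointer)
{
    if (xhci_supports_64bit(hccparams1))
        return pointer;
    return pointer & UINT64_C(0xFFFFFFFF);
}

/* The check a driver would need before writing a pointer into a TRB,
 * context or ring register of this controller. */
static bool xhci_pointer_is_reachable(uint32_t hccparams1, uint64_t pointer)
{
    return xhci_effective_address(hccparams1, pointer) == pointer;
}
```

With AC64 = 0, xhci_effective_address(0, 0x00000001C8453100) yields 0xC8453100, which is exactly the aliasing described in the question: the controller DMAs into the wrong region and the loopback data comes back corrupted.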

  1. Can this be caused by the driver incorrectly setting the DEVICE_DESCRIPTION members Dma64BitAddresses and Dma32BitAddresses?

Does setting them mean that the DMA device supports 64 bits or not?
Why does DEVICE_DESCRIPTION have two members instead of just one, such as Dma64BitAddresses, where TRUE means 64-bit is supported and FALSE means it is not?

  2. Should a 32-bit-only xHCI host, or any other 32-bit-only device, only ever be given physical addresses between 0 and 4 GB?

How can such a 32-bit-only device use physical memory at addresses above 4 GB?

  3. From the WDK 7600 help:

**************************************************
The InterfaceType specifies the bus interface. At present, its value can be one of the following: Internal, Isa, Eisa, or PCIBus. Additional types of buses will be supported in future versions of the operating system. The upper bound on the types of buses supported is always MaximumInterfaceType.

If the ScatterGather member is set to TRUE and the InterfaceType member is set to PCIBus, the Dma32BitAddresses member is ignored and the device is assumed to support 32-bit DMA addresses.
****************************************************

  4. Does Win8.1 still support only these four types? What about PCIe?
  5. When ScatterGather is TRUE and InterfaceType is PCIBus, is only 32-bit addressing supported? If so, why?

Are you using the Microsoft xHCI driver? It may not support 32-bit addressing. Why would it bother? All serious host vendors implement 64 bits; why don’t you?

workingmailing@163.com wrote:

Recently we ran a third-party application/Windows driver package on 64-bit Windows 8.1 for a loopback test.

We found the following:
Our PCIe device (an xHCI USB host) claims only 32-bit support in its capability registers.

This is a disastrous design decision on your part. Your device is
screwed. It can never possibly be competitive with other XHCI
products. USB host controllers live and die by scatter/gather DMA. In
your case, every transfer will have to be copied into a bounce buffer
below the 4GB mark. That extra copy will render your device useless.
You certainly cannot keep up with SuperSpeed speeds.

  1. Can this be caused by the driver incorrectly setting the DEVICE_DESCRIPTION members Dma64BitAddresses and Dma32BitAddresses?

Absolutely. If your DMA only supports 32 bits, then you need to set
Dma32BitAddresses. That is exactly what it’s for.

Does setting them mean that the DMA device supports 64 bits or not?
Why does DEVICE_DESCRIPTION have two members instead of just one, such as Dma64BitAddresses, where TRUE means 64-bit is supported and FALSE means it is not?

There are three generic classes of DMA support. Some antique devices
support 24-bit addresses, many support 32-bit addresses, many support
64-bit addresses. So, your choices are:

24 bit? Dma32 = FALSE, Dma64 = FALSE
32 bit? Dma32 = TRUE, Dma64 = FALSE
64 bit? Dma32 = TRUE, Dma64 = TRUE
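The table can be written as a small C helper. This is a sketch: struct dma_width_flags is a hypothetical stand-in for the two BOOLEAN members of DEVICE_DESCRIPTION, not the real structure. Widths between the classes (for example a 40-bit engine) are rounded down, because claiming a wider class than the hardware implements would let the system hand the device addresses it cannot reach.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two DEVICE_DESCRIPTION flags. */
struct dma_width_flags {
    bool dma32;  /* corresponds to Dma32BitAddresses */
    bool dma64;  /* corresponds to Dma64BitAddresses */
};

/* Pick the flag pair for a bus master that drives `address_bits` address
 * lines, following the three rows of the table:
 *   24-bit -> neither, 32-bit -> Dma32 only, 64-bit -> both.
 * Anything in between is rounded down to the next smaller class. */
static struct dma_width_flags flags_for_address_bits(unsigned address_bits)
{
    struct dma_width_flags f = { false, false };

    assert(address_bits >= 24 && address_bits <= 64);
    if (address_bits >= 32)
        f.dma32 = true;
    if (address_bits == 64)
        f.dma64 = true;
    return f;
}
```

For an xHCI controller that reports AC64 = 0, flags_for_address_bits(32) gives Dma32 = TRUE, Dma64 = FALSE.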

  2. Should a 32-bit-only xHCI host, or any other 32-bit-only device, only ever be given physical addresses between 0 and 4 GB?

How can such a 32-bit-only device use physical memory at addresses above 4 GB?

When you call the DMA_OPERATIONS.MapTransfer or the WdfDmaTransaction
APIs, the kernel will allocate a 64kB “bounce buffer” in the region
below 4GB. The incoming DMA operation is chopped up into 64kB pieces.
The kernel copies your incoming buffer, one chunk at a time, to the
bounce buffer, and then calls your DMA handler repeatedly to complete
the transfer.
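A simplified user-mode model of the chunking described above, host-to-device direction only. The 64kB size comes from the description; the callback type and function names are hypothetical, and the real HAL/WDF code path is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOUNCE_SIZE (64u * 1024u)  /* 64 kB bounce buffer below 4 GB */

/* Hypothetical stand-in for "your DMA handler": receives the bounce buffer
 * and the number of valid bytes for one chunk. */
typedef void (*dma_chunk_fn)(const uint8_t *bounce, size_t len, void *ctx);

/* Copy the caller's buffer into the bounce buffer one chunk at a time and
 * hand each chunk to the handler. Returns the number of handler calls,
 * i.e. the number of separate DMA operations. */
static size_t bounce_transfer_out(const uint8_t *src, size_t total,
                                  uint8_t *bounce, dma_chunk_fn handler,
                                  void *ctx)
{
    size_t calls = 0;

    assert(bounce != NULL && handler != NULL);
    for (size_t off = 0; off < total; off += BOUNCE_SIZE) {
        size_t len = total - off < BOUNCE_SIZE ? total - off : BOUNCE_SIZE;
        memcpy(bounce, src + off, len);  /* the extra copy per chunk */
        handler(bounce, len, ctx);
        calls++;
    }
    return calls;
}

/* Example handler for a loopback-style check: accumulates a byte sum. */
static void sum_chunk(const uint8_t *bounce, size_t len, void *ctx)
{
    uint64_t *sum = ctx;
    for (size_t i = 0; i < len; i++)
        *sum += bounce[i];
}
```

A 200,000-byte buffer, for example, becomes four separate DMA operations (three full 64kB chunks plus a remainder), each preceded by a copy.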

  3. From the WDK 7600 help:

**************************************************
The InterfaceType specifies the bus interface. At present, its value can be one of the following: Internal, Isa, Eisa, or PCIBus. Additional types of buses will be supported in future versions of the operating system. The upper bound on the types of buses supported is always MaximumInterfaceType.

If the ScatterGather member is set to TRUE and the InterfaceType member is set to PCIBus, the Dma32BitAddresses member is ignored and the device is assumed to support 32-bit DMA addresses.
****************************************************

  4. Does Win8.1 still support only these four types? What about PCIe?

It is operationally identical to PCI.

  5. When ScatterGather is TRUE and InterfaceType is PCIBus, is only 32-bit addressing supported? If so, why?

You misunderstand. Dma32BitAddresses does not mean you support ONLY
32-bit addressing. It means you support AT LEAST 32-bit addressing.
You can still set Dma64BitAddresses in this case.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

I wonder what the rationale was for designing a 32-bit device. It doesn’t save much die area.

“640KB should be enough for everyone”?

UHCI/OHCI were designed when 4 GB meant a lot, and they only supported 32-bit addresses. These days, almost every computer is sold with 4 GB or more.

On 27-Aug-2014 22:59, xxxxx@broadcom.com wrote:

I wonder what the rationale was for designing a 32-bit device. It doesn’t save much die area.

My dad once told me a story from the good old days in the USSR:
Engineers propose to make some cast metal part thinner, and save, say,
15% of the steel. Management is happy, the factory gets a bonus.
A quarter later, marketing proposes to strengthen this part, and 15%
of the steel comes back. Management is happy again, bonuses paid again.
So these folks may have inherited the trick: save on silicon cost, shave
bits off. Then, add performance, and put the silicon back ;-)

– pa

>Then, add performance, and put the silicon back

And spend millions for the new set of masks.

First, thank you very much, Tim. Your detailed response was helpful.

Hi, Grig,

Thank you for your reply.

As you know, the driver is not Microsoft’s; it is the app/driver from the USB-IF xHCI CV package.

To be precise, could you please give the reason(s) why the Microsoft xHCI driver does not support 32-bit addressing?

Is your conclusion based on a particular OS: 32-bit or 64-bit, or both? Win7, Vista, 8, 8.1, or all of them?

Hi, Pa, and Grig,

64-bit support is the next step in our silicon implementation.

Anyway, the market will decide whether implementing 64-bit matters.
But a 32-bit-only implementation is allowed by the xHCI spec.

Hi, Tim,

What does the software do when both the 32-bit and 64-bit flags are set to TRUE?

Regards,
Wesley

>As you know, the driver is not Microsoft’s; it is the app/driver from the USB-IF xHCI CV package.

I think it still runs on top of the MS xHCI host driver.

could you please give the reason(s) why the Microsoft xHCI driver does not support 32-bit addressing?

Is there a generally available host chip to test 32-bit mode with?

>Our PCIe device (an xHCI USB host) claims only 32-bit support in its capability registers.

But the driver gives the device a 64-bit physical address (seemingly at random) …

So you have a software issue, not a hardware issue.

What do you mean by randomly?

Why doesn’t the driver fail when the 64-bit physical address is assigned?

How can this driver claim your device’s VID/PID if it fails to “drive” your device?

Have you considered writing your own driver?

I hope you have just loaded the wrong driver.

Hi, Grig,

I checked the USB-IF xHCI CV driver.

Looking at the stack, it sits directly above pci.sys, and it is a single driver named xHCIdrv.sys.
So it has nothing to do with the MS xHCI USB driver.

I just want to confirm your statement: “It may not support 32 bit addressing”.
There’s no may or may not in this case; it should be yes or no.
And in my experience, the 32-bit-only xHCI host runs correctly with the MS xHCI driver on 64-bit Win8.1 with 8 GB of RAM.

Hi, Tim,

I’d like to discuss a specific case with you.
Our environment: 8 GB RAM, 64-bit Win8.1, and a 32-bit-only xHCI host.

The xHCI host operates according to the TRB (Transfer Request Block), which includes the
higher 32 bits of the physical address,
lower 32 bits of the physical address,
and buffer length.

When the xHCI driver gets a physical address for a buffer allocated by the app, it creates a corresponding TRB for the xHCI host to process.

So in this environment, the TRB should carry the physical address of the bounce buffer instead of the original physical address of the buffer the app allocated, right?

And in this case, the xHCI host’s scatter/gather capability is useless, because it can only DMA at most 64 KB at a time, again and again, right?

An update:

DmaAddressWidth is used instead of Dma32BitAddresses/Dma64BitAddresses if you set Version to DEVICE_DESCRIPTION_VERSION3 in DEVICE_DESCRIPTION.
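A sketch of how an address width translates into reach, assuming the documented DmaAddressWidth range (nonzero, at most 64). The helper names are hypothetical; buffer_in_reach() models the decision of whether a buffer can be handed to the device directly or must go through a map register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Highest bus address an engine with `width` address bits can generate.
 * Valid widths follow the DmaAddressWidth rule: nonzero, at most 64. */
static uint64_t max_dma_address(unsigned width)
{
    assert(width >= 1 && width <= 64);
    if (width == 64)
        return UINT64_MAX;               /* 1 << 64 would be undefined */
    return (UINT64_C(1) << width) - 1;
}

/* True if every byte of [phys, phys + len) is within reach, so the buffer
 * can be given to the device directly; false means a map register
 * (usually backed by a bounce buffer) is needed. */
static bool buffer_in_reach(uint64_t phys, uint64_t len, unsigned width)
{
    uint64_t limit = max_dma_address(width);

    if (phys > limit)
        return false;
    return len == 0 || len - 1 <= limit - phys;  /* avoids overflow */
}
```

For the 32-bit-only controller in this thread, a page at 0x00000001C8453100 is out of reach, while the same page at 0xC8453100 can be used directly.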

workingmailing@163.com wrote:

What does the software do when both the 32-bit and 64-bit flags are set to TRUE?

If 64-bit support is set true, then the 32-bit setting is ignored. As I
said, when 32 is set true, it means “I support at least 32-bit
addressing”. When 64 is set true, it means “I support at least 64-bit
addressing”. So, if 64-bit is true, then you obviously support 32-bit
addressing, and the system doesn’t even check for it.

The decision process is like this:

if( 64-bit support is true )
// This driver has no physical address limitations
else if( 32-bit support is true )
// This driver is limited to the low 4GB
else
// This driver is limited to the low 16MB

As soon as it finds one set, it stops looking.
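The decision process above as compilable C. It is a sketch that follows this description, not actual HAL source; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Given the two DEVICE_DESCRIPTION flags, return the highest physical
 * address the device may be given directly. Flags are checked in order,
 * and the first one found set wins. */
static uint64_t highest_direct_address(bool dma32, bool dma64)
{
    if (dma64)
        return UINT64_MAX;            /* no physical address limitation */
    if (dma32)
        return UINT64_C(0xFFFFFFFF);  /* limited to the low 4GB */
    return UINT64_C(0xFFFFFF);        /* limited to the low 16MB */
}
```

Note that highest_direct_address(false, true) also returns UINT64_MAX: once the 64-bit flag is set, the 32-bit flag is never examined.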


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

workingmailing@163.com wrote:

As you know, the driver is not Microsoft’s; it is the app/driver from the USB-IF xHCI CV package.

To be precise, could you please give the reason(s) why the Microsoft xHCI driver does not support 32-bit addressing?

It’s because the impact on performance is so dramatic. Essentially
every byte you transfer has to be copied to and from the low memory region.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

workingmailing@163.com wrote:

I checked the USB-IF xHCI CV driver.

Looking at the stack, it sits directly above pci.sys, and it is a single driver named xHCIdrv.sys.
So it has nothing to do with the MS xHCI USB driver.

I just want to confirm your statement: “It may not support 32 bit addressing”.
There’s no may or may not in this case; it should be yes or no.

You would have to check with the USB Implementor’s Forum to be sure.
They wrote the driver. If you are creating an xHCI controller, you
should probably be active on their discussion forums already.

I’d like to discuss a specific case with you.
Our environment: 8 GB RAM, 64-bit Win8.1, and a 32-bit-only xHCI host.

The xHCI host operates according to the TRB (Transfer Request Block), which includes the
higher 32 bits of the physical address,
lower 32 bits of the physical address,
and buffer length.

When the xHCI driver gets a physical address for a buffer allocated by the app, it creates a corresponding TRB for the xHCI host to process.

So in this environment, the TRB should carry the physical address of the bounce buffer instead of the original physical address of the buffer the app allocated, right?

Correct.
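A sketch of the TRB a driver would build in this situation. It follows the xHCI Normal TRB layout (data buffer pointer in dwords 0 and 1, a 17-bit transfer length in dword 2, cycle bit in bit 0 and TRB type in bits 15:10 of dword 3, with Normal = 1); TD Size, interrupter target and the remaining flags are left at zero for brevity. The struct and function names, and the bounce buffer address in the usage note, are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct xhci_trb {
    uint32_t dw[4];
};

#define TRB_TYPE_NORMAL 1u
#define TRB_MAX_LEN     0x10000u  /* 17-bit length field: at most 64 KB */

static struct xhci_trb make_normal_trb(uint64_t buffer, uint32_t len, int cycle)
{
    struct xhci_trb t = { { 0 } };

    assert(len <= TRB_MAX_LEN);
    t.dw[0] = (uint32_t)buffer;          /* pointer, low 32 bits */
    t.dw[1] = (uint32_t)(buffer >> 32);  /* pointer, high 32 bits: must be 0 when AC64 = 0 */
    t.dw[2] = len & 0x1FFFFu;            /* TRB Transfer Length */
    t.dw[3] = (TRB_TYPE_NORMAL << 10) | (cycle ? 1u : 0u);
    return t;
}
```

make_normal_trb(0x7F3E0000, 4096, 1) produces dw[1] == 0, which an AC64 = 0 controller handles correctly, whereas building the TRB from the application buffer at 0x00000001C8453100 produces dw[1] == 1, the high dword such a controller ignores.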

And in this case, the xHCI host’s scatter/gather capability is useless, because it can only DMA at most 64 KB at a time, again and again, right?

What you’re seeing here is that the EHCI/XHCI DMA model does not mate
very well with the Windows DMA abstraction. The Windows DMA abstraction
expects to handle one transaction at a time. In the EHCI/XHCI model,
you create a long chain of DMA requests, corresponding to a large number
of URBs, each of which has its own buffer. The hardware then traverses
this list on its own.

To do what you need, the driver would have to manage the bounce buffer
on its own. It would allocate its own common buffer in low memory, do
the copies in and out, and make your TRBs point to the common buffer.
The common buffer would only have to be large enough to handle your
schedule. If you only submit one microframe at a time, that would be
something like 48kB. If you submit one frame at a time, it’s more like
400kB.
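A sketch of the driver-managed common buffer described above. The sizing assumes (our assumption, not stated in the post) that the 48kB figure is one SuperSpeed periodic endpoint at maximum burst: 1024-byte packets x 16 burst x 3 (Mult) = 49,152 bytes; eight of those per frame is 393,216 bytes, roughly the 400kB mentioned. struct staging and its functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PER_MICROFRAME (1024u * 16u * 3u)    /* 49152 bytes, about 48 kB  */
#define PER_FRAME      (PER_MICROFRAME * 8u) /* 393216 bytes, about 400 kB */

/* Staging area carved out of a common buffer allocated below 4 GB. */
struct staging {
    uint8_t *va;        /* CPU view of the common buffer */
    uint64_t bus_base;  /* device view (below 4 GB) */
    size_t   size;
    size_t   used;
};

/* OUT direction: copy one request's data into the next free slot and
 * report the bus address to put in that request's TRB. Returns false if
 * the current schedule is full. */
static bool stage_out(struct staging *s, const void *data, size_t len,
                      uint64_t *bus_out)
{
    assert(s != NULL && data != NULL && bus_out != NULL);
    assert(s->used <= s->size);
    if (len > s->size - s->used)
        return false;
    memcpy(s->va + s->used, data, len);
    *bus_out = s->bus_base + s->used;
    s->used += len;
    return true;
}

/* IN direction: after completion, copy a slot back to the request buffer. */
static void stage_copy_back(const struct staging *s, uint64_t bus,
                            void *dst, size_t len)
{
    assert(bus >= s->bus_base && bus - s->bus_base + len <= s->used);
    memcpy(dst, s->va + (bus - s->bus_base), len);
}

/* Recycle the area once the hardware has finished the schedule. */
static void stage_reset(struct staging *s)
{
    s->used = 0;
}
```

Because requests are packed back to back, 25 requests of 64 bytes each use only 1,600 bytes of the staging area, so one common buffer can serve a whole schedule of small transfers.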

I wouldn’t be surprised to learn that the USB-IF’s validation driver
does not do this.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

>>>It’s because the impact on performance is so dramatic. Essentially
every byte you transfer has to be copied to and from the low memory region.

<<Because we have not come across any issues when we plug our 32-bit-only xHC into a 64-bit Win8.1 system with more than 4 GB of RAM.

This issue happened only with the USB-IF xHCI driver.

>>In the EHCI/XHCI model, you create a long chain of DMA requests, corresponding to a large number of URBs, each of which has its own buffer. The hardware then traverses this list on its own.

<<A URB has its corresponding buffer (contiguous in virtual space), which may consist of many discontiguous physical pages.
In this case, a URB maps to several TRBs, and each TRB maps to one DMA operation (ignoring the AXI limitation, since AXI also limits the maximum length).

And if the DMA controller supports scatter/gather, the TRBs are processed by traversing the scatter/gather element list.

But if s/g is supported while the bounce buffer space is limited, e.g. to 64 KB, then the s/g support is useless.
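The URB-to-TRB splitting described above can be sketched as follows: walk a virtually contiguous buffer page by page, look up each page's physical frame (here a hypothetical pfn[] array standing in for the MDL), merge physically adjacent pages, and cap each piece at a per-TRB maximum. The 4 KB page size and all names are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SG_PAGE 4096u

struct sg_elem {
    uint64_t phys;  /* physical (bus) address of this piece */
    uint32_t len;   /* bytes */
};

/* Split [offset, offset + len) of a virtually contiguous buffer into
 * physically contiguous pieces of at most max_len bytes (max_len must be
 * at least one page). Returns the element count, or 0 if `out` is too
 * small or len is 0. */
static size_t build_sg(const uint64_t *pfn, size_t offset, size_t len,
                       uint32_t max_len, struct sg_elem *out, size_t max_out)
{
    size_t n = 0;

    if (max_len < SG_PAGE)
        return 0;
    while (len > 0) {
        size_t in_page = offset % SG_PAGE;
        uint64_t phys = pfn[offset / SG_PAGE] * SG_PAGE + in_page;
        size_t chunk = SG_PAGE - in_page;

        if (chunk > len)
            chunk = len;
        if (n > 0 && out[n - 1].phys + out[n - 1].len == phys &&
            out[n - 1].len + chunk <= max_len) {
            out[n - 1].len += (uint32_t)chunk;  /* physically adjacent: merge */
        } else {
            if (n == max_out)
                return 0;                       /* caller's array too small */
            out[n].phys = phys;
            out[n].len = (uint32_t)chunk;
            n++;
        }
        offset += chunk;
        len -= chunk;
    }
    return n;
}
```

With pfn[] = { 0x1C8453, 0x1C8454, 0x200, 0x201 } and a 0x100 offset, the 16,128-byte buffer becomes two elements: the first at 0x1C8453100 (above 4 GB, so a 32-bit-only controller would need it bounced) and the second at 0x200000.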

>>>bounce buffer
<<
>>>DmaAddressWidth
This member is used only if Version = DEVICE_DESCRIPTION_VERSION3.
For a bus-master DMA device, DmaAddressWidth specifies the width, in bits, of a DMA address. The DmaAddressWidth value must be nonzero and must not exceed 64. If the memory address width is greater than the DMA address width, map registers are required to access a region of memory that is beyond the address reach of the DMA controller.

<<The MDL and page tables are used to map between physical and virtual addresses,
and map registers map between physical and bus/logical addresses (different names for the address as seen by the peripheral).
But map registers have always been a mysterious thing outside the driver developer’s control.
Given the description above and this specific case (32-bit-only xHCI on an 8 GB RAM system), and comparing with the bounce buffer, I am confused:
If a bounce buffer exists, are map registers needed?
And what are map registers used for in this case?

> If a bounce buffer exists, are map registers needed?
>
> And what are map registers used for in this case?

A map register is conceptually an object which covers a single page (but you can ask for several map registers covering adjacent pages) and has the methods PreDma and DmaDone and the property BusSideAddress.

They are operated as an array covering adjacent pages, with an MDL provided to the PreDma and DmaDone primitives. Internally, the MDL is conceptually broken into pages, and a single physical page is used for PreDma/DmaDone for each map register.

They have several implementations. One of them is bounce buffers, for which the map register is a page of memory below the mark (4GB, 16MB, etc.), PreDma/DmaDone are memcpy(), and BusSideAddress is the physical address of this bounce page.

Also, hardware IOMMU entries are treated as map registers, with PreDma/DmaDone being “arm/disarm the hardware IOMMU entry for this physical page” operations.

The operation is: you feed the MDL to the map register array (previously allocated, of the proper size), and then use the BusSideAddress of this array as the data pointer to be fed to the hardware.
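A toy user-mode model of the bounce-buffer flavour of map registers described above: each register owns one page below the limit, PreDma/DmaDone are memcpy(), and bus_side_address is what the device is given. The names mirror the terms above but are not a Windows API; in the IOMMU flavour, the memcpy() calls would instead arm and disarm an IOMMU entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MR_PAGE_SIZE 4096u

struct map_register {
    uint8_t *bounce_page;       /* CPU view of a page below the limit */
    uint64_t bus_side_address;  /* what the device is given for this page */
    uint8_t *client_page;       /* caller's page currently mapped, or NULL */
};

static void map_reg_pre_dma(struct map_register *mr, uint8_t *client_page,
                            bool to_device)
{
    assert(mr != NULL && client_page != NULL && mr->client_page == NULL);
    mr->client_page = client_page;
    if (to_device)                /* device will read: stage data now */
        memcpy(mr->bounce_page, client_page, MR_PAGE_SIZE);
}

static void map_reg_dma_done(struct map_register *mr, bool to_device)
{
    assert(mr != NULL && mr->client_page != NULL);
    if (!to_device)               /* device wrote: copy results back */
        memcpy(mr->client_page, mr->bounce_page, MR_PAGE_SIZE);
    mr->client_page = NULL;
}

/* Array form: a page-aligned buffer of npages pages (standing in for an
 * MDL) is split so that register i covers page i. The device is then
 * programmed with regs[i].bus_side_address for each page. */
static void map_array_pre_dma(struct map_register *regs, size_t nregs,
                              uint8_t *buf, size_t npages, bool to_device)
{
    assert(npages <= nregs);
    for (size_t i = 0; i < npages; i++)
        map_reg_pre_dma(&regs[i], buf + i * MR_PAGE_SIZE, to_device);
}

static void map_array_dma_done(struct map_register *regs, size_t npages,
                               bool to_device)
{
    for (size_t i = 0; i < npages; i++)
        map_reg_dma_done(&regs[i], to_device);
}
```

In this model the driver never touches bounce_page directly; it only sees bus_side_address, which is why the mechanism can be swapped for an IOMMU without changing the caller.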


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

workingmailing@163.com wrote:

> A URB has its corresponding buffer (contiguous in virtual space), which may consist of many discontiguous physical pages.
> In this case, a URB maps to several TRBs, and each TRB maps to one DMA operation (ignoring the AXI limitation, since AXI also limits the maximum length).
>
> And if the DMA controller supports scatter/gather, the TRBs are processed by traversing the scatter/gather element list.
>
> But if s/g is supported while the bounce buffer space is limited, e.g. to 64 KB, then the s/g support is useless.

Not necessarily. A USB frame can consist of many different requests.
Each request has its own buffer, often rather short. Scatter/gather
lets you have 25 requests of 64 bytes each.

>>>> bounce buffer
> <<
For a USB host controller, I think this would need to be the case. For
a typical DMA consumer, the kernel does the work.

> The MDL and page tables are used to map between physical and virtual addresses,
> and map registers map between physical and bus/logical addresses (different names for the address as seen by the peripheral).
> But map registers have always been a mysterious thing outside the driver developer’s control.

Yes, it’s an abstraction.

> Given the description above and this specific case (32-bit-only xHCI on an 8 GB RAM system), and comparing with the bounce buffer, I am confused:
> If a bounce buffer exists, are map registers needed?
> And what are map registers used for in this case?

Two names for the same thing.

There are some high-end I/O architectures where this address remapping
can be handled by the bus. (Consider the AGP GART, for instance.) In
that case, a map register might actually be a hardware register in the
bus controller.

But in the vast majority of cases, “map register” simply means “bounce
buffer”.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> can be handled by the bus. (Consider the AGP GART, for instance.)

The generic name for things like AGP GART is “IOMMU”


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

Thank you, Maxim and Tim.

That helped me understand map registers much more clearly.

>>Not necessarily. A USB frame can consist of many different requests.
Each request has its own buffer, often rather short. Scatter/gather
lets you have 25 requests of 64 bytes each.

<<
In both the non-periodic and periodic cases, a USB request corresponds to a URB.
Here I am addressing non-periodic transfers in particular, because periodic transfers (interrupt and isochronous) need to be scheduled within a specific service interval; otherwise a “Missed Service” transfer event pops up, and beyond that they are no different from bulk/control.