Does WdfCommonBufferCreate() use the IOMMU to allocate memory that is not physically contiguous?

Hi:

My device doesn't support scatter-gather list DMA. I allocate 2GB of memory with WdfCommonBufferCreate() and pass its logical address to my device. The DMA works fine.

After the system (128GB DDR, IOMMU enabled) has been running for a long time, I fail to allocate the 2GB of memory even though overall memory usage is not high. If WdfCommonBufferCreate() requires physically contiguous memory, I think this is because the memory has become highly fragmented.

But after reading some posts here, it seems WdfCommonBufferCreate() is able to allocate small pieces of memory, build the scatter-gather list, set up the map registers, etc. In that case it should succeed in allocating 2GB even though the memory is fragmented.

I would appreciate it if someone could share their knowledge with me.

Thanks
Kevin

Well, it might use an IOMMU, but that is unlikely, unless they've expanded usage from GPUs to other pci* devices.

In general, the OS meets the guarantee of a contiguous logical address space by allocating contiguous physical memory. 2GB is a huge common buffer.

It seems it is not using the IOMMU.
A closer look at the memory allocated by WdfCommonBufferCreate() shows:

  1. There is only 1 MDL. The size is 2GB.
  2. I get the page numbers with MmGetMdlPfnArray(). They are consecutive.
  3. The logical address (WdfCommonBufferGetAlignedLogicalAddress) and the physical address (MmGetPhysicalAddress) are identical.

I tried allocating as many 2GB chunks as possible with WdfCommonBufferCreate() and with MmAllocateContiguousMemory(). Both can allocate the same maximum number of chunks.
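
For anyone repeating the check in item 2, here is a minimal sketch of the comparison in plain C. The helper name pfns_are_contiguous is hypothetical, and it takes an ordinary array so it can be tried in user mode; in a driver the array would be the one returned by MmGetMdlPfnArray().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when every page frame number is exactly one greater than
 * the previous one, i.e. the pages form a single physically contiguous run.
 * Hypothetical helper: in a driver the array would come from
 * MmGetMdlPfnArray(mdl); here it is a plain array for illustration. */
static bool pfns_are_contiguous(const uint64_t *pfns, size_t count)
{
    assert(pfns != NULL || count == 0);
    for (size_t i = 1; i < count; i++) {
        if (pfns[i] != pfns[i - 1] + 1) {
            return false;   /* gap or reordering: not one contiguous run */
        }
    }
    return true;            /* zero or one page is trivially contiguous */
}
```

A true result for the whole array is consistent with the buffer being one physically contiguous block.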

@Peter_Viscarola_OSR said "In some version of Windows, the WDM function AllocateCommonBuffer simply calls MmAllocateContiguousMemory."

Now, my question:
How can I allocate memory that is not physically contiguous for non-scatter-gather DMA?

In Linux, I am able to allocate a bunch of small memory blocks, create the SG list, and then pass a single address to my device.

Kevin

Here is my way to create the DMA enabler:
WdfDeviceSetAlignmentRequirement(device, FILE_32_BYTE_ALIGNMENT);
WDF_DMA_ENABLER_CONFIG dmaConfig;
WDF_DMA_ENABLER_CONFIG_INIT(&dmaConfig, WdfDmaProfileScatterGather64Duplex, XDMA_MAX_TRANSFER_SIZE);
status = WdfDmaEnablerCreate(device, &dmaConfig, WDF_NO_OBJECT_ATTRIBUTES, &dmaEnabler);

There is no such thing. If you don't have scatter/gather, then you must have physically contiguous memory. There is no magic here.

The usual way to handle this is to do the common buffer allocation at initial boot time, when memory is not fragmented, and then keep it forever. Are you not able to do that?

I can allocate the memory at boot time, but I am just wondering why I can't do the equivalent of what I do in Linux:
In Linux, I am able to allocate a bunch of small memory blocks, create the SG list, and then pass a single address to my device.

Kevin

If you are passing the address of a scatter/gather list to your hardware, then your hardware must do scatter/gather DMA. As I said, there's no magic here.

My hardware doesn't support scatter/gather DMA. It only takes a contiguous address range.
The physical addresses of the allocated memory are not contiguous, so I pass the logical address, which is contiguous because the IOMMU translates the scatter/gather list into one consecutive address space.

That is why MS reminds us:
Do not use this routine to obtain physical addresses for use with DMA operations.

It seems Windows is not using the IOMMU at all.
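
The remapping described in this post can be pictured with a toy model. This is only a sketch of the idea, not a Windows or Linux API: struct toy_iommu, toy_iommu_translate and the 4KB page size are assumptions made for illustration. The device sees one consecutive range of addresses, and each page of that range is backed by whatever physical frame the table names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12u                      /* assumed 4KB pages */
#define TOY_PAGE_SIZE  (1u << TOY_PAGE_SHIFT)

/* Toy remapping table (hypothetical type, not an OS structure): entry i holds
 * the physical page frame that backs device-visible page (base_page + i). */
struct toy_iommu {
    uint64_t base_page;          /* first device-visible page number */
    const uint64_t *phys_pfns;   /* scattered physical frames, one per page */
    size_t page_count;           /* number of mapped pages */
};

/* Translate a device (logical) address into a physical address.
 * Returns UINT64_MAX when the address is outside the mapped window. */
static uint64_t toy_iommu_translate(const struct toy_iommu *mmu, uint64_t dev_addr)
{
    assert(mmu != NULL);
    uint64_t page = dev_addr >> TOY_PAGE_SHIFT;
    if (page < mmu->base_page || page - mmu->base_page >= mmu->page_count) {
        return UINT64_MAX;
    }
    uint64_t offset = dev_addr & (TOY_PAGE_SIZE - 1u);
    return (mmu->phys_pfns[page - mmu->base_page] << TOY_PAGE_SHIFT) | offset;
}
```

Without such a translation step between the device and memory, an address handed to the device has to refer to memory that is contiguous as the device sees it.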

Is that Linux on an SOC? Most Windows hardware doesn't have an IOMMU.

It is a normal PC, not an SoC.
I developed device drivers for both Linux and Windows for the same device.
The PC has an IOMMU and I can enable/disable it in the BIOS. I must enable it or my Linux driver won't work.
But my Windows driver works in all cases. So I think WdfCommonBufferCreate() doesn't use the IOMMU. It always allocates contiguous physical memory.

Now I am trying this way:
mdl=MmAllocatePagesForMdlEx();
AdapterObject = IoGetDmaAdapter();
va = MmGetMdlVirtualAddress(mdl);
AdapterObject->DmaOperations->GetScatterGatherList();

GetScatterGatherList() returns a scatter-gather list with discontiguous physical memory. However, it only works for allocations of up to 2MB.

So I think WdfCommonBufferCreate() doesn't use the IOMMU. It always allocates contiguous physical memory.

Yes. There is no need to "think"; this is a fact and always has been.

GetScatterGatherList() returns a scatter-gather list with discontiguous physical memory. However, it only works for allocations of up to 2MB.

No, it works for allocations up to 4GB. I suspect what you mean is that if you try this with more than 2MB, you get multiple entries, since there are multiple pages.

I get multiple entries with 2MB. But GetScatterGatherList() returns STATUS_INSUFFICIENT_RESOURCES with more than 2MB.

I tried calling GetScatterGatherList() with memory I allocated with ExAllocatePoolWithTag(). It has the same 2MB limitation. I think there may be something wrong with my configuration, as Tim said it should work up to 4GB.

Tim, can you share source code showing how you call GetScatterGatherList()?

Well, if your DEVICE_DESCRIPTION has ScatterGather set to FALSE, then it can't create a scatter/gather list with more than one page. Windows uses 4KB and 2MB pages.

The page size is 4K in my case.
When ScatterGather = FALSE, GetScatterGatherList() succeeds up to 256KB.
When ScatterGather = TRUE, GetScatterGatherList() succeeds up to 2MB.
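
For scale, here is a small sketch that turns the sizes in this thread into 4KB page counts, since in the worst case each page needs its own scatter/gather element. The helper pages_spanned and the constant SKETCH_PAGE_SIZE are hypothetical names; the calculation is the same idea as the WDK macro ADDRESS_AND_SIZE_TO_SPAN_PAGES, written in portable C.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* 4KB pages, as reported in this thread */

/* Number of pages touched by a buffer that begins at byte address `start`
 * and is `length` bytes long. Only the offset of `start` within its first
 * page matters. Hypothetical helper for illustration. */
static uint64_t pages_spanned(uint64_t start, uint64_t length)
{
    assert(length > 0);
    uint64_t offset = start % SKETCH_PAGE_SIZE;
    return (offset + length + SKETCH_PAGE_SIZE - 1u) / SKETCH_PAGE_SIZE;
}
```

With a page-aligned start, 256KB spans 64 pages, 2MB spans 512 pages, and 2GB spans 524288 pages. A 4096-byte buffer that starts 100 bytes into a page spans 2 pages.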

I think you are somehow expecting the system to use an IOMMU to help you DMA data to your device that does not support scatter/gather. If that's what you are expecting, it isn't going to work no matter what functions you call because that's not how DMA is handled in Windows.

It is the responsibility of your driver to get data to and from your device according to its capabilities. Only your driver knows what those capabilities are, but most common designs have pre-built helpers. A common buffer is probably the most used design for devices that don't support scatter/gather, but you have to copy each block of data into that common buffer one block at a time. You can't rely on the system to do that for you.
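
The block-at-a-time copy described above can be sketched in plain C. Everything named here is an assumption for illustration: send_through_staging, dma_block_fn and the toy_sink callback are hypothetical, the staging array stands in for the common buffer, and the callback stands in for programming the device and waiting for completion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: starts one DMA transfer from the staging buffer and
 * returns 0 once the device has consumed `len` bytes. */
typedef int (*dma_block_fn)(const void *staging, size_t len, void *ctx);

/* Send `total` bytes from `src` through a fixed-size staging buffer, one
 * block at a time. Returns 0 on success or the first nonzero callback result. */
static int send_through_staging(const unsigned char *src, size_t total,
                                unsigned char *staging, size_t staging_size,
                                dma_block_fn start_dma, void *ctx)
{
    assert(staging != NULL && staging_size > 0 && start_dma != NULL);
    size_t done = 0;
    while (done < total) {
        size_t chunk = total - done;
        if (chunk > staging_size) {
            chunk = staging_size;
        }
        memcpy(staging, src + done, chunk);   /* CPU copy into the staging buffer */
        int rc = start_dma(staging, chunk, ctx);
        if (rc != 0) {
            return rc;
        }
        done += chunk;
    }
    return 0;
}

/* Toy stand-in for the device: appends every block it is given to a sink. */
struct toy_sink {
    unsigned char data[64];
    size_t used;
    size_t blocks;
};

static int toy_sink_dma(const void *staging, size_t len, void *ctx)
{
    struct toy_sink *sink = ctx;
    if (sink->used + len > sizeof sink->data) {
        return -1;                            /* sink is full */
    }
    memcpy(sink->data + sink->used, staging, len);
    sink->used += len;
    sink->blocks++;
    return 0;
}
```

The trade-off is an extra CPU copy per block in exchange for needing only one modest contiguous buffer instead of a 2GB one.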

Finally, I got a scatter-gather list for 2GB.
In DEVICE_DESCRIPTION, I have to set Version = DEVICE_DESCRIPTION_VERSION3 and DmaAddressWidth = 64.

Now I need to get the device logical address from the SG list and send it to my device. It is said that it is the physical address of the first entry. Has anyone tried that?

That is exactly what I want.

This way works in Linux. Do you mean I have to use a common buffer (physically contiguous)? Then Windows doesn't use the IOMMU?

Yes. I think we've said that 3 or 4 times in this thread. Memory-mapping IOMMUs are only available on a subset of motherboards, and there's no infrastructure for managing them. What do you expect Windows to do when one is not available?

There are many references to Windows' use of IOMMUs, both with graphics and other PCIe devices.

How Windows protects against DMA drive-by attacks

Windows uses the system Input/Output Memory Management Unit (IOMMU) to block external peripherals from starting and performing DMA, unless the drivers for these peripherals support memory isolation (such as DMA-remapping). Peripherals with DMA Remapping compatible drivers are automatically enumerated, started, and allowed to perform DMA to their assigned memory regions.