GPU Driver: Fence polling hangs when Driver Verifier DMA Checking is enabled

Environment

  • OS: Windows 10 (10.0.19045)

  • Driver: KMDF kernel-mode driver for a PCIe GPU (64-bit DMA capable)

  • Driver Verifier: DMA Checking (flag 0x80)

Symptom

A basic GPU test works perfectly without Driver Verifier, but hangs when DMA Checking is enabled.

The test submits a DMA command to the GPU. The GPU copies data from CPU system memory to GPU device memory, and then writes a fence value to a CPU system memory buffer to signal completion. The user-mode test polls this fence buffer, but the expected value never arrives.

How the fence buffer is mapped for DMA

  1. The kernel-mode driver allocates system memory with ZwAllocateVirtualMemory

  2. Driver pins the user pages into an MDL with IoAllocateMdl and MmProbeAndLockPages

  3. Driver calls BuildScatterGatherList to obtain DMA addresses

  4. These DMA addresses are programmed into the GPU page table

  5. GPU writes the fence value to this address after completing the DMA transfer

  6. CPU polls the system memory, but the value never appears when DMA Checking is on

DMA usage pattern

Our driver uses a long-lived mapping model:

  • BuildScatterGatherList is called once when memory is allocated

  • The GPU performs many DMA read/write operations to this memory over its lifetime

  • PutScatterGatherList is called only when the memory is freed

What we've tried

  • Disabling all Verifier options except DMA Checking → still hangs

  • Without DMA Checking, all other Verifier options enabled → test passes

  • The issue is reproducible 100% with DMA Checking on

Questions

  1. Is the long-lived scatter/gather mapping pattern compatible with DMA Verification? Or does DMA Checking require a per-transfer BuildScatterGatherList / PutScatterGatherList cycle?

  2. For GPU fence buffers (device writes asynchronously, CPU polls), should we use AllocateCommonBuffer instead of scatter/gather mapping?

  3. Any guidance on making a GPU DMA driver work correctly under DMA Verification?

I'm not a professional, but shouldn't you allocate contiguous memory with MmAllocateContiguousMemory for DMA operations instead of relying on ZwAllocateVirtualMemory?

My GPU supports scatter/gather DMA, so the memory does not need to be contiguous.

Are you able to tell WHEN the system hangs? Is it during initialization, at the first transfer, or randomly later on? Have you been able to catch this in the kernel debugger?

DMA Verification - Windows drivers | Microsoft Learn

Think about how DMA verification has to work. To detect invalid access patterns, an intermediate layer has to exist. And if you don’t use the standard functions / methods, it won’t all hang together.

We can probably conclude that this test has found a flaw in your design.

The fence is a memory buffer that both the CPU and the GPU can access. The CPU dispatches a DMA copy task to the GPU, and the GPU's DMA engine copies the data from CPU memory to GPU memory. When the copy is done, the GPU writes 0xab to the fence buffer. The CPU polls the fence buffer until it reads 0xab. In my case the CPU never reads 0xab, so the test hangs while polling the fence. What puzzles me is that the test passes when DMA Verifier is disabled. I do not know what DMA Verifier is doing, but the hang seems to be related to it.

DMA Verifier double buffers the user buffer on transfers. If you want to share a host memory buffer between the CPU and the device you need AllocateCommonBuffer.

Hi @ScarPunk, thanks for your reply. What does "DMA Verifier double buffers the user buffer on transfers" mean?

When you call BuildScatterGatherList DMA Verifier allocates a new buffer and provides the address of this buffer in the Scatter Gather List. So, your device is not DMA’ing directly into the user’s data buffer. You should be able to confirm this by comparing the physical addresses of the buffers.


I have fixed this issue now. In my case there are many DMA buffers, all of them mapped with BuildScatterGatherList. When DMA Verifier runs, it double-buffers all DMA buffers for verified drivers. I did not call FlushAdapterBuffers after the DMA copy finished, so the data stayed in the shadow buffer and the GPU and CPU failed to synchronize. I now allocate all the DMA buffers with AllocateCommonBuffer, which DMA Verifier does not double-buffer, and my test passes.
