UserMode address mapping leads to unexpected sequential PCI BAR reading

CARODEV · April 29, 2025, 3:29pm

Our driver maps PCI BAR addresses into UserMode for an application, and we are experiencing access errors after approximately 6 hours of application runtime.
We found that the access errors occur only when we map certain areas of the BARs to UserMode.
Through investigation, we also discovered that a sequential 8-byte read across our PCI BARs, from the first BAR to the last, is responsible for the access errors.

Our driver maps PCI BAR addresses to UserMode as follows:

MmMapIoSpace (with MmNonCached) for physical to virtual mapping
mapping virtual addresses to UserMode

This is our relevant code for mapping the virtual addresses to UserMode:

pMemUmOut->uMdl.pPtr = (void*) IoAllocateMdl((void*) pVirtualAdrIn,   // VirtualAddress
                                             lSizeMemIn,              // Length
                                             FALSE,                   // SecondaryBuffer
                                             FALSE,                   // ChargeQuota
                                             NULL);                   // Irp

if (pMemUmOut->uMdl.pPtr == NULL)
{
    // set ErrorString
    ...
}
else
{
    MmBuildMdlForNonPagedPool((PMDL) pMemUmOut->uMdl.pPtr);

    try
    {
        // maps the physical pages that are described by an MDL
        pMemUmOut->uAdrUser.pPtr = MmMapLockedPagesSpecifyCache((PMDL) pMemUmOut->uMdl.pPtr,
                                                                UserMode,             // AccessMode
                                                                MmNonCached,          // CacheType
                                                                NULL,                 // RequestedAddress
                                                                FALSE,                // BugCheckOnFailure
                                                                NormalPagePriority);  // Priority
    }
    except(EXCEPTION_EXECUTE_HANDLER)
    {
        ...
    }
}

We are wondering who/what is performing these sequential reads?
The MmProbeAndLockPages routine should not be called because the pages are already locked (NonPagedPool)...
Any ideas? And how could we figure it out? We tried setting a data breakpoint on the start of the first PCI BAR with WinDbg, but unfortunately the breakpoint hasn't triggered yet...
We would be very grateful for any help!

Mark_Roddy · April 29, 2025, 4:02pm

Anti virus?

I would suggest protecting your user mode app from scanning software.

Mark Roddy

Tim_Roberts · April 29, 2025, 4:34pm

This is one reason why mapping hardware directly into user mode is so strongly discouraged. User mode is not secure. Any application can open your process and do whatever it wants with the memory.

CARODEV · April 29, 2025, 7:31pm

Yes, that's a thing we could check as well, thanks!

CARODEV · April 29, 2025, 8:15pm

Thank you for your response.
What exactly do you mean by mapping it directly? I'm not sure whether the way we do it counts as mapping hardware directly to user mode. To be clear:

First we're mapping physical addresses to kernel virtual addresses
Then we're mapping these kernel virtual addresses to user mode

The user mode mapping is process-context specific:

Each user mode application gets its own virtual address mapping
The addresses are only valid in the context of the application
Other user mode applications should actually not be able to access these addresses

However, I'm not really sure about how it really works in Windows. Is it possible to protect the mapped memory from being accessed by another process?
What would be your alternative approach for providing user mode applications access to specific hardware registers? The access still needs to be fast, so an IOCTL call for each register access would unfortunately not work for us...

MBond2 · April 29, 2025, 9:00pm

This topic has been discussed here and elsewhere many times. In brief, one of the major features of the UM / KM boundary is to protect the system and the hardware from invalid access. Your driver knows the correct pattern for poking your hardware, and if it is the only code that can do it, and it is protected by that security boundary, then you can be confident that any stability issues caused by the hardware being accessed in an invalid way originate in your driver (there are many ways to cause this even if the hardware / firmware itself has no bugs - and that's far from the norm).

But if you map these resources into UM, then anything that can access that address space can now access those resources. That includes the code in your .EXE / .DLL and any code that you expect to access those resources, but also any code that can be injected via another .DLL to use them from the same address space, and any code that can call ReadProcessMemory / WriteProcessMemory to use them from a remote address space. Anti-virus programs frequently make use of these techniques, but it should be noted that something as simple as the clipboard does too.

There are security boundaries in UM too, but the system is designed to use the UM / KM boundary for this

The question about IOCTL performance has been raised many times too. Almost always, when real numbers are tested, IOCTL performance is not an issue. Sometimes poor design is a factor, but it would be a very rare situation where you couldn't make that work, I would think. I would start by re-evaluating that assumption, because if you don't, you may never come to a satisfactory conclusion.
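One common way to keep the IOCTL cost low is to batch several register accesses into a single request, so the user/kernel transition is paid once per batch rather than once per register. The following is only a minimal sketch of that idea in portable C. The `reg_op` layout and `reg_batch_execute` are hypothetical names, the registers are assumed to be 32 bits wide, and a plain array stands in for the BAR. In a real driver the loop would sit inside the IOCTL handler and would typically use READ_REGISTER_ULONG / WRITE_REGISTER_ULONG on the kernel mapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request layout: one entry per register access. */
typedef enum { REG_OP_READ = 0, REG_OP_WRITE = 1 } reg_op_kind;

typedef struct {
    uint32_t kind;    /* REG_OP_READ or REG_OP_WRITE */
    uint32_t offset;  /* byte offset into the BAR, must be 4-byte aligned */
    uint32_t value;   /* in: value to write; out: value that was read */
} reg_op;

/*
 * Executes a batch of 32-bit register accesses against `bar`.
 * All entries are validated before any register is touched, so an
 * invalid request is rejected as a whole and never reaches the hardware.
 * Returns 0 on success, -1 if any entry is invalid.
 */
int reg_batch_execute(volatile uint32_t *bar, size_t bar_bytes,
                      reg_op *ops, size_t count)
{
    size_t i;

    /* Pass 1: validate kind, alignment and range of every entry. */
    for (i = 0; i < count; i++) {
        if (ops[i].kind != REG_OP_READ && ops[i].kind != REG_OP_WRITE)
            return -1;
        if (ops[i].offset % 4 != 0)
            return -1;
        if (bar_bytes < 4 || ops[i].offset > bar_bytes - 4)
            return -1;
    }

    /* Pass 2: perform the accesses in the order given. */
    for (i = 0; i < count; i++) {
        volatile uint32_t *reg = bar + ops[i].offset / 4;
        if (ops[i].kind == REG_OP_WRITE)
            *reg = ops[i].value;
        else
            ops[i].value = *reg;
    }
    return 0;
}
```

Because the driver checks every offset itself, only the access patterns it allows ever reach the device, which is the protection the UM / KM boundary is meant to give.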

Mark_Roddy · April 29, 2025, 9:19pm

" First we're mapping physical addresses to kernel virtual addresses Then we're mapping these kernel virtual…"
The virtual addresses mapped to user mode are backed by your hardware physical addresses. A read in user mode of those virtual addresses is a read of the mapped physical addresses.

The effect of this is that any user mode process with sufficient privileges can use ReadProcessMemory / WriteProcessMemory to access any process that has these virtual addresses. An anti-virus program, for example, might scan process memory looking for 'stuff'. The scan would likely read enough of process memory to cause your issue.

It is possible to block these accesses, but it is not trivial. I agree with @MBond2 that you should re-evaluate doing this mapping to user mode.

Jan_Bottorff · April 30, 2025, 11:34am

Did you set the memory breakpoint on the kernel or user address? Memory breakpoints work on virtual addresses, not physical addresses. Is it on x64 or ARM64? Is the BAR in prefetchable 64-bit MMIO space or non-prefetchable 32-bit MMIO space?

Could it be you are seeing the result of the OS doing a memory dump after the process or OS crashes? A user-mode dump probably is not careful about avoiding device memory, it just dumps the mapped address space.

Do you have a PCIe analyzer? If so, you could set some sort of PCIe access trigger. I seem to remember some analyzers can fire a hardware trigger signal on a match (like for an oscilloscope trigger), and you could perhaps find a way to trigger an NMI or other interrupt that can freeze the software state. If you have a PCIe exerciser, you could perhaps program a BAR with a pattern match for the offending accesses, and get that to trigger a write to something that would force an OS crash, and then you could examine what cores were running.

Another idea, run in single core mode, and run the profiler (Windows Performance Recorder, or one of the half dozen other ways to get profiles) at a high sample rate. The profiler will interrupt at high priority to sample, and you can perhaps get some time correlation between when your hardware chokes and what thread was executing. You could set the profile trace to use a circular buffer, so it could run for hours without making gigabytes of useless data, and you stop it before the buffer wraps around.
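To illustrate why the trace has to be stopped soon after the failure, here is a minimal sketch of a circular sample buffer in portable C. It is not how any particular profiler is implemented. The `trace_sample` fields, the `trace_ring_*` names and the tiny capacity are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define TRACE_CAPACITY 4  /* tiny on purpose; a real trace holds far more */

typedef struct {
    uint64_t timestamp;  /* when the sample was taken */
    uint32_t thread_id;  /* which thread was running at that moment */
} trace_sample;

typedef struct {
    trace_sample slots[TRACE_CAPACITY];
    size_t next;    /* slot the next sample will overwrite */
    size_t stored;  /* number of valid samples, at most TRACE_CAPACITY */
} trace_ring;

void trace_ring_init(trace_ring *r)
{
    r->next = 0;
    r->stored = 0;
}

/* Stores a sample, overwriting the oldest one once the buffer is full. */
void trace_ring_push(trace_ring *r, trace_sample s)
{
    r->slots[r->next] = s;
    r->next = (r->next + 1) % TRACE_CAPACITY;
    if (r->stored < TRACE_CAPACITY)
        r->stored++;
}

/* Copies the stored samples into `out`, oldest first; returns the count. */
size_t trace_ring_snapshot(const trace_ring *r, trace_sample *out)
{
    size_t start = (r->next + TRACE_CAPACITY - r->stored) % TRACE_CAPACITY;
    size_t i;

    for (i = 0; i < r->stored; i++)
        out[i] = r->slots[(start + i) % TRACE_CAPACITY];
    return r->stored;
}
```

Only the most recent TRACE_CAPACITY samples survive, so if capture keeps running for too long after the hardware chokes, the samples around the failure are overwritten.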

And another idea, map a second range of MMIO space into your user app. Make this map some non-existent address which should cause a PCIe AER error when accessed, then in the debugger put a breakpoint on the AER handler. This assumes that if something scans your device memory, it will take the bait and also scan your bogus device memory mapping.

Another idea, you could run it under QEMU, and make a fake PCIe device. Map that device in user mode. Then hack the QEMU virtual PCIe device to break when it sees the undesirable access pattern at the BAR physical address. It sounds like if it's an antivirus, just mapping the device BAR into user space may be sufficient to catch it.
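A rough sketch of what "break when it sees the undesirable access pattern" could look like, assuming the pattern is a run of back-to-back 8-byte reads at increasing offsets. This is plain C rather than actual QEMU code. `seq_read_detector` and its functions are hypothetical names, and the idea is that a fake device's MMIO read handler would feed each access (offset and size) into it and break or log once it reports a scan.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t expected_next;  /* offset the next read must hit to extend the run */
    size_t   run_length;     /* consecutive sequential 8-byte reads seen so far */
    size_t   threshold;      /* run length that is treated as a scan */
} seq_read_detector;

void seq_detector_init(seq_read_detector *d, size_t threshold)
{
    d->expected_next = 0;
    d->run_length = 0;
    d->threshold = threshold;
}

/*
 * Feed one read access. Returns true once `threshold` 8-byte reads have
 * arrived back to back at consecutive offsets.
 */
bool seq_detector_on_read(seq_read_detector *d, uint64_t offset, unsigned size)
{
    if (size == 8 && d->run_length > 0 && offset == d->expected_next) {
        d->run_length++;      /* run continues */
    } else if (size == 8) {
        d->run_length = 1;    /* possible start of a new run */
    } else {
        d->run_length = 0;    /* other widths are treated as normal traffic */
    }
    d->expected_next = offset + 8;
    return d->run_length >= d->threshold;
}
```

The threshold would need tuning so that the application's own register accesses never trip it. When it does fire, the thread and process that were running at that moment are the interesting part.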

I seem to remember there is an x64 debugger that runs as a hypervisor, I don't offhand remember its name. You might see if that debugger can trap on virtualized BAR accesses passing through to real hardware BARs.

Jan

GrimBeaver · May 6, 2025, 2:40pm

Just curious what is your reason for doing this? I have always mapped the entire BAR to user space and never had a problem. I make the same calls you do and store them in a file context for each application that opens a session. Then in the PFN_WDF_FILE_CLOSE callback I call MmUnmapLockedPages and IoFreeMdl.

I get what @MBond2 says about why not to do this. But at the same time this model has allowed me to write drivers where the bulk of the logic is shared in a user library between Windows, Linux and VxWorks. I have a common OS abstraction layer and kernel driver that I share between multiple FPGA designs which each have their own unique user space implementation on top.