BSODs when processing image notifications from processes which use TPM

Hey guys,

I’ve recently been hit by a bit of a spate of BSODs from 9 or 10 different customers who use the driver I maintain. Nothing has changed in the driver for quite a while, so I think this might be a bad interaction with a Windows update that my driver can’t cope with. Most of the customers have reported the issue on 20H2 in the last week although a few on earlier versions of Win 10 too.

Most of the BSODs have occurred when the Windows Hello/Biometrics process “NgcIso.exe” runs although in one case it was when the customer started a Hyper-V Gen 2 VM and had virtualised TPM turned on. I note that Windows Hello can use TPM too so this seems to be a link.

The heart of the problem seems to be that my driver uses PsSetLoadImageNotifyRoutine in order to receive notifications about image loads. During our image load call-back, if the image is ntdll.dll we look into the user mode address of the image and look up a particular structure called the LdrSystemDllInitBlock since this contains an array containing which mitigation policies are set for the newly starting process.

In the case of these TPM processes, what seems to be happening is that the user mode memory associated with the image is unavailable. The documentation for:

https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntddk/nc-ntddk-pload_image_notify_routine

says, “The operating system invokes this routine after an image has been mapped to memory, but before its entrypoint is called.”, but I’m wondering whether that is not the case for these processes.

Does anyone have any insight about this? I’ve seen code online where people try to probe the memory associated with these image loads with a combination of IoAllocateMdl/MmProbeAndLockPages. That suggests to me that other people have encountered issues trying to access this memory in certain circumstances, although I confess I don’t understand how IoAllocateMdl would be appropriate in these circumstances.

Cheers.

Ben.

IoAllocateMdl/MmProbeAndLockPages makes sure the pages you’re working with are resident and stay locked in memory until you unlock them and free the MDL, after checking them for the access type you pass (read, write, or modify). For example, the .rdata section of a module is normally read-only, but you can lock its pages, map the MDL into system address space, and give that mapping a writable protection in order to write to the section without issues.

The documentation also specifies:

The operating system calls the driver’s load-image notify routine at PASSIVE_LEVEL inside a critical region with normal kernel APCs always disabled and sometimes with both kernel and special APCs disabled.

Thanks @ThatsBerkan, that’s helpful.

So I tried a combination of IoAllocateMdl/MmProbeAndLockPages and ensured the whole image was locked into memory. I checked it was so by checking pMdl->MdlFlags for MDL_PAGES_LOCKED so I know it worked.

Puzzlingly, the BSOD remained when running one of these processes associated with TPM. The LdrSystemDllInitBlock is inside the range ImageBase → ImageBase + ImageSize. I tried dumping that entire range in WinDbg and found that, while most of the image was fine, a small chunk of it was inaccessible and came up as ???. Sure enough, my LdrSystemDllInitBlock lies inside the inaccessible part.

I tried a ProbeForRead() on that area and it succeeded, but a ProbeForWrite() failed (despite ordinarily working on non-TPM processes). In the end I was able to fix the BSOD by using a structured exception handler when reading from the block. However, I’m not very satisfied with this; I’d like to understand why WinDbg can’t see this memory. It seems there’s something architectural about TPM that I don’t understand. I’d also like to understand which other parts of the process image can’t be accessed in this circumstance. Most of the range is accessible; only a few pages are not.
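
To make the range reasoning in the post above concrete, here is a small portable C sketch (not driver code) of the arithmetic involved: checking that a structure lies within ImageBase → ImageBase + ImageSize, and counting the pages it touches. The helper names and the 4 KiB page size are assumptions made for illustration; none of this comes from the driver being discussed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K 0x1000u /* assumption: 4 KiB pages */

/* True when [addr, addr + len) lies entirely inside
 * [image_base, image_base + image_size). Written with subtractions
 * only, so a hostile or corrupt length cannot wrap the arithmetic. */
static bool range_in_image(uintptr_t image_base, size_t image_size,
                           uintptr_t addr, size_t len)
{
    if (len == 0 || addr < image_base)
        return false;

    uintptr_t offset = addr - image_base;
    if (offset > image_size)
        return false;

    return len <= image_size - offset;
}

/* Number of pages the byte range [addr, addr + len) touches.
 * Precondition: the range does not wrap the address space
 * (call range_in_image first). */
static size_t pages_spanned(uintptr_t addr, size_t len)
{
    if (len == 0)
        return 0;
    assert(len - 1 <= UINTPTR_MAX - addr);

    uintptr_t first = addr / PAGE_SIZE_4K;
    uintptr_t last = (addr + len - 1) / PAGE_SIZE_4K;
    return (size_t)(last - first) + 1;
}
```

Passing this check only says the address arithmetic is sound. As the WinDbg output showed, individual pages inside the range can still be inaccessible, so the read itself still has to sit inside an exception handler.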

Hmm, now that I think about it, is the page protection of that memory block executable?

I think I’ve read somewhere that Windows has a security feature, enabled by default when your computer has a TPM and runs x64/UEFI Enterprise or Server builds of Windows, under which executable memory can’t be writable at the same time, and vice versa, even though the page protection says it is both (the PTE is actually set so that write and execute are never allowed together).

It would make sense, because you said that the ProbeForRead did not fail and only the ProbeForWrite did.
Could you try checking the page protection of that memory region, removing the execute protection, doing your work, and then restoring the protection?
It shouldn’t break anything since, I believe, your code runs before the executable’s entry point is run.

You could also maybe map the MDL into kernel space with a writable protection and write from there?

Windows Server 2016 introduced a new Virtualization-based code protection to help protect physical and virtual machines from attacks that modify system code. To achieve this high protection level, Microsoft works in tandem with the computer hardware manufacturers (Original Equipment Manufacturers, or OEMs) to prevent malicious writes into system execution code. This protection can be applied to any system and is being used as one of the building blocks for implementing the Hyper-V host health for shielded virtual machines (VMs).

As with any hardware based protection, some systems might not be compliant due to issues such as incorrect marking of memory pages as executables or by actually trying to modify code at run time, which may result in unexpected failures including data loss or a blue screen error (also called a stop error).

[FROM THE MODS: Mr. @ThatsBerkan … PLEASE DO NOT post the same post multiple times. Do you see the “sticky” at the head of the NTDEV category that says "Please Read: Did You Post Something and It Did Not Appear? Edit a Post and Have it Disappear"?]

You mentioned that ProbeForRead succeeded. All ProbeForRead does is check that the address is a user mode address. It does not check if the address is actually readable. You still need to put the actual read code in a try/except block.

Bill Wandel
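
For reference, the pattern Bill describes might look roughly like the sketch below. CaptureUserBlock is a hypothetical helper name, not something from the original driver, and the sketch assumes it is called at PASSIVE_LEVEL in the context of the process that owns the address, as in a load-image notify routine. It needs the WDK to build and has not been run against the processes discussed in this thread.

```c
#include <ntddk.h>

/* Hypothetical helper: copy Length bytes from a user-mode address into a
 * caller-supplied kernel buffer. If the pages turn out to be inaccessible,
 * the exception is caught and returned as a status instead of bugchecking. */
static NTSTATUS
CaptureUserBlock(PVOID Destination, const VOID *UserAddress, SIZE_T Length)
{
    NTSTATUS status = STATUS_SUCCESS;

    __try {
        /* Checks only that the range is a user-mode range with the
         * requested alignment; it raises an exception if not. It does
         * not tell us whether the pages can actually be read. */
        ProbeForRead((PVOID)UserAddress, Length, 1);

        /* The copy is the operation that can fault, so it must stay
         * inside the same __try block. */
        RtlCopyMemory(Destination, UserAddress, Length);
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        status = GetExceptionCode();
    }

    return status;
}
```

Copying into a kernel buffer first means the rest of the callback only touches memory the driver owns. A failure status could then be treated as "no mitigation policy information available for this process" rather than as a fatal error, though whether that is acceptable depends on what the driver does with the data.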
