InterlockedExchange causes PTE corruption

I’m posting this in case others might benefit. I replaced the exchange with a plain read of the old value followed by a write of the new value.

The memory address was obtained through MmMapIoSpace. It was mapped to the local APIC registers on the current processor. The InterlockedExchange compiled to an XCHG instruction, which has implicit locking behavior (the processor asserts a bus lock for an XCHG with a memory operand).
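
For reference, the shape of the code was roughly this (a sketch with hypothetical names and no error handling; the real base address comes from the APIC base MSR):

#include <ntddk.h>

#define APIC_BASE_PHYS 0xFEE00000ULL  /* conventional local APIC base */
#define REG_INDEX      0x20           /* hypothetical register index, in LONGs */

static volatile LONG *g_ApicRegs;

NTSTATUS MapApicRegs(void)
{
    PHYSICAL_ADDRESS pa;
    pa.QuadPart = APIC_BASE_PHYS;
    g_ApicRegs = (volatile LONG *)MmMapIoSpace(pa, PAGE_SIZE, MmNonCached);
    return (g_ApicRegs != NULL) ? STATUS_SUCCESS : STATUS_INSUFFICIENT_RESOURCES;
}

/* Original version: compiles to XCHG, which asserts a bus lock. */
LONG SwapRegLocked(LONG NewValue)
{
    return InterlockedExchange(&g_ApicRegs[REG_INDEX], NewValue);
}

/* Replacement that avoids the crash: plain read, then plain write. */
LONG SwapRegPlain(LONG NewValue)
{
    LONG old = g_ApicRegs[REG_INDEX];
    g_ApicRegs[REG_INDEX] = NewValue;
    return old;
}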

After doing this in my driver for a while, the system eventually crashed with a BAD PTE bugcheck.

Curiously, if I loaded the driver, ran the user app once, and then unloaded the driver, I could repeat that cycle indefinitely. But if I kept the driver resident, I usually got the crash after the second or third run.

If anyone has an explanation for this, I’d be curious. But otherwise, this should be a caution to avoid locking instructions with mapped addresses of this sort.

> If anyone has an explanation for this, I’d be curious. But otherwise, this should be
> a caution to avoid locking instructions with mapped addresses of this sort.

Your problem, IMHO, has absolutely nothing to do with locking - it is 100% related to your use of MmMapIoSpace(). The very name of this function speaks for itself, don’t you think - it is supposed to be used only for mapping device registers, and is not meant for “regular” RAM pages. The only thing this function does is update the target PTEs. Unlike the MDL-mapping functions, it does not increment the refcount on the target page, let alone update VADs or any other relevant MM structures. In fact, the Memory Manager remains blissfully ignorant of the whole thing.

As long as you do it only for brief periods of time you may get away with it. However, if you let the mapping stay resident for a while, you are more than likely to come into conflict with the Memory Manager, which thinks of your target address as freely available space that it may use in any way it wishes. This is why you are crashing with BAD PTE…
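
Just to illustrate the difference, the MDL route for “regular” RAM looks roughly like this (a minimal sketch with a hypothetical helper; the caller would later have to unmap, unlock, and free the MDL):

#include <ntddk.h>

PVOID MapRamPage(PVOID Va, PMDL *MdlOut)
{
    PMDL mdl = IoAllocateMdl(Va, PAGE_SIZE, FALSE, FALSE, NULL);
    if (mdl == NULL) return NULL;

    __try {
        /* Pins the pages and bumps their reference counts, so the
           Memory Manager actually knows about the mapping. */
        MmProbeAndLockPages(mdl, KernelMode, IoModifyAccess);
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        IoFreeMdl(mdl);
        return NULL;
    }

    *MdlOut = mdl;
    return MmMapLockedPagesSpecifyCache(mdl, KernelMode, MmCached,
                                        NULL, FALSE, NormalPagePriority);
}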

Anton Bassov

Interlocked operations only work on cached memory.

Thanks, both of you.

I was just thinking that it might be related to the caching type. I call MmMapIoSpace with MmNonCached. I think I chose that only because the memory was used so rarely that I didn’t need to take up cache space for it. If I ever want the interlocked xchg back, I will use MmCached, but I have no desire to experiment with this, since the plain read + write works fine.

I did the interlocked xchg originally because I was worried about some other thread accessing the data concurrently. But now I have satisfactory synchronization around these operations, at least within my own driver, and I think it’s unlikely that any other program would ever touch this particular APIC register.
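
For what it’s worth, the protection is roughly this shape (reusing the hypothetical names from the sketch in my first post; the lock gets a KeInitializeSpinLock at driver entry):

static KSPIN_LOCK g_ApicLock;

LONG SwapRegSerialized(LONG NewValue)
{
    KIRQL irql;
    LONG old;

    KeAcquireSpinLock(&g_ApicLock, &irql);
    old = g_ApicRegs[REG_INDEX];        /* plain read...  */
    g_ApicRegs[REG_INDEX] = NewValue;   /* ...plain write */
    KeReleaseSpinLock(&g_ApicLock, irql);
    return old;
}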

Regarding Anton’s comment about using MmMapIoSpace in the first place… On AMD CPUs, at least, the local APIC’s registers are accessible through I/O configuration space, as though it were a bus device. There’s also an MSR that provides the base physical address of the APIC registers. I use this base physical address as the argument to MmMapIoSpace.
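
Concretely, I read the base with something like this (a sketch; MSR 0x1B is the APIC base MSR on both AMD and Intel, with the flag bits in the low 12 bits):

#define MSR_APIC_BASE 0x1B   /* IA32_APIC_BASE / AMD APIC_BAR */

PHYSICAL_ADDRESS GetApicBasePhys(void)
{
    PHYSICAL_ADDRESS pa;
    ULONG64 msr = __readmsr(MSR_APIC_BASE);     /* compiler intrinsic */
    pa.QuadPart = (LONGLONG)(msr & ~0xFFFULL);  /* strip BSP/enable flags */
    return pa;
}

/* ...and then: MmMapIoSpace(GetApicBasePhys(), PAGE_SIZE, MmNonCached); */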

Is this still an inappropriate use of the function? And if so, then how else could my driver read and write the registers?

One more thing: does anyone think this PTE corruption is a bug in Windows that could be fixed? I would like to think that an xchg instruction should not be able to crash the system, and that the worst side effect should be failing to lock the data between the read and the write.

Fooling around with registers that do not belong to you, and that are explicitly managed by some other part of the OS, is just that… “fooling around.” This isn’t a solution for production-level code. I hope you realize that.

I would say the chances of that being fixed are very low.

Peter
OSR
@OSRDrivers

On Mar 2, 2018, at 1:21 PM, xxxxx@rolle.name wrote:
>
> I was just thinking that it might be related to the caching type. I call MmMapIoSpace with MmNonCached. I think I chose that only because the memory was used so rarely that I didn’t need to take up cache space for it. If I ever want the interlocked xchg back, I will use MmCached, but I have no desire to experiment with this, since the plain read + write works fine.

There are several silly lines of thought in this paragraph.

- Don’t worry about optimizing the cache. The processors are experts at managing it. Unless you’re manipulating the value millions of times a second, your puny usage will have zero effect on performance.

- You don’t decide on cached vs. non-cached in order to “save cache space”. You make that decision based on need: whether something dangerous could happen if registers are accessed out of order. Otherwise, always default to cached.

- You can’t arbitrarily decide on a cache scheme when you map a chunk of address space that might already be mapped. You MUST match the cache type of the existing mappings.
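
To put that in code terms (hypothetical addresses and sizes; a sketch, not a recipe):

void MapExamples(PHYSICAL_ADDRESS RegionPhys, SIZE_T RegionLen,
                 PHYSICAL_ADDRESS RegisterPhys, SIZE_T RegisterLen)
{
    /* Default case: cached, letting the processor manage its own cache. */
    PVOID view = MmMapIoSpace(RegionPhys, RegionLen, MmCached);

    /* Non-cached only when out-of-order access to registers with side
       effects would be dangerous; the type must also match any existing
       mapping of the same physical range. */
    PVOID regs = MmMapIoSpace(RegisterPhys, RegisterLen, MmNonCached);

    if (view == NULL || regs == NULL) {
        /* MmMapIoSpace returns NULL on failure; handle it. */
    }
}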

Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Tim, thanks. I admit that I went overboard on optimization by requesting an MmNonCached mapping.