Semantics of Monitor Trap Flag (VT-x)

Hi,

I’ve a small hypervisor that intercepts PF exceptions due to write restrictions I introduced in the pagetables.

Scenario 1.
I intercept the PF, make memory writable by setting the W bit in pagetables and go back to guest at RIP=faulting instruction. Since memory is now writable, guest continues execution without a PF. This works as expected, no hangs or crashes.

Scenario 2.
As in 1: intercept PF, set W bit. Then I enable MTF with ToggleMTF(TRUE) (source below) and go back to guest at RIP=faulting instruction. The idea is to execute a single instruction, catch VmExit caused by MTF, then disable MTF and reintroduce the pagetable restrictions, so all future writes to memory I’m observing will trap.

In scenario 2 I sometimes get what I expect inside the VmExit MTF handler: GuestRip equal to the address of instruction following the instruction executed at VmEntry. Other times I get GuestRip pointing to inside of unrelated kernel procedures.

How do I debug this? I’m testing my driver inside vmware with 2 processors (2 cores each).

Also, regarding this excerpt from Intel manual:
“If VM entry is injecting a pending MTF VM exit (see Section 26.5.2), an MTF VM exit is pending on the
instruction boundary before the first instruction following the VM entry. This is the case even if the “monitor
trap flag” VM-execution control is 0.”

Does this mean that “VM entry” is not “load guest state; execute single instruction at GuestRip”, but rather “load guest state; stop at instruction boundary just before GuestRip”?

VOID
ToggleMTF(int On)
{
    ULONG64 Controls;

    /* Read-modify-write of the primary processor-based VM-execution controls. */
    Controls = ShvVmxRead(CPU_BASED_VM_EXEC_CONTROL);
    if (On) {
        Controls |= CPU_BASED_MONITOR_TRAP_FLAG;
    }
    else {
        Controls &= ~CPU_BASED_MONITOR_TRAP_FLAG;
    }
    __vmx_vmwrite(CPU_BASED_VM_EXEC_CONTROL, Controls);
}
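
For readers following the thread, the flow described in Scenario 2 can be sketched as a small two-handler state machine. This is only an illustration under assumptions: `StepState`, `OnWriteFault` and `OnMonitorTrap` are hypothetical names that do not come from the poster's driver, the control word is passed in and returned instead of being read from the VMCS, and bit 27 is used for "monitor trap flag" in the primary processor-based controls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_WRITABLE    (1ull << 1)  /* R/W bit of an x86-64 page-table entry */
#define MTF_CONTROL_BIT (1u << 27)   /* "monitor trap flag" execution control  */

/* Hypothetical per-CPU bookkeeping; names are illustrative only. */
typedef struct {
    uint64_t *watched_pte;   /* PTE whose write protection was lifted */
    bool      step_pending;  /* an MTF exit is expected on this CPU   */
} StepState;

/* #PF exit: lift the protection and request a single step.
 * Returns the new value for the execution-control word. */
static uint32_t OnWriteFault(StepState *s, uint64_t *pte, uint32_t controls)
{
    assert(pte != NULL);
    *pte |= PTE_WRITABLE;
    s->watched_pte = pte;
    s->step_pending = true;
    return controls | MTF_CONTROL_BIT;
}

/* MTF exit: restore the protection and stop single-stepping.
 * A real handler would also have to invalidate the stale TLB entry. */
static uint32_t OnMonitorTrap(StepState *s, uint32_t controls)
{
    if (s->step_pending) {
        *s->watched_pte &= ~PTE_WRITABLE;
        s->watched_pte = NULL;
        s->step_pending = false;
    }
    return controls & ~MTF_CONTROL_BIT;
}
```

The sketch assumes that exactly one guest instruction runs between the two handlers. Whether that holds is the open question in the post.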

On Wed, Jul 12, 2017 at 10:40 AM, xxxxx@gmail.com wrote:
> Hi,
>
> I’ve a small hypervisor that intercepts PF exceptions due to write restrictions I introduced in the pagetables.
>
> Scenario 1.
> I intercept the PF, make memory writable by setting the W bit in pagetables and go back to guest at RIP=faulting instruction. Since memory is now writable, guest continues execution without a PF. This works as expected, no hangs or crashes.
>
> Scenario 2.
> As in 1: intercept PF, set W bit. Then I enable MTF with ToggleMTF(TRUE) (source below) and go back to guest at RIP=faulting instruction. The idea is to execute a single instruction, catch VmExit caused by MTF, then disable MTF and reintroduce the pagetable restrictions, so all future writes to memory I’m observing will trap.
>
> In scenario 2 I sometimes get what I expect inside the VmExit MTF handler: GuestRip equal to the address of instruction following the instruction executed at VmEntry. Other times I get GuestRip pointing to inside of unrelated kernel procedures.
>
> How do I debug this? I’m testing my driver inside vmware with 2 processors (2 cores each).
>
>
> Also, regarding this excerpt from Intel manual:
> “If VM entry is injecting a pending MTF VM exit (see Section 26.5.2), an MTF VM exit is pending on the
> instruction boundary before the first instruction following the VM entry. This is the case even if the “monitor
> trap flag” VM-execution control is 0.”

Injection of MTF VM exits is something that would be useful for nested
virtualization which doesn’t seem to be your case.

Do the unrelated kernel procedures have to do with interrupt/exception
handling by chance? You’re more likely bitten by this clause:

“If the “monitor trap flag” VM-execution control is 1 and VM entry is
injecting a vectored event (see Section 26.5.1), an MTF VM exit is
pending on the instruction boundary before the first instruction
following the VM entry.”

> Does this mean that “VM entry” is not “load guest state; execute single instruction at GuestRip”, but rather “load guest state; stop at instruction boundary just before GuestRip”?

If I understand your scenario correctly, you’ll have to either make
sure that no interrupt is pending or being injected, or emulate
the instruction instead of single-stepping it. For what it’s worth,
KVM includes an instruction emulator which is used in many different
cases, including MMU emulation:

https://github.com/torvalds/linux/blob/master/arch/x86/kvm/mmu.c#L4817
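
As a rough illustration of the clause quoted above, the condition "MTF control is 1 and VM entry is injecting a vectored event" can be written as a predicate over two 32-bit values: the primary processor-based execution controls and the VM-entry interruption-information field. The function name is hypothetical, and the bit positions (27 for "monitor trap flag", 31 for the valid bit) are stated here as assumptions to check against the manual. The sketch does not model injection of a pending MTF VM exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MTF_CONTROL_BIT  (1u << 27)  /* "monitor trap flag" execution control        */
#define INTR_INFO_VALID  (1u << 31)  /* valid bit of the interruption-information field */

/* True when, per the quoted clause, the MTF VM exit would be pending
 * before the first guest instruction instead of after it. */
static bool MtfFiresBeforeFirstInstruction(uint32_t exec_controls,
                                           uint32_t vm_entry_intr_info)
{
    bool mtf_enabled = (exec_controls & MTF_CONTROL_BIT) != 0;
    bool injecting   = (vm_entry_intr_info & INTR_INFO_VALID) != 0;
    return mtf_enabled && injecting;
}
```

A handler could evaluate this just before resuming the guest and, if it returns true, expect the MTF exit to report the handler's first instruction and not the one after the faulting write.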

>
> VOID
> ToggleMTF(int On)
> {
> ULONG64 Controls;
>
> Controls = ShvVmxRead(CPU_BASED_VM_EXEC_CONTROL);
> if (On) {
> Controls |= CPU_BASED_MONITOR_TRAP_FLAG;
> }
> else {
> Controls &= ~CPU_BASED_MONITOR_TRAP_FLAG;
> }
> __vmx_vmwrite(CPU_BASED_VM_EXEC_CONTROL, Controls);
> }
>
> —
> NTDEV is sponsored by OSR
>
> Visit the list online at: http:
>
> MONTHLY seminars on crash dump analysis, WDF, Windows internals and software drivers!
> Details at http:
>
> To unsubscribe, visit the List Server section of OSR Online at http:

My preference is single stepping for performance and code size reasons.

Here’s an example of the problematic situation:

PAGE_FAULT_EXCEPTION GuestRip=fffff80002aa6914, ErrorCode=3, Qualification=…
MonitorTrapFlag handler, GuestRip=fffff800030204f0

After PF I remove the write protection, enable MTF and return to guest. Then I get MTF VmExit at GuestRip=fffff800030204f0

0: kd> u fffff800030204f0
hal!HalpKInterruptHeap+0x4f0:
fffff800`030204f0 50 push rax
fffff800`030204f1 55 push rbp
fffff800`030204f2 488d2d67ffffff lea rbp,[hal!HalpKInterruptHeap+0x460 (fffff800`03020460)]
fffff800`030204f9 ff6550 jmp qword ptr [rbp+50h]

I’m not injecting any events into the guest from the host’s PF VmExit handler.

On Wed, Jul 12, 2017 at 12:05 PM, xxxxx@gmail.com wrote:
> My preference is single stepping for performance and code size reasons.

Understood.

> Here’s an example of the problematic situation:
>
> PAGE_FAULT_EXCEPTION GuestRip=fffff80002aa6914, ErrorCode=3, Qualification=…
> MonitorTrapFlag handler, GuestRip=fffff800030204f0
>
> After PF I remove the write protection, enable MTF and return to guest. Then I get MTF VmExit at GuestRip=fffff800030204f0
>
> 0: kd> u fffff800030204f0
> hal!HalpKInterruptHeap+0x4f0:
> fffff800`030204f0 50 push rax
> fffff800`030204f1 55 push rbp
> fffff800`030204f2 488d2d67ffffff lea rbp,[hal!HalpKInterruptHeap+0x460 (fffff800`03020460)]
> fffff800`030204f9 ff6550 jmp qword ptr [rbp+50h]
>
> I’m not injecting any events into the guest from the host’s PF VmExit handler.

This would then be a HW interrupt delivered to the guest “by the CPU”.
What happens if you simply disable interrupts (IF=0) in addition to
setting CPU_BASED_MONITOR_TRAP_FLAG?
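
A minimal sketch of that suggestion, assuming the hypervisor reads and writes the guest RFLAGS value itself: clear IF (bit 9) when arming the single step, remember what it was, and put it back in the MTF handler. `IfState` and both function names are made up for illustration. The RFLAGS value is passed in and returned so the logic stays independent of any VMCS accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RFLAGS_IF (1ull << 9)  /* interrupt-enable flag */

/* Hypothetical per-CPU record of the guest's original IF value. */
typedef struct {
    bool masked;    /* IF is currently forced to 0 by the hypervisor */
    bool saved_if;  /* guest's IF before it was cleared              */
} IfState;

/* Call on the #PF exit, together with enabling MTF. */
static uint64_t MaskInterruptsForStep(IfState *s, uint64_t guest_rflags)
{
    s->saved_if = (guest_rflags & RFLAGS_IF) != 0;
    s->masked = true;
    return guest_rflags & ~RFLAGS_IF;
}

/* Call on the MTF exit, together with disabling MTF. */
static uint64_t RestoreInterruptsAfterStep(IfState *s, uint64_t guest_rflags)
{
    if (s->masked) {
        s->masked = false;
        if (s->saved_if) {
            guest_rflags |= RFLAGS_IF;
        }
    }
    return guest_rflags;
}
```

Two caveats, stated as general x86 facts: IF=0 does not mask NMIs, and an instruction that itself reads or changes IF (for example pushfq, popfq, cli, sti) would see or lose the modified value during the step.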


Thanks Ladi, it worked :)

I tested a few combinations (all on w7 x64 inside vmware).

DebugPrint when number of PFs % 1000 == 0
windbg not connected, IF=0 on PF
-> works
windbg connected, IF=0 on PF
-> noticeable slowdown, but works

DebugPrint all PFs / MTF VmExits
windbg not connected, IF not touched
-> noticeable slowdown, but works. DebugPrints captured correctly in DebugView. GuestRips with expected values at MTF VmExits
windbg connected, IF not touched
-> strange RIPs on VmExits, windbg unresponsive, vmware at 100% cpu, DebugPrints stop in windbg after a few seconds. Occasionally “HW Malfunction” BSOD

I didn’t expect windbg to be a factor. Can anyone explain why attaching windbg causes errors when there is a high number of DebugPrints? I connected via a serial port / named pipe. I expected the flood of DebugPrints from my driver to make the OS unresponsive, but that wouldn’t have mattered, as I only wanted to observe the output in windbg.

I made some changes and now I’m getting an NMI inside the VmExit handler for MTF.

Manual states:
“Suppose that the “monitor trap flag” VM-execution control is 1, VM entry is not injecting an event, and the first
instruction following VM entry is neither a REP-prefixed string instruction or the XBEGIN instruction:
• If the instruction causes a fault, an MTF VM exit is pending on the instruction boundary following delivery of
the fault (or any nested exception).”

How should I interpret “following delivery of the fault”?

Sample stacktrace of the problem:

00 fffff880`02fe1b58 fffff800`02bcdd92 nt!RtlpBreakWithStatusInstruction
01 fffff880`02fe1b60 fffff800`02a2fa12 nt!KiBugCheckDebugBreak+0x12
02 fffff880`02fe1bc0 fffff800`02bf37d3 hal!HalBugCheckSystem+0x1ba
03 fffff880`02fe1c00 fffff800`02a297a1 nt!WheaReportHwError+0x263
04 fffff880`02fe1c60 fffff800`02b95b61 hal!HalHandleNMI+0x149
05 fffff880`02fe1c90 fffff800`02ae3982 nt!KiProcessNMI+0x131
06 fffff880`02fe1cf0 fffff800`02ae37e3 nt!KxNmiInterrupt+0x82
07 fffff880`02fe1e30 fffff800`02ade4b5 nt!KiNmiInterrupt+0x163
08 fffffa80`091dd708 fffff800`02b26675 nt!DebugPrint+0x15
09 fffffa80`091dd710 fffff800`02b22178 nt! ?? ::FNODOBFM::`string'+0xc642
0a fffffa80`091dd9c0 fffff880`04f2a62b nt!DbgPrintEx+0x30
0b fffffa80`091dda00 fffff880`04f2aaf0 Hypervisor!HandleMonitorTrapFlag

On Thu, Jul 13, 2017 at 12:39 AM, xxxxx@gmail.com wrote:
> I made some changes and now I’m getting an NMI inside the VmExit handler for MTF.

What kind of changes have you made?

> Manual states:
> “Suppose that the “monitor trap flag” VM-execution control is 1, VM entry is not injecting an event, and the first
> instruction following VM entry is neither a REP-prefixed string instruction or the XBEGIN instruction:
> • If the instruction causes a fault, an MTF VM exit is pending on the instruction boundary following delivery of
> the fault (or any nested exception).”
>
> How should I interpret “following delivery of the fault”?

What I believe they want to say here is that if the next instruction
executed by the CPU is not the one at GuestRIP (because the one at
GuestRIP causes a fault), you will get a VM exit at the first
instruction of the respective exception handler (possibly that of a
nested exception, e.g. a double fault). In other words, no guest code
will have been run, although the CPU state will have changed.

> Sample stacktrace of the problem:
>
> 00 fffff880`02fe1b58 fffff800`02bcdd92 nt!RtlpBreakWithStatusInstruction
> 01 fffff880`02fe1b60 fffff800`02a2fa12 nt!KiBugCheckDebugBreak+0x12
> 02 fffff880`02fe1bc0 fffff800`02bf37d3 hal!HalBugCheckSystem+0x1ba
> 03 fffff880`02fe1c00 fffff800`02a297a1 nt!WheaReportHwError+0x263
> 04 fffff880`02fe1c60 fffff800`02b95b61 hal!HalHandleNMI+0x149
> 05 fffff880`02fe1c90 fffff800`02ae3982 nt!KiProcessNMI+0x131
> 06 fffff880`02fe1cf0 fffff800`02ae37e3 nt!KxNmiInterrupt+0x82
> 07 fffff880`02fe1e30 fffff800`02ade4b5 nt!KiNmiInterrupt+0x163
> 08 fffffa80`091dd708 fffff800`02b26675 nt!DebugPrint+0x15
> 09 fffffa80`091dd710 fffff800`02b22178 nt! ?? ::FNODOBFM::`string'+0xc642
> 0a fffffa80`091dd9c0 fffff880`04f2a62b nt!DbgPrintEx+0x30
> 0b fffffa80`091dda00 fffff880`04f2aaf0 Hypervisor!HandleMonitorTrapFlag

Windows sometimes uses NMI for IPI. In this case it was probably
supposed to be delivered to the guest, not to the hypervisor. I wonder
what exactly you changed that you started getting this. Running
Windows on a single-processor VM will almost certainly get rid of this
issue.
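
For reference, here is a hedged sketch of how an NMI that the hypervisor intercepted as a VM exit could be recognised and encoded for re-injection into the guest. It assumes the usual layout of the interruption-information fields (vector in bits 7:0, type in bits 10:8 with NMI being type 2, valid in bit 31). The function names are hypothetical. It does not address an NMI that arrives while the CPU is already running the VM-exit handler, which is what the stack trace above appears to show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INTR_INFO_TYPE_SHIFT  8
#define INTR_INFO_TYPE_MASK   (7u << INTR_INFO_TYPE_SHIFT)
#define INTR_INFO_VALID       (1u << 31)
#define INTR_TYPE_NMI         2u   /* interruption type for NMI */
#define NMI_VECTOR            2u   /* NMI uses vector 2         */

/* True if a VM-exit interruption-information value describes an NMI. */
static bool ExitWasNmi(uint32_t exit_intr_info)
{
    uint32_t type = (exit_intr_info & INTR_INFO_TYPE_MASK) >> INTR_INFO_TYPE_SHIFT;
    return (exit_intr_info & INTR_INFO_VALID) != 0 && type == INTR_TYPE_NMI;
}

/* Value to place in the VM-entry interruption-information field
 * so that the next VM entry delivers an NMI to the guest. */
static uint32_t MakeNmiInjection(void)
{
    return INTR_INFO_VALID | (INTR_TYPE_NMI << INTR_INFO_TYPE_SHIFT) | NMI_VECTOR;
}
```

Note the interaction with the clause quoted earlier in the thread: if such an injection happens while the “monitor trap flag” control is 1, the MTF VM exit is pending before the first instruction following the VM entry.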
