IRQ level fault, VT-x

Hi,

Background
I’m writing a simple type 2 hypervisor based on a tutorial I found online. I’m new to both virtualization and Windows driver development, and since I’ve run into some errors I would like to ask about both the crashes and the general architecture, as it might be completely wrong!

General Architecture
I wanted to let the user communicate with the hypervisor through the standard Windows NT kernel driver API. I wanted to keep it simple, but also provide the ability to interact with the virtualized code. The model I picked is based on the KVM approach (at least that is what I think).

I’ve drawn a diagram which illustrates what I’m trying to achieve:


When a vmexit caused by an I/O instruction occurs, I want to pass control back to userland. The user can then read the output or provide input in real time, and decide whether to continue running the virtualized code. Some instructions, like cpuid, are handled without switching to userland, while others, like hlt, always cause the virtual machine to exit.
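To make the dispatch policy described above concrete, here is a minimal sketch in plain C. The exit reason numbers are the basic exit reasons listed in the Intel SDM; exit_action_t and decide_exit_action are hypothetical names made up for illustration and are not part of the driver.

```c
/* Basic VM-exit reason numbers as listed in the Intel SDM. */
enum {
    EXIT_REASON_CPUID          = 10,
    EXIT_REASON_HLT            = 12,
    EXIT_REASON_IO_INSTRUCTION = 30
};

/* Hypothetical result type: what the exit handler should do next. */
typedef enum {
    ACTION_RESUME_GUEST,    /* emulate in the kernel, then vmresume */
    ACTION_RETURN_TO_USER,  /* complete the ioctl so the app can service the I/O */
    ACTION_STOP_VCPU        /* guest halted or exited for an unhandled reason */
} exit_action_t;

/* Maps a basic exit reason to the policy described in the post. */
static exit_action_t decide_exit_action(unsigned int basic_exit_reason)
{
    switch (basic_exit_reason) {
    case EXIT_REASON_CPUID:
        return ACTION_RESUME_GUEST;
    case EXIT_REASON_IO_INSTRUCTION:
        return ACTION_RETURN_TO_USER;
    case EXIT_REASON_HLT:
        return ACTION_STOP_VCPU;
    default:
        /* Assumption: anything unexpected stops the vcpu rather than resuming. */
        return ACTION_STOP_VCPU;
    }
}
```

Keeping the policy in one function like this separates "what should happen" from "how to get back to userland", which is the part that is causing trouble.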

Architecture Details
The first time the user invokes the ioctl, I initialize the VMXON region and the VMCS and execute the vmlaunch instruction. The problem is that a successful vmlaunch does not return: after a vmexit, execution resumes at the exit handler. I’ve come up with a hacky way around this. Just before calling vmlaunch I push all general-purpose registers onto the stack and save rsp. When I want to pass control back to userland, I restore the saved rsp and pop all the registers from the stack. I then land back in the ioctl handler, right after the call to the enter_guest procedure, and return to userland.

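enter_guest proc
    SAVE_GP ; pushes all registers onto the stack
    mov qword ptr [rcx], rsp ; remember rsp so the exit path can unwind to here

    vmlaunch

    ; Reached only if vmlaunch failed. Undo SAVE_GP first, otherwise ret
    ; would pop a saved register instead of the return address.
    RESTORE_GP
    xor rax, rax
    inc rax ; return 1 to signal failure
    ret
enter_guest endp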

; rcx holds saved rsp value which we will restore
pass_control_to_usermode proc
    mov rsp, rcx
    RESTORE_GP ; pops all registers from the stack
    xor rax, rax
    ret
pass_control_to_usermode endp
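The save-rsp / restore-rsp pair above is essentially a hand-rolled setjmp/longjmp. As a user-mode analogy only (this is not kernel code, and fake_exit_handler and enter_guest_analogy are hypothetical stand-ins), the control flow looks roughly like this:

```c
#include <setjmp.h>

/* Plays the role of the saved rsp plus the registers pushed by SAVE_GP. */
static jmp_buf host_context;
static int exits_seen;

/* Stand-in for the vmexit handler: it never returns to its caller,
 * it jumps straight back to the saved context instead. */
static void fake_exit_handler(void)
{
    exits_seen++;
    longjmp(host_context, 1);
}

/* Stand-in for enter_guest. Returns 0 when control came back through
 * the exit path, 1 if the "launch" itself fell through (failure). */
static int enter_guest_analogy(void)
{
    if (setjmp(host_context) != 0) {
        return 0; /* arrived via longjmp, like pass_control_to_usermode */
    }
    fake_exit_handler(); /* like a successful vmlaunch: does not come back */
    return 1;            /* only reached if the jump never happened */
}
```

The point of the analogy is that a jump like this restores the stack pointer and registers and nothing else. Any other state that was changed between the save and the jump (locks taken, IRQL raised, interrupts disabled) stays as it was at the moment of the jump.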

Now the question is whether this approach is OK for a small project. Is it OK to return to userland before calling vmxoff?
As I mentioned, I’m getting some crashes whose cause is hard for me to determine, as they don’t happen directly inside the driver.

Crash one
The first crash is related to a wrong IRQL (interrupt request level).

IRQL_GT_ZERO_AT_SYSTEM_SERVICE (4a)
Returning to usermode from a system call at an IRQL > PASSIVE_LEVEL.
Arguments:
Arg1: 00007fffa6aebea4, Address of system function (system call routine)
Arg2: 0000000000000002, Current IRQL
Arg3: 0000000000000000, 0
Arg4: fffff10f9e5e2b80, 0

Virtualized code:

char virtualized_code[] = {
    0x48,0x31,0xc0,               // xor rax, rax
    0x0f,0xa2,                    // cpuid

    0xba, 0xf8, 0x03, 0x00, 0x00, // mov edx, 0x3f8
    0x66, 0xb8, 0x41, 0x00,       // mov ax, 0x41
    0x66, 0xef,                   // out dx, ax

    0xf4};                        // hlt
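When the out instruction above causes a vmexit, the exit qualification field describes the access, which is what userland would need in order to "read the output". Below is a small sketch of decoding it, following the I/O instruction exit qualification layout in the Intel SDM. The struct io_exit_info type and decode_io_exit function are hypothetical names, not something from the driver.

```c
#include <stdint.h>

/* Hypothetical container for the decoded exit qualification fields. */
struct io_exit_info {
    uint16_t port;        /* bits 31:16: port number */
    uint8_t  size_bytes;  /* bits 2:0 hold (access size - 1) */
    uint8_t  is_in;       /* bit 3: 0 = out, 1 = in */
    uint8_t  is_string;   /* bit 4: ins/outs */
    uint8_t  is_rep;      /* bit 5: rep prefix present */
    uint8_t  port_is_imm; /* bit 6: 0 = port taken from dx, 1 = immediate */
};

/* Decodes the exit qualification of an I/O instruction vmexit. */
static struct io_exit_info decode_io_exit(uint64_t qualification)
{
    struct io_exit_info info;

    info.size_bytes  = (uint8_t)((qualification & 0x7) + 1);
    info.is_in       = (uint8_t)((qualification >> 3) & 1);
    info.is_string   = (uint8_t)((qualification >> 4) & 1);
    info.is_rep      = (uint8_t)((qualification >> 5) & 1);
    info.port_is_imm = (uint8_t)((qualification >> 6) & 1);
    info.port        = (uint16_t)((qualification >> 16) & 0xFFFF);
    return info;
}
```

For example, a qualification value of 0x03F80001 decodes to a 2-byte out to port 0x3f8 with the port taken from dx, which matches the shape of the out dx, ax in the guest code.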

The crash occurs just after leaving the ioctl for the first time. The out instruction causes a vmexit and I pass control back to userland. After I leave ioctl_dispatcher I receive the error.
This is simply down to my poor knowledge of IRQLs. I do know what they are, but I don’t know why the IRQL is at such a high level here. I’ve also read that I should not call KeLowerIrql directly. I would therefore like to hear some advice. I’m very happy to read any links with further explanation.
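The posted code does not show where the IRQL changes, so the following is only a guess at the kind of pattern that can produce this bugcheck. Arg2 is 2, which is DISPATCH_LEVEL. If something on the way into the guest raises the IRQL and the path back to userland skips the matching lower, the system call returns at the raised level. The sketch below is a plain user-mode model of that bookkeeping; every model_* name is hypothetical and none of them are WDK calls.

```c
/* User-mode model only: these are not real kernel APIs. */
typedef unsigned char model_irql_t;

enum {
    MODEL_PASSIVE_LEVEL  = 0,
    MODEL_DISPATCH_LEVEL = 2
};

static model_irql_t model_current_irql = MODEL_PASSIVE_LEVEL;

/* Raises the modelled IRQL and returns the previous level. */
static model_irql_t model_raise_irql(model_irql_t new_level)
{
    model_irql_t old_level = model_current_irql;
    model_current_irql = new_level;
    return old_level;
}

/* Restores the modelled IRQL to a previously saved level. */
static void model_lower_irql(model_irql_t old_level)
{
    model_current_irql = old_level;
}

/* Models one ioctl that enters the guest and comes back after a vmexit.
 * Returns the IRQL in effect when the ioctl completes. */
static model_irql_t model_run_ioctl(int restore_on_exit_path)
{
    model_irql_t old_level = model_raise_irql(MODEL_DISPATCH_LEVEL);

    /* ... guest runs, vmexit handler decides to return to userland ... */

    if (restore_on_exit_path) {
        model_lower_irql(old_level); /* the pairing a non-local jump can skip */
    }
    return model_current_irql;
}
```

In the real driver, logging KeGetCurrentIrql() at ioctl entry, just before enter_guest, in the exit handler, and just before IoCompleteRequest should show where the level stops being PASSIVE_LEVEL.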

Driver code:

static NTSTATUS eh_driver_ioctl_eh_run(PIRP irp) {
    log_entry("eh_driver_ioctl_eh_run()\n");

    PIO_STACK_LOCATION io_stack_location = IoGetCurrentIrpStackLocation(irp);
    struct __vcpu_t* vcpu = (struct __vcpu_t*) io_stack_location->FileObject->FsContext;

    NTSTATUS res = STATUS_SUCCESS;
    switch (vcpu->body.state)
    {
    case VCPU_STATE_CLEAR: // <-- in this crash we take this path
    {
        /* This is the first time the user runs this vcpu. We must set the rip and rsp and then invoke vmlaunch. */
        PCHAR inpt_code_buf = irp->AssociatedIrp.SystemBuffer;
        ULONG inpt_code_len = io_stack_location->Parameters.DeviceIoControl.InputBufferLength;
        log_debug("inpt_code_len: %lu\n", inpt_code_len);

        void* code = alloc_non_paged(inpt_code_len);
        if (!code) {
            log_error("Failed to allocate mem for virtualized code. alloc_non_paged failed\n");
            res = STATUS_UNSUCCESSFUL;
            break; /* bail out instead of copying into a NULL buffer */
        }

        RtlCopyMemory(code, inpt_code_buf, inpt_code_len);

        log_debug("Initial guest rip: %p\n", code);
        if (run_vcpu(vcpu, code)) {
            log_error("Failed to run vcpu. run_vcpu failed.\n");
            res = STATUS_UNSUCCESSFUL;
        }
        irp->IoStatus.Information = inpt_code_len;
        break;
    }

    case VCPU_STATE_LAUNCHED:
        /* The vcpu is already launched. We just have to resume it. */
        if (resume_vcpu(vcpu)) {
            log_error("Failed to resume vmm. resume_vcpu failed.\n");
            res = STATUS_UNSUCCESSFUL;
        }
        break;

    case VCPU_STATE_OFF_AND_DIRTY:
        /* The cpu has been turned off either by executing hlt or by
        internal error. */
        log_error("Vcpu is at VCPU_STATE_OFF_AND_DIRTY state. Cannot run.\n");
        res = STATUS_UNSUCCESSFUL;
        break;
    }

    log_exit("eh_driver_ioctl_eh_run()\n");
    return res;
}

static NTSTATUS eh_driver_ioctl_dispatcher(PDEVICE_OBJECT device_object, PIRP irp)
{
    log_entry("eh_driver_ioctl_dispatcher()\n");

    UNREFERENCED_PARAMETER(device_object);

    PIO_STACK_LOCATION io_stack_location = IoGetCurrentIrpStackLocation(irp);

    NTSTATUS res = STATUS_SUCCESS;
    switch (io_stack_location->Parameters.DeviceIoControl.IoControlCode)
    {
    case EH_RUN:
        res = eh_driver_ioctl_eh_run(irp);
        break;

    default:
        res = STATUS_UNSUCCESSFUL;
        break;
    }

    log_exit("eh_driver_ioctl_dispatcher()\n");
    irp->IoStatus.Status = res;
    IoCompleteRequest(irp, IO_NO_INCREMENT);
    return res;
}

Output of !analyze -v:

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

IRQL_GT_ZERO_AT_SYSTEM_SERVICE (4a)
Returning to usermode from a system call at an IRQL > PASSIVE_LEVEL.
Arguments:
Arg1: 00007fffa6aebea4, Address of system function (system call routine)
Arg2: 0000000000000002, Current IRQL
Arg3: 0000000000000000, 0
Arg4: fffff10f9e5e2b80, 0

Debugging Details:
------------------

*** WARNING: Unable to verify checksum for EHApp.exe

KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.Sec
    Value: 3

    Key  : Analysis.DebugAnalysisProvider.CPP
    Value: Create: 8007007e on WINDEV2007EVAL

    Key  : Analysis.DebugData
    Value: CreateObject

    Key  : Analysis.DebugModel
    Value: CreateObject

    Key  : Analysis.Elapsed.Sec
    Value: 5

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 83

    Key  : Analysis.System
    Value: CreateObject

BUGCHECK_CODE:  4a

BUGCHECK_P1: 7fffa6aebea4

BUGCHECK_P2: 2

BUGCHECK_P3: 0

BUGCHECK_P4: fffff10f9e5e2b80

PROCESS_NAME:  EHApp.exe

STACK_TEXT:  
fffff10f`9e5e2198 fffff800`42727762 : fffff10f`9e5e2300 fffff800`42573cc0 00000000`00000000 00000000`00000000 : nt!DbgBreakPointWithStatus
fffff10f`9e5e21a0 fffff800`42726d46 : 00000000`00000003 fffff10f`9e5e2300 fffff800`425fbe50 00000000`0000004a : nt!KiBugCheckDebugBreak+0x12
fffff10f`9e5e2200 fffff800`425e7047 : 00000000`00000002 00000000`00000000 00000000`00000000 fffff800`477b4020 : nt!KeBugCheck2+0x946
fffff10f`9e5e2910 fffff800`425f8e29 : 00000000`0000004a 00007fff`a6aebea4 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx+0x107
fffff10f`9e5e2950 fffff800`425f8cf3 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiBugCheckDispatch+0x69
fffff10f`9e5e2a90 00007fff`a6aebea4 : 00007fff`a41c8beb cccccccc`cccccccc cccccccc`cccccccc cccccccc`cccccccc : nt!KiSystemServiceExitPico+0x1fe
000000d5`b3d2f7e8 00007fff`a41c8beb : cccccccc`cccccccc cccccccc`cccccccc cccccccc`cccccccc cccccccc`cccccccc : ntdll!NtDeviceIoControlFile+0x14
000000d5`b3d2f7f0 00007fff`a4b755b1 : 00000000`0022e001 cccccccc`cccccccc cccccccc`cccccccc cccccccc`cccccccc : KERNELBASE!DeviceIoControl+0x6b
000000d5`b3d2f860 00007ff6`359d8464 : 000002ab`07a87a70 00007ff6`35a3ef30 000000d5`b3d2f8f0 00000000`00000000 : KERNEL32!DeviceIoControlImplementation+0x81
000000d5`b3d2f8b0 000002ab`07a87a70 : 00007ff6`35a3ef30 000000d5`b3d2f8f0 00000000`00000000 00000000`00000000 : EHApp!main+0x104 [C:\Users\User\source\repos\ExampleHipervisor\EHApp\main.c @ 37] 
000000d5`b3d2f8b8 00007ff6`35a3ef30 : 000000d5`b3d2f8f0 00000000`00000000 00000000`00000000 cccccccc`00000000 : 0x000002ab`07a87a70
000000d5`b3d2f8c0 000000d5`b3d2f8f0 : 00000000`00000000 00000000`00000000 cccccccc`00000000 000000d5`b3d2f914 : EHApp!_xt_z+0x120
000000d5`b3d2f8c8 00000000`00000000 : 00000000`00000000 cccccccc`00000000 000000d5`b3d2f914 00000000`00000000 : 0x000000d5`b3d2f8f0

SYMBOL_NAME:  ntdll!NtDeviceIoControlFile+14

MODULE_NAME: ntdll

IMAGE_NAME:  ntdll.dll

STACK_COMMAND:  .thread ; .cxr ; kb

BUCKET_ID_FUNC_OFFSET:  14

FAILURE_BUCKET_ID:  RAISED_IRQL_FAULT_EHApp.exe_ntdll!NtDeviceIoControlFile

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10