Invoking Win32k syscalls from kernel space

I have noticed that while calling ntoskrnl.exe/ntdll.dll syscalls directly from the kernel works just fine, doing the same for win32k.sys/win32u.dll syscalls fails when HVCI is enabled, with bug check 0x139 (KERNEL_SECURITY_CHECK_FAILURE) and the additional hint: Arg1: 0 - A stack-based buffer has been overrun.
Which is strange: why would it work with HVCI enabled for the kernel itself but not for the win32k stuff? That would be problem 1.

Another issue is that before calling the first win32k syscall, KiConvertToGuiThread must be triggered; unfortunately neither this nor PsConvertToGuiThread is exported by the kernel. That would be problem 2.

What we want to achieve as the final goal is to provide a syscall interface in our driver that an application can call. The driver would then do something in kernel space, invoke the actual syscall, then do something else, before finally returning control to the user-mode application.

“What is this good for?” - you may ask.
This mechanism is used by Sandboxie (https://github.com/sandboxie-plus/Sandboxie) for almost all ntoskrnl.exe syscalls to allow the driver to inspect the operations before they get executed. In older versions of Windows this was only needed for ntdll.dll, but unfortunately some win32k.sys syscalls now also require this treatment for sandboxed applications to keep working correctly. For example, the GdiDdDDI* functions are used by Chromium/msedge and only operate correctly for sandboxed processes when redirected through this syscall interface. Without those functions working, hardware acceleration is not available, and at least on Windows 11 this causes UI glitches.

The syscall redirection is done in user space by hooking ntdll.dll, and now win32u.dll, in processes that start in the sandbox. The driver only needs to provide an interface to invoke the original syscall.
So instead of doing a syscall directly, the application now calls a routine provided by the injected DLL, specifying the syscall number. This routine then calls the driver, using a clean copy of NtDeviceIoControlFile, passing it the syscall number and a pointer to the stack containing the arguments. The driver then inspects the request and, if it is found permissible, makes the thread impersonate a less restrictive token and calls the address of the syscall routine (obtained during initialization from the service table), thus executing the syscall. Once that’s finished, it de-impersonates the thread, back to its highly restricted primary token, and returns.
So as one can see, a process running under Sandboxie supervision with its highly restricted primary token couldn’t do much if it bypassed the syscall interface, for example by loading an unmodified copy of ntdll.dll; it would just run into access denied almost everywhere.
In the past this wasn’t an issue for win32k syscalls once the helper service had helped the restricted process connect to the desktop object. But as mentioned, some new win32k functions don’t operate as desired when restricted.
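
As a rough illustration of the driver-side flow described above (inspect, impersonate, invoke, revert), here is a minimal user-mode C model. It is not Sandboxie's code: every name in it (`syscall_entry`, `dispatch`, the `allowed` flag, the impersonation stubs, `demo_service`) is hypothetical, and impersonation is modelled by a plain flag instead of real token handling.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t STATUS;                    /* stand-in for NTSTATUS */
#define ST_ACCESS_DENIED   ((STATUS)0xC0000022u)
#define ST_INVALID_SERVICE ((STATUS)0xC000001Cu)

typedef STATUS (*service_fn)(const uintptr_t *args);

/* Hypothetical per-syscall record; the real driver's layout may differ. */
typedef struct {
    service_fn target;       /* routine address captured at initialization */
    unsigned   param_count;
    int        allowed;      /* outcome of the policy check (assumption)   */
} syscall_entry;

int g_impersonating = 0;     /* models the thread's impersonation state */

void begin_impersonation(void) { g_impersonating = 1; }
void end_impersonation(void)   { g_impersonating = 0; }

/* Demo target: only succeeds while the less restrictive token is active. */
STATUS demo_service(const uintptr_t *args)
{
    return g_impersonating ? (STATUS)args[0] : ST_ACCESS_DENIED;
}

STATUS dispatch(const syscall_entry *table, size_t table_len,
                unsigned index, const uintptr_t *user_args)
{
    if (index >= table_len || table[index].target == NULL)
        return ST_INVALID_SERVICE;          /* unknown syscall number */

    const syscall_entry *e = &table[index];
    if (!e->allowed)
        return ST_ACCESS_DENIED;            /* request not permissible */

    begin_impersonation();                  /* switch to the wider token */
    STATUS st = e->target(user_args);       /* run the original routine  */
    end_impersonation();                    /* back to the primary token */
    return st;
}
```

The point of the model is the ordering: the policy check happens before the token is widened, and the token is always narrowed again before control returns to the caller.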

So, just to clarify: we are not messing with the kernel, the SSDT, or any unsandboxed process.

Problem 2 can safely be ignored, as other non-redirected win32k syscalls would already have been executed at this point.

But problem 1 is in urgent need of fixing, and I’m out of ideas why this would not work with HVCI enabled for the new set of syscalls. The redirection is implemented in exactly the same way as the old working one, and without HVCI it works just fine.

Does anyone here know what’s going on and how to fix it?

IMHO the best would be a way to actually invoke the original syscall from the kernel such that it does everything, including KiConvertToGuiThread, but I’m not sure if that is even possible.

Cheers
David

First, let me say that Mr. Xanatos contacted me before posting this, in light of the fact that his previous thread on this topic was locked after we accused him of facilitating malware. I see here he has gone to some lengths to describe his intentions, which are obviously not “bad.”

Secondly, my only comment about calls to NTDLL from kernel mode versus Win32K from kernel mode is that the OS specifically accommodates native NT/Zw syscalls from kernel mode (such calls are allowed by design), but it provides no such facility for calls to Win32 that I know of (which is intended to be invoked only from user mode to the best of my knowledge).

Good luck Mr. Xanatos,

Peter

This is by design. Going back into ancient history, until NT 3.51, display drivers were user-mode DLLs that talked with their kernel miniports to access the hardware. This was a big part of the “subsystem purity” model; the NT kernel was supposed to be agnostic about operating systems, and only provide generic support for the upper layers. User-mode subsystems then provided the Win32 experience, or the OS/2 experience, or the Posix experience. (A lot of ATMs in the world used NT with the OS/2 subsystem, even into the 21st Century.)

This was a good idea, but it caused some performance bottlenecks in graphics. So, in NT 4.0, this was redesigned. Display drivers became kernel components, managed by win32k.sys, which was the impure kernel presence of the Win32 subsystem. To try to maintain the semblance of purity, win32k.sys protects itself against accesses from the user-mode parts of the Win32 subsystem.

Interesting piece of history, but I wonder why this protection triggers only when HVCI is in place. I would have thought it would be more of an all-or-nothing situation, i.e. that it would not work without HVCI either.

Okay, LOL… the HVCI issue was trivial in the end!
As it seems, all I needed to do was to disable the “Control Flow Guard” option for the file that does the invoking; with it disabled, everything seems to work now.
:smiley:
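
For reference, MSVC also documents a per-function opt-out, `__declspec(guard(nocf))`, which suppresses the CFG checks on indirect calls made inside the annotated function without touching the rest of the file. The sketch below is only an illustration under that assumption: the names `NO_CFG`, `invoke5_nocfg` and `demo_sum5` are made up here, and whether this is sufficient for the HVCI case in this thread has not been verified.

```c
#include <stdint.h>

#if defined(_MSC_VER)
#  define NO_CFG __declspec(guard(nocf))   /* MSVC: no CFG checks in this function */
#else
#  define NO_CFG                           /* no-op elsewhere so the sketch compiles */
#endif

typedef long (*service5_fn)(uintptr_t, uintptr_t, uintptr_t,
                            uintptr_t, uintptr_t);

/* Hypothetical invoker: the indirect call below is the one CFG would guard. */
NO_CFG long invoke5_nocfg(void *func, const uintptr_t *stack)
{
    service5_fn fn = (service5_fn)func;
    return fn(stack[0], stack[1], stack[2], stack[3], stack[4]);
}

/* Demo target used only to exercise the invoker. */
long demo_sum5(uintptr_t a, uintptr_t b, uintptr_t c, uintptr_t d, uintptr_t e)
{
    return (long)(a + b + c + d + e);
}
```

Keeping the opt-out on one small function limits the unguarded indirect calls to the single place where the target comes from the service table.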

LOL… so Windows was “helping” you all this time! Awesome. :wink:

Glad you got the issue sorted.

Peter

I wanted to fix this in a nicer way than just disabling CFG for the whole file,

so I wrote a small call wrapper that should do the trick:

;extern NTSTATUS Sbie_InvokeSyscall_asm(void* func, int count, void* args);
Sbie_InvokeSyscall_asm PROC

     mov         qword ptr [rsp+20h], r9  
     mov         qword ptr [rsp+18h], r8  
     mov         qword ptr [rsp+10h], rdx  
     mov         qword ptr [rsp+8], rcx 
     
     ; note: (count & 0x0F) + 4 = 19 arguments are the absolute maximum

     ; quick sanity check
     cmp         rdx, 13h ; if count > 19
     jle         arg_count_ok
     mov         rax, 0C000001Ch ; return STATUS_INVALID_SYSTEM_SERVICE
     ret
arg_count_ok:

     push        rsi
     push        rdi
     ; prepare enough stack for up to 19 arguments
     sub         rsp, 98h  
     
     ; save our 3 relevant arguments to spare registers
     mov         r11, r8  ; args
     mov         r10, rdx ; count
     mov         rax, rcx ; func

     ; check if we have higher arguments and if not skip 
     cmp         r10, 4
     jle         copy_reg_args
     ; copy arguments 5-19
     mov         rsi, r11 ; source
     add         rsi, 20h
     mov         rdi, rsp ; destination
     add         rdi, 20h
     mov         rcx, r10 ; arg count
     sub         rcx, 4   ; skip the register passed args
     rep movsq

copy_reg_args:
     ; copy arguments 1-4
     mov         r9,  qword ptr [r11+18h]
     mov         r8,  qword ptr [r11+10h]
     mov         rdx, qword ptr [r11+08h]
     mov         rcx, qword ptr [r11+00h]

     ; call the function
     call        rax

     ; clear stack
     add         rsp, 98h  
     pop         rdi
     pop         rsi

     ret  

Sbie_InvokeSyscall_asm ENDP
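
To make the wrapper's stack arithmetic easier to check, here is a small user-mode C model of the frame it builds. This is only a sketch: `call_frame`, `build_call_frame` and `rsp_mod16_at_call` are hypothetical names, and the model assumes the usual x64 Windows convention (four register arguments, the fifth argument at `[rsp+20h]`, and `rsp` at 8 modulo 16 on function entry). Like the assembly, it always loads all four register slots, so the argument buffer needs at least four entries.

```c
#include <stdint.h>
#include <string.h>

enum { MAX_ARGS = 19, REG_ARGS = 4, FRAME_BYTES = 0x98 };

typedef struct {
    uint64_t rcx, rdx, r8, r9;          /* arguments 1-4                     */
    uint64_t stack[FRAME_BYTES / 8];    /* [rsp+0] .. [rsp+90h] at the call  */
} call_frame;

/* Returns 0 on success, -1 if count does not fit the 19-argument frame. */
int build_call_frame(call_frame *f, const uint64_t *args, int count)
{
    if (count < 0 || count > MAX_ARGS)
        return -1;
    memset(f, 0, sizeof *f);

    /* "rep movsq": arguments 5..count land at [rsp+20h] and upward, which
       is slot index 4 and upward, so indices line up with args[]. */
    for (int i = REG_ARGS; i < count; i++)
        f->stack[i] = args[i];

    /* Register arguments are always loaded from the buffer. */
    f->rcx = args[0];
    f->rdx = args[1];
    f->r8  = args[2];
    f->r9  = args[3];
    return 0;
}

/* rsp modulo 16 at the inner call, following the wrapper's prologue. */
unsigned rsp_mod16_at_call(void)
{
    unsigned rsp = 8;                   /* on entry: return address pushed */
    rsp = (rsp - 8) & 15;               /* push rsi                        */
    rsp = (rsp - 8) & 15;               /* push rdi                        */
    rsp = (rsp - FRAME_BYTES) & 15;     /* sub rsp, 98h                    */
    return rsp;
}
```

Under these assumptions the 98h bytes hold exactly 19 eight-byte slots and the stack is 16-byte aligned at the call, so the layout itself looks consistent with the convention.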

And for the most part it seems to work: with HVCI on, native 64-bit applications work just fine on 64-bit Windows.

However, for some strange reason, 32-bit applications just fail when started sandboxed, which to my understanding should not be possible, as all the calls to the kernel come from the 64-bit ntdll long after WoW64 did its thing. From the viewpoint of the kernel everything is 64-bit, IMHO.

I did some more testing and it seems that this fails for functions with 4 and 5 arguments.

So I created this test:


typedef NTSTATUS (*P_SystemService05)(
    ULONG_PTR arg01, ULONG_PTR arg02, ULONG_PTR arg03, ULONG_PTR arg04,
    ULONG_PTR arg05);

_FX NTSTATUS Sbie_InvokeSyscall5(void* func, ULONG_PTR *stack) {

    P_SystemService05 nt = (P_SystemService05)func;
    return nt(stack[0], stack[1], stack[2], stack[3], stack[4]);
}

_FX NTSTATUS Syscall_Invoke(SYSCALL_ENTRY* entry, ULONG_PTR* stack)
{

    if (entry->param_count <= 4) {

        extern NTSTATUS Sbie_InvokeSyscall4_asm(void* func, void* args);
        return Sbie_InvokeSyscall4_asm(entry->ntos_func, stack);

    }
    else if (entry->param_count == 5) {

        //extern NTSTATUS Sbie_InvokeSyscall5_asm(void* func, void* args);
        //return Sbie_InvokeSyscall5_asm(entry->ntos_func, stack);
        return Sbie_InvokeSyscall5(entry->ntos_func, stack);
    }
    else {

        extern NTSTATUS Sbie_InvokeSyscall_asm(void* func, int count, void* args);
        return Sbie_InvokeSyscall_asm(entry->ntos_func, entry->param_count, stack);
    }
}

So we implement separate cases for 4 or fewer arguments, 5 arguments, and 6 or more,
and we have these hand-made functions:


Sbie_InvokeSyscall4_asm PROC

     mov         r11, rdx ; args
     mov         rax, rcx ; func

     mov         r9,  qword ptr [r11+18h]
     mov         r8,  qword ptr [r11+10h]
     mov         rdx, qword ptr [r11+08h]
     mov         rcx, qword ptr [r11+00h]

     jmp         rax

Sbie_InvokeSyscall4_asm ENDP

Sbie_InvokeSyscall5_asm PROC

     sub     rsp, 38h
     mov     rax, [rdx+20h]
     mov     r10, rdx
     mov     r9, [rdx+18h]
     mov     r11, rcx
     mov     r8, [rdx+10h]
     mov     rdx, [rdx+8]
     mov     rcx, [r10]
     mov     [rsp+38h-18h], rax
     call    r11
     add     rsp, 38h
     ret

Sbie_InvokeSyscall5_asm ENDP

When using Sbie_InvokeSyscall5 instead of Sbie_InvokeSyscall5_asm everything works fine;
when using Sbie_InvokeSyscall5_asm, however, 32-bit applications fail.
That said, I opened my driver in IDA and copied Sbie_InvokeSyscall5, and that’s what is in my Sbie_InvokeSyscall5_asm, so really, how and why? I mean …
And yes, I then compiled the new driver and compared the bytes of Sbie_InvokeSyscall5 and Sbie_InvokeSyscall5_asm, and they were identical.
I really don’t understand how one can work and the other fail if they are binary identical; the only difference is that they are located in different places in the driver’s address space.

Also, when I change Sbie_InvokeSyscall4_asm from the “optimized” version presented above to one that allocates some stack for the shadow space and uses a call, it also fails for 32-bit apps.

At this point I am out of ideas :frowning: