Attach to context of thread (without APC)

I am using KeStackAttachProcess to attach to the address space of a specific user-mode process. But I realized it wasn’t what I needed after calling KeUserModeCallback while attached gave me an internal bugcheck about APCs.

Most of the examples on the internet for KeUserModeCallback use an IOCTL, which defeats the purpose imo. If I were going to use IOCTLs, I would use the inverted call model.

Anyway, how do I switch to the context of a user-mode thread, the way it happens in IOCTL handlers? Without having to issue an IOCTL from user mode, of course.

Here is the bugcheck if anyone is interested:

APC_INDEX_MISMATCH (1)
This is a kernel internal error. The most common reason to see this
BugCheck is when a filesystem or a driver has a mismatched number of
calls to disable and re-enable APCs. The key data item is the
Thread->CombinedApcDisable field. This consists of two separate 16-bit
fields, the SpecialApcDisable and the KernelApcDisable. A negative value
of either indicates that a driver has disabled special or normal APCs
(respectively) without re-enabling them; a positive value indicates that
a driver has enabled special or normal APCs (respectively) too many times.
Arguments:
Arg1: fffff80135a511a7, Address of system call function or worker routine
Arg2: 0000000000000001, Thread->ApcStateIndex
Arg3: 0000000000000000, (Thread->SpecialApcDisable << 16) | Thread->KernelApcDisable
Arg4: 0000000000000000, Call type (0 - system call, 1 - worker routine)
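
For anyone reading the arguments: going by the description above, Arg3 packs the two 16-bit disable counts into one value, so it can be split as in the minimal sketch below. The struct and function names are made up for illustration and are not part of any Windows header. In this dump both counts come out as zero, which suggests (my reading, not something the dump states) that the nonzero Thread->ApcStateIndex in Arg2 is the field to look at.

```c
#include <stdint.h>

/* Hypothetical helper type: the two signed counts packed into Arg3. */
typedef struct {
    int16_t special_apc_disable; /* high 16 bits of Arg3 */
    int16_t kernel_apc_disable;  /* low 16 bits of Arg3 */
} ApcDisableCounts;

/* Split Arg3 = (SpecialApcDisable << 16) | KernelApcDisable.
 * The uint16_t -> int16_t conversion assumes the usual two's complement
 * wraparound, so 0xFFFF reads back as -1 (disabled once, never re-enabled). */
ApcDisableCounts decode_apc_disable(uint32_t arg3)
{
    ApcDisableCounts c;
    c.special_apc_disable = (int16_t)(uint16_t)(arg3 >> 16);
    c.kernel_apc_disable  = (int16_t)(uint16_t)(arg3 & 0xFFFFu);
    return c;
}
```

A negative count means APCs were disabled without being re-enabled; a positive one means they were enabled too many times.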

What is the end goal of this? Are you trying to force a function call from kernel mode into a user mode process? Where do you get the address from?

End goal is kernel-mode -> user-mode callbacks without using inverted call. No forcing involved.
Which address do you mean?

Oh, by the way, I cannot use the inverted call even if I want to. I need to notify the user-mode app at the moment I decide to, explicitly. A direct callback that gets fired. Not an I/O request that is pended or completed.

Inverted call is the correct way to do that. Anything else is hackery. Have the app submit an IOCTL early on and pend it forever. When you want to notify, you complete the request. The thread is awakened (which gives it a priority boost) and runs normally.

What I mean by “address” is, how are you going to specify what user-mode address to jump to? Before you do your “alert”, what was the thread doing? If it was blocked waiting for your notification, in what way is that different from an inverted call? It’s exactly the same thing, except that inverted call is well-supported and understood.

Inverted call defeats the purpose of my project.

The user-mode address to jump to is chosen by NT. I just specify the "API" number to call. The real address is inside the KernelCallbackTable in the PEB, which is an array of pointers. That is how Win32k communicates with user-mode processes.

If inverted call defeats the purpose of your project, and instead your project demands direct calls from kernel mode into the process’s address space, then your project’s purpose is ill-considered and unsupportable.

As Mr. Roberts stated, trying to cram this into your driver is the most egregious form of unsupportable hackery. You don’t know what you don’t know. These direct callback features are simply not part of the I/O subsystem architecture.

If you need high-resolution, time-critical callbacks, this is absolutely doable on modern Windows systems with inverted call. If you need higher resolution or more deterministic behaviour, you’re using the wrong OS for the job.

Period. Full stop. Nothing here to debate.

Something doesn’t make sense. There is no reasonable purpose in calling any function except your own, and there is no ‘API Number’ for that. Even if you somehow have module exports by ordinal (very much deprecated), after the loader has completed its work you still need an address to call.

But to echo others, you can’t make this sort of direct call. One point about inverted call that many people miss is that you really want to use OVERLAPPED I/O and issue multiple IRPs that get pended in your driver. Then there should always be one to complete whenever you need to. And if you find that you run out, pend more. You might not think so, but hundreds of thousands of these calls can be completed per second with well-designed code on moderate hardware.
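
The bookkeeping behind "keep several requests pended, complete one per notification, pend more when you run out" can be modelled in a few lines of portable C. This is only a sketch of the idea, not driver code: `Request` stands in for a pended IRP, and `PendingQueue`, `pend_request`, `notify` and `MAX_PENDING` are hypothetical names chosen for illustration. A real driver would hold the IRPs in something like a cancel-safe queue and complete them through the normal I/O completion path.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8 /* hypothetical cap on requests held at once */

/* Stand-in for a pended IRP: an id plus the data it completes with. */
typedef struct {
    int  id;
    int  payload;
    bool completed;
} Request;

/* FIFO of requests the "driver" is holding uncompleted. */
typedef struct {
    Request *slots[MAX_PENDING];
    size_t   head;
    size_t   count;
} PendingQueue;

/* App side: submit a request, which the driver pends.
 * Returns false when the queue is already full. */
bool pend_request(PendingQueue *q, Request *r)
{
    if (q->count == MAX_PENDING)
        return false;
    r->completed = false;
    q->slots[(q->head + q->count) % MAX_PENDING] = r;
    q->count++;
    return true;
}

/* Driver side: complete the oldest pended request with the event data.
 * Returns NULL when nothing is pended, i.e. the app let the pool run dry. */
Request *notify(PendingQueue *q, int payload)
{
    if (q->count == 0)
        return NULL;
    Request *r = q->slots[q->head];
    q->head = (q->head + 1) % MAX_PENDING;
    q->count--;
    r->payload = payload;
    r->completed = true;
    return r;
}
```

The app's completion handler would call `pend_request` again right after handling each completed request, so the driver rarely finds the queue empty.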
