Hi guys, while writing an NMI callback that does stack walking, I ran into some problems that I haven't been able to solve for several hours. For context: I want to use an NMI callback to detect a non-backed RIP and RSP. To do that, I use KPCR->TSS->IST[NMI] to find the MACHINE_FRAME of the NMI ISR and check its RSP and RIP for non-backed addresses. When the NMI ISR finishes, I don't want the context of the interrupted thread to be restored as-is. Instead, I want execution to return to the context of the DPC or APC handler, or I want the thread to be terminated if it is running neither a DPC nor an APC. This is where I'm stuck. I need the NMI callback to distinguish APCs and DPCs from threads that raised the processor IRQL to APC_LEVEL or DISPATCH_LEVEL on their own. How can I distinguish such threads? After the NMI ISR completes, how can I correctly restore the thread context to the state it was in before the thread was taken over by the APC or DPC? And can I redirect the RIP in the MACHINE_FRAME to a thread termination function when I detect a thread that was not being used for a DPC or APC?
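To make the check described above concrete, here is a minimal user-mode sketch of its two parts: reading the hardware-pushed frame just below the top of the NMI IST stack, and testing an address against a list of image ranges. It is a simulation, not driver code. The names `machine_frame_t`, `image_range_t`, `read_frame_below_ist_top` and `is_backed` are made up for illustration, and it assumes the IST top is 16-byte aligned so the CPU pushes the five-qword frame (no error code for an NMI) directly below it. Where the IST top and the image list come from in a real driver is not shown, and it does not address redirecting or restoring the context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Frame the CPU pushes on an x64 interrupt without an error code,
   listed in ascending address order. Hypothetical stand-in for MACHINE_FRAME. */
typedef struct {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
} machine_frame_t;

static_assert(sizeof(machine_frame_t) == 40, "frame must be five qwords");

/* Hypothetical description of one loaded image (base address and size). */
typedef struct {
    uint64_t base;
    uint64_t size;
} image_range_t;

/* Copy out the frame that sits immediately below the IST stack top.
   Assumption: ist_top is the 16-byte aligned value stored in the TSS IST slot. */
static machine_frame_t read_frame_below_ist_top(const void *ist_top)
{
    machine_frame_t frame;
    memcpy(&frame, (const uint8_t *)ist_top - sizeof(frame), sizeof(frame));
    return frame;
}

/* Return true if addr falls inside any of the given image ranges. */
static bool is_backed(uint64_t addr, const image_range_t *images, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (addr >= images[i].base && addr - images[i].base < images[i].size) {
            return true;
        }
    }
    return false;
}
```

In this sketch an interrupted RIP counts as "non-backed" when `is_backed` returns false for it. Applying the same test to RSP would need a list of valid stack ranges rather than image ranges.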
You're trying to recover from an NMI? How do you expect to do that? Most NMIs represent conditions that are unrecoverable.
I don't want the NMI ISR to resume the DPC. I want it to restore the thread to the state it was in before it was used to execute the DPC, thus canceling the DPC. I can't simply terminate the thread once such a DPC is detected, because the thread the DPC happens to be running on may have nothing to do with what the DPC is doing.
What is the cause of the NMI, and which Windows version is this on? For hardware-originated NMIs you can sometimes install a WHEA error handler and a PSHED plug-in and recover using that. But the context in which the NMI occurred isn't easily available in that handler, IIRC.
Yes, this question bothers me. The OP seems to be implying that this is a regular occurrence, otherwise he wouldn't have so much information about it. In my experience, NMIs are exceedingly rare and only occur in case of hardware errors.
So, perhaps the OP could explain to us what circumstances are leading to his question. Why are you seeing NMIs at all?