Catching STATUS_ASSERTION_FAILURE with SEH on Insider builds?

We've got a kernel-mode test suite (internal code only, never shipped to customers) that contains some tests of the form "if MyInternalRoutine is called with <these invalid parameters>, then an assertion is raised via the NT_ASSERT macro". This is achieved by wrapping the call in an SEH frame. In the end it looks like this:

NTSTATUS caught = STATUS_SUCCESS;
__try {
    MyInternalRoutine(NULL); // NULL here is an invalid parameter
}
__except (EXCEPTION_EXECUTE_HANDLER) {
    caught = GetExceptionCode();
}
if (caught == STATUS_ASSERTION_FAILURE) {
    // Record test success
} else {
    // Record test failure
}

Naturally, building this in DEBUG configuration is required (due to the use of NT_ASSERT). This approach works on the generally available Windows client builds that we've tested, such as 19045, 22631 and 26100.

Last week I was probably the first person to run these against an Insider Preview build of Windows. With the latest canary ring build (27871 at the time) installed, to my surprise I got a green screen of death, KMODE_EXCEPTION_NOT_HANDLED, coming from the assert test, from inside MyInternalRoutine.

I've tried replacing the NT_ASSERT with a "write to NULL" primitive - that causes a STATUS_ACCESS_VIOLATION exception which the test setup SEH frame catches successfully. So it looks like the build I have does not let you catch asserts.

Before I start digging I wonder if anyone's encountered this before. My two hypotheses at the moment are:

  1. This is an Insider build special behaviour
  2. This is a change in the next version of Windows and is not specific to Insider builds

That would be bad... Do you have anything strange or non-standard about your build environment?

There are plenty of non-standard things about the build... However, I can reproduce it outside of our build system by just making a brand new empty WDM driver project with the most basic code in it.

I was having some trouble stepping through the kernel code that handles this. For example, I could single-step through an entire memcpy that's deep in the assertion logic in NT, but not step over or step out: attempting that would just terminate the KD connection and hang the VM for some reason.

Will continue - maybe by scripting to prevent my "t" key from wearing out.

Finally had a couple days to stare at WinDbg and Ghidra...

So the sequence of events for hitting an NT_ASSERT is like this:

  1. int 2c
  2. KiRaiseAssertion
  3. KiExceptionDispatch(0xC0000420, ...)
  4. KiDispatchException(...)
  5. either RtlDispatchException(...) or KeBugCheck(...)

The decision being made in KiDispatchException resembles something like this:

KiDispatchException(u8 p1, u8 p2, u8 p3, u8 p4, bool get_a_chance)
{
	<...>
	
	bool handled = false;
	
	if (get_a_chance) {
		handled = KdTrap(...); // First Chance trap
		if (handled) return;
		
		handled = RtlDispatchException(...);
		if (handled) return;
	}
	
	handled = KdTrap(...); // Second Chance trap
	if (handled) return;
	
	KeBugCheck(...); // Straight to jail
	
	<...>
}

You'll see that if get_a_chance is true then this sequence matches what people normally expect:

  • first the debugger, if any, gets a chance to see the exception,
  • if not handled, then the exception is dispatched, meaning any registered SEH handlers get to run,
  • if still not handled, then the debugger, if any, gets a second chance to see the exception,
  • finally, if not handled, the kernel emits a bug check
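The ordering above can be modelled as a small user-mode C sketch. This is not kernel code: dispatch_env, dispatch_result and model_dispatch are hypothetical names invented for illustration, and the three booleans simply stand in for "the debugger handled it" and "an SEH handler accepted it".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these names exist in the kernel. */
typedef enum {
    HANDLED_BY_DEBUGGER_FIRST_CHANCE,
    HANDLED_BY_SEH,
    HANDLED_BY_DEBUGGER_SECOND_CHANCE,
    BUG_CHECK
} dispatch_result;

typedef struct {
    bool debugger_takes_first_chance;  /* stands in for the first KdTrap call */
    bool seh_handler_accepts;          /* stands in for RtlDispatchException */
    bool debugger_takes_second_chance; /* stands in for the second KdTrap call */
} dispatch_env;

/* Mirrors the decision order described above for KiDispatchException. */
dispatch_result model_dispatch(const dispatch_env *env, bool get_a_chance)
{
    assert(env != NULL);

    if (get_a_chance) {
        if (env->debugger_takes_first_chance)
            return HANDLED_BY_DEBUGGER_FIRST_CHANCE;
        if (env->seh_handler_accepts)
            return HANDLED_BY_SEH;
    }

    if (env->debugger_takes_second_chance)
        return HANDLED_BY_DEBUGGER_SECOND_CHANCE;

    return BUG_CHECK;
}
```

With no debugger attached and a __try/__except frame that accepts everything (the test setup from the first post), the model gives HANDLED_BY_SEH when get_a_chance is true and BUG_CHECK when it is false.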

Now for the difference. In 26100, KiExceptionDispatch does this:

KiDispatchException(p1, p2, p3, p4, true); // Hardcoded 'true' for get_a_chance

In 27913 however that is not the case. More pseudo code for what happens there:

KiRaiseAssertion(...)
{
	<...>
	// the value of the 'CS' register is left on the stack by the CPU when jumping to KiRaiseAssertion, which is an interrupt handler
	bool interrupt_coming_from_kernel_mode = (cs_on_stack & 1) == 0;
	// widen before shifting so that bit 32 is actually representable
	KiExceptionDispatch(0xC0000420, (u8)interrupt_coming_from_kernel_mode << 32, ...);
	<...>
}

KiExceptionDispatch(NTSTATUS exc_code, u8 flags, ...)
{
	<...>
	KiDispatchException(p1, p2, p3, p4, (flags & (1ull << 32)) == 0); // 64-bit constant for bit 32
	<...>
}

This means that for interrupts coming from kernel mode, get_a_chance will be false and it falls into the "go straight to second chance and bug check" scenario.
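To make the flag round-trip concrete, here is a user-mode C model of the two behaviours. It is only a sketch of the pseudo code above, not the kernel's implementation: MODEL_FLAG_FROM_KERNEL and the model_* function names are made up for illustration, and the selector values mentioned afterwards (0x10 for kernel-mode CS, 0x33 for 64-bit user-mode CS) are an assumption based on what x64 Windows typically uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for bit 32 of the flags argument; the kernel
   does not publish a name for it. */
#define MODEL_FLAG_FROM_KERNEL (1ull << 32)

/* Models KiRaiseAssertion in 27913: the low bit of the saved CS
   selector is 0 when the interrupt came from kernel mode. */
uint64_t model_assertion_flags(uint16_t cs_on_stack)
{
    bool from_kernel = (cs_on_stack & 1) == 0;
    return from_kernel ? MODEL_FLAG_FROM_KERNEL : 0;
}

/* Models KiExceptionDispatch in 26100: get_a_chance is hardcoded. */
bool model_get_a_chance_26100(uint64_t flags)
{
    (void)flags; /* ignored in this model */
    return true;
}

/* Models KiExceptionDispatch in 27913: a first chance is given only
   when the "from kernel" bit is clear. */
bool model_get_a_chance_27913(uint64_t flags)
{
    return (flags & MODEL_FLAG_FROM_KERNEL) == 0;
}
```

Feeding a kernel-mode selector such as 0x10 through model_assertion_flags and then model_get_a_chance_27913 yields false, while a user-mode selector such as 0x33 yields true. The 26100 model returns true in both cases, which matches the SEH frame catching the assertion there.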

So this explains that yes, there is indeed a difference between the two kernels in how they handle this, and the difference is purely in code, not determined by any runtime configuration of the kernel. That does not answer whether this is an Insider-build special though. Next time, I'll go digging through all the Insider builds (on different rings) I can find.
