ETL strangeness under Win11 ...

Recently during some testing under Win11 22000 I noticed something odd … I have a KMDF kernel service which uses WPP internally using the well worn bit of

DriverEntry () {

WPP_INIT_TRACING (…)

TRACE()

}

… and use TraceLog to start collecting ETL information using -flags 0xFFFF and -level 0xFFFF. As it’s a kernel service the testing regimen in the harness is a) start up TraceLog, b) load up the Kernel Service, c) run the harness IOCTL’s d) shut down the Kernel Service e) stop TraceLog collection

This has worked well for many years on Win7 and Win10 all the way up to 21H2, so imagine my surprise when I loaded up Win11 and pop BSOD right in DriverEntry, right at the very first TRACE() call. The BSOD is inside of the TMH file, and appears to be related to the -flags and/or -level parameters … without either of those TraceLog is happy and no BSOD, but it’s essentially useless. Once I use either the -flags or -level value, with any value other than 0x0, BSOD

I verified that I’m using the latest TraceLog from the 22000 WDK and the driver is built using the 22000 WDK, and verified it works fine under Win10 21H2 … it’s just Win11 22000 that is the problem child …

Has anyone seen anything like this, before I start digging into things? It’s a big problem because I now have a way for a usermode application to cause a BSOD in my driver which isn’t exactly “good” … at a minimum I’ll need to make some changes to the driver, question is where and how much …

Same binary runs fine on win10 or is this a targeted compile for win11. Did you upgrade the WDK or VS since the last time it worked?

Not a targeted compile, the same binary and testing sequence works fine on Win10 21H2 … so far it looks like it’s a problem in the TRACE() function right after the WPP_INIT; the begin_wpp is

// MACRO: TRACE_THIS
//
// begin_wpp config
// FUNC TRACE_THIS();
// USESUFFIX(TRACE_THIS, “+++ Inside %!FUNC!, [IRQL=%s]”, GET_IRQL_STRING);
// end_wpp

… and GET_IRQL_STRING is

#define GET_IRQL_STRING (KeGetCurrentIrql() == PASSIVE_LEVEL ? “PAS” : (KeGetCurrentIrql() == APC_LEVEL ? “APC” : (KeGetCurrentIrql() == DISPATCH_LEVEL ? “DPC” : “INT”)))

The crash is our buddy c0005, DriverEntry is running at PASV, so it’s not a paging problem and chopping out the _GET_IRQL_STRING macro seems to fix the problem so I’m thinking that the printf() style transform in USESUFFIX is attempting an operation that ETL isn’t able to handle in DriverEntry. TRACE_THIS messages in CreateDevice and later work fine …

I get the same result for release as well as debug, so it’s not a compiler optimization someplace …

Hmm …

OK, wrapping this up with a bow …

  • Win11 22000.434 Pro (problem not seen in Win10 21H2 Pro and earlier

  • KMDF kernel service compiled with VS2019, WDK 22000 in both debug and release modes

  • Tracelog (found in WDK\Tools) using -f and -l parameters (any values) run PRIOR to loading the kernel service using devcon

  • WPP_INIT made in DriverEntry (see above snippet)

  • WPP trace message made in DriverEntry immediately following the WPP_INIT which contains a string substitution (%s, “Foo”) in the wpp_config block for that trace message (see above snippet)

It appears that for WPP calls made after WPP_INIT in DriverEntry that contain a string substitution of either static or dynamic strings inside of the wpp_config block when the ETL consumer accesses the macro data it throws a C005 exception in the context of the system thread running DriverEntry, causing a BSOD in the kernel service

WPP calls made in calls after DriverEntry completes (CreateDevice, PrepareHardware, etc.) operate normally, as do WPP calls which do not contain string substitutions in the wpp_config block in DriverEntry. Wrapping the WPP call in a try/ except will contain the BSOD, but it loses the WPP trace data in the ETL log for that trace

WPP calls which contain a string substitution in the wpp_config block made AFTER an initial WPP trace call without the string substitution do not have a BSOD (so, “priming the ETL pump” is where the drama happens)

Conclusion: For kernel service drivers, any WPP tracing calls which are called in DriverEntry must not contain any string substitution code in the wpp_config block for that trace call. Trace calls made after DriverEntry can contain string substitution code in the wpp_config block

[Soapbox mode begin] It’s extraordinarily disappointing (and I’m going with that verbiage here, since it’s a family forum and my original chosen verbiage would get me insta-banned by Peter) that a commonly available usermode program provided by MS is capable of causing a kernel service to BSOD without any kind of IOCTL interaction simply by running on Win11 rather than Win10.

At a minimum that’s a denial of service waiting to happen, at worst it could be an entry for all manner of boot level malware. All that would have been needed is a very simple try/ except in the TraceLog (and bretheren) usermode programs and this would have been avoided, now I have to a) change all of my kernel service drivers to ensure that they don’t call any WPP trace macros in DriverEntry period, b) have to check to ensure that any kernel service drivers currently in production are updated, because I can’t control the Win10 to Win11 update cycle (which I’ve ranted about before) and c) add this as a “risk” in my regulatory documentation, which is a major hassle …

Very heavy (((sigh)))

[Soapbox mode end]