How to hook into NTOSKRNL PopThermalDeviceHandler?

Taed_Wynnell · July 24, 2009, 1:45am

We have what seems to be a hardware or BIOS problem that we’re hoping to
“fix” in software by preventing Windows from reacting to a thermal
event.

The problem seems to be that a temperature sensor on the motherboard
suddenly goes from about 55 C to 125 C (and always 119 C or 125 C) in
the space of a second. This causes the Critical Thermal Trip Point to be
reached, which somehow gets to Windows via ACPI, and Windows Server 2003
does a “Critical Shutdown”. A Critical Shutdown is where Windows cuts
its own power without flushing the disks (although driver shutdown
routines are called) since it’s an emergency situation. Clearly, we
don’t want it to shut down, especially not for no good reason.

In the last 2 years, we’ve seen this problem about 15 times on 13
motherboards (so there is a slight recurrence rate, but not enough to
conclude that it’s bad hardware or environmental). As we have about
1000 of these systems in the field, it’s rare, but for a 24x7 system,
it’s a big problem. The motherboard/BIOS vendor has not been of much
help, and even if they fixed it in the BIOS, deploying a new BIOS would
be a huge undertaking. I have a driver that flips some bits in one of
the motherboard chips that allegedly disables any interrupts from the
thermal / voltage monitoring chip, and that prevents a shutdown when we
intentionally overheat the system (so it does work for the problem
intended), but that did not prevent the problem in the field, which
implies that it’s something more than just an overtemp condition.

When Windows starts to do the Critical Shutdown, the stack looks like
this.

STACK_TEXT:
f78f2d28 80934a61 863ab680 86374900 863a8d10 nt!PopThermalDeviceHandler+0xfb
f78f2d40 80867724 863a2b40 808a45bc 808a2b50 nt!PopPolicyWorkerMain+0x25
f78f2d80 8087741d 80000000 00000000 863a2b40 nt!PopPolicyWorkerThread+0x74
f78f2dac 8093a77e 80000000 00000000 00000000 nt!ExpWorkerThread+0xeb
f78f2ddc 808845aa 80877332 00000001 00000000 nt!PspSystemThreadStartup+0x2e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16

We’d like to somehow hook into PopThermalDeviceHandler to prevent
Windows from doing the Critical Shutdown. While I’m a driver writer,
I’m not sure where to start on something like this. Does anyone have
any advice for me about how to start on such a project? (Or should I
call up OSR or someone and pay them to do it as it needs a Windows
internals expert?)

Thanks in advance for any suggestions or words of wisdom.

Pavel_A1 · July 24, 2009, 2:41am

How did you obtain this stack trace? Is this what was actually
seen/captured in the field, or is it from an “artificial” overheating
experiment in the lab?

–pa

Taed_Wynnell · July 24, 2009, 1:07pm

We got help from Microsoft, and they created an NTOSKRNL.EXE that would bluescreen instead of doing the Critical Shutdown. So that stack is from the actual problem in the field, captured just before the kernel would have done the Critical Shutdown.

That is also how we know the temperature that Windows was told about, and we were able to verify that it was far over the thermal trip point (and the Windows thermal zone information was verified to agree with the BIOS setting). We know that the temperature was fine prior to that thermal event because we have a service that reads the temperature every second and logs it to disk.

If we don’t figure out this hook, one solution might be to create a custom NTOSKRNL.exe that ignores any Critical Shutdown requests, but that’s a bad solution since we’d be chasing NTOSKRNL.exe versions every time a new security hotfix for it came out.

Michal_Vodicka-2 · July 24, 2009, 1:11pm

> If we don’t figure out this hook, one solution might be to
> create a custom NTOSKRNL.exe that ignores any Critical
> Shutdown requests, but that’s a bad solution since we’d be
> chasing NTOSKRNL.exe versions every time a new security
> hotfix for it came out.

It could be made configurable, using a registry value for example. Then it
would not be a custom kernel but a standard feature that only you would use.

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]

Peter_Viscarola_OSR · July 24, 2009, 1:38pm

This sounds like a sensor/motherboard bug or failure to me.

I have to ask: Why not just swap out the motherboard (or the system) every time this happens? Change motherboard versions? (obviously, I know nothing about the logistics of dealing with your customer base… but I’ve got to ask, as this seems to me to be the easiest solution).

Peter
OSR

Taed_Wynnell · July 24, 2009, 5:23pm

> Why not just swap out the motherboard (or the system) every time this happens?

We do that, but it just keeps happening on other systems. And the runtime of the systems isn’t correlated; sometimes it’s on a new system (with a few weeks of runtime), sometimes on an old one (with 2 years of runtime). That certainly doesn’t sound like a hardware problem, particularly since it rarely recurs on the same board, and when it has, it was just a single recurrence months later.

After dealing with this problem about once a month for almost 2 years, it’s clear that it’s going to keep happening, so it would be nice if we could just prevent it in software.

We will probably move to another motherboard anyway, but that doesn’t help with the installed base of 1000 systems – it would be very expensive both in parts and manpower to replace all of those motherboards (or even to just upgrade the BIOS) throughout the US.

We had a similar sort of problem years ago with a hard drive firmware bug (it took me about 9 months to convince the very large hard drive vendor of that, though the fix came very quickly after that), and there was no way to update the firmware from Windows. So we had to have people visit 3000 locations to upgrade the drive firmware from a boot CD. That was painful in every aspect: money, manpower, time, etc. At least logistically, I was able to scan the systems remotely to see which drives still had bad firmware, so we could figure out which sites were missed and so on. Still, to this day, I find the occasional bad firmware in the field because a bad drive was swapped out for a spare that had been sitting, missed, in some back room.

Peter_Viscarola_OSR · July 25, 2009, 10:49am

With all due respect, it sounds exactly like a hardware problem to me.

We’re talking solid-state devices here, not mechanical ones. The accumulated run-time needn’t be related to the failure. In addition to problems in the sensor, the internal chip logic, etc., there are a whole host of environmental possibilities such as EMI.

OK, OK… be that as it may, it sounds like going around and changing everyone’s hardware would be a freakin’ nightmare. And swapping the board with another (obviously) won’t guarantee correct operation… so I get that board swapping isn’t the real and final answer.

Sounds like your conclusion that some cool reverse engineering and a strategically placed hack are indeed the right way to go. I fully concur.

As for a custom NTOSKRNL.EXE: oh… no… that would be a very bad idea, as you mentioned, and for a whole host of reasons. ESPECIALLY if you don’t already have a source code license and derivatives grant (which it sounds like you don’t).

You’d be waaaaaaaay better off with a well-designed binary hack (funny phrase, huh?) than a source code hack that requires a private build of the OS. I don’t even like to THINK about that.

If you decide to go the binary hack route, this is almost certainly something we (OSR) could help you design and implement with as much safety/reliability as possible.

Peter
OSR

anton_bassov · July 25, 2009, 3:12pm

> We’d like to somehow hook into PopThermalDeviceHandler to prevent Windows from
> doing the Critical Shutdown. While I’m a driver writer, I’m not sure where to start on something
> like this. Does anyone have any advice for me about how to start on such a project?

First of all, you seem to be barking up the wrong tree. If you look at the call stack, it is obvious that the whole thing is handled in the context of a worker thread. Apparently, when the thermal interrupt is raised, ACPI’s ISR queues a DPC that signals an event the worker thread waits on, and the thread then proceeds to shut down. Therefore, what you should hook is not the code that thread executes but the thermal interrupt handler, so that the event never gets signaled in the first place. Your hooking code has to detect the condition of interest (if the interrupt was raised for some other reason, it should defer to the system-provided interrupt handler), handle the interrupt, and return. Just be careful with what you do; otherwise you may end up with damaged hardware and/or set the whole thing on fire…

Anton Bassov

Pavel_A1 · July 25, 2009, 5:32pm

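They already tried to prevent the detection of the thermal condition (the
driver that disables interrupts from the thermal/voltage monitoring chip),
and that did not prevent the problem in the field.

So it seems that the root cause is yet to be clarified (some other
sensor?)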

–pa

anton_bassov · July 25, 2009, 6:08pm

> They already tried to prevent the detection of the thermal condition:

…and, according to the OP, it worked fine on their test system, which strongly suggests that as long as the thermal interrupt handler does not run, the problem does not arise.

> So it seems that the root cause is yet to be clarified (some other sensor?)

It may be just a spurious thing (for example, a motherboard bug), especially considering that this problem arises only once in a while…

Anton Bassov

Doron_Holan · July 25, 2009, 6:16pm

having no idea how the OS gets the thermal data, let’s see if this is on course or not. i would hazard a guess that the thermal data is reported via a device in the acpi tables. the OS would subsequently enumerate that device and probably install acpi.sys on it. furthermore, i would guess that this instance of acpi would enable a specific device interface which the kernel would open, and then acpi would notify the OS (either through a pended irp or a callback, i have no idea) of the event.

if this is all correct (god knows), you could probably install an upper filter on top of this device and catch the OS opening the device or registering the callback. from there, you can make the thermal data disappear. this is the course i would pursue with microsoft PSS first, before you ask for a one-off kernel.

d

anton_bassov · July 25, 2009, 6:27pm

> furthermore, i would guess that this instance of acpi would enable a specific device interface
> which the kernel would open, and then acpi would notify the OS (either through a pended irp
> or a callback, i have no idea) of the event.

Before ACPI can notify the OS, it has to be notified about the event itself, so I think the thermal interrupt
is still at the root of the whole thing…

> if this is all correct (god knows), you could probably install an upper filter on top of this device
> and catch the OS opening the device or registering the callback.

…but then you have to know what you are looking for, which is not going to be the case if we are speaking about some undocumented proprietary interface that is meant to be hidden from the rest of the world…

Anton Bassov

Doron_Holan · July 25, 2009, 6:37pm

yes, it is undocumented and this is why i told him to go ask pss. if there is a need, i am sure they can be convinced to give him the definitions he needs to make this work (if it works this way)

d

Cay_Bremer · July 26, 2009, 7:12am

Instead of writing a filter driver or hooking the kernel, there’s a simpler but unaesthetic way to solve this problem.

The ACPI bus driver enumerates a raw device (ACPI\ThermalZone) and creates/registers a GUID_DEVICE_THERMAL_ZONE instance. The kernel is notified of its arrival, opens it and connects to it by referencing the topmost device object. I’m not entirely sure what happens next, but I think it sends an IRP_MJ_DEVICE_CONTROL request with an IoControlCode of 0x294080, 0x298084 or 0x298088 down the stack, which the bus driver pends.
Disclaimer: This information was gathered from a Windows XP kernel and might be outdated.

Long story short, you could follow Doron’s more elegant solution and write an upper filter driver that fails all IRP_MJ_CREATE requests, or you could simply install a bogus filter driver (i.e., specify a driver which doesn’t exist) and prevent the device stack from starting in the first place.

Cay

anton_bassov · July 26, 2009, 7:50am

> Long story short, you could follow Doron’s more elegant solution and write an upper filter driver
> that fails all IRP_MJ_CREATE requests, or you could simply install a bogus filter driver (i.e., specify
> a driver which doesn’t exist) and prevent the device stack from starting in the first place.

The very first objection to the above approach that comes to mind is that the system may get “surprised” if it fails to send an IRP to the target device, let alone to set up the entire target stack. Therefore, even if you go for the filter option rather than interrupt hooking, you still have to make sure that everything looks fine and dandy from the OS’s perspective: it should believe that its callback never gets invoked simply because the event of interest never occurs, but the callback registration itself should succeed…

Anton Bassov

Cay_Bremer · July 26, 2009, 8:00am

At least on my test machine, both options fail gracefully.

Cay

Peter_Viscarola_OSR · July 26, 2009, 11:39am

You need to grab the ACPI spec and read up on the alert, instrument the system, and then observe the emergency over-temp shutdown in the lab. Then you need to identify possible work-arounds for the problem, and choose the one with the fewest disadvantages for your product. That’s what we would do here. Anything else is just guesswork, right?

I’m not saying the filter driver route isn’t potentially promising – I just don’t recall how the emergency over-temp shutdown works in the power manager. It’s not necessarily clear which device to filter. The emergency over-temp shutdown may or may not be handled by the ThermalZone device.

Peter
OSR

Cay_Bremer · July 26, 2009, 12:10pm

I’ve only done static reverse engineering and a bit of experimentation,
but given the fact that a custom kernel turned the shutdown into a bug
check, and that PopThermalDeviceHandler() conditionally calls a routine
aptly named PopCriticalShutdown, I am fairly certain that disrupting
driver-kernel communication will prevent these shutdowns.

Cay

Pavel_A1 · July 27, 2009, 5:55am

Cay Bremer wrote:
> … I am fairly certain that disrupting
> driver-kernel communication will prevent these shutdowns.

Not likely. In a properly designed board, the event visible by the
software (interrupt, ACPI event etc.) only provides an early
warning, so that the software could wrap up cleanly.
There can (should) be another line of defense to shut down anyway,
if the software ignores this warning.
This can be another sensor with a higher threshold, or timeout in
handling of the first event.

–pa

Cay_Bremer · July 27, 2009, 7:08am

I’ve read up on the ACPI spec (as Peter suggested) and it does confirm
your point:
“The system must disable the power either after the temperature reaches
some hardware-determined level above _CRT or after a predetermined time
has passed.”
In this case, depending on the nature of the false alarm, a software
solution may work (if the hardware enforces a final temperature threshold)
or may not (if it enforces an elapsed-time limit).

Cay
