Reliably detect BSoD

Hi,

I’d like to reliably detect that a BSoD happened. I see three ways of doing it:

  1. Check the existence of a crash dump or whether the event 1001* was
    emitted, but the problem is that both can be disabled by the user
  2. Check whether event 41** was emitted, but when the crash dump file
    isn’t generated, this event won’t provide the bug check details
  3. Create a driver that registers a bug check callback, but I believe
    this callback, because it executes at HIGH_LEVEL, won’t be able to
    write to a file or registry so I can retrieve this information after
    reboot.

Am I correct about 3? Is there another way to get this information?

BR,
Thiago

* Event ID: 1001, Source: BugCheck, Message:
The computer has rebooted from a bugcheck. The bugcheck was: (…)

** Event ID: 41, Source: Kernel-Power, Message:
The system has rebooted without cleanly shutting down first. This
error could be caused if the system stopped responding, crashed, or
lost power unexpectedly.

On Tue, 7 Jun 2016, Thiago Figueredo Cardoso wrote:

> 3. Create a driver that registers a bug check callback, but I believe
> this callback, because it executes at HIGH_LEVEL, won’t be able to
> write to a file or registry so I can retrieve this information after
> reboot.

You could create a file beforehand, read the physical sector number and
then write to that in the callback.

Bo Branten

> Am I correct about 3?

In a way, yes…

If the system is bugchecking, it means that its current state is corrupt, and hence writing to disk files in this state may result in persistent data corruption. This is why you should not be allowed to do anything in a callback: you should not be allowed to write to files, and you should not be able to signal an event to unblock a thread that may attempt to do so either. This is why, in order to ensure you don’t attempt anything “creative”, it runs at HIGH_LEVEL…

Anton Bassov

Yes, of course. In this way, Branten’s suggestion would be considered
“creative”, wouldn’t it?

As a matter of curiosity, Branten, could you provide me some hints on
how this can be done (get the physical sector number and write to it)?

Thiago

NTDEV is sponsored by OSR

Visit the list online at: http://www.osronline.com/showlists.cfm?list=ntdev

MONTHLY seminars on crash dump analysis, WDF, Windows internals and software drivers!
Details at http:

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

On Wed, 8 Jun 2016, Thiago Figueredo Cardoso wrote:

> As a matter of curiosity, Branten, could you provide me some hints on
> how this can be done (get the physical sector number and write to it)?

Use FSCTL_GET_RETRIEVAL_POINTER_BASE and then create your own IRP to send to
the disk driver. This is how Windows itself saves the crash dump: it
writes it to the pagefile, from where it is copied to its own file after
the next boot.

Bo Branten
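
To make the "physical sector number" part of this suggestion concrete, here is a minimal sketch of the address arithmetic only. It assumes the extent list has already been obtained with FSCTL_GET_RETRIEVAL_POINTERS and the base sector with FSCTL_GET_RETRIEVAL_POINTER_BASE. The function name, parameter names, and numbers are hypothetical, and the result is relative to the start of the volume, not the physical disk. Whether a write can actually be issued from a bug check callback is a separate question that this sketch does not address.

```python
def file_offset_to_volume_sector(offset, starting_vcn, extents, base_sector,
                                 bytes_per_sector, sectors_per_cluster):
    """Map a byte offset inside a file to a volume-relative sector number.

    extents is a list of (next_vcn, lcn) pairs, mirroring the layout of the
    extent array returned by FSCTL_GET_RETRIEVAL_POINTERS.
    base_sector is the value returned by FSCTL_GET_RETRIEVAL_POINTER_BASE.
    """
    bytes_per_cluster = bytes_per_sector * sectors_per_cluster
    vcn = offset // bytes_per_cluster
    sector_in_cluster = (offset % bytes_per_cluster) // bytes_per_sector

    if vcn < starting_vcn:
        raise ValueError("offset lies before the first mapped cluster")

    run_start_vcn = starting_vcn
    for next_vcn, lcn in extents:
        if vcn < next_vcn:
            if lcn < 0:
                # An LCN of -1 marks a run with no clusters allocated on disk.
                raise ValueError("offset falls in an unallocated run")
            cluster = lcn + (vcn - run_start_vcn)
            return base_sector + cluster * sectors_per_cluster + sector_in_cluster
        run_start_vcn = next_vcn
    raise ValueError("offset is beyond the last extent")


if __name__ == "__main__":
    # Made-up layout: 512-byte sectors, 4 KiB clusters, two extents.
    extents = [(4, 100), (10, 500)]
    print(file_offset_to_volume_sector(0, 0, extents, 2048, 512, 8))
```

To reach a disk-relative sector, the partition's starting offset would still have to be added; that step is left out here.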

Thiago Figueredo Cardoso wrote:

> I’d like to reliably detect that a BSoD happened. I see three ways of doing it:
>
> 1. Check the existence of a crash dump or whether the event 1001* was
> emitted, but the problem is that both can be disabled by user

If the user said he doesn’t want to know about crash dumps, then what
business is it of yours to try to override him? It’s his computer, not
yours.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Thanks, Branten.

On Wed, Jun 8, 2016 at 1:23 PM, Tim Roberts wrote:
> Thiago Figueredo Cardoso wrote:
>> I’d like to reliably detect that a BSoD happened. I see three ways of doing it:
>>
>> 1. Check the existence of a crash dump or whether the event 1001* was
>> emitted, but the problem is that both can be disabled by user
>
> If the user said he doesn’t want to know about crash dumps, then what
> business is it of yours to try to override him? It’s his computer, not
> yours.
>

I don’t think that a user disabling the creation of dump files
necessarily means he doesn’t want to know about crashes. For
instance, a user could disable the creation of dump files because he
wouldn’t analyze them, but keep the creation of an event in the event
log so he could know when a crash happened. The thing is that Windows only
creates the event when the dump file is created (even though the
configuration allows the scenario I mentioned). I don’t want to override
him; that’s why I want to know if Windows saves this information in
another way.

Or handle shutdown notifications by writing an “all clear” record somewhere
and looking for that record on start and deleting it. Sure there are holes -
you can miss early and late crashes - but it is simple to implement and
works for like 99.9% of the cases and is likely good enough.

Mark Roddy
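
A minimal user-mode sketch of the “all clear” record described above, assuming the record is a small file in a directory the service owns. The file name and function names are hypothetical, and the very first run will report "not clean" because no record exists yet.

```python
import os
import tempfile

ALL_CLEAR_NAME = "all_clear.marker"  # hypothetical file name


def record_clean_shutdown(state_dir):
    """Call from the shutdown notification handler: write the all-clear record."""
    path = os.path.join(state_dir, ALL_CLEAR_NAME)
    with open(path, "w") as f:
        f.write("all clear\n")
        f.flush()
        os.fsync(f.fileno())  # push the record to disk before the system goes down


def consume_all_clear(state_dir):
    """Call once at startup: True if the record was present, and delete it."""
    path = os.path.join(state_dir, ALL_CLEAR_NAME)
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("first start, clean:", consume_all_clear(d))
        record_clean_shutdown(d)
        print("after clean shutdown:", consume_all_clear(d))
        print("after a crash (no record written):", consume_all_clear(d))
```

As noted above, a crash before the startup check or after the record is written goes undetected, and a missing record cannot by itself distinguish a bugcheck from a power loss.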


@Bo Branten:

When the bugcheck callback runs, the disk stack is not functioning. You cannot write anything, other than calling the crashdump driver, which you cannot do directly, either.

@Thiago:

What problem exactly are you trying to solve by detecting a bugcheck?

> In this way, Branten’s suggestion would be considered “creative”, wouldn’t it?

Sure - allocating an IRP while the IO Manager is not functional and sending it down a disk stack that is not functional either is, indeed, a very “creative” approach. In fact, I don’t know how the OS handles the task of saving a crash dump, but I am pretty sure it pre-allocates and reserves everything in advance so that the operation does not have to go via the “regular” FSD and storage stacks…

Anton Bassov

On Wed, Jun 8, 2016 at 3:59 PM, Mark Roddy wrote:
> Or handle shutdown notifications by writing an “all clear” record somewhere
> and looking for that record on start and deleting it. Sure there are holes -
> you can miss early and late crashes - but it is simple to implement and
> works for like 99.9% of the cases and is likely good enough.
>

Yep, that works for most cases and I’ll probably use something like
this, but I’m still searching for a way that works for all cases. The
kernel emits some signals that can be used to detect an abnormal
shutdown. One that can be used is event ID 41. In this case there
is no need to write the “all clear” record :)
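
As a rough sketch of how the event ID 41 check could be done from user mode, the code below runs the `wevtutil` command-line tool and reads the `BugcheckCode` field from the event's XML. The function names are hypothetical and the sample XML in the demo is hand-written for illustration, not taken from a real log. The sketch assumes the field is rendered as a decimal number, and it does not establish whether the field is populated when dump files are disabled.

```python
import subprocess
import xml.etree.ElementTree as ET

EVENT_NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}
QUERY = ("*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] "
         "and (EventID=41)]]")


def wevtutil_command():
    """Command line that asks for the newest matching event as XML."""
    return ["wevtutil", "qe", "System", "/q:" + QUERY,
            "/c:1", "/rd:true", "/f:xml"]


def bugcheck_code(event_xml):
    """Return BugcheckCode from one event's XML as an int, or None if absent."""
    root = ET.fromstring(event_xml)
    node = root.find("e:EventData/e:Data[@Name='BugcheckCode']", EVENT_NS)
    if node is None or not (node.text or "").strip():
        return None
    return int(node.text.strip())


def latest_event_41_bugcheck_code():
    """Windows only: query the System log; None if no event 41 was found."""
    out = subprocess.run(wevtutil_command(), capture_output=True,
                         text=True, check=True).stdout
    return bugcheck_code(out) if out.strip() else None


if __name__ == "__main__":
    # Hand-written sample for illustration only.
    sample = ("<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>"
              "<System><EventID>41</EventID></System>"
              "<EventData><Data Name='BugcheckCode'>159</Data></EventData>"
              "</Event>")
    print(bugcheck_code(sample))
```

A value of zero or a missing field would mean the event alone cannot tell a bugcheck apart from a power loss or a hang, which is the limitation raised in the original question.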

Let’s ask once again:
What problem are you exactly trying to solve by detecting the bugcheck?

You understand that the kernel itself does not have some magic spell to determine this information either, right? It does exactly the same thing: it writes some special value somewhere on a clean shutdown and then inspects this value when starting, to determine what recovery steps might be needed if there was a crash. Obvious limitations include the previous boot never making it far enough to update this value, or the crash occurring during shutdown after the value was written, but there are many others, including arbitrary state corruption or malicious modification of disk media while offline.

Your own mechanism will be at least as reliable and much easier to implement and maintain. Just use a marker file or a registry key and don’t worry about corner cases that you can’t solve anyway.
