I’d like to reliably detect that a BSoD happened. I see three ways of doing it:
1. Check the existence of a crash dump or whether the event 1001* was
emitted, but the problem is that both can be disabled by user
2. Check whether event 41** was emitted, but when the crash dump file
isn’t generated this event won’t provide the bug check details
3. Create a driver that registers a bug check callback, but I believe
this callback, because it executes at HIGH_LEVEL, won’t be able to
write to a file or registry so I can retrieve this information after
reboot.
Am I correct about 3? Is there another way to get this information?
BR,
Thiago
* Event ID: 1001, Source: BugCheck, Message:
The computer has rebooted from a bugcheck. The bugcheck was: (…)
** Event ID: 41, Source: Kernel-Power, Message:
The system has rebooted without cleanly shutting down first. This
error could be caused if the system stopped responding, crashed, or
lost power unexpectedly.
On Tue, 7 Jun 2016, Thiago Figueredo Cardoso wrote:
> Create a driver that registers a bug check callback, but I believe
> this callback, because it executes at HIGH_LEVEL, won’t be able to
> write to a file or registry so I can retrieve this information after
> reboot.
You could create a file beforehand, read the physical sector number and
then write to that in the callback.
If the system is bugchecking, it means that its current state is corrupt, and hence writing to disk files in this state may result in persistent data corruption. This is why you should not be allowed to do anything in a callback - you should not be allowed to write to files, and you should not be able to signal an event to unblock a thread that may attempt to do so either. This is why, in order to ensure you don’t attempt anything “creative”, it runs at HIGH_LEVEL…
Yes, of course. In this way, Branten’s suggestion would be considered
“creative”, wouldn’t it?
As a matter of curiosity, Branten, could you provide me some hints on
how this can be done (get the physical sector number and write to it)?
Thiago
On Wed, Jun 8, 2016 at 9:48 AM, wrote:
>> Am I correct about 3?
>
> In a way, yes…
>
> If the system is bugchecking it means that its current state is corrupt, and, hence writing to disk files in this state may result in persistent data corruption. This is why you should not be allowed to do anything in a callback - you should not be allowed write to files, and you should not be able to signal an event to unblock a thread that may attempt to do so either. This is why, in order to ensure you don’t attempt anything “creative”, it runs at HIGH_LEVEL…
>
> Anton Bassov
>
> --
> NTDEV is sponsored by OSR
>
> Visit the list online at: http:
>
> MONTHLY seminars on crash dump analysis, WDF, Windows internals and software drivers!
> Details at http:
>
> To unsubscribe, visit the List Server section of OSR Online at http:
On Wed, 8 Jun 2016, Thiago Figueredo Cardoso wrote:
> As a matter of curiosity, Branten, could you provide me some hints on
> how this can be done (get the physical sector number and write to it)?
FSCTL_GET_RETRIEVAL_POINTER_BASE, and then create your own IRP to send to
the disk driver. This is how Windows itself saves the crash dump: by
writing it to the pagefile, from where it is copied to its own file after
the next boot.
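To make the "physical sector number" part concrete, here is a minimal sketch of just the address arithmetic, in Python for brevity. It assumes you have already obtained, from user mode and well before any crash, the `FileAreaOffset` reported by FSCTL_GET_RETRIEVAL_POINTER_BASE (a volume-relative sector offset), the file's logical cluster number (LCN) from FSCTL_GET_RETRIEVAL_POINTERS, the sectors-per-cluster value, and the partition's starting sector. The function names and the example numbers are hypothetical; nothing here touches the disk or says anything about whether writing from a bug check callback is safe.

```python
def volume_sector(file_area_offset: int, lcn: int,
                  sectors_per_cluster: int, sector_in_cluster: int = 0) -> int:
    """Volume-relative sector of a given sector inside a cluster.

    file_area_offset: sector offset of the first allocatable cluster
                      (RETRIEVAL_POINTER_BASE.FileAreaOffset).
    lcn:              logical cluster number of the file extent.
    """
    if not 0 <= sector_in_cluster < sectors_per_cluster:
        raise ValueError("sector_in_cluster must lie inside one cluster")
    return file_area_offset + lcn * sectors_per_cluster + sector_in_cluster


def disk_sector(partition_start_sector: int, vol_sector: int) -> int:
    """Disk-relative sector: add the partition's starting sector."""
    return partition_start_sector + vol_sector


# Hypothetical values: cluster heap starts at the volume start, 8 sectors
# per cluster, partition begins at sector 2048, file extent begins at LCN 786432.
vol = volume_sector(file_area_offset=0, lcn=786432, sectors_per_cluster=8)
disk = disk_sector(partition_start_sector=2048, vol_sector=vol)
print(vol, disk)  # 6291456 6293504
```

The file would also have to stay put (no defragmentation, compression, or resize) for a precomputed sector number to remain valid.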
On Wed, Jun 8, 2016 at 1:23 PM, Tim Roberts wrote:
> Thiago Figueredo Cardoso wrote:
>> I’d like to reliably detect that a BSoD happened. I see three ways of doing it:
>>
>> 1. Check the existence of a crash dump or whether the event 1001* was
>> emitted, but the problem is that both can be disabled by user
>
> If the user said he doesn’t want to know about crash dumps, then what
> business is it of yours to try to override him? It’s his computer, not
> yours.
I don’t think that a user disabling the creation of dump files necessarily means he doesn’t want to know about the crashes. For instance, a user could disable the creation of dump files because he wouldn’t analyze them, but keep the creation of an event in the event log so he could know when a crash happened. The thing is that Windows only creates the event when the dump file is created (even though the scenario I mentioned can be configured). I don’t want to override him; that’s why I want to know whether Windows saves this information in another way.
Or handle shutdown notifications by writing an “all clear” record somewhere
and looking for that record on start and deleting it. Sure, there are holes -
you can miss early and late crashes - but it is simple to implement,
works for like 99.9% of the cases, and is likely good enough.
Mark Roddy
On Wed, Jun 8, 2016 at 1:18 PM, Thiago Figueredo Cardoso <xxxxx@cesar.org.br> wrote:
Thanks, Branten.
When the bugcheck callback runs, the disk stack is not functioning. You cannot write anything, other than calling the crashdump driver, which you cannot do directly, either.
@Thiago:
What problem exactly are you trying to solve by detecting a bugcheck?
> In this way, Branten’s suggestion would be considered “creative”, wouldn’t it?
Sure - allocating an IRP while the IO Manager is not functional and sending it down a disk stack that is not functional either is, indeed, a very “creative” approach. In fact, I don’t know how the OS handles the task of saving a crash dump, but I am pretty sure it pre-allocates and reserves everything in advance so that the operation does not have to go via the “regular” FSD and storage stacks…
On Wed, Jun 8, 2016 at 3:59 PM, Mark Roddy wrote:
> Or handle shutdown notifications by writing an “all clear” record somewhere
> and looking for that record on start and deleting it. Sure there are holes -
> you can miss early and late crashes - but it is simple to implement and
> works for like 99.9% of the cases and is likely good enough.
Yep, that works for most cases and I’ll probably use something like this, but I’m still searching for a way that works for all cases. The kernel emits some signals that can be used to detect an abnormal shutdown; one of them is event ID 41. In that case there is no need to write the “all clear” record.
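For reference, a minimal sketch of reading the bug check code out of an event 41 record, in Python. It assumes the event is fetched as XML (for example with the `wevtutil` command line shown in `WEVTUTIL_CMD`) and that the record carries a `BugcheckCode` data item holding a decimal value; the sample XML below is illustrative and trimmed, not a captured event, and `bugcheck_code` is a hypothetical helper name. A value of 0 is what you would expect when no bug check details were recorded.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Newest Kernel-Power 41 event from the System log, rendered as XML.
# Run it with subprocess.run(WEVTUTIL_CMD, capture_output=True, text=True).
WEVTUTIL_CMD = [
    "wevtutil", "qe", "System",
    "/q:*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] and (EventID=41)]]",
    "/c:1", "/rd:true", "/f:xml",
]


def bugcheck_code(event_xml: str):
    """Return BugcheckCode as an int, or None if this is not a usable event 41."""
    root = ET.fromstring(event_xml)
    event_id = root.findtext("e:System/e:EventID", namespaces=NS)
    if event_id is None or event_id.strip() != "41":
        return None
    for data in root.iterfind("e:EventData/e:Data", NS):
        if data.get("Name") == "BugcheckCode":
            return int((data.text or "0").strip())  # assumed decimal
    return None


# Illustrative, trimmed sample (assumption: not taken from a real machine).
SAMPLE = """
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Kernel-Power"/>
    <EventID>41</EventID>
  </System>
  <EventData>
    <Data Name="BugcheckCode">159</Data>
  </EventData>
</Event>
"""

code = bugcheck_code(SAMPLE)
print(code, hex(code))  # 159 0x9f
```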
You understand that the kernel itself does not have some magic spell to determine this information either, right? It does exactly the same thing: it writes some special value somewhere on a clean shutdown and then inspects this value when starting, to determine what recovery steps might be needed if there was a crash, etc. Obvious limitations include the previous boot never making it far enough to update this value, or the crash occurring during shutdown after the value was written, but there are many others, including arbitrary state corruption or malicious modification of disk media while offline.
Your own mechanism will be at least as reliable and much easier to implement and maintain. Just use a marker file or a registry key and don’t worry about corner cases that you can’t solve anyway.
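A minimal sketch of the marker-file idea, in Python for brevity; a service or driver would do the same thing with its own shutdown notification and a file or registry value. The function names and the marker file name are hypothetical. The clean-shutdown path writes an “all clear” record, and startup consumes it: if the record is missing, the previous session did not end cleanly.

```python
import tempfile
from pathlib import Path


def mark_clean_shutdown(marker: Path) -> None:
    """Call from the shutdown handler: leave an "all clear" record."""
    marker.write_text("all clear\n")


def previous_shutdown_was_clean(marker: Path) -> bool:
    """Call once at startup: delete the record and report whether it existed."""
    try:
        marker.unlink()
    except FileNotFoundError:
        return False
    return True


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "all_clear.marker"

    mark_clean_shutdown(marker)                         # session 1 ends cleanly
    after_clean = previous_shutdown_was_clean(marker)   # session 2 starts

    # Session 2 "crashes": the shutdown handler never runs, no record written.
    after_crash = previous_shutdown_was_clean(marker)   # session 3 starts

print(after_clean, after_crash)  # True False
```

Note that the very first run also reports an unclean shutdown, since no record exists yet, and that a missing record cannot distinguish a bug check from a power loss or a hard reset.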
From: Thiago Figueredo Cardoso <mailto:xxxxx>
Sent: June 9, 2016 9:49 AM
To: Windows System Software Devs Interest List <mailto:xxxxx>
Subject: Re: [ntdev] Reliably detect BSoD