Is disk or memory reinitialized for a crash dump?

I get a WinXP bugcheck 80 (NMI_HARDWARE_FAILURE)
Arg1: 004f4454 Arg2: 0 Arg3: 0 Arg4: 0

Since Windows successfully writes a full memory dump, can I assume that whatever caused the NMI has not affected the disk, memory, and so on?

Does Windows re-initialize the chipset, disk controller, etc. for this or other hardware-related bugchecks?

Thanks,
–PA

Windows creates an alternate low-level storage stack for dumps (the dump stack) that contains cloned versions of disk.sys and the boot adapter driver. You can find these in the kernel by listing the modules from the debugger and looking for modules named dump_*. The dump stack is created as part of the boot process.

On a bugcheck with dumps enabled, the adapter dump driver goes through its initialization process, which ought to re-enable access to the adapter and to the disks attached to it for most crashes. Obviously, if the PCI bus the adapter is connected to is compromised or unreachable, nothing will happen; and if the adapter itself has failed, or the dump disk has failed, and these failures are not transient, no dump will occur.
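
(For example, a quick check from the kernel debugger, using standard WinDbg/kd module-list syntax; the exact dump_* names vary with the boot adapter, dump_diskdump plus a dump_ copy of the port/miniport driver being typical:)

```
kd> lm m dump_*
```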

Let me add to what Mark said.

Windows will attempt to avoid perturbing machine state as much as is
practical when attempting to write a dump file. But, over the years, our
customers have told us that there are goals more important than that. In
particular, they want a strong guarantee that the machine will successfully
reboot and restart operation every time, no matter what. This has led to
an architecture where we invoke a series of “bugcheck callbacks” at crash
time which attempt to put the hardware into a workable state before the
reboot. Your NIC driver, for instance, probably registers one to stop its
common-buffer DMA, etc. Every time we modify the crash path to do anything,
it means that some extra machine state gets affected. Because this
architecture is open to use by third parties, I can’t tell you for sure what
will happen in your machine.
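
(A minimal sketch, not from the thread, of what registering one of these bugcheck callbacks typically looks like, using the documented KeInitializeCallbackRecord / KeRegisterBugCheckCallback routines; the component name and the quiesce step are hypothetical:)

```c
#include <ntddk.h>

static KBUGCHECK_CALLBACK_RECORD g_BugCheckRecord;

//
// Invoked by the kernel at crash time, at HIGH_LEVEL, with the machine in an
// unknown state. Keep it minimal -- e.g. stop common-buffer DMA so that the
// subsequent reboot comes up cleanly.
//
static VOID
MyBugCheckCallback(PVOID Buffer, ULONG Length)
{
    UNREFERENCED_PARAMETER(Buffer);
    UNREFERENCED_PARAMETER(Length);

    //
    // Hypothetical quiesce step for the device, e.g.:
    // WRITE_REGISTER_ULONG(&g_Regs->DmaControl, DMA_HALT);
    //
}

NTSTATUS
RegisterCrashCallback(VOID)
{
    KeInitializeCallbackRecord(&g_BugCheckRecord);

    if (!KeRegisterBugCheckCallback(&g_BugCheckRecord,
                                    MyBugCheckCallback,
                                    NULL,                  // no context buffer
                                    0,
                                    (PUCHAR)"MyNicDrv")) { // hypothetical component name
        return STATUS_UNSUCCESSFUL;
    }

    return STATUS_SUCCESS;
}
```

(The matching KeDeregisterBugCheckCallback call belongs in the driver’s unload path.)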

With an NMI, the situation is even worse. You can’t count on memory or I/O
busses working correctly after an NMI. So your guess is as good as mine
about whether you’ll get what you want.

  • Jake Oshins
    Windows Kernel Team

“Mark Roddy” wrote in message news:xxxxx@ntdev…
Windows creates an alternate low level storage stack for dumps (the
dumpstack) that contains cloned versions of disk.sys and the boot adapter
driver. You can find these in the kernel by listing the modules from the
debugger and looking for modules named dump_*. The dump stack is created as
part of the boot process.

On a bugcheck with dumps enabled the adapter dump driver goes through its
initialization process, which ought to re-enable access to the adapter and
to the disks attached to the adapter for most crashes. Obviously if the pci
bus the adapter is connected is compromised or unreachable nothing will
happen, and if the adapter itself has failed, or the dump disk has failed,
and these failures are not transient, no dump will occur.

> -----Original Message-----
> From: xxxxx@lists.osr.com [mailto:bounce-274855-
> xxxxx@lists.osr.com] On Behalf Of xxxxx@writeme.com
> Sent: Sunday, January 07, 2007 7:13 AM
> To: Windows System Software Devs Interest List
> Subject: [ntdev] Is disk or memory reinitialized for a crash dump?
>
> I get a WinXP bugcheck 80 (NMI_HARDWARE_FAILURE)
> Arg1: 004f4454 Arg2: 0 Arg3: 0 Arg4: 0
>
> Since Windows successfully does full memory dump, can I assume that
> whatever caused the NMI, has not affected disk, memory and so on?
>
> Does Windows re-initialize the chipset, disk controller etc. for this
> or other hardware related bugchecks?
>
> Thanks,
> --PA
>
>
> —
> Questions? First check the Kernel Driver FAQ at
> http://www.osronline.com/article.cfm?id=256
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer

This is probably why there are some cases where the only recovery that works
is to power the system down. I have had it happen on occasion, but it is
very infrequent. I do build my own systems, but I generally use good parts
and don’t just go for the cheapest.

There are many cases where bugchecks don’t even get invoked, and if there is
no attached debugger the chance of getting a dump is nil. Dump reliability
is a major headache for those of us dealing with high-availability servers.
The general rule is that any system crash due to software failure ought to
be a one-time occurrence, and that goal is hard to achieve when you are in
a ‘no dump, no clue’ state.

“David J. Craig” wrote:
> This is probably why there are some cases where the only recovery that
> works is to power the system down.

Thanks for all the replies.

Is there any way to find out what exactly caused bugcheck 80 (NMI_HARDWARE_FAILURE)?

We suspect an exception on the PCI Express bus.
Does the PCI Express root controller provide an indication of whether it raised an error?
If yes, can my bugcheck hook see this indication before it is reset by the dump drivers?

–PA

xxxxx@writeme.com wrote:

> Thanks for all the replies.
>
> Is there any way to find out what exactly caused bugcheck 80 (NMI_HARDWARE_FAILURE)?

Well, sort of. An NMI caused it. No, that’s not very helpful…

> We suspect an exception on the PCI Express bus.

That’s possible. Bad SIMMs or caches are other common causes.

> Does the PCI Express root controller provide an indication of whether it raised an error?

Well, only that its NMI signal will be asserted. Do you have a
PCI Express bus analyzer? You should be able to see error traffic
leading up to the fault, if this was the cause.

> If yes, can my bugcheck hook see this indication before it is reset by the dump drivers?

No, I don’t think there is anything you can do in software. This is not
considered to be a recoverable situation (hence “non-maskable”).

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.


This has been a sore spot for me for a long time. The source of an NMI
should be identifiable. Microsoft and chipset vendors should be
working together to have a mechanism for reporting whether the cause
was PCIE or memory or whatever other possible source there may be in a
system.

I know it’s doable. In the past, I have written an NMI handler that was
able to trace down the exact PCI bus in the system that contained the
failing device. Granted, this was on a relatively simple chipset on NT4
and Windows 2000. It would be a nasty task to undertake in a regular
driver on current chipsets and operating systems (impossible on 64-bit
systems since hooking the IDT is prohibited), but for the OS vendor
working together with chipset vendors it should be possible to come up
with a solution for this.

It is not enough to report that an NMI occurred. Reporting the source
of the NMI when multiple sources are possible is important for being
able to diagnose a problem when it occurs. A bus error on PCIE
shouldn’t cause the system to halt anyway (other platforms - Solaris
comes to mind - handle it as a non-fatal error). With PCIE, it should
be easier to determine which device caused it since PCIE is
point-to-point. The device can then be surprise-removed, leaving the
rest of the system running, provided the device is not critical to
system operation. Non-maskable doesn’t necessarily have to mean fatal.
You always want to know that it happened, but an intelligent decision
can be made about how to handle it once you know what caused it to
happen.

Beverly

We are. There’s an effort here that goes across chipset vendors and BIOS
makers and Windows. It’s called “WHEA” for “Windows Hardware Error
Architecture” (I think) and it will arrive with Longhorn Server.

  • Jake Oshins
    Windows Kernel Team

Jake,

This is exciting news - WHEA and LH server… unfortunately the NMI strikes today, and we’re talking about WinXP :-(

The PCI config space of the device has the error status information.
So the question is, will the OS or the PCI driver reset my device before my bugcheck hook can read the device’s config space?

Another question: how do I read the config space in bugcheck context?
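
(A sketch of one possible experiment, not an answer given in the thread: on XP the legacy HalGetBusDataByOffset routine can read PCI config space without going through the PnP stack, so a bugcheck callback could try to snapshot the device’s status bits into a global that the memory dump will capture. Whether this actually works at crash time, before the dump drivers touch the hardware, is exactly the open question; the bus/device/function numbers are hypothetical.)

```c
#include <ntddk.h>

static USHORT g_CrashPciStatus;       // lands in the memory dump for later inspection

//
// Try to read the PCI Status register of a device at a hardcoded, hypothetical
// location (bus 2, device 0, function 0). HalGetBusDataByOffset is a legacy
// interface but is present on XP; whether it is safe at HIGH_LEVEL after an
// NMI is not guaranteed.
//
static VOID
SnapshotPciStatusAtCrash(VOID)
{
    PCI_SLOT_NUMBER slot;
    USHORT status = 0;
    ULONG bytes;

    slot.u.AsULONG = 0;
    slot.u.bits.DeviceNumber   = 0;   // hypothetical device number
    slot.u.bits.FunctionNumber = 0;   // hypothetical function number

    bytes = HalGetBusDataByOffset(PCIConfiguration,
                                  2,  // hypothetical bus number
                                  slot.u.AsULONG,
                                  &status,
                                  FIELD_OFFSET(PCI_COMMON_CONFIG, Status),
                                  sizeof(status));

    if (bytes == sizeof(status)) {
        g_CrashPciStatus = status;    // find it in the dump with the debugger
    }
}
```

(The PCI Express error bits live further up, in the device’s capability structures, but the read mechanism is the same for anything in the first 256 bytes of config space.)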

Thanks,
Pavel

This is certainly a big step in the right direction. I have heard of
the WHEA effort, but I was concerned that much of the implementation was
being left to system vendors, and I was afraid that they wouldn’t spend
the money to implement the PSHEDs that would be required to make it
work. Is WHEA going to be a logo requirement?

I was also hoping that the solution would not be limited to servers.
Non-server systems experience the occasional NMI as well. I can see
targeting a server system for initial implementation, but will it
eventually be supported on non-server systems?

Beverly

JAKE (ET AL.):

I’ve only been periodically following, and really don’t know much about
the issue, but, just out of curiosity, I was wondering if MCA would at
least partially address this problem. In any case, while checking out
the WHEA documentation, I noticed that the overview states:

Note[:] The system management interrupt (SMI) is typically handled by
the firmware, not by the operating system.

I don’t particularly care about the MCA question, but I have an
interest in the statement about SMI. Is this necessarily true
(assuming, of course, no third-party participation)? That is, is it the
case that Windows (or bootmgr or winload, for that matter) does not
install an SMI handler under any circumstance? I understand the SMM
architecture and why it is normally inaccessible and so forth, but I am
curious as to whether Windows has any involvement in the SMI,
particularly as some ICHs (like the 6300ESB) seem to have provided the
capability to latch or trigger the SMI on everything but the kitchen
sink, and mention OS callbacks?

Thanks,

mm

JAKE (ET AL.):

Sorry. I missed the word “typically” in the WHEA documentation. Your
thoughts would still be appreciated.

mm

MM: Would you mind starting a new thread for the SMI discussion?

I am still hoping for some advice on finding the cause of the NMI bugcheck.

–PA