We occasionally see the NMI_HARDWARE_FAILURE bluescreen on a number of our
machines:
Call your hardware vendor for support
NMI: Parity Check / Memory Parity Error
*** The system has halted ***
*** STOP: 0x00000080 (0x004F4454,0x00000000,0x00000000,0x00000000)
From walking through the HalHandleNMI assembly, I know that the “Parity
Check” text means the HAL saw the SERR (PCI System Error) bit set in I/O
port 0x61 (NMISC, the NMI Status and Control Register). (Oddly enough, a
parity check does NOT mean that it saw the PCI PERR signal; that seems to
be ignored by NT.)
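For anyone following along, here is a minimal sketch of the kind of test described above, written in C rather than taken from the HAL. The bit positions are an assumption based on the conventional PC/AT layout that Intel chipset datasheets also document (bit 7 for the SERR/parity status, bit 6 for the IOCHK status). The function and macro names are hypothetical. The status byte is passed in as a parameter, because reading port 0x61 directly needs kernel-mode access.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed status bits of I/O port 0x61 (NMISC); names are illustrative. */
#define NMISC_SERR_NMI_STS   0x80u  /* bit 7: SERR / parity check latched */
#define NMISC_IOCHK_NMI_STS  0x40u  /* bit 6: IOCHK / channel check latched */

/* Returns 1 if the SERR (parity check) status bit is set, else 0. */
int nmisc_serr_pending(uint8_t status)
{
    return (status & NMISC_SERR_NMI_STS) != 0;
}

/* Returns 1 if the IOCHK (channel check) status bit is set, else 0. */
int nmisc_iochk_pending(uint8_t status)
{
    return (status & NMISC_IOCHK_NMI_STS) != 0;
}

/* Prints one line per NMI source found in the status byte. */
void nmisc_report(uint8_t status)
{
    if (nmisc_serr_pending(status))
        printf("NMI source: SERR (parity check)\n");
    if (nmisc_iochk_pending(status))
        printf("NMI source: IOCHK (channel check)\n");
    if (!nmisc_serr_pending(status) && !nmisc_iochk_pending(status))
        printf("No NMI source latched in port 0x61\n");
}
```

Calling nmisc_report() with a value captured from the port (for example from a kernel debugger) lists which of the two sources was latched. It cannot tell you which device asserted SERR.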
So, my questions are:
- Can anything other than a PCI SERR signal or a RAM parity error generate
  that failure? I do not suspect RAM since we run ECC RAM and a supporting
  motherboard, we see it on many different systems, and it only reproduces on
  a few machines. We design our own hardware, so I suspect PCI SERR, but I
  want to make sure that there couldn’t be any other possibilities.
- What does the “0x004F4454” parameter mean? It’s hard-coded that way in
  the HAL assembly, so I figure it must hold some meaning to someone. I
  noticed that it’s the ASCII string “TDO”, but can’t imagine what that stands
  for.
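The “TDO” reading comes from taking the bytes of the DWORD in little-endian (in-memory) order. Here is a small sketch of that arithmetic. The helper name is hypothetical, and it only shows how the bytes map to characters, not what the value means.

```c
#include <ctype.h>
#include <stdint.h>

/* Unpacks a 32-bit value into 4 characters, lowest byte first (the order
   the bytes occupy in memory on a little-endian x86 machine).
   Zero bytes are kept as terminators; other non-printable bytes become '.'.
   out must have room for 5 chars; out[4] is always '\0'. */
void dword_to_ascii_le(uint32_t value, char out[5])
{
    for (int i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)((value >> (8 * i)) & 0xFFu);
        out[i] = (b == 0 || isprint(b)) ? (char)b : '.';
    }
    out[4] = '\0';
}
```

For 0x004F4454 the bytes, lowest first, are 0x54 ('T'), 0x44 ('D'), 0x4F ('O') and 0x00, so the buffer reads as the C string "TDO". Read highest byte first, the same value would spell "ODT" after the leading zero byte.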
> By walking through the HalHandleNMI assembly, I know that because it says
> “Parity Check” that means that it saw that the SERR (PCI System Error) bit
> set in I/O port 0x61 (NMISC NMI Status and Control Register). (Oddly
Looks like a hardware problem. And what other bits are set in port 0x61?
Max
Yes, I know it’s a hardware problem – recall that my first question asks
what other than PCI SERR and RAM error could cause that bluescreen. I don’t
know what other bits are set; it’s not reproducible and I doubt I could
debug an NMI if it were. I know that the IOCHK bit is not set since it
would have printed a further message if it were. The other bits in 0x61 are
unimportant.
-----Original Message-----
From: Maxim S. Shatskih [mailto:xxxxx@storagecraft.com]
Sent: Sunday 23 March 2003 7:00 AM
To: NT Developers Interest List
Subject: [ntdev] Re: NMI questions…
> By walking through the HalHandleNMI assembly, I know that because it says
> “Parity Check” that means that it saw that the SERR (PCI System Error) bit
> set in I/O port 0x61 (NMISC NMI Status and Control Register). (Oddly
Looks like a hardware problem. And what other bits are set in port 0x61?
Max