How to extract more detail from MACHINE_CHECK_EXCEPTION (0x9c)?

Hi all,

I have recently had a kernel panic + reset due to a MACHINE_CHECK_EXCEPTION
(9c). I would like to use this machine check + kernel dump as an opportunity to
learn how to get more details as to exactly why the machine check occurred.

Can anyone recommend how to interrogate this memory dump further to extract more
details?

MACHINE_CHECK_EXCEPTION (9c)

A fatal Machine Check Exception has occurred.

NOTE: This is a hardware error. This error was reported by the CPU
via Interrupt 18. This analysis will provide more information about
the specific error. Please contact the manufacturer for additional
information about this error and troubleshooting assistance.

This error is documented in the following publication:

  • IA-32 Intel(r) Architecture Software Developer’s Manual
    Volume 3: System Programming Guide

Concatenated Error Code:

_VAL_UC_EN_MISCV_ADDRV_PCC_BUSCONNERR_1F

This error code can be reported back to the manufacturer.
They may be able to provide additional information based upon
this error. All questions regarding STOP 0x9C should be
directed to the hardware manufacturer.

BUGCHECK_STR: 0x9C_GenuineIntel

DEFAULT_BUCKET_ID: DRIVER_FAULT
PROCESS_NAME: cscript.exe
CURRENT_IRQL: 0
LAST_CONTROL_TRANSFER: from e1288154 to e107c4a0

STACK_TEXT:
f5f4d280 e1288154 0000009c 00000000 f5f4d2b0 nt!KeBugCheckEx+0x1b
f5f4d3b4 e127f86f f5f47fe0 00000000 00000000 hal!HalpMcaExceptionHandler+0x11e
f5f4d3b4 e1049f99 f5f47fe0 00000000 00000000 hal!HalpMcaExceptionHandlerWrapper+0x77
bdf42cd4 e1046dc7 fc4a04a0 0007d1fb bdf42cec nt!MiFindNodeOrParent+0x1e
bdf42cf0 e1049e12 fcaf2870 c03007d0 c01f47ec nt!MiLocateAddress+0x3b
bdf42d04 e10475a7 7d1fb0e5 bdf42d3c bdf42d34 nt!MiCheckVirtualAddress+0x42
bdf42d4c e1036c4c 00000000 7d1fb0e5 00000001 nt!MmAccessFault+0xad7
bdf42d4c 7d1fb0e5 00000000 7d1fb0e5 00000001 nt!KiTrap0E+0xdc
WARNING: Frame IP not in any known module. Following frames may be wrong.
0013f5c4 00000000 00000000 00000000 00000000 0x7d1fb0e5

STACK_COMMAND: kb

FOLLOWUP_IP:
nt!MiFindNodeOrParent+1e
e1049f99 72c8 jb nt!MiFindNodeOrParent+0x20 (e1049f63)
SYMBOL_STACK_INDEX: 3
SYMBOL_NAME: nt!MiFindNodeOrParent+1e
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: nt
DEBUG_FLR_IMAGE_TIMESTAMP: 4b27c5b8
IMAGE_NAME: memory_corruption
FAILURE_BUCKET_ID: 0x9C_GenuineIntel_nt!MiFindNodeOrParent+1e
BUCKET_ID: 0x9C_GenuineIntel_nt!MiFindNodeOrParent+1e
Followup: MachineOwner

Thanks!

-Alex

IMPORTANT: This email remains the property of the Department of Defence and is subject to the jurisdiction of section 70 of the Crimes Act 1914. If you have received this email in error, you are requested to contact the sender and delete the email.

I don’t see these often, and when I do I’ve never had much luck getting
more information out of them. They tend to be very vendor specific.
What’s the full !analyze -v output? Does it provide any further clues on
where to look?

-scott


Scott Noone
Consulting Associate and Chief System Problem Analyst
OSR Open Systems Resources, Inc.
http://www.osronline.com


On 31 March 2011 07:41, Wilkinson, Alex wrote:
>
> Can anyone recommend how to interrogate this memory dump further to extract more
> details out ?
>

AMD provides a utility called mcat.exe to diagnose machine check
exceptions. Unfortunately, in your case it’s an Intel CPU, and I
don’t think Intel provides a similar utility.

I always consider a 0x9C to be a hardware issue, and tend to look
straight for components that might be showing signs of faults
elsewhere (e.g. HP IML or WMIxWDM events in the system log).

I’m not sure how much help this will be, but:

http://blogs.msdn.com/b/joshpoley/archive/2007/12/03/debugging-a-bugcheck-9c.aspx


AdamT
Puns: The leading cause of Lojban advocacy.

Wilkinson, Alex wrote:

Hi all,

I have recently had a kernel panic + reset due to a MACHINE_CHECK_EXCEPTION
(9c). I would like to use this machine check + kernel dump as an opportunity to
learn how to get more details as to exactly why the machine check occurred.

Can anyone recommend how to interrogate this memory dump further to extract more
details out ?

Concatenated Error Code:

_VAL_UC_EN_MISCV_ADDRV_PCC_BUSCONNERR_1F

In my experience, the most common form of machine check is a bus error,
and the “BUSCONNERR” in that string would tend to support that. I have
seen this when a PCI Express board goes bad (time for reform school), or
when a PCI board is not fully seated in its socket, or when a PCI board
triggers a parity error, and so on.
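
For anyone who wants to see where the pieces of that string come from, here
is a minimal Python sketch. It assumes the flag names in the concatenated
code (VAL, UC, EN, MISCV, ADDRV, PCC) correspond to the architectural bits
of the IA32_MCi_STATUS register described in the Intel manual cited in the
bugcheck text, and that the low 16 bits hold the MCA error code, with
bus/interconnect errors using the layout 000F 1PPT RRRR IILL. The function
names and the sample status value are hypothetical. The raw status is not
shown in the output quoted in this thread, so the sample is made up purely
for illustration.

```python
# Architectural flag bits of IA32_MCi_STATUS (per the Intel SDM, Volume 3).
STATUS_FLAGS = [
    ("VAL", 63), ("OVER", 62), ("UC", 61), ("EN", 60),
    ("MISCV", 59), ("ADDRV", 58), ("PCC", 57),
]

# Sub-fields of a bus/interconnect MCA error code: 000F 1PPT RRRR IILL.
PARTICIPATION = {0: "SRC", 1: "RES", 2: "OBS", 3: "generic"}
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
MEM_IO = {0: "M", 1: "reserved", 2: "IO", 3: "other"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}


def decode_flags(status):
    """Return the names of the flag bits set in a 64-bit status value."""
    return [name for name, bit in STATUS_FLAGS if (status >> bit) & 1]


def decode_bus_error(status):
    """Decode the low 16 bits as a bus/interconnect error, else None."""
    code = status & 0xFFFF
    # Bits 15:13 must be zero and bit 11 must be set for this error class.
    if (code & 0xE800) != 0x0800:
        return None
    return {
        "participation": PARTICIPATION[(code >> 9) & 0x3],
        "timeout": bool((code >> 8) & 0x1),
        "request": REQUEST.get((code >> 4) & 0xF, "reserved"),
        "mem_io": MEM_IO[(code >> 2) & 0x3],
        "level": LEVEL[code & 0x3],
    }


if __name__ == "__main__":
    # Hypothetical value, NOT taken from the dump discussed in this thread.
    sample = 0xBE0000000000080F
    print("_".join(decode_flags(sample)))
    print(decode_bus_error(sample))
```

If the raw 64-bit status can be recovered from the dump (for the older,
pre-WHEA form of this bugcheck the documentation describes parameters 3 and
4 as its high and low halves), feeding it to decode_bus_error should show
which agent, request type and hierarchy level the CPU reported. The
model-specific bits in 31:16 still need vendor documentation to interpret.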

We had one machine where there was a piece of lint between the processor
and its socket, so that one of the balls didn’t fully seat. Banging on
the desk in the wrong way would cause a machine check.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.