Understanding HalpPciReadMmConfigUlong Error

Folks -

I have a PCIe device that causes problems on Dell systems (Precision 390 and PowerEdge 2500) running Windows XP and Vista. The problem occurs at the tail end of the boot process: I get a blue screen saying "NMI: Parity Check / Memory Parity Error". I don’t see this problem when the card is installed in several other systems.

I’d like to find out what the HAL is doing so I can see about getting the PCIe card to work properly. I have hooked up the debugger, and the stack dump is below. Any suggestions on how to get more information?

Thanks,
mike

eax=00000005 ebx=00000000 ecx=00000001 edx=8206bf4f esi=fffffffe edi=000001df
eip=81949cbc esp=8197beb8 ebp=8197bf04 iopl=0 nv up di pl zr na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000046
nt!RtlpBreakWithStatusInstruction:
81949cbc cc int 3
ChildEBP RetAddr Args to Child
8197beb4 81835ebc 00000005 81970444 000001df nt!RtlpBreakWithStatusInstruction (FPO: [1,0,0])
8197bf04 81836d31 00000005 8197bf24 82060b09 nt!KiBugCheckDebugBreak+0x1c
8197bf10 82060b09 8206fdd8 000001f0 84c70028 nt!KeEnterKernelDebugger+0x45
8197bf24 8205d432 a0c70028 8197bf50 818102c5 hal!HalpNMIHalt+0xff (FPO: [Non-Fpo])
8197bf30 818102c5 84c70028 83f192e8 8206fdd8 hal!HalBugCheckSystem+0x5a (FPO: [Non-Fpo])
8197bf50 82060ba3 00000000 00000000 8195609a nt!WheaReportHwError+0x179
8197bf5c 8195609a 00000000 871fa448 8197f14a hal!HalHandleNMI+0x93 (FPO: [Non-Fpo])
8197bf5c 82059b7c 00000000 871fa448 8197f14a nt!KiTrap02+0x136 (FPO: [0,0] TrapFrame @ 8197bf70)
8197bfe0 87187544 00000010 00000000 00000000 hal!HalpPciReadMmConfigUlong+0x10 (FPO: [3,0,0])
WARNING: Frame IP not in any known module. Following frames may be wrong.
8197bff0 00000000 00000000 80152000 00000000 0x87187544
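
For readers following the trace: HalpPciReadMmConfigUlong is not documented, but its name suggests a 32-bit read of PCI configuration space through the memory-mapped (MMCONFIG/ECAM) window, so the NMI arrived while the CPU was doing an ordinary memory load from that window. The sketch below only shows how the PCI Express specification lays out such an address. The function names are made up for illustration, and the base address used with it would come from the ACPI MCFG table on a real machine; none of this is taken from the HAL itself.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of a config register inside the MMCONFIG (ECAM) window:
 * bus in bits 27:20, device in 19:15, function in 14:12, register in 11:0.
 * Hypothetical helper, not a HAL routine. */
static uint32_t ecam_offset(uint32_t bus, uint32_t dev,
                            uint32_t func, uint32_t reg)
{
    assert(bus < 256 && dev < 32 && func < 8 && reg < 4096);
    assert((reg & 3u) == 0);  /* a 32-bit read must be dword aligned */
    return (bus << 20) | (dev << 15) | (func << 12) | reg;
}

/* Physical address the CPU would load from. mmcfg_base is whatever the
 * firmware reports in the ACPI MCFG table (assumed by the caller). */
static uint64_t ecam_address(uint64_t mmcfg_base, uint32_t bus,
                             uint32_t dev, uint32_t func, uint32_t reg)
{
    return mmcfg_base + ecam_offset(bus, dev, func, reg);
}
```

With an assumed base of 0xE0000000, register 0x40 of bus 2, device 3, function 1 would sit at 0xE0219040. A load from that address becomes a configuration read on the link, which is where a misbehaving endpoint can trigger an error that the chipset reports as an NMI.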

Have you considered the most obvious cause, faulty or misconfigured RAM? Did you
run memdiag?

//Daniel


Thanks Daniel -

I’ll look into running memdiag. Would you suspect the problem to be with the actual RAM of the computer? This problem has happened on more than one system. When I do run memdiag, I will not be able to do it with the PCIe card installed, since that causes the BSoD even in Safe Mode.

mike

What does the CATC trace show?


mpb@pt.com wrote:

> Thanks Daniel -
>
> I’ll look into running memdiag. Would you suspect the problem to be with the actual RAM of the computer? This problem has happened on more than one system. When I do run memdiag, I will not be able to do it with the PCIe card installed, since that causes the BSoD even in Safe Mode.

If the problem has happened on more than one system, then the fault is
almost certainly with your board. An NMI can be caused by a PCIe
protocol violation. Some root complex chipsets are more forgiving of
slight timing problems than others.

Does your PCIe chipset handle spread-spectrum clocking? Do you have a
bus analyzer at your disposal to get the real details?


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Tim -

I’m convinced that the problem is with the PCIe card. I’m just trying to figure out what it is. I was hoping to get low-level information from Windows to tell me what the card is doing wrong.

Thanks,
mike

Install a checked (chk) kernel and HAL and see if anything appears in the debugger.

d

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of mpb@pt.com
Sent: Tuesday, July 15, 2008 10:36 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] Understanding HalpPciReadMmConfigUlong Error

Tim -

I’m convinced that the problem is with the PCIe card. I’m just trying to figure out what it is. I was hoping to get low-level information from Windows to tell me what the card is doing wrong.

Thanks,
mike


NTDEV is sponsored by OSR


mpb@pt.com wrote:

> I’m convinced that the problem is with the PCIe card. I’m just trying to figure out what it is. I was hoping to get low-level information from Windows to tell me what the card is doing wrong.

The problem is that this is happening at a level below Windows. This is
not like a blue screen, where some Windows component decided to bring
down the system in an orderly fashion. Instead, the root complex
hardware issued an NMI, which Windows must treat as an unrecoverable
condition. In the general case, Windows simply doesn’t KNOW anything more.
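
To make the "Windows doesn't know more" point concrete: on PC-compatible chipsets the only architectural hint about an NMI's source is a pair of status bits in system control port 0x61, and the "Parity Check / Memory Parity Error" text most likely corresponds to the first of them. The sketch below decodes such a byte. It is an illustration under that assumption, with made-up names, and is not the HAL's actual code; reading the port itself needs kernel mode, so the function takes the value as a parameter.

```c
#include <stdint.h>

/* NMI status bits in system control port 0x61 (legacy PC architecture). */
#define NMI_SC_PARITY 0x80u  /* bit 7: memory parity, SERR# on PCI-era chipsets */
#define NMI_SC_IOCHK  0x40u  /* bit 6: I/O channel check (IOCHK#) */

enum nmi_source { NMI_UNKNOWN, NMI_PARITY, NMI_IOCHK };

/* Classify an NMI from a byte previously read from port 0x61.
 * Hypothetical helper for illustration only. */
static enum nmi_source nmi_reason(uint8_t port61)
{
    if (port61 & NMI_SC_PARITY)
        return NMI_PARITY;   /* reported as a parity check */
    if (port61 & NMI_SC_IOCHK)
        return NMI_IOCHK;    /* reported as a channel check */
    return NMI_UNKNOWN;      /* no source bit set */
}
```

Two bits are all there is, which is why a system error signalled on behalf of a PCIe endpoint can end up labelled as a memory parity error even when the RAM is fine.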

A PCI Express bus analyzer will probably be your next stop. They aren’t
cheap.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Tim -

I agree with everything you are saying. I should add that I have also run both Linux and Solaris on the same system without a problem. That’s why I want to find out what Windows finds so offensive.

Thanks,
mike
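
One low-cost avenue, offered as a suggestion rather than a diagnosis: since the card boots under Linux, the Device Status register of its PCI Express capability (offset 0x0A within the capability; `lspci -vv` shows it as DevSta) can be read there and checked for latched error bits. The sketch below decodes the four error-detected bits defined by the PCI Express specification. The helper names are hypothetical, and the register value has to be obtained separately.

```c
#include <stdint.h>
#include <stdio.h>

/* Error-detected bits of the PCI Express Device Status register. */
#define DEVSTA_CORR_ERR   0x0001u  /* correctable error detected */
#define DEVSTA_NONFATAL   0x0002u  /* non-fatal error detected */
#define DEVSTA_FATAL      0x0004u  /* fatal error detected */
#define DEVSTA_UNSUPP_REQ 0x0008u  /* unsupported request detected */

/* Number of error-detected bits set in a Device Status value. */
static int devsta_error_count(uint16_t devsta)
{
    int count = 0;
    for (uint16_t bit = DEVSTA_CORR_ERR; bit <= DEVSTA_UNSUPP_REQ; bit <<= 1)
        if (devsta & bit)
            count++;
    return count;
}

/* Print one line per latched error bit. */
static void devsta_print(uint16_t devsta)
{
    if (devsta & DEVSTA_CORR_ERR)   puts("correctable error detected");
    if (devsta & DEVSTA_NONFATAL)   puts("non-fatal error detected");
    if (devsta & DEVSTA_FATAL)      puts("fatal error detected");
    if (devsta & DEVSTA_UNSUPP_REQ) puts("unsupported request detected");
}
```

If any of these bits is set after a clean Linux boot, it would at least suggest which class of error the root complex is seeing, before resorting to a bus analyzer.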