Dump switch / debugging a hung system

I’ve been trying to track down a system hang (system unresponsive, mouse
cursor won’t move, no BSOD, can’t break in with windbg (either with Break on
the debugger or SysReq on the target PC)) that occurs about one second after
stopping a stream on our audio device (WDM Audio Driver) on certain
hardware platforms. Target PC is an AMD Athlon XP 2400+ CPU (2 GHz) with VIA
VT8375 KM266 / VT8235 chipset (FIC AM37-L motherboard). OS platform is
Windows XP Pro (either “Gold” or SP1). The same hang happens on Windows XP
Home SP1 (no surprise there) and Windows 2000 Pro. If I try our legacy
MME/SYS driver on Windows XP, the hang also occurs. Interestingly, our
legacy MME/VxD driver for the same card on Windows 98 SE works flawlessly on
the same machine.

I turned on Driver Verifier for all drivers, ran under the checked build,
put breakpoints in my ISR, etc., all to no avail. So as a last resort, I
wired up a “Dump Switch” per the instructions at
http://www.microsoft.com/whdc/system/CEC/dmpsw.mspx?pf=true and set the
NMICrashDump registry value to 1, as described in the article. My logic
analyzer shows the PCI SERR# line driven low for a few clocks when I press
the dump switch (one clock when my card drives it low and a few more clocks
as it floats back high). But no crash dump happens. I did this when the
system was working properly, not when it was hung. The test platform is the
checked build of Windows XP Pro, SP1.

Device Manager shows the Computer is “ACPI Multiprocessor PC”, using
ntkrnlpa.exe with a hal.dll of halaacpi.dll.

This specific incarnation of hal.dll isn’t mentioned in the above article as
supporting the dump switch, but the article was last updated on December 4,
2001, so that might not be surprising. Or it might mean I’m out of luck.

Does anyone know if the dump switch ought to work on this platform/HAL? And
does anyone have any suggestions as to how to track down a hang that happens
about 1 second after my driver ought to be out of the picture? The driver
works fine on several other test machines here as well as over 100 beta
tester machines (both single and dual CPU). I know that doesn’t mean the
driver isn’t broken somewhere, but at least it shows the driver isn’t
completely brain dead.

Thank you!

-Dan

If I had to guess, I’d say that your host bridge has its handling of SERR#
disabled so it doesn’t generate an NMI when you pull on the SERR line (or
some other bridge in the path is not passing SERR# through).

Kind of tricky to check/fix: you’d need the data sheets for the chipset in
your system and then you’d have to poke around at the configuration (the
windbg !pci, !dd, !ed, ib and ob commands are useful for this).
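
As a rough aid for reading the config-space dumps, here is a minimal Python sketch that decodes the SERR#-related bits. The bit positions are the standard ones from the PCI specification (Command register bit 8, Status register bit 14, Bridge Control register bit 1). The function name and the sample register values are hypothetical, and whether the host bridge actually turns SERR# into an NMI is controlled by chipset-specific registers that only the data sheet describes.

```python
# Standard PCI configuration-space bits related to SERR#.
# The sample values at the bottom are made up for illustration; they were
# not read from the VIA chipset discussed in this thread.
CMD_SERR_ENABLE = 1 << 8      # Command register (offset 0x04), bit 8
STS_SIGNALED_SERR = 1 << 14   # Status register (offset 0x06), bit 14
BRCTL_SERR_ENABLE = 1 << 1    # Bridge Control register (offset 0x3E), bit 1


def serr_report(command, status, bridge_control=None):
    """Decode the SERR# bits from raw 16-bit register values.

    Pass bridge_control only for PCI-to-PCI bridges (type 1 headers).
    """
    report = {
        "serr_enabled": bool(command & CMD_SERR_ENABLE),
        "serr_signaled": bool(status & STS_SIGNALED_SERR),
    }
    if bridge_control is not None:
        report["bridge_forwards_serr"] = bool(bridge_control & BRCTL_SERR_ENABLE)
    return report


# Hypothetical device with SERR# reporting switched off.
print(serr_report(0x0007, 0x0210))
# Hypothetical bridge that has SERR# enabled and forwards it upstream.
print(serr_report(0x0107, 0x4210, bridge_control=0x0002))
```

If SERR# is disabled at any device or bridge between the card and the host bridge, the dump switch pulse would probably never reach the logic that raises the NMI.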

Good luck!
Simon

-----Original Message-----
From: Daniel E. Germann [mailto:xxxxx@visi.com]
Sent: Friday, August 06, 2004 5:45 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] Dump switch / debugging a hung system

Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256

You are currently subscribed to ntdev as: xxxxx@stratus.com
To unsubscribe send a blank email to xxxxx@lists.osr.com

Beyond that, you’ll have to ensure that NMI is enabled at the proper Local
or I/O APIC input. This won’t happen automatically unless the BIOS tells
the OS which input is the right one. The catch is that if the BIOS doesn’t
do that (and most commodity BIOSes don’t) then you don’t know which one is
the right one either. Only the motherboard manufacturer knows, and he
probably isn’t telling. I generally run through both of the Local APIC LVT
entries (see Intel’s manuals for details) first when trying to enable them.
You’ll have to guess at polarity and trigger mode. When you guess wrong,
your machine will crash. It takes a while to find the right combo.
Fortunately, on most machines, there are only two LVT entry possibilities,
two trigger modes and two polarities, giving eight possible combinations.
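
The eight combinations can be listed mechanically. The following Python sketch only computes the candidate register values; it does not touch hardware. The field layout (LINT0 at offset 0x350, LINT1 at 0x360, delivery mode in bits 10:8 with NMI = 100b, polarity in bit 13, trigger mode in bit 15, mask in bit 16) is taken from Intel's Local APIC documentation. The function name is hypothetical, and the sketch covers only the two Local APIC LVT entries, not I/O APIC inputs.

```python
from itertools import product

# Local APIC LVT register offsets, relative to the APIC base address.
LVT_OFFSETS = {"LINT0": 0x350, "LINT1": 0x360}

DELIVERY_NMI = 0b100 << 8        # delivery mode field, bits 10:8
POLARITY_ACTIVE_LOW = 1 << 13    # 0 = active high, 1 = active low
TRIGGER_LEVEL = 1 << 15          # 0 = edge, 1 = level
# Bit 16 (mask) is left clear in every candidate so the entry is unmasked.


def lvt_candidates():
    """Return the 2 x 2 x 2 = 8 LVT settings worth trying, one tuple each."""
    rows = []
    for (name, offset), active_low, level in product(
        LVT_OFFSETS.items(), (False, True), (False, True)
    ):
        value = DELIVERY_NMI
        if active_low:
            value |= POLARITY_ACTIVE_LOW
        if level:
            value |= TRIGGER_LEVEL
        rows.append((
            name,
            offset,
            "low" if active_low else "high",
            "level" if level else "edge",
            value,
        ))
    return rows


for name, offset, polarity, trigger, value in lvt_candidates():
    print(f"{name} (+0x{offset:03X}) polarity={polarity:<4} "
          f"trigger={trigger:<5} value=0x{value:08X}")
```

Working through the list in order, and noting which settings crash the machine, keeps the trial-and-error process from repeating a combination.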

If you want to know if it got configured automatically, type !mapic in the
debugger. (This will only work with an ACPI APIC HAL, though you say that
that’s what you have.) If any of the entries list an NMI source then it
will be set up for you.


Jake Oshins
Windows Kernel Group

This posting is provided “AS IS” with no warranties, and confers no rights.
