NMI distribution on Dual CPU system

We have a dual-CPU system running NT4 SP5. We get
a hang in which a DPC fails to run on CPU 0. We then generate an NMI
and get a dump, but we can only see the state of CPU 1. The hang occurs about once every
three days when we are trying to reproduce it.

Is there a way I can find out what CPU 0 was doing from the dump? (~0 gets an error: can't read PCR register)

We don’t know whether the bug is that we fail to drop the mask, that we are spinning on a lock,
or that an ISR never returns.

I presume that the APIC (we are using halmps) treats the NMI as a regular interrupt in that it
presents it to both CPUs, but only the first one gets it.

I’m trying to understand whether the NMI is distributed randomly between the CPUs,
so that we just have to wait until we happen to get a dump on the CPU actually hitting the problem,
or whether the APIC will somehow prefer the CPU that doesn’t have the lock. That is, I’m looking for some theory
on how the APIC distributes NMIs with two CPUs.

Thanks,

-DH

Dave Harvey, System Software Solutions, Inc.
617-964-7039, FAX 208-361-9395, xxxxx@syssoftsol.com, http://www.syssoftsol.com
Creators of RedunDisks - Robust RAID 1 for embedded systems.

