I’m looking for some advice and direction on a problem that I’m stuck
on.
Summary: We have systems with a PCI card that, seemingly depending on
which devices it shares an IRQ with, either works fine or never has its
ISR called, and the system hangs. I’m all confused.
Long version:
We have a PBX product running Win2003 SP2 that we’ve been shipping for
10 years. We recently moved to a new motherboard and BIOS (Kontron
ETX-PM). Most of the time, the system works fine: interrupts work and
the system functions as expected. Furthermore, our card shares
interrupts with other motherboard devices (video, Ethernet, USB) without
a problem. The card hardware is not new; it has been essentially
unchanged for 5 years. Nor is the card driver; it has not changed in
3 years.
However, we’ve noticed that sometimes the PCI INT A interrupt line from
our card ends up shared with a “bad sharer”, such as ACPI or one of the
USB controllers (Intel 82801DB/DBM, device ID 24C4). When that happens,
the very first interrupt our card generates is never serviced. We’ve set
a breakpoint in the driver’s ISR using WinDbg, and it is never hit. Yet
the !idt output and the KINTERRUPT objects look correct.
Of course, the “bad sharer” correlation could just be a coincidence, and
the real cause could be something else.
We can leave the system in this state for hours: it keeps servicing
higher-priority interrupts, but never handles ours or calls our ISR.
If we then pull out our PCI card (yes, naughty, but bear with me), it
stops driving the interrupt line, and the system becomes fine and
responsive again. That shows it is our card asserting the interrupt, and
that simply stopping it from driving the line is enough to “fix” the
hang.
To me, this sounds like what is described in KB 824395 (“Interrupts that
come from a PCI device that uses a Windows NT 4.0-style driver are
ignored”), but someone at Microsoft checked, and that issue doesn’t seem
to apply to Win2003, despite what the KB article says.
I should stress again that MOST OF THE TIME THE SYSTEM WORKS FINE. The
problem only seems to appear when our card shares an IRQ with the ACPI
device or one particular USB controller. So it does not look like a
hardware or driver bug, but rather an issue with how the IRQ resources
are allocated or handled.
We’ve noticed that the system is always fine if a USB keyboard or mouse
is plugged in. We’ve also found that if we re-flash the BIOS (thus
clearing any BIOS resource allocation) and don’t plug in any USB
devices, we will have the problem. We can also force the problem by
assigning PCI INT A to IRQ 9, the IRQ used by ACPI, in the BIOS setup.
So, apart from getting it to work correctly, the goal is to figure out
why sharing with those particular devices causes a problem. Sharing
with other devices does not, which again suggests that neither the
hardware nor the driver is at fault.
Is there a way to get Windows to respect the BIOS settings, or some
other way to control Windows’ IRQ resource allocation?
Thanks in advance for any advice!