Stopping an MSI / MSI-X interrupt storm in a bus driver

Hello,

We are working on a bus driver for a third-party card that can enumerate child devices. We were originally using pin-based interrupts, but I have now implemented MSI-X interrupt support and it appears to be working properly. The required registry keys are set, the interrupt vectors are being set up as expected, and when I query the WDF_INTERRUPT_INFO struct, the MessageSignaled field is set to TRUE.

If I have the third-party bus driver installed and I switch to our driver, everything works as expected. The problem is that if I boot the PC with our driver installed, things only work properly until a device is plugged into the bus. At that point the system is swamped with a continuous stream of interrupts and everything freezes. If I unplug the device, the system unfreezes, but I can see the interrupt storm continuing indefinitely.

We are not acknowledging the message signalled interrupts any differently than we were acknowledging pin-based interrupts, in terms of host controller registers.
It seems as though the third-party driver is putting the card into some state that prevents the interrupt storm, but I’m not sure what this could be and we are not currently able to capture a PCI-e trace.

The third-party developer is not willing to assist us, and I can’t find anything in the spec that would suggest we are missing a step or should be acknowledging the message interrupts in some general way as they come in.

I know that this is a total shot in the dark, but does anyone have any clues as to what might be causing this?

Thanks in advance.

Regards,

Richard

The device incorrectly fires MSI interrupts, as if they were level-triggered.

Are you sure, though, that you're actually getting MSI, not legacy style?

On 02-May-2013 02:10, xxxxx@broadcom.com wrote:

> The device incorrectly fires MSI interrupts, as if they were level-triggered.
>
> Are you sure, though, that you're actually getting MSI, not legacy style?

Maybe the original driver tells the device that it uses MSI,
so it behaves differently. You would need a sniffer to confirm this.
–pa
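
One way to check the "are you really getting MSI" question without a sniffer is to look at the MSI and MSI-X enable bits in the device's PCI configuration space. The following is only a sketch, assuming you already have a 256-byte snapshot of the config header (for example from a debugger dump); the function and type names are made up for illustration and are not part of any Windows API. The register offsets and bit positions are the standard PCI ones.

```c
#include <stdint.h>
#include <stddef.h>

#define PCI_CFG_SIZE         256
#define PCI_STATUS_OFFSET    0x06
#define PCI_STATUS_CAP_LIST  0x0010  /* Status bit 4: capability list present */
#define PCI_CAP_PTR_OFFSET   0x34    /* offset of the first capability */
#define PCI_CAP_ID_MSI       0x05
#define PCI_CAP_ID_MSIX      0x11
#define MSI_CTRL_ENABLE      0x0001  /* MSI Message Control bit 0 */
#define MSIX_CTRL_ENABLE     0x8000  /* MSI-X Message Control bit 15 */
#define MSIX_CTRL_FUNC_MASK  0x4000  /* MSI-X Message Control bit 14 */

/* Hypothetical result type: which interrupt mechanism the device is set to use. */
typedef enum {
    INT_MODE_LEGACY_OR_NONE,
    INT_MODE_MSI,
    INT_MODE_MSIX,
    INT_MODE_MSIX_MASKED   /* MSI-X enabled, but the whole function is masked */
} int_mode_t;

/* Read a little-endian 16-bit register from the snapshot. */
static uint16_t cfg_read16(const uint8_t *cfg, size_t off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Walk the capability list; return the capability's offset, or 0 if absent. */
static uint8_t find_capability(const uint8_t *cfg, uint8_t cap_id)
{
    if (!(cfg_read16(cfg, PCI_STATUS_OFFSET) & PCI_STATUS_CAP_LIST))
        return 0;

    uint8_t ptr = cfg[PCI_CAP_PTR_OFFSET] & 0xFC;  /* low two bits are reserved */
    int guard = 48;                                /* protects against a looped list */

    while (ptr >= 0x40 && guard-- > 0) {
        if (cfg[ptr] == cap_id)
            return ptr;
        ptr = cfg[ptr + 1] & 0xFC;                 /* next-capability pointer */
    }
    return 0;
}

/* Decide which mechanism is enabled according to the Message Control registers. */
static int_mode_t active_interrupt_mode(const uint8_t *cfg)
{
    uint8_t msix = find_capability(cfg, PCI_CAP_ID_MSIX);
    if (msix != 0) {
        uint16_t ctrl = cfg_read16(cfg, (size_t)msix + 2);
        if (ctrl & MSIX_CTRL_ENABLE)
            return (ctrl & MSIX_CTRL_FUNC_MASK) ? INT_MODE_MSIX_MASKED
                                                : INT_MODE_MSIX;
    }

    uint8_t msi = find_capability(cfg, PCI_CAP_ID_MSI);
    if (msi != 0 && (cfg_read16(cfg, (size_t)msi + 2) & MSI_CTRL_ENABLE))
        return INT_MODE_MSI;

    return INT_MODE_LEGACY_OR_NONE;
}
```

Comparing the result for a snapshot taken under the third-party driver with one taken under your own driver would show whether the two drivers leave the enable bits in different states. It says nothing about device-specific registers, which is where a vendor driver could also be doing something extra.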

It sounds to me like the device is asserting its line-based interrupt and your driver isn’t shutting that down. The third-party driver did. The reason the system is hanging is that some other driver has enabled the same I/O APIC input that your line-based interrupt is connected to and nobody is clearing it.

If your device implements the “interrupt pending” bit in the status register, look at that for proof. If not, boot your driver with MSI-X disabled and see which IRQ it’s sitting on. Then repro the problem and see if the storming IRQ is the same. !ioapic in the debugger will dump that information, along with !apic to look at the local APIC state to see which interrupts are actually pending at the processor.

  • Jake Oshins
    (former interrupt guy)
    Windows Kernel Team
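
The "interrupt pending" check described above can be sketched as a decode of two standard PCI registers: the Interrupt Status bit (Status register, bit 3) and the Interrupt Disable bit (Command register, bit 10). This is an illustrative sketch only, assuming a raw snapshot of the device's config space is available; the struct and function names are hypothetical. On Windows the PCI bus driver normally owns the Command register, so treat this as a diagnostic aid rather than something a function driver should write to.

```c
#include <stdint.h>
#include <stdbool.h>

#define PCI_COMMAND_OFFSET        0x04
#define PCI_STATUS_OFFSET         0x06
#define PCI_COMMAND_INTX_DISABLE  0x0400  /* Command bit 10: INTx assertion disabled */
#define PCI_STATUS_INTX_PENDING   0x0008  /* Status bit 3: INTx interrupt pending */

/* Hypothetical summary of the line-based (INTx) interrupt state. */
typedef struct {
    bool intx_pending;    /* device has a line-based interrupt pending */
    bool intx_disabled;   /* device is forbidden from asserting INTx */
    bool intx_asserting;  /* pending and not disabled: the line is being driven */
} intx_state_t;

/* Read a little-endian 16-bit register from the snapshot. */
static uint16_t read16(const uint8_t *cfg, unsigned off)
{
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Decode the INTx state from the first bytes of a config-space snapshot. */
static intx_state_t decode_intx_state(const uint8_t *cfg)
{
    uint16_t command = read16(cfg, PCI_COMMAND_OFFSET);
    uint16_t status  = read16(cfg, PCI_STATUS_OFFSET);
    intx_state_t s;

    s.intx_pending   = (status & PCI_STATUS_INTX_PENDING) != 0;
    s.intx_disabled  = (command & PCI_COMMAND_INTX_DISABLE) != 0;
    /* The status bit is set regardless of the disable bit; the line is only
       driven when the interrupt is pending and INTx is not disabled. */
    s.intx_asserting = s.intx_pending && !s.intx_disabled;
    return s;
}
```

If intx_asserting comes back true while the storm is in progress, that supports the theory that the device is driving its line-based interrupt even though the driver is connected for MSI-X.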


