How to debug a wired interrupt storm (device register shows no pending interrupt)

OSR_Community_User · May 29, 2010, 4:20am

My driver’s interrupt handler is called a lot, and every time the handler checks the device’s interrupt register, it shows that there’s no pending interrupt. Any idea why this is happening? Is there any way to confirm that the OS actually saw an interrupt before it called my interrupt handler? Thanks in advance!

James_Harper · May 29, 2010, 4:34am

I’d start by looking at whatever is loaded into the IDT for your vector.

Are you sharing an interrupt with another device?

James

OSR_Community_User · May 29, 2010, 7:57am

In my EvtInterruptIsr I had to check the interrupt status of my device
and disable its interrupts until I handled them in the deferred
EvtInterruptDpc. Windows seems to kind of de-assert the interrupt, so
it would immediately be signaled again in EvtInterruptIsr if not
disabled. In the EvtInterruptDpc I had to enable the interrupts again
after handling them.
Interestingly, under OS X, with the same technique of signaling the
interrupt in a high-priority handler and deferring the handling to a
lower-priority handler, it was not necessary to disable the interrupt
between signaling and handling.

Best,
Hagen.

Peter_Viscarola_OSR · May 29, 2010, 10:20am

This is the expected behavior when your device is sharing an interrupt with another device, which is very likely on the PCI bus when not using MSI.

Why do you think this is a problem?? Why are you labeling it a “storm”??

Peter
OSR

Maxim_S_Shatskih · May 29, 2010, 1:11pm

> In my EvtInterruptIsr I had to check the interrupt status of my device

1) Return FALSE from the ISR if your device’s status register says it is not interrupting now
2) Save the interrupt info from the registers to some devext fields
3) Notify the hardware that all interrupts are handled, so the hardware will deassert the interrupt for good (till the next hardware event)
4) KeInsertQueueDpc
5) return TRUE

In the DPC:

    for(;;)
    {
        under the interrupt spinlock or in a KeSynchronizeExecution callback:
            move the saved interrupt info from devext to locals
            clear the saved interrupt info in the devext
        end of the interrupt spinlock or KeSynchronizeExecution callback
        if the local interrupt info says “no interrupt” - break
        process the local interrupt info
    }
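
The ISR/DPC split above can be sketched in plain C. This is only a user-mode model, not driver code: the DEVEXT layout, the single write-to-acknowledge status register, and the SketchIsr/SketchDpc names are all assumptions made for illustration, and the dpc_queued flag merely stands in for KeInsertQueueDpc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device extension; field names are illustrative only. */
typedef struct {
    uint32_t hw_status;    /* stand-in for the device's interrupt status register */
    uint32_t saved_status; /* devext field shared between the ISR and the DPC */
    bool     dpc_queued;   /* stand-in for KeInsertQueueDpc */
    uint32_t handled;      /* causes processed by the DPC so far (demo only) */
} DEVEXT;

/* ISR side, steps 1-5. A real ISR runs with the interrupt spinlock held. */
static bool SketchIsr(DEVEXT *ext)
{
    uint32_t status = ext->hw_status;  /* read the status register */
    if (status == 0)
        return false;                  /* 1) not our interrupt */
    ext->saved_status |= status;       /* 2) latch the causes for the DPC */
    ext->hw_status &= ~status;         /* 3) acknowledge, so the device drops the line */
    ext->dpc_queued = true;            /* 4) queue the DPC */
    return true;                       /* 5) claim the interrupt */
}

/* DPC side: loop until a snapshot of the saved info comes back empty. */
static void SketchDpc(DEVEXT *ext)
{
    assert(ext->dpc_queued);
    ext->dpc_queued = false;
    for (;;) {
        /* In a driver, the next two lines run under the interrupt spinlock
           or inside a KeSynchronizeExecution callback. */
        uint32_t local = ext->saved_status;
        ext->saved_status = 0;
        if (local == 0)
            break;                     /* nothing left to process */
        ext->handled |= local;         /* "process" the local interrupt info */
    }
}
```

The point of the loop is that the ISR may latch new causes while the DPC is processing a snapshot, so the DPC re-checks the devext until it finds nothing.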

–
Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

Jake_Oshins · May 29, 2010, 2:00pm

When I read this, I had three choices:

1. Try to write a step-by-step cookbook approach that will definitely allow
you to find the problem.
2. Give you some general statements and strategies which will probably only
help if you have a lot of technical depth.
3. Ignore this post.

#1 would take a lot more time than I have to give. I apologize.
#2 only helps if you have the requisite experience.
#3 keeps my name clean but doesn’t help anyone.

I’m going with #2.

Wired PCI-style interrupts amount to grounding a pin until your device gets
serviced. PCI Express has replaced the pin with a messaging protocol, but
the semantics are the same. You can discover the pin by reading the
Interrupt Pin register in the configuration space of your device. Nothing
else comes directly from the PCI spec. Eight years of my career involved
filling in the gaps left because of that choice. (On the other hand, I
think that PCI was so widely successful because it was easy to map onto
almost any machine.)

First, you need to figure out which devices are sharing interrupts with
yours. The ones that actually have drivers loaded will be easy. In kd.exe
or windbg.exe, type “!arbiter 4” or look in Device Manager for devices which
share your IRQ.

Interrupt storms, on the other hand, happen most frequently specifically
because some device which is sharing your IRQ has no driver loaded, and thus
can’t clear its interrupt as a result of running an ISR. These interrupts
are usually pending because they were enabled in the BIOS or by an OS before
a subsequent reboot.
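
A toy model in C may help show why a driverless device on a shared, level-triggered line produces a storm. It is not how Windows dispatches interrupts internally; SHARED_DEV, LineAsserted, and Dispatch are hypothetical names, and the only assumptions are that the line is the wire-OR of all devices on it and that an ISR can quiet only its own device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one device on a shared, level-triggered line. */
typedef struct {
    bool asserting;   /* device is pulling the shared line */
    bool has_driver;  /* an ISR is connected for this device */
} SHARED_DEV;

/* The line is asserted if any device on it asserts (wire-OR). */
static bool LineAsserted(const SHARED_DEV *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i].asserting)
            return true;
    return false;
}

/* Deliver interrupts while the line stays asserted, up to max_rounds.
   Each round runs the ISR of every device that has a driver; an ISR can
   only quiet its own device. Returns the number of rounds delivered. */
static unsigned Dispatch(SHARED_DEV *devs, size_t n, unsigned max_rounds)
{
    unsigned rounds = 0;
    assert(devs != NULL);
    while (rounds < max_rounds && LineAsserted(devs, n)) {
        for (size_t i = 0; i < n; i++)
            if (devs[i].has_driver)
                devs[i].asserting = false;  /* ISR services its own device */
        rounds++;
    }
    return rounds;
}
```

With a device that asserts and has a driver, one round is enough. With a device that asserts and has no driver, the loop only stops at max_rounds, and every ISR that does run sees no pending interrupt on its own hardware.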

In order to fully track down the list of devices that are connected to your
IRQ, you need to enumerate all the PCI devices in the system. This is most
easily done in the debugger, as devices without drivers are hard to see from
the OS. Type “!pci 3 0 ff”.

Now read the BIOS to figure out what it says about how interrupts are routed
from those devices. Sometimes, devices are routed through “link nodes” or
“IRQ steering devices” which are two names for the same thing. They amount
to a mux which can be set to steer your interrupt (actually yours and any
other device’s which is wire-or’d before being routed into the mux) toward a
specific interrupt controller input.

Go get yourself a copy of the ACPI spec. It’s at http://acpi.info. I’ll
wait. You have it? Good. No, now really go get it. I’ll wait again. …
… … … Okay.

The mapping from PCI slot to “link node” (as it’s called throughout the ACPI
spec) or directly to an interrupt controller input is described in Chapter
6. See “_PRT.” There’s one for every PCI bus, except when there is a PCI
to PCI bridge that fully conforms to the PCI to PCI bridge plug-in interrupt
routing spec, which can be found at http://pcisig.com. I’ll wait again. …
… … … Oh, you’re not a member of the PCI SIG? That’s frustrating.
Luckily, this is all repeated in Tom Shanley’s Mindshare book on PCI. Go to
Amazon and order it. Come back and read this when you’ve got it. It says,
in short, that there’s a pattern for forwarding interrupts from the south
side of a PCI to PCI bridge to the north side. If there are any PCI buses
in your machine that don’t have _PRT objects in the ACPI BIOS, apply that
pattern and figure out how all the devices are routed.
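
As a rough sketch of that forwarding pattern, assuming the conventional rotation in which the pin seen on the north side of the bridge is the device's own pin rotated by its device number modulo 4. SwizzlePin is a hypothetical helper name; check the result against the bridge spec or the Mindshare book before relying on it.

```c
#include <assert.h>

/* Pins are numbered as in the PCI Interrupt Pin register:
   1 = INTA#, 2 = INTB#, 3 = INTC#, 4 = INTD#.
   Returns the pin the interrupt appears on at the bridge's primary
   (north) side for a device on the secondary (south) bus. */
static unsigned SwizzlePin(unsigned device, unsigned pin)
{
    assert(device < 32 && pin >= 1 && pin <= 4);
    return ((device + (pin - 1)) % 4) + 1;
}
```

For a device behind several bridges, the function would be applied once per bridge, each time using the device number of the bridge below as seen on the next bus up.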

Now you need to know how to read the BIOS. First you need to know that the
BIOS will implement _PRT as a method that returns one of two variant tables.
The IRQ routing is different when using an 8259 PIC and when using I/O
APICs. (Definitions of these won’t be necessary for your task, but you can
find them at http://developer.intel.com.) The OS evaluates a method early
in boot to tell the BIOS which table it should return when _PRT is
evaluated. You’re almost certainly running in APIC mode, as that’s required
for multiple logical processors, including Hyperthreads or multi-core
processors.

Now find all the _PRT methods in your BIOS by typing “!amli find _PRT” in
the debugger. You have a list of paths in the ACPI namespace now.

Now disassemble all of them, one by one, by typing something like the
following. The exact strings should be taken from the previous step.

!amli /u _sb.PCI0._PRT

The language you just dumped is described in the last two chapters of the
ACPI spec. It will probably be pretty obvious in that it will return one of
two objects, where one will probably have “PIC” in the name and one will
have “APIC.” Pick the useful one (probably APIC) and type something like:

!amli dns _sb.PCI0.APIC

You now have the table for the interrupt routing for one PCI bus. Use
Chapter 6 (which is definitely not like using the force.) Do this for all
PCI buses.
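
To make the table easier to read: each _PRT entry is a package of four fields (address, pin, source, source index). A minimal C sketch of looking up a device in such a table follows. PRT_ENTRY and PrtLookup are hypothetical names, and any table contents you feed it would come from your own dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One _PRT entry, mirroring the four-element package in the ACPI spec. */
typedef struct {
    uint32_t    address;       /* high word = PCI device number, low word = 0xFFFF */
    uint8_t     pin;           /* 0 = INTA ... 3 = INTD */
    const char *source;        /* link node name, or NULL if wired straight through */
    uint32_t    source_index;  /* interrupt controller input when source is NULL */
} PRT_ENTRY;

/* Find the routing entry for a device/pin pair on this bus, or NULL. */
static const PRT_ENTRY *PrtLookup(const PRT_ENTRY *tbl, size_t n,
                                  unsigned device, unsigned pin)
{
    assert(device < 32 && pin < 4);
    for (size_t i = 0; i < n; i++)
        if ((tbl[i].address >> 16) == device && tbl[i].pin == pin)
            return &tbl[i];
    return NULL;
}
```

Note that _PRT numbers pins from 0, while the Interrupt Pin register in configuration space numbers them from 1, so subtract one when correlating the two.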

Keep notes. Correlate that with what you found when you dumped all PCI
devices above. You now have a list of all the devices that either share an
interrupt controller input or a mux input.

Your last task, if there are link nodes, is to figure out how they’re
programmed. This is most easily done by typing “!acpiirqarb”. That will
dump the list of link nodes and their current values.

Now you have the definitive list of devices which are sharing your
interrupt. If you can, disable each in turn until the storm stops. If they
implement the “interrupt disable” bit in their PCI common header, this is
easy in the debugger, though you’ll have to hand code the values to write to
CF8 and CFC. Again, go read the Mindshare book.
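
For the hand coding, here is a small C sketch of the arithmetic, assuming the conventional x86 port-based configuration mechanism (address written to CF8, data accessed at CFC). PciConfigAddress and CommandWithIntDisable are hypothetical helper names. The Command register is the 16-bit register at offset 0x04, and Interrupt Disable is bit 10.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_COMMAND_OFFSET       0x04u       /* Command register offset */
#define PCI_COMMAND_INT_DISABLE  (1u << 10)  /* Interrupt Disable bit */

/* Value to write to port 0xCF8 to select one dword of config space. */
static uint32_t PciConfigAddress(unsigned bus, unsigned dev,
                                 unsigned func, unsigned offset)
{
    assert(bus < 256 && dev < 32 && func < 8 && offset < 256);
    return 0x80000000u | ((uint32_t)bus << 16) | ((uint32_t)dev << 11) |
           ((uint32_t)func << 8) | (offset & 0xFCu);
}

/* New 16-bit Command value with INTx# assertion disabled. */
static uint16_t CommandWithIntDisable(uint16_t command)
{
    return (uint16_t)(command | PCI_COMMAND_INT_DISABLE);
}
```

In the debugger this would presumably mean writing the address to CF8 with od, reading the current Command value from CFC with iw, and writing the new value back with ow. A 16-bit write is the safer choice here, because the upper half of that dword is the Status register, whose bits are cleared by writing ones to them.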

If they don’t implement the interrupt disable bit, and you can’t disable
them and still run the machine, you now have to go hunt down the specs for
them (if they’re published) on the Internet and you need to figure out how
to shut them up programmatically, again implemented from the debugger.

You’ll need the various debugger commands: ob ow od ib iw id !db !dw !dd !dq

If a device is necessary for keeping the machine running, like a storage
controller that you booted from, you’ll just have to check to see whether
the line has been de-asserted at the interrupt controller from within the
debugger. For this, type either:

!pic
!ioapic

One will be meaningless because you’re not actually using it. The other
will be meaningful, depending on which mode the machine is in.

Good luck.

–
Jake Oshins
Hyper-V I/O Architect (former interrupt guy)
Windows Kernel Group

This post implies no warranties and confers no rights.

Gary_Little-3 · May 29, 2010, 8:13pm

The fact that you are in your interrupt handler is a REAL good indication
you had an interrupt. You’ve got some technical reading ahead of you,
courtesy of other responders, so I’ll keep it simple.

In some devices you have to enable the indicator that says you had an
interrupt. Is your device that type and did you?

Did you acknowledge the last interrupt?

Gary G. Little

Pavel_A1 · May 30, 2010, 4:51am

Gary,

The OP wrote that the device does have an interrupt register.

If I understand Jake O.’s reply correctly, he suggests that
some other device sharing the IRQ does not have a driver, and that it has
been asserting its interrupt since power-up.
This went unnoticed until the OP’s driver connected to the IRQ, at which
point the IRQ was unmasked at the APIC.

Regards,

pa

anton_bassov · May 30, 2010, 9:58am

> If only I understand the reply of Jake O. correctly, he suggests that some other device sharing
> the IRQ does not have a driver, and it asserts its interrupt after power-up. This remained unnoticed
> until the OP connected to the IRQ, so it gets unmasked on the APIC.

Well, the “only” question is who brought a PCI device that had not been claimed by any driver into a state where it is in a position to request the CPU’s attention - objectively, a device should be completely out of play until someone claims it, at least under any reasonably designed OS. OTOH, Jake should know better - after all, this is Windows and Jake is one of its senior architects, so I take his words for granted …

Anton Bassov

OSR_Community_User · May 30, 2010, 10:21am

Thanks, Maxim,
yes, I do 1-5. Except I cannot “3) Notify the hardware that all
interrupts are handled”, because I would need to dequeue information
from a device register and handle it inside a spinlock. This is
obviously not viable at EvtInterruptIsr time.
I could dequeue all the data (and so notify the hardware that all
interrupts are handled) and save it in devext fields for deferred
handling at DPC level, but I consider this overhead. I just wondered
why OS X would not call its EvtInterruptIsr equivalent again immediately
but Windows does. Anyway that thing is solved by disabling the interrupts
between EvtInterruptIsr and EvtInterruptDpc. Not nice though.

Thanks,
Hagen

Igor_Sharovar · May 30, 2010, 12:24pm

> Anyway that thing is solved by disabling the interrupts between EvtInterruptIsr and EvtInterruptDpc. Not nice though.

Why is it not good? It is the usual design for handling interrupts. In the interrupt routine, a driver either masks the particular interrupt or disables all interrupts in the hardware, and in the DPC it unmasks or re-enables them. This lets you manage the interrupt flow properly, because the ISR runs at a higher priority than the DPC.
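
A minimal user-mode sketch of that mask-in-ISR, unmask-in-DPC design, assuming a device with separate status and interrupt-enable registers. MASK_DEV, MaskingIsr, and UnmaskingDpc are hypothetical names, and a real driver would also synchronize the register writes in the DPC with the ISR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device with a status register and an enable mask register. */
typedef struct {
    uint32_t status;     /* pending interrupt causes */
    uint32_t enable;     /* causes allowed to drive the interrupt line */
    uint32_t work_done;  /* causes handled so far (demo only) */
} MASK_DEV;

/* The device drives the line only for causes that are pending and enabled. */
static bool DeviceAsserts(const MASK_DEV *d)
{
    return (d->status & d->enable) != 0;
}

/* ISR: claim the interrupt and mask the device; leave the work for the DPC. */
static bool MaskingIsr(MASK_DEV *d)
{
    if (!DeviceAsserts(d))
        return false;   /* not our interrupt */
    d->enable = 0;      /* mask: the line drops, status stays for the DPC */
    return true;        /* a real ISR would queue the DPC here */
}

/* DPC: drain the device at leisure, then unmask as the last step. */
static void UnmaskingDpc(MASK_DEV *d, uint32_t enable_mask)
{
    assert(d->enable == 0);     /* the ISR is expected to have masked */
    d->work_done |= d->status;  /* handle everything that is pending */
    d->status = 0;
    d->enable = enable_mask;    /* unmask last */
}
```

Because the device stays masked between the two routines, a level-triggered line cannot re-fire the ISR while the DPC is still draining the hardware.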

Igor Sharovar

Maxim_S_Shatskih · May 30, 2010, 2:27pm

> from a device register and handle them inside a spinlock. This is
> obviously not viable at EvtInterruptIsr time.

A WDM ISR, at least, runs with the interrupt spinlock held, and I think EvtInterruptIsr is the same in this respect.

Maxim_S_Shatskih · May 30, 2010, 2:28pm

> state when it is in a position to request CPU’s attention - objectively, a device should be completely
> out of play until someone claims it

I think this is a hardware requirement: a device must never raise interrupts unless the driver’s init path has enabled them.

Maxim_S_Shatskih · May 30, 2010, 2:29pm

> Why is it not good? It is usual design to handle interrupts. In an Interrupt routine a driver either mask
> particular interrupt or disable all interrupts in hardware and in DPC unmask or enable interrupts. It
> allows to properly manage Interrupt flow because ISR has higher priority than DPC.

For instance, SCSIPORT mandates this model.
