Facing infinite loop of ISR routines

Hi,

This issue continues from the following open thread:

http://www.osronline.com/showthread.cfm?link=274656

I am developing an NDIS miniport driver for a device with a PCIe-TSE interface.
I use SGDMA for TX and RX, with interrupts on RX only. I enable the RX interrupt
only when submitting a request to the device.

Issue:
When I load the driver, the ISR is called endlessly after the driver
initializes successfully. This happens even before I enable the RX
interrupt. I have debugged this and printed the RX SGDMA status bits in the ISR,
and they show no reason for the ISR to be called.

You can read all the responses and details at the link above.

Following Tim’s suggestion below:

> It’s easy enough to tell if some other device is sharing your IRQ, in
> Device Manager, by choosing the “Resources by Connection” view.

I checked whether any drivers share the interrupt line with my NDIS miniport driver and found three of them:

  1. Display driver - Intel express chipset family
  2. Intel USB Host controller 27cb
  3. High Definition audio controller

I did some debugging by enabling and disabling the drivers above, and found that the display driver is generating endless interrupts.

My question is: if I have already returned FALSE from the ISR for one interrupt, what happens on the next interrupt? Shouldn’t it be directed to the driver that owns it? Or do I have to return FALSE (if it is not mine) for each and every interrupt the PCI bus generates?

Thanks in advance…

If the interrupts are level triggered, each interrupt starts with the first connected driver. So if three drivers A, B, and C are connected to a given interrupt vector… Each time the interrupt occurs they’ll be called in order A, B, C. Each time, B is only called if A returns FALSE. C is only called if A and B both return FALSE.
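
The calling order described above can be sketched as a small user-mode simulation. This is not kernel code: `SimDevice`, `sim_isr`, and `dispatch_once` are hypothetical names made up for illustration, and the real dispatcher lives inside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one device sharing a level-triggered line. */
typedef struct {
    const char *name;
    bool asserting;   /* device is currently driving the shared line */
    unsigned calls;   /* how many times this device's ISR has run */
} SimDevice;

/* ISR model: claim the interrupt only if our own device is asserting. */
static bool sim_isr(SimDevice *dev)
{
    dev->calls++;
    if (!dev->asserting)
        return false;        /* not ours: the next ISR in the chain runs */
    dev->asserting = false;  /* service the device so it drops the line */
    return true;
}

/*
 * One dispatch pass: always start at the first connected ISR and stop
 * at the first one that returns true. Returns the index of the claiming
 * device, or -1 if nobody claimed the interrupt.
 */
static int dispatch_once(SimDevice *chain, size_t count)
{
    assert(chain != NULL);
    for (size_t i = 0; i < count; i++) {
        if (sim_isr(&chain[i]))
            return (int)i;
    }
    return -1;
}
```

With a chain of A, B, C where only C is asserting, one pass calls all three ISRs and C claims it. If B asserts on the next interrupt, A and B are called but C is not.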

Peter
OSR
@OSRDrivers

Since this is a PCIe device, can you consider using MSI?

MSIs are not shared by design, eliminating all these complications (assuming of course that the OS won’t fall back to a legacy interrupt for some reason).

> Performed some debugging by enabling and disabling above drivers and found that
> display driver is generating endless interrupts.
>
> My question is, if I have already returned FALSE from ISR Handler for one
> interrupt then what happens in next interrupt. Shouldn’t it be redirected to
> its original driver?

This is exactly the problem with level-triggered interrupts. The kernel cannot know whether this
is a new request from one of the sharing devices, or the same old request that got stuck.
So it goes back to the first device… and this keeps running rounds until done.

> Or do I have to return FALSE (if it is not mine) for each
> and every interrupt PCI generates?

You have to fix the device (hardware, firmware, FPGA, whatever) and then do the right thing in the driver.

Regards,
– pa

Thank you Peter and Pavel for your clarifications and suggestions. Those repeated ISR calls were affecting my driver’s performance.
Now I am planning to use MSI instead of the level-triggered (legacy) interrupt.

Cheers
Parth

MSI and MSI-X are irrelevant to the posted problem.
The driver MUST return FALSE if it’s not its device requesting the interrupt, and MUST return TRUE if it was the device. Before returning from the interrupt, the driver MUST either handle the interrupt, so the request gets de-asserted, or disable the device’s IRQ line, queue a DPC, handle the interrupt there, and re-enable the IRQ line.

I dunno… I’m lost. Again.

It’s simple.

You get an interrupt and your ISR is called. In your ISR, you determine if your device was interrupting (NOTE: You HAVE to be able to determine this). If your device IS requesting an interrupt when your ISR is called, you return TRUE from your ISR. If your device was NOT requesting an interrupt when your ISR is called, you return FALSE.

It doesn’t matter whether your device is signaling you with an LBI, an MSI, an MSI-X or smoke-signals. That’s what’s both necessary and sufficient for the interrupt dispatcher to be happy.

Now… Do you have a further question?

Peter
OSR
@OSRDrivers

>It doesn’t matter whether your device is signaling you with an LBI, an MSI, an MSI-X or smoke-signals.

To avoid confusion, MSI(X) messages are never shared, and they have edge-triggered semantics.

Smoke signals are level triggered :wink:

Peter
OSR
@OSRDrivers

>Smoke signals are level triggered :wink:

I thought they were active on the rising edge, unless I am confusing them with flames. Am I wrong?

Anton Bassov

Peter Viscarola (OSR) wrote:

> It’s simple.
>
> You get an interrupt and your ISR is called. In your ISR, you determine if your
> device was interrupting (NOTE: You HAVE to be able to determine this) If your
> device IS requesting an interrupt when your ISR is called, you return TRUE for
> your ISR. If your device was NOT requesting an interrupt when your ISR is
> called, you return FALSE.

This is not all. Besides returning TRUE, the ISR must do something to make
the level-triggered request go away.

If the device has an interrupt disable bit, the ISR can just set it and the hardware drops the request.
Then the ISR schedules its DPC. The DPC runs some time later, and it actually handles the request, a.k.a. acknowledges it - so that the device *internally* drops it.
Then the DPC can clear the interrupt disable bit so that the next requests can come in.

Or, the ISR itself can handle the request (and make it disappear), if this can be done quickly enough.

The device must not raise the request again until it has something new to tell the host.
As Peter wrote, this is not rocket science - but it still needs to be done correctly at both ends (driver and device).
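
The disable-bit sequence described above can be sketched as a small user-mode model. Everything here is hypothetical (`SimNic`, `nic_isr`, `nic_dpc`); in a real driver the flags would be device register bits and the DPC would be queued through the framework in use.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model. */
typedef struct {
    bool request_pending;  /* device has something to tell the host */
    bool irq_disabled;     /* interrupt disable bit */
    bool dpc_queued;       /* stands in for a queued DPC */
} SimNic;

/* The level-triggered line is asserted only while pending and not masked. */
static bool line_asserted(const SimNic *nic)
{
    return nic->request_pending && !nic->irq_disabled;
}

/* ISR: mask the device, queue the DPC, claim the interrupt. */
static bool nic_isr(SimNic *nic)
{
    if (!line_asserted(nic))
        return false;          /* some other device on the shared line */
    nic->irq_disabled = true;  /* hardware drops the request right away */
    nic->dpc_queued = true;    /* real work happens later, in the DPC */
    return true;
}

/* DPC: acknowledge the request, then unmask for the next one. */
static void nic_dpc(SimNic *nic)
{
    assert(nic->dpc_queued);
    nic->dpc_queued = false;
    nic->request_pending = false;  /* device internally drops the request */
    nic->irq_disabled = false;     /* re-enable for the next request */
}
```

Between `nic_isr` and `nic_dpc` the line stays quiet even though the request is still pending inside the device, so the ISR returns FALSE if it is called again on behalf of another device in that window.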

@Anton: apropos flames and smokes… enough is enough? :wink:

– pa

>Besides of returning TRUE the ISR must do something to make the level-triggered request go away.

Well, this is obvious, don’t you think - otherwise the interrupt will fire again immediately after the CPU issues EOI, so that you will get into an interrupt storm.
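
To make the storm concrete, here is a rough user-mode model that compares an ISR that claims the interrupt without quieting the device against one that acknowledges it. All names (`StormDevice`, `run_until_quiet`, and so on) are invented for this sketch, and the round cap exists only so that the simulation terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device with a level-triggered request. */
typedef struct {
    bool request_pending;  /* line stays asserted while this is true */
    unsigned isr_calls;
} StormDevice;

typedef bool (*IsrFn)(StormDevice *);

/* Buggy ISR: returns true but never makes the request go away. */
static bool isr_claim_only(StormDevice *d)
{
    d->isr_calls++;
    return d->request_pending;
}

/* Correct ISR: acknowledges the request before returning true. */
static bool isr_claim_and_ack(StormDevice *d)
{
    d->isr_calls++;
    if (!d->request_pending)
        return false;
    d->request_pending = false;
    return true;
}

/*
 * After each EOI the level-triggered line is sampled again. While it is
 * still asserted the ISR runs again. Returns the number of rounds taken,
 * capped at max_rounds.
 */
static unsigned run_until_quiet(StormDevice *d, IsrFn isr, unsigned max_rounds)
{
    assert(max_rounds > 0);
    unsigned rounds = 0;
    while (d->request_pending && rounds < max_rounds) {
        isr(d);
        rounds++;
    }
    return rounds;
}
```

With `isr_claim_and_ack` the line goes quiet after one round. With `isr_claim_only` the loop runs until it hits the cap, which is the simulated equivalent of a storm.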

> Or, the ISR itself can handle the request (and make it disappear), if this can be done
> quickly enough.

Under Linux, yes. However, under Windows an ISR is VERY limited in what it can do - it cannot do any allocations, it cannot access the dispatcher in any way, and, in fact, it can do almost nothing apart from accessing device registers and scheduling a DPC…

Anton Bassov

> Well, this is obvious,don’t you think

I used to think so - until reading the recent post of Randy Lewis.

[quote] … all (or at least most) of these references and articles are
written by people who already know the material inside and out and they often
aren’t effective at communicating to someone who is not so familiar with the
material. [/quote]

– p

> I used to think so - until reading the recent post of Randy Lewis.

I think it could be a good idea to stick to “proper design principles” and to mention this poster only on those threads where he actually posts. Otherwise, he may come here and screw up this thread with his complaints about being “mistreated”, as well as with his bragging about being insensitive to what he calls “derogatory remarks”…

Anton Bassov

Randy Lewis wasn’t wrong. This stuff is opaque to a beginner, to a first approximation.

Although he may have presented a few oddities in some of his posts, it was evident to me that he had done some homework, and wasn’t just asking us to “plz giv me the codes”.

Phil

Not Speaking for LogRhythm!
Phil Barila | Senior Software Engineer
720.881.5364 (w)
LogRhythm, Inc.