EvtInterruptIsr re-entered

k.s · April 16, 2008, 4:36am

Hi,

(For the first time in my life) I am developing a KMDF-based driver for a PCI board. It has two transmit/receive controllers, but only a single interrupt. A controller raises an interrupt when a frame is sent or received. Hence I’ve created a WDFINTERRUPT object and an ISR function. Then I connected the two controllers with a cable and sent a frame from controller 1 to controller 2:

[11:05:56.7129] unican debug: sja1000_load_frame() - loaded id=0x1 unican: SFF unican: flags=0
[11:05:56.7121] >
[11:05:56.7121] unican debug: enter to isr, minor 0, irqn = 1
[11:05:56.7121] unican debug: RC_DATA_INT occured… Minor 0
[11:05:56.7121] unican debug: sja1000_retrieve_frame() - retrieved id=0x1 unican: SFF unican: flags=0
[11:05:56.7121] unican warning: sending signal to UNICAN_WDF_EVENT_00_00_00
[11:05:56.7121] > // !!!
[11:05:56.7121] unican debug: enter to isr, minor 0, irqn = 0
[11:05:56.7121] -----
[11:05:56.7121] unican debug: enter to isr, minor 1, irqn = 2
[11:05:56.7121] unican debug: TR_DATA_INT occured… Minor 1
[11:05:56.7121] unican debug: TR_DATA_INT complete
[11:05:56.7122] <
[11:05:56.7122] unican debug: RC_DATA_INT complete
[11:05:56.7122] -----
[11:05:56.7122] unican debug: enter to isr, minor 1, irqn = 0
[11:05:56.7122] <

Legend: “>” is ISR entry, “-----” marks the point where half of the ISR has executed, and “<” is ISR exit.

The documentation says that the framework blocks all interrupts during processing, so my question is: why was the ISR itself interrupted before it had returned (at the !!! point)?

Besides, when the data is sent like this, a single frame at a time, everything works, but at high rates I get an EXCEPTION_DOUBLE_FAULT.

Actually, I don’t know whether I need to clear the PCI interrupt once the ISR is called. Or do I need to temporarily disable interrupts during the ISR? Anyway, as I said, when I send only single frames manually, everything works: no interrupt storms and the like.

Any clues?

anton_bassov · April 16, 2008, 5:03am

> Documentation says that the framework blocks all interrupts during processing, so my question is why was the ISR itself interrupted before it had returned

When an ISR executes, all interrupts of the same or lower priority are masked on that CPU via the TPR (the local APIC’s Task Priority Register), which, by the way, has nothing to do with the framework. However, interrupts of higher priority may still interrupt your ISR’s execution…
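To make the TPR remark concrete, here is a small user-mode C model of the local APIC acceptance rule (not Windows code): a pending vector is delivered only if its priority class (vector bits 7:4) is strictly higher than the processor's current class, which is the higher of the TPR class and the class of the highest in-service vector. The vector and TPR values in the checks are made up, and how a particular HAL maps IRQL onto the TPR is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Priority class of a local-APIC vector: bits 7:4. */
static unsigned priority_class(uint8_t v) { return v >> 4; }

/* Simplified acceptance rule: the processor priority class is the higher
   of the TPR class and the class of the highest in-service vector (pass 0
   when nothing is in service). A pending vector is delivered only if its
   class is strictly higher. */
static bool apic_would_deliver(uint8_t vector, uint8_t tpr, uint8_t in_service)
{
    unsigned ppr_class = priority_class(tpr);
    if (priority_class(in_service) > ppr_class)
        ppr_class = priority_class(in_service);
    return priority_class(vector) > ppr_class;
}
```

With vector 0x51 in service, another class-5 vector such as 0x52 stays pending, while a class-6 vector such as 0x61 can still preempt it: the "higher priority may still interrupt" case.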

> Actually I don’t know if I need to clear PCI interrupt once the ISR is called. Or do I need to temporarily disable interrupts during ISR?

You should not think about raising IRQL, disabling/enabling interrupts, masking the IRQ at the IOAPIC, synchronizing between different CPUs, etc. when writing your ISR. What your ISR should do is, first of all, check whether your device has really interrupted, and if not, return FALSE. Otherwise, it should do whatever is needed to stop your device from interrupting, presumably queue a DPC for further processing, and return TRUE. This is all that you should be concerned about; the system takes care of the rest…

Anton Bassov
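Anton's checklist maps onto a very small routine. The following is a user-mode C sketch, not KMDF code: the register layout (INT_RX_DONE, INT_TX_DONE, write-1-to-clear semantics) and the fake_* types are hypothetical stand-ins for the real board, and the dpc_queued flag stands in for calling WdfInterruptQueueDpcForIsr().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL status bits: the real layout comes from the board's datasheet. */
#define INT_RX_DONE  0x1u
#define INT_TX_DONE  0x2u
#define INT_ALL      (INT_RX_DONE | INT_TX_DONE)

/* Simulated register block. In a real driver this would be mapped device
   memory accessed with READ_REGISTER_ULONG / WRITE_REGISTER_ULONG. */
typedef struct {
    uint32_t int_status;   /* assumed write-1-to-clear */
} fake_regs;

typedef struct {
    fake_regs *regs;
    uint32_t   pending;     /* causes saved for the DPC to process */
    bool       dpc_queued;  /* stands in for WdfInterruptQueueDpcForIsr() */
} fake_isr_ctx;

/* Models a write-1-to-clear register: every bit written as 1 is cleared. */
static void write_1_to_clear(fake_regs *r, uint32_t bits) { r->int_status &= ~bits; }

/* Not ours -> false; otherwise silence the device, remember why it
   interrupted, queue a DPC, return true. Nothing here waits, allocates,
   or signals events. */
static bool sketch_isr(fake_isr_ctx *ctx)
{
    uint32_t status = ctx->regs->int_status & INT_ALL;
    if (status == 0)
        return false;                    /* shared line: not our device */
    write_1_to_clear(ctx->regs, status); /* stop the device asserting the line */
    ctx->pending   |= status;            /* accumulate work for the DPC */
    ctx->dpc_queued = true;              /* real code would queue the DPC here */
    return true;
}
```

In a real driver the DPC would read and reset the pending bits while holding the interrupt lock (for example via WdfInterruptAcquireLock), since the ISR can update them at any time.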

k.s · April 16, 2008, 5:19am

> When ISR executes, all interrupts of the same or lower priority are masked via TPR on a given CPU (which has nothing to do with a framework, btw). However, interrupts of higher priority still may interrupt your ISR’s execution…

Then how was a second IRQ from the same IRQ vector possible in my log? The ISR was executing, another IRQ arrived and re-entered it. My device has just one IRQ line, and it should have been masked off by the system at that time!

anton_bassov · April 16, 2008, 6:23am

> Then how was the second irq from the same irq vector possible in my log?

It is hard to say anything without seeing “ISR entered” and “ISR returns” lines in your log. Add such debug messages immediately after entering and immediately before returning from the ISR, and you will see that the same ISR just cannot be nested (it is protected by the interrupt spinlock, so if such a scenario could occur, the system would simply deadlock).

The only theoretical exception to the above is when your ISR services multiple KINTERRUPT objects, i.e. you made separate calls to IoConnectInterrupt(). This may happen if there are multiple instances of your device connected to different interrupt lines (for example, 2 identical NICs inserted into different PCI slots). Since the interrupt spinlock is associated with the KINTERRUPT object rather than with the ISR code, the spinlocks are going to be different. However, I believe that even in this case the OS takes steps to ensure that two instances of the same ISR cannot execute at once. For example, it could assign the DIRQL implied by the numerically highest vector (out of those that your devices’ IRQs are mapped to) in order to avoid deadlocks, and introduce an additional shared spinlock to handle MP issues. At least according to MSDN, two instances of the same ISR cannot run simultaneously, no matter what. Let’s see if Jake reads this thread; after all, he is the most authoritative source of info on this subject that one can imagine…

Anton Bassov

k.s · April 16, 2008, 6:51am

In my log listing, “>” was printed at the beginning of the ISR and “<” at the end.

David_R_Cattley · April 16, 2008, 7:37am

> At least according to MSDN, two instances of the same ISR cannot run simultaneously, no matter what.

I am not an authoritative source on the internals of ISR handling on Windows (and I don’t even play one on TV).

It has, however, always been my understanding that “same ISR” in this context refers to the interrupt object and *not* the code registered as the handler. So when the system asserts that it will not invoke an ISR re-entrantly, it is *not* talking about the code but the ‘activation’ of the interrupt object. When logging an ISR, it is helpful to log the context information provided to the handler as well when dealing with the same handler being invoked on different CPUs servicing *different* interrupt objects. In this way, you can correlate the log messages associated with each of the activations.

You can always boot with /OneCPU as well if you want to eliminate the other CPUs for testing purposes. If you still get re-entered with /OneCPU set, that would sure be a surprise to me.

Good luck,
-dave
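Dave's suggestion of logging the context alongside entry and exit makes the log mechanically checkable. A minimal sketch, assuming the records were already captured in order into an array; trace_rec and first_nested_entry are hypothetical helpers, not part of WDF or WPP. Entries from different contexts may interleave (that is two interrupt objects, not re-entrancy); only a second ENTER for the same context before its EXIT is flagged. The checker can only judge the order in which records were stored, so if the logging path itself defers or reorders output, the result says nothing about actual execution order.

```c
#include <assert.h>
#include <stddef.h>

/* One trace record: which interrupt context it came from and whether it
   marks entry to or exit from the handler. */
typedef enum { ISR_ENTER, ISR_EXIT } isr_phase;
typedef struct { const void *ctx; isr_phase phase; } trace_rec;

#define MAX_CTX 8

/* Scans records in order and returns the index of the first ENTER that
   arrives while the same context is already inside its handler, -1 if
   every context's activations are properly bracketed, or -2 if more than
   MAX_CTX distinct contexts appear. */
static long first_nested_entry(const trace_rec *t, size_t n)
{
    const void *ctx[MAX_CTX];
    int depth[MAX_CTX];
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        size_t k = 0;
        while (k < used && ctx[k] != t[i].ctx)
            k++;
        if (k == used) {              /* first record for this context */
            if (used == MAX_CTX)
                return -2;
            ctx[used] = t[i].ctx;
            depth[used] = 0;
            used++;
        }
        if (t[i].phase == ISR_ENTER) {
            if (depth[k] > 0)
                return (long)i;       /* same context entered twice */
            depth[k]++;
        } else if (depth[k] > 0) {
            depth[k]--;
        }
    }
    return -1;
}
```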

anton_bassov · April 16, 2008, 8:27am

> It has, however, always been my understanding that “same ISR” in this context refers to the interrupt object and *not* the code registered as the handler.

Well, it is the code registered as the handler, not the interrupt object, that actually accesses shared resources, don’t you think??? In that case, since the same ISR routine may service multiple KINTERRUPTs, the interrupt spinlock does not really make sense by itself, does it? After all, multiple instances of the same routine may access a shared resource simultaneously, even though each of them owns its own spinlock. If there is no mistake in the OP’s analysis, then this is a *MASSIVE* bug in the MSDN documentation: as it turns out, you still have to synchronize access to shared resources manually, although MSDN claims the OS does everything for you…

Anton Bassov

Don_Burn_1 · April 16, 2008, 8:42am

Actually, it makes sense since the interrupt object contains the spinlock.
If you want to support multiple interrupt objects sharing the same
resources, you need to supply the spinlock to IoConnectInterrupt or in the
WDF_INTERRUPT_CONFIG. This has been in Windows driver programming for a
long time and people used to understand it.

–
Don Burn (MVP, Windows DDK)
Windows 2k/XP/2k3 Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr


Peter_Viscarola_OSR · April 16, 2008, 9:45am

Mr. Burn and Mr. Cattley are correct: The ISR *for a given device instance* can’t be running in parallel with itself. Given that properly designed, separate, PCI-family devices rarely (and I do mean REALLY REALLY rarely) share registers (in the broadest sense of the term) among device instances, this works perfectly.

Assuming the devices do not share registers, the ISR code can be re-entered if multiple device instances interrupt simultaneously. Given the contexts are different, the device objects are different, the KINTERRUPTs are different, and the register sets don’t overlap… there’s no reason to prevent this from happening. In fact, preventing this would be a bad thing.

If you have multiple device instances that share registers, that’s the purpose of having the ability to provide an external spin lock on the call to IoConnectInterrupt (and friends). This allows you to serialize the execution among multiple ISRs.

It’s pretty simple, yes?

Peter
OSR
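Peter's and Don's point (the lock lives in the interrupt object unless you hand one in) can be modeled in a few lines of single-threaded C. The toy_* types are hypothetical; a try-lock stands in for the real spin, so "returns false" means "would spin until the other ISR releases the lock".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock: only records whether it is held (single-threaded model). */
typedef struct { bool held; } toy_lock;

/* Toy interrupt object: embeds its own lock but can be told to use a
   caller-supplied one instead, mirroring the optional SpinLock argument
   of IoConnectInterrupt (and the SpinLock field of WDF_INTERRUPT_CONFIG). */
typedef struct {
    toy_lock  own_lock;
    toy_lock *lock;        /* the lock actually taken around the ISR */
} toy_interrupt;

static void toy_connect(toy_interrupt *obj, toy_lock *external)
{
    obj->own_lock.held = false;
    obj->lock = external ? external : &obj->own_lock;
}

/* Returns false if the lock is already held, i.e. this ISR would have
   to wait for the other ISR sharing the lock to finish. */
static bool toy_try_enter_isr(toy_interrupt *obj)
{
    if (obj->lock->held)
        return false;
    obj->lock->held = true;
    return true;
}

static void toy_leave_isr(toy_interrupt *obj) { obj->lock->held = false; }
```

With separate locks both ISRs may run at once, which is fine when their registers are disjoint; with the shared lock the second one has to wait.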

anton_bassov · April 16, 2008, 10:14am

Peter,

> If you have multiple device instances that share registers, that’s the purpose of having the ability to provide an external spin lock on the call to IoConnectInterrupt (and friends).

What about DIRQL??? Say two instances of the ISR correspond to vectors that imply different priorities (and, according to the output that the OP has provided, this is what happens in his case). Furthermore, judging from his output, the OS does not assign both instances the same DIRQL (i.e. the one implied by the vector of higher priority), so one instance of the ISR can interrupt another on the same CPU. Therefore, if the OP had provided an external spin lock (or simply used his own spinlock implementation within ISR context), he would invariably have deadlocked…

Anton Bassov

Peter_Viscarola_OSR · April 16, 2008, 10:26am

The interrupt spin lock’s IRQL is specified as the Synchronize IRQL (on the call to IoConnectInterrupt or similar).

You specify Synchronize IRQL when you call IoConnectInterrupt (or similar). For “normal” devices, you do not specify a spin lock (the pointer is NULL), and you set Synchronize IRQL to the same value as your device IRQL, as determined from your interrupt resources.

When you have multiple device instances that share resources (again, this is an incredibly rare thing), then you specify a spin lock and set Synchronize IRQL to the numerically highest of the device IRQLs.

There’s nothing new or unique here. It’s been this way since the beginning of time… Very clever of Cutler, I’d say, foreseeing this possible need. It’s sure not something that *I* would have thought of. That damn external spin lock has saved more than one “weirdo” project, using perverted hardware or other types of dangerously strange device mash-ups.

Peter
OSR
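Peter's rule for the shared-lock case, together with the constraint Jake states later in the thread (SynchronizeIrql must be greater than or equal to the device IRQL), comes down to taking a maximum. The maximum is what avoids the same-CPU deadlock Anton describes: if one ISR held the shared lock at a lower IRQL, a higher-IRQL interrupt sharing that lock could preempt it on the same CPU and spin forever on a lock held underneath it. A minimal sketch; toy_irql and the IRQL numbers in the checks are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned char toy_irql;   /* stand-in for KIRQL */

/* For interrupt objects that share one external spin lock, pick the
   numerically highest device IRQL as the common Synchronize IRQL. */
static toy_irql pick_synchronize_irql(const toy_irql *device_irqls, size_t n)
{
    toy_irql sync = 0;
    for (size_t i = 0; i < n; i++)
        if (device_irqls[i] > sync)
            sync = device_irqls[i];
    return sync;
}

/* A Synchronize IRQL is only valid if it is >= the object's device IRQL. */
static bool sync_irql_is_valid(toy_irql device_irql, toy_irql sync_irql)
{
    return sync_irql >= device_irql;
}
```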

k.s · April 16, 2008, 11:59am

To make things clear: the output above was generated on a single-CPU, single-core machine with WinXP SP2 and the latest DDK (as of March 2008). Although my device has two controllers, it has a single PCI controller (PLX 9052). So the driver created a single WDFDEVICE and a single WDFINTERRUPT (plus a device interface for each of the two channels).

From the output, it looks like two interrupts coming from the same line were nested. Am I correct?

Doron_Holan · April 16, 2008, 12:39pm

How are you creating your log? Printing to a buffer? Writing to a log file? Perhaps the writes to the log file are interleaving due to deferral and are not showing what is really going on.

Also, looking at this line of the log:
[11:05:56.7121] unican warning: sending signal to UNICAN_WDF_EVENT_00_00_00

I certainly hope that you are not setting a KEVENT at DIRQL

d

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

Jake_Oshins · April 16, 2008, 1:11pm

Actually, David, this isn’t the case. Windows holds your ISR spinlock
while your ISR executes. Your ISR cannot be executed on two CPUs
concurrently and it cannot interrupt itself.

I suspect that the original poster’s tracing logic is somewhat
asynchronous.

Jake Oshins
Authoritative Source on the internals of ISR handling on Windows


David_R_Cattley · April 16, 2008, 1:25pm

Jake,

In my reply I meant the ISR to be the ‘code’ (as in Interrupt Service Routine, with the emphasis on Routine) and not the logical object consisting of the resources bound together by the interrupt object (ISR SpinLock, Context, and ISR_Code). My intention was to dispel the idea that the interrupt management logic in NT somehow kept track of ISR code addresses in order to allocate a lock protecting the ‘code’ (i.e., a single spinlock ensuring that no two device interrupts could simultaneously invoke the same ISR handler). That seemed both unreasonable and contrary to my experience: I surely recall seeing separate PCI devices of the same ‘type’, with two independent interrupt objects and the same handler, being serviced ‘simultaneously’ without any problem.

And I think that is likewise what Peter clarified when discussing the purpose of the facility to supply a specific spinlock in cases where the system cannot assume that the resources touched by the ISRs of two different interrupt objects are disjoint.

It all comes down to the confusion (and I am apparently adding to it rather than helping) caused by using the term ISR. Far better would be to use the term InterruptObject, just as we use DPC Object, which is distinct from the DPC Callback Routine associated with the DPC object.

So I should have inserted “handler” in a number of places to doubly ensure
that I was referring to ‘code’ and not the ‘object’.

-dave


k.s · April 16, 2008, 1:45pm

Doron,

> How are you creating your log? Printing to a buffer? Writing to a log file?
> Perhaps the writes to the log file are interleaving due to deferral and are
> not showing what is really going on.

They are KdPrints.

> sending signal to UNICAN_WDF_EVENT_00_00_00
> I certainly hope that you are not setting a KEVENT at DIRQL

This is exactly what I do. What’s wrong with KeSetEvent?

Doron_Holan · April 16, 2008, 1:55pm

The IRQL limit for KeSetEvent is IRQL <= DISPATCH_LEVEL. Your ISR runs at DIRQL, which is > DISPATCH_LEVEL. What you are doing can cause deadlocks and other errors that are very hard to diagnose. One specific side effect is that it can lower IRQL when called at too high an IRQL. When the IRQL is lowered, your ISR could potentially run again, which probably explains the reentrancy (although the interrupt spinlock should prevent that and cause a deadlock instead of reentrancy).

Queue a DPC from the ISR, set the event there at the right IRQL, and see if the reentrant ISR problem goes away.

d
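Doron's fix can be modeled with a few globals. A toy sketch, not kernel code: the PASSIVE_LEVEL and DISPATCH_LEVEL stand-ins use the real KIRQL values 0 and 2, TOY_DIRQL = 5 is an arbitrary device level, and toy_set_event checks the documented IRQL <= DISPATCH_LEVEL rule explicitly (a real driver simply must not call KeSetEvent above it). The ISR only records that work is pending; the event is signaled from the DPC.

```c
#include <assert.h>
#include <stdbool.h>

/* Real KIRQL values for the low levels; TOY_DIRQL is an arbitrary device
   level above DISPATCH_LEVEL, chosen for illustration. */
enum { TOY_PASSIVE_LEVEL = 0, TOY_DISPATCH_LEVEL = 2, TOY_DIRQL = 5 };

static int  current_irql   = TOY_PASSIVE_LEVEL;
static bool event_signaled = false;
static bool dpc_pending    = false;

/* Stand-in for KeSetEvent's IRQL rule: refuses above DISPATCH_LEVEL. */
static bool toy_set_event(void)
{
    if (current_irql > TOY_DISPATCH_LEVEL)
        return false;
    event_signaled = true;
    return true;
}

/* ISR: runs at DIRQL and only records that work is pending. */
static void toy_isr(void)
{
    current_irql = TOY_DIRQL;
    dpc_pending = true;               /* real code: WdfInterruptQueueDpcForIsr */
    current_irql = TOY_PASSIVE_LEVEL;
}

/* DPC: runs later at DISPATCH_LEVEL, where signaling the event is legal. */
static void toy_run_dpcs(void)
{
    if (!dpc_pending)
        return;
    current_irql = TOY_DISPATCH_LEVEL;
    dpc_pending = false;
    toy_set_event();
    current_irql = TOY_PASSIVE_LEVEL;
}
```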


k.s · April 16, 2008, 3:14pm

Thank you very much! I’ll try to move the event code to a DPC. Looks like you solved a problem I’ve been stuck on for 3+ days. Thanks again!

Jake_Oshins · April 17, 2008, 12:12pm

Actually, I’m more confused about what you mean now. So let me be as
close to clear as I can be.

When an interrupt occurs, Windows raises IRQL to SynchronizeIrql,
which is almost always (unless the driver overrides it) the device
IRQL assigned by the PnP manager. (SynchronizeIrql must be greater
than or equal to the assigned device IRQL.) It then takes the
spinlock in the interrupt object (or the one supplied) and then calls
the Interrupt Service Routine (which is the code.)

So I’m not sure what distinction you’re making. The “code” cannot be re-entrant. It will be active on exactly one CPU at a time. It will not be pre-empted by another instance of itself. This is all because the lock is in the interrupt object, so your code doesn’t have to take it, and that lock is what ensures that the entire routine is atomic.

Jake Oshins
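Jake's sequence (raise to SynchronizeIrql, take the lock, call the routine) can be played through in a toy single-CPU model. Everything here is hypothetical scaffolding (toy_kinterrupt, toy_dispatch, the IRQL value 5), and the hardware is reduced to one rule: an interrupt arriving while the CPU is at or above the object's IRQL stays pending until IRQL drops. The device fires again in the middle of the first activation; the second interrupt is held off and serviced only after the first activation returns, so the nesting depth never exceeds 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-CPU toy of the dispatch sequence. */
typedef struct {
    int  sync_irql;     /* >= device IRQL */
    bool lock_held;     /* tracked only; never contended on one CPU */
    int  depth;         /* activations currently running */
    int  max_depth;     /* deepest nesting observed */
    int  calls;         /* total ISR invocations */
} toy_kinterrupt;

static int  cpu_irql = 0;
static bool line_pending = false;

static void toy_dispatch(toy_kinterrupt *k);

/* The driver's routine: while it runs, its own device fires again. */
static void toy_service_routine(toy_kinterrupt *k)
{
    k->calls++;
    if (++k->depth > k->max_depth)
        k->max_depth = k->depth;
    if (k->calls == 1)
        toy_dispatch(k);          /* a second interrupt arrives mid-ISR */
    k->depth--;
}

/* Delivers the interrupt only if the CPU is below the object's IRQL;
   otherwise it stays pending. */
static void toy_dispatch(toy_kinterrupt *k)
{
    if (cpu_irql >= k->sync_irql) {
        line_pending = true;
        return;
    }
    int old = cpu_irql;
    cpu_irql = k->sync_irql;      /* 1. raise to SynchronizeIrql */
    k->lock_held = true;          /* 2. take the interrupt spin lock */
    toy_service_routine(k);       /* 3. call the ISR */
    k->lock_held = false;         /* 4. release the lock */
    cpu_irql = old;               /* 5. restore the previous IRQL */
    if (line_pending) {           /* the held-off interrupt is taken now */
        line_pending = false;
        toy_dispatch(k);
    }
}
```

The pended interrupt is serviced only after step 5, which is why a well-behaved ISR never observes itself nested.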


OSR_Community_User · April 17, 2008, 12:45pm

“Doron Holan” wrote in message
news:xxxxx@ntdev…
> The IRQL limits of KeSetEvent is IRQL <= dispatch level. Your ISR runs at
> DIRQL > dispatch level. What you are doing can cause deadlocks and other
> errors that are very hard to diagnose. One specific side effect is that
> it can lower IRQL when called at too high of an IRQL. When the IRQL is
> lowered, your ISR could potentially run again which probably explain the
> reentrancy (although the interrupt spinlock should prevent that and cause
> a deadlock instead of reentrancy).
>…

On a UP machine, this won’t deadlock, so this looks like the OP’s situation?

–PA
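PA's remark can be illustrated with a toy single-CPU model resting on two stated assumptions. First, on a uniprocessor kernel, holding the interrupt spin lock amounts to running at SynchronizeIrql, with no lock word to spin on. Second, buggy_signal_from_isr is a hypothetical helper that just drops IRQL to DISPATCH_LEVEL to stand in for the side effect Doron describes; it does not model what KeSetEvent actually does internally. Under those assumptions, an interrupt that is pending when IRQL drops is taken inside the first activation, so the trace shows nesting (depth 2) rather than a deadlock, the same shape as the OP's log.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model; UP_DIRQL is an arbitrary device level. */
enum { UP_DISPATCH_LEVEL = 2, UP_DIRQL = 5 };

static int  up_irql = 0;
static bool up_pending = false;
static int  up_depth = 0, up_max_depth = 0, up_calls = 0;

static void up_deliver(void);

/* Lowers IRQL; if that unmasks a pending interrupt, it is taken at once. */
static void up_lower_irql(int new_irql)
{
    up_irql = new_irql;
    if (up_pending && up_irql < UP_DIRQL) {
        up_pending = false;
        up_deliver();
    }
}

/* HYPOTHETICAL misbehaving helper: an IRQL drop inside a call made at
   too high an IRQL. */
static void buggy_signal_from_isr(void) { up_lower_irql(UP_DISPATCH_LEVEL); }

static void up_isr(void)
{
    up_calls++;
    if (++up_depth > up_max_depth)
        up_max_depth = up_depth;
    if (up_calls == 1) {
        up_pending = true;         /* device fires again while we run */
        buggy_signal_from_isr();   /* ...and the IRQL drop lets it in */
    }
    up_depth--;
}

static void up_deliver(void)
{
    if (up_irql >= UP_DIRQL) {     /* masked: stays pending */
        up_pending = true;
        return;
    }
    int old = up_irql;
    up_irql = UP_DIRQL;            /* raising IRQL is the whole "lock" on UP */
    up_isr();
    up_lower_irql(old);
}
```

On a multiprocessor kernel with a real lock word, the nested entry would instead spin on the interrupt spin lock still held by the outer activation, which is the deadlock Doron expected.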