You ask for a lot. But, in order to put this to bed, here it is.
APIC Case (I'll stick to the single-processor case for brevity):
Let me add some hypothetical details. When I use "Time A" or "Time B,"
assume that time passes in alphabetical order.
Card 1 is attached to I/O APIC input #21.
Card 2 is attached to I/O APIC input #20.
Card 3 and 4 are attached to I/O APIC input #19.
The OS has assigned IDT entry 0x71 (IRQL 0xB) to I/O APIC input #21.
The OS has assigned IDT entry 0xa1 (IRQL 0xE) to I/O APIC input #20.
The OS has assigned IDT entry 0x93 (IRQL 0xD) to I/O APIC input #19.
Card 2 asserts INTA# (by grounding it) at Time A.
Card 1 asserts INTA# at Time C.
Card 4 asserts INTA# at Time D.
Card 3 asserts INTA# at Time E.
Assume the processor is running at PASSIVE_LEVEL at Time A.
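To keep the numbers straight, the setup above can be written down as a small table. This is only an illustrative Python sketch: the SETUP dict and the priority_class helper are made-up names, not anything in the HAL. It also checks one property of this particular example: the Local APIC ranks vectors by their upper four bits, and with these vectors that ranking agrees with the IRQL ranking.

```python
# Hypothetical table mirroring the setup in this post, keyed by I/O APIC input.
SETUP = {
    21: {"vector": 0x71, "irql": 0xB, "cards": ["card1"]},
    20: {"vector": 0xA1, "irql": 0xE, "cards": ["card2"]},
    19: {"vector": 0x93, "irql": 0xD, "cards": ["card3", "card4"]},
}


def priority_class(vector):
    """Upper four bits of the vector: the Local APIC's priority class."""
    return vector >> 4


# Inputs ordered from lowest to highest priority, two different ways.
by_class = sorted(SETUP, key=lambda pin: priority_class(SETUP[pin]["vector"]))
by_irql = sorted(SETUP, key=lambda pin: SETUP[pin]["irql"])
```

Both orderings come out the same for these three inputs, which is why the rest of the flow can talk about "higher IRQL" and "higher priority vector" interchangeably.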
Now for the flow:
The I/O APIC will send a message to the Local APIC in the processor
shortly after Time A telling it that a level-triggered interrupt
occurred on vector 0xa1.
The Local APIC will set the bit corresponding to 0xa1 in its Trigger
Mode Register. Then set the 0xa1 bit in its IRR register, meaning that
it received the interrupt.
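For anyone who wants to picture the registers: the IRR and TMR are each 256 bits wide, one bit per vector. Here is a toy Python model of just the step described above. The class and method names are invented for illustration, and a Python integer stands in for each 256-bit register.

```python
class LocalApicBits:
    """Toy model of the IRR and TMR: one bit per vector, 256 vectors."""

    def __init__(self):
        self.irr = 0
        self.tmr = 0

    def receive(self, vector, level_triggered):
        """Record an interrupt message from the I/O APIC."""
        bit = 1 << vector
        if level_triggered:
            self.tmr |= bit      # remember that an ACK must go back to the I/O APIC
        else:
            self.tmr &= ~bit     # edge-triggered: no re-sample needed
        self.irr |= bit          # the interrupt is now pending

    def is_set(self, register, vector):
        return bool((register >> vector) & 1)


apic = LocalApicBits()
apic.receive(0xA1, level_triggered=True)   # shortly after Time A
```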
At Time B, the Local APIC then asserts an interrupt at the processor
core. The processor core responds by reading the vector from the Local
APIC, dumping context on the stack and jumping through the IDT entry at
0xa1. This causes the Local APIC to set the 0xa1 bit in its ISR.
The NT kernel has placed an architecture-specific interrupt pre-amble at
that address which raises IRQL to 0xE and then issues a "sti" instruction.
Just before it reaches the sti, we hit Time C. The I/O APIC sends a
message to the local APIC telling it that a level-triggered interrupt
occurred on vector 0x71.
The Local APIC sets the IRR and TMR bits associated with 0x71 and does
nothing more, since IRQL has been raised to higher than this vector.
(IRQL, on APIC systems, is maintained directly in the Local APIC's Task
Priority Register.)
The processor issues the "sti" instruction mentioned above. The
interrupt pre-amble code then starts looking for
architecture-independent ISRs connected to vector 0xa1 that are there as
a result of drivers calling IoConnectInterrupt.
Time D arrives. The I/O APIC detects that card 4 has asserted INTA#.
It sends a message to the Local APIC telling it that a level-triggered
interrupt occurred on vector 0x93. The Local APIC sets the 0x93's IRR
and TMR bits.
The processor executes more of card2's driver's ISR. When this
completes, the ISR returns "TRUE, it was my interrupt and I handled it."
This prompts the NT kernel to quit processing the ISR chain, ACK the
interrupt and drop IRQL. This will cause the ISR and IRR bits
corresponding to 0xa1 in the Local APIC to be cleared. Because 0xa1's
TMR bit is set, this ACK will also cause a message to be sent to the I/O
APIC, telling it to re-sample vector 0xa1.
Now that IRQL has been lowered, the Local APIC will interrupt the
processor again, this time with vector 0x93, which is currently the
highest priority in the IRR register.
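The choice the Local APIC makes here can be sketched in a few lines of Python. The function names are hypothetical. The sketch assumes that the task-priority class is 0xA while running at IRQL 0xE and 0 at PASSIVE_LEVEL (those two values are my assumption for the example, not taken from the HAL), and that a pending vector is delivered only if its upper four bits exceed both the task-priority class and the class of anything still in service.

```python
def processor_priority_class(tpr_class, in_service):
    """Higher of the task-priority class and the highest in-service class."""
    isr_class = max((v >> 4 for v in in_service), default=0)
    return max(tpr_class, isr_class)


def next_vector(pending, tpr_class, in_service):
    """Highest pending vector that outranks the processor priority, or None."""
    ppr = processor_priority_class(tpr_class, in_service)
    deliverable = [v for v in pending if (v >> 4) > ppr]
    return max(deliverable, default=None)


# While card2's ISR runs: 0x71 and 0x93 are pending but held off.
held = next_vector({0x71, 0x93}, tpr_class=0xA, in_service={0xA1})

# After the ACK and the drop in IRQL: 0x93 outranks 0x71.
after_ack = next_vector({0x71, 0x93}, tpr_class=0x0, in_service=set())
```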
The processor will again jump through the IDT to the pre-amble code.
Again, the code will raise IRQL, this time to level 0xD, and start
looking for ISRs connected to vector 0x93.
About this time the ACK message reaches the I/O APIC. The I/O APIC
re-samples input number #20. It's not asserted, since card2's driver
just ran its ISR. No new interrupt occurs here.
Back to the processor. It starts to execute the first ISR on 0x93's
chain. Assume that it finds card3's driver's ISR first on the list. It
will execute that ISR, which will clear the interrupting condition in
card3 and return "TRUE - that was my device, and the condition has been
handled." The processor then sends an ACK and drops IRQL. This causes
0x93's IRR and ISR bits to be cleared, and a message to be sent to I/O
APIC, since 0x93's TMR bit is set.
It will then drop IRQL and accept another interrupt from the Local APIC,
this time vector 0x71. The processor probably won't get very far
through the pre-amble code before the message gets to the I/O APIC.
Imagine it gets no further than raising to IRQL 0xB and issuing a "sti."
At that point, the I/O APIC will re-sample input #19. Since card4 is
still asserting INTA#, this input will still be active. The I/O APIC
will send a message back to the Local APIC telling it that vector 0x93
was triggered, level-style.
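The re-sample step is simple enough to sketch. In this Python fragment the ROUTING and WIRING tables and the resample function are illustrative names of my own; the point is only that a shared input stays active as long as any card wired to it is still asserting INTA#.

```python
# Hypothetical tables for this example: input -> vector, input -> attached cards.
ROUTING = {21: 0x71, 20: 0xA1, 19: 0x93}
WIRING = {21: ["card1"], 20: ["card2"], 19: ["card3", "card4"]}


def resample(pin, asserting):
    """On an ACK, return the vector to send again, or None if the input is idle."""
    if any(card in asserting for card in WIRING[pin]):
        return ROUTING[pin]
    return None


# State after card2's and card3's ISRs have run: card1 and card4 still assert.
asserting = {"card1", "card4"}
input_20 = resample(20, asserting)   # card2 already cleared: nothing to send
input_19 = resample(19, asserting)   # card4 still asserting: vector 0x93 again
```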
The Local APIC will then interrupt the processor with vector 0x93 again.
The processor will jump through the IDT and start executing pre-amble
code, raising back to IRQL 0xD. It will call card3's ISR again, since
it is first on 0x93's chain. Card3's ISR will return "FALSE - it wasn't
me." The NT kernel will then call the next ISR on the chain, that of
card4. Card4's ISR will run and return "TRUE - that was mine and I've
cleared the interrupting condition." This will cause the kernel to issue
another ACK and drop IRQL.
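The two passes over 0x93's chain can be sketched as follows. This is a Python toy, not kernel code: dispatch and the set of asserting cards are made up for illustration, and "the ISR returns TRUE" is modeled as "the card was asserting, and the ISR cleared it."

```python
def dispatch(chain, asserting):
    """Call ISRs in chain order; stop at the first one that claims the interrupt."""
    called = []
    for card in chain:
        called.append(card)
        if card in asserting:          # ISR returns TRUE ...
            asserting.discard(card)    # ... after clearing its device
            break
    return called


chain_0x93 = ["card3", "card4"]
asserting = {"card3", "card4"}

first_pass = dispatch(chain_0x93, asserting)    # only card3's ISR runs
still_asserted = bool(asserting)                # card4 keeps the input active
second_pass = dispatch(chain_0x93, asserting)   # card3 says FALSE, card4 says TRUE
```

Because the walk stops at the first TRUE, card4 is only serviced after the ACK makes the I/O APIC re-sample the input and raise the vector a second time.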
The Local APIC will then clear 0x93's ISR and IRR bits again and issue
an ACK to the I/O APIC. Then the I/O APIC will re-sample input #19.
Now it's de-asserted, since neither card3 nor card4 is interrupting.
The processor will now continue executing the interrupt pre-amble code
for vector 0x71, which will call card1's ISR. Card1's ISR will return
"TRUE - that was mine and I've cleared the interrupting condition." The
kernel will ACK the interrupt and drop IRQL.
The Local APIC will clear the ISR and IRR bits associated with 0x71.
This will cause an ACK to be sent to the I/O APIC. The I/O APIC will
then re-sample input #21. It is now deasserted.
The processor will go back to executing lower-priority code. In all
likelihood, these ISRs queued up a bunch of DPCs. Now those DPCs will
be executed in the order that they were queued at DISPATCH_LEVEL. Then
the processor will drop back to PASSIVE_LEVEL and look for threads to
run. It may or may not find any.
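As a rough illustration of the DPC ordering, here is a Python sketch that assumes each ISR queues exactly one DPC (the post only says they probably queued some) and that the queue is plain first-in, first-out. The names dpc_queue and isr are hypothetical.

```python
from collections import deque

dpc_queue = deque()


def isr(card):
    """Toy ISR: handle the device, then queue one DPC for the follow-up work."""
    dpc_queue.append(card + "_dpc")
    return True


# Order in which the ISRs claimed their interrupts in the flow above.
for card in ["card2", "card3", "card4", "card1"]:
    isr(card)

# Back at DISPATCH_LEVEL, the DPCs run in the order they were queued.
ran = []
while dpc_queue:
    ran.append(dpc_queue.popleft())
```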
I'm really tired of typing at the moment. So I challenge somebody else
to do either PIC case. (They are much more alike than different. With
respect to the device's ISRs, they are indistinguishable.)
This is the last that I will write on this topic. I refer any further
inquiries to Intel's Programmer's Reference Manuals for the Pentium
Family. See Volume 3, chapter 8. http://developer.intel.com/design/pentium4/manuals/
Windows Kernel Group Interrupt Guy
This posting is provided "AS IS" with no warranties, and confers no
rights.
Subject: RE: Device Interrupt priority - Reviewing Jose Flores
From: "Christiaan Ghijselinck" <xxxxx@CompaqNet.be>
Date: Wed, 11 Dec 2002 20:38:24 +0100
Who will rise to the next challenge:
"Four PCI cards fire an interrupt at almost the same moment (assume a
100 ns time difference). Card1, card2 and card3 have different IRQs;
card4 shares the same IRQ with card3.
I would like to see a detailed flow of all handshaking actions between
Cards <--> [ PCI bus <--> ] PIC/APIC <--> CPU with bus <--> OS/ISR's in
both the "Lazy" and the "Strict Model". The flow ends when the last ( =
IRQ has been completely serviced.
Such a description would have incredible value.