IRQ, IRQL, Intel docs and Windows....

> Am I correct that the IRQL on which the ISR will run is 100% determined only by the IDT vector number, in other words - each IDT vector has its hard-coded IRQL?

I don't know about the PIC HAL, but if you are asking about the APIC one, then you are wrong. The interrupt priority of the *currently* running code is defined not by a vector number (after all, one can raise IRQL outside of ISR context) but by the value in the TPR - IRQL is just an index into an array of values to be written to the TPR. When the CPU decides whether it can fire an interrupt, it checks the vector number against the value in the TPR (i.e. against the priority of the currently running code). However, the CPU does not modify the TPR when it raises an interrupt - that is the software's responsibility. If an interrupt is vectored via an interrupt gate, its handler stub starts execution with all maskable interrupts disabled. Therefore, the interrupt handler stub raises IRQL to the appropriate level by writing to the TPR while interrupts are still disabled, then re-enables interrupts with STI, and, at this point, it can already call the ISR - all lower-priority interrupts will be masked while the ISR executes. Disassemble HalBeginSystemInterrupt(), and you will see how it works from the software's perspective (from the hardware's side it is described in the Intel manuals - please note that there are two more priority-related registers that are read-only for software)…
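As an illustration only, here is a minimal C sketch of that sequence - raise the TPR while interrupts are still disabled, re-enable interrupts, then call the ISR. This is not the actual HAL code: the IrqlToTpr table, the CurrentIrql variable and the function name are made up, _enable() stands in for STI, and only the local APIC TPR offset comes from the Intel manuals.

    /*
     * Sketch of an APIC-HAL-style "begin interrupt" path (illustrative).
     * Entered from the handler stub with interrupts still disabled by the
     * interrupt gate. 0xFEE00080 is the default *physical* TPR address;
     * real code uses the mapped virtual address.
     */
    #include <intrin.h>

    #define LAPIC_TPR ((volatile unsigned int *)0xFEE00080)

    extern unsigned char IrqlToTpr[32];   /* hypothetical per-IRQL TPR values   */
    extern unsigned char CurrentIrql;     /* hypothetical per-CPU IRQL variable */

    void BeginInterrupt(unsigned char NewIrql, unsigned char *OldIrql)
    {
        *OldIrql = CurrentIrql;           /* remember the interrupted IRQL        */
        CurrentIrql = NewIrql;
        *LAPIC_TPR = IrqlToTpr[NewIrql];  /* mask everything at or below NewIrql  */
        _enable();                        /* STI - safe to re-enable interrupts   */
        /* ...the caller invokes the ISR here; lower-priority vectors stay masked... */
    }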

Anton Bassov

Actually, an IDT vector does have a hard-coded IRQL, and an ISR’s IRQL is defined by its IDT vector. Anton, you’re right that the TPR mostly defines the current processor priority (although not entirely - see below). But as you also say, an interrupt’s vector number is checked against the TPR to see if the interrupt can be dispatched. This means that there is priority implied by vector number, and an arbitrary vector number cannot be assigned an arbitrary IRQL.

Also, the TPR is not the register that controls processor priority - the PPR is. The PPR is basically the max of the TPR and the priority associated with the currently in-service interrupt. So if you take an interrupt at one priority level, you will not take another at a less-than-or-equal priority until you EOI that interrupt. Full stop - regardless of whether you set the TPR. Schemes like the “lazy IRQL” mechanism that has been brought up recently rely on this to provide interrupt priority.
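As a rough model of that rule (following the Intel SDM's description of the PPR; the function names here are illustrative only):

    /* Priority "class" is the upper nibble of a vector or priority register. */
    unsigned PriorityClass(unsigned v)
    {
        return (v >> 4) & 0xF;
    }

    /* PPR = TPR if its class is >= the class of the highest in-service vector,
     * otherwise the in-service class (with the low nibble zeroed). */
    unsigned ComputePpr(unsigned tpr, unsigned highestInServiceVector)
    {
        unsigned tprClass = PriorityClass(tpr);
        unsigned isrClass = PriorityClass(highestInServiceVector);
        return (tprClass >= isrClass) ? tpr : (isrClass << 4);
    }

    /* A pending interrupt is delivered only if its class exceeds the PPR class. */
    int Deliverable(unsigned vector, unsigned ppr)
    {
        return PriorityClass(vector) > PriorityClass(ppr);
    }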

This is another reason that the IDT vector of an interrupt implies its IRQL on an APIC system. As Jake corrected me before, this is not true in Windows’ PIC implementation.

Dave

> Actually, an IDT vector does have a hard-coded IRQL, and an ISR’s IRQL is defined by its IDT vector.

No!!! This is not IRQL but processor priority, which is a hardware-only concept that is out of the software's direct control (apart from assigning the vector number and issuing EOI), while IRQL is under the control of OS software…

Anton Bassov

> When I last read the Linux kernel - 2.3 - they had nearly the same,

Sure. They don't prioritize hardware interrupts - they rely on neither the TPR nor the ISR register (they don't write to the TPR and issue EOI straight away), but, at the same time, some interrupts may be executed with all interrupts disabled on a given CPU. I think that the main reason behind that is portability concerns - after all, the Linux kernel is supposed to be able to run on very different hardware architectures, and some architectures may just lack support for prioritizing hardware interrupts against one another…

> with DPCs called "bottom halves". Unlike Windows, where there can be arbitrary DPCs, Linux only had a predefined array of bottom halves - net_bh() and so on.

Well, bottom halves come in two flavours - softirqs and tasklets (the latter is just built upon the former). Although Linux does not prioritize hardware interrupts, it still prioritizes software ones. It defines 32 softirq priorities (only six of them are currently used), with the lowest-priority softirq being used to queue tasklets.

BTW, what do you mean by "arbitrary DPC"??? Do you mean one that is queued to another CPU? Indeed, Linux does not allow something like that - all DPCs get processed by the CPU that actually queued them. However, it does not execute more than 10 tasklets in one go, so it does not necessarily flush the DPC queue before returning control to the interrupted task…

Anton Bassov

> This is not IRQL but processor priority, which is a hardware-only concept that is out of the software's direct control (apart from assigning the vector number and issuing EOI), while IRQL is under the control of OS software.

Sorry Anton, that’s not right. IRQL and processor priority are intimately related and on an APIC system cannot be separated. You are right - the concept of IRQL is largely a software one. Its definition is that if you set IRQL to a specific value, then interrupts at a lower IRQL level will not happen, and interrupts at a higher IRQL level will happen. So let’s say you’re right, and that you can have any vector have any IRQL that the OS chooses. Here’s a sequence to consider:

  1. An interrupt happens, software sets the IRQL and runs the ISR.
  2. While this interrupt is happening, an interrupt at a lower vector number happens. If you are right, this interrupt could be assigned a higher IRQL by the OS.

By the definition of IRQL, setting the IRQL to the first interrupt’s IRQL level should still allow the second interrupt to happen. Stuff running at lower IRQL doesn’t mask stuff running at higher IRQL, right? But that’s not what would happen. The second interrupt *would* be masked by the first, because the processor’s hardware priority would prevent it. So the hardware would get in the way of the OS’s definition of IRQL, and there is nothing the OS can do about this in software. It’s just how APICs work.
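To make the masking concrete, here is a tiny worked example with hypothetical vector numbers (the APIC compares priority classes, i.e. the upper nibble of the vector):

    #include <assert.h>

    int main(void)
    {
        unsigned first  = 0x92;           /* first interrupt, now in service: class 9            */
        unsigned second = 0x61;           /* lower vector the OS "assigned" a higher IRQL: class 6 */
        unsigned pprClass = first >> 4;   /* while 0x92 is in service, the PPR class is at least 9 */

        /* The second interrupt is deliverable only if its class exceeds the PPR
         * class; 6 > 9 is false, so it stays pending until the EOI - regardless
         * of what the OS wrote to the TPR. */
        assert(!((second >> 4) > pprClass));
        return 0;
    }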

Dave

Davis,

> So let's say you're right, and that you can have any vector have any IRQL that the OS chooses. Here's a sequence to consider:

>   1. An interrupt happens, software sets the IRQL and runs the ISR.
>   2. While this interrupt is happening, an interrupt at a lower vector number happens. If you are right, this interrupt could be assigned a higher IRQL by the OS.

> By the definition of IRQL, setting the IRQL to the first interrupt's IRQL level should still allow the second interrupt to happen. Stuff running at lower IRQL doesn't mask stuff running at higher IRQL, right? But that's not what would happen. The second interrupt *would* be masked by the first, because the processor's hardware priority would prevent it.

Sorry, but this is what EOI is for - the OS could have issued it right after having raised IRQL and enabled interrupts, but before actually servicing the interrupt (i.e. calling the ISR), rather than the way Windows actually does it (i.e. after the ISR has been called). Please note that, upon receiving EOI, the APIC clears the highest priority bit in the *ISR* register and not in the PPR - as long as the TPR indicates a priority higher than the ISR register does, the PPR does not get modified by EOI… There is no requirement that EOI must be issued only after the interrupt has actually been handled. For example, Linux acknowledges the interrupt with EOI straight away.

Therefore, the "paradox" that you have described may happen only if you issue EOI after having serviced the interrupt. However, if you choose to issue EOI before calling the ISR, then interrupt delivery would be controlled only by the TPR while your ISR executes…
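A sketch of that alternative ordering, assuming a memory-mapped local APIC (the TPR/EOI offsets are from the Intel SDM; the function, its arguments and the _enable() intrinsic are placeholders - this is not how the Windows HAL is written):

    /* Entered with interrupts disabled by the interrupt gate. */
    #include <intrin.h>

    #define LAPIC_BASE 0xFEE00000u                                    /* default local APIC base (physical) */
    #define LAPIC_TPR  ((volatile unsigned int *)(LAPIC_BASE + 0x80)) /* Task Priority Register   */
    #define LAPIC_EOI  ((volatile unsigned int *)(LAPIC_BASE + 0xB0)) /* End Of Interrupt         */

    typedef void (*ISR_ROUTINE)(void *Context);

    void HandleInterruptEoiFirst(unsigned int NewTprValue, ISR_ROUTINE Isr, void *Context)
    {
        *LAPIC_TPR = NewTprValue;  /* 1. raise priority: delivery is now gated by the TPR alone    */
        *LAPIC_EOI = 0;            /* 2. EOI: clears the highest set bit in the ISR register, so
                                         the in-service priority no longer feeds the PPR           */
        _enable();                 /* 3. re-enable interrupts (STI)                                */
        Isr(Context);              /* 4. run the ISR; only the TPR masks other interrupts now      */
    }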

Anton Bassov

Okay, I'll give you that. EOIing before running the ISR seems like an odd implementation for level-triggered interrupts, but sure. But even then, the processor decides if an interrupt is serviceable by checking the vector number against the PPR. So if you set the IRQL of a processor to x by writing the TPR (which then changes the PPR), this automatically masks interrupts at some vectors and not the interrupts at other vectors. So again, vector implies priority. To completely separate the concepts you'd have to never use the TPR for prioritization.

But you are right - you could probably design a system that completely disconnected software priority (IRQL) from implied IDT vector priority by EOIing immediately, and never writing the TPR. This system would have to deal with a bunch of issues like a ton of spurious interrupts, but could have advantages I can’t think of. Suffice it to say that Windows isn’t built this way, and in the APIC implementation in Windows today vector implies IRQL.

Dave

> Actually, an IDT vector does have a hard-coded IRQL, and an ISR's IRQL is defined by its IDT vector.

What will happen if I provide an override for this in SynchronizeIrql (for a custom spinlock, for instance)?


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

> BTW, what do you mean by "arbitrary DPC"??? Do you mean one that is queued to another CPU? Indeed, Linux does not allow something like that

No. In Windows, you can provide any function to a DPC.

In Linux, sorry no, it is a hardcoded array of function pointers - for
instance, net_bh() is the only function for NET_BH priority.
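A minimal sketch of what "any function" means here, using the standard DPC API (the routine and its context are placeholders):

    #include <ntddk.h>

    KDPC MyDpc;

    VOID MyDeferredRoutine(PKDPC Dpc, PVOID DeferredContext,
                           PVOID SystemArgument1, PVOID SystemArgument2)
    {
        UNREFERENCED_PARAMETER(Dpc);
        UNREFERENCED_PARAMETER(SystemArgument1);
        UNREFERENCED_PARAMETER(SystemArgument2);
        /* Runs at DISPATCH_LEVEL, by default on the CPU that queued it. */
        DbgPrint("DPC ran, context=%p\n", DeferredContext);
    }

    VOID QueueMyDpc(PVOID Context)
    {
        KeInitializeDpc(&MyDpc, MyDeferredRoutine, Context);  /* usually done once at init      */
        KeInsertQueueDpc(&MyDpc, NULL, NULL);                 /* typically called from the ISR  */
    }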


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

> What will happen if I provide an override for this in SynchronizeIrql (for a custom spinlock, for instance)?

SynchronizeIrql can be higher than the interrupt’s IRQL, which as I’m describing is impacted by the vector’s native priority. Basically the first thing that will happen when the interrupt is received is that the IRQL is raised to the SynchronizeIrql. The relationship between vector and IRQL that I’m describing is one of the reasons that SynchronizeIrql is not allowed to be lower than the IRQL provided in the interrupt resource descriptor.

Dave

Davis,

> EOIing before running the ISR seems like an odd implementation for level-triggered interrupts,

You would have to mask the IRQ at the IOAPIC in your stub - otherwise, you may well get into an "interrupt storm" with level-triggered interrupts (the line is still asserted, but the local APIC claims that it has already been handled)…
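For reference, a sketch of what that masking looks like at the I/O APIC (register offsets per the standard I/O APIC spec; the base address and function name are illustrative - real code uses the mapped address reported by ACPI/MP tables):

    #define IOAPIC_BASE     0xFEC00000u
    #define IOAPIC_IOREGSEL ((volatile unsigned int *)(IOAPIC_BASE + 0x00))  /* register select */
    #define IOAPIC_IOWIN    ((volatile unsigned int *)(IOAPIC_BASE + 0x10))  /* register data   */
    #define REDIR_LO(n)     (0x10 + 2 * (n))   /* low dword of redirection entry n */
    #define REDIR_MASK_BIT  (1u << 16)

    void IoApicMaskInput(unsigned input)
    {
        *IOAPIC_IOREGSEL = REDIR_LO(input);    /* select the input's redirection entry              */
        *IOAPIC_IOWIN   |= REDIR_MASK_BIT;     /* set the mask bit: the line is ignored until unmasked */
    }
    /* Whoever finishes servicing the device (e.g. the DPC) clears the bit again. */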

> But even then, the processor decides if an interrupt is serviceable by checking the vector number against the PPR. So if you set the IRQL of a processor to x by writing the TPR (which then changes the PPR), this automatically masks interrupts at some vectors and not the interrupts at other vectors.

Sure…

> To completely separate the concepts you'd have to never use the TPR for prioritization.

Not really. More on this below…

> But you are right - you could probably design a system that completely disconnected software priority (IRQL) from implied IDT vector priority by EOIing immediately, and never writing the TPR.

Actually, you can still make use of the TPR. What you have to do is process ISRs in the context of a software interrupt (or, if you want to maintain the concept of IRQL at the level Windows currently does - which, IMHO, is a bit extreme - even define multiple software interrupts for this purpose) - the only requirement is that all *actual* hardware interrupts map to vectors that numerically imply a priority above the one(s) that actually handle(s) the ISRs. IMHO, processing all ISRs in the context of the same interrupt could be a better idea. More on this below…

> This system would have to deal with a bunch of issues like a ton of spurious interrupts,

Well, it would have to disable the IRQ at the IOAPIC…

> …but could have advantages I can't think of.

Well, the most obvious advantage is flexibility. It is not so easy to decide in advance whether device X should have any interrupt priority over device Y in the first place. Consider the scenario where the target machine has multiple identical devices, handled by the same driver, that are wired to different IRQs and happen to share those IRQs with other devices. For example, on my machine IRQ 21 is shared by EHCI, one instance of UHCI, the SATA controller and the NIC; IRQ 20 is shared by another instance of UHCI and the 1394 controller; and yet two more instances of UHCI have dedicated IRQs. Which of them should be of higher priority??? Furthermore, it may well happen that, from the logical standpoint, the same device may have various operational priorities depending on the operation (for example, data receipt vs. acknowledging send completion by a NIC). Therefore, it probably makes sense not to prioritize interrupts against one another, but to make all ISRs equal and, instead, let them decide at which priority level they want to queue their DPCs in a given situation. In other words, it probably makes sense to make IRQL a software-only concept…

Anton Bassov

> In Linux, sorry no, it is a hardcoded array of function pointers - for instance, net_bh() is the only function for NET_BH priority.

As I told you already, Linux has softirqs and tasklets as "bottom halves". What you are speaking about is a softirq, but tasklets give you more flexibility (they are built upon a dedicated softirq)…
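For comparison with a DPC, a minimal sketch using the tasklet API of kernels from that era (the function and variable names are placeholders):

    #include <linux/interrupt.h>

    /* Any function can be wrapped in a tasklet and scheduled - the closest
     * Linux analogue to queuing a DPC. */
    static void my_bottom_half(unsigned long data)
    {
        /* Runs in softirq context on the CPU that scheduled it. */
    }

    static DECLARE_TASKLET(my_tasklet, my_bottom_half, 0);

    /* Called from the tail of the interrupt handler: */
    static void my_isr_tail(void)
    {
        tasklet_schedule(&my_tasklet);   /* defer the real work, like queuing a DPC */
    }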

Anton Bassov

> disconnected software priority (IRQL) from implied IDT vector priority by EOIing immediately, and never writing the TPR. This system would have to deal with a bunch of issues like a ton of spurious interrupts, but could have advantages I can't think of. Suffice it to say that Windows isn't built this way

Am I wrong that pre-APIC HALs are exactly this, using software emulation of interrupt priority instead of using the PIC hardware for this?


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

Maxim, right - everything about the discussion of TPR, PPR, etc needs to have “in an APIC system” attached to it. These are APIC registers. As Jake posted earlier in this thread, there isn’t a hard map from vector to IRQL in the Windows PIC implementation.

Dave

Thanks Dave! One more question:

  • what is the meaning of InterruptLevel (Irql) and InterruptVector
    interrupt “coordinates” for both Raw and Translated?

I suspect one of them (probably Translated) is the IDT vector number, and
another (probably Raw) is the old-style IRQ number (PIC wire number).

Is it so?

And one more question about MSIs: am I correct that an IDT vector is allocated for each message? From the API documentation I see that a personal KINTERRUPT is allocated for each message, but what about the IDT vector?


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com


> Am I wrong that pre-APIC HALs are exactly this, using software emulation of interrupt priority instead of using the PIC hardware for this?

Well, the PIC HAL just could not have used PIC hardware features for this purpose, simply due to the lack of any. IIRC, its handler stub just disabled the IRQ the way I suggest, but it had to do port I/O instead of the simple write to a memory-mapped register that the APIC HAL would have to do if it wanted to disable the IRQ (but it does not)…
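A sketch of what that port-I/O masking looks like on the legacy 8259 PIC (the IMR ports are the standard ones; read_port8/write_port8 are hypothetical stand-ins for whatever port-I/O primitives the environment provides):

    #define PIC1_DATA 0x21   /* master PIC Interrupt Mask Register */
    #define PIC2_DATA 0xA1   /* slave PIC Interrupt Mask Register  */

    extern unsigned char read_port8(unsigned short port);
    extern void write_port8(unsigned short port, unsigned char value);

    void PicMaskIrq(unsigned irq)
    {
        unsigned short port = (irq < 8) ? PIC1_DATA : PIC2_DATA;
        unsigned char  bit  = (unsigned char)(1u << (irq & 7));
        write_port8(port, (unsigned char)(read_port8(port) | bit));  /* set the IRQ's bit in the IMR */
    }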

> what is the meaning of InterruptLevel (Irql) and InterruptVector interrupt "coordinates" for both Raw and Translated? I suspect one of them (probably Translated) is the IDT vector number, and another (probably Raw) is the old-style IRQ number (PIC wire number).

IIRC, this is, indeed, the case (or at least was on XP SP0 when I experimented with this one)…

> am I correct that an IDT vector is allocated for each message? From the API documentation I see that a personal KINTERRUPT is allocated for each message, but what about the IDT vector?

This is an interesting question - IIRC, we discussed it a few months ago. The very idea of having to share a vector would just defeat the purpose of MSI, but Peter found some indirect suggestion on MSDN that MSI vectors could be shared. IIRC, Jake told us that in its original implementation MSI vectors could be shared, but subsequently they changed it so that every message now has its own dedicated vector…

Anton Bassov

From a driver perspective, the most important thing to remember is that the raw resource isn’t all that useful, and the translated resource’s Level and Vector fields are suitable for passing to IoConnectInterrupt. Nothing else should be assumed, and the OS is free to change anything under the covers as long as it keeps things working for drivers that follow these rules.

However, the way things work today, in the raw resource, both Level and Vector are “IRQ” - a representation of the device’s connection to an interrupt controller. For MSI, the MessageInterrupt union in the resource descriptor doesn’t have Level. Vector is a token for the device’s request - it’s basically “IRQ” but there’s no interrupt controller involved. Note that in this definition of “IRQ” it’s not exactly the old-school PIC wire number. It might be an IOAPIC input number, or something else. I’m not aware of any reason the raw descriptor is all that useful in a driver.

In the translated resource Level and Vector are what you’d expect. Level is IRQL, Vector is IDT entry. But again, there’s no reason that has to stay that way. Internally there’s even a level of indirection maintained between Vector and IDT Entry so that the two things aren’t assumed to be exactly the same values.
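To make the driver-side rule concrete, here is a hedged WDM-style sketch of taking Level and Vector from a *translated* interrupt descriptor and passing them to IoConnectInterrupt (the function name is made up; the loop over the resource list and most error handling are omitted):

    #include <ntddk.h>

    NTSTATUS ConnectFromTranslated(PCM_PARTIAL_RESOURCE_DESCRIPTOR Desc,
                                   PKSERVICE_ROUTINE Isr, PVOID Context,
                                   PKINTERRUPT *InterruptObject)
    {
        if (Desc->Type != CmResourceTypeInterrupt)
            return STATUS_INVALID_PARAMETER;

        return IoConnectInterrupt(
            InterruptObject,
            Isr,
            Context,
            NULL,                                   /* let the kernel supply the spinlock   */
            Desc->u.Interrupt.Vector,               /* translated: IDT entry                */
            (KIRQL)Desc->u.Interrupt.Level,         /* translated: IRQL                     */
            (KIRQL)Desc->u.Interrupt.Level,         /* SynchronizeIrql - same as IRQL here  */
            (Desc->Flags & CM_RESOURCE_INTERRUPT_LATCHED) ? Latched : LevelSensitive,
            (BOOLEAN)(Desc->ShareDisposition == CmResourceShareShared),
            Desc->u.Interrupt.Affinity,
            FALSE);                                 /* FloatingSave                         */
    }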

Dave

And here’s the old thread that Anton referenced with the answer about MSI shareability.

http://www.osronline.com/showThread.cfm?link=118548

Dave

I’m assuming that you mean the parameters in a start device IRP. The
answer depends on the HAL you’re running. For modern ACPI/APIC
systems in conjunction with a device that doesn’t implement MSI(-X)
the start device parameters are:

Raw: Vector == Level == IRQ
Translated: Vector == IDT entry, Level == IRQL

You can determine IRQL from IDT entry (except for the lower 32 which
are exceptions, not interrupts) but you cannot determine IDT entry
solely from IRQL.
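For illustration, the commonly cited mapping used by the x86 APIC HAL (treat the exact formula as an implementation detail, not a contract):

    /* IRQL is the upper nibble of the vector. */
    unsigned char IrqlFromVector(unsigned char Vector)
    {
        return Vector >> 4;    /* e.g. vector 0x52 -> IRQL 5 */
    }
    /* The reverse is not a function: IRQL 5 corresponds to any of the vectors
     * 0x50..0x5F, so the IDT entry cannot be recovered from the IRQL alone. */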

  • Jake

“Maxim S. Shatskih” wrote in message
news:xxxxx@ntdev…
> BTW, Jake, am I correct that Level interrupt parameter is the
> (A)PIC input
> wire number, while the Vector parameter is the IDT vector number?
>
> Am I correct that the IRQL on which the ISR will run is 100%
> determined
> only by the IDT vector number, in other words - each IDT vector has
> its
> hard-coded IRQL?
>
> –
> Maxim Shatskih, Windows DDK MVP
> StorageCraft Corporation
> xxxxx@storagecraft.com
> http://www.storagecraft.com
>
>

Providing SynchronizeIrql will cause Windows to raise IRQL higher than
that implied by your IDT entry before your ISR is called. This allows
you to share a spinlock between two or more ISRs, by picking the
highest of the IRQLs among them. This is an advanced and rarely used
construct.
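A sketch of that construct, assuming the vector/IRQL/affinity values come from each device's translated interrupt resources (the names and the LevelSensitive mode are placeholders):

    #include <ntddk.h>

    /* Two ISRs share one spinlock by connecting both interrupts with
     * SynchronizeIrql = max of their IRQLs. */
    KSPIN_LOCK  SharedIsrLock;
    PKINTERRUPT InterruptA, InterruptB;

    BOOLEAN IsrA(PKINTERRUPT Interrupt, PVOID Context);
    BOOLEAN IsrB(PKINTERRUPT Interrupt, PVOID Context);

    NTSTATUS ConnectBoth(ULONG VectorA, KIRQL IrqlA, KAFFINITY AffinityA,
                         ULONG VectorB, KIRQL IrqlB, KAFFINITY AffinityB)
    {
        KIRQL syncIrql = (IrqlA > IrqlB) ? IrqlA : IrqlB;   /* must be >= the IRQL of every sharer */
        NTSTATUS status;

        KeInitializeSpinLock(&SharedIsrLock);

        status = IoConnectInterrupt(&InterruptA, IsrA, NULL, &SharedIsrLock,
                                    VectorA, IrqlA, syncIrql, LevelSensitive,
                                    TRUE, AffinityA, FALSE);
        if (!NT_SUCCESS(status))
            return status;

        status = IoConnectInterrupt(&InterruptB, IsrB, NULL, &SharedIsrLock,
                                    VectorB, IrqlB, syncIrql, LevelSensitive,
                                    TRUE, AffinityB, FALSE);
        return status;
    }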

Remember that all spinlocks have an associated IRQL. Without that,
you’d have potential deadlocks. Other systems, like Linux (I’ve
heard) don’t really have IRQL but always disable interrupts before
spinning. This is effectively HIGH_LEVEL in Windows.

  • Jake Oshins
    Windows Interrupt Guy

“Maxim S. Shatskih” wrote in message
news:xxxxx@ntdev…
>> Actually, an IDT vector does have a hard-coded IRQL, and an ISR’s
>> IRQL is
>>defined by its IDT vector.
>
> What will occur if I will provide the override for this in
> SynchronizeIrql (for
> a custom spinlock, for instance)?
>
> –
> Maxim Shatskih, Windows DDK MVP
> StorageCraft Corporation
> xxxxx@storagecraft.com
> http://www.storagecraft.com
>