Interrupt Routing -- was: Question about masking interrupt

Totally. +1

This seems so obvious to me, that I figure it MUST have been tried and proven unacceptable for some reason. I’d love to know some of the back-story, if somebody in the know wants to provide it…

Peter
OSR

> I'm asserting that what he is referring to is the ProcessorEnableMask affinity that you specify
> when calling IoConnectInterrupt, which causes HalEnable(System)Interrupt to be called for each processor.

Please note that it is the IOAPIC’s redirection table entry that specifies which CPUs may get interrupted by a given
source, and not the other way around - otherwise you could get into a situation where the same interrupt source maps to different vectors on different CPUs, which implies the same source could have different priorities on different CPUs…
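For reference, here is the knob being discussed - a minimal sketch of connecting an interrupt with an explicit ProcessorEnableMask (IoConnectInterrupt is the real DDI; the extension layout and routine names are placeholders):

#include <wdm.h>

typedef struct _MY_DEVICE_EXTENSION {     /* placeholder layout */
    PKINTERRUPT InterruptObject;
    ULONG Vector;                         /* from translated resources */
    KIRQL Irql;
} MY_DEVICE_EXTENSION, *PMY_DEVICE_EXTENSION;

KSERVICE_ROUTINE MyIsr;                   /* the ISR itself is not shown here */

NTSTATUS ConnectMyInterrupt(PMY_DEVICE_EXTENSION devExt)
{
    /* Restrict delivery to CPUs 0 and 1; the HAL programs the APICs
       accordingly when the interrupt is connected. */
    KAFFINITY processorEnableMask = 0x3;

    return IoConnectInterrupt(&devExt->InterruptObject,
                              MyIsr,
                              devExt,                /* ServiceContext      */
                              NULL,                  /* SpinLock (optional) */
                              devExt->Vector,
                              devExt->Irql,
                              devExt->Irql,          /* SynchronizeIrql     */
                              LevelSensitive,
                              TRUE,                  /* ShareVector         */
                              processorEnableMask,
                              FALSE);                /* FloatingSave        */
}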

Anton Bassov

Peter,

> This seems so obvious to me, that I figure it MUST have been tried and proven unacceptable for some reason.

Does not the APIC bus arbitration protocol ensure that an interrupt gets dispensed to the least busy CPU among the ones that are allowed to be interrupted by a given source???

Anton Bassov

Sorry, wrong usage of ‘assert’. It was more like I wanted to see if this
throws an exception or not.

Is it not that the I/O APIC (which belongs to the chipset) directs
interrupts to the local APICs based on their IDTs? Or is it that the I/O
APIC is programmed separately in this manner? Unfortunately there is not
much information on the I/O APIC or this redirection table in the public
Intel manuals; they say you need to contact them manually.

//Daniel


Daniel,

> Unfortunately there is not much information on the I/O APIC or this redirection table in the public Intel manuals,

In fact, they just have a separate manual for the IOAPIC, so they don’t go into too much detail about it in
their 3-volume developer’s manuals. Please find a link to the IOAPIC manual below - this doc goes into all the details of the IOAPIC, including even the pin layout.

> they say you need to contact them manually.

Luckily, it is not as bad as that - they’ve got quite a few docs in the public domain. Just enter "IOAPIC" into the search box on the Intel site, and the very first link that you get is http://www.intel.com/design/chipsets/datashts/290566.htm

If you want to discover the mapping of a particular device to a particular IOAPIC pin, then you have to read the BIOS-related docs as well. The link below may be quite helpful:
http://www.intel.com/design/archives/processors/pro/docs/242016.htm

Although, according to Jake, Windows does not even look at the tables that the above-mentioned doc describes, and, instead, goes right to the ACPI ones, I believe it may still be helpful - after all, in terms of complexity, the layout of these tables does not go anywhere close to the ACPI ones, while there is a good chance that the info
found in these tables may be valid even on machines with the ACPI HAL…

If you want more than that, then you can download the ACPI specs. However, I must warn you in advance that, unlike the Intel manuals, this is not the easiest read…

Anton Bassov

The IOAPIC has a mode called lowest priority delivery, where the chipset chooses the right destination from a set of processors based on the interrupt priority of the various processors. The problem is that interrupt priority changes a lot. Like a whole lot. Acquiring spinlocks changes it, timers and other DPCs change it, and of course interrupts change it. And the information about priority is kept in the processor, but the chipset needs to act on it. The latency and overhead of transmitting this information meant that it was basically always stale. And so many systems gave up trying.

Some chipset/processor combinations forward to just one processor. Some round robin or hash based on vector number. All of these options are simpler than perfect lowest priority, but have negative effects in some workloads. But they exist, and new processors/chipsets are continuing to do this differently, so there is still a desire to get the distribution right.
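For anyone who wants to poke at the hardware side, the knob in question is the per-pin redirection table entry. A rough sketch of its layout, with field positions as in the 82093AA IOAPIC datasheet linked elsewhere in this thread (the type name is made up):

#include <wdm.h>

/* IOAPIC redirection table entry - one 64-bit entry per interrupt pin. */
typedef union _IOAPIC_REDIRECTION_ENTRY {
    struct {
        ULONG64 Vector          : 8;   /* IDT vector to deliver                   */
        ULONG64 DeliveryMode    : 3;   /* 000 = fixed, 001 = lowest priority, ... */
        ULONG64 DestinationMode : 1;   /* 0 = physical APIC ID, 1 = logical set   */
        ULONG64 DeliveryStatus  : 1;
        ULONG64 PinPolarity     : 1;
        ULONG64 RemoteIrr       : 1;
        ULONG64 TriggerMode     : 1;   /* 0 = edge, 1 = level                     */
        ULONG64 Masked          : 1;
        ULONG64 Reserved        : 39;
        ULONG64 Destination     : 8;   /* target APIC ID or logical processor set */
    } Fields;
    ULONG64 AsUlong64;
} IOAPIC_REDIRECTION_ENTRY;

/* "Lowest priority" delivery (DeliveryMode == 1) is the mode described above:
   the chipset picks one CPU out of the Destination set based on processor
   priority - exactly the information that goes stale. */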

Dave

> Totally. +1
>
> This seems so obvious to me, that I figure it MUST have been tried and
> proven unacceptable for some reason. I’d love to know some of the
> back-story, if somebody in the know wants to provide it…

Having a read of the Linux mailing list archives where this is discussed
is an interesting (but lengthy) thing to do. Some of the more specific
discussions are obviously not relevant to Windows but still interesting.

James

Simply put, no it doesn’t.

First of all, none of us have seen a machine with an APIC bus in recent
memory. Second, when the APIC bus did exist, it guaranteed only that the
processor with the lowest TPR value got interrupted. But NT idles
processors (for various reasons) at DISPATCH_LEVEL. So the least busy
processor gets interrupted well after processors doing useful work at
PASSIVE_LEVEL.
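To put rough numbers on that (illustration only; the exact IRQL-to-TPR mapping is HAL-specific, but on x64 the TPR priority class simply tracks the current IRQL, and the ordering is the same either way):

/* An idle CPU parked at DISPATCH_LEVEL advertises a *higher* task priority
   to the APIC than a CPU doing real work at PASSIVE_LEVEL, so "lowest
   priority" delivery picks the busy CPU, not the idle one. */
UCHAR BusyCpuTpr = PASSIVE_LEVEL  << 4;   /* 0x00 - running application code */
UCHAR IdleCpuTpr = DISPATCH_LEVEL << 4;   /* 0x20 - NT idle loop             */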


Jake Oshins
Hyper-V I/O Architect
Windows Kernel Group

This post implies no warranties and confers no rights.



All of the relevant information is publicly available from Intel. In
particular, see Volume 3, Chapter 8, Section 11 of the Programmer’s
Reference Manual. For the I/O APIC, any chipset datasheet will do.

The I/O APIC redirection table sends interrupts in various modes. The mode
we’ve been implicitly describing (lowest priority mode) describes a set of
processors that an interrupt is sent to. One of them gets the interrupt.
Which one is really up to the north bridge.


Jake Oshins
Hyper-V I/O Architect
Windows Kernel Group

This post implies no warranties and confers no rights.



In the Bad Old Days, when interrupt line correlated 1:1 with priority, a
number of cards would interrupt on two lines, one for the less critical
interrupt, one for the really critical interrupt.

Assuming that all devices are equally important can lead to the priority
inversion problem.

What I don’t understand is how a PCI BIOS can *a priori* determine how
important a device is relative to other devices.

Also note that using a priority-ordered DPC queue simply shifts the problem
slightly; the issue of how priorities are established remains.
joe

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@hotmail.com
Sent: Thursday, July 30, 2009 10:36 AM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] Interrupt Routing -- was: Question about masking interrupt

> I am dubious about the merits of a general purpose OS that provides a
> configurable interrupt priority scheme. My devices would always want the
> highest priority available.

…which, in turn, raises questions about the reasoning behind prioritizing
hardware interrupts relative to one another in the first place. To begin with,
the same device may interrupt for different reasons (for example, send
completion and data arrival on a NIC), and sometimes these reasons may, from
the logical point of view, imply different priorities of interrupt handling.
I think it would be much more reasonable to treat all hardware interrupts
(apart from the timer, of course) as equals, while allowing a wide range of
priorities for the software interrupts that ISRs defer the work to, and
enforcing the requirement for ISRs to do as little work as possible (i.e.
check the reason for the interrupt, program the device registers to stop it
from interrupting, and request a software interrupt that does the further
processing)…

Anton Bassov



> First of all, none of us have seen a machine with an APIC bus in recent memory.

Indeed, starting from the Pentium 4, the system bus is used for IOAPIC-to-local-APIC communication.
However, the way I understand it (probably wrong anyway), the particular details may be chipset-specific. Therefore, in the context of Peter’s question, the very first thing that came into my head was the APIC bus arbitration protocol used by P6-family and Pentium processors - the Intel Developer’s Manual describes it in great detail. Judging from this description, I somehow arrived at the conclusion that the least busy processor is guaranteed to be chosen on these systems. More on this below…

> Second, when the APIC bus did exist, it guaranteed only that the processor with the lowest
> TPR value got interrupted. But NT idles processors (for various reasons) at DISPATCH_LEVEL.
> So the least busy processor gets interrupted well after processors doing useful
> work at PASSIVE_LEVEL.

Sorry, but this is already a software-related issue - objectively, the hardware just cannot make any
judgement about the actual importance of the task that a given CPU performs, so it has to decide based only on the information in hardware registers. However, if I got it right, Peter was speaking about guarantees provided by the hardware platform…

Anton Bassov

> Assuming that all devices are equally important can lead to the priority inversion problem.

If you think about it carefully you will realize that the above statement contradicts itself - priority inversion just
does not make sense when everyone is equally important, don’t you think…

> What I don’t understand is how a PCI BIOS can *a priori* determine how important a device
> is relative to other devices.

The BIOS does not assign priorities to devices…

> Also note that using a priority-ordered DPC queue simply shifts the problem slightly; the issue
> of how priorities are established remains.

I think this decision can be left to driver writers pretty much the same way decisions about thread priorities are left to app writers. The argument that everyone would want to have the highest possible priority
seems faulty - if your device’s minimal latency comes at the price of an unresponsive GUI, you will, apparently, think twice about trying it…

Anton Bassov

Note that I said “assuming all devices are equally important”, which is
often an invalid assumption; perhaps the correct statement would have been

“Assuming that all devices are equally important belies the fact that they
are, in real systems, *not* equally important; consequently, this assumption
leads to a problem analogous to the priority-inversion problem when a device
which actually *is* more important (in terms of meeting an interrupt latency
requirement) is blocked for an indeterminate time because it is erroneously
treated as a peer of all the other devices.”

I didn’t realize I needed to be so precise in stating what seemed obvious.

The PCI BIOS certainly does assign priorities; I know this because in some
bizarre cases I’ve had to go into the PCI BIOS setup to reserve a
non-exclusive interrupt line, and my choice of reserved line impacted the
priority; it took some experimentation to discover which line gave the best
performance (although this was far enough in the past that I was probably
working with a PIC system). Embedded systems, including MS-DOS systems, do
not have any code to assign device priorities, yet the priorities are
nonetheless assigned; I never had to assign them when working on embedded
x86 systems, but they were clearly assigned. One of the problems we had was
that for devices that required low-latency service, the priorities were
assigned incorrectly relative to what we needed, and sometimes we just
couldn’t get them assigned in the way we wanted.

Note that in a single app, the designer of the app gets to assign the thread
priorities based on an understanding of the relative importance of the
threads. This is also what we did in real-time operating systems; techniques
such as rate-monotonic analysis take these priorities and thread execution
times and can be used to determine the balance of the thread mix, so the
priorities can be adjusted to achieve a feasible solution.

A device driver, on the other hand, works in isolation; it cannot determine
its importance relative to a set of unknown device drivers in any given
situation. Therefore, it is not as simple as working with a mix of threads
in a single app. Note also that the “thread priority” game only works well
in vertical-market systems where the entire collection of apps is
predetermined; the problem of manipulating thread priorities in a system with
an unknown set of applications running is likewise unsolvable. I’m not sure how
any application writer can magically determine the correct thread priorities
to set to achieve a specified performance without (a) interfering with
unknown and unknowable apps that may coexist or (b) being interfered with by
unknown and unknowable apps.

In one case, we gave up 25% of the CPU resources to guarantee both that the
GUI remained responsive and that the realtime-constrained threads handled
their workload correctly; the trick is to SetThreadAffinityMask the GUI to
one CPU (arbitrarily, CPU0) and the worker threads to the system mask & ~1
(that is, to never run on CPU0). Note that this required some serious
assumptions that are not necessarily consistent with a general-purpose
system, but we were working in a vertical-market, turnkey-system style
environment.
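In code, the partitioning amounts to something like this (a minimal sketch; thread creation and error handling omitted):

#include <windows.h>

/* Pin the GUI thread to CPU 0 and keep a worker thread off CPU 0. */
void PartitionThreads(HANDLE guiThread, HANDLE workerThread)
{
    DWORD_PTR processMask = 0, systemMask = 0;
    GetProcessAffinityMask(GetCurrentProcess(), &processMask, &systemMask);

    SetThreadAffinityMask(guiThread, (DWORD_PTR)1);                    /* CPU 0 only  */
    SetThreadAffinityMask(workerThread, processMask & ~(DWORD_PTR)1); /* never CPU 0 */
}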

When I deliver an app, and I have discovered the (very rare) need to
manipulate thread priorities, I always make these choices settable so an end
user can adjust them to make sure they work in the real environment in which
the app must live. Ultimately, the priorities can only have meaning in
context. A given app may work well with one priority assignment and fail in
another, and each such instance would be determinable only in the context of
the system which is actually running with its existing mix of applications.
In the case of device drivers, the importance of an interrupt may also
profoundly affect my application’s responsiveness, which is what I’ve
encountered in practice. But I didn’t know how to solve the problem.
joe


> Assuming that all devices are equally important belies the fact that they are, in real systems,
> *not* equally important;

What you should have added is “IN A GIVEN CONTEXT” - after all, device priority is a relative thing and may depend on various factors like the type of operation, the priorities of the apps that wait for data from the device, etc.
For example, the priority of a USB controller when an isochronous transfer is in the schedule should, apparently, be different from the one when only bulk transfers are in sight…

> …this assumption leads to a problem analogous to the priority-inversion problem when
> a device which actually *is* more important (in terms of meeting an interrupt latency requirement)
> is blocked for an indeterminate time because it is erroneously treated as a peer of
> all the other devices.

Actually, assigning fixed priorities to the devices/controllers themselves does not seem to eliminate this problem either (just look at the above example of the USB controller). Therefore, I think treating all devices (apart from the timer, of course) as equals at the time of interrupt, while giving a wider range of priorities to the deferred tasks that ISRs queue, addresses this problem better…

> A device driver, on the other hand, works in isolation; it cannot determine its importance relative
> to a set of unknown device drivers in any given situation.

In fact, this part is relatively easy - all that is needed is a predefined set of priority constants like INTERACTIVE_EVENT, NIC_DATA_ARRIVAL, NIC_SEND_COMPLETE and DISK_IO,
so that a driver writer can schedule a job with a priority appropriate to a given situation…
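Something along these lines - purely hypothetical, nothing like it exists in the WDK, just to illustrate the idea:

/* Hypothetical deferred-work priority classes; the ordering is only an
   example of what such predefined constants could look like. */
typedef enum _DEFERRED_JOB_PRIORITY {
    DISK_IO = 0,
    NIC_SEND_COMPLETE,
    NIC_DATA_ARRIVAL,
    INTERACTIVE_EVENT
} DEFERRED_JOB_PRIORITY;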

Anton Bassov

On Sun, Aug 2, 2009 at 9:58 AM, wrote:
> Therefore, I think treating all devices (apart from timer, of course) as equals at the time of interrupt while giving a wider range of priorities to deferred tasks that ISRs queue seems to address this problem better.

Which is approximately what NT tries to do with its ISR/DPC design and
the addition of threaded DPCs and increased DPC priority granularity
in current releases.
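A minimal sketch of those newer knobs, for reference (KeInitializeThreadedDpc and KeSetImportanceDpc are real DDIs; the structure and routine names are placeholders):

#include <wdm.h>

typedef struct _MY_DEVICE_EXTENSION {     /* placeholder extension */
    KDPC Dpc;
} MY_DEVICE_EXTENSION, *PMY_DEVICE_EXTENSION;

/* Runs at PASSIVE_LEVEL in a real-time-priority system thread instead of
   DISPATCH_LEVEL (and degenerates to an ordinary DPC if threaded DPCs are
   disabled on the system). */
VOID MyDpcRoutine(PKDPC Dpc, PVOID Context, PVOID Arg1, PVOID Arg2)
{
    UNREFERENCED_PARAMETER(Dpc);
    UNREFERENCED_PARAMETER(Arg1);
    UNREFERENCED_PARAMETER(Arg2);
    UNREFERENCED_PARAMETER(Context);
    /* deferred work for the device described by Context goes here */
}

VOID SetupDeferredWork(PMY_DEVICE_EXTENSION devExt)
{
    KeInitializeThreadedDpc(&devExt->Dpc, MyDpcRoutine, devExt);
    KeSetImportanceDpc(&devExt->Dpc, HighImportance);   /* queue-placement hint */
}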

Has anyone actually ever used a threaded DPC?

Mark Roddy

> Which is approximately what NT tries to do with its ISR/DPC design and the addition
> of threaded DPCs and increased DPC priority granularity in current releases.

Actually, NT does not go anywhere close to it…

In fact, it does exactly the opposite - it prioritizes hardware interrupts relative to one another, i.e. makes a clear
distinction between the priorities of ISR invocations, while making relatively little distinction between the priorities of the deferred jobs that ISRs queue. What I am speaking about is treating all ISRs, apart from the timer, equally (i.e. either making them preemptible by one another on a LIFO basis or, instead, queuing them on a FIFO basis - in either case, EOI has to be issued prior to ISR invocation, which implies disabling the interrupt source in the stub), while giving a wider range of DPC priorities, so that DPC X may get preempted by DPC Y if the latter is of higher priority than the former…
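In today’s WDM terms, the first half of this is already the familiar minimal-ISR-plus-DPC pattern (a rough sketch; the extension layout and register fields are made-up placeholders, and the DPC is assumed to be initialized elsewhere):

#include <wdm.h>

typedef struct _MY_DEVICE_EXTENSION {
    KDPC Dpc;                        /* initialized elsewhere with KeInitializeDpc */
    volatile ULONG *IntStatusReg;    /* hypothetical mapped device registers       */
    volatile ULONG *IntMaskReg;
} MY_DEVICE_EXTENSION, *PMY_DEVICE_EXTENSION;

BOOLEAN MyIsr(PKINTERRUPT Interrupt, PVOID Context)
{
    PMY_DEVICE_EXTENSION devExt = (PMY_DEVICE_EXTENSION)Context;

    UNREFERENCED_PARAMETER(Interrupt);

    if (READ_REGISTER_ULONG((PULONG)devExt->IntStatusReg) == 0)
        return FALSE;                                     /* not ours - shared vector    */

    WRITE_REGISTER_ULONG((PULONG)devExt->IntMaskReg, 0);  /* stop the source in the stub */
    KeInsertQueueDpc(&devExt->Dpc, NULL, NULL);           /* defer the real work         */
    return TRUE;
}

What is missing is the second half - a genuinely wide range of DPC priorities with preemption among DPCs - which NT’s small set of DPC importance levels only approximates.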

Anton Bassov

> …while making relatively little distinction between the priorities of the deferred jobs that ISRs queue. What I am speaking about is treating all ISRs, apart from the timer,
> equally

The proper realtime design:

  • use interrupt threads
  • use trivial, uncustomizable, very short ISRs whose only job is to wake the interrupt threads
  • apply policies to the scheduler about the guaranteed timeslice and latency, which will work for both usual threads and interrupt threads.

Note that timeslice and latency budgets are competitors, i.e. thread A’s worst-case latency is the sum of all guaranteed timeslices for all threads but A.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

“Maxim S. Shatskih” wrote in message
news:xxxxx@ntdev…

And all this resembles another Microsoft OS: WinCE (except, maybe, for the
last bullet).

You’re right - the host OS should do the prioritization of interrupt-initiated
activities above the ISR level.

Modern systems seem to have very limited means for tweaking ISR priority in
hardware: PCI has only 4 IRQs (A, B, C, D), which seem to be prioritized by the
host interrupt controller. With the APIC, interrupts can have only 3 priorities:
normal, high or low (as described, for example, in
http://www.microsoft.com/whdc/archive/MSI.mspx )

Regards,
–pa

> Modern systems seem to have very limited means for tweaking ISR priority in hardware:
> PCI has only 4 IRQs (A, B, C, D), which seem to be prioritized by the host interrupt controller.

Only by the PIC…

The APIC leaves it to the OS to prioritize interrupts relative to one another by mapping them to vectors of its own choice, with interrupt priority implied by the vector number…

> With the APIC, interrupts can have only 3 priorities: normal, high or low (as described, for example, in http://www.microsoft.com/whdc/archive/MSI.mspx )

The doc you refer to does not say it, and it could not have said it because it is totally wrong - I am afraid you just misunderstood it…

The APIC defines 15 priority groups (16 vectors per priority, with the first 16 IDT entries reserved by Intel), and
not 3 as you claim, with priority implied by the vector number. What this doc says is that there are 3 possible priorities for MSI-capable devices relative to other devices under Vista. If a device is MSI-capable and the OS decides to take advantage of its MSI capability, the device will raise interrupts by writing directly to memory rather than by asserting a pin - there is no need to share MSI vectors. Therefore, once MSI vectors don’t have to be shared, this can be done simply by allocating the vector for a message-signaled interrupt from a group with a given interrupt priority.
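In other words, the priority class is simply the upper nibble of the vector number - something like this:

/* APIC interrupt priority class: 16 vectors per class, class = vector >> 4.
   (Illustration only - it is the OS, not the hardware, that decides which
   class a given device's vector is allocated from.) */
UCHAR PriorityClassFromVector(UCHAR Vector)
{
    return Vector >> 4;    /* e.g. vector 0x71 -> priority class 7 */
}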

Furthermore, the doc was careful enough to say that this applies only to MSI interrupts. If a device raises interrupts via a pin, two devices may just be physically bound by motherboard wiring to raise interrupts via the same pin, i.e. will be bound to share an interrupt vector. Since priority is implied by the vector number,
they may be forced to have the same interrupt priority, with the OS unable to do anything about it
(in some cases the BIOS may allow the OS to choose the PCI routing group for a given device, but in others the whole thing is defined by motherboard wiring)…

Anton Bassov

wrote in message news:xxxxx@ntdev…

>> With the APIC, interrupts can have only 3 priorities: normal, high or low
>> (as described, for example, in http://www.microsoft.com/whdc/archive/MSI.mspx )
>
> The doc you refer to does not say it, and it could not have said it
> because it is totally wrong - I am afraid you just misunderstood it…

?? In the Interrupt Prioritization section:
“Establishing an interrupt priority policy is supported on Windows Vista
systems with ACPI and with enabled [APICs], for devices that use traditional
line-based interrupts or MSIs.”

> APIC defines 15 priority groups (16 vectors per priority with first 16 IDT
> entries reserved by Intel )and
> not 3 as you claim, with priority implied by vector number. What this doc
> says is that there are 3 possible priorities for MSI -capable devices
> relatively to other devices under Vista. If device is MSI-capable and the
> OS decides to take advantage of its MSI capability, device will raise
> interrupts by directly writing to memory, rather than by asserting the
> pin - there is no need for sharing MSI vectors. Therefore, once MSI
> vectors don’t have to be shared, this baby can be made simply by
> allocating a vector for message-signaled interrupt from a group with a
> given interrupt priority.
>

Well, the APIC is a functional superset of everything known before it, so yes, it
has the same 15 or so priorities and even several groups.
So yes, you can assign fixed priorities, like in the old DOS times, or maybe
reprogram the priorities at runtime.
I don’t know why the interrupt priority policy in NT6+ depends on the APIC -
this dependency would seem to defeat the idea of delegating prioritization of
interrupt handling to OS scheduling.
Also, maybe there are platforms without APICs or with something different
(AMD?).

>… two devices may be just physically bound by motherboard wiring to
>raise interrupts via the same pin, i.e. will be bound to share interrupt
>vector. Once priority is implied by vector number,
> they may be forced to have the same interrupt priority, with the OS being
> unable to do anything about it
> (in some cases BIOS may allow the OS to choose PCI routing group for a
> given device, but in some cases the whole thing is defined by motherboard
> wiring)…

Isn’t this another confirmation that hardcoded priorities are outdated?
The hardwired priority of an ISR plays very little role if all ISRs can be kept
short. When the handling continues at DPC level, the hardware details don’t
matter; what matters is how the OS implements DPCs - such as one or more
queues, or scheduled threads.
But the problem of selecting which CPU to deliver interrupts to, as explained
by Jake O., is just too bad unless the hardware cooperates :(

Regards,
–pa