ISR Latency question

During the “latency” between the occurrence of an interrupt and the time an
ISR receives control, is the operating system actually preprocessing the
interrupt? For instance: if we find an average latency of 50us until
entering the ISR, does that mean that every interrupt generated steals 50us
of time from the processor (plus the actual time in the ISR, plus the
post-processing)? Or is this simply the time it takes for the processor to
get around to processing the interrupt (due to higher-priority interrupts,
perhaps)?
Having set bpint breakpoints in SoftICE and traced through, I am aware that
there is indeed quite a bit of system preprocessing code executed upon every
interrupt. Nevertheless, the latencies we are talking about, in the tens of
microseconds, seem much larger than necessary to run that preprocessing
code.

Sincerely,
Avi


You are currently subscribed to ntdev as: $subst(‘Recip.EmailAddr’)
To unsubscribe send a blank email to leave-ntdev-$subst(‘Recip.MemberIDChar’)@lists.osr.com

> During the “latency” between the occurrence of an interrupt and the time an
> ISR receives control, is the operating system actually preprocessing the
> interrupt? For instance: if we say that we find an average latency of 50us
> until entering the ISR, does that mean that every interrupt generated
> steals 50us of time from the processor (plus the actual time in the ISR,
> plus the post processing)? Or is this simply the time it takes for the
> processor to get around to processing the interrupt (due to higher-priority
> interrupts, perhaps)?

The time spent processing interrupts is taken away from all processing at
lower priority levels. If it’s only a few percent, generally nobody worries
about it. If it takes 50% of the processor, that’s not so good.

A latency of 50us may be caused by multiple devices sharing an interrupt
level. Is it a PCI device? Computers with ACPI often seem to put many
devices on the same interrupt level. If YOUR device causes a lot of
interrupts and is sharing an interrupt level with a bunch of other devices,
and your ISR gets put at the end of the chain, the ISR of every device ahead
of yours will have to run to check for the interrupt source. Any time you
read from an I/O bus, things slow down a LOT compared to normal memory
reads. On modern processors, reading a single location from the PCI bus
takes hundreds of processor clock cycles.
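To put rough numbers on that, here is a small illustrative model (plain Python, not driver code; the per-read and per-ISR costs below are assumptions chosen for illustration, not measurements):

```python
# Model the cost of walking a chain of ISRs on a shared interrupt level.
# Each ISR ahead of yours must read a device register over the I/O bus to
# ask "did my device interrupt?".  Assumed costs, roughly period hardware:
PCI_READ_US = 1.0       # one uncached PCI register read (hundreds of cycles)
ISR_OVERHEAD_US = 0.5   # assumed per-ISR entry/exit overhead

def chain_walk_latency_us(isrs_ahead_of_you):
    """Extra latency before YOUR ISR runs, given how many ISRs precede it."""
    return isrs_ahead_of_you * (PCI_READ_US + ISR_OVERHEAD_US)

for n in (0, 3, 10):
    print(n, chain_walk_latency_us(n))
```

With these (assumed) numbers, being tenth in the chain already costs 15us before your ISR even starts — the same order of magnitude as the 50us being discussed.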

It’s also possible that higher-priority interrupts are deferring processing
of your interrupt. If YOUR device is only active when, say, a network card
is running at high bandwidth, this might be the case. If your device is
generating many interrupts and no other devices are highly active, interrupt
deferral is probably not the source of your latency.

It is possible to use a tool like Intel’s VTune to profile CPU time spent
in interrupt handlers (yours and others’).

Interrupt latency has been deeply analyzed a number of times, and there are
papers on the web that give some interesting insights. Searching older
messages on this list should turn up some URLs to that info.

  • Jan


> interrupt? For instance: if we say that we find an average latency of 50us
> until entering the ISR, does that mean that every interrupt generated
> steals 50us of time from the processor (plus the actual time in the ISR,

Several thousand interrupts per second (a V.90 modem + Media Player + a
WinDbg host catching an enormous number of debug prints - hundreds of
thousands or so; testing the AVL tree code reliability under heavy load, a
character is printed on each tree rebalance step) do not slow the machine
down seriously.
Well, WinDbg does, if its Command window is not minimized :-) - just for the
redraw.

I can say more - 2500 interrupts per second do not seriously slow down even
a P-166 machine.

So the 50us may include some hardware latency inside the PIC.
Maybe an oscilloscope would help to catch the moment the ISR starts to
execute?
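The arithmetic behind claims like “2500 interrupts per second do not slow the machine down” is easy to check. A one-liner (Python, purely illustrative; the per-interrupt cost is an assumption) gives the CPU fraction consumed:

```python
def cpu_fraction(interrupts_per_sec, cost_us_each):
    """Fraction of one CPU consumed by interrupt handling, given an assumed
    end-to-end cost per interrupt (latency + ISR + post-processing)."""
    return interrupts_per_sec * cost_us_each / 1_000_000

# 2500/s at a generous 50us each eats 12.5% of one CPU:
print(cpu_fraction(2500, 50))   # 0.125
```

So a few thousand interrupts per second is a noticeable but survivable load, which matches what is observed above.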

Max



if you want to reduce latency, do this:

  1. use an exclusive vector when connecting -
    otherwise, the kernel will loop through all registered ISRs (on your
    shared vector), having each one query its hardware
  2. use affinity when connecting if you are on SMP
  3. call your ISR in the background!!!
    to make sure that the TLB, code cache and data cache are not flushing
    your ISR away
  4. you might consider running on SMP and not using interrupts at all -
    simply busy-loop on one processor waiting for the int# condition
    (OSR’s book hints at this)
  5. you might want to connect at a higher DIRQL than the system assigns you,
    if you feel other devices are bothering you.

the bottom line, in my opinion, is that interrupt latency is not an issue of
how long it takes before the CPU gets interrupted!
that happens very quickly (and is 100% deterministic - you should be able to
find real numbers in the PC manufacturers’ data sheets),

but rather the time it takes the CPU to translate the IDT virtual address
into a physical address, and then to access variables from RAM, needing to
translate each one’s address from virtual to physical (requiring 3 RAM
accesses).

Also, if using shared IRQs, the actual IDT ISR has to loop through all the
ISRs on the same IRQ, each one going through the same story.

So, in my opinion, it is not hardware interrupt latency that produces the
large delays we always hear about, but rather operating-system issues, such
as searching for the proper ISR, the TLB, and the caches.
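As a sketch of the page-walk arithmetic above (illustrative Python; the 100 ns DRAM figure is an assumption for hardware of this era, and real CPUs cache page-directory entries, so treat this as a worst case):

```python
RAM_ACCESS_NS = 100   # assumed uncached DRAM access time

def cold_access_ns(tlb_hit):
    """Cost of touching one location under 32-bit x86 two-level paging:
    a TLB hit is one access; a miss adds a page-directory read and a
    page-table read before the data itself -- 3 RAM accesses total."""
    return RAM_ACCESS_NS if tlb_hit else 3 * RAM_ACCESS_NS

# If the interrupt path touches 20 TLB-cold, cache-cold locations:
print(20 * cold_access_ns(False) / 1000, "us")   # 6.0 us
```

Even under these pessimistic assumptions, the translation overhead alone accounts for only a few microseconds, not tens - which suggests the chain walking and I/O-bus reads matter more.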

call me if you want to discuss this further
056-657-169
regards to Asher, Mimi, Yeoshua and the rest of the gang at excalibur
assaf

----- Original Message -----
From: “Avi Shmidman”
To: “NT Developers Interest List”
Sent: Monday, December 24, 2001 3:05 PM
Subject: [ntdev] ISR Latency question




I hesitate to respond to this, since I imagine that it will suck me into a
never-ending discussion. But if anybody follows your advice, there will be
even more device drivers in the world that have little, unexplained, odd
behaviors. I’ll try to break it down point by point.

  1. If your device is on either a PCI bus or a bus that mimics the PCI
    protocol, then there may be other devices that are hard-wired to share
    interrupts with your device. The PCI spec says only that, when your
    device wants to generate an interrupt, it should ground one of the INTx#
    pins. It’s up to the motherboard designer to determine whether the INTx
    pins are wired together or whether they each connect to a distinct input
    on the interrupt controller. Most low-end motherboards use the 8259 PIC
    interrupt controller, which makes them interrupt-constrained. In
    practice, this means that your device will share interrupts unless you
    choose the motherboard carefully.

Furthermore, if you connect with “exclusive” chosen, one of three things
will happen:

A) The motherboard has already guaranteed you exclusivity, so the choice
is moot.
B) Another device is sharing, and it has already connected. This will
result in your device failing to connect its interrupt.
C) Another device is sharing, and it has not already connected. This will
result in your device working, and the other driver will experience a
failure to connect its interrupt. (Please don’t write any device drivers
that actually cause other devices to fail to function.)

  2. I’m not sure exactly what you mean by this. But, again, because PCI
    devices may be forced to share interrupts, I’d like to write for a
    moment on the topic of affinity. If your device is sharing, then it
    must share each and every processor with the other devices on the
    chains. Consider this example. You’re sharing with a SCSI controller.
    The SCSI controller has already started, connecting its ISR to processor
    A and processor B. The input on the interrupt controller will
    subsequently be unmasked, and directed to both processors. Now your
    device gets IRP_MN_START_DEVICE with an affinity mask that includes both
    processors. You decide to change that mask to include only processor B,
    and you connect your interrupt. This will cause the kernel to connect
    your ISR to processor B’s chain, but not to processor A’s. Now your
    device interrupts. It may go to either processor. If it goes to
    processor B, everything works fine, since your ISR will run. If the
    next interrupt goes to processor A, then processor A will start running
    through its ISR chain. Since your driver’s ISR is not in that chain,
    this interrupt cannot be dismissed. (I consider it a bug in the PCI
    spec that there is no way to dismiss an interrupt without running a
    driver-supplied ISR. But that’s another discussion, particularly
    because we’ve gotten that fixed in PCI 2.3.) When processor A gets to
    the end of the chain, it acks the interrupt, even though no ISR has
    claimed it. Since PCI interrupts are level-triggered, this will cause
    the interrupt to be immediately re-asserted. Most SMP machines are
    built with APIC interrupt controllers that will re-assert this interrupt
    on the same processor that just failed to handle it, causing processor A
    to now go into an endless loop, failing to handle the interrupt.
    Processor A will remain at the associated Device IRQL, never dropping
    down low enough to handle DPCs. The machine will continue to run only
    as long as it doesn’t depend on processor A to handle a DPC.
    Dependencies of this sort usually take between 30 and 90 seconds to crop
    up. At that point, the machine will appear completely hung to the user.

The point I’m trying to make is that you should never do anything with
your affinity mask other than just pass it through.
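The failure mode in the example above can be modeled in a few lines (a toy Python simulation, not NT internals; all names are made up for illustration): per-processor ISR chains, a level-triggered line, and a driver that narrowed its affinity to processor B only.

```python
# Per-processor ISR chains after the affinity-narrowing connect described
# above: the SCSI ISR is on both processors, yours only on B.
chains = {
    "A": ["scsi_isr"],
    "B": ["scsi_isr", "my_isr"],
}

def deliver(processor, owning_isr, max_retries=5):
    """Walk the processor's chain.  A level-triggered interrupt re-asserts
    until some ISR claims it; return how many deliveries that took, or
    None if the processor would spin at DIRQL forever."""
    for attempt in range(max_retries):
        if owning_isr in chains[processor]:
            return attempt          # claimed and dismissed
    return None                     # never dismissed: endless re-assertion

print(deliver("B", "my_isr"))       # 0 -> handled on first delivery
print(deliver("A", "my_isr"))       # None -> processor A hangs at DIRQL
```

The asymmetry is the whole bug: whichever processor lacks your ISR in its chain can never dismiss the level-triggered interrupt, regardless of how correct your driver is otherwise.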

  3. This seems harmless, though it seems just as likely to
    performance of the machine.

  4. If you have that much control over your environment, this might be
    the way to go. But, again, if you do this at high IRQL, you’ll
    eventually deadlock the machine. If you do it at low IRQL, you may
    still have latency problems.

  5. The DIRQL that you connect with is used mainly when you call
    KeSynchronizeExecution, so that the system can take the spinlock at the
    right IRQL. The DIRQL that your ISR is called at is determined by the
    vector that you’re attached to. You can’t help that.

The latency between when a device interrupts and when its ISR is called
is mostly a matter of waiting for other ISRs at equal or greater IRQL.
Or it’s a matter of waiting for code that has explicitly raised IRQL.

  • Jake Oshins
    (the guy who maintains interrupt-related stuff in the NT kernel)

-----Original Message-----
Subject: ISR Latency question
From: “Assaf Wodeslavsky”
Date: Tue, 25 Dec 2001 00:21:06 +0200
X-Message-Number: 1




> I hesitate to respond to this, since I imagine that it will suck me into
> a never-ending discussion.

Jake,

Even here, I think discussions eventually wind down. Actually, I’d think
that, because of your experience, you might be in a unique position to bring
clarity very quickly to certain discussions.

I get the sense you’re interested in improving the quality of Windows
drivers, and that you also need to be careful about what you say, for a
variety of reasons.

Let me toss out an idea… Perhaps the ntdev community could periodically
compile a list of “we’re stumped” questions, and somehow pass them to the
OS folks at Microsoft. You folks could then respond to some of those
questions. This would help us developers write better drivers, and ALL of
our customers would be happier. As the ntdev community has a number of very
knowledgeable people, we could take care not to waste your time with silly
questions.

As I’ve found the ntdev list to be a tremendous help at times, I try to
give back to the ntdev community. It seems like all us kernel developers
are pretty vulnerable to each other’s design choices. Things would be much
better overall if we worked together instead of against each other.

  • Jan


I’m with you right up to the part on interrupt chaining. The chaining is
through the interrupt object itself, not through some other data
structure. If the second device in your example reduces the affinity to
processor B it should be the case that all shared interrupts for that
vector are directed only to processor B, and none to processor A.
Processor A and B do not have individual ‘interrupt chains’, they have a
jump table that is connected to one or more interrupt objects, and all
processors are using the same jump table. To set the affinity for a
specific interrupt vector, the interrupt mask for all processors not in
the group are set so that they do not see that interrupt, and that is
all there is to it.

Now, as to the question of whether it is rude to set the affinity on a
shared interrupt - that can only be answered by ‘it depends’ and
‘suitability for purpose’.

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Jake Oshins
Sent: Wednesday, December 26, 2001 1:59 AM
To: NT Developers Interest List
Subject: [ntdev] RE: ISR Latency question


I have to respectfully disagree with you. Each processor has a separate
IDT, which is the “jump table” that you refer to. And when you call
IoConnectInterrupt, while you get a single interrupt object back, it’s
really just a head of a linked list of interrupt objects, one for each
processor involved.

If you want to see this visually, install Windows XP and hook it up to
the latest WinDbg (or kd). Type !idt in the command window. This is a
debugger extension that I wrote that will dump all the ISR chains on the
given processor, separating them out by IDT entry. If you want to see
another processor, switch processors in the debugger using ~n, where n
is the number of the target processor, and type !idt again. You’ll then
see the chains for that processor, which may be different from the first
one you looked at. (In most cases, they will be the same. But at least
you’ll be able to play with this yourself.)

The example that I gave is not merely hypothetical. I’ve personally
debugged several failed machines that were in exactly this situation.

At the moment, I happen to be looking at changing IoConnectInterrupt for
some future version of NT so that you can’t actually get yourself into
the deadlock that I described. The problem is that guaranteeing deadlock
avoidance here will mean treating the affinity passed in by the driver as
merely a suggestion. I’ll make sure that it remains a strong suggestion.
But there is really no way to keep the machine running if two drivers
with different affinities are actually sharing. (You could do it if you
could get every chipset maker to change the definition of an I/O APIC,
which is the part of the interrupt subsystem that collects interrupts
from devices in today’s commodity-market SMP machines, but that’s a huge,
years-long task, one which would be harder to accomplish than it would be
worth.)

  • Jake

-----Original Message-----

Subject: RE: ISR Latency question
From: “Mark Roddy”
Date: Wed, 26 Dec 2001 19:05:06 -0500
X-Message-Number: 18

I’m with you right up to the part on interrupt chaining. The chaining is
through the interrupt object itself, not through some other data
structure. If the second device in your example reduces the affinity to
processor B it should be the case that all shared interrupts for that
vector are directed only to processor B, and none to processor A.
Processor A and B do not have individual ‘interrupt chains’, they have a
jump table that is connected to one or more interrupt objects, and all
processors are using the same jump table. To set the affinity for a
specific interrupt vector, the interrupt mask for all processors not in
the group are set so that they do not see that interrupt, and that is
all there is to it.

Now as to the question of is it rude to set the affinity on a shared
interrupt, that is a question that can only be answered by ‘it depends’,
and ‘suitability for purpose’.

> -----Original Message-----
> From: xxxxx@lists.osr.com
> [mailto:xxxxx@lists.osr.com] On Behalf Of Jake Oshins
> Sent: Wednesday, December 26, 2001 1:59 AM
> To: NT Developers Interest List
> Subject: [ntdev] RE: ISR Latency question
>
>
> I hesitate to respond to this, since I imagine that it will
> suck me into a never ending discussion. But if anybody
> follows your advice, there will be even more device drivers
> in the world that have little, unexplained, odd behaviors.
> I’ll try to break it down point by point.
>
> 1. If your device is on either a PCI bus or a bus that
> mimics the PCI protocol, then there may be other devices that
> are hard-wired to share interrupts with your device. The PCI
> spec says only that, when your device wants to generate an
> interrupt, it should ground one of the INTx# pins. It’s up
> to the motherboard designer to determine whether the INTx
> pins are wired together or whether they each connect to a
> distinct input on the interrupt controller. Most low-end
> motherboards use the 8259 PIC interrupt controller, which
> makes them interrupt-constrained. In practice, this means
> that your device will share interrupts unless you choose the
> motherboard carefully.
>
> Furthermore, if you connect with “exclusive” chosen, one of
> three things will happen:
>
> A) The motherboard has already guaranteed you exclusivity, so
> the choice is moot.
> B) Another device is sharing, and it has already connected.
> This will result in your device failing to connect its interrupt.
> C) Another device is sharing, and it has not already connect.
> This will result in your device working and the other driver
> will experience a failure to connect an interrupt. (Please
> don’t write any device drivers that actually cause other
> devices to fail to function.)
>
> 2. I’m not sure exactly what you mean by this. But, again,
> because PCI devices may be forced to share interrupts, I’d
> like to write for a moment on the topic of affinity. If your
> device is sharing, then it must share each and every
> processor with the other devices on the chains. Consider
> this example. You’re sharing with a SCSI controller. The
> SCSI controller has already started, connecting its ISR to
> processor A and processor B. The input on the interrupt
> controller will subsequently be unmasked, and directed to
> both processors. Now your device gets IRP_MN_START_DEVICE
> with an affinity mask that includes both processors. You
> decide to change that mask to include only processor B, and
> you connect your interrupt. This will cause the kernel to
> connect your ISR to processor B’s chain, but not to processor
> A’s. Now your device interrupts. It may go to either
> processor. If it goes to processor B, everything works fine,
> since your ISR will run. If the next interrupt goes to
> processor A, then processor A will start running through its
> ISR chain. Since your driver’s ISR is not in that chain,
> this interrupt cannot be dismissed. (I consider it a bug in
> the PCI spec that there is no way to dismiss an interrupt
> without running a driver-supplied ISR. But that’s another
> discussion, particularly because we’ve gotten that fixed in
> PCI 2.3.) When processor A gets to the end of the chain, it
> acks the interrupt, even though no ISR has claimed it. Since
> PCI interrupts are level-triggered, this will cause the
> interrupt to be immediately re-asserted. Most SMP machines
> are built with APIC interrupt controllers that will re-assert
> this interrupt on the same processor that just failed to
> handle it, causing processor A to now go into an endless
> loop, failing to handle the interrupt. Processor A will
> remain at the associated Device IRQL, never dropping down low
> enough to handle DPCs. The machine will continue to run only
> as long as it doesn’t depend on processor A to handle a DPC.
> Dependencies of this sort usually take between 30 and 90
> seconds to crop up. At that point, the machine will appear
> completely hung to the user.
>
> The point I’m trying to make is that you should never do
> anything with your affinity mask other than just pass it through.
>
> 3. This seems harmless, though it seems just as likely to
> reduce the performance of the machine.
>
> 4. If you have that much control over your environment, this
> might be the way to go. But, again, if you do this at high
> IRQL, you’ll eventually deadlock the machine. If you do it
> at low IRQL, you may still have latency problems.
>
> 5. The DIRQL that you connect with is used mainly when you
> call KeSynchronizeExecution, so that the system can take the
> spinlock at the right IRQL. The DIRQL that your ISR is
> called at is determined by the vector that you’re attached
> to. You can’t help that.
>
> The latency between when a device interrupts and when its ISR
> is called is mostly a matter of waiting for other ISRs at
> equal or greater IRQL. Or it’s a matter of waiting for code
> that has explicitly raised IRQL.
>
> - Jake Oshins
> (the guy who maintains interrupt-related stuff in the NT kernel)
>
> -----Original Message-----
> Subject: ISR Latency question
> From: “Assaf Wodeslavsky”
> Date: Tue, 25 Dec 2001 00:21:06 +0200
> X-Message-Number: 1
>
> if you want to reduce latency do this:
> 1. use exclusive vector when connecting
> otherwise, the kernel will loop through all registered
> isr’s (on your shared vector)
> having each one query its hardware
> 2. use affinity when connecting if you are smp
> 3. call your isr in the background!!!
> to make sure that the TLB, code cache and data cache are
> not flushing your isr away
> 4. you might consider running on smp and not using interrupts at all
> simply busy loop on one processor waiting for the int# condition
> (osr’s book hints at this)
> 5. you might want to connect at a higher DIRQL than the
> system assigns you, if you feel other devices are bothering you.
>
> the bottom line is to understand, in my opinion, that
> interrupt latency is not an issue of how long before the cpu
> gets interrupted! that happens very quickly (and is 100%
> deterministic - you should be able to find real numbers from
> the pc manufacturer’s data sheets),
>
> but rather, the time it takes the cpu to translate the IDT
> virtual address into a physical address, and then to access
> variables from RAM, needing to translate each one’s address
> from virtual to physical (requiring 3 RAM accesses).
>
> Also, if using shared irq’s, the actual IDT’s isr has to loop
> through all isr’s on the same irq, each one needing to go
> through the same story.
>
> So, in my opinion, it is not hardware interrupt latency that
> is producing the large delays we always hear about, but
> rather Operating System issues, such as searching for the
> proper isr, TLB and the caches.
>
> call me if you want to discuss this further
> 056-657-169
> regards to Asher, Mimi, Yeoshua and the rest of the gang at
> excalibur assaf
>



Yup, you’re absolutely correct. I always assumed that this was managed
differently, but instead it is in fact not managed correctly at all :-(.
Yet another reason to leave affinity alone. Thanks for pointing this
out.
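For anyone who wants to play with the failure mode being conceded here, a toy user-mode sketch of per-processor ISR chains follows. Everything in it is made up for illustration (it is not the NT interrupt-object implementation): each “processor” has its own chain, and an interrupt delivered to a processor whose chain does not contain the device’s ISR is never dismissed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: one ISR chain per processor for a shared interrupt line.
 * Hypothetical structures, for illustration only. */
#define MAX_ISRS 4

typedef bool (*isr_fn)(void *context); /* returns true if it claimed the interrupt */

struct cpu_chain {
    isr_fn isr[MAX_ISRS];
    void  *ctx[MAX_ISRS];
    int    count;
};

/* Connect an ISR to one processor's chain (the analogue of narrowing
 * the affinity mask to that processor before connecting). */
static void connect_isr(struct cpu_chain *cpu, isr_fn fn, void *ctx)
{
    assert(cpu->count < MAX_ISRS);
    cpu->isr[cpu->count] = fn;
    cpu->ctx[cpu->count] = ctx;
    cpu->count++;
}

/* Deliver one interrupt to a processor: walk its chain until some ISR
 * claims it.  Returns false if nobody dismissed the interrupt, which
 * for a level-triggered line means it fires again immediately. */
static bool deliver_interrupt(struct cpu_chain *cpu)
{
    for (int i = 0; i < cpu->count; i++)
        if (cpu->isr[i](cpu->ctx[i]))
            return true;
    return false;
}

/* "Our" device: keeps asserting until its ISR services it. */
static bool my_device_asserting;

static bool my_isr(void *context)
{
    (void)context;
    if (!my_device_asserting)
        return false;            /* not ours; next ISR in the chain */
    my_device_asserting = false; /* quiesce the hardware */
    return true;                 /* interrupt dismissed */
}
```

In the scenario described in this thread the device’s ISR sits only on processor B’s chain, so a delivery to processor A walks a chain that can never dismiss the still-asserted line.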

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Jake Oshins
Sent: Thursday, December 27, 2001 2:13 AM
To: NT Developers Interest List
Subject: [ntdev] RE: ISR Latency question

I have to respectfully disagree with you. Each processor has
a separate IDT, which is the “jump table” that you refer to.
And when you call IoConnectInterrupt, while you get a single
interrupt object back, it’s really just a head of a linked
list of interrupt objects, one for each processor involved.

If you want to see this visually, install Windows XP and hook
it up to the latest WinDbg (or kd). Type !idt in the command
window. This is a debugger extension that I wrote that will
dump all the ISR chains on the given processor, separating
them out by IDT entry. If you want to see another processor,
switch processors in the debugger using ~n, where n is the
number of the target processor and type !idt again. You’ll
then see the chains for that processor, which may be
different from the first one you looked at. (In most cases,
they will be the same. But at least you’ll be able to play
with this yourself.)

> The example that I gave below is not merely hypothetical.
I’ve personally debugged several failed machines that were in
exactly this situation.

At the moment, I happen to be looking at changing
IoConnectInterrupt for some future version of NT so that you
can’t actually get yourself into the deadlock that I
described below. The problem is that guaranteeing deadlock
avoidance here will mean taking the affinity passed in by the
> driver as merely a suggestion. I’ll make sure that it
remains a strong suggestion. But there is really no way to
keep the machine running if two drivers with different
affinities are actually sharing. (You could do it if you
could get every chipset maker to change the definition of an
I/O APIC, which is the part of the interrupt subsystem that
collects interrupts from devices in today’s commodity-market
SMP machines, but that’s a huge, years-long task, one which
would be harder to accomplish than it would be worth.)

- Jake

-----Original Message-----

> Subject: RE: ISR Latency question
> From: “Mark Roddy”
> Date: Wed, 26 Dec 2001 19:05:06 -0500
> X-Message-Number: 18
>
> I’m with you right up to the part on interrupt chaining. The
> chaining is through the interrupt object itself, not through
> some other data structure. If the second device in your
> example reduces the affinity to processor B it should be the
> case that all shared interrupts for that vector are directed
> only to processor B, and none to processor A. Processor A and
> B do not have individual ‘interrupt chains’, they have a jump
> table that is connected to one or more interrupt objects, and
> all processors are using the same jump table. To set the
> affinity for a specific interrupt vector, the interrupt mask
> for all processors not in the group are set so that they do
> not see that interrupt, and that is all there is to it.
>
> Now as to the question of is it rude to set the affinity on a
> shared interrupt, that is a question that can only be
> answered by ‘it depends’, and ‘suitability for purpose’.
>
> > -----Original Message-----
> > From: xxxxx@lists.osr.com
> > [mailto:xxxxx@lists.osr.com] On Behalf Of Jake Oshins
> > Sent: Wednesday, December 26, 2001 1:59 AM
> > To: NT Developers Interest List
> > Subject: [ntdev] RE: ISR Latency question
> >
> >
> > I hesitate to respond to this, since I imagine that it will
> > suck me into a never ending discussion. But if anybody
> > follows your advice, there will be even more device drivers
> > in the world that have little, unexplained, odd behaviors.
> > I’ll try to break it down point by point.
> >
> > 1. If your device is on either a PCI bus or a bus that
> > mimics the PCI protocol, then there may be other devices that
> > are hard-wired to share interrupts with your device. The PCI
> > spec says only that, when your device wants to generate an
> > interrupt, it should ground one of the INTx# pins. It’s up
> > to the motherboard designer to determine whether the INTx
> > pins are wired together or whether they each connect to a
> > distinct input on the interrupt controller. Most low-end
> > motherboards use the 8259 PIC interrupt controller, which
> > makes them interrupt-constrained. In practice, this means
> > that your device will share interrupts unless you choose the
> > motherboard carefully.
> >
> > [snip - remainder of Jake’s reply and Assaf’s original message, quoted in full earlier in this digest]
>



For those of you who have SoftICE, the “intobj” command will give you a
similar view.

Alberto.

-----Original Message-----
From: Jake Oshins [mailto:xxxxx@windows.microsoft.com]
Sent: Thursday, December 27, 2001 2:13 AM
To: NT Developers Interest List
Subject: [ntdev] RE: ISR Latency question

If you want to see this visually, install Windows XP and hook it up to
the latest WinDbg (or kd.) Type !idt in the command window. This is a
debugger extension that I wrote that will dump all the ISR chains on the
given processor, separating them out by IDT entry. If you want to see
another processor, switch processors in the debugger using ~n, where n
is the number of the target processor and type !idt again. You’ll then
see the chains for that processor, which may be different from the first
one you looked at. (In most cases, they will be the same. But at least
you’ll be able to play with this yourself.)



Jake, you mention “dismissing” an interrupt, and a fix in PCI 2.3. Could you
maybe expand a little on what precisely you mean, and what kind of fix this
entails ?

Alberto.

> [...] (I consider it a bug in the PCI spec that there is no
> way to dismiss an interrupt without running a driver-supplied
> ISR. But that’s another discussion, particularly because
> we’ve gotten that fixed in PCI 2.3.) [snip]


On NTE, I had performance problems when I didn’t have interrupt affinity, and since the heavily used devices were SCSI
port based, I couldn’t modify the call to IoConnectInterrupt to set the affinity. Instead, I hacked up the HAL to force all
device interrupts to CPU 0, which was adequate for the first product. In fact, because the devices had queues of completed
operations, forcing all interrupts onto CPU 0 decreased the total number of interrupts, which also had positive performance
benefits.

On the next product, I was looking at changing this to distribute interrupt service between CPUs, while ensuring that each
device only interrupts one CPU. From Jake’s comments, it sounds as if improving my HAL hack is the way to go. It sounds as if
I avoid the problems Jake described if I select the CPU based on some hash function of the vector (so that shared interrupts
hash to the same value).
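A sketch of that selection rule, assuming nothing about the real HAL (the mixing constant is arbitrary and purely illustrative): because the CPU is a pure function of the vector, any devices sharing a vector necessarily land on the same CPU.

```c
#include <assert.h>

/* Pick the servicing CPU as a pure function of the interrupt vector.
 * Any hash works for the correctness argument; this multiplicative
 * mix is just an illustration. */
static unsigned cpu_for_vector(unsigned vector, unsigned ncpus)
{
    return ((vector * 2654435761u) >> 16) % ncpus;
}
```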

The other thing that would be quite valuable would be to make these devices interrupt at DISPATCH_LEVEL, so that they would not
interrupt DPCs that are currently running. Is it possible to do this on X86 hardware? If so, would changing the HAL for
this be feasible? If the DPC is running, there is not much point in having the ISR run and queue something for the DPC. In my
case, such a change would significantly reduce the total number of interrupts, amortizing the ISR overhead across more
operations.

The device has a timer where you can program it to not interrupt until some time quantum has elapsed, allowing multiple events
to collect and generate one interrupt, and dramatically improving throughput. However, enabling time delay also increases
response time when lightly loaded (to an unacceptable level).
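The tradeoff can be put in rough numbers with a back-of-the-envelope model (illustrative arithmetic only, not any particular device’s behavior): the hold-off quantum bounds the added latency, while the number of events coalesced per interrupt grows with load.

```c
#include <assert.h>

/* Rough model of an interrupt hold-off ("mitigation") timer. */
struct mitigation {
    unsigned events_per_interrupt;  /* events coalesced into one interrupt */
    unsigned worst_case_latency_us; /* delay added to the first event */
};

static struct mitigation model(unsigned quantum_us, unsigned interarrival_us)
{
    struct mitigation m;
    /* the first event starts the timer; more arrive while it runs */
    m.events_per_interrupt = 1 + quantum_us / interarrival_us;
    m.worst_case_latency_us = quantum_us;
    return m;
}
```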

Thanks,
-DH
----- Original Message -----
From: “Jake Oshins”
To: “NT Developers Interest List”
Sent: Thursday, December 27, 2001 2:12 AM
Subject: [ntdev] RE: ISR Latency question

> [snip - Jake’s reply of 27 December, including the earlier messages it quotes, reproduced in full earlier in this digest]



> The device has a timer where you can program it to not interrupt until some time quantum has elapsed, allowing multiple
> events to collect and generate one interrupt, and dramatically improving throughput. However, enabling time delay also
> increases response time when lightly loaded (to an unacceptable level).

Some network cards switch to polling under heavy load, disabling the interrupt entirely.

Max
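That interrupt-to-polling switch can be sketched as a two-state machine (the thresholds below are invented for illustration, not taken from any driver): mask the interrupt when the ISR keeps finding a deep completion queue, and re-arm it once a poll pass finds the queue drained.

```c
#include <assert.h>
#include <stdbool.h>

#define POLL_ENTER_DEPTH 8  /* queue depth at interrupt time that triggers polling */
#define POLL_EXIT_DEPTH  1  /* queue depth at poll time that re-arms the interrupt */

struct nic_state {
    bool polling;           /* true: interrupt masked, a DPC/timer polls instead */
};

static void on_interrupt(struct nic_state *s, unsigned queue_depth)
{
    if (queue_depth >= POLL_ENTER_DEPTH)
        s->polling = true;  /* heavy load: stop taking interrupts */
}

static void on_poll_pass(struct nic_state *s, unsigned queue_depth)
{
    if (queue_depth <= POLL_EXIT_DEPTH)
        s->polling = false; /* queue drained: unmask and go idle */
}
```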



The only way to stop a PCI device from interrupting is to call an ISR
supplied by the proper driver. This works, as long as the driver is
loaded, or as long as the device isn’t sharing interrupts.

But consider this situation. Your machine’s BIOS played a sound during
boot. It left the sound chip in an interrupting state.

Now the OS boots, enabling the SCSI adapter very early during boot. In
fact, the SCSI adapter has to be enabled in order to load the driver for
the sound chip. But the motherboard has these two devices wired
together. (Or the OS chose to configure them to the same IRQ, for
various reasons discussed on another thread.) Since the SCSI driver is
loaded and running, the IRQ will be unmasked. The machine will hang
because there is no driver loaded to handle the IRQs coming from the
sound chip.

To add a little more detail to this, consider that a level-triggered
interrupt will be re-asserted by the interrupt controller after it has
been acknowledged (or, using the inexact terminology that I employed
below, “dismissed.”) This means that, even if the OS acks the interrupt,
the interrupt won’t go away. It will just be delivered again, causing
the OS to raise IRQL right back to device-level and re-run the ISR
chain. This repeats forever, until the interrupt is masked. But since
the SCSI controller needs the interrupt to be unmasked for the driver to
function, masking it will just cause the machine to fail to boot.
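That re-delivery loop can be modeled in a few lines of plain C. This is a toy user-mode simulation of the behavior described above, not kernel code; the function name and parameters are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a level-triggered line: it stays asserted until the
 * owning device is actually serviced. Acknowledging (EOI) without
 * servicing just causes immediate re-delivery. Returns the number of
 * delivery rounds until the line deasserts, or -1 if it never does
 * within max_rounds (the hang described above). */
int rounds_until_quiet(bool owning_driver_loaded, int max_rounds)
{
    bool line_asserted = true;
    int rounds = 0;

    while (line_asserted && rounds < max_rounds) {
        rounds++;
        /* Walk the ISR chain: only the owning driver's ISR touches
         * the device and makes it drop the line. */
        if (owning_driver_loaded)
            line_asserted = false;
        /* The OS sends EOI either way; if the line is still asserted,
         * the interrupt controller delivers it again immediately. */
    }
    return line_asserted ? -1 : rounds;
}
```

With the owning driver loaded the line quiesces after one round; without it, the simulated processor spins at device IRQL until the round limit, which is the hang being described.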

If you’re a member of the PCI spec committee, then you can read the
proposals for PCI 2.3. If you’re not, then I can’t disclose them here.
I’m sorry. That’s the legal reality that I have to live with.

- Jake

-----Original Message-----

Subject: RE: ISR Latency question
From: “Moreira, Alberto”
Date: Thu, 27 Dec 2001 11:34:23 -0500
X-Message-Number: 25

Jake, you mention “dismissing” an interrupt, and a fix in PCI 2.3. Could
you maybe expand a little on what precisely you mean, and what kind of
fix this entails?

Alberto.

> 2. I’m not sure exactly what you mean by this. But, again,
> because PCI devices may be forced to share interrupts, I’d
> like to write for a moment on the topic of affinity. If your
> device is sharing, then it must share each and every
> processor with the other devices on the chains. Consider
> this example. You’re sharing with a SCSI controller. The
> SCSI controller has already started, connecting its ISR to
> processor A and processor B. The input on the interrupt
> controller will subsequently be unmasked, and directed to
> both processors. Now your device gets IRP_MN_START_DEVICE
> with an affinity mask that includes both processors. You
> decide to change that mask to include only processor B, and
> you connect your interrupt. This will cause the kernel to
> connect your ISR to processor B’s chain, but not to processor
> A’s. Now your device interrupts. It may go to either
> processor. If it goes to processor B, everything works fine,
> since your ISR will run. If the next interrupt goes to
> processor A, then processor A will start running through its
> ISR chain. Since your driver’s ISR is not in that chain,
> this interrupt cannot be dismissed. (I consider it a bug in
> the PCI spec that there is no way to dismiss an interrupt
> without running a driver-supplied ISR. But that’s another
> discussion, particularly because we’ve gotten that fixed in
> PCI 2.3.) When processor A gets to the end of the chain, it
> acks the interrupt, even though no ISR has claimed it. Since
> PCI interrupts are level-triggered, this will cause the
> interrupt to be immediately re-asserted. Most SMP machines
> are built with APIC interrupt controllers that will re-assert
> this interrupt on the same processor that just failed to
> handle it, causing processor A to now go into an endless
> loop, failing to handle the interrupt. Processor A will
> remain at the associated Device IRQL, never dropping down low
> enough to handle DPCs. The machine will continue to run only
> as long as it doesn’t depend on processor A to handle a DPC.
> Dependencies of this sort usually take between 30 and 90
> seconds to crop up. At that point, the machine will appear
> completely hung to the user.
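The affinity pitfall in the quoted scenario can be simulated in user-mode C. This is purely illustrative: the structures and function names do not correspond to any real kernel API, and the two-processor setup mirrors the processor A / processor B example above.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROC    2   /* processors A (0) and B (1) */
#define MAX_ISRS 4

typedef bool (*isr_fn)(void);   /* returns true if it claims the IRQ */

static isr_fn chains[NPROC][MAX_ISRS];
static int    nisrs[NPROC];

/* Connect an ISR to the chain of every processor in the affinity mask. */
void connect_isr(isr_fn fn, unsigned affinity_mask)
{
    for (int p = 0; p < NPROC; p++)
        if (affinity_mask & (1u << p))
            chains[p][nisrs[p]++] = fn;
}

bool our_isr(void) { return true; }  /* claims our device's interrupt */

/* Walk the chain on one processor. A false return means nobody claimed
 * the interrupt, so a level-triggered line re-fires on that same
 * processor forever: the endless loop described above. */
bool dispatch_on(int proc)
{
    for (int i = 0; i < nisrs[proc]; i++)
        if (chains[proc][i]())
            return true;
    return false;
}
```

Connecting `our_isr` with an affinity mask of processor B only leaves processor A's chain empty, so an interrupt routed to A is never claimed.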



The hardware doesn’t know about IRQLs; that is an OS invention. As far as
the hw goes, your interrupt is either masked (disabled) or it is not. But
you can probably implement what you want, we used to call it “interrupt
tabling” in the old Sperry 418 series. Your interrupt vector points to your
own interrupt handler, which enqueues the interrupt, does a minimum level of
hardware handshake, and goes away. In parallel, you have your own thread
dequeueing those interrupts and handling them appropriately. The 418 III did
it in hardware, mind you, and the Sperry DCP had hardware queues, interrupts
happened in the peripheral processors, and there was no such thing as a
peripheral interrupting the CPU. Alas, they don’t make them like that
anymore! But then, isn’t that what a DPC is supposed to achieve? You get
an interrupt, you use your ISR to enqueue it and to clear your hardware, and
schedule a DPC to handle the queue later on. If you (heresy!) run your
interrupt on an interrupt gate instead of on a trap gate, you are
guaranteed that nobody’s going to interrupt you, although of course you
may still see your interrupt routine running concurrently on more than
one processor.
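The enqueue-in-the-ISR, drain-later scheme described above can be sketched as a single-producer/single-consumer ring, a natural fit since the ISR is the only producer. This is a minimal user-mode sketch: real NT code would pair the enqueue with a DPC request (e.g. KeInsertQueueDpc), and the names here are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define QSIZE 16   /* ring holds at most QSIZE - 1 entries */

typedef struct {
    unsigned head;        /* written only by the ISR (producer) */
    unsigned tail;        /* written only by the DPC (consumer) */
    int      events[QSIZE];
} event_queue;

/* ISR side: record the event and get out fast. A full queue means the
 * deferred side is falling behind; real code would count the overrun. */
bool isr_enqueue(event_queue *q, int ev)
{
    unsigned next = (q->head + 1) % QSIZE;
    if (next == q->tail)
        return false;              /* full: drop the event         */
    q->events[q->head] = ev;
    q->head = next;
    return true;                   /* real code: request a DPC now */
}

/* DPC side: service everything queued since the last run. */
int dpc_drain(event_queue *q)
{
    int handled = 0;
    while (q->tail != q->head) {
        /* ... handle q->events[q->tail] here ... */
        q->tail = (q->tail + 1) % QSIZE;
        handled++;
    }
    return handled;
}
```

Because each index is written by only one side, the ring needs no lock between the ISR and the deferred routine on a single queue, which is exactly what keeps the ISR short.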

Unless of course your peripheral can wait until the DPC is scheduled to
resume interrupting. You can try waiting to re-enable interrupts until
you’re well inside your DPC, and see if you can still handle things within
an acceptable response time. I personally still believe the best way to
handle this is to have a queue between peripheral and processor: the
peripheral handles the hardware device, the queue buffers stuff between
peripheral and processor, and a processor interrupt is a signal to service
the queue, not to service the device.

Alberto.

-----Original Message-----
From: Dave Harvey [mailto:xxxxx@syssoftsol.com]
Sent: Saturday, December 29, 2001 4:39 PM
To: NT Developers Interest List
Subject: [ntdev] RE: ISR Latency question

The other thing that would be quite valuable would be to make these
devices interrupt at DISPATCH_LEVEL, so that they would not interrupt
DPCs that are currently running. Is it possible to do this on x86
hardware? If so, would changing the HAL for this be feasible? If the
DPC is running, there is not much point in having the ISR run and queue
something for the DPC. In my case, such a change would significantly
reduce the total number of interrupts, amortizing the ISR overhead
across more operations.

The device has a timer where you can program it to not interrupt until
some time quantum has elapsed, allowing multiple events to collect and
generate one interrupt, dramatically improving throughput. However,
enabling the time delay also increases response time when lightly
loaded (to an unacceptable level).
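The throughput side of that tradeoff is easy to quantify: with a moderation quantum of Q microseconds, a busy device generates at most 1,000,000/Q interrupts per second no matter how many events arrive, while the first event of an idle period can wait up to Q before being seen. A hypothetical back-of-envelope model (the function name and units are invented):

```c
#include <assert.h>

/* Interrupt rate under a moderation timer: with the timer disabled,
 * every event interrupts; with a quantum of quantum_us microseconds,
 * the timer caps the rate at one interrupt per quantum while busy. */
long interrupts_per_sec(long events_per_sec, long quantum_us)
{
    if (quantum_us == 0)
        return events_per_sec;            /* moderation disabled */
    long cap = 1000000L / quantum_us;     /* timer-bounded rate  */
    return events_per_sec < cap ? events_per_sec : cap;
}
```

So a 100 us quantum turns 100,000 events/sec into 10,000 interrupts/sec, a tenfold cut in ISR overhead, but a lightly loaded device still pays up to 100 us of added response time per event, which is the unacceptable part Dave describes.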

