On NTE, I had performance problems when I didn’t have interrupt affinity, and since the heavily used devices were SCSI
port based, I couldn’t modify the call to IoConnectInterrupt to set the affinity. Instead, I hacked up the HAL to force all
device interrupts to CPU 0, which was adequate for the first product. In fact, because the devices had queues of completed
operations, forcing all interrupts onto CPU 0 decreased the total number of interrupts, which also had positive performance
benefits.
On the next product, I was looking at changing this to distribute interrupt service between CPUs, while ensuring that each
device only interrupts one CPU. From Jake’s comments, it sounds as if improving my HAL hack is the way to go. It also sounds as
if I avoid the problems Jake described if I select the CPU based on some hash function of the vector (so that shared interrupts
hash to the same CPU).
The other thing that would be quite valuable would be to make these devices interrupt at DISPATCH_LEVEL, so that they would not
interrupt DPCs that are currently running. Is it possible to do this on x86 hardware? If so, would changing the HAL for
this be feasible? If the DPC is already running, there is not much point in having the ISR run and queue something for it. In my
case, such a change would significantly reduce the total number of interrupts, amortizing the ISR overhead across more
operations.
The device has a timer that can be programmed to hold off interrupting until some time quantum has elapsed, allowing multiple
events to accumulate and generate one interrupt, which dramatically improves throughput. However, enabling the time delay also
increases response time when lightly loaded (to an unacceptable level).
Thanks,
-DH
----- Original Message -----
From: “Jake Oshins”
To: “NT Developers Interest List”
Sent: Thursday, December 27, 2001 2:12 AM
Subject: [ntdev] RE: ISR Latency question
> I have to respectfully disagree with you. Each processor has a separate
> IDT, which is the “jump table” that you refer to. And when you call
> IoConnectInterrupt, while you get a single interrupt object back, it’s
> really just the head of a linked list of interrupt objects, one for each
> processor involved.
>
> If you want to see this visually, install Windows XP and hook it up to
> the latest WinDbg (or kd.) Type !idt in the command window. This is a
> debugger extension that I wrote that will dump all the ISR chains on the
> given processor, separating them out by IDT entry. If you want to see
> another processor, switch processors in the debugger using ~n, where n
> is the number of the target processor and type !idt again. You’ll then
> see the chains for that processor, which may be different from the first
> one you looked at. (In most cases, they will be the same. But at least
> you’ll be able to play with this yourself.)
>
> The example that I gave below is not merely hypothetical. I’ve
> personally debugged several failed machines that were in exactly this
> situation.
>
> At the moment, I happen to be looking at changing IoConnectInterrupt for
> some future version of NT so that you can’t actually get yourself into
> the deadlock that I described below. The problem is that guaranteeing
> deadlock avoidance here will mean taking the affinity passed in by the
> driver as merely a suggestion. I’ll make sure that it remains a strong
> suggestion. But there is really no way to keep the machine running if
> two drivers with different affinities are actually sharing. (You could
> do it if you could get every chipset maker to change the definition of
> an I/O APIC, which is the part of the interrupt subsystem that collects
> interrupts from devices in today’s commodity-market SMP machines, but
> that’s a huge, years-long task, one which would be harder to accomplish
> than it would be worth.)
>
> - Jake
>
> -----Original Message-----
>
> Subject: RE: ISR Latency question
> From: “Mark Roddy”
> Date: Wed, 26 Dec 2001 19:05:06 -0500
> X-Message-Number: 18
>
> I’m with you right up to the part on interrupt chaining. The chaining is
> through the interrupt object itself, not through some other data
> structure. If the second device in your example reduces the affinity to
> processor B it should be the case that all shared interrupts for that
> vector are directed only to processor B, and none to processor A.
> Processor A and B do not have individual ‘interrupt chains’, they have a
> jump table that is connected to one or more interrupt objects, and all
> processors are using the same jump table. To set the affinity for a
> specific interrupt vector, the interrupt mask for all processors not in
> the group are set so that they do not see that interrupt, and that is
> all there is to it.
>
> Now as to the question of whether it is rude to set the affinity on a
> shared interrupt, that can only be answered by ‘it depends’ and
> ‘suitability for purpose’.
>
> > -----Original Message-----
> > From: xxxxx@lists.osr.com
> > [mailto:xxxxx@lists.osr.com] On Behalf Of Jake Oshins
> > Sent: Wednesday, December 26, 2001 1:59 AM
> > To: NT Developers Interest List
> > Subject: [ntdev] RE: ISR Latency question
> >
> >
> > I hesitate to respond to this, since I imagine that it will
> > suck me into a never ending discussion. But if anybody
> > follows your advice, there will be even more device drivers
> > in the world that have little, unexplained, odd behaviors.
> > I’ll try to break it down point by point.
> >
> > 1. If your device is on either a PCI bus or a bus that
> > mimics the PCI protocol, then there may be other devices that
> > are hard-wired to share interrupts with your device. The PCI
> > spec says only that, when your device wants to generate an
> > interrupt, it should ground one of the INTx# pins. It’s up
> > to the motherboard designer to determine whether the INTx
> > pins are wired together or whether they each connect to a
> > distinct input on the interrupt controller. Most low-end
> > motherboards use the 8259 PIC interrupt controller, which
> > makes them interrupt-constrained. In practice, this means
> > that your device will share interrupts unless you choose the
> > motherboard carefully.
> >
> > Furthermore, if you connect with “exclusive” chosen, one of
> > three things will happen:
> >
> > A) The motherboard has already guaranteed you exclusivity, so
> > the choice is moot.
> > B) Another device is sharing, and it has already connected.
> > This will result in your device failing to connect its interrupt.
> > C) Another device is sharing, and it has not already connected.
> > This will result in your device working and the other driver
> > will experience a failure to connect an interrupt. (Please
> > don’t write any device drivers that actually cause other
> > devices to fail to function.)
> >
> > 2. I’m not sure exactly what you mean by this. But, again,
> > because PCI devices may be forced to share interrupts, I’d
> > like to write for a moment on the topic of affinity. If your
> > device is sharing, then it must share each and every
> > processor with the other devices on the chains. Consider
> > this example. You’re sharing with a SCSI controller. The
> > SCSI controller has already started, connecting its ISR to
> > processor A and processor B. The input on the interrupt
> > controller will subsequently be unmasked, and directed to
> > both processors. Now your device gets IRP_MN_START_DEVICE
> > with an affinity mask that includes both processors. You
> > decide to change that mask to include only processor B, and
> > you connect your interrupt. This will cause the kernel to
> > connect your ISR to processor B’s chain, but not to processor
> > A’s. Now your device interrupts. It may go to either
> > processor. If it goes to processor B, everything works fine,
> > since your ISR will run. If the next interrupt goes to
> > processor A, then processor A will start running through its
> > ISR chain. Since your driver’s ISR is not in that chain,
> > this interrupt cannot be dismissed. (I consider it a bug in
> > the PCI spec that there is no way to dismiss an interrupt
> > without running a driver-supplied ISR. But that’s another
> > discussion, particularly because we’ve gotten that fixed in
> > PCI 2.3.) When processor A gets to the end of the chain, it
> > acks the interrupt, even though no ISR has claimed it. Since
> > PCI interrupts are level-triggered, this will cause the
> > interrupt to be immediately re-asserted. Most SMP machines
> > are built with APIC interrupt controllers that will re-assert
> > this interrupt on the same processor that just failed to
> > handle it, causing processor A to now go into an endless
> > loop, failing to handle the interrupt. Processor A will
> > remain at the associated Device IRQL, never dropping down low
> > enough to handle DPCs. The machine will continue to run only
> > as long as it doesn’t depend on processor A to handle a DPC.
> > Dependencies of this sort usually take between 30 and 90
> > seconds to crop up. At that point, the machine will appear
> > completely hung to the user.
> >
> > The point I’m trying to make is that you should never do
> > anything with your affinity mask other than just pass it through.
> >
> > 3. This seems harmless, though it seems just as likely to
> > reduce the performance of the machine.
> >
> > 4. If you have that much control over your environment, this
> > might be the way to go. But, again, if you do this at high
> > IRQL, you’ll eventually deadlock the machine. If you do it
> > at low IRQL, you may still have latency problems.
> >
> > 5. The DIRQL that you connect with is used mainly when you
> > call KeSynchronizeExecution, so that the system can take the
> > spinlock at the right IRQL. The DIRQL that your ISR is
> > called at is determined by the vector that you’re attached
> > to. You can’t help that.
> >
> > The latency between when a device interrupts and when its ISR
> > is called is mostly a matter of waiting for other ISRs at
> > equal or greater IRQL. Or it’s a matter of waiting for code
> > that has explicitly raised IRQL.
> >
> > - Jake Oshins
> > (the guy who maintains interrupt-related stuff in the NT kernel)
> >
> > -----Original Message-----
> > Subject: ISR Latency question
> > From: “Assaf Wodeslavsky”
> > Date: Tue, 25 Dec 2001 00:21:06 +0200
> > X-Message-Number: 1
> >
> > if you want to reduce latency do this:
> > 1. use an exclusive vector when connecting.
> > Otherwise, the kernel will loop through all registered
> > ISRs (on your shared vector), having each one query its hardware.
> > 2. use affinity when connecting if you are on SMP.
> > 3. call your ISR in the background!!!
> > to make sure that the TLB, code cache and data cache are
> > not flushing your ISR away.
> > 4. you might consider running on SMP and not using interrupts at all;
> > simply busy loop on one processor waiting for the int# condition
> > (OSR’s book hints at this).
> > 5. you might want to connect at a higher DIRQL than the
> > system assigns you, if you feel other devices are bothering you.
> >
> > the bottom line is to understand, in my opinion, that
> > interrupt latency is not an issue of how long before the CPU
> > gets interrupted! That happens very quickly (and is 100%
> > deterministic; you should be able to find real numbers in
> > the PC manufacturers’ data sheets),
> >
> > but rather the time it takes the CPU to translate the IDT
> > virtual address into a physical address, and then to access
> > variables from RAM, needing to translate each one’s address
> > from virtual to physical (requiring 3 RAM accesses).
> >
> > Also, if using shared IRQs, the ISR installed in the IDT has to loop
> > through all the ISRs on the same IRQ, each one needing to go
> > through the same story.
> >
> > So, in my opinion, it is not hardware interrupt latency that
> > is producing the large delays we always hear about, but
> > rather operating system issues, such as searching for the
> > proper ISR, TLB misses, and cache effects.
> >
> > call me if you want to discuss this further: 056-657-169
> > regards to Asher, Mimi, Yeoshua and the rest of the gang at
> > Excalibur.
> > Assaf
> >
>
> —
> You are currently subscribed to ntdev as: xxxxx@syssoftsol.com
> To unsubscribe send a blank email to leave-ntdev-$subst(‘Recip.MemberIDChar’)@lists.osr.com