IRQLs

> b. The original Linux kernel was influenced by Win95 and some PC Unixes,

By some other old UNIXen I think.

They also have spin_lock_irqsave() or such.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

>the extent Windows does, it still prioritizes software interrupts to one another. However, it does not
> write to TPR - instead, it implements the whole thing in software.

I think in Windows this is a minor detail of the HAL’s inner workings.


Maxim S. Shatskih
Windows DDK MVP

Maxim S. Shatskih wrote:

> b. The original Linux kernel was influenced by Win95 and some PC Unixes,

> By some other old UNIXen I think.

Sorry, I shouldn’t have written “kernel”. I meant, rather, the specific
area of handling the interrupt controller, IRQs vs “bottom halves”,
given the PC architecture of pre-Pentium, pre-APIC times (the multiprocessor
standard arrived later, no reentrancy in the kernel …).
But the PC hardware market was already dominated and shaped by Wintel.

> They also have spin_lock_irqsave() or such.

Yep. Similar to the “special” spinlocks at HIGH_LEVEL that Jake O.
wrote about?

– pa

> I meant rather the specific area of handling the interrupt controller, IRQs vs “bottom halves”,
> given the PC arch of pre-pentium, pre APIC times (the multiprocessor standard arrived later,
> no reentrancy in kernel …)

You seem to confuse reentrancy with MP issues. Please note that reentrancy is NOT an issue specific to MP kernels; it exists on a UP kernel as well. Just consider the scenario in which an interrupt occurs while non-DPC code is accessing a shared resource, and the ISR queues a DPC that wants to access the same resource.

Therefore, reentrancy is still there, even on a UP kernel. The only difference is that on a UP machine you can solve this problem simply by raising IRQL (or just disabling interrupts), without any need to spin on a spinlock, while an MP machine requires a full-fledged spinlock. Unless the target machine is equipped with an APIC, the only way to raise IRQL is to implement the whole thing in software, whereas with an APIC you can simply write to the local APIC’s TPR…
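A minimal user-mode sketch of the UP case described above, assuming a single CPU whose IRQL is a plain variable maintained in software (no APIC, no TPR). Everything prefixed `sim_` is hypothetical and only models the idea; it is not the WDK API and not how any particular HAL is written. The level values follow the ones quoted later in this thread (PASSIVE=0, APC=1, DISPATCH=2). Thread code does a read-modify-write on a shared counter, an "ISR" fires in the middle and requests a DPC that touches the same counter; raising IRQL to DISPATCH_LEVEL defers the DPC until IRQL is lowered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical uniprocessor model: IRQL is an ordinary variable, not the APIC TPR. */
typedef unsigned char sim_irql_t;
enum { SIM_PASSIVE_LEVEL = 0, SIM_APC_LEVEL = 1, SIM_DISPATCH_LEVEL = 2 };

static sim_irql_t current_irql = SIM_PASSIVE_LEVEL;
static bool dpc_pending = false;  /* a requested DISPATCH-level software interrupt */
static int shared_counter = 0;    /* resource touched by both thread code and the DPC */
static int dpc_runs = 0;

/* The "DPC": it modifies the same shared resource as the thread code. */
static void sim_dpc_routine(void) {
    shared_counter += 100;
    dpc_runs++;
}

/* Deliver the pending DISPATCH-level software interrupt if IRQL now allows it. */
static void sim_check_pending(void) {
    if (dpc_pending && current_irql < SIM_DISPATCH_LEVEL) {
        sim_irql_t saved = current_irql;
        dpc_pending = false;
        current_irql = SIM_DISPATCH_LEVEL;  /* DPCs run at DISPATCH_LEVEL */
        sim_dpc_routine();
        current_irql = saved;
    }
}

/* What the ISR does: request the DPC. In reality it would run when the interrupt
   returns and IRQL drops below DISPATCH_LEVEL; here that check happens at once. */
static void sim_isr_queues_dpc(void) {
    dpc_pending = true;
    sim_check_pending();
}

static sim_irql_t sim_raise_irql(sim_irql_t new_irql) {
    sim_irql_t old = current_irql;
    assert(new_irql >= old);
    current_irql = new_irql;
    return old;
}

static void sim_lower_irql(sim_irql_t new_irql) {
    assert(new_irql <= current_irql);
    current_irql = new_irql;
    sim_check_pending();  /* lowering IRQL is where deferred work gets delivered */
}

static void sim_reset(void) {
    current_irql = SIM_PASSIVE_LEVEL;
    dpc_pending = false;
    shared_counter = 0;
    dpc_runs = 0;
}

/* Thread code: read-modify-write on the shared counter, interrupted mid-update. */
static int sim_update_counter(bool protect) {
    sim_irql_t old = SIM_PASSIVE_LEVEL;
    if (protect)
        old = sim_raise_irql(SIM_DISPATCH_LEVEL);
    int snapshot = shared_counter;  /* read */
    sim_isr_queues_dpc();           /* the interrupt arrives here */
    shared_counter = snapshot + 1;  /* write back */
    if (protect)
        sim_lower_irql(old);
    return shared_counter;
}
```

Unprotected, `sim_update_counter(false)` returns 1 because the DPC's +100 is overwritten; after `sim_reset()`, `sim_update_counter(true)` returns 101, with no spinning involved. On an MP machine raising IRQL alone would not be enough, since a DPC can run on another processor at the same time, which is why a real spinlock is needed there.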

Anton Bassov

>> They also have spin_lock_irqsave() or such.

> yep. similar to these “special” spinlocks at HIGH_LEVEL that Jake O.

No, similar to KeAcquireInterruptSpinLock.


Maxim S. Shatskih
Windows DDK MVP

>>> They also have spin_lock_irqsave() or such.

> yep. similar to these “special” spinlocks at HIGH_LEVEL that Jake O.

> No, similar to KeAcquireInterruptSpinLock.

Actually, Pavel is absolutely correct: spin_lock_irqsave() and spin_lock_irq() disable all interrupts on a given CPU; spin_lock_bh() disables software interrupts, but hardware ones remain enabled; and spin_lock() does not disable any interrupts. In addition to the above-mentioned “conventional” spinlocks, the Linux kernel provides reader/writer forms of spinlocks as well. There is no equivalent of KeAcquireInterruptSpinLock() under Linux, because Linux does not support the concept of IRQL to the extent Windows does. Furthermore, it does not really support interrupt spinlocks in the sense a Windows programmer understands them; instead, it allows a spinlock to be acquired right in the ISR…
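A user-mode model of the save/restore difference between `spin_lock_irqsave()` and `spin_lock_irq()`, assuming a single CPU whose local interrupt-enable flag is a plain `bool`. The `sim_` functions are hypothetical stand-ins, not kernel code: the real `spin_lock_irqsave()` is a macro that takes its `flags` argument (an `unsigned long`) by name rather than through a pointer, and `spin_lock_bh()` (softirqs) is not modeled here. What it shows is that the `irqsave` form restores whatever interrupt state the caller had, while the `irq` form unconditionally re-enables interrupts on unlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU's local interrupt-enable flag; not kernel code. */
static bool sim_irqs_enabled = true;

typedef struct { bool held; } sim_spinlock_t;

static void sim_spin_lock(sim_spinlock_t *l) {
    assert(!l->held);  /* on a single CPU, a second acquire would spin forever */
    l->held = true;
}

static void sim_spin_unlock(sim_spinlock_t *l) { l->held = false; }

/* Like spin_lock_irqsave(): remember the previous state, then disable. */
static void sim_spin_lock_irqsave(sim_spinlock_t *l, bool *flags) {
    *flags = sim_irqs_enabled;
    sim_irqs_enabled = false;
    sim_spin_lock(l);
}

static void sim_spin_unlock_irqrestore(sim_spinlock_t *l, bool flags) {
    sim_spin_unlock(l);
    sim_irqs_enabled = flags;  /* restore whatever the caller had */
}

/* Like spin_lock_irq(): disable, then unconditionally re-enable on unlock. */
static void sim_spin_lock_irq(sim_spinlock_t *l) {
    sim_irqs_enabled = false;
    sim_spin_lock(l);
}

static void sim_spin_unlock_irq(sim_spinlock_t *l) {
    sim_spin_unlock(l);
    sim_irqs_enabled = true;  /* even if the caller had interrupts disabled */
}
```

If a caller that already has interrupts disabled takes and releases a lock with the `irqsave` pair, interrupts stay disabled afterwards; with the `irq` pair they come back on behind the caller's back. That is why `spin_lock_irq()` is only appropriate when the caller knows interrupts are enabled. The pattern is loosely analogous to the `OldIrql` value that `KeAcquireSpinLock()` stores and `KeReleaseSpinLock()` restores on Windows.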

Anton Bassov

I was reading the white papers mentioned above from Jake.

It says: “The thread scheduler considers only thread priority, and not IRQL, when preempting a thread. If a thread running at IRQL=APC_LEVEL blocks, the scheduler might select a new thread for the processor that was previously running at PASSIVE_LEVEL.”

I’m kind of confused here. Say a thread T1 is running at PASSIVE_LEVEL and an APC_LEVEL interrupt occurs, so the thread runs at APC_LEVEL. It will run at most until the quantum allotted by the scheduler expires. Now, what if a higher-priority thread T2 becomes ready to run before the quantum expires? According to the white paper, T2 will run at APC_LEVEL, right? But what if T2 was supposed to run code at a lower IRQL? How will that be taken care of, and who will do it?

>expires. As per white paper T2 will run at APC_LEVEL. right?

No. APC_LEVEL is per-thread.


Maxim S. Shatskih
Windows DDK MVP

Nice discussion on this topic.
Can you guys suggest links/books/places where I can read more about
interrupts and/or the concept of IRQL, and how it is implemented in
hardware/software?

With respect,
Gabriel Bercea

GaMiTech Software Development
Mobile contact: (+40)0740049634
eMail: xxxxx@gmail.com


NTDEV is sponsored by OSR


The quote means exactly what it says: there is no relationship between IRQL and thread priority whatsoever. APC_LEVEL simply means that APC delivery to the currently running thread is disabled…

Anton Bassov

xxxxx@yahoo.com wrote:

> I was reading the white papers mentioned above from Jake.
>
> It says: “The thread scheduler considers only thread priority, and not IRQL, when preempting a thread. If a thread running at IRQL=APC_LEVEL blocks, the scheduler might select a new thread for the processor that was previously running at PASSIVE_LEVEL.”
>
> I’m kind of confused here. Say a thread T1 is running at PASSIVE_LEVEL and an APC_LEVEL interrupt occurs, so the thread runs at APC_LEVEL. It will run at most until the quantum allotted by the scheduler expires. Now, what if a higher-priority thread T2 becomes ready to run before the quantum expires? According to the white paper, T2 will run at APC_LEVEL, right? But what if T2 was supposed to run code at a lower IRQL? How will that be taken care of, and who will do it?

APC level is special: it is the only “software interrupt” level below
DISPATCH (DISPATCH=2, APC=1). Thread scheduling occurs at DISPATCH level
(and this is why this level is called “dispatch”).
Therefore, a thread at APC level can be preempted, and the new active
thread in its turn can be at any IRQL < DISPATCH (APC or passive).

regards,
–pa

> APC level is special: it is the only “software interrupt” level below
> DISPATCH (DISPATCH=2, APC=1). Thread scheduling occurs at DISPATCH level
> (and this is why this level is called “dispatch”).
> Therefore, a thread at APC level can be preempted, and the new active
> thread in its turn can be at any IRQL < DISPATCH (APC or passive).

A vaguely related question: I’m looking at a BSoD screen right now
which shows “0xD1, (x, 0xD0000002, y, z)”. The second number is supposed to
be the IRQL, which appears to be DISPATCH_LEVEL + 0xD0000000. Is there
any significance to the 0xD0000000?

James

>which is “0xD1, (x, 0xD0000002, y, z)”. The second number is supposed to
> be the IRQL, which appears to be DISPATCH_LEVEL + 0xD0000000. Is there
> any significance of the 0xD0000000?

Maybe something added by Verifier?


Maxim S. Shatskih
Windows DDK MVP

> APC level is special: it is the only “software interrupt” level below DISPATCH (DISPATCH=2, APC=1).

In fact, I just wonder why they chose to use a software interrupt to deal with APCs in the first place. More on it below…

> Therefore, a thread at APC level can be preempted, and the new active thread in its turn can
> be at any IRQL < DISPATCH (APC or passive).

As long as you implement the concept of software interrupts entirely in software, there is no problem here whatsoever. However, if you use the ICR to request interrupts and the TPR to mask them off, making things work on a per-thread basis becomes rather problematic. Apparently, they just implement the APC software interrupt entirely in software, without relying upon built-in CPU features…

Anton Bassov

Anton, you frequently get hung up on how things are implemented. I applaud
your drive and ability to read the disassembled functions. That’s useful
and educational. But it doesn’t tell you the whole story. I know you won’t
listen to me, but I’ll answer for the rest who might be reading.

The contract on APCs is that a thread can be interrupted, before its quantum
expires, by an APC interrupt sent by another processor or by the current
processor while servicing a higher-priority interrupt. Whether this is
implemented in hardware or software doesn’t matter at all. The contract is
what’s important. (With that said, the contract on APCs has shifted a
little over the years, but mostly in ways that have no effect on drivers.)

The APC exists so that a running thread can be hijacked by another piece of
code which will be run within the context of the thread. The most common
uses of this are I/O completion and thread suspension from a debugger.

The current IRQL of a thread becomes part of the thread’s running state.
When the thread is pre-empted, the dispatcher (scheduler) will change the
IRQL of the processor to that of the newly scheduled thread. When the first
thread is scheduled again, IRQL will be restored.
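A small model of the behavior described above, under simplifying assumptions: one processor; a scheduler that picks the highest-priority ready thread without looking at IRQL (as the white-paper quote earlier in the thread says); preemption allowed only while the processor is below DISPATCH_LEVEL (as pa noted); and IRQL saved into the outgoing thread and reloaded from the incoming one at each switch. The `sim_` names and the `saved_irql` field are hypothetical; the real Windows dispatcher is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

enum { SIM_PASSIVE_LEVEL = 0, SIM_APC_LEVEL = 1, SIM_DISPATCH_LEVEL = 2 };

/* Hypothetical thread record: IRQL is stored here while the thread is not running. */
typedef struct {
    const char *name;
    int priority;
    unsigned char saved_irql;
    int ready;  /* 1 if runnable but not currently running */
} sim_thread_t;

typedef struct {
    sim_thread_t *running;
    unsigned char irql;  /* the processor's current IRQL */
} sim_cpu_t;

/* Choose the highest-priority ready thread; IRQL plays no part in the choice. */
static sim_thread_t *sim_pick_next(sim_thread_t *threads, size_t n) {
    sim_thread_t *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (threads[i].ready && (best == NULL || threads[i].priority > best->priority))
            best = &threads[i];
    return best;
}

/* Preemption is only possible while the processor is below DISPATCH_LEVEL. */
static int sim_can_preempt(const sim_cpu_t *cpu) {
    return cpu->irql < SIM_DISPATCH_LEVEL;
}

/* Context switch: IRQL travels with the thread, not with the processor. */
static void sim_switch_to(sim_cpu_t *cpu, sim_thread_t *next) {
    assert(sim_can_preempt(cpu));
    cpu->running->saved_irql = cpu->irql;  /* save the outgoing thread's IRQL */
    cpu->running->ready = 1;               /* preempted, still runnable */
    next->ready = 0;
    cpu->running = next;
    cpu->irql = next->saved_irql;          /* load the incoming thread's IRQL */
}
```

For example, if T1 (priority 8) is preempted at APC_LEVEL by a ready T2 (priority 12) whose saved IRQL is PASSIVE_LEVEL, the processor drops to PASSIVE_LEVEL while T2 runs and returns to APC_LEVEL when T1 is switched back in. T2 never inherits T1’s APC_LEVEL.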


Jake Oshins
Hyper-V I/O Architect
Windows Kernel Team

This post implies no warranties and confers no rights.


Jake,

> Anton, you frequently get hung up on how things are implemented.
> I applaud your drive and ability to read the disassembled functions. That’s useful and educational.

Well, whenever I encounter this or that feature, the very first thing I try to do is to understand the problem domain, including “all these small nasty details”, and then ask myself “how would I implement it?”. The only reason I disassemble anything is to make sure that I understand the problem domain properly…

On this particular occasion I did not even have to disassemble anything: it is plainly obvious that relying upon the ICR and TPR in handling APCs is rather problematic. More on it below.

> The contract on APCs is that a thread can be interrupted, before its quantum expires, by an APC
> interrupt sent by another processor or by the current processor while servicing a higher-priority
> interrupt. Whether this is implemented in hardware or software doesn’t matter at all.
> The contract is what’s important.

Well, although the most important part is, of course, the contract, I would not say “Whether this is implemented in hardware or software doesn’t matter at all”; it does matter, and you will see why shortly.

> The current IRQL of a thread becomes part of the thread’s running state. When the thread
> is pre-empted, the dispatcher (scheduler) will change the IRQL of the processor to that of the
> newly scheduled thread. When the first thread is scheduled again, IRQL will be restored.

Ironically, this is exactly my point. If you rely upon the ICR and TPR in handling APCs, consider what happens when thread X runs at APC level, an APC gets queued to it by another processor, and then X gets preempted by thread Y, which runs at PASSIVE_LEVEL. The CPU itself does not know anything about threads, right? Therefore, the pending software interrupt will fire as soon as IRQL goes down to PASSIVE_LEVEL (i.e. as soon as the corresponding write to the TPR is made). As a result, the APC will get delivered to the wrong thread (Y instead of X).

Therefore, you have to make extra provisions to ensure that the above scenario never occurs, and handling the whole thing purely in software seems to be the most reasonable approach here…
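A sketch of the scenario described above, under simplifying assumptions: a single processor and two hypothetical ways of queueing an APC. `naive_queue_apc()` only sets a per-CPU pending flag, standing in for an interrupt requested through the ICR and masked by the TPR, so it does not remember which thread the APC was meant for; `sw_queue_apc()` attaches the APC to the target thread itself. All names are illustrative, and this does not claim to show how Windows actually queues or delivers APCs.

```c
#include <assert.h>

enum { SIM_PASSIVE_LEVEL = 0, SIM_APC_LEVEL = 1 };

typedef struct {
    const char *name;
    unsigned char saved_irql;  /* IRQL while not running */
    int apcs_queued;           /* per-thread APC queue (just a count here) */
    int apcs_run;              /* APCs that actually ran in this thread's context */
} sim_thread_t;

typedef struct {
    sim_thread_t *running;
    unsigned char irql;
    int apc_interrupt_pending;  /* naive variant: per-CPU, knows nothing about threads */
} sim_cpu_t;

/* Naive variant: queueing just requests a per-CPU interrupt. */
static void naive_queue_apc(sim_cpu_t *cpu, sim_thread_t *target) {
    (void)target;  /* the pending interrupt does not record the target thread */
    cpu->apc_interrupt_pending = 1;
}

/* Software variant: the APC is attached to the target thread. */
static void sw_queue_apc(sim_thread_t *target) {
    target->apcs_queued++;
}

/* Runs whenever IRQL may have dropped (after a switch or an explicit lower). */
static void sim_deliver(sim_cpu_t *cpu) {
    if (cpu->irql >= SIM_APC_LEVEL)
        return;                                /* APC_LEVEL masks APC delivery */
    if (cpu->apc_interrupt_pending) {          /* fires for whoever runs now */
        cpu->apc_interrupt_pending = 0;
        cpu->running->apcs_run++;
    }
    while (cpu->running->apcs_queued > 0) {    /* only the owner's own APCs */
        cpu->running->apcs_queued--;
        cpu->running->apcs_run++;
    }
}

static void sim_switch_to(sim_cpu_t *cpu, sim_thread_t *next) {
    cpu->running->saved_irql = cpu->irql;
    cpu->running = next;
    cpu->irql = next->saved_irql;
    sim_deliver(cpu);
}

static void sim_lower_irql(sim_cpu_t *cpu, unsigned char new_irql) {
    assert(new_irql <= cpu->irql);
    cpu->irql = new_irql;
    sim_deliver(cpu);
}
```

With the naive variant, an APC is queued to X while it runs at APC_LEVEL, Y is switched in at PASSIVE_LEVEL, and the pending interrupt fires in Y’s context. With the per-thread queue, nothing runs in Y, and X’s APC runs only once X itself drops below APC_LEVEL.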

Anton Bassov