How is IRQL reflected in hardware?

I am just reading the MS white paper “Scheduling, Thread Context and IRQL”
and am totally confused about what IRQL is.

Are IRQL changes reflected in any HW (CPU, APIC, etc.) registers?
How are PASSIVE_LEVEL, APC_LEVEL and DISPATCH_LEVEL implemented? They are
lower than the device interrupt levels,
so I don’t think they have a mapping in HW.

Thanks.

Which hardware changes (if any) take effect when IRQL is raised or lowered depends on the platform and the HAL. The list archives have some discussions about this which you could peruse. In general, though, no code you write should depend on what does (or doesn’t) happen hardware-wise when IRQL changes.

PASSIVE_LEVEL/APC_LEVEL can be thought of as specific to a thread, versus specific to a processor for DISPATCH_LEVEL and above.

- S

-----Original Message-----
From: xxxxx@viatech.com.cn
Sent: Wednesday, June 24, 2009 00:26
To: Windows System Software Devs Interest List
Subject: [ntdev] How IRQL is reflected in hardware?



NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

Dear Skywing, thanks for your kind reply.

Am I right to say that PASSIVE_LEVEL/APC_LEVEL are a pure software matter
handled by the OS kernel, and that they don’t correspond to any HW registers?
Am I right to say that for DISPATCH_LEVEL and above, each IRQL corresponds
to an actual interrupt (and IDT entry), and that IRQL changes are
reflected in HW registers such as the IMR?

Thanks a lot.


For XP, and for Windows 2003 before SP2, the TASKPRI (TPR) register was
used to reflect the current IRQL.

For Windows 2003 SP2 and beyond, the IRQL is entirely software driven,
which is a bit faster in normal operation (I think) but much much faster
in a virtualised environment (every read and write to TPR gets ‘caught’
by the hypervisor and so has to be emulated, and it gets read and
written many many times per second!)

James

Thank you, James.

> For Windows 2003 SP2 and beyond, the IRQL is entirely software driven

If IRQL is entirely SOFTWARE driven, how can the operating system guarantee that a higher IRQL can’t be interrupted by a lower IRQL?
IMHO, this kind of rule should be enforced by hardware.

Regards,
HW


Just because Windows is at a high IRQL when a lower priority interrupt
is asserted doesn’t mean that Windows has to do anything other than set
a flag and go back to what it was doing. When the IRQL is lowered,
Windows can then pick up where it left off and service the interrupt
properly.

I guess the powers that be decided that, on balance, the overhead of
fiddling with the TPR register all the time was more than the overhead
of running a small amount of ISR code in the case where an interrupt is
asserted while the system is technically at a higher IRQL. This is
certainly true in the case of virtualisation, and probably true given
that time spent at DIRQL is supposed to be very short and so the chance
of an interrupt occurring during that time should be low.

Please don’t take the above as gospel though, this is based on my
observations and stuff I’ve read here and there. The advantage of this
list in particular though is that sometimes you are more likely to get
correct information by stating something incorrectly than you are by
simply asking a question, so if I’m wrong I’m sure someone will step up
and correct me :)

James
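
The "set a flag and go back to what it was doing" idea described above can be sketched as a small user-mode simulation. This is only an illustration under stated assumptions: the names (`SimCpu`, `sim_interrupt`, `sim_raise_irql`, `sim_lower_irql`) are hypothetical and are not Windows kernel or HAL symbols, and nothing here claims to match what any particular HAL actually does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software-only IRQL model. None of these names are real
   Windows kernel or HAL symbols; the level numbers are illustrative. */
enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2, MAX_LEVEL = 31 };

typedef struct {
    int      irql;                    /* current software IRQL of this CPU */
    uint32_t pending;                 /* bit n set: level-n interrupt was deferred */
    int      serviced[MAX_LEVEL + 1]; /* how many times each level's ISR has run */
} SimCpu;

/* Run the (simulated) ISR for a level at that level, then restore the IRQL. */
static void sim_run_isr(SimCpu *cpu, int level) {
    int old = cpu->irql;
    cpu->irql = level;
    cpu->serviced[level]++;
    cpu->irql = old;
}

/* Called when the "hardware" delivers an interrupt at the given level. */
void sim_interrupt(SimCpu *cpu, int level) {
    assert(level > PASSIVE_LEVEL && level <= MAX_LEVEL);
    if (level <= cpu->irql) {
        /* Masked by software only: record it and return to the interrupted code. */
        cpu->pending |= (uint32_t)1 << level;
        return;
    }
    sim_run_isr(cpu, level);
}

/* Raising IRQL touches no hardware in this model; it only updates a variable. */
int sim_raise_irql(SimCpu *cpu, int new_irql) {
    assert(new_irql >= cpu->irql && new_irql <= MAX_LEVEL);
    int old = cpu->irql;
    cpu->irql = new_irql;
    return old;
}

/* Lowering IRQL replays deferred interrupts that are no longer masked,
   highest level first. */
void sim_lower_irql(SimCpu *cpu, int new_irql) {
    assert(new_irql <= cpu->irql && new_irql >= PASSIVE_LEVEL);
    cpu->irql = new_irql;
    for (int level = MAX_LEVEL; level > new_irql; level--) {
        uint32_t bit = (uint32_t)1 << level;
        if (cpu->pending & bit) {
            cpu->pending &= ~bit;
            sim_run_isr(cpu, level);
        }
    }
}
```

In this model, an interrupt at level 5 that arrives while the CPU is at level 10 only sets a bit in `pending`; its ISR runs later, from inside `sim_lower_irql`, once the IRQL drops below 5. Raising the IRQL stays a plain memory write, which is the cost trade-off discussed in the message above.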


Forgot to mention, I’m only speaking about ACPI kernels here. The
non-ACPI kernels may well behave differently.

James

According to Russinovich it follows a concept of lazy IRQL in which the IRQL
does not always match the state of the TPR. If a lower level interrupt
occurs, the TPR is updated and control is returned to the higher level
interrupt routine. There are some discussions in the archives about this.

//Daniel


Under 2003 SP2 on an ACPI kernel (UP or MP), I can definitely say that the TPR
is never touched aside from very very early in boot.

For XP under Xen, I patch the kernel in one of the following ways:
* AMD - replace any TPR access with a LOCK MOV CR0 instruction (LOCK MOV
CR0 == MOV CR8, which is a hardware lazy TPR that doesn’t require an
exit to the hypervisor. Intel doesn’t have exactly the same thing; it
has something similar, but it works from the hypervisor).
* Intel, without hardware assist - cache the TPR in memory and read it from
there. Writes are still slow.

James
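
The "cache TPR in memory and read it from there" approach mentioned above amounts to a write-through shadow copy. The following is a rough user-mode sketch of that idea only, not the actual patch: `SimTpr`, `sim_tpr_write` and `sim_tpr_read` are hypothetical names, and `trap_count` is an assumed stand-in for the cost of an exit to the hypervisor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a trapping register with an in-memory shadow copy. */
typedef struct {
    uint32_t hw_tpr;      /* stand-in for the real register (every access traps) */
    uint32_t shadow_tpr;  /* cached copy kept in ordinary memory */
    unsigned trap_count;  /* simulated number of exits to the hypervisor */
} SimTpr;

/* Writes still go to the "hardware", so they still pay for a trap,
   but they also refresh the shadow copy. */
void sim_tpr_write(SimTpr *t, uint32_t value) {
    assert(t != 0);
    t->trap_count++;
    t->hw_tpr = value;
    t->shadow_tpr = value;
}

/* Reads are served from memory and never trap. */
uint32_t sim_tpr_read(const SimTpr *t) {
    assert(t != 0);
    return t->shadow_tpr;
}
```

With this arrangement, one write followed by any number of reads costs a single simulated trap, which matches the remark that reads become cheap while writes are still slow.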

xxxxx@viatech.com.cn wrote:

> If IRQL is entirely SOFTWARE driven, how could the operating system guarantee that a higher IRQL can’t be interrupted by a lower IRQL?
> IMHO, these kinds of rules should be enforced by hardware.

The operating system guarantees that a THREAD with a lower IRQL will
never pre-empt a thread with a higher IRQL. That’s completely under the
operating system’s control, handled by the scheduler.

Even if you get a hardware interrupt from a device with a lower IRQL
(that is managed in hardware on some of the HALs, at least), that
device’s ISR will not be allowed to run until all higher IRQL threads
have finished.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> If IRQL is entirely SOFTWARE driven, how could the operating system guarantee that a
> higher IRQL can’t be interrupted by a lower IRQL?

Actually, it can be done pretty easily…

The very first idea that comes to mind is to have the interrupt stub first disable the IRQ and acknowledge the interrupt to the controller, and then transfer control to a software-implemented interrupt dispatcher that re-enables interrupts on the given CPU and invokes the ISRs. If you do it this way, the OS software decides when and how each ISR gets invoked…

Anton Bassov


Yes, but there’s a subtle difference between his question and your
answer, and it is exactly the difference I tried to point out in my
reply. The system doesn’t really need to guarantee that you can’t be
INTERRUPTED by a lower IRQL device (and Windows does not make that
guarantee). It just needs to guarantee that control will never be
transferred to a lower IRQL thread.

As long as we think of the operating system’s core interrupt handler as
being the highest IRQL thread, I think this mental model works perfectly
well.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Hmm… I assume that you meant interrupt service routine and not thread?

PASSIVE_LEVEL threads can still be scheduled ahead of APC_LEVEL threads. Otherwise, an APC_LEVEL thread wouldn’t be able to wait for a non-zero time.

- S


Skywing wrote:

> Hmm… I assume that you meant interrupt service routine and not thread?

Yes, I spoke a bit loosely. For purposes of building a “mental model”,
one can think of an ISR as simply being a high priority thread, even
though we know that it actually hijacks another thread.

> PASSIVE_LEVEL threads can still be scheduled ahead of APC_LEVEL threads. Otherwise, an APC_LEVEL thread wouldn’t be able to wait for a non-zero time.

Well, a thread that is waiting is not ready to run, and will never be
scheduled, no matter how high its priority is. A PASSIVE_LEVEL thread
will not be scheduled as long as an APC_LEVEL thread is ready to run.
Agreed?


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

No, as far as I know (and from a quick glance at things to double check), the wait IRQL doesn’t have any bearing on that.

- S


I know YOU know the difference, but by way of explaining how Windows works, I’m concerned this type of explanation could hopelessly confuse the newbs and Linuxites.

Again, I know Tim knows this, but for the archives and all of posterity: Interrupts do not run on an “interrupt thread” – they run in an arbitrary thread context, at the (synchronize IRQL) of the associated ISR.

Peter
OSR

> Well, a thread that is waiting is not ready to run, and will never be scheduled, no matter how high
> its priority is. A PASSIVE_LEVEL thread will not be scheduled as long as an APC_LEVEL thread
> is ready to run.

Actually, I have always thought that raising IRQL to APC_LEVEL simply disables APC delivery to the given thread, without any implications for its priority relative to other threads in the system…

Anton Bassov

> raising IRQL to APC_LEVEL IRQL results simply in disabling APC delivery
> to a given thread

Yep, APC_LEVEL is thread-specific and everything above it is processor-specific,
per http://msdn.microsoft.com/en-us/library/ms810029.aspx:


Processor-specific and Thread-specific IRQLs

As previously mentioned, the system’s thread scheduler runs at
IRQL=DISPATCH_LEVEL. IRQLs at or above DISPATCH_LEVEL are processor
specific. Hardware and software interrupts at these levels are targeted at
individual processors. The following processor-specific IRQLs are commonly
used by drivers:

* DISPATCH_LEVEL
* DIRQL
* HIGHEST_LEVEL

IRQLs below DISPATCH_LEVEL are thread-specific. Software interrupts at these
levels are targeted at individual threads. Drivers use the following
thread-specific IRQLs:

* PASSIVE_LEVEL
* APC_LEVEL

The thread scheduler considers only thread priority, not IRQL, when
preempting a thread. If a thread running at IRQL=APC_LEVEL blocks, the
scheduler might select a new thread for the processor that was previously
running at PASSIVE_LEVEL.
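
To make the last quoted paragraph concrete, here is a toy model of thread selection in which the thread-specific IRQL is stored but deliberately never consulted. It is a sketch under stated assumptions, not the real dispatcher: `SimThread`, `SimState` and `sim_pick_next` are hypothetical names, and real scheduling involves far more (ready queues, affinity, quantum, and so on).

```c
#include <assert.h>
#include <stddef.h>

typedef enum { THREAD_READY, THREAD_WAITING } SimState;

/* Hypothetical thread record for illustration only. */
typedef struct {
    int      priority;  /* scheduling priority: the only input to selection */
    int      irql;      /* thread-specific IRQL (0 = PASSIVE_LEVEL, 1 = APC_LEVEL) */
    SimState state;     /* waiting threads are never candidates */
} SimThread;

/* Returns the index of the ready thread with the highest priority,
   or -1 if no thread is ready. The irql field is intentionally ignored. */
int sim_pick_next(const SimThread *threads, size_t count) {
    assert(threads != NULL || count == 0);
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (threads[i].state != THREAD_READY)
            continue;
        if (best < 0 || threads[i].priority > threads[best].priority)
            best = (int)i;
    }
    return best;
}
```

In this model a ready PASSIVE_LEVEL thread at priority 12 is chosen over a ready APC_LEVEL thread at priority 8, and a blocked thread is skipped regardless of either value, which is the behaviour the quoted text describes.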


In fact, the guy who taught me how it all worked used to phrase it by saying
that interrupts and DPCs weren’t associated with any thread at all. He’d
use phrases like “processor 2 is running a DPC and processor 0 is running a
thread.”

Over the years, and depending on which port of NT we’re talking about, DPCs
and interrupts might run on the stack of whatever thread got interrupted or
they might run on their own stack. This isn’t architectural, merely an
implementation detail in a particular release.


Jake Oshins
Hyper-V I/O Architect
Windows Kernel Group

This post implies no warranties and confers no rights.



That’s true. IRQL is not part of scheduling, it’s part of interrupt and
lock management. It’s a mistake to think that IRQL has to do with the
scheduler at all, except to note that the scheduler runs at DISPATCH_LEVEL,
so when you’re running at DISPATCH_LEVEL or higher your code can’t be
interrupted by the scheduler.


Jake Oshins
Hyper-V I/O Architect
Windows Kernel Group

This post implies no warranties and confers no rights.

