Intrinsics/Cost of KeGetCurrentIrql()

Hi

Since one reason KeAcquireSpinLockAtDpcLevel() is efficient (and, for the queued part, one reason in-stack queued spin locks are) is the cost of querying the current IRQL, please let me know why querying the IRQL is a costly operation.
I am assuming this is a processor-specific attribute, so why is it costly?
What happens in the OS/processor during this call that makes it costly?

Or is KeGetCurrentIrql() not costly by itself, but only when the OS does it as part of a containing construct such as KeAcquireSpinLock or an interrupt lock, because the processor IRQL is then stored somewhere to track the lock context and needs to be synchronized with the other processors?

Thanks

Some of the information about the cost of setting/querying the IRQL may be out of date: it relates to the hardware implementation of IRQL, where a change of IRQL required accessing a hardware register (the local APIC Task Priority Register, TPR). I have never measured this, but I suspect it's not nearly as costly as it used to be (since Windows Server 2003 SP2), because Windows no longer uses that hardware register and instead does its magic a different way.

This is ntdev, so you can be sure that someone will correct me if I'm wrong :-)

James

Usually, IRQL is the APIC TPR, which is cheap to query.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

AFAICT, the TPR is shadowed in CR8 on x64, so it's 'fast', and x86 is
probably doing some magic register remapping internally for TPR
accesses, since the APIC is 'local' to the CPU.

t.

--
NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

> Since the reason for efficiency of KeAcquireSLockAtDpc() and (one of the reason i.e. w.r.t to queued part) of InstackQueuedSLocks is the cost of querying for the current IRQL, Pls let me know why querying for IRQL is a costly operation?

On some pre-APIC HALs, IRQL was implemented entirely in software, without using the PIC priority concept at all. I think this was done for speed.

On APIC, the local APIC’s TPR is used for IRQL.

It is major news to me that 2003 SP2 and later do NOT do this. Have they reverted to a software implementation of IRQL?

In any case, if you're 100% sure you are already at DISPATCH_LEVEL (e.g. in a DPC), you can skip this redundant operation by using the AtDpcLevel variant, even if the operation itself is quick.

The “AtDpcLevel”/“FromDpcLevel” functions are also implemented in NTOSKRNL itself, not in the HAL, since they use no hardware other than the CPU's own interlocked instructions.
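
To make the saving concrete, here is a minimal user-mode sketch in C. Everything in it is hypothetical and only models the idea: `current_irql` stands in for the per-CPU IRQL, `acquire()` mimics what KeAcquireSpinLock does (raise to DISPATCH_LEVEL, then spin), and `acquire_at_dpc()` mimics KeAcquireSpinLockAtDpcLevel, which trusts the caller and performs only the interlocked test-and-set. The counter `irql_changes` shows which path touches the IRQL at all.

```c
/* Hypothetical user-mode model; none of these names are WDK APIs. */
#include <assert.h>
#include <stdatomic.h>

typedef unsigned char irql_t;
enum { PASSIVE = 0, APC = 1, DISPATCH = 2 };

static irql_t current_irql = PASSIVE; /* stands in for the per-CPU IRQL */
static unsigned irql_changes = 0;     /* counts simulated IRQL writes */

static irql_t raise_irql(irql_t new_irql) {
    irql_t old = current_irql;
    assert(new_irql >= old);          /* raising must never lower */
    current_irql = new_irql;
    irql_changes++;
    return old;
}

static void lower_irql(irql_t new_irql) {
    assert(new_irql <= current_irql); /* lowering must never raise */
    current_irql = new_irql;
    irql_changes++;
}

typedef struct { atomic_flag busy; } spin_lock_t;

/* Models KeAcquireSpinLock: raise to DISPATCH, then spin. */
static irql_t acquire(spin_lock_t *l) {
    irql_t old = raise_irql(DISPATCH);
    while (atomic_flag_test_and_set_explicit(&l->busy, memory_order_acquire))
        ; /* spin */
    return old;
}

static void release(spin_lock_t *l, irql_t old) {
    atomic_flag_clear_explicit(&l->busy, memory_order_release);
    lower_irql(old);
}

/* Models KeAcquireSpinLockAtDpcLevel: the caller promises it is already
   at DISPATCH, so the only work is the interlocked test-and-set. */
static void acquire_at_dpc(spin_lock_t *l) {
    assert(current_irql >= DISPATCH); /* the caller's promise, checked in this model */
    while (atomic_flag_test_and_set_explicit(&l->busy, memory_order_acquire))
        ; /* spin */
}

static void release_from_dpc(spin_lock_t *l) {
    atomic_flag_clear_explicit(&l->busy, memory_order_release);
}
```

The AtDpcLevel path never reads or writes the IRQL, which is the whole point: however cheap the IRQL access is, zero accesses is cheaper.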

As for the in-stack queued locks: they are better NOT because of this, but because:

a) there is exactly 1 interlocked op per acquire and per release
b) spinning is done on a variable local to each CPU, not on a global one
c) the lock is granted in the same order in which it was requested, so there is no starvation
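
The three properties above are those of the classic MCS queue lock, which is generally described as the design behind in-stack queued spin locks. The following is a minimal C11 sketch of an MCS lock under that assumption, not the Windows implementation; `mcs_acquire`, `mcs_release`, and the demo helpers are hypothetical names. Each waiter supplies its own queue node (on its stack, hence "in-stack"), joins the queue with a single atomic exchange, and spins only on its own node's flag. In this sketch, release uses an interlocked compare-and-swap only when no successor has linked itself yet; otherwise a plain store hands the lock over.

```c
/* Hypothetical MCS queue-lock sketch (C11 atomics + POSIX threads). */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;              /* each waiter spins on its own flag */
} mcs_node_t;

typedef struct {
    _Atomic(mcs_node_t *) tail;      /* the only shared word */
} mcs_lock_t;

/* Stands in for a PAUSE hint; yielding keeps this user-mode demo usable
   on a single CPU. A kernel spin lock would not yield. */
static void spin_pause(void) { sched_yield(); }

static void mcs_acquire(mcs_lock_t *lock, mcs_node_t *me) {
    atomic_init(&me->next, NULL);    /* node is private until published */
    atomic_init(&me->locked, true);
    /* The single interlocked op on acquire: make ourselves the tail. */
    mcs_node_t *pred = atomic_exchange_explicit(&lock->tail, me, memory_order_acq_rel);
    if (pred != NULL) {
        /* Queue behind the predecessor (FIFO), then spin locally. */
        atomic_store_explicit(&pred->next, me, memory_order_release);
        while (atomic_load_explicit(&me->locked, memory_order_acquire))
            spin_pause();
    }
}

static void mcs_release(mcs_lock_t *lock, mcs_node_t *me) {
    mcs_node_t *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        /* No visible successor: try to empty the queue with one CAS. */
        mcs_node_t *expected = me;
        if (atomic_compare_exchange_strong_explicit(&lock->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;
        /* A waiter swapped itself in but has not linked yet; wait for it. */
        while ((succ = atomic_load_explicit(&me->next, memory_order_acquire)) == NULL)
            spin_pause();
    }
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}

/* Demo: threads bump a shared counter under the lock. */
static mcs_lock_t demo_lock;         /* zero-initialized: tail == NULL */
static long demo_counter;

static void *demo_worker(void *arg) {
    int iters = *(int *)arg;
    for (int i = 0; i < iters; i++) {
        mcs_node_t node;             /* queue node lives on this thread's stack */
        mcs_acquire(&demo_lock, &node);
        demo_counter++;
        mcs_release(&demo_lock, &node);
    }
    return NULL;
}

static long run_demo(int nthreads, int iters) {
    pthread_t t[16];
    int started = 0;
    assert(nthreads <= 16);
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&t[i], NULL, demo_worker, &iters) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    return started == nthreads ? demo_counter : -1;
}
```

Because each waiter spins on a flag in its own node, the cache line being hammered is private to that CPU instead of being bounced between all spinners.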


Maxim S. Shatskih

> Usually, IRQL is the APIC TPR, which is cheap to be queried.

For HALs that use the TPR for IRQL, even reading it is expensive, and very expensive in a VM, as it causes a VMEXIT. See my next email.

James

> This is a major piece of news for me that 2003SP2 up does NOT do this. Have they reverted to software implementation of IRQLs?

Yes. Because of the performance issues with x86 XP in a VM, I wrote a driver that patched the kernel to remove all TPR accesses and replace them with LOCK MOV CR0 instructions (via a call, as that instruction is bigger). On AMD, LOCK MOV CR0 is effectively MOV CR8 (the TPR). Intel doesn't provide a way to access CR8/TPR from 32-bit code, so instead I stored the value written to the TPR on each write and returned this cached value on reads. Even that resulted in a significant speedup.

Current versions of the OS (since 2003 SP2) don't use the TPR at all, presumably because of the virtualisation cost. I once read a really good explanation of how lazy IRQL is implemented, but have never been able to find it again.
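
As the general lazy-IRQL technique is usually described, raising IRQL only updates a software variable, and an interrupt that arrives while it "should" have been masked is recorded and replayed when IRQL drops. Below is a hypothetical single-CPU model of that idea in C, not Windows code; all names are made up. A real implementation would also have to mask the deferred interrupt source in hardware until it is replayed, which this model leaves out.

```c
/* Hypothetical lazy-IRQL model; not Windows source. */
#include <assert.h>
#include <stdint.h>

typedef unsigned char irql_t;

static irql_t soft_irql;     /* software IRQL; reads never touch hardware */
static uint32_t pending;     /* bit n set: an interrupt at level n was deferred */
static int serviced[32];     /* how often each level was serviced (demo only) */

static void service(irql_t level) {
    irql_t old = soft_irql;
    soft_irql = level;       /* the handler runs at its own level */
    serviced[level]++;
    soft_irql = old;
}

/* Raising is just a store: no TPR write, so no VM exit. */
static irql_t lazy_raise(irql_t new_irql) {
    irql_t old = soft_irql;
    assert(new_irql >= old);
    soft_irql = new_irql;
    return old;
}

/* Called when the hardware delivers an interrupt at 'level'. */
static void on_interrupt(irql_t level) {
    if (level <= soft_irql) {
        pending |= 1u << level;  /* should have been masked: defer it */
        return;
    }
    service(level);
}

/* Lowering replays deferred interrupts now above the new IRQL,
   highest level first. */
static void lazy_lower(irql_t new_irql) {
    assert(new_irql <= soft_irql);
    soft_irql = new_irql;
    for (int level = 31; level > new_irql; level--) {
        if (pending & (1u << level)) {
            pending &= ~(1u << level);
            service((irql_t)level);
        }
    }
}
```

The bet behind the design is that interrupts rarely arrive while IRQL is raised, so the common raise/lower pair costs two memory stores instead of two privileged register accesses.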

James

In the x64 OS, IRQL access is inlined as a CR8 access. This is still in the WDK headers.

In the x86 flavor, IRQL access is done through a function call. Win2003 optimized the APIC access in this case by keeping a shadow copy of the current IRQL.
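
A shadow copy like this (and the cached-read trick described earlier in the thread) amounts to a write-through cache of the TPR. Here is a hypothetical C model: `tpr_hw` stands in for the local APIC TPR, and `hw_accesses` counts the expensive hardware touches (an APIC access, and possibly a VM exit under a hypervisor). The mapping `irql << 4` is an assumption borrowed from x64, where CR8 mirrors TPR bits 7:4; x86 HALs may map IRQLs differently.

```c
/* Hypothetical write-through shadow of the TPR; not HAL source. */
#include <assert.h>

typedef unsigned char irql_t;

static volatile unsigned char tpr_hw; /* simulated local APIC TPR */
static unsigned hw_accesses;          /* counts simulated hardware accesses */
static irql_t irql_shadow;            /* cached copy of the last IRQL written */

/* Assumed mapping: IRQL goes into the TPR priority class, bits 7:4. */
static unsigned char irql_to_tpr(irql_t irql) { return (unsigned char)(irql << 4); }

static void set_irql(irql_t irql) {
    irql_shadow = irql;               /* update the cache ... */
    tpr_hw = irql_to_tpr(irql);       /* ... and still write through to the TPR */
    hw_accesses++;
}

static irql_t get_irql(void) {
    return irql_shadow;               /* reads never touch the TPR */
}
```

Writes still pay the hardware cost, but every KeGetCurrentIrql()-style read becomes an ordinary memory load.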

Thx.

Got my dev machine back. From the WDK:

    ReadCr8 macro
            mov     rax, cr8        ; read IRQL
            endm

> In x64 OS, IRQL access is inlined to CR8. This is still in the WDK headers.

On AMD processors you can get to CR8 in 32-bit (x86) mode too: the LOCK prefix causes the MOV CR0 instruction to actually access CR8.
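
AMD advertises this behaviour through the AltMovCr8 feature bit (CPUID Fn8000_0001, ECX bit 4). The MOV itself is privileged, so from user mode one can only check whether the CPU claims support. A small sketch, assuming GCC or Clang (whose `<cpuid.h>` provides `__get_cpuid`); on non-x86 targets it simply returns false. The function name is hypothetical.

```c
/* Checks AMD's AltMovCr8 bit: "LOCK MOV CR0 means MOV CR8". */
#include <stdbool.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

static bool cpu_has_alt_mov_cr8(void) {
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if the extended leaf is not supported. */
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx >> 4) & 1u;   /* ECX bit 4 = AltMovCr8 */
#else
    return false;             /* not an x86 CPU */
#endif
}
```

On Intel CPUs this bit is reported as clear, which matches the point above that 32-bit code there has no CR8 route to the TPR.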

> In x86 flavor, IRQL access is done in a function call. Win2003 optimized the access to APIC by keeping a shadow copy of current IRQL in this case.

Win2003 SP2 and newer do not use the APIC Task Priority Register for IRQL at all. If you search the kernel, you will not find any instructions that access the TPR.

James

>Win2003 sp2 and newer does not use the APIC Task PRiority register for IRQL at all.

Isn't CR8 the TPR?

For x64 Windows, there are public assembler inlines that use CR8 as the IRQL.

Things like this are carved in stone and cannot change without breaking binary compatibility of drivers, which MS does not want to do.


Maxim S. Shatskih

> Isn't CR8 a TPR?
>
> For x64 Windows, there are public assembler inlines to use CR8 as IRQL.
>
> Such kind of things are carved in stone and just cannot change (without breaking binary compat of the drivers, which MS does not want to do).

Yes, you are right for x64 (I assume; I've never checked, as performance was never a problem there). I was referring to x86.

James