Calling KeAcquireSpinLock at DISPATCH_LEVEL

> Retrieving the IRQL is as expensive as setting it

Yes, but who forces you to implement it by actually checking the IRQL? You can do it simply by
setting/clearing a bit flag in the PCR every time the IRQL crosses the “< DPC_LEVEL / >= DPC_LEVEL” border, plus introducing an extra PreviousIrql bit flag into the spinlock itself…

Anton Bassov

Before I was a Hyper-V guy, I was the HAL guy.

And, James, your answer is specific to the implementation in 32-bit Windows,
which is waning in importance. 64-bit Windows has always in-lined IRQL
functions, since the first beta DDK, which removed any option to change it
in the future, as existing drivers have code which just manipulates CR8
right in them. (CR8 is just an alias for the TPR.)

Jake Oshins
Hyper-V I/O Architect
Windows Kernel Group

This post implies no warranties and confers no rights.


“James Harper” wrote in message news:xxxxx@ntdev…

I’m probably missing something…

If it is so expensive, why isn’t this optimization done in kernel? Check
current IRQL and change only if necessary.

I think it only really hurts when you virtualise. I forgot Jake was
the Hyper-V guy, so that’s probably what he was implying anyway. From
Windows Server 2003 SP2 onward, the hardware register (TPR) is no
longer used, so the problem takes care of itself.

Reading the hardware register to check IRQL is just as expensive as
writing it so you’d have to cache it somewhere (per CPU). In my drivers
for Xen I actually patched the windows kernel to do exactly that - only
update TPR if it is different (actually, on AMD architectures I write
TPR via another means that doesn’t involve a VMEXIT).
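The "only update TPR if it is different" idea can be sketched as a small user-mode model. This is not the actual kernel patch: `cpu_model`, `tpr_write` and `set_irql_lazy` are hypothetical names, and the "hardware" register is an ordinary struct field whose accesses are merely counted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one CPU: the "hardware" TPR plus a software copy. */
typedef struct {
    uint8_t  hw_tpr;        /* stands in for the real task priority register */
    uint8_t  cached_irql;   /* per-CPU software copy, cheap to read */
    unsigned hw_accesses;   /* how often the expensive register was touched */
} cpu_model;

/* Expensive path: every call counts as one hardware access. */
void tpr_write(cpu_model *cpu, uint8_t irql)
{
    cpu->hw_tpr = irql;
    cpu->hw_accesses++;
}

/* Set the IRQL, touching the hardware only when the value changes.
   The previous IRQL comes from the cache, never from the TPR itself. */
uint8_t set_irql_lazy(cpu_model *cpu, uint8_t new_irql)
{
    uint8_t old_irql = cpu->cached_irql;

    if (old_irql != new_irql) {
        tpr_write(cpu, new_irql);
        cpu->cached_irql = new_irql;
    }
    assert(cpu->cached_irql == cpu->hw_tpr);  /* cache and register agree */
    return old_irql;
}
```

In this model, raising to a level the CPU is already at costs no register access at all, which is the case that matters when a spinlock is acquired at DISPATCH_LEVEL.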

James

> What it is trying to say is that it is inefficient to call KeAcquireSpinLock if you know you are already at DISPATCH_LEVEL

Well, we are left to guess what it is trying to say, but what it actually says is that calling KeAcquireSpinLock() at elevated IRQL is a deadly serious bug, mentioned alongside releasing a spinlock that you don’t hold and sleeping for a nonzero interval at elevated IRQL…

Anton Bassov

Michal, there are two parts to the answer to your question.

  1. KeGetCurrentIrql is as expensive as KeAcquireSpinLock. They both touch
    the TPR/CR8.

  2. Dave Cutler decided very early to in-line KeRaiseIrql and KeLowerIrql
    in the DDK headers for 64-bit Windows. I can’t speak for him, but I think
    that was because AMD told him that they’d make CR8 access very cheap. I
    certainly heard them say that. Since he did that, we have had to choose
    between breaking existing x64 drivers and making it cheaper for
    virtualization and other reasons.
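To illustrate why in-lining freezes the implementation, here is a user-mode sketch of roughly the shape such header functions take. It is an assumption-laden model, not the DDK source: `fake_cr8` is a plain variable standing in for CR8, and all the `_sim` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t KIRQL_SIM;        /* hypothetical stand-in for KIRQL */

static uint64_t fake_cr8;         /* stand-in for the real CR8 register */

static inline uint64_t read_cr8_sim(void)        { return fake_cr8; }
static inline void     write_cr8_sim(uint64_t v) { fake_cr8 = v; }

/* Because these are inline, their bodies, register access included,
   are compiled into every driver binary that calls them. */
static inline KIRQL_SIM get_current_irql_sim(void)
{
    return (KIRQL_SIM)read_cr8_sim();
}

static inline KIRQL_SIM raise_irql_sim(KIRQL_SIM new_irql)
{
    KIRQL_SIM old_irql = get_current_irql_sim();

    assert(old_irql <= new_irql);  /* a raise must not lower the level */
    write_cr8_sim(new_irql);
    return old_irql;
}

static inline void lower_irql_sim(KIRQL_SIM new_irql)
{
    assert(new_irql <= get_current_irql_sim());
    write_cr8_sim(new_irql);
}
```

Once a driver ships with code like this compiled in, the kernel can no longer substitute a cached or otherwise cheaper implementation without breaking that driver.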

Jake Oshins
Hyper-V I/O Architect
Windows Kernel Group

This post implies no warranties and confers no rights.


“Michal Vodicka” wrote in message news:xxxxx@ntdev…

Well, my question was why it was done this way from the beginning and
why a cache wasn’t used when necessary, instead. It never made sense to
me why to export something which looks like implementation detail which
should be handled internally. Such an API can lead to code like this:

if (KeGetCurrentIrql() == DISPATCH_LEVEL) {
    KeAcquireSpinLockAtDpcLevel(…);
}
else {
    KeAcquireSpinLock(…);
}

which is worse than calling KeAcquireSpinLock unconditionally on
platforms where accessing IRQL is expensive.
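A toy cost model makes the comparison concrete. It assumes, per the discussion in this thread, that each IRQL query and each IRQL raise costs one register access; the lock itself is not modeled, and every `sim_` and `cost_` name is hypothetical.

```c
#include <stdint.h>

#define PASSIVE_LEVEL  0
#define DISPATCH_LEVEL 2

/* Hypothetical cost model: one counter bump per IRQL register access. */
typedef struct {
    uint8_t  irql;
    unsigned irql_accesses;
} sim_cpu;

uint8_t sim_get_irql(sim_cpu *cpu)        /* models KeGetCurrentIrql */
{
    cpu->irql_accesses++;
    return cpu->irql;
}

void sim_acquire(sim_cpu *cpu)            /* models KeAcquireSpinLock */
{
    cpu->irql_accesses++;                 /* the raise to DISPATCH_LEVEL */
    cpu->irql = DISPATCH_LEVEL;
}

void sim_acquire_at_dpc(sim_cpu *cpu)     /* models KeAcquireSpinLockAtDpcLevel */
{
    (void)cpu;                            /* no IRQL access at all */
}

/* The pattern from the post: query first, then pick a variant. */
unsigned cost_checked(uint8_t start_irql)
{
    sim_cpu cpu = { start_irql, 0 };

    if (sim_get_irql(&cpu) == DISPATCH_LEVEL)
        sim_acquire_at_dpc(&cpu);
    else
        sim_acquire(&cpu);
    return cpu.irql_accesses;
}

/* Always call the general form, whatever the current IRQL is. */
unsigned cost_unconditional(uint8_t start_irql)
{
    sim_cpu cpu = { start_irql, 0 };

    sim_acquire(&cpu);
    return cpu.irql_accesses;
}
```

Under these assumptions the checked pattern never touches the register less often than the unconditional call, and below DISPATCH_LEVEL it touches it twice instead of once.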

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Don Burn
Sent: Friday, October 08, 2010 1:29 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] Calling KeAcquireSpinLock at DISPATCH_LEVEL

Jake was Mr Hardware at Microsoft long before Hyper-V. Things like MSI
interrupts, and figuring out why some peripherals were having problems,
were being done by Jake before virtualization was much of anything.

As Jake and Doron have been saying, the cost of accessing the IRQL was
high because in many cases it hit hardware, even back in the early NT
days. This is not a virtualization issue.

Don Burn (MVP, Windows DDK)
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr

“Michal Vodicka” wrote in message news:xxxxx@ntdev:
>
> > > -----Original Message-----
> > > From: xxxxx@lists.osr.com
> > > [mailto:xxxxx@lists.osr.com] On Behalf Of James Harper
> > > Sent: Friday, October 08, 2010 1:16 AM
> > > To: Windows System Software Devs Interest List
> > > Subject: RE: [ntdev] Calling KeAcquireSpinLock at DISPATCH_LEVEL
> > >
> > > How does its age compare to the warnings in the DDK docs about IRQL
> > > though?
> >
> > You said it hurts only for virtualization. I say NT was there before
> > virtualization (in Windows world…) and still there is a spinlock
> > function to be called at DISPATCH_LEVEL, so this optimization had to have
> > some other reason than virtualization.
> >
> > Best regards,
> >
> > Michal Vodicka
> > UPEK, Inc.
> > [xxxxx@upek.com, http://www.upek.com]
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online
> at http://www.osronline.com/page.cfm?name=ListServer
>

> Michal, there are two parts to the answer to your question.
>
>   1. KeGetCurrentIrql is as expensive as KeAcquireSpinLock. They both touch
>     the TPR/CR8.
>
>   2. Dave Cutler decided very early to in-line KeRaiseIrql and KeLowerIrql
>     in the DDK headers for 64-bit Windows. I can’t speak for him, but I think
>     that was because AMD told him that they’d make CR8 access very cheap. I
>     certainly heard them say that. Since he did that, we have had to choose
>     between breaking existing x64 drivers and making it cheaper for
>     virtualization and other reasons.

CR8 access doesn’t involve a VMEXIT though, so most of the problems I’ve
seen with TPR access disappear in 64-bit mode when CR8 is used. AMD also
grants access to CR8 in 32-bit mode via a LOCK MOV CR0 instruction…
that’s a fairly long opcode, I think, so maybe not so cheap, but much
cheaper than actual TPR access when you are virtualised. It certainly
makes a huge speed difference under XP when I patch the kernel to
convert all TPR access to LOCK MOV CR0.

James

So… Let’s summarize, shall we?

  1. The WDK docs that say “NEVER call KeAcquireSpinLock from code that is running at IRQL = DISPATCH_LEVEL” are, at best, misleading. Other, similar guidance has appeared in the DDK docs since NT V3.1, but never in such strong words. The docs need to be changed to reflect that they are referring to a performance optimization, not a logic error.

  2. It may or may not be expensive to query/set the IRQL on any given processor on which your driver runs. On certain x86 systems IRQL management is mostly done in software (and was historically done this way on particular processor configurations).

  3. The IRQL functions on AMD64 are inlined, thereby demonstrating the underlying evil of inlining basic support functions (or, worse, making them macros – the IRP stack manipulation macros come quickly to mind) and proving that Cutler is not omniscient.

  4. It’s incredibly helpful to have folks like Doron and Jake, who actually have insight into how the code works, answer questions on this forum… otherwise we could be in the midst of a long “what’s wrong with those people, they could just cache the IRQL, they are soooo stooopid” thread.
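Point 1 can be illustrated with a small user-mode model of the two acquire variants. It is only a sketch under stated assumptions: the lock is a C11 `atomic_flag`, the IRQL is a plain variable, and all `sim_` names are hypothetical rather than kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DISPATCH_LEVEL 2

typedef struct {
    atomic_flag held;
} sim_spin_lock;

static uint8_t  sim_irql;     /* hypothetical current-CPU IRQL */
static unsigned sim_raises;   /* number of IRQL raise operations issued */

/* AtDpcLevel form: the caller is already at DISPATCH_LEVEL, so just spin. */
void sim_acquire_at_dpc_level(sim_spin_lock *lock)
{
    assert(sim_irql >= DISPATCH_LEVEL);
    while (atomic_flag_test_and_set_explicit(&lock->held,
                                             memory_order_acquire))
        ;                     /* busy-wait until the flag is ours */
}

/* General form: raise to DISPATCH_LEVEL first, then take the lock.
   At DISPATCH_LEVEL the raise changes nothing but is still paid for. */
uint8_t sim_acquire(sim_spin_lock *lock)
{
    uint8_t old_irql = sim_irql;

    assert(old_irql <= DISPATCH_LEVEL);
    sim_irql = DISPATCH_LEVEL;
    sim_raises++;
    sim_acquire_at_dpc_level(lock);
    return old_irql;
}

void sim_release_from_dpc_level(sim_spin_lock *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}

void sim_release(sim_spin_lock *lock, uint8_t old_irql)
{
    sim_release_from_dpc_level(lock);
    sim_irql = old_irql;      /* restore the caller's IRQL */
}
```

In this model, calling the general form while already at DISPATCH_LEVEL still acquires and releases the lock correctly and leaves the IRQL where it was. The only difference from the AtDpcLevel form is the extra counted raise, which is a cost rather than a logic error.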

Peter
OSR

Thanks, Jake. It is good to know reasoning behind it.

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Jake Oshins
Sent: Friday, October 08, 2010 6:08 AM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] Calling KeAcquireSpinLock at DISPATCH_LEVEL

Michal, there are two parts to the answer to your question.

  1. KeGetCurrentIrql is as expensive as KeAcquireSpinLock. They both touch
    the TPR/CR8.

  2. Dave Cutler decided very early to in-line KeRaiseIrql and KeLowerIrql
    in the DDK headers for 64-bit Windows. I can’t speak for him, but I think
    that was because AMD told him that they’d make CR8 access very cheap. I
    certainly heard them say that. Since he did that, we have had to choose
    between breaking existing x64 drivers and making it cheaper for
    virtualization and other reasons.

Jake Oshins
Hyper-V I/O Architect
Windows Kernel Group

This post implies no warranties and confers no rights.



wrote in message news:xxxxx@ntdev…
> So… Let’s summarize, shall we?
>
> 1) The WDK Docs that say “NEVER call KeAcquireSpinLock from code that is
> running at IRQL =
> DISPATCH_LEVEL” is, at best, misleading. Other, similar, guidance has
> appeared in the DDK docs since NT V3.1 but never in such strong words.
> The docs need changed to reflect that they are referring to a performance
> optimization not a logic error.
>
> 2) It may or may not be expensive to query/set the IRQL on any given
> processor on which your driver runs. On certain x86 systems IRQL
> management is mostly done in software (and was historically done this way
> on particular processor configurations).
>
> 3) The IRQL functions on AMD64 are inlined, thereby demonstrating the
> underlying evil of inlining basic support functions (or, worse, making
> them macros – the IRP stack manipulation macros come quickly to mind) and
> proving that Cutler is not omniscient.
>
> 4) It’s incredibly helpful to have folks like Doron and Jake, who actually
> have insight into how the code works, answer questions on this forum…
> otherwise we could be in the midst of a long “what’s wrong with those
> people, they could just cache the IRQL, they are soooo stooopid” thread.
>
> Peter
> OSR
>

Does (2) also mean that executive spinlocks (all kinds of them)
are expensive and should be avoided whenever possible?
In particular, is using spinlocks to sync threads running at PASSIVE,
just in case a DISPATCH code path is added later,
a waste of performance, so that passive-only locks should be used instead?

Re. “underlying evil of inlining basic support functions”: IMHO
this is an inherent weakness of native code vs. JIT code generation,
which is optimized for a specific target system.
This sort of evil could not have been avoided then. Now it may be viewed as
premature optimization.


Regards,
– pa

Pavel, when you say “executive spinlocks” I’m not sure what you mean. I’ll
mentally substitute “dispatcher objects” and answer your question.

All dispatcher objects (which are the PASSIVE_LEVEL synchronization objects,
more or less, in NT) involve at least an interlocked compare/exchange
operation, and if there’s any lock contention, a couple of changes of IRQL
and some more interlocked operations. In total, I suspect that they’re
higher in overhead than spinlocks.

Executive objects (like the Fast Mutex) also involve an IRQL change, even
when there’s no contention.

In summary, don’t pick your synchronization mechanism for its perceived
overhead. Pick it for its semantics.

Jake Oshins
Hyper-V I/O Architect
Windows Kernel Group

This post implies no warranties and confers no rights.


“Pavel A.” wrote in message news:xxxxx@ntdev…

Does (2) also mean that executive spinlocks (all kinds of them)
are expensive and should be avoided whenever possible?
In particular, is using spinlocks to sync threads running at PASSIVE,
just in case a DISPATCH code path is added later,
a waste of performance, so that passive-only locks should be used instead?

Re. “underlying evil of inlining basic support functions”: IMHO
this is an inherent weakness of native code vs. JIT code generation,
which is optimized for a specific target system.
This sort of evil could not have been avoided then. Now it may be viewed as
premature optimization.


Regards,
– pa

> Pavel, when you say “executive spinlocks” I’m not sure what you mean.

Apparently, he means not spinlock (which is, IIRC, implemented by HAL under Windows) but ERESOURCE and/or FAST_MUTEX, acquisition of which may also change IRQL from PASSIVE_LEVEL to APC_LEVEL…

If this is the case, his question does not seem to make sense in itself, because then he is speaking about objects that rely upon dispatcher-level constructs that synchronize between threads, rather than between CPUs, and may involve context switches if there is any contention between threads. His question is comparable to asking about seconds while performing an operation that takes hours to accomplish…

Anton Bassov

wrote in message news:xxxxx@ntdev…
>> Pavel, when you say “executive spinlocks” I’m not sure what you mean.
>
> Apparently, he means not spinlock (which is, IIRC, implemented by HAL
> under Windows) but ERESOURCE and/or FAST_MUTEX, acquisition of which may
> also change IRQL from PASSIVE_LEVEL to APC_LEVEL…

No, he means the kernel/HAL spinlocks.
He is mainly interested in situation of low contention, when a mutex can
take just a couple of interlocked ops (cheap when cached), but a spinlock
always raises IRQL - which appears to be more expensive than he believed.
–pa

> If this is the case his question does not seem to make sense in itself,
> because then he is speaking about objects that rely upon dispatcher-level
> constructs that synchronize between threads, rather than between CPUs, and
> may involve context switches if there is any contention between threads .
> His question is comparable to asking about seconds while performing an
> operation that takes hours to accomplish…
>
> Anton Bassov
>

> He is mainly interested in situation of low contention, when a mutex can take just a couple of interlocked
> ops (cheap when cached), but a spinlock always raises IRQL - which appears to be more expensive
> than he believed.

Is he speaking about synchronizing between threads or processors, in the first place? In the former case, using spinlocks is unreasonable in most cases anyway (which, btw, has absolutely nothing to do with the overhead implied by changing IRQL), unless the operation guarded by a spinlock involves just a few lines of code and the resource of interest is in high demand. In the latter case, a spinlock is the only possible option…

Anton Bassov

wrote in message news:xxxxx@ntdev…
>> He is mainly interested in situation of low contention, when a mutex can
>> take just a couple of interlocked
>> ops (cheap when cached), but a spinlock always raises IRQL - which
>> appears to be more expensive
>> than he believed.
>
> Is he speaking about synchronizing between threads or processors, in the
> first place??? In the former case using spinlocks is unreasonable in most
> cases anyway ( which, btw, has absolutely nothing to do with overhead
> implied by changing IRQL), unless operation guarded by a spinlock
> involves just a few lines of code and the resource of interest is in high
> demand . In the latter case spinlock is the only possible option…
>
> Anton Bassov

Better let’s move this to another thread.
For this one, I’d only like to apologize for “executive spinlocks”;
I thought this is what the “normal” dispatch-level spinlocks are called
(as opposed to interrupt and other spinlocks). If this name applies only
to dispatcher objects, I’m sorry.

- pa

> Apparently, he means not spinlock (which is, IIRC, implemented by HAL under Windows)

Only the IRQL part of the spinlock. The xxxAtDpcLevel spinlock functions are not in HAL.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

> Pavel, when you say “executive spinlocks” I’m not sure what you mean.

Executive Spin Lock: The proper name for “garden variety” spin locks implemented by the kernel, acquired with KeAcquireSpinLock, released by KeReleaseSpinLock.

I didn’t name 'em, but that’s been how they’ve been referred to forever.

See:
http://msdn.microsoft.com/en-us/library/ff548114(VS.85).aspx

http://download.microsoft.com/download/e/b/a/eba1050f-a31d-436b-9281-92cdfeae4b45/synch_table.doc

http://msdn.microsoft.com/en-us/library/ff565513(VS.85).aspx

Not to be confused with In Stack Queued spin locks or Interrupt spin locks.

Peter
OSR