Re: Problem using queued spinlocks on single cpu computer

Michal,

There are ways to handle one structure per acquisition that don’t require a
new object per acquisition. As for code such as

SpinLock.Lock();

if (something_went_wrong) {
throw Error;
}
SpinLock.Unlock();

whoever uses it is asking for it, no? Yuk. That’s not what critical sections
were invented for!

Alberto.

-----Original Message-----
From: Michal Vodicka [mailto:xxxxx@veridicom.cz.nospam]
Sent: Friday, September 26, 2003 4:35 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] Re: Problem using queued spinlocks on single cpu computer


From:
xxxxx@compuware.com[SMTP:xxxxx@compuware.com]
Reply To: xxxxx@lists.osr.com
Sent: Friday, September 26, 2003 4:28 PM
To: xxxxx@lists.osr.com
Subject: [ntdev] Re: Problem using queued spinlocks on single cpu computer

  1. I really dislike splitting spinlock usage into two objects; I think it
    defeats the purpose of C++ encapsulation.

Maybe. But the OS expects one KLOCK_QUEUE_HANDLE per acquisition, and you
have to design the spinlock class this way. The easy way is to use two
classes. The hard way is to use one object, allocate one KLOCK_QUEUE_HANDLE
per processor, and decide on every acquire which one to use. Complicated,
error prone, and with no real advantage.
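
For illustration, a minimal sketch of the two-class design (untested, and
the class and member names are invented):

class QueuedSpinLock
{
    friend class QueuedLockGuard;
    KSPIN_LOCK lock;
public:
    QueuedSpinLock() { KeInitializeSpinLock(&lock); }
};

// One acquisitor object per acquisition, living on the stack:
class QueuedLockGuard
{
    KLOCK_QUEUE_HANDLE handle;  // the per-acquisition structure the OS expects
public:
    QueuedLockGuard(QueuedSpinLock &l)
        { KeAcquireInStackQueuedSpinLock(&l.lock, &handle); }
    ~QueuedLockGuard()
        { KeReleaseInStackQueuedSpinLock(&handle); }
};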

I can see it’s convenient: you create the object on the stack and when you
collapse the frame, presto, it’s gone - but that comes at the price of
encapsulation, and makes me feel like I’m not programming in object-oriented
style any longer!

A bit of a dogmatic view. With C++ exceptions and templates, some things have
changed.

As for exception safety, a spinlock should be the lowest level in the
hierarchy, so it shouldn’t have to concern itself with that kind of thing:
you either spin or you don’t. If processor P1 is spinning, nothing that
happens to P2 should matter as far as P1 goes, unless of course the OS
decides to take a catastrophic exit, but then, that’s not a driver writer
issue.

No, you misunderstood how it was meant. An example should show it:

SpinLock.Lock();

if (something_went_wrong) {
throw Error;
}
SpinLock.Unlock();

Clearly, if something goes wrong, the spinlock isn’t unlocked. To solve it,
Error has to be caught and the lock unlocked. A local acquisitor object is
unlocked automatically. The result is shorter and cleaner code.

The above can be fixed another way, of course. The following can’t:

SpinLock.Lock();
a = b;
SpinLock.Unlock();

Quite innocent, isn’t it? What if operator ‘=’ throws an exception? With an
exception-unsafe class such as SpinLock, you always have to enclose protected
sections in a try/catch block. With an exception-safe class such as the lock
acquisitor, you don’t. The point is that without exception-safe classes you
always have to take care of exceptions yourself, and sooner or later you make
an error; for example, if the above code is used for integers originally and
later moved to a template and used for classes whose operators can throw
exceptions. That’s why classes such as the lock acquisitor are widely used
for synchronization objects such as mutexes, events and critical sections.
The other possibility is to not use C++ exceptions at all.
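
With an acquisitor class like the sketch above, both examples become
exception safe (using the invented QueuedLockGuard name):

{
    QueuedLockGuard guard(SpinLock);  // locked here
    a = b;                            // may throw...
}                                     // ...the destructor unlocks either way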

(I intentionally ignore function exception specifications, i.e. throw(),
which can help if used correctly. AFAIK, the VC compiler also ignored them in
past years; I’m not sure about the latest versions.)

Best regards,

Michal Vodicka
STMicroelectronics Design and Application s.r.o.
[michal.vodicka@st.com, http://www.st.com]




Michal,

Sorry if I annoyed you, my apologies.

I now understand that each processor must have its own lock structure, but
that’s easily handled. You could try for example something like

struct queuelocks
{
protected:
    static KLOCK_QUEUE_HANDLE ql[MaxNumberOfProcessors];
public:
    queuelocks() { }
};

struct spinlock : public queuelocks
{
protected:
    KSPIN_LOCK lk;
public:
    spinlock() : queuelocks() { KeInitializeSpinLock(&lk); }
    void Acquire() {
        KeAcquireInStackQueuedSpinLock(&lk, &ql[KeGetCurrentProcessorNumber()]); }
    void Release() {
        KeReleaseInStackQueuedSpinLock(&ql[KeGetCurrentProcessorNumber()]); }
};

Using the spinlock is as simple as s.Acquire() and s.Release(); all the rest
of the work is done once at init time, and the code is localized in one
place:

s.Acquire();
{
// critical section goes here…
}
s.Release();

This code also gets rid of those dynamic allocations that people objected
to. And if you don’t like that static variable, you can remove it at the
cost of grabbing additional memory for each spinlock, or you can compromise
by having an array of PKLOCK_QUEUE_HANDLEs pointing to structures created by
the constructor or even allocated at DriverEntry time. Another advantage is
that you can use it in multithreaded situations, for example, a
producer-consumer:

Thread A()
{
e.Acquire();
{
Produce();
}
f.Release();
}

Thread B()
{
f.Acquire();
{
Consume();
}
e.Release();
}

The above code has another advantage: I can quickly replace Acquire() and
Release() with machine code, or I can derive such an object, for example,

struct hardlock : public spinlock
{
    void Acquire() { TestAndSetAssemblerMacroGoesHere(); }
    void Release() { SetSpinLockVariableToZeroByHand(); }
};

which may be required at debug time when the OS is out to lunch. Moreover, I
could even move Acquire() and Release() to the superclass, for example,

struct queuelocks
{
protected:
    static KLOCK_QUEUE_HANDLE ql[MaxNumberOfProcessors];
    KSPIN_LOCK s;
public:
    queuelocks() { KeInitializeSpinLock(&s); }
    virtual void Acquire() {
        KeAcquireInStackQueuedSpinLock(&s, &ql[KeGetCurrentProcessorNumber()]); }
    virtual void Release() {
        KeReleaseInStackQueuedSpinLock(&ql[KeGetCurrentProcessorNumber()]); }
};

And now I have the option to override the Acquire and Release methods with my
own code (for example, a test-and-test-and-set), or with calls to plain
vanilla KeAcquireSpinLock() and KeReleaseSpinLock(): I can derive multiple
kinds of spinlocks from that one superclass. I can add common debugging code
to the queuelocks class too.
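
For example, a sketch of one such derivation using the plain vanilla calls
(the KIRQL member is added here for illustration):

struct plainlock : public queuelocks
{
protected:
    KIRQL oldIrql;   // kept between Acquire() and Release()
public:
    virtual void Acquire() { KeAcquireSpinLock(&s, &oldIrql); }
    virtual void Release() { KeReleaseSpinLock(&s, oldIrql); }
};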

There are umpteen variations on this theme. But of course, caveat emptor, I
didn’t try the code on a real machine, so I don’t know if the compiler will
generate good kernel-side code. The assumption of course is that you only
need one KLOCK_QUEUE_HANDLE per processor, meaning, a processor will never be
waiting on more than one spinlock.
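
As an illustration of the hardlock idea, the two macros might expand to
something like this (a sketch; here they take the lock word, e.g. &lk viewed
as a LONG, as a parameter):

void TestAndSetAssemblerMacroGoesHere(volatile LONG *lock)
{
    while (InterlockedExchange((PLONG)lock, 1) != 0)
        ;                                // spin until the old value was 0
}

void SetSpinLockVariableToZeroByHand(volatile LONG *lock)
{
    InterlockedExchange((PLONG)lock, 0); // release: store 0 by hand
}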

Maybe I reacted too strongly, my apologies, but I don’t like things such as
throwing exceptions from inside a critical section; I believe one should
release the spinlock first and then go deal with exception conditions.

Alberto.

-----Original Message-----
From: Michal Vodicka [mailto:xxxxx@veridicom.cz.nospam]
Sent: Friday, September 26, 2003 6:52 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] Re: Problem using queued spinlocks on single cpu computer


From:
xxxxx@compuware.com[SMTP:xxxxx@compuware.com]
Reply To: xxxxx@lists.osr.com
Sent: Saturday, September 27, 2003 12:00 AM
To: xxxxx@lists.osr.com
Subject: [ntdev] Re: Problem using queued spinlocks on single cpu computer

There are ways to handle one structure per acquisition that don’t require a
new object per acquisition.

Sure. The hard way I mentioned, for example. Show me one which is as easy
and at least as efficient as using a new object on the stack.

As for code such as

SpinLock.Lock();

if (something_went_wrong) {
throw Error;
}
SpinLock.Unlock();

whoever uses it is asking for it, no? Yuk. That’s not what critical sections
were invented for!

Sure. This was only an example meant to show the problem in a blatantly
obvious way. You ignored the next one and the following text. It may be my
English, or you just don’t want to understand. In any case, I give up.

Best regards,

Michal Vodicka
STMicroelectronics Design and Application s.r.o.
[michal.vodicka@st.com, http://www.st.com]




I did indeed ignore a few implementation details, but I feel the IRQL race
can be handled; if worst comes to worst, a cli/sti pair does wonders to
prevent preemption. As for the one handle per acquire, I’m not sure I like
the idea of having more than one acquire per CPU, although, to be fair, that
restriction would break my producer-consumer example! In my code,
MaxNumberOfProcessors is an upper limit on the size of your SMP - say, 16,
32, 64. It’s a constant, and the space wasted isn’t that bad if your data
structure is small enough. Or you can ask the OS to get it for you and set
your data structure up accordingly at construction time.
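
For instance, a sketch of that last variant, sizing the handle array in the
constructor from a processor count queried at init time (the pool allocation
and tag are illustrative):

struct queuelocks
{
protected:
    PKLOCK_QUEUE_HANDLE ql;    // one handle per processor, pool-allocated
public:
    queuelocks(ULONG nProcs)   // nProcs obtained from the OS by the caller
    {
        ql = (PKLOCK_QUEUE_HANDLE)ExAllocatePoolWithTag(NonPagedPool,
                 nProcs * sizeof(KLOCK_QUEUE_HANDLE), 'kclq');
    }
    ~queuelocks() { if (ql) ExFreePoolWithTag(ql, 'kclq'); }
};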

But I was thinking, this whole thing is a bit academic. How about this for a
bare-bones queued spinlock:

class qlock
{
private:
    volatile unsigned int curr, next;
public:
    qlock() { curr = 0; next = 0; }
    // take your number, wait till you're called
    void Acquire() { unsigned int n = xAdd(next, 1); while (n != curr) ; }
    // one more customer happily served, next!
    void Release() { xAdd(curr, 1); }
};

The n = xAdd(a, 1) construct is equivalent to a hardware-level locked,
indivisible exchange-and-add. This code is a straightforward application of
the Bakery Algorithm; the queue only exists in our minds, and it wraps around
at 0xffffffff. You can tweak it to serve whatever you need! And, what’s an
advantage to me, it works even if the OS is down; besides, it takes less code
to implement than other solutions that use the OS call. And look, ma, no
handles!
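
For completeness, xAdd could be written on top of the documented interlocked
primitive (a sketch; InterlockedExchangeAdd returns the value before the
addition, which is exactly the ticket we want, and it compiles down to a
locked xadd on x86):

inline unsigned int xAdd(volatile unsigned int &a, unsigned int v)
{
    return (unsigned int)InterlockedExchangeAdd((PLONG)&a, (LONG)v);
}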

As for the producer-consumer, if you have a spinlock class, it’s a simple
matter to add a counter to it. I was just using the producer-consumer as an
example of something you may want to control without necessarily using an
on-stack alloc and release.

One point of the code I wrote this last time was to make sure that everything
that requires object or memory management is done at init time. Again, it
keeps all the spinlock maintenance in the same place, and it avoids
allocations and deallocations while running.

As for exceptions, I believe there’s a difference between spinlocks and other
synchronization objects, in that with a spinlock a processor is looping, and
if we don’t handle things well, that processor is going to hang. It may be a
question of personal preference, but I don’t like to do anything inside a
spinlock but the bare minimum I can get away with. I like code structured in
layers, like an onion, and I like to see spinlock-protected code at the very
bottom!

Alberto.

-----Original Message-----
From: Michal Vodicka [mailto:xxxxx@veridicom.cz.nospam]
Sent: Monday, September 29, 2003 2:01 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] Re: Problem using queued spinlocks on single cpu computer

Alberto,

> Sorry if I annoyed you, my apologies.
>
OK, I wasn’t too annoyed; I just felt you didn’t want to hear what I and
others were trying to say.

> I now understand that each processor must have its own lock structure, but
> that’s easily handled. You could try for example something like
>
> struct queuelocks
> {
> protected:
>     static KLOCK_QUEUE_HANDLE ql[MaxNumberOfProcessors];
> public:
>     queuelocks() { }
> };
>
> struct spinlock : public queuelocks
> {
> protected:
>     KSPIN_LOCK lk;
> public:
>     spinlock() : queuelocks() { KeInitializeSpinLock(&lk); }
>     void Acquire() {
>         KeAcquireInStackQueuedSpinLock(&lk, &ql[KeGetCurrentProcessorNumber()]); }
>     void Release() {
>         KeReleaseInStackQueuedSpinLock(&ql[KeGetCurrentProcessorNumber()]); }
> };
>
There is a race condition: a small window between
KeGetCurrentProcessorNumber() and the real spinlock acquire. If the above
code is called at PASSIVE_LEVEL, the thread can be rescheduled onto another
CPU in the meantime. As Max pointed out before, you’d have to raise IRQL to
DISPATCH_LEVEL to make it safe.
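
A sketch of what the safe variant would have to look like (KeRaiseIrql and
KeLowerIrql are the documented calls; note that oldIrql has to live somewhere
until the matching Release, which is its own problem):

void Acquire()
{
    KIRQL oldIrql;
    KeRaiseIrql(DISPATCH_LEVEL, &oldIrql);  // no rescheduling from here on
    KeAcquireInStackQueuedSpinLock(&lk, &ql[KeGetCurrentProcessorNumber()]);
    // oldIrql must be kept until the matching Release()...
}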

Next complaint: it depends on the current implementation; from the docs it
seems there should be one handle per acquire, not per CPU. What is
MaxNumberOfProcessors? If you set it to, for example, 64, you waste space
most of the time and the code could still fail in the future. You can
allocate it dynamically depending on real conditions, which makes the code
more complicated. Compare it with the stack acquisitor object: the only thing
necessary is to decrease ESP, and in most cases the instruction is already
there and only a constant changes. Elegant and efficient.

It is of course possible to use code similar to the above, but it is always
more complicated, less efficient and error prone. And for what? OOP purity?
BTW, how do you like, for example, STL functors?

> Using the spinlock is as simple as s.Acquire() and s.Release(); all the
> rest of the work is done once at init time, and the code is localized in
> one place:
>
> s.Acquire();
> {
> // critical section goes here…
> }
> s.Release();
>
Exception unsafe… ;-)

> This code also gets rid of those dynamic allocations that people objected
> to. And if you don’t like that static variable, you can remove it at the
> cost of grabbing additional memory for each spinlock, or you can compromise
> by having an array of PKLOCK_QUEUE_HANDLEs pointing to structures created
> by the constructor or even allocated at DriverEntry time.
>
Yes, but an on-stack object per acquire still consumes less memory.

> Another advantage is
> that you can use it in multithreaded situations, for example, a
> producer-consumer:
>
> Thread A()
> {
> e.Acquire();
> {
> Produce();
> }
> f.Release();
> }
>
> Thread B()
> {
> f.Acquire();
> {
> Consume();
> }
> e.Release();
> }
>
I believe only semaphores can be used that way. Imagine what happens if the
producer runs before the consumer and spinlock f, which wasn’t acquired, is
released. I don’t see how a pure spinlock (without an additional counter)
could be used for a producer-consumer solution.

> The above code has another advantage: I can quickly replace Acquire() and
> Release() with machine code, or I can derive such an object, for example,
>
> struct hardlock : public spinlock
> {
>     void Acquire() { TestAndSetAssemblerMacroGoesHere(); }
>     void Release() { SetSpinLockVariableToZeroByHand(); }
> };
>
> which may be required at debug time when the OS is out to lunch. Moreover, I
> could even move Acquire() and Release() to the superclass, for example,
>
> struct queuelocks
> {
> protected:
>     static KLOCK_QUEUE_HANDLE ql[MaxNumberOfProcessors];
>     KSPIN_LOCK s;
> public:
>     queuelocks() { KeInitializeSpinLock(&s); }
>     virtual void Acquire() {
>         KeAcquireInStackQueuedSpinLock(&s, &ql[KeGetCurrentProcessorNumber()]); }
>     virtual void Release() {
>         KeReleaseInStackQueuedSpinLock(&ql[KeGetCurrentProcessorNumber()]); }
> };
>
> And now I have the option to override the Acquire and Release methods with
> my own code (for example, a test-and-test-and-set), or with calls to plain
> vanilla KeAcquireSpinLock() and KeReleaseSpinLock(): I can derive multiple
> kinds of spinlocks from that one superclass. I can add common debugging
> code to the queuelocks class too.
>
I don’t see a difference from the acquirer class. You can change it any way
necessary, too, without changing the code which uses it.

> There are umpteen variations on this theme. But of course, caveat emptor, I
> didn’t try the code on a real machine, so I don’t know if the compiler will
> generate good kernel-side code. The assumption of course is that you only
> need one KLOCK_QUEUE_HANDLE per processor, meaning, a processor will never
> be waiting on more than one spinlock.
>
> Maybe I reacted too strongly, my apologies, but I don’t like things such as
> throwing exceptions from inside a critical section; I believe one should
> release the spinlock first and then go deal with exception conditions.
>
Let’s forget about spinlocks for a while and think about any synchronization
object such as an event, critical section, mutex or semaphore. There are two
ways to handle error conditions in C++: classic status codes and C++
exceptions. If you decide on exceptions, any function including operators can
throw an exception, and you can’t assume code inside a critical section
behaves differently. Throwing an exception directly can seem silly, but
generally you can’t avoid an indirect throw from a function or operator
called. Actually, it is perfectly reasonable under normal circumstances
(where a spinlock may not fall). It is necessary to be prepared for this
situation, and that’s what autolocking classes were designed for. I don’t
want to discuss C++ exceptions pros and cons; the list has a different
purpose and I don’t feel quite competent for it. They just enforce some rules
which may not be ignored; exception-safe class design is one. Spinlocks
enforce other rules, and the two may not mix well. Personally, I like C++
exceptions in user mode but don’t think they are appropriate for kernel use
(no flames, please :-). Every environment has specific requirements, and it
is necessary to choose tools and design code accordingly.

Best regards,

Michal Vodicka
STMicroelectronics Design and Application s.r.o.
[michal.vodicka@st.com, http://www.st.com]




> Do you mean:

__asm pushfd
__asm cli
KeAcquireInStackQueuedSpinLock(&lk, &ql[KeGetCurrentProcessorNumber()]);
__asm popfd

I actually meant
pushf/cli/n=KeGetCurrentProcessorNumber()/popf/KeAcquireInStackQueuedSpinLock(&lk,
&ql[n]), but now I see that the window may not disappear. Yet what burns more
cycles, a quick spin or a context switch? I don’t know if the increase in I/O
reactivity, which might be the natural consequence of allowing that kind of
preemption, pays off the additional aggravation and complexity in managing
the I/O complex. The way to speed up an interrupt system isn’t to make it
more reactive, but to lower the number of interrupts and the latency of
handling them.

Horrible, isn’t it? The problem is that you have to prevent preemption around
the whole acquire call, because the code between handle selection and the
actual lock acquire has to be protected. No, only a raise to DISPATCH_LEVEL
is a correct solution, with all the negative consequences for performance
(and where do you store the original IRQL? :-)

I don’t know, Michal, I don’t know. What I find horrible is the fact that I
can’t trust my state even when I’m doing kernel work. Is that good, safe,
efficient? What do I gain by allowing an OS service to be preempted
arbitrarily, to the point where I can’t even trust my processor number?

My point was that you don’t know this size at compilation time for a
general-purpose driver. And even if you use the maximum possible, it can
change in the future. Yes, the space waste isn’t too big, but we need to
compare it with the stack object solution.

I’m going to bet that a lot of other stuff is going to change before you get
to use many more than 4 processors, so I wouldn’t worry about that kind of
thing!

Or you can ask the OS to get it for you and set
your data structure up accordingly at construction time.

Yes, I mentioned it. Again, compare the complexity with the stack-only
solution.

The difference is at what time we do memory management. I tried to design it
so that all the maintenance is done at construct time, and the Acquire() call
ends up doing little more than looping on the lock; in fact, it will probably
generate inline code.

OK, but now we’re far from the original problem. BTW, it would deadlock at
DISPATCH_LEVEL with one CPU only.

The original problem was to achieve a queued spinlock. With one CPU, you call
Acquire() and it bumps the tail; you then get into the critical section
because now head == tail; then you call Release(), which bumps the head. And
you could always bypass that code in the one-CPU case. If I remember, I
staggered the head and the tail to take that possibility into account, but
again, I’m losing my ability to handle detail. As for IRQL, fine, raise the
IRQL first, but then I still believe that we’d be better off if we just
turned off interrupts in that processor during the spin. I don’t at all mind
a solution that says pushf/cli/spin/popf; it’d save a lot of aggravation.

The only memory management necessary is one instruction, i.e. sub esp, 8. And
the whole thing requires one line, i.e. the acquisitor variable declaration,
contrary to two lines with your spinlock (lock, unlock).

I’m not sure I follow you here: you have to Acquire and you have to Release,
so that’s two lines.

OK, I’d agree spinlocks are a special case. Exception safety was only one of
the acquirer advantages mentioned, and it is a general design principle (or
design pattern) for critical section locks. It always needs a two-class
solution: the lock class itself and an acquirer-like class. For me it makes
sense to design the spinlock class the same way even if the exception safety
advantage isn’t used.

Again: it is possible to design it the way you suggest, but it is
unnecessarily complicated, error prone and has worse performance than the
two-class design. Look again at the code Chuck or Max posted at the beginning
and compare. The simplest design is usually the best one.

A strong point of using OO is that we’re now able to code in ways that are
not accessible to plain vanilla block-structured languages! I don’t see why
one well-encapsulated class is any more error prone or complicated than
anything else; the complication, if any, comes from the semantics of the call
itself and of the OS, not from the inherent problem! I also have a problem
with implicit actions such as the combination of release-on-destruct and
destruct-on-exit, which leads to an implicit release-on-exit: this may be
convenient and easy to code, but I believe spinlocks are important enough
that we should explicitly release them, if nothing else out of consideration
for the guys who will eventually inherit the code. I also don’t like to use
two objects here, and I find the limitation that spinlocks must be acquired
and released within the same stack frame, well, a limitation.

My approach has to do with the way I see spinlocks: a hardware-level
mechanism that should be at the very bottom of the OS layering and should not
have to know about any other OS mechanisms, including memory management and
exception throwing. A mechanism that should be used in a tight setting: loop
on the lock, do your job, release the lock, and make sure your job is as
small as possible. A fine-grain tool, not a coarse-grain one!

Alberto.


> I don’t understand. The necessity to protect against a context switch is
> because of preemptive multitasking. I don’t see the relation to I/O
> reactivity. Well, the probability you’re preempted at a bad moment changes,
> but it has to be handled the same way even if the probability is very low.
> Am I missing something?

There are two different mechanisms in action here: preemption and
interruption.

A CPU is running, an interrupt hits: the interrupt handler gets control, it
interrupts the current code, does its job, and exits. At this point the OS
has the choice of preempting or not preempting the interrupted task.
Typically, an interrupt will hit when it has higher interrupt priority (do
not confuse it with dispatching priority!) than the code that’s currently
running. In many systems this is implemented purely at the hardware level:
the equivalent of WinNT IRQLs is programmed into the interrupt controller,
and only higher priority interrupts will ever reach the processor.

The easiest way to think of it is by looking at how the Protected Mode
architecture handles it. You’re running under a task; your Task Register
points to some entry in your GDT or LDT. You get an interrupt, and if you
have an interrupt gate or a trap gate in the IDT, you do not get a
multitasking context switch: you’re running this new interrupt routine, but
you didn’t get “preempted”, you got “interrupted”! However, if you have a
Task Gate in your IDT, your interrupt will run on its own task and hence your
running task will be context-switched. So, your average interrupt doesn’t
preempt; what it does is, well, interrupt. At the end of interrupt handling
the OS may decide to preempt, but that’s a different matter.

Now, why would we want to allow a nested interrupt to happen? We’re not
talking about preemptive multitasking here, we’re talking about nesting
interruptions. The one reason to allow an interrupt to interrupt an interrupt
is to increase the reactivity of the system: that is, to try to minimize the
possibility of losing an interrupt because we missed it while inside another
interrupt handler with interrupts disabled. So, in WinNT, a program may be
running at passive or dispatch level, that is, not inside an interrupt
handler: an interrupt hits, and it’s at some higher IRQL, and the rules of
the OS mimic the time-honored hardware implementation that only lets higher
priority interrupts hit. When that happens, the time-honored technique is:
handle the interrupt, quiet down your peripheral, and get back to the
interrupted code as fast as you can!

At this point, it may be someone’s idea of an improvement to allow the
current task to be preempted, but I don’t think that’s necessarily a good
idea. Because, as we’ve been discussing, the CPU may be in the middle of a
hardlock loop, and the result of preemption may be deadlock. So, no, the
point of allowing nested interrupts isn’t preemptive multitasking; it’s
rather to increase the reactivity of the system in terms of potentially
faster interrupt handling. So, if I’m inside a spinlock loop, it’s OK to be
interrupted - I don’t call it “preemption”, to make a difference between this
and the time-slice-driven mechanism - but whoever interrupts must make sure
deadlocks cannot occur. Now, if we’re allowing interrupt routines to wait on
spinlocks, deadlocks are always a very present danger, so there’s only one
solution: interrupts must be either partially or totally disabled while an
interrupt handler or a spinlock is going on. If we have good hardware
support, this is all done in hardware and it’s transparent to the programmer,
but if not, it’s got to be done in software. Total interrupt disabling is
done by manipulating the flags, while partial interrupt disabling is done by
raising the IRQL.

If we don’t want to try to get the I/O to be more reactive, why IRQLs? We
might as well run interrupt handlers with interrupts disabled and we’d be
better off. It may be the case that a timer tick may have to break a spinlock
loop, but that’s going to be a timeout and an error condition! I do not see
the possible benefit of interrupting a CPU in the middle of a spinlock loop
and getting it to handle another task that can reach for the same spinlock
and somehow prevent the original holder from releasing it; the only possible
reason I see is I/O reactivity.

Normally, you shouldn’t need to know the processor number. Code running at
PASSIVE_LEVEL is preemptible and I see it as an advantage (kernel threads).
What is the alternative? Cooperative multitasking in the kernel? You can
always raise IRQL to DISPATCH_LEVEL to avoid preemption.

But the issue is not preemption, the issue is interruption. Apart from the
potential undesirability of letting a time-slice context switch take place
inside a spinlock loop - that’s about preemption all right - we also have the
potential undesirability of allowing interrupts within a spinlocked critical
section. The “raise the IRQL” trick sits on the assumption that, well, if my
interrupt routine is running at IRQL level umpteen and busywaits, no other
routine at a higher IRQL will busywait on the same piece of memory, or a
deadlock may occur. So, I don’t know if I’m at passive level or at dispatch
level, but in either case a spinlock should be protected from preemption,
although not necessarily from interruption.

Disable interrupts just because some code waits for a lock? It affects the
whole system because of local conditions. Do you really think it is a good
idea?

Disabling interrupts only affects that one CPU anyway, and spinlocks are
supposed to be very fine grained! Note well, raising the IRQL is a form of
disabling interrupts; it’s a partial interrupt disable. The difference
between raising the IRQL and issuing a CLI here is basically how much I/O
interruption we’re willing to allow that one CPU to be subjected to, and
again, the issue becomes one of reactivity: what do I really gain by allowing
that to happen? If we were running on a marvellously fast bus with an immense
I/O bandwidth, well, maybe, but we’re running on a 133MHz bus. Take a 3GHz
machine, and that one CPU can run over 20 instructions in the time it takes
the PCI bus to perform one interrupt cycle. I’d be curious to figure out how
many instructions a 3GHz Xeon CPU can execute during the average PCI bus
interrupt latency; that’s going to be a measure of how many interrupts we’d
possibly lose by allowing a 50-instruction critical section, say, to run on a
CPU with interrupts disabled. So, I’m not sure I’m worried about affecting
the whole system, unless of course one abuses the concept of a spinlocked
critical section.

It doesn’t mean it must be used in cases where it causes unnecessary
complications. The goal of OOP is to simplify programming and make code more
readable, not to give developers headaches in the name of purity.

OO gives us new programming paradigms. I shouldn’t be programming C++ in C
style any more than I should be programming C in assembler style! I daresay
that C++ should simplify programming for OO programmers, and it should make
code more readable to OO programmers!

Yes, it is this case. The OS enforces some rules, and the class design has to
fulfil them even if you don’t like it.

I’m not complaining about the rules, I’m complaining about the loose
semantics. For example, the hardware architecture doesn’t allow an interrupt
between a move to ss and the following move to esp. Maybe the OS should not
allow a task to be rescheduled to another processor between the time one
reads the processor number and the time one uses it?

This depends on personal preferences. Automatic locks are widely used and
shouldn’t confuse experienced C++ developers. On the contrary, the necessity
to manually unlock can be confusing to somebody who routinely uses autolocks.

You know, we used to have automatic typing in Fortran, and that was turned
down in favor of explicit typing. And it was done for a reason!
Synchronization code, likewise, will benefit from explicitly acquiring and
freeing those synchronization constructs. But you may be right, it could be a
question of personal preference.

I’d agree. With these requirements it makes sense to use the stack frame,
doesn’t it?

Not necessarily: these things don’t need to be in the same thread.

Hmm, guess why the queued spinlock acquire/release routines have …InStack…
in the name. Doesn’t it imply usage?

It certainly does, but again, I find it restrictive. And here again, even
within the same stack frame, I still find it a good idea to explicitly free
it!

Alberto.


> Ah, I see. You don’t like the way the OS is implemented now. Of course,
> there are disadvantages against what you offer but also advantages. To
> simplify things we can see it as three levels: passive, which allows
> preemption; dispatch, which allows interruption; and interrupt level. If I
> understand correctly, you claim two levels should be enough. I’d see it as
> a return to the windos world.

It’s rather a rational use of the hardware architectural concepts. A task is
something that runs on a TSS. The current TSS in a processor is pointed to by
its TR. The decision to let a task be preemptable or not is what makes the
essential difference between passive and dispatch level, but the code is
still running under a TSS. Now, when you interrupt, you can run under the
current TSS - with an interrupt or a trap gate - or you can run under your
own TSS, which would be a task gate.

I make a sharp difference between scheduling and interrupting; they’re two
different mechanisms. In the majority of cases an interrupt routine runs
under the same TSS as the currently running task, so - and this is very
important - interrupts don’t ordinarily context-switch. You get control in
your interrupt routine, you save the registers, you do whatever it takes, and
you get out, but you normally won’t switch tasks.

Makes sense. However, when NT was designed, the 486 (or 386?) was the fastest
PC available and the situation was different. Even if there are very good
reasons for removing IRQLs, I believe nobody would dare it now.

IRQLs make sense, but there are a few safeguards that I believe should be
enforced. The first is, let’s not allow preemption during interrupt
servicing, or, if we do, go all the way and do it through a task gate in the
IDT. Also, I believe that IRQLs should be enforced by the hardware, so that
if you’re at clock level, for example, no device or lower-priority interrupt
hits your processor, because the interrupt controller won’t let it through.

And that’s a problem, because then OS stability depends on average driver
quality. Several years ago I used NT4 on a PC with a SCSI adapter from one
big and well-known company. There were no apparent problems with the adapter
driver (if I ignore one BSOD per month during boot) running on the free
build. Things were different with the checked build. Usually an assertion
failure was raised during this driver’s initialization; something like
SpinLockSpinningTooLong, and if I remember correctly, it meant a spin longer
than 0.5 sec. The next example is NDIS and serialized miniports. Driver
handlers are called with a spinlock held, and processing involves at least
several function calls, can take a whole packet copy and maybe encryption
(original serialized IM drivers). It would be a real problem if interruption
weren’t allowed.

That’s why I believe that I/O handling should be done in Ring 1 and not in
Ring 0. This would allow Ring 0 code to run with interrupts disabled, and
there’s no real reason why I/O code should be considered trusted code on par
with the OS itself. Now, if a spinlock spins too long, that means the system
is either badly designed or overloaded, or both. But today’s machines are
very fast, so overloading becomes less and less of an issue - although if the
next generation of I/O standards gets to be as fast as it promises, we’re
going to have to revisit this whole area.

Alberto.
