Inventing Spinlocks [was Synchronizing ...]

I’m very reluctant to get involved in this, but I feel like I can’t stop
myself. I’ll probably regret it later…

First, very few spinlock-related performance problems are frontside bus
traffic issues. You get to the frontside bus bottleneck only after
you’ve already refactored your algorithms such that they are nearly
optimal in all other respects. It’s far more likely that he’s holding a
lock when he doesn’t need to be, and that’s his performance problem. No
amount of lock algorithm manipulation will fix that.
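
As a rough sketch of that point, assume a simple user-mode list guarded
by a C11 atomic_flag. The names locked_list, list_lock, list_unlock and
list_push are hypothetical and do not come from the original thread. The
allocation and initialization happen before the lock is taken, so the
lock is held only for the two pointer updates:

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical list types for illustration only. */
typedef struct node {
    struct node *next;
    int value;
} node;

typedef struct {
    atomic_flag busy;   /* simple spin flag protecting head */
    node *head;
} locked_list;

static void list_lock(locked_list *l)
{
    while (atomic_flag_test_and_set_explicit(&l->busy, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void list_unlock(locked_list *l)
{
    atomic_flag_clear_explicit(&l->busy, memory_order_release);
}

/* Returns 0 on success, -1 if allocation fails. */
static int list_push(locked_list *l, int value)
{
    /* Work that does not touch shared state stays outside the lock. */
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;

    /* Only the shared-pointer updates are inside the critical section. */
    list_lock(l);
    n->next = l->head;
    l->head = n;
    list_unlock(l);
    return 0;
}
```

No change to the lock algorithm itself is involved; the gain, if any,
comes from how little work is done while the lock is held.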

Second, you give a solution that is intended to reduce the amount of
frontside bus traffic when using a spinlock. And while your
implementation may be better than KeAcquireSpinLock, it’s far, far worse
than KeAcquireInStackQueuedSpinLock, which is another OS function that
is already available.
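
To illustrate why a queue-based lock tends to generate less bus traffic
than either test-and-set variant, here is a minimal user-mode sketch of
an MCS-style queue lock in C11. It is not the Windows implementation,
and it only assumes that the OS routine follows the usual queue-based
design. The names qnode, qlock, qlock_init, qlock_acquire and
qlock_release are hypothetical. The caller-supplied node plays a role
loosely similar to the KLOCK_QUEUE_HANDLE that
KeAcquireInStackQueuedSpinLock takes. Each waiter spins on a flag in its
own node rather than on the shared lock word:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One node per waiter, typically on the waiter's own stack. */
typedef struct qnode {
    _Atomic(struct qnode *) next;
    atomic_bool waiting;
} qnode;

typedef struct {
    _Atomic(qnode *) tail;   /* last waiter in the queue, or NULL if free */
} qlock;

static void qlock_init(qlock *l)
{
    atomic_init(&l->tail, (qnode *)NULL);
}

static void qlock_acquire(qlock *l, qnode *me)
{
    atomic_store_explicit(&me->next, (qnode *)NULL, memory_order_relaxed);
    atomic_store_explicit(&me->waiting, true, memory_order_relaxed);

    /* Join the queue with a single atomic exchange on the shared word. */
    qnode *prev = atomic_exchange_explicit(&l->tail, me, memory_order_acq_rel);
    if (prev == NULL)
        return;   /* queue was empty: lock acquired immediately */

    /* Link behind the previous waiter, then spin on our own flag only. */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    while (atomic_load_explicit(&me->waiting, memory_order_acquire))
        ;
}

static void qlock_release(qlock *l, qnode *me)
{
    qnode *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        /* No visible successor: try to mark the lock free. */
        qnode *expected = me;
        if (atomic_compare_exchange_strong_explicit(
                &l->tail, &expected, (qnode *)NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;
        /* A successor is mid-enqueue; wait for it to link itself. */
        while ((succ = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            ;
    }
    /* Hand the lock to the next waiter by clearing its private flag. */
    atomic_store_explicit(&succ->waiting, false, memory_order_release);
}
```

In this design a release writes to one waiter's flag, so only that
waiter's cache line is disturbed, and waiters are served in FIFO order.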

Jake Oshins
Windows Base Kernel Team

This posting is provided “AS IS” with no warranties, and confers no
rights.

-----Original Message-----
Subject: Re: Synchronizing access to a linked list
From: “Moreira, Alberto”
Date: Mon, 22 Dec 2003 12:10:17 -0500
X-Message-Number: 18

Gary,

I talk for myself, of course.

And I have no problem shoving the OS out of my way when I feel compelled
to do so.

As far as that spinlock thing went, I had a reason; let me try to
explain. The guy said he had a performance problem with his
implementation. My first reaction was, well, if you have a high volume
of traffic on one spinlock, you may be seeing contention at the
front-side bus level. Now, I suspected that the implementation of
KeAcquireSpinLock() did a plain vanilla bts-based test-and-set, and
although I could not say for sure, it might be the case that a lot of
traffic on a single bts might generate a lot of bus traffic and hence
slow his SMP implementation. It is a simple test to replace a
test-and-set with a better spinlock, which is a test-and-test-and-set:
it entails adding a while loop in front of the bts loop. That takes
what, replacing two lines of code in his critical section with two while
loops and one release. That can be done and tested in less than five
minutes. If it increases the performance, he would know one of his
bottlenecks; if not, just delete it and go to the next stage.
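
The two-while-loops-and-one-release idea can be sketched roughly as
follows, in portable C11 rather than with bts. This is a user-mode
illustration under stated assumptions, not the kernel's spinlock; the
names ttas_lock, ttas_init, ttas_try_acquire, ttas_acquire and
ttas_release are hypothetical. The inner loop is a plain read that can
be satisfied from the local cache while the lock is held, and the atomic
exchange (the part that needs exclusive ownership of the cache line) is
attempted only once the lock looks free:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-mode lock type; not the kernel's KSPIN_LOCK. */
typedef struct {
    atomic_bool locked;
} ttas_lock;

static void ttas_init(ttas_lock *l)
{
    atomic_init(&l->locked, false);
}

/* Single attempt: returns true if the lock was acquired. */
static bool ttas_try_acquire(ttas_lock *l)
{
    /* "test": read-only check, no atomic read-modify-write */
    if (atomic_load_explicit(&l->locked, memory_order_relaxed))
        return false;
    /* "test-and-set": atomic exchange; old value false means we won */
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static void ttas_acquire(ttas_lock *l)
{
    for (;;) {
        /* First while loop: spin on a plain read until it looks free. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
        /* Outer loop: retry if another CPU took the lock first. */
        if (!atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
            return;
    }
}

static void ttas_release(ttas_lock *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

Whether this helps at all depends on the lock actually being contended;
with little contention it should behave much like a plain test-and-set.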

However, you see, it is not quite possible to convert the Windows
implementation of KeAcquireSpinLock() from test-and-set to
test-and-test-and-set, unless one goes down one notch and operates
within the OS. If one were to disassemble the code, one might find out
that all that’s needed is to insert the one while loop in front of the
call to KeAcquireSpinLock(), but that would require disassembly, etc.,
which takes time and energy, when all that’s needed is a
three-line-of-code change and test.

So, what do you suggest, that we don’t try it? Neglect five minutes of
work, and for what, to toe the party line? I don’t think you’re going to
convince me to suggest that kind of thing.

Alberto.