A barebones spinlock uses a bus-locking instruction to do one single
read-modify-write access to memory, and loops on the result of that
instruction. Adding a locked acquisition and an extra variable to the loop
makes it no longer barebones. Examples of a barebones spinlock:
while(CompareAndReplace(memory, true));
while(Bit-test-and-set(bit, true));
Each requires two machine instructions: a test and a jump. The corresponding
lock release requires one instruction, and no lock. The test is
self-synchronized by the hardware, the jump’s freewheeling, and there’s no
requirement of atomicity that encompasses both test and jump. The lock
prefix, if required, is part of the test instruction itself. I suspect that
i386 designers didn’t automatically lock the bus in instructions such as
cmpxchg or xadd to avoid the ensuing bus overhead when such instructions are
used in contexts that aren’t sensitive to SMP interference, and to not incur
in the consequent decrease in instruction latency and hence performance.
Note that the test locks the bus for the duration of the read-modify-write,
but between the test and the next test the bus is available. Granted, the
processor is so much faster than the bus that the performance hit on the bus
throughput is real, but then, you can easily do something like
while (CompareAndReplace(memory, true)) delay(n);
where delay can be some cacheable memory read loop, which frees the bus for
other people to use. We’re already stalling the processor anyway, so, the
issue is, is it more profitable to stall the processor or to stall the bus ?
This is, incidentally, one of the things that the author of that paper
suggested, together with exponential backoff: the “n” in the delay(n) is
randomized to some statistical distribution.
The alternative is a test-and-test-and-set loop, which is slightly more
complex:
while (memory); while (CompareAndReplace(memory,true) { while (memory); }
Which replaces the read-modify-write with a read, and that should decrease
the bus overhead by alleviating the MESI protocol overhead. This can be
implemented in machine code, for example,
top:
test memory, 1
jnz top
lock bts memory, 1
jnz top
Read Gregory Andrews’s excellent OS book if you want more details. And
before anyone pooh-poohs any of these ideas, I say, go and measure it.
Publish a paper. Be peer-reviewed. Then we’ll talk!
Alberto.
----- Original Message -----
From:
To: “Windows System Software Devs Interest List”
Sent: Saturday, December 29, 2007 2:06 PM
Subject: RE:[ntdev] Are callbacks guaranteed to run consecutively?
>> You can use xchg to implement a spinlock, even though it doesn’t set the
>> flags.
>
> Sure - after all, this instruction locks the bus (you don’t have to even
> prefix it with LOCK - if will assert #LOCK signal anyway). However, the
> whole thing is, again, based upon bus-level locking at the hardware level,
> and, according to you, everything that relies upon bus-level locking is
> already not a “bare-bone spinlock”. Therefore, the question is still
> open - what is a “bare-bone spinlock” in your understanding???
>
> Anton Bassov
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer