The lock prefix “atomicizes” the instruction by locking the front-side bus
between the read and the write accesses. Lock prefixes do not operate at
processor level or at memory level; they operate on the front-side bus.
Instructions like inc and dec rely on a read-modify-write sequence, which
involves two bus cycles: a read and a write. If you don’t use the lock
prefix, another processor can get one or more bus cycles in between the
read and the write, and that opens the door to a race condition. Again, I
recommend Mindshare’s book: even though it’s written for hw dudes, it’s a
great resource!
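
To make the read-modify-write point concrete, here is a minimal sketch in C11, assuming a POSIX threads environment. The names (shared_counter, worker, run_locked_increments) and the thread and iteration counts are made up for illustration. On x86, compilers typically turn atomic_fetch_add into a lock-prefixed instruction, so no other processor can slip in between the read and the write of each increment.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 100000

/* Hypothetical shared variable; every thread increments it. */
static atomic_long shared_counter;

/* Each increment is one indivisible read-modify-write. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        atomic_fetch_add(&shared_counter, 1);
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final count. */
long run_locked_increments(void)
{
    pthread_t threads[NUM_THREADS];

    atomic_store(&shared_counter, 0);
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_create(&threads[i], NULL, worker, NULL);
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(threads[i], NULL);
    }
    return atomic_load(&shared_counter);
}
```

Calling run_locked_increments() should always give NUM_THREADS * INCREMENTS_PER_THREAD. If the counter were a plain long bumped with counter++, increments from different threads could interleave between the read and the write, and some of them would be lost.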
Actually, as an aside, I heard that there are buses out there that enable a
processor to ship a modified cache line directly to a read requestor: no
need to even access memory. But I don’t have specific data or reliable
information to share.
Alberto.
----- Original Message -----
From:
To: “Windows System Software Devs Interest List”
Sent: Thursday, October 25, 2007 4:14 PM
Subject: RE:[ntdev] interlocked instructions perform better with more
processors ?
>> The lock prefix per-se doesn’t do much
>> performance damage, unless your locked operation is repeated over and
>> again,
>
> This is why I said that my proposed improvement seems to be marginal (in
> fact, in most cases it is).
> However, again, imagine a spinlock that protects a dispatcher database -
> this spinlock gets accessed upon any state modification of any dispatcher
> object, i.e. upon every context switch, every wait, every modification of
> event state and so on. On a busy system with, say, 8 CPUs it is going to
> be almost always held by someone at any particular moment, and, while it
> is being held, a few attempts to acquire it are more than likely to be
> made by other CPUs. More on it below.
>
>
>> and even then, read cycles are less expensive than memory writes, because
>> memory writes will cause other processors to invalidate their own copies
>> of
>> the written cache line, which will lead to additional traffic on the bus
>> when those processors try to read the information again.
>
>
> I am afraid you confuse bus locking with atomicity. There is no such thing
> as interlocked read or interlocked write - the LOCK prefix can be used only
> with certain instructions like INC, DEC, CMPXCHG, BTS, etc, and all
> these instructions result in modification of the target variable’s state
> (in fact, the very concept of an interlocked operation that does not modify
> the state of the target variable just does not make sense in itself).
> Therefore, every interlocked operation that is made by CPU X on some
> variable is going to invalidate cache lines of all CPUs that attempt to
> read the state of this variable. This is the only reason why I said that
> if spinlock is always in high demand, my proposed modification may have
> some overall performance benefits, although in the vast majority of cases
> this improvement is just negligible…
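
The invalidation argument above can be illustrated with a spin-on-read ("test and test-and-set") lock. This is only a sketch in C11: the modification being proposed is not spelled out in this message, so treating it as a plain read before the interlocked operation is an assumption, and all the sketch_* names are hypothetical. Waiters check the lock with an ordinary load, which lets the cache line stay shared, and issue the interlocked exchange only when the lock looks free.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: 0 means free, 1 means held. */
typedef struct {
    atomic_int locked;
} sketch_spinlock;

void sketch_spin_init(sketch_spinlock *lock)
{
    atomic_init(&lock->locked, 0);
}

/* One acquisition attempt; returns true if the caller now owns the lock. */
bool sketch_spin_try_acquire(sketch_spinlock *lock)
{
    /* Plain read first: it modifies nothing, so no invalidation is forced. */
    if (atomic_load_explicit(&lock->locked, memory_order_relaxed) != 0) {
        return false;
    }
    /* The lock looked free: only now pay for the interlocked exchange. */
    return atomic_exchange_explicit(&lock->locked, 1,
                                    memory_order_acquire) == 0;
}

/* Spin until the lock is acquired. */
void sketch_spin_acquire(sketch_spinlock *lock)
{
    while (!sketch_spin_try_acquire(lock)) {
        /* busy-wait; a real implementation would add a pause hint here */
    }
}

void sketch_spin_release(sketch_spinlock *lock)
{
    atomic_store_explicit(&lock->locked, 0, memory_order_release);
}
```

Under light contention the extra read buys close to nothing, which fits the "marginal in most cases" remark. It should only start to matter when several CPUs spin on a lock that is almost always held.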
>
>> As far as disassembling, my advice is, don’t. I find it way better to
>> rely on one’s own wits.
> …
>
>> Actually, I trust my own hand better than anyone else’s, even if that
>> person
>> has been one of the writers of Windows code.
>
>
> You should realize that every piece of ntoskrnl.exe’s and HAL.DLL’s code
> is based upon the collective research that was made by numerous people,
> and not only at MSFT. If you disassemble this code, you are going to
> understand not only what it does, but also why it does things this
> way. Therefore, I believe disassembling the Windows kernel is just a
> wonderful educational experience
> that helps you better understand the underlying concepts. At the same time
> I see what you mean -
> indeed, you should not take it as something absolute…
>
> Anton Bassov
>
> --
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer