memory access consistency across SMP nodes

Hi,

I need to access a particular DWORD (it contains several bitfields and is
declared volatile) very frequently for reading and occasionally for
writing. I would like to use a spinlock / pushlock to synchronize the
write accesses, but allow unsynchronized read access (at any time, even
while some other thread holds the write lock on the variable). My question
is: does the x86 / x64 architecture guarantee that, as soon as one thread
(on any CPU / core) writes to that DWORD, any other thread that reads that
DWORD from any other CPU / core will see the updated value immediately,
from the next instruction on?

thank you very much,

Sandor LUKACS

Writing into the DWORD is done using an atomic instruction?

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Sandor LUKACS
Sent: Tuesday, January 29, 2008 2:33 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] memory access consistency across SMP nodes


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer


Ananthu Rao wrote:

Writing into the DWORD is done using an atomic instruction?

Not necessarily. I mean, I would like to use normal C code to change
those values. However, in this case I’m afraid that the answer to my
initial question is NO (meaning, no consistency is guaranteed instantly -
correct me if I’m wrong). On the other hand, if I used Interlocked()
operations, then the answer might be YES (consistency is guaranteed
across SMP nodes because of LOCK). However, the reason I would like to
avoid Interlocked() operations for writes is that they can be costly, and
I already hold a lock (spinlock/pushlock) for the write operation. The
reason I can’t get rid of the spinlock completely and use one single
atomic Interlocked() operation is that in several scenarios I need to set
an entire bit field (several bits) to a given value/pattern; for this I
need, AFAIK, at least two operations: an AND to clear the masked bits and
an OR to set the new ones.

Could you clarify whether consistency is assured in the two cases above
(with Interlocked() and without)?

thank you,


Sandor LUKACS
Analyst Programmer, R&D
BitDefender

E-mail: xxxxx@bitdefender.com
Phone : +40 264 443 008

www.bitdefender.com

If the cache and memory subsystem is working properly on a multiprocessor
system, a write to a word completed in clock X should be seen by all other
processors in clock X+1. If only one processor is writing and all others
are reading you can quite efficiently use normal read and write actions.

HOWEVER, beware of how C code gets compiled! If the variables aren’t
declared volatile, the actual read may have been done dozens of
instructions earlier and the value kept in a register or other temp
field. So the write may happen immediately before some other code, and that
code will not see the change, because in reality its read had been done long
before the change.
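A minimal sketch of the point about volatile, in portable C rather than driver code. The names `g_status`, `STATUS_READY` and `wait_ready` are made up for illustration and do not come from the thread. Because the word is declared volatile, the compiler has to emit a fresh load on every pass through the loop instead of reusing a value it read earlier.

```c
#include <stdint.h>

/* Hypothetical shared status word; name and bit layout are illustrative only. */
static volatile uint32_t g_status;

#define STATUS_READY 0x1u

/*
 * Poll the shared word until the READY bit is set or max_spins is used up.
 * The volatile qualifier forces a real memory read on each iteration, so a
 * write made by another processor is noticed on a later pass. Without it,
 * the compiler would be free to read g_status once and keep the value in a
 * register for the whole loop.
 * Returns 1 if READY was observed, 0 otherwise.
 */
static int wait_ready(unsigned max_spins)
{
    while (max_spins--) {
        if (g_status & STATUS_READY) {
            return 1;
        }
    }
    return 0;
}
```

Note that volatile only constrains the compiler. It does not make a read/modify/write sequence atomic, which is what the rest of this reply is about.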

If multiple processors are writing to the word, AND they are constructing
the value to write ONLY from local state, they can still do normal writes.

But if you need to do a read/modify/write you MUST use some sort of locking
protocol to ensure that you don’t get the case of proc A reading the
variable, proc B writing it, and then proc A writing its updated value over
the value written by B. An interlocked operation will lock the bus and cause
the appropriate cache and memory transactions to happen, to ensure that a read
followed by a write can be done atomically with respect to other
processors.

Since you seem to have an interlock of your own on writes, you should be
able to do simple writes with no problem, as long as everyone acquires the
interlock before doing the write.

Loren

----- Original Message -----
From: “Sandor LUKACS”
To: “Windows System Software Devs Interest List”
Sent: Tuesday, January 29, 2008 2:58 AM
Subject: Re: [ntdev] memory access consistency across SMP nodes


Yes, I’m aware that I need to use volatile, and that I need to make any
changes to that dword in cycles of acquire-write-lock /
read-current-value / construct-new-value-if-needed-in-several-steps /
write-back-new-value-as-a-single-write / release-write-lock (to avoid
the unwanted case, when a reader could read an inconsistent state of the
dword, if the new value must be constructed in several steps).

thank you very-very much, have a nice day,

Sandor LUKACS
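The acquire / read / construct / single-write / release cycle described above can be sketched roughly as follows. This is an illustration under assumptions, not code from the thread: it uses C11 atomics so that it compiles anywhere, an `atomic_flag` stands in for the spinlock / pushlock, and the field layout (`MODE_SHIFT`, `MODE_MASK`) and the names `g_flags`, `set_mode`, `get_mode` are hypothetical. The relaxed load and store are expected to compile to ordinary MOV instructions on x86 / x64.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: bits 4..7 of the DWORD hold a 4-bit "mode" field. */
#define MODE_SHIFT 4u
#define MODE_MASK  (0xFu << MODE_SHIFT)

static _Atomic uint32_t g_flags;                     /* the shared DWORD        */
static atomic_flag g_write_lock = ATOMIC_FLAG_INIT;  /* stand-in for a spinlock */

/* Writer: all writers serialize on the lock; readers never take it. */
static void set_mode(uint32_t mode)
{
    /* acquire-write-lock */
    while (atomic_flag_test_and_set_explicit(&g_write_lock,
                                             memory_order_acquire)) {
        /* spin until the current holder releases the lock */
    }

    /* read-current-value */
    uint32_t v = atomic_load_explicit(&g_flags, memory_order_relaxed);

    /* construct-new-value in several steps, on the local copy only */
    v &= ~MODE_MASK;                        /* AND: clear the field   */
    v |= (mode << MODE_SHIFT) & MODE_MASK;  /* OR: set the new pattern */

    /* write-back-new-value-as-a-single-write */
    atomic_store_explicit(&g_flags, v, memory_order_relaxed);

    /* release-write-lock */
    atomic_flag_clear_explicit(&g_write_lock, memory_order_release);
}

/* Reader: one unsynchronized load, then decode from the local copy. */
static uint32_t get_mode(void)
{
    uint32_t v = atomic_load_explicit(&g_flags, memory_order_relaxed);
    return (v & MODE_MASK) >> MODE_SHIFT;
}
```

Since the AND and the OR are applied to a local copy and only the finished value is stored, a reader sees either the old DWORD or the new one, never the cleared-but-not-yet-set intermediate. If readers use the DWORD to decide whether other data is valid, the store and load would probably need release / acquire ordering instead of relaxed.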

Actually you can do the write update without needing the interlock if you
use CmpXchg, which is exposed through some function prototype; I forget the
name. Probably InterlockedCompareExchange or the like.

Trick: Read the current value, modify it to the new value. Attempt to
compare-exchange, supplying both the old and new values. If the value in
memory still matches the old value, the replacement will occur.

If the value in memory no longer matches the old value, the overwrite will
not take place; instead the old value you supplied will be replaced with the
value currently in memory, and you get a different boolean result from the
function.

In this case you redo the modification starting from the new memory value and
reattempt the compare exchange. You keep doing that until it succeeds or you
get really bored.

If you don’t have huge update rates this can be more efficient than
acquiring a lock. If your update rates are high, or updates from multiple
processors tend to group together, this can be less efficient, since you will
end up buzzing rather than winning on the first or second try.

Loren
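A rough sketch of the compare-exchange retry loop described above, written with C11 atomics so it is portable. The names `g_word` and `update_field` are invented for the example. `atomic_compare_exchange_weak_explicit` happens to behave the way the description says: on failure it overwrites the supplied old value with what is currently in memory and returns false. The Win32 `InterlockedCompareExchange` differs in shape: it returns the previous value of the destination rather than a boolean, so there the caller compares the return value against the old value it passed in.

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint32_t g_word;   /* hypothetical shared DWORD */

/*
 * Replace the bits selected by mask with the corresponding bits of `bits`,
 * without taking a lock. Returns the value that was finally written.
 */
static uint32_t update_field(uint32_t mask, uint32_t bits)
{
    /* Read the current value once; failed attempts refresh it for us. */
    uint32_t old = atomic_load_explicit(&g_word, memory_order_relaxed);
    uint32_t desired;

    do {
        /* Build the new value from the most recently observed old value. */
        desired = (old & ~mask) | (bits & mask);

        /*
         * If g_word still equals old, store desired and leave the loop.
         * Otherwise old is overwritten with the current contents of g_word
         * and the loop recomputes desired from it. The weak form may also
         * fail spuriously, which the loop handles the same way.
         */
    } while (!atomic_compare_exchange_weak_explicit(&g_word, &old, desired,
                                                    memory_order_acq_rel,
                                                    memory_order_relaxed));
    return desired;
}
```

For example, with the word holding 0x0000FF00, `update_field(0x000000F0u, 0x00000030u)` leaves 0x0000FF30 in it. The loop is unbounded here; under heavy contention a retry limit or a fallback to the lock may be the better choice, as noted above.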