Atomic reads/writes on 32/64-bit processors

Hello all,

I have an enum that is read by multiple threads and written to, from time to time, by a single writer. The field is a 32-bit aligned value. Is it safe to access it without a spinlock? That is, is it possible that a reader will receive a corrupted value (an invalid enum value)?

If it is safe, will this code work on 64-bit processors? And what if the enum is stored in a char or a short?

Thank you.

xxxxx@yahoo.com wrote:

> Hello all,
>
> I have an enum that is read by multiple threads and written to, from time
> to time, by a single writer. The field is a 32-bit aligned value. Is it
> safe to access it without a spinlock? That is, is it possible that a
> reader will receive a corrupted value (an invalid enum value)?

I assume that a corrupted value would mean reading some bytes/bits from a
previous 32-bit value and other bytes/bits from the currently-being-written
32-bit value.

If the 32-bit value is aligned on a 4-byte boundary, then you can safely
have multiple readers and a single writer by using interlocked operations;
that will surely do it.

Sandor LUKACS

> If it is safe, will this code work on 64-bit processors? And what if the
> enum is stored in a char or a short?

Hi,

My question is about simple reads/writes, not about interlocked operations.

As long as a variable is properly aligned, any read or write to it is atomic. On earlier processors
it had to be naturally aligned (i.e., on a byte/word/dword boundary, depending on operand size) in order to ensure atomicity, but on more recent ones the only requirement is that it does not cross a cache line. Therefore, if you have only one writer and the target variable is properly aligned, you don’t really need any synchronization constructs. Just make sure that you have declared your target variable with the 'volatile' modifier - otherwise, the compiler may play its tricks on you…

Anton Bassov

On multiprocessor systems this invites race conditions because of the way
the cache is organized. CPUs fetch values from memory before they are
needed because of speculative execution. To prevent race conditions you
will also need to protect read operations with appropriate memory barrier
instructions, which means a lock or an interlocked instruction.

/Daniel

wrote in message news:xxxxx@ntdev…
> Hello all,
>
> I have an enum that is read by multiple threads and written to, from time
> to time, by a single writer. The field is a 32-bit aligned value. Is it
> safe to access it without a spinlock? That is, is it possible that a
> reader will receive a corrupted value (an invalid enum value)?
>
> If it is safe, will this code work on 64-bit processors? And what if the
> enum is stored in a char or a short?
>
> Thank you.
>

> To prevent race conditions you will also need to protect read operations
> with appropriate memory barrier instructions, which means a lock or an
> interlocked instruction.

Sorry, but this is nonsense…

There is no such thing as an “interlocked read” or an “interlocked write” - the LOCK prefix can be used only with certain instructions that read and write the target memory location in one go (INC, DEC, XADD, etc.), and MOV is not among them. Certainly, one could implement an “interlocked read” as an exchange of a variable with itself, but that is a totally pointless exercise.

If you need a memory barrier, there are more lightweight options available (e.g., SFENCE), so you don’t need to achieve it with an interlocked operation (which, indeed, acts as an implicit memory barrier). In any case, you don’t really have to care about memory barriers in this context. FYI, even a properly implemented spinlock does not care about them in its inner loop. Its inner loop is a simple read, and the release is a simple write - as long as the spinlock is properly implemented, set-and-test is the only locked operation that it makes. What you have to care about in this context is atomicity (which is achieved by proper alignment), plus making sure that the compiler does not play tricks on you with the “optimizations” it may apply…

Anton Bassov

There is nothing nonsensical about what I wrote; you can read all about it
here: http://msdn2.microsoft.com/en-us/library/ms686355.aspx

The bottom line remains that he needs to protect his read operations as
well, to prevent race conditions caused by memory-ordering issues. Using
the volatile keyword, as you say, is one of the possible solutions, but as
far as I know it is nowhere written exactly how the WDK compiler will
actually treat it. So if you want to make “sure the compiler does not play
tricks”, better take care of the locking yourself.

/Daniel

wrote in message news:xxxxx@ntdev…
> Sorry, but this is nonsense…
>
> There is no such thing as an “interlocked read” or an “interlocked write” -
> the LOCK prefix can be used only with certain instructions that read and
> write the target memory location in one go (INC, DEC, XADD, etc.), and MOV
> is not among them. Certainly, one could implement an “interlocked read” as
> an exchange of a variable with itself, but that is a totally pointless
> exercise.
>
> If you need a memory barrier, there are more lightweight options available
> (e.g., SFENCE), so you don’t need to achieve it with an interlocked
> operation (which, indeed, acts as an implicit memory barrier). In any case,
> you don’t really have to care about memory barriers in this context. FYI,
> even a properly implemented spinlock does not care about them in its inner
> loop. Its inner loop is a simple read, and the release is a simple write -
> as long as the spinlock is properly implemented, set-and-test is the only
> locked operation that it makes. What you have to care about in this context
> is atomicity (which is achieved by proper alignment), plus making sure that
> the compiler does not play tricks on you with the “optimizations” it may
> apply…
>
> Anton Bassov

> There is nothing nonsensical about what I wrote; you can read all about it
> here: http://msdn2.microsoft.com/en-us/library/ms686355.aspx

The link that you have provided speaks about memory-ordering issues that don’t really apply here (more on that below). They arise when you want to ensure that CPU Y sees variables updated in exactly the same order in which CPU X actually updates them. Consider the following scenario: CPU X executes the following lines:

a=b;
b=c;

Due to speculative reads and writes there is no guarantee that CPU Y will see 'a' updated before 'b'. Therefore, if you want to make sure CPU Y always sees these updates in the right order, you have to insert an SFENCE instruction between these two lines so that the first update gets committed to memory before the second one takes place (I assume x86 and x86_64 here - IA64, i.e. Itanium, offers instructions that provide acquire-only and release-only semantics as well). This is what the article you linked to speaks about. However, we are speaking about something different here…

> The bottom line remains that he needs to protect his read operations as
> well, to prevent race conditions caused by memory-ordering issues.

As I already told you, *in this context* he does not have to care about memory-ordering issues - he has a single writer, a single variable, and multiple readers. His situation is similar to the one in which a spinlock is held by CPU A while CPUs B, C, and D try to acquire it, spinning in an inner loop until the spinlock gets released so that they can attempt set-and-test. This kind of thing is implemented with a simple MOV instruction, without either SFENCE or the LOCK prefix.

> Using the volatile keyword, as you say, is one of the possible solutions

The ‘volatile’ keyword applies to optimizations that are made by the *compiler*, while SFENCE applies to the ones made by the CPU. These are different things (although they could be combined - for example, the compiler could generate SFENCE prior to a MOV if the target variable is declared with the ‘volatile’ modifier). Apparently, however, it does not do that - otherwise, there would be no need for the KeMemoryBarrier() macro. You need the ‘volatile’ modifier here not because of memory reordering that may be done by the CPU, but because of the reordering that may be done by the compiler…

> but as far as I know it is nowhere written exactly how the WDK compiler
> will actually treat it.

AFAIK, it is part of the language specification - the compiler has to ensure that the code actually reads a memory location declared ‘volatile’ every time it is accessed (otherwise, it could save its contents in a register in advance)…

Anton Bassov

If you listen to the purists, they say that if you care about portability, volatile is almost useless for multi-threading purposes. volatile is designed to address issues arising in single-threaded situations, such as memory-mapped I/O, not multi-threading.

volatile is not enough to avoid all compiler optimization tricks. One thing you might not know (I didn’t) is that volatile will not prevent *compiler* reordering (I’m not talking about hardware reordering, but about compiler reordering). The compiler won’t reorder between multiple “volatile” accesses, but it might (and sometimes does) reorder between volatile and non-volatile accesses. This alone could break some multi-threading algorithms.

This might be just theory as far as Windows kernel development goes, because MS compilers seem to be over-conservative about volatile. I understand that later releases actually infer a memory barrier, but this is not portable. And that might mean that abusing volatile could be both non-portable and possibly overkill.

You’re using an MS compiler to write a Windows kernel mode driver;
portability concerns aren’t very high on the list of requirements. If
they are, ‘volatile’ is the least of your problems, so unless using
‘volatile’ causes problems in and of itself, overkill beats underkill in
this case, I think.

mm


Jon Morrison from the Windows Reliability team put up a pretty interesting post on his new blog about memory-operation reordering and what barriers do for you; it relates to this discussion.

http://blogs.msdn.com/itgoestoeleven/

-p

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@hotmail.com
Sent: Friday, March 07, 2008 9:42 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] atomic reads/writes on 32/64 bit processor


MM wrote:

> You’re using an MS compiler to write a Windows kernel mode driver;
> portability concerns aren’t very high on the list of requirements. If
> they are, ‘volatile’ is the least of your problems,…

I agree; that’s why I said that as far as Windows kernel development goes, it is mostly just theory. But even here it might be relevant, because MS changed the semantics of “volatile” in VS 2005. Earlier versions *didn’t* infer a memory barrier. This means that if you have older binaries using volatile for these purposes, they might be unsafe.

> so unless using ‘volatile’ causes problems in and of itself, overkill beats
> underkill in this case, I think.

Agree again. But sometimes the whole point of using “volatile” is to avoid overkill by not using memory barriers or lock primitives. So my point is that, sometimes, using an explicit memory barrier or a lock primitive in a single place can be safer and more efficient than using volatile all over the place.

Jorge:

I hadn’t considered what you said about using “volatile” to avoid
overkill. You make an excellent point.

Cheers,

mm