Compiler Reordering and Thread Safety

After reading this

https://stackoverflow.com/questions/14624776/can-a-bool-read-write-operation-be-not-atomic-on-x86

In particular someone mentioned:

compiler optimization: the compiler shuffles the order of reads and writes under the assumption that the values are not accessed from another thread, resulting in chaos

Let's say I have a function that writes to non-paged pool memory that I allocated.

void func1()
{
    value = 0;      // both variables are in non-paged pool
    isValid = true;
}

I have another function that reads value and isValid in another thread:

void func2()
{
    if (isValid) {
        if (value == 0)
            ...
    }
}

The two functions are unrelated, e.g. in different projects. Is it possible that the compiler reorders the assignments in func1() in some unpredictable way, so that isValid is set to true before value is set to 0? From the compiler's point of view, the variables are written but never read by func1(), so reordering the writes doesn't break anything within func1().

I'd mark each variable as volatile to ensure it is reloaded each time it is accessed, and that the accesses are not optimized away.

Interesting demonstration of the danger of using global variables (although, of course, the same thing can be done with context structures).

If these were not globals, the compiler could assume their order was irrelevant. Because they are globals, it is my understanding that, because of sequence points, the stores will be made in this order.

HOWEVER, any time you have a condition where your “state” involves multiple related pieces of data, you need to consider whether it’s time to introduce an interlock.

Actually, the variables are in a context structure allocated in non-paged pool:

struct Mycontext {
    int value;
    bool isValid;
};

void func1(Mycontext* ctx)
{
    ctx->value = 0;
    ctx->isValid = true;
}

To guarantee write order, should I use volatile like this?

void func1(volatile Mycontext* ctx)
{
    ctx->value = 0;
    ctx->isValid = true;
}

void func2(volatile Mycontext* ctx)
{
    if (ctx->isValid) {
        if (ctx->value == 0)
            ...
    }
}

I also came across something called std::atomic. Is it available to kernel mode code?
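For reference, this is how the same publish/consume pattern is written with std::atomic in standard C++ (a sketch only; whether the kernel-mode toolchain supports it is exactly the open question, and the names mirror the earlier snippets):

#include <atomic>

struct Mycontext {
    int value;
    std::atomic<bool> isValid;
};

void func1(Mycontext* ctx)
{
    ctx->value = 0;                                       // plain store
    ctx->isValid.store(true, std::memory_order_release);  // publish: no prior write may move past this
}

void func2(Mycontext* ctx)
{
    // acquire pairs with the release store above
    if (ctx->isValid.load(std::memory_order_acquire)) {
        if (ctx->value == 0) {
            // guaranteed to observe the write to value
        }
    }
}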

Here is a good read:

https://www.quora.com/What-is-volatile-Memory-in-C-and-how-is-it-used

Note another thing here. The compiler might not rearrange the instructions,
but the CPU can still execute them out of order. volatile does not prevent
that: it is a language construct, not a hardware notion in this context.

You need to use memory barriers for your case. Interlocking would be
somewhat misleading in the code samples here, really. Reading and writing
properly aligned ints and booleans on x86 is always atomic. The order of
execution is not (unless a memory barrier is introduced, which interlocking
happens to add on x86).
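For example, a minimal sketch of both sides with an explicit barrier (assuming the WDK's KeMemoryBarrier macro from wdm.h, which acts as both a compiler barrier and a CPU barrier):

void func1(Mycontext* ctx)
{
    ctx->value = 0;
    KeMemoryBarrier();     // the store to value becomes visible before the store below
    ctx->isValid = true;
}

void func2(Mycontext* ctx)
{
    if (ctx->isValid) {
        KeMemoryBarrier(); // keeps the read of value from being hoisted above the read of isValid
        if (ctx->value == 0) {
            // safe to rely on value here
        }
    }
}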

Regards, Dejan.

Another issue: does the memory barrier provided by the InterlockedXXX functions guarantee that all reads/writes before it are written to memory? Is volatile still necessary with InterlockedXXX? The docs only say that reordering cannot cross a memory barrier. But does a memory barrier imply that the reads/writes must actually be written to memory?

e.g.

void test1(volatile LONG& a)
{
    InterlockedIncrement(&a);
}

Is the volatile declaration necessary?

You’ll want to read about MBs a bit more than forum posts can give you.
Short answer: sort of.
Long answer: depends on the type of MB.

IIRC, on x86 Interlocked instructions provide a full MB, so the expected
ordering of pre- and post-memory accesses is assured. But do not trust me on
this, it is an "iirc".

This is a complex subject that can't easily be covered in a few forum posts.

There are two general kinds of cross-thread memory accesses: one where it is sufficient that the second thread eventually reads the value that the first thread has written, and another where two or more threads must immediately read exactly the values written by the others.

The first is something like an exit loop flag:

while(!bExit)
{

}

Assuming the loop is fast, the stale value of bExit could be read many times before an assignment like bExit = true; eventually becomes visible and the loop terminates.
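In WDK code such a flag is often accessed through the ReadNoFence/WriteNoFence helpers from wdm.h (a sketch; these force a real volatile memory access on every iteration but emit no hardware fence):

volatile LONG bExit = 0;

// looping thread
while (!ReadNoFence(&bExit))
{
    // do work; bExit is re-read from memory on every pass
}

// some other thread, eventually
WriteNoFence(&bExit, 1);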

The second class is more demanding: two or more threads modifying a linked list, or something like that. If it is only the memory locations involved in the interlocked operation that must be atomically modified, then the no-fence API versions should be used. But if the interlocked operations implement a synchronization mechanism, then the acquire or release semantics should be used; the default of a full memory barrier is wasteful.
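For instance, the documented variants look like this (a sketch; on x86 all of them compile to the same fully-fenced lock-prefixed instruction, so the distinction only pays off on weakly ordered CPUs such as ARM):

volatile LONG count = 0;

InterlockedIncrement(&count);         // full memory barrier: the default, often wasteful
InterlockedIncrementAcquire(&count);  // acquire semantics only
InterlockedIncrementRelease(&count);  // release semantics only
InterlockedIncrementNoFence(&count);  // atomic increment, no ordering guarantee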

volatile is considered only by the compiler and prevents caching a variable in a register and some types of reordering.

MSVC traditionally gave volatile accesses acquire/release semantics, stronger guarantees than ISO C++ requires. This didn't matter much over the years, as x86 is strongly ordered. However, on ARM this caused performance to suck, so now the MSVC semantics of volatile differ depending on the target platform and/or compiler switches. It's all called out pretty clearly here:

https://learn.microsoft.com/en-us/cpp/cpp/volatile-cpp?view=msvc-170
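Condensed from that page (the comments paraphrase the documented behavior of the two switches):

volatile int value;
volatile bool isValid;

void writer()
{
    value = 0;
    isValid = true;  // /volatile:ms  - a volatile write has release semantics,
                     //                 a volatile read has acquire semantics
                     //                 (default when targeting x86/x64)
                     // /volatile:iso - no ordering beyond ISO C++; use std::atomic
                     //                 (default when targeting ARM)
}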

Generally I try to stay away from being too cute with this stuff and use OS-provided mechanisms instead of trying to invent something clever. Some things that I find useful outside of the normal locking primitives:

  1. RTL_RUN_ONCE package. Good for handling the “one time init” race (a sketch follows this list): https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntddk/nf-ntddk-rtlrunonceinitialize
  2. EX_RUNDOWN_REF package. Good for a resource that mostly sticks around but can sometimes come and go: https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-exacquirerundownprotection
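For the first item, a minimal one-time-init sketch (the callback name, pool tag, and Mycontext payload are illustrative only; the API calls are the documented ones, and ExAllocatePool2 assumes a recent WDK):

#include <ntddk.h>

static RTL_RUN_ONCE g_InitOnce;   // initialized once, e.g. in DriverEntry:
                                  //   RtlRunOnceInitialize(&g_InitOnce);

// Runs exactly once, even if many threads race into RtlRunOnceExecuteOnce
static ULONG NTAPI CreateContextOnce(PRTL_RUN_ONCE RunOnce, PVOID Parameter, PVOID* Context)
{
    UNREFERENCED_PARAMETER(RunOnce);
    UNREFERENCED_PARAMETER(Parameter);
    *Context = ExAllocatePool2(POOL_FLAG_NON_PAGED, sizeof(Mycontext), 'txtC');
    return (*Context != NULL);    // nonzero = init succeeded
}

NTSTATUS UseContext()
{
    PVOID ctx = NULL;
    NTSTATUS status = RtlRunOnceExecuteOnce(&g_InitOnce, CreateContextOnce, NULL, &ctx);
    if (NT_SUCCESS(status)) {
        // every caller gets the same, fully initialized instance
    }
    return status;
}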

I think that this new behaviour has everything to do with MSVC supporting other architectures again, and trying to maintain compatible behaviour, rather than some implicit contract around volatile. Back when I programmed for the Alpha in the early 90s, there was certainly no memory-ordering guarantee implied by volatile, only that the compiler would not optimize away the memory access.

I agree that it is unwise to rely on the details, and it is better by far to use interlocked functions or other constructs that hide the platform details away from your code.

Absolutely, interlocked functions are the ONLY way to go when synchronizing
access across threads.

I used volatile quite a bit in the DOS and Windows 3.x/9x days. Esp. in
interrupt handlers, where you would be guaranteed a particular variable
would not be cached in a register by the optimizer, i.e. it would be
reloaded on every access.