Compiler Reordering and Thread safety

alec_lee Member Posts: 50

After reading this

In particular someone mentioned:

compiler optimization: the compiler shuffles the order of reads and writes under the assumption that the values are not accessed from another thread, resulting in chaos

Let's say I have a function that writes to memory I allocated from non-paged pool.

void func1() {
    value = 0;       // both variables are in non-paged pool
    isValid = true;
}

I have another function that reads value and isValid on another thread:

void func2() {
    if (isValid) {
        if (value == 0) ...
    }
}

The two functions are unrelated (e.g., they are in different projects). Is it possible that the compiler reorders the assignments in func1() in some unpredictable way, so that isValid is set to true before value is set to 0? From the compiler's point of view, the variables are written but never read by func1(), so reordering the writes doesn't break anything in func1().


  • Jamey_Kirby Member - All Emails Posts: 461
    via Email
    I'd mark each variable as volatile to ensure it is reloaded when accessed,
    and not optimized.
  • Tim_Roberts Member - All Emails Posts: 14,719

    Interesting demonstration of the danger of using global variables (although, of course, the same thing can be done with context structures).

    If these were not globals, the compiler could assume their order was irrelevant. Because they are globals, my understanding is that sequence points guarantee the stores will be made in this order.

    HOWEVER, any time you have a condition where your "state" involves multiple related pieces of data, you need to consider whether it's time to introduce an interlock.

    Tim Roberts
    Providenza & Boekelheide, Inc.
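Tim's point about introducing an interlock when state spans several fields can be sketched as a tiny lock around both fields. This is portable user-mode C11 built on `<stdatomic.h>` (an assumption: it is not the WDK API, and a real driver would use a kernel lock, for example a spin lock taken with KeAcquireSpinLock, rather than rolling its own). MyContext, ctx_lock and ctx_unlock are hypothetical names modeled on the thread's Mycontext.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical context modeled on Mycontext from this thread, plus a lock word. */
typedef struct {
    atomic_bool locked;   /* the "interlock": guards value and isValid as one unit */
    int value;
    bool isValid;
} MyContext;

void ctx_init(MyContext *ctx) {
    atomic_init(&ctx->locked, false);
    ctx->value = 0;
    ctx->isValid = false;
}

/* Spin until we change locked from false to true; acquire keeps the protected
   accesses that follow from moving above the lock. */
void ctx_lock(MyContext *ctx) {
    while (atomic_exchange_explicit(&ctx->locked, true, memory_order_acquire)) {
        /* busy-wait; a production lock would back off or yield */
    }
}

/* Release makes every write done under the lock visible before the lock is seen free. */
void ctx_unlock(MyContext *ctx) {
    atomic_store_explicit(&ctx->locked, false, memory_order_release);
}

/* Writer: both fields change together. */
void func1(MyContext *ctx) {
    ctx_lock(ctx);
    ctx->value = 0;
    ctx->isValid = true;
    ctx_unlock(ctx);
}

/* Reader: sees either the old pair or the new pair, never a mix.
   Returns isValid and copies value to *out only when it is valid. */
bool func2(MyContext *ctx, int *out) {
    ctx_lock(ctx);
    bool ok = ctx->isValid;
    if (ok)
        *out = ctx->value;
    ctx_unlock(ctx);
    return ok;
}
```

The design choice is that the reader takes the same lock as the writer, so there is no window in which isValid is true but value is stale.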

  • alec_lee Member Posts: 50
    edited March 1

    Actually, the variables are in a context structure allocated in non-paged pool:

    struct Mycontext {
        int value;
        bool isValid;
    };

    void func1(Mycontext* ctx) {
        ctx->value = 0;
        ctx->isValid = true;
    }

    To guarantee write order, should I use volatile like this?

    void func1(volatile Mycontext* ctx) {
        ctx->value = 0;
        ctx->isValid = true;
    }

    void func2(volatile Mycontext* ctx) {
        if (ctx->isValid) {
            if (ctx->value == 0) ...
        }
    }

    I also came across something called std::atomic. Is it available to kernel mode code?

  • Jamey_Kirby Member - All Emails Posts: 461
    via Email
  • Dejan_Maksimovic Member - All Emails Posts: 608
    via Email
    Note another thing here. Even if the compiler does not rearrange the
    instructions, the CPU can still execute them out of order. Volatile does not
    prevent that. In this context it is a language construct, not a hardware notion.

    You need to use memory barriers for your case. Interlocking would really be
    somewhat misleading in the code samples here. Reading and writing aligned
    ints and booleans on x86 is always atomic. The order of execution is not
    guaranteed (unless a memory barrier is introduced, which interlocking
    happens to add on x86).

    Regards, Dejan.
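The memory-barrier approach Dejan describes can be sketched with explicit C11 fences. This is portable user-mode C (an assumption, not WDK code; in a driver, KeMemoryBarrier is a full barrier, and the plain Interlocked routines are documented to generate a full barrier). SharedCtx, publish and try_consume are hypothetical names. The release fence in the writer pairs with the acquire fence in the reader, so a reader that observes isValid == true also observes the value stored before it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared context; the fields are C11 atomics so concurrent access is defined. */
typedef struct {
    atomic_int value;
    atomic_bool isValid;
} SharedCtx;

/* Writer (func1 in the question): store the data, then the flag. */
void publish(SharedCtx *ctx, int v) {
    atomic_store_explicit(&ctx->value, v, memory_order_relaxed);
    /* Release fence: the value store above cannot be reordered after the flag store below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ctx->isValid, true, memory_order_relaxed);
}

/* Reader (func2 in the question): returns false until the data has been published. */
bool try_consume(SharedCtx *ctx, int *out) {
    if (!atomic_load_explicit(&ctx->isValid, memory_order_relaxed))
        return false;
    /* Acquire fence: pairs with the writer's release fence, so the load below
       sees the value stored before that fence. */
    atomic_thread_fence(memory_order_acquire);
    *out = atomic_load_explicit(&ctx->value, memory_order_relaxed);
    return true;
}
```

Using release/acquire fences rather than a full barrier orders only what this handoff needs, which is the same trade-off later posts raise about full versus acquire/release semantics.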
  • alec_lee Member Posts: 50
    edited March 1

    Another question: does the memory barrier provided by InterlockedXXX guarantee that all reads/writes before it are written to memory? Is volatile still necessary with InterlockedXXX? The documentation only says that reordering cannot cross a memory barrier. But does a memory barrier imply that the reads/writes must be written to memory?


    void test1(volatile LONG& a) {

    Is the volatile declaration necessary?

  • Dejan_Maksimovic Member - All Emails Posts: 608
    via Email
    You'll want to read more about MBs (memory barriers) than forum posts can give you.
    Short answer: sort of.
    Long answer: it depends on the type of MB.

    IIRC, on x86 the Interlocked instructions provide a full MB, so the expected
    ordering of memory accesses before and after them is assured. But do not trust me on this, it is an
  • MBond2 Member Posts: 629

    This is a complex subject that can't easily be covered in a few forum posts.

    There are two general kinds of cross-thread memory access: one where it is sufficient that the second thread eventually reads the value the first thread has written, and another where two or more threads must immediately read exactly the values written by another.

    The first is something like a loop-exit flag:


    Assuming the loop is fast, a stale value of bExit could be read many times before an assignment like bExit = true; eventually becomes visible and the loop terminates.

    The second class is more demanding: for example, two or more threads modifying a linked list. If only the memory locations involved in the interlocked operation itself must be modified atomically, the NoFence versions of the APIs should be used. But if the interlocked operations implement a synchronization mechanism, the Acquire or Release variants should be used. The default full memory barrier is wasteful.

    volatile is considered only by the compiler: it prevents caching a variable in a register and some types of reordering.
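The two cases MBond2 calls cheap, an exit flag that only needs eventual visibility and an interlocked update where only the location itself matters, can both be sketched with relaxed C11 atomics. This is portable user-mode C, not WDK code; bExit comes from the post, while worker_loop, request_exit, requestsSeen and the counter functions are hypothetical. In the Windows headers the closest counterparts to the relaxed increment are the NoFence variants such as InterlockedIncrementNoFence.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Class 1: hypothetical exit flag. Eventual visibility is all the loop needs. */
static atomic_bool bExit;          /* static storage: starts out false */

void request_exit(void) {
    atomic_store_explicit(&bExit, true, memory_order_relaxed);
}

/* Runs until bExit is observed (or max_iterations is reached, so the demo terminates).
   Unlike a plain bool, the flag is loaded atomically on each check rather than
   being cached in a register. */
unsigned long worker_loop(unsigned long max_iterations) {
    unsigned long i = 0;
    while (i < max_iterations && !atomic_load_explicit(&bExit, memory_order_relaxed))
        ++i;                       /* one unit of work would go here */
    return i;
}

/* No-fence case: a hypothetical statistics counter. Only the counter itself must be
   updated atomically; no other data is published through it, so relaxed suffices. */
static atomic_long requestsSeen;

void count_request(void) {
    atomic_fetch_add_explicit(&requestsSeen, 1, memory_order_relaxed);
}

long read_request_count(void) {
    return atomic_load_explicit(&requestsSeen, memory_order_relaxed);
}
```

Neither case orders any other memory, which is exactly why a full barrier would be wasted here; acquire/release is only needed once the atomic guards other data, as in a lock.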

  • Scott_Noone_(OSR) Administrator Posts: 3,650

    MSVC has always made volatile accesses act as a full memory barrier. This didn't matter much over the years because x86 is strongly ordered. However, on ARM this made performance suck, so now the MSVC semantics of volatile differ depending on the target platform and/or compiler switches. It's all called out pretty clearly here:

    Generally I try to stay away from being too cute with this stuff and use OS-provided mechanisms instead of trying to invent something clever. Some things that I find useful beyond the normal locking primitives:

    1. RTL_RUN_ONCE package. Good for handling the "one time init" race:
    2. EX_RUNDOWN_REF package. Good for a resource that mostly sticks around but can sometimes come and go:
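To show what the "one time init" race actually is, here is a portable C11 compare-and-swap sketch. It is not RTL_RUN_ONCE itself; in a driver, following Scott's advice, you would call RtlRunOnceExecuteOnce instead. The state values, do_init, sharedTableSize and initCalls are hypothetical, and waiters simply spin, which a real implementation would avoid.

```c
#include <stdatomic.h>

/* Hypothetical one-time-init states. */
enum { INIT_NONE = 0, INIT_BUSY = 1, INIT_DONE = 2 };

static atomic_int initState;      /* static storage: starts as INIT_NONE */
static int sharedTableSize;       /* data produced by the one-time init */
static int initCalls;             /* counts how often the init body ran (for illustration) */

static void do_init(void) {
    sharedTableSize = 256;        /* stand-in for real initialization work */
    ++initCalls;
}

/* Runs do_init exactly once, however many threads call run_once concurrently. */
void run_once(void) {
    int expected = INIT_NONE;
    if (atomic_load_explicit(&initState, memory_order_acquire) == INIT_DONE)
        return;                   /* fast path: already initialized */
    if (atomic_compare_exchange_strong_explicit(&initState, &expected, INIT_BUSY,
                                                memory_order_acquire, memory_order_acquire)) {
        do_init();                /* this caller won the race */
        /* Release publishes sharedTableSize before other threads can see INIT_DONE. */
        atomic_store_explicit(&initState, INIT_DONE, memory_order_release);
        return;
    }
    /* Another thread is (or was) initializing: wait until it publishes INIT_DONE. */
    while (atomic_load_explicit(&initState, memory_order_acquire) != INIT_DONE) {
        /* spin */
    }
}
```

Even this small version needs a three-state machine and paired acquire/release ordering, which illustrates why leaning on the OS-provided package is the safer choice.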


  • MBond2 Member Posts: 629

    I think that this new behaviour has everything to do with MSVC supporting other architectures again and trying to maintain compatible behaviour, rather than with some implicit contract around volatile. Back when I programmed for Alpha in the early '90s, there was certainly no memory-ordering guarantee implied by volatile, only that the compiler would not optimize away the memory access.

    I agree that it is unwise to rely on the details, and that it is far better to use interlocked functions or other constructs that hide the platform details from your code.

  • Jamey_Kirby Member - All Emails Posts: 461
    via Email
    Absolutely, interlocked functions are the ONLY way to go when synchronizing
    access across threads.

    I used volatile quite a bit in the DOS and Windows 3.x/9x days, especially in
    interrupt handlers, where it guaranteed that a particular variable would
    be reloaded rather than optimized away.