After reading this
https://stackoverflow.com/questions/14624776/can-a-bool-read-write-operation-be-not-atomic-on-x86
In particular someone mentioned:
compiler optimization: the compiler shuffles the order of reads and writes under the assumption that the values are not accessed from another thread, resulting in chaos
Let's say I have a function that writes to variables in non-paged pool allocated by me:
void func1() {
    value = 0;      // both variables are in non-paged pool
    isValid = true;
}
I have another function that reads value and isValid in another thread:
void func2() {
    if (isValid) {
        if (value == 0) ...
    }
}
The two functions are unrelated, e.g. they live in different projects. Is it possible that the compiler reorders the assignments in func1() in some unpredictable way, so that isValid is set to true before value is set to 0? From the compiler's point of view the variables are written but never read by func1(), so reordering the writes doesn't break anything within func1.
Comments
Interesting demonstration of the danger of using global variables (although, of course, the same thing can be done with context structures).
If these were not globals, the compiler could assume their order was irrelevant. Because they are globals, it is my understanding that, because of sequence points, the stores will be made in this order.
HOWEVER, any time you have a condition where your "state" involves multiple related pieces of data, you need to consider whether it's time to introduce an interlock.
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
Actually, the variables are in a context structure allocated in non-paged pool:
struct Mycontext {
    int value;
    bool isValid;
};

void func1(Mycontext* ctx) {
    ctx->value = 0;
    ctx->isValid = true;
}
To guarantee write order, should I use volatile like this?
void func1(volatile Mycontext* ctx) {
    ctx->value = 0;
    ctx->isValid = true;
}

void func2(volatile Mycontext* ctx) {
    if (ctx->isValid) {
        if (ctx->value == 0) ...
    }
}
I also came across something called std::atomic. Is it available to kernel mode code?
https://www.quora.com/What-is-volatile-Memory-in-C-and-how-is-it-used
But the CPU can still execute them out of order. Volatile does not prevent
that; it is a language construct, not a hardware notion in this context.
You need to use memory barriers for your case. Interlocking would be
somewhat misleading in the code samples here, really. Reading and writing
ints and booleans on x86 is always atomic. The order of execution is not
(unless a memory barrier is introduced, which interlocking happens to add
on x86).
Regards, Dejan.
Another issue: does the memory barrier provided by InterlockedXXX guarantee that all reads/writes before it are written to memory? Is volatile still necessary with InterlockedXXX? The doc only mentions that reordering cannot cross a memory barrier. But does a memory barrier imply the read/write must be written to memory?
e.g.
void test1(volatile LONG& a) {
    InterlockedIncrement(&a);
}
Is the volatile declaration necessary?
Short answer: sort of.
Long answer: depends on the type of MB.
IIRC, on x86 Interlocked instructions provide a full MB, so the expected pre-
and post-memory-access ordering is assured. But do not trust me on this, it
is an "iirc".
This is a complex subject that can't easily be covered in a few forum posts.
There are two general kinds of cross-thread memory accesses: one kind where it is sufficient that the second thread eventually reads the value that the first thread has written, and another where two or more threads must read exactly the values written by another immediately.
The first is something like an exit loop flag
while (!bExit)
{
    ...
}
Assuming the loop is fast, the wrong value for bExit could be read many times before an assignment like bExit = true; eventually becomes visible and the loop terminates.
The second class is more demanding: two or more threads modifying a linked list or something like that. If it is only the memory locations involved in the interlocked operation that must be atomically modified, then the no-fence API versions should be used. But if the interlocked operations are used to implement a synchronization mechanism, then the acquire or release semantics should be used. The default of a full memory barrier is wasteful.
volatile is considered only by the compiler and prevents caching a variable in a register and some types of reordering.
MSVC has historically given volatile accesses ordering guarantees beyond what ISO C++ requires. This didn't matter much over the years, as x86 is strongly ordered. However, with ARM this caused performance to suck, so now the MSVC semantics of volatile differ depending on the target platform and/or compiler switches. It's all called out pretty clearly here:
https://learn.microsoft.com/en-us/cpp/cpp/volatile-cpp?view=msvc-170
Generally I try to stay away from being too cute with this stuff and use OS provided mechanisms instead of trying to invent something clever. Some things that I find useful outside of normal locking primitives:
-scott
OSR
I think that this new behaviour has everything to do with MSVC supporting other architectures again and trying to maintain compatible behaviour, rather than some implicit contract around volatile. Back when I programmed for Alpha in the early '90s, there was certainly no memory-order guarantee implied by volatile, only that the compiler would not optimize away the memory access.
I agree that it is unwise to rely on the details, and it is better by far to use interlocked functions or other constructs that hide the platform details of memory access across threads away from your code.
I used volatile quite a bit in the DOS and Windows 3.x/9x days, especially in interrupt handlers, where you would be guaranteed a particular variable would be reloaded rather than cached in a register by the optimizer.