Yeah, and a lot of that code will probably break on a machine with more
available registers, because the compiler has more chance of keeping
something in a register for more than a couple of instructions.
Not sure how you define the “memory barrier”. Here’s what the CPU can
supply:
The memory barrier can be inserted with a variety of macros/intrinsics, at
least in the 13.xxx compiler that is supplied with the 3790 DDK.
Here’s a selection of names:
MemoryFence
MemoryBarrier
_mm_mfence
All map to the mfence instruction, which will “stall the processor until
all pending memory operations are completed”.
There are also “lfence” and “sfence”, which do the same as mfence, except
only for “loads” and “stores” respectively. So for example
mov eax, dword ptr somewhere
mov dword ptr someother, edx
followed by:
mfence - will stall until both the read of “somewhere” and the write to
“someother” are completed.
lfence - will stall until eax contains the contents of “somewhere”.
sfence - will stall until edx has been written to “someother”, but eax may
still not contain the value of “somewhere”.
Note that “written” in this case may well mean “has been stored in the
cache”, not necessarily that it’s actually been written to the external
memory.
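
For what it's worth, here is a minimal sketch of how a full fence is typically used to publish data to another thread. It uses the portable C11 atomic_thread_fence from <stdatomic.h> rather than the DDK macros named above (that substitution is mine, and the names payload, ready, publish and try_consume are made up for illustration). On x86 a compiler may implement the sequentially consistent fence with mfence or with a locked instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state: a payload plus a "ready" flag. */
int payload;
atomic_bool ready;

/* Writer: store the payload, fence, then raise the flag. */
void publish(int value)
{
    payload = value;
    /* Full fence: the payload store is ordered before the flag store. */
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Reader: returns true and fills *out only once the flag has been seen. */
bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    /* Full fence: the flag load is ordered before the payload load. */
    atomic_thread_fence(memory_order_seq_cst);
    *out = payload;
    return true;
}
```

The writer thread would call publish() once; the reader polls try_consume() until it returns true. Without the two fences, either the compiler or the CPU would be free to reorder the flag access relative to the payload access.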
If you actually want to have things physically written to the memory, you
need to flush the CPU cache. That’s a very “naughty” thing to do, because
you may well flush a bunch of code/data that isn’t “yours”.
wbinvd is the instruction to do this (and it’s a privileged
instruction, so kernel mode only). Haven’t bothered to look for a “ddk”
name for it.
There’s also clflush, which takes an address in memory, and makes sure that
any cache-line corresponding to this address is flushed from the cache.
This can be done from user mode.
Of course, the more I think about it, the more I think your question is:
How do I convince the compiler to “reload” any data held in registers? The
answer to this is: short of declaring the data volatile, you can’t. The
compiler will by its own logic decide to store things in memory or in
registers, depending on what registers are available and what it thinks is
optimal.
Calling a function (non-inlined) will reduce the number of registers that
are available to the compiler, but it doesn’t prevent it from using
registers, it just reduces the chances of something being stored in a
register somewhat.
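
The one per-variable handle you do get is the volatile qualifier discussed earlier in the thread: every access to a volatile object has to be performed as written, so the compiler cannot keep the value in a register across loop iterations. A minimal sketch, with hypothetical names (stop_flag, spin_until_set) and a spin limit that I added only so the example terminates. Note that volatile says nothing about atomicity or about ordering at the CPU level.

```c
#include <assert.h>

/* Hypothetical flag set by another thread or an interrupt handler. */
volatile int stop_flag;

/*
 * Poll stop_flag at most max_spins times.
 * Returns 1 if the flag was seen set, 0 if the limit ran out.
 */
int spin_until_set(long max_spins)
{
    assert(max_spins >= 0);
    for (long i = 0; i < max_spins; i++) {
        /* volatile forces a fresh load from memory on every pass. */
        if (stop_flag)
            return 1;
    }
    return 0;
}
```

If the volatile is dropped, an optimising compiler is allowed to read stop_flag once before the loop and never look at it again.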
–
Mats
xxxxx@lists.osr.com wrote on 11/10/2004 03:40:33 PM:
Uh. Yes, I have to admit I never really tested it. But also with VC 7.1
(tried it myself now) the “while(!flag);” loop expands to a simple “jmp
label”. It looks like you’re right and it’s pure luck that 90% of the
programs that I’ve seen actually work (there are so many programmers that
just NEVER use volatile - and the bad thing is, they DO write
multithreaded programs).
BTW: is there a way to insert a memory barrier manually, some
pragma/macro/function that ensures that all values are written to memory
before the barrier and all values are fetched again from memory after
the barrier?
Would be helpful in some situations…
Regards,
Paul Groke
Mats PETERSSON
> Sent by: xxxxx@lists.osr.com
> 10.11.2004 14:51
> Please reply to “Windows System Software Devs Interest List”
>
> To: “Windows System Software Devs Interest List”
> Cc:
> Subject: Re: [ntdev] Is this Multithread safe, or does it require
> Interlocked access?
>
> Paul,
>
> I don’t agree that compilers are intentionally friendly to “volatile
> unaware” programmers.
> It just happens that x86 doesn’t have particularly many registers, and
> there are other aspects of C/C++, such as “pointer aliasing”, that prevent
> the compiler from being extremely aggressive when it comes to being clever
> with loading data into registers and keeping it there.
>
> But I completely agree that any data that is EXPECTED to be modified by
> another thread (or through interrupts, external hardware modification, etc)
> HAS to be marked volatile if the compiler is expected to generate code that
> will work.
>
> In your example of “while(!flag);”, the compiler, without volatile, will be
> allowed (and probably will if you turn on optimisation) to generate the
> following code:
>
> mov eax, dword ptr flag
> L1:
> test eax, eax
> jz L1
>
> However, testing shows that the compiler generates this:
> $L858:
> jmp SHORT $L858
> from this:
>
> int flag;
>
> int main(void)
> {
> flag = 0;
> while(!flag);
> }
>
> The same code with volatile looks like this:
>
> $L858:
> mov eax, dword ptr flag
> test eax, eax
> jz $L858
>
> So without volatile, the compiler doesn’t generate “correct” code that
> would work if the flag was set externally. With volatile, it does.
>
> –
> Mats
>
> xxxxx@lists.osr.com wrote on 11/10/2004 01:31:54 PM:
>
> > Yes, sure, it doesn’t replace the InterlockedXXX() functions (or other
> > means of synchronization) - I mentioned this only as an addition to the
> > reactions already posted. Of course if only InterlockedXXX() functions
> > are used then making the array volatile is optional. Modern compilers are
> > very aware of “volatile unaware programmers”, but there’s absolutely no
> > guarantee the compiler will guess it right every time.
> > And as I understand volatile, it’s required if some variable can be
> > changed from anything other than the current thread of execution - I think
> > C++ does not require the compiler to create working code if such a
> > required volatile is omitted. AFAIK even a simple loop like
> > “while(!flag);” without “flag” being volatile is not required to work as
> > expected.
> >
> > Regards,
> >
> > Paul Groke
> >
> >
> >
> >
> >
> > Mats PETERSSON
> > Sent by: xxxxx@lists.osr.com
> > 10.11.2004 14:13
> > Please reply to “Windows System Software Devs Interest List”
> >
> > To: “Windows System Software Devs Interest List”
> > Cc:
> > Subject: Re: [ntdev] Is this Multithread safe, or does it require
> > Interlocked access?
> >
> >
> > Paul, that’s a good point. That makes the compiler more strict on the
> > operations.
> >
> > But it still doesn’t guarantee that two threads will not operate on the
> > data in parallel. It does however prevent the compiler from
> > “caching” the value of the GlobalArray in a register for some amount of
> > time during the processing.
> >
> > The volatile keyword essentially tells the compiler that “Something else
> > may change this data at ANY time, so don’t save it away expecting that you
> > know what’s happening to it”.
> >
> > –
> > Mats
> > xxxxx@lists.osr.com wrote on 11/10/2004 12:56:17 PM:
> >
> > > > char * GlobalArray[20];
> > >
> > > should probably be
> > > char * volatile GlobalArray[20];
> > > or even
> > > volatile char * volatile GlobalArray[20];
> > > depending on what you touch in those threads. In 90% of all cases it will
> > > work without, but I’d prefer to have it defined as volatile.
> > > bye,
> > >
> > > Paul Groke
> > >
> > >
> > >
> > >
> > >
> > > Mats PETERSSON
> > > Sent by: xxxxx@lists.osr.com
> > > 10.11.2004 11:18
> > > Please reply to “Windows System Software Devs Interest List”
> > >
> > > To: “Windows System Software Devs Interest List”
> > > Cc:
> > > Subject: Re: [ntdev] Is this Multithread safe, or does it require
> > > Interlocked access?
> > >
> > > Like others have said, although there is an atomic ADD operation, the
> > > compiler may well dole out a separate MOV, ADD, MOV to compute the value,
> > > because it has some reason to do that, such as “it’s more optimized”.
> > >
> > > I spent several days looking for a bug in an RTOS that was caused by the
> > > fact that this version of the OS didn’t do “x++” atomically, whilst on all
> > > the other CPUs that used the same source code, it was atomic. In this case
> > > it was an interrupt that also affected the same variable that caused the
> > > problem. This only happened rarely, and only in the stress test. Funny
> > > thing was that it only became clear that it was a bug when we got a faster
> > > version of the CPU, because prior to that we’d manage to run the 24 hour
> > > stress test to finish before running out of memory.
> > >
> > > Anyways, if you have two threads accessing the same data, you need to use
> > > specific “Locked” operations, or use SpinLocks. Which one is better for
> > > your particular case depends on what you want to achieve.
> > >
> > > The advantage of the InterlockedXXX operations is that you don’t have to
> > > call the OS to get hold of the lock. But only one processor in the system
> > > may hold the LOCK at any given time, so you’re essentially forcing the add
> > > to be single processor anyways.
> > >
> > > On the other hand, if you grab a spinlock, the code to perform the actual
> > > calculation will most likely run a bit quicker, but you lose some time
> > > actually grabbing the lock, and of course, the lock is of a bit coarser
> > > granularity.
> > >
> > > I suspect that your thread architecture in a real situation will be a lot
> > > more complex, and if all you’re doing to the data is adding to it, and the
> > > rest of the operations are not touching that data (such as some sort of
> > > statistics counters), then that would be a good case for InterlockedXXX.
> > > On the other hand, if you’re adding to the data, then checking its value
> > > and taking some sort of decision based on its value, you probably need a
> > > spinlock or similar.
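
As a rough sketch of the statistics-counter case: C11's atomic_fetch_add plays the role of the InterlockedXXX functions here. That is my substitution, made so the example is portable, and stat_counter and counter_add are made-up names. The point is only that the read-modify-write is a single indivisible operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical statistics counter shared between threads. */
atomic_long stat_counter;

/*
 * Atomically add delta to *counter and return the new value.
 * The read-modify-write happens as one indivisible step, unlike a
 * plain "x += delta", which may compile to separate MOV, ADD, MOV.
 */
long counter_add(atomic_long *counter, long delta)
{
    assert(counter != NULL);
    /* atomic_fetch_add returns the value held before the addition. */
    return atomic_fetch_add(counter, delta) + delta;
}
```

Each thread would simply call counter_add(&stat_counter, n); no lock is taken and no update is lost, but nothing else is protected by it.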
> > >
> > > –
> > > Mats
> > >
> > > xxxxx@lists.osr.com wrote on 11/09/2004 07:54:16 PM:
> > >
> > > > I am using a global pointer here for ease of explanation, but it
> > > > will end up in some sort of class, or context structure.
> > > >
> > > > Thread1, and Thread2 will execute simultaneously, in either a
> > > > hyperthreading, or “Dual Processing” environment. I.e., elements of
> > > > GlobalArray will be incremented by 2 threads simultaneously. Are
> > > > there caching issues, or synchronization issues here? Is the “Add”
> > > > instruction atomic? Or, do I need to grab some sort of spinlock, or
> > > > use InterlockedXXX instructions in thread1, and thread2?
> > > >
> > > > Thanks,
> > > > James
> > > >
> > > >
> > > > char * GlobalArray[20];
> > > >
> > > > Initialize()
> > > > {
> > > >     // Set all elements of GlobalArray to 0.
> > > >     ZeroArray(GlobalArray);
> > > >
> > > >     Launch(Thread1);
> > > >     Launch(Thread2);
> > > >
> > > >     // Wait until Thread1 and 2 Complete.
> > > >     wait();
> > > >
> > > >     // Check the value of GlobalArray is correct.
> > > >     // GlobalArray[1] = 2, GlobalArray[2] = 4, GlobalArray[3] = 6 etc…
> > > > }
> > > >
> > > > Thread1()
> > > > {
> > > >     for(i=0; i<20; i++)
> > > >         GlobalArray[i] += i;
> > > > }
> > > >
> > > > Thread2()
> > > > {
> > > >     for(i=0; i<20; i++)
> > > >         GlobalArray[i] += i;
> > > > }
> > > >
> > > > —
> > > > Questions? First check the Kernel Driver FAQ at
> > > > http://www.osronline.com/article.cfm?id=256
> > > >
> > > > You are currently subscribed to ntdev as: unknown lmsubst tag argument: ‘’
> > > > To unsubscribe send a blank email to xxxxx@lists.osr.com