Is this Multithread safe, or does it require Interlocked access?

You should not assume that any read/modify/write cycle (which is what an
increment is) is multi-thread (or MP) safe. Ideally, use the interlocked
increment (or add) functions, because they will be optimized to take
advantage of whatever mechanism the underlying system provides.
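As a concrete illustration of this advice, here is a minimal user-mode sketch of the question's loop with every read/modify/write made indivisible. It is not kernel code. It assumes a C11 compiler with <stdatomic.h> and POSIX threads, and it uses atomic_fetch_add() as a portable stand-in for InterlockedExchangeAdd(). The names Counters, AddIndices and RunTwoAdders are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define N_ELEMS 20

/* Shared counters; each atomic_fetch_add below is one indivisible
   read/modify/write, playing the role of InterlockedExchangeAdd. */
static atomic_long Counters[N_ELEMS];

/* Same loop as the question's Thread1()/Thread2(), made atomic. */
static void *AddIndices(void *unused)
{
    (void)unused;
    for (int i = 0; i < N_ELEMS; i++)
        atomic_fetch_add(&Counters[i], i);
    return NULL;
}

/* Zeroes the counters, runs two adders concurrently and waits for both.
   Returns 0 on success, -1 if a thread could not be created. */
static int RunTwoAdders(void)
{
    pthread_t t1, t2;

    for (int i = 0; i < N_ELEMS; i++)
        atomic_store(&Counters[i], 0);

    if (pthread_create(&t1, NULL, AddIndices, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, AddIndices, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* The question's check: no update was lost. */
    for (int i = 0; i < N_ELEMS; i++)
        assert(atomic_load(&Counters[i]) == 2L * i);
    return 0;
}
```

Once RunTwoAdders() returns 0, Counters[i] == 2 * i for every i, which is the result the question's Initialize() expects.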

Regards,

Tony

Tony Mason

Consulting Partner

OSR Open Systems Resources, Inc.

http://www.osr.com


From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@conexant.com
Sent: Tuesday, November 09, 2004 2:54 PM
To: ntdev redirect
Subject: [ntdev] Is this Multithread safe, or does it require
Interlocked access?

I am using a global array here for ease of explanation, but it will
end up in some sort of class or context structure.

Thread1 and Thread2 will execute simultaneously, in either a
hyperthreading or dual-processor environment; i.e., elements of
GlobalArray will be incremented by 2 threads simultaneously. Are there
caching issues or synchronization issues here? Is the “Add”
instruction atomic? Or do I need to grab some sort of spinlock, or use
InterlockedXXX instructions in Thread1 and Thread2?

Thanks,
James



char * GlobalArray[20];

void Initialize(void)
{
    // Set all elements of GlobalArray to 0.
    ZeroArray(GlobalArray);

    Launch(Thread1);
    Launch(Thread2);

    // Wait until Thread1 and Thread2 complete.
    wait();

    // Check that the values in GlobalArray are correct:
    // GlobalArray[1] = 2, GlobalArray[2] = 4, GlobalArray[3] = 6, etc.
}

void Thread1(void)
{
    int i;
    for (i = 0; i < 20; i++)
        GlobalArray[i] += i;
}

void Thread2(void)
{
    int i;
    for (i = 0; i < 20; i++)
        GlobalArray[i] += i;
}

Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256

You are currently subscribed to ntdev as: unknown lmsubst tag argument:
‘’
To unsubscribe send a blank email to xxxxx@lists.osr.com

An “add” written in assembler could be atomic (for example, an x86 ADD to memory with a LOCK prefix), but your “add” in C/C++ is not atomic at all, because it compiles to a sequence of instructions, and that sequence can run in parallel with an identical sequence in the other thread! You will have to use exclusive access to the array elements from within your threads!
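To see why a C “+=” that compiles to separate load, add and store instructions is unsafe, the following sketch replays two processors' instruction sequences in a chosen order. It runs single-threaded, so the outcome is deterministic. The Cpu model, RunStep and Simulate are hypothetical illustrations, not a model of any particular processor.

```c
#include <assert.h>

/* Hypothetical model: "x += n" compiled as three steps per CPU. */
enum Step { LOAD, ADD, STORE };

typedef struct {
    long reg;     /* this CPU's register (think EAX) */
    long addend;  /* the n in "x += n" */
    int  next;    /* next step to run: LOAD, ADD, STORE, then done */
} Cpu;

/* Runs one step of a CPU's "x += n" sequence against shared memory *x. */
static void RunStep(Cpu *cpu, long *x)
{
    assert(cpu->next <= STORE);  /* each CPU runs exactly three steps */
    switch (cpu->next++) {
    case LOAD:  cpu->reg = *x;           break;  /* mov eax, [x] */
    case ADD:   cpu->reg += cpu->addend; break;  /* add eax, n   */
    case STORE: *x = cpu->reg;           break;  /* mov [x], eax */
    }
}

/* Executes both CPUs' steps in the order given by schedule (0 = CPU 1,
   1 = CPU 2; six entries, three per CPU) and returns the final x. */
static long Simulate(const int schedule[6], long a, long b)
{
    long x = 0;
    Cpu cpus[2] = { { 0, a, LOAD }, { 0, b, LOAD } };

    for (int i = 0; i < 6; i++)
        RunStep(&cpus[schedule[i]], &x);
    return x;
}
```

With the serial schedule {0,0,0,1,1,1}, Simulate(schedule, 3, 3) returns 6; with the interleaved schedule {0,1,0,1,0,1} it returns 3, because CPU 2's store overwrites CPU 1's result.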

Christiaan

Like others have said, although there is an atomic (LOCK-prefixed) ADD
instruction, the compiler may well emit a separate MOV, ADD, MOV sequence
instead, because it has some reason to do so, such as “it’s more optimized”.

I spent several days looking for a bug in an RTOS that was caused by the
fact that this version of the OS didn’t do “x++” atomically, whilst on all
the other CPUs that used the same source code, it was atomic. In this case
it was an interrupt that also updated the same variable that caused the
problem. It only happened rarely, and only in the stress test. The funny
thing was that it only became clear that it was a bug when we got a faster
version of the CPU, because before that we’d managed to get the 24-hour
stress test to finish before running out of memory ;-)

Anyway, if you have two threads accessing the same data, you need to use
specific “locked” operations, or use spinlocks. Which one is better for
your particular case depends on what you want to achieve.

The advantage of the InterlockedXXX functions is that you don’t have to
call the OS to get hold of a lock. But only one processor in the system
may hold the LOCK at any given time, so you’re essentially forcing the add
to be single-processor anyway.

On the other hand, if you grab a spinlock, the code that performs the
actual calculation will most likely run a bit quicker, but you lose some
time actually grabbing the lock, and of course the lock has coarser
granularity.
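Here is a minimal sketch of the spinlock alternative. It assumes C11 atomic_flag and POSIX threads in user mode. Acquire() and Release() are hypothetical helpers standing in for KeAcquireSpinLock()/KeReleaseSpinLock(), and unlike those kernel routines they do not raise IRQL. A single lock acquisition covers the whole loop, which shows the coarser granularity described above: the plain += statements run without locked instructions, but the two threads can no longer update the array at the same time.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define N_ELEMS 20

/* Hypothetical user-mode spinlock built on C11 atomic_flag. */
static atomic_flag ArrayLock = ATOMIC_FLAG_INIT;
static long Array[N_ELEMS];  /* plain data, guarded by ArrayLock */

static void Acquire(atomic_flag *lock)
{
    /* test-and-set is the locked instruction; spin until we own the flag. */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;  /* busy-wait */
}

static void Release(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* One acquisition covers the whole loop, so the ordinary += statements
   inside need no interlocked instructions. */
static void *AddIndicesLocked(void *unused)
{
    (void)unused;
    Acquire(&ArrayLock);
    for (int i = 0; i < N_ELEMS; i++)
        Array[i] += i;
    Release(&ArrayLock);
    return NULL;
}

/* Runs two locked adders; returns 0 on success, -1 if a thread failed. */
static int RunTwoLockedAdders(void)
{
    pthread_t t1, t2;

    for (int i = 0; i < N_ELEMS; i++)
        Array[i] = 0;  /* no other thread is running yet */

    if (pthread_create(&t1, NULL, AddIndicesLocked, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, AddIndicesLocked, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    for (int i = 0; i < N_ELEMS; i++)
        assert(Array[i] == 2L * i);
    return 0;
}
```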

I suspect that your thread architecture in a real situation will be a lot
more complex. If all you’re doing to the data is adding to it, and the
rest of the operations don’t touch that data (such as some sort of
statistics counters), then that would be a good case for InterlockedXXX. On
the other hand, if you’re adding to the data, then checking its value and
making some sort of decision based on that value, you probably need a
spinlock or similar.
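The two cases can be sketched as follows, assuming C11 atomics and POSIX threads. A pthread mutex stands in for “a spinlock or similar”, and PacketsSeen, BudgetUsed, TryReserve and the limit of 100 are hypothetical. A pure statistics counter only needs an atomic add. An “add only if the total stays under a limit” operation, by contrast, must do the check and the update under one lock: if the check and the atomic add were separate steps, both threads could pass the check before either one added.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Case 1: a statistics counter that is only ever added to.
   A single atomic add (InterlockedIncrement-style) is enough. */
static atomic_long PacketsSeen;

static void CountPacket(void)
{
    atomic_fetch_add(&PacketsSeen, 1);
}

/* Case 2: add, then decide based on the value. The check and the
   update must happen as one unit, so both go under one lock. */
#define BUDGET_LIMIT 100
static pthread_mutex_t BudgetLock = PTHREAD_MUTEX_INITIALIZER;
static long BudgetUsed;  /* guarded by BudgetLock */

/* Reserves n units if that keeps BudgetUsed <= BUDGET_LIMIT. */
static bool TryReserve(long n)
{
    bool ok = false;

    pthread_mutex_lock(&BudgetLock);
    if (BudgetUsed + n <= BUDGET_LIMIT) {  /* check ...                 */
        BudgetUsed += n;                   /* ... and act, as one unit  */
        ok = true;
    }
    pthread_mutex_unlock(&BudgetLock);
    return ok;
}

#define ATTEMPTS 80

static void *Worker(void *unused)
{
    (void)unused;
    for (int i = 0; i < ATTEMPTS; i++) {
        CountPacket();
        (void)TryReserve(1);
    }
    return NULL;
}

/* Two workers make 160 attempts in total; exactly BUDGET_LIMIT succeed.
   Returns 0 on success, -1 if a thread could not be created. */
static int RunBudgetDemo(void)
{
    pthread_t t1, t2;

    if (pthread_create(&t1, NULL, Worker, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, Worker, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    assert(atomic_load(&PacketsSeen) == 2L * ATTEMPTS);
    assert(BudgetUsed == BUDGET_LIMIT);  /* never overshoots */
    return 0;
}
```

After RunBudgetDemo(), 160 packets have been counted but BudgetUsed is exactly 100: the remaining 60 reservations were refused instead of overshooting the limit.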


Mats

-------- Notice --------
The information in this message is confidential and may be legally
privileged. It is intended solely for the addressee. Access to this
message by anyone else is unauthorized. If you are not the intended
recipient, any disclosure, copying or distribution of the message, or any
action taken by you in reliance on it, is prohibited and may be unlawful.
If you have received this message in error, please delete it and contact
the sender immediately. Thank you.

> char * GlobalArray[20];

should probably be
char * volatile GlobalArray[20];
or even
volatile char * volatile GlobalArray[20];
depending on what you touch in those threads (only the pointers themselves,
or also the chars they point to). In 90% of all cases it will work without,
but I’d prefer to have it defined as volatile.
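The position of the qualifier decides what is volatile, and this can be checked at compile time with C11 _Generic. In this sketch, the array names GlobalArrayV and GlobalArrayVV and the two macros are hypothetical.

```c
/* The declarations discussed above; read each one right to left. */
char *GlobalArray[20];                        /* plain pointers to char             */
char * volatile GlobalArrayV[20];             /* the pointers themselves are volatile */
volatile char * volatile GlobalArrayVV[20];   /* volatile pointers to volatile chars  */

/* Hypothetical helpers: 1 if the array elements (the pointers) are
   volatile-qualified, else 0. Evaluated at compile time by _Generic. */
#define ELEM_IS_VOLATILE(a)                         \
    _Generic(&(a)[0], char * volatile *: 1,         \
                      volatile char * volatile *: 1, \
                      default: 0)

/* 1 if the chars the elements point to are volatile-qualified, else 0.
   The operand of _Generic is not evaluated, so nothing is dereferenced. */
#define POINTEE_IS_VOLATILE(a) \
    _Generic(&(a)[0][0], volatile char *: 1, default: 0)
```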
bye,

Paul Groke

Mats PETERSSON
Sent by: xxxxx@lists.osr.com
10.11.2004 11:18
Please reply to “Windows System Software Devs Interest List”

To: “Windows System Software Devs Interest List”

Cc:
Subject: Re: [ntdev] Is this Multithread safe, or does it require
Interlocked access?

Please visit us: www.tab.at www.championsnet.net
www.silverball.com

Paul, that’s a good point. That makes the compiler stricter about the
operations.

But it still doesn’t guarantee that the two threads will not operate on
the same data in parallel. It does, however, prevent the compiler from
“caching” the values of GlobalArray in a register for some amount of
time during the processing.

The volatile keyword essentially tells the compiler: “Something else
may change this data at ANY time, so don’t save it away expecting that you
know what’s happening to it.”


Mats
xxxxx@lists.osr.com wrote on 11/10/2004 12:56:17 PM:


Yes, sure, it doesn’t replace the InterlockedXXX() functions (or other
means of synchronization); I mentioned it only as an addition to the
replies already posted. Of course, if only InterlockedXXX() functions
are used, then making the array volatile is optional. Modern compilers are
very aware of “volatile-unaware programmers”, but there’s absolutely no
guarantee the compiler will guess right every time.
And as I understand volatile, it’s required if a variable can be
changed by anything other than the current thread of execution. I think
C++ does not require the compiler to generate working code if such a
required volatile is omitted. AFAIK even a simple loop like
“while(!flag);” without “flag” being volatile is not required to work as
expected.

Regards,

Paul Groke

Mats PETERSSON
Sent by: xxxxx@lists.osr.com
10.11.2004 14:13
Please reply to “Windows System Software Devs Interest List”

To: “Windows System Software Devs Interest List”

Cc:
Subject: Re: [ntdev] Is this Multithread safe, or does it require
Interlocked access?


Paul,

I don’t agree that compilers are intentionally friendly to “volatile-unaware”
programmers. It just happens that x86 doesn’t have particularly many
registers, and there are other aspects of C/C++, such as “pointer aliasing”,
that prevent the compiler from being extremely aggressive when it comes to
being clever with loading data into registers and keeping it there.

But I completely agree that any data that is EXPECTED to be modified by
another thread (or through interrupts, external hardware modification, etc.)
HAS to be marked volatile if the compiler is expected to generate code that
will work.

In your example of “while(!flag);”, the compiler, without volatile, is
allowed (and probably will, if you turn on optimisation) to generate the
following code:

    mov eax, dword ptr flag
L1:
    test eax, eax
    jz L1

However, testing shows that from this:

    int flag;

    int main(void)
    {
        flag = 0;
        while(!flag);
    }

the compiler generates this:

$L858:
    jmp SHORT $L858

The same code with volatile looks like this:

$L858:
    mov eax, dword ptr flag
    test eax, eax
    jz $L858

So without volatile, the compiler doesn’t generate “correct” code that
would work if the flag was set externally. With volatile, it does.
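The listings show that volatile forces flag to be re-read on every iteration. In code written for C11 or later, another option is to make flag an atomic object. This also forces the re-read, and it additionally gives the handoff between threads defined behavior: under C11, unsynchronized access to a non-atomic variable from two threads is a data race even if the variable is volatile. This sketch assumes <stdatomic.h> and POSIX threads; Setter, payload and WaitForFlag are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* The flag from the example above, declared atomic: every load in the
   wait loop really reads memory, and the inter-thread handoff is
   well defined under C11. */
static atomic_bool flag;
static int payload;  /* ordinary data, written before flag is set */

static void *Setter(void *unused)
{
    (void)unused;
    payload = 42;                                              /* publish data ... */
    atomic_store_explicit(&flag, true, memory_order_release);  /* ... then flag    */
    return NULL;
}

/* "flag = 0; while(!flag);" with another thread setting flag.
   Returns the payload seen after the wait, or -1 if no thread started. */
static int WaitForFlag(void)
{
    pthread_t t;

    atomic_store(&flag, false);
    if (pthread_create(&t, NULL, Setter, NULL) != 0)
        return -1;

    while (!atomic_load_explicit(&flag, memory_order_acquire))
        ;  /* spin: flag is re-read on every iteration */

    /* The acquire load pairs with the release store, so payload is visible. */
    assert(payload == 42);
    pthread_join(t, NULL);
    return payload;
}
```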


Mats

xxxxx@lists.osr.com wrote on 11/10/2004 01:31:54 PM:


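If I am correct, the only possible atomic operation would be in assembler:

/* Assumes GlobalArray holds bytes (char), which is how the thread treats it. */
char *pLocation = &GlobalArray[i];

/* AL = low byte of i, EBX = address of the element, then one locked byte add. */
_asm mov al, byte ptr i
_asm mov ebx, pLocation
_asm lock add byte ptr [ebx], al

I don't think you can force a compiler to ever generate that…

Christiaan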
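For comparison, here is a minimal sketch that assumes a C11 compiler with `<stdatomic.h>`, which did not exist when this thread was written. If the byte counters are declared `_Atomic unsigned char`, the compiler emits the locked byte add itself. On x86, GCC and Clang typically compile `atomic_fetch_add` on a byte to `lock xadd`, or to `lock add` when the result is unused. The array keeps the thread's name but is a byte array here, not `char *`. `add_to_slot` is a hypothetical helper.

```c
#include <stdatomic.h>
#include <stddef.h>

#define SLOTS 20

/* Byte-sized counters, as the thread treats GlobalArray, but declared
   atomic so each update is a single indivisible read-modify-write. */
static _Atomic unsigned char GlobalArray[SLOTS];

/* Hypothetical helper: atomically adds 'amount' to slot 'i' and returns
   the value the slot held before the add (the same contract as LOCK XADD). */
static unsigned char add_to_slot(size_t i, unsigned char amount)
{
    return atomic_fetch_add(&GlobalArray[i], amount);
}
```

Two threads calling `add_to_slot(i, i)` for every `i` would leave each slot at `2 * i`, provided the sum fits in a byte.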

----- Original Message -----
From: “Mats PETERSSON”
To: “Windows System Software Devs Interest List”
Sent: Wednesday, November 10, 2004 2:51 PM
Subject: Re: [ntdev] Is this Multithread safe, or does it require Interlocked access?

>
>
>
>
>
> Paul,
>
> I don’t agree that compilers are intentionally “volatile unaware” friendly.
> It just happens that x86 doesn’t have particularly many registers, and
> there are other aspects of C/C++, such as “pointer aliasing” that prevents
> the compiler from being extremely aggressive when it comes to being clever
> with loading data into registers and keeping it there.
>
> But I completely agree that any data that is EXPECTED to be modified by
> another thread (or through interrupts, external hardware modification, etc)
> HAS to be marked volatile if the compiler is expected to generate code that
> will work.
>
> In your example of “while(!flag);”, the compiler, without volatile, will be
> allowed (and probably will if you turn on optimisation) to generate the
> following code:
>
> mov eax, dword ptr flag
> L1:
> test eax, eax
> jz :L1
>
> However, testing shows that the compiler generates this:
> $L858:
> jmp SHORT $L858
> from this:
>
> int flag;
>
> int main(void)
> {
> flag = 0;
> while(!flag);
> }
>
> The same code with volatile looks like this:
>
> $L858:
> mov eax, dword ptr flag
> test eax, eax
> jz $L858
>
> So without volatile, the compiler doesn’t generate “correct” code that
> would work if the flag was set externally. With volatile, it does.
>
> –
> Mats
>
> xxxxx@lists.osr.com wrote on 11/10/2004 01:31:54 PM:
>
> > Yes, sure, it doesn’t replace the InterlockedXXX() functions (or other
> > means of synchronization) - I mentioned this only as an addition to the
> > reactions already posted. Of course if only InterlockedXXX() functions
> > are used then making the array volatile is optional. Modern compilers are
>
> > very aware of “volatile unaware programmers”, but there’s absolutely no
> > guarantee the compiler will guess it right every time.
> > And as I understand volatile, it’s required if some variable can be
> > changed from anything else than the current thread of execution - I think
>
> > C++ does not require the compiler to create working code if such a
> > required volatile is omitted. AFAIK even a simple loop like
> > “while(!flag);” without “flag” being volatile is not required to work as
> > expected.
> >
> > Regards,
> >
> > Paul Groke
> >
> >
> >
> >
> >
> > Mats PETERSSON
> > Gesendet von: xxxxx@lists.osr.com
> > 10.11.2004 14:13
> > Bitte antworten an “Windows System Software Devs Interest List”
> >
> > An: “Windows System Software Devs Interest List”
> >
> > Kopie:
> > Thema: Re: [ntdev] Is this Multithread safe, or does it require
> > Interlocked access?
> >
> >
> >
> >
> >
> >
> >
> > Paul, that’s a good point. That makes the compiler more strict on the
> > operations.
> >
> > But it still doesn’t guarantee that two threads will not perform
> > operations
> > in parallel to the thread. It does however prevent the compiler from
> > “caching” the value of the GlobalArray in a register for some amount
> of
> > time during the processing.
> >
> > The volatile keyword essentially tells the compiler that “Something else
> > may change this data at ANY time, so don’t save it away expecting that
> you
> > know what’s happening to it”.
> >
> > –
> > Mats
> > xxxxx@lists.osr.com wrote on 11/10/2004 12:56:17 PM:
> >
> > > > char * GlobalArray[20];
> > >
> > > should probably be
> > > char * volatile GlobalArray[20];
> > > or even
> > > volatile char * volatile GlobalArray[20];
> > > depending on what you touch in those threads. In 90% of all cases it
> > will
> >
> > > work without, but I’d prefer to have it defined as volatile.
> > > bye,
> > >
> > > Paul Groke
> > >
> > >
> > >
> > >
> > >
> > > Mats PETERSSON
> > > Gesendet von: xxxxx@lists.osr.com
> > > 10.11.2004 11:18
> > > Bitte antworten an “Windows System Software Devs Interest List”
> > >
> > > An: “Windows System Software Devs Interest List”
> > >
> > > Kopie:
> > > Thema: Re: [ntdev] Is this Multithread safe, or does it
> require
> > > Interlocked access?
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > Like others have said, although there is an atomic ADD operation, the
> > > compiler may well dole out a separate MOV, ADD, MOV to compute the
> > value,
> > > because it has some reason to do that, such as “it’s more optimized”.
> > >
> > > I spent several days looking for a bug in a RTOS that was caused by the
> > > fact that this version of the OS didn’t do “x++” atomic, whilst on all
> > the
> > > other CPU’s that used the same source code, it was atomic. In this case
> > it
> > > was an interrupt that also affected the same variable that caused the
> > > problem. This only happened rarely, and only in the stress-test. Funny
> > > thing was that it only became clear that it was a bug when we got a
> > faster
> > > version of the CPU, because prior to that we’d manage to run the 24
> hour
> > > stress test to finish before running out of memory :wink:
> > >
> > > Anyways, if you have two threads accessing the same data, you need to
> > use
> > > specific “Locked” operations, or use SpinLocks. Which one is better for
> > > your particular case depends on what you want to achieve.
> > >
> > > The advantage of the InterlockedXXX instruction, you don’t have to call
> > > the
> > > OS to get hold of the lock. But only one processor in the system may
> > hold
> > > the LOCK at any given time, so you’re essentially forcing the add to be
> > > single processor anyways.
> > >
> > > On the other hand, if you grab a spinlock, the code to perform the
> > actual
> > > calculation will most likely run a bit quicker, but you loose some time
> > > actually grabbing the lock, and of course, the lock is a bit courser
> > > granularity.
> > >
> > > I suspect that your thread architecture in a real situation will be a
> > lot
> > > more complex, and if all you’re doing to the data is adding to it, and
> > the
> > > rest of the operations are not touching that data (such as some sort of
> > > statistics counters), then that would be a good case for
> InterlockedXXX.
> > > On
> > > the other hand, if you’re adding to the data, then checking it’s value
> > and
> > > taking some sort of decision based on it’s value, you probably need a
> > > spinlock or similar.
> > >
> > > –
> > > Mats
> > >
> > >
> > >
> > > -------- Notice --------
> > > The information in this message is confidential and may be legally
> > > privileged. It is intended solely for the addressee. Access to this
> > > message by anyone else is unauthorized. If you are not the intended
> > > recipient, any disclosure, copying or distribution of the message, or
> > any
> > > action taken by you in reliance on it, is prohibited and may be
> > unlawful.
> > > If you have received this message in error, please delete it and
> contact
> > > the sender immediately. Thank you.
> > >
> > >
> > > xxxxx@lists.osr.com wrote on 11/09/2004 07:54:16 PM:
> > >
> > > > I am using a global pointer here for ease of explanation, but it
> > > > will end up in some sort of class, or context structure.
> > > >
> > > > Thread1, and Thread2 will execute simultaneously, in either a
> > > > hyperthreading, or “Dual Processing” environment. I.e., elements of
> > > > GlobalArray will be incremented by 2 threads simultaneously. Are
> > > > there cacheing issues, or synchronization issues here? Is the “Add”
> > > > instruction atomic? Or, do I need to grab some sort of spinlock, or
> > > > use InterlockedXXX instructions in thread1, and thread2?
> > > >
> > > > Thanks,
> > > > James
> > > >
> > > >
> > >
> >
> ------------------------------------------------------------------------------------
>
> >
> > >
> > > > char * GlobalArray[20];
> > > >
> > > > Initialize()
> > > > {
> > > > // Set all elements of GlobalArray to 0.
> > > > ZeroArray(GlobalArray);
> > > >
> > > > Launch(Thread1);
> > > > Launch(Thread2);
> > > >
> > > > // Wait until Thread1 and 2 Complete.
> > > > wait();
> > > >
> > > > // Check the value of GlobalArray is correct.
> > > > // GlobalArray[1] = 2, GlobalArray[2] = 4, GlobalArray[3] = 6 etc…
> > > >
> > > > }
> > > >
> > > > Thread1()
> > > > {
> > > > for(i=0; i<20; i++)
> > > > GlobalArray[i] += i;
> > > > }
> > > >
> > > > Thread2()
> > > > {
> > > > for(i=0; i<20; i++)
> > > > GlobalArray[i] += i;
> > > > } —
> > > > Questions? First check the Kernel Driver FAQ at http://www.
> > > > osronline.com/article.cfm?id=256
> > > >
> > > > You are currently subscribed to ntdev as: unknown lmsubst tag
> > argument:
> > > ‘’
> > > > To unsubscribe send a blank email to xxxxx@lists.osr.com
> > > > ForwardSourceID:NT00007066
> > >
> > >
> > > —
> > > Questions? First check the Kernel Driver FAQ at
> > > http://www.osronline.com/article.cfm?id=256
> > >
> > > You are currently subscribed to ntdev as: xxxxx@tab.at
> > > To unsubscribe send a blank email to xxxxx@lists.osr.com
> > >
> > > Please visit us: www.tab.at www.championsnet.net
> > > www.silverball.com
> > >
> > >
> > > —
> > > Questions? First check the Kernel Driver FAQ at http://www.
> > > osronline.com/article.cfm?id=256
> > >
> > > You are currently subscribed to ntdev as: xxxxx@3dlabs.com
> > > To unsubscribe send a blank email to xxxxx@lists.osr.com
> >
> > > ForwardSourceID:NT00007102
> >
> >
> > —
> > Questions? First check the Kernel Driver FAQ at
> > http://www.osronline.com/article.cfm?id=256
> >
> > You are currently subscribed to ntdev as: xxxxx@tab.at
> > To unsubscribe send a blank email to xxxxx@lists.osr.com
> >
> > Please visit us: www.tab.at www.championsnet.net
> > www.silverball.com
> >
> >
> > —
> > Questions? First check the Kernel Driver FAQ at http://www.
> > osronline.com/article.cfm?id=256
> >
> > You are currently subscribed to ntdev as: xxxxx@3dlabs.com
> > To unsubscribe send a blank email to xxxxx@lists.osr.com
>
> > ForwardSourceID:NT0000710E
>
>
> —
> Questions? First check the Kernel Driver FAQ at http://www.osronline.com/article.cfm?id=256
>
> You are currently subscribed to ntdev as: xxxxx@compaqnet.be
> To unsubscribe send a blank email to xxxxx@lists.osr.com
>

Uh. Yes, I have to admit I never really tested it. But with VC 7.1 as well
(I just tried it myself), the "while(!flag);" loop compiles to a bare "jmp
label". It looks like you're right, and it's pure luck that 90% of the
programs I've seen actually work (there are so many programmers who
just NEVER use volatile, and the bad thing is, they DO write
multithreaded programs).
BTW: is there a way to insert a memory barrier manually, i.e. some
pragma/macro/function that ensures all values are written to memory
before the barrier and all values are fetched again from memory after
it?
That would be helpful in some situations…

Regards,
Paul Groke
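One possible answer in present-day terms, assuming a C11 compiler and POSIX threads (neither was available in VC 7.1), is to make the flag atomic and pair a release store with an acquire load. The acquire load is re-executed on every iteration, so the loop cannot collapse into `jmp $L858`. Writes made before the release store are also guaranteed to be visible once the load returns `true`. C11 also offers `atomic_thread_fence` as a standalone barrier, but this sketch uses the release/acquire pair instead. `payload`, `publish`, `consume` and `run_handoff` are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;          /* ordinary data handed from one thread to another */
static atomic_bool ready;    /* replaces the plain 'int flag' */

/* Producer: write the data, then publish it with a release store.
   Earlier writes cannot be reordered after the release store. */
static void *publish(void *arg)
{
    (void)arg;
    payload = 42;
    atomic_store_explicit(&ready, true, memory_order_release);
    return NULL;
}

/* Consumer: spin on an acquire load; once it reads true, the
   producer's write to 'payload' is guaranteed to be visible. */
static int consume(void)
{
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;  /* spin */
    return payload;
}

/* Starts one producer thread, consumes on the calling thread, then joins.
   Returns the consumed value, or -1 if the thread could not be created. */
static int run_handoff(void)
{
    pthread_t producer;
    atomic_store(&ready, false);
    payload = 0;
    if (pthread_create(&producer, NULL, publish, NULL) != 0)
        return -1;
    int value = consume();
    pthread_join(producer, NULL);
    return value;
}
```

Build with `-pthread`. Unlike `volatile`, the atomic version also constrains ordering on the CPU, not just in the compiler.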

Mats PETERSSON
Sent by: xxxxx@lists.osr.com
10.11.2004 14:51
Please reply to "Windows System Software Devs Interest List"

To: "Windows System Software Devs Interest List"

Cc:
Subject: Re: [ntdev] Is this Multithread safe, or does it require
Interlocked access?


Christiaan,

That is correct. The compiler will not generate code with a LOCK prefix
unless you either use inline assembler or change the code to use
InterlockedExchangeAdd, which for x86 is defined as:
FORCEINLINE
LONG
FASTCALL
InterlockedExchangeAdd(
IN OUT LONG volatile *Addend,
IN LONG Increment
)
{
/* LOCK XADD atomically adds EAX to *Addend and leaves the original
   value of *Addend in EAX, which becomes the return value. */
__asm {
mov eax, Increment
mov ecx, Addend
lock xadd [ecx], eax
}
}

But that would of course not work for a byte array, so there inline
assembler would be the way to do it. That's hardly portable, though, so if
possible I would replace the char array with a ULONG array, so that
InterlockedExchangeAdd can be used. That way the code is portable to any
architecture supported by Windows, although it may turn out to be less
efficient on some architectures than on x86.


Mats
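The suggestion above is to widen the elements so that a word-sized interlocked add applies. Here is a portable sketch of that idea. It assumes C11 `<stdatomic.h>` and POSIX threads in place of the Windows `InterlockedExchangeAdd` and the original post's `Launch`/`wait` placeholders. `long` stands in for ULONG, so the counter width depends on the platform. Like `lock xadd`, `atomic_fetch_add` returns the previous value. `Counters`, `worker` and `run_two_threads` are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define SLOTS 20

/* Word-sized counters instead of bytes, mirroring the ULONG suggestion. */
static atomic_long Counters[SLOTS];

/* Body of Thread1/Thread2 from the original post (GlobalArray[i] += i),
   with each element updated by one atomic read-modify-write. */
static void *worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < SLOTS; i++)
        atomic_fetch_add(&Counters[i], i);
    return NULL;
}

/* Zeroes the counters, runs two workers concurrently and waits for both.
   Returns 0 on success, -1 if a thread could not be created. */
static int run_two_threads(void)
{
    pthread_t t1, t2;
    for (size_t i = 0; i < SLOTS; i++)
        atomic_store(&Counters[i], 0);
    if (pthread_create(&t1, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

After `run_two_threads()` returns 0, every `Counters[i]` equals `2 * i`, which is the result the original post expects. With a plain `+=`, updates could be lost.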

xxxxx@lists.osr.com wrote on 11/10/2004 02:11:19 PM:

If I am correct , the only possible atomic operation would be in
assembler :

char *pLocation = &GlobalArray[i] ;

_asm mov al , byte ptr i
_asm mov ebx , pLocation
_asm lock add byte ptr [ebx] , al

I don’t think you can force a compiler to ever generate that …

Christiaan

----- Original Message -----
From: “Mats PETERSSON”
> To: “Windows System Software Devs Interest List”
> Sent: Wednesday, November 10, 2004 2:51 PM
> Subject: Re: [ntdev] Is this Multithread safe, or does it require
> Interlocked access?
>
>
> >
> >
> >
> >
> >
> > Paul,
> >
> > I don’t agree that compilers are intentionally “volatile unaware”
friendly.
> > It just happens that x86 doesn’t have particularly many registers, and
> > there are other aspects of C/C++, such as “pointer aliasing” that
prevents
> > the compiler from being extremely aggressive when it comes to being
clever
> > with loading data into registers and keeping it there.
> >
> > But I completely agree that any data that is EXPECTED to be modified by
> > another thread (or through interrupts, external hardware modification,
etc)
> > HAS to be marked volatile if the compiler is expected to generate code
that
> > will work.
> >
> > In your example of “while(!flag);”, the compiler, without volatile,
will be
> > allowed (and probably will if you turn on optimisation) to generate the
> > following code:
> >
> > mov eax, dword ptr flag
> > L1:
> > test eax, eax
> > jz :L1
> >
> > However, testing shows that the compiler generates this:
> > $L858:
> > jmp SHORT $L858
> > from this:
> >
> > int flag;
> >
> > int main(void)
> > {
> > flag = 0;
> > while(!flag);
> > }
> >
> > The same code with volatile looks like this:
> >
> > $L858:
> > mov eax, dword ptr flag
> > test eax, eax
> > jz $L858
> >
> > So without volatile, the compiler doesn’t generate “correct” code that
> > would work if the flag was set externally. With volatile, it does.
> >
> > –
> > Mats
> >
> > xxxxx@lists.osr.com wrote on 11/10/2004 01:31:54 PM:
> >
> > > Yes, sure, it doesn’t replace the InterlockedXXX() functions (or
other
> > > means of synchronization) - I mentioned this only as an addition to
the
> > > reactions already posted. Of course if only InterlockedXXX()
functions
> > > are used then making the array volatile is optional. Modern compilers
are
> >
> > > very aware of “volatile unaware programmers”, but there’s absolutely
no
> > > guarantee the compiler will guess it right every time.
> > > And as I understand volatile, it’s required if some variable can be
> > > changed from anything else than the current thread of execution - I
think
> >
> > > C++ does not require the compiler to create working code if such a
> > > required volatile is omitted. AFAIK even a simple loop like
> > > “while(!flag);” without “flag” being volatile is not required to work
as
> > > expected.
> > >
> > > Regards,
> > >
> > > Paul Groke
> > >
> > >
> > >
> > >
> > >
> > > Mats PETERSSON
> > > Gesendet von: xxxxx@lists.osr.com
> > > 10.11.2004 14:13
> > > Bitte antworten an “Windows System Software Devs Interest List”
> > >
> > > An: “Windows System Software Devs Interest List”
> > >
> > > Kopie:
> > > Thema: Re: [ntdev] Is this Multithread safe, or does it
require
> > > Interlocked access?
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > Paul, that’s a good point. That makes the compiler more strict on the
> > > operations.
> > >
> > > But it still doesn’t guarantee that two threads will not perform
> > > operations
> > > in parallel to the thread. It does however prevent the compiler from
> > > “caching” the value of the GlobalArray in a register for some
amount
> > of
> > > time during the processing.
> > >
> > > The volatile keyword essentially tells the compiler that “Something
else
> > > may change this data at ANY time, so don’t save it away expecting
that
> > you
> > > know what’s happening to it”.
> > >
> > > –
> > > Mats
> > > xxxxx@lists.osr.com wrote on 11/10/2004 12:56:17 PM:
> > >
> > > > > char * GlobalArray[20];
> > > >
> > > > should probably be
> > > > char * volatile GlobalArray[20];
> > > > or even
> > > > volatile char * volatile GlobalArray[20];
> > > > depending on what you touch in those threads. In 90% of all cases
it
> > > will
> > >
> > > > work without, but I’d prefer to have it defined as volatile.
> > > > bye,
> > > >
> > > > Paul Groke
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Mats PETERSSON
> > > > Gesendet von: xxxxx@lists.osr.com
> > > > 10.11.2004 11:18
> > > > Bitte antworten an “Windows System Software Devs Interest List”
> > > >
> > > > An: “Windows System Software Devs Interest List”
> > > >
> > > > Kopie:
> > > > Thema: Re: [ntdev] Is this Multithread safe, or does it
> > require
> > > > Interlocked access?
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Like others have said, although there is an atomic ADD operation,
the
> > > > compiler may well dole out a separate MOV, ADD, MOV to compute the
> > > value,
> > > > because it has some reason to do that, such as “it’s more
optimized”.
> > > >
> > > > I spent several days looking for a bug in a RTOS that was caused by
the
> > > > fact that this version of the OS didn’t do “x++” atomic, whilst on
all
> > > the
> > > > other CPU’s that used the same source code, it was atomic. In this
case
> > > it
> > > > was an interrupt that also affected the same variable that caused
the
> > > > problem. This only happened rarely, and only in the stress-test.
Funny
> > > > thing was that it only became clear that it was a bug when we got a
> > > faster
> > > > version of the CPU, because prior to that we’d manage to run the 24
> > hour
> > > > stress test to finish before running out of memory :wink:
> > > >
> > > > Anyways, if you have two threads accessing the same data, you need
to
> > > use
> > > > specific “Locked” operations, or use SpinLocks. Which one is better
for
> > > > your particular case depends on what you want to achieve.
> > > >
> > > > The advantage of the InterlockedXXX instruction, you don’t have to
call
> > > > the
> > > > OS to get hold of the lock. But only one processor in the system
may
> > > hold
> > > > the LOCK at any given time, so you’re essentially forcing the add
to be
> > > > single processor anyways.
> > > >
> > > > On the other hand, if you grab a spinlock, the code to perform the
> > > actual
> > > > calculation will most likely run a bit quicker, but you loose some
time
> > > > actually grabbing the lock, and of course, the lock is a bit
courser
> > > > granularity.
> > > >
> > > > I suspect that your thread architecture in a real situation will be
a
> > > lot
> > > > more complex, and if all you’re doing to the data is adding to it,
and
> > > the
> > > > rest of the operations are not touching that data (such as some
sort of
> > > > statistics counters), then that would be a good case for
> > InterlockedXXX.
> > > > On
> > > > the other hand, if you’re adding to the data, then checking it’s
value
> > > and
> > > > taking some sort of decision based on it’s value, you probably need
a
> > > > spinlock or similar.
> > > >
> > > > –
> > > > Mats
> > > >
> > > >
> > > >
> > > > -------- Notice --------
> > > > The information in this message is confidential and may be legally
> > > > privileged. It is intended solely for the addressee. Access to
this
> > > > message by anyone else is unauthorized. If you are not the
intended
> > > > recipient, any disclosure, copying or distribution of the message,
or
> > > any
> > > > action taken by you in reliance on it, is prohibited and may be
> > > unlawful.
> > > > If you have received this message in error, please delete it and
> > contact
> > > > the sender immediately. Thank you.
> > > >
> > > >

Yeah, and a lot of that code will probably break on a machine with more
available registers, because the compiler has more chance of keeping
something in a register for more than a couple of instructions.

Not sure how you define the “memory barrier”. Here’s what the CPU can
supply:
The memory barrier can be inserted with a variety of macros/intrinsics, at
least in the 13.xxx compiler that is supplied with the 3790 DDK.

Here’s a selection of names:
MemoryFence
MemoryBarrier
_mm_mfence

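All of these give you a full fence. _mm_mfence always emits the mfence
instruction, which will “stall the processor until all pending memory
operations are completed”; depending on the target and header version,
MemoryBarrier may instead be implemented with a locked instruction (such as
xchg), which has the same full-fence effect.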
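As a rough, portable analogue of the fences listed above, here is a minimal C11 sketch (not DDK code) of the classic use: store the data, fence, raise a flag; on the other side, read the flag, fence, read the data. The names payload, ready, publish and try_consume are hypothetical. With common compilers, a sequentially consistent fence on x86 typically compiles to mfence or an equivalent locked instruction.

```c
#include <stdatomic.h>

/* Hypothetical shared state: a plain payload and an atomic "ready" flag. */
static int payload;
static atomic_int ready;

/* Writer side: store the data, fence, then raise the flag. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_seq_cst);   /* full fence, like mfence */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side: returns 1 and stores the data in *out once it has been
   published; otherwise returns 0 without touching *out. */
int try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    atomic_thread_fence(memory_order_seq_cst);   /* data read stays after flag read */
    *out = payload;
    return 1;
}
```

The fence on each side is what keeps the payload access on the correct side of the flag access; the flag itself only needs to be atomic, not volatile.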

There’s also lfence and sfence, which do the same as mfence, except only
for loads and stores respectively. So for example, given

    mov eax, dword ptr somewhere
    mov dword ptr someother, edx

followed by:
mfence - will stall until both the read of “somewhere” and the write to
“someother” are completed.
lfence - will stall until eax contains the contents of “somewhere”.
sfence - will stall until edx has been written to “someother”, but eax may
still not contain the value of “somewhere”.

Note that “written” in this case may well mean “has been stored in the
cache”, not necessarily that it’s actually been written to the external
memory.

If you actually want to have things physically written to the memory, you
need to flush the CPU cache. That’s a very “naughty” thing to do, because
you may well flush a bunch of code/data that isn’t “yours”.

wbinvd is the instruction to do this (and it’s a privileged
instruction, so kernel mode only). Haven’t bothered to look for a “ddk”
name for it.

There’s also clflush, which takes an address in memory, and makes sure that
any cache-line corresponding to this address is flushed from the cache.
This can be done from user mode.
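For the clflush route, a minimal sketch using the SSE2 intrinsic _mm_clflush from emmintrin.h (available with MSVC and GCC/Clang on x86). flush_range is a hypothetical helper. It assumes a 64-byte cache line (real code should query CPUID), only counts lines on non-SSE2 targets, and only writes cache lines back; it does nothing about values the compiler keeps in registers.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>   /* _mm_clflush, _mm_mfence */
#define HAVE_CLFLUSH 1
#else
#define HAVE_CLFLUSH 0
#endif

/* Assumption: 64-byte cache lines; real code should ask CPUID. */
#define ASSUMED_LINE_SIZE 64u

/* Flushes every cache line overlapping [p, p + len) and returns how many
   lines that was. */
size_t flush_range(const void *p, size_t len)
{
    size_t lines = 0;
    if (len == 0)
        return 0;
    uintptr_t addr = (uintptr_t)p & ~(uintptr_t)(ASSUMED_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)p + len;
    for (; addr < end; addr += ASSUMED_LINE_SIZE) {
#if HAVE_CLFLUSH
        _mm_clflush((const void *)addr);
#endif
        lines++;
    }
#if HAVE_CLFLUSH
    _mm_mfence();   /* order the flushes against later memory accesses */
#endif
    return lines;
}
```

Flushing only the lines you own avoids the “naughty” side effect of wbinvd on everybody else’s data.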

Of course, the more I think about it, the more I think your question is:
How do I convince the compiler to “reload” any data held in registers? The
answer to this is: you can’t. The compiler will, by its own logic, decide
whether to keep things in memory or in registers, depending on what registers
are available and what it thinks is optimal.

Calling a function (non-inlined) will reduce the number of registers that
are available to the compiler, but it doesn’t prevent it from using
registers, it just reduces the chances of something being stored in a
register somewhat.
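Register allocation as a whole can’t be controlled, but one particular read can be forced to go to memory by making that access volatile. A minimal sketch; shared_flag, read_flag_fresh and wait_for_flag are hypothetical names. Note that this only changes what the compiler emits: it gives no atomicity and no ordering with respect to other processors.

```c
/* Hypothetical flag that another thread may set; deliberately declared
   without volatile, to show a per-access volatile read. */
int shared_flag;

/* Reading through a volatile-qualified lvalue forces a fresh load from
   memory on every call; the compiler may not reuse a value from a register. */
static int read_flag_fresh(void)
{
    return *(volatile int *)&shared_flag;
}

/* Polls the flag until it is non-zero or max_spins polls have failed.
   Returns the number of failed polls. */
long wait_for_flag(long max_spins)
{
    long spins = 0;
    while (spins < max_spins && !read_flag_fresh())
        spins++;
    return spins;
}
```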


Mats

xxxxx@lists.osr.com wrote on 11/10/2004 03:40:33 PM:

Uh. Yes, I have to admit I never really tested it. But with VC 7.1 too
(tried it myself now), the “while(!flag);” loop expands to a simple “jmp
label”. It looks like you’re right and it’s pure luck that 90% of the
programs that I’ve seen actually work (there are so many programmers that
just NEVER use volatile, and the bad thing is, they DO write multithreaded
programs).
BTW: is there a way to insert a memory barrier manually, some
pragma/macro/function that ensures that all values are written to memory
before the barrier and all values are fetched again from memory after the
barrier?
It would be helpful in some situations…

Regards,
Paul Groke

Mats PETERSSON
> Sent by: xxxxx@lists.osr.com
> 10.11.2004 14:51
> Please respond to “Windows System Software Devs Interest List”
>
> To: “Windows System Software Devs Interest List”
>
> Cc:
> Subject: Re: [ntdev] Is this Multithread safe, or does it require
> Interlocked access?
>
> Paul,
>
> I don’t agree that compilers are intentionally friendly to “volatile
> unaware” programmers. It just happens that x86 doesn’t have particularly
> many registers, and there are other aspects of C/C++, such as “pointer
> aliasing”, that prevent the compiler from being extremely aggressive when
> it comes to being clever with loading data into registers and keeping it
> there.
>
> But I completely agree that any data that is EXPECTED to be modified by
> another thread (or through interrupts, external hardware modification,
> etc.) HAS to be marked volatile if the compiler is expected to generate
> code that will work.
>
> In your example of “while(!flag);”, the compiler, without volatile, will be
> allowed (and probably will, if you turn on optimisation) to generate the
> following code:
>
> mov eax, dword ptr flag
> L1:
> test eax, eax
> jz L1
>
> However, testing shows that the compiler generates this:
> $L858:
> jmp SHORT $L858
> from this:
>
> int flag;
>
> int main(void)
> {
> flag = 0;
> while(!flag);
> }
>
> The same code with volatile looks like this:
>
> $L858:
> mov eax, dword ptr flag
> test eax, eax
> jz $L858
>
> So without volatile, the compiler doesn’t generate “correct” code that
> would work if the flag was set externally. With volatile, it does.
>
> –
> Mats
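A portable C11 counterpart of the test above, as a sketch: instead of relying on volatile, the flag is an atomic_int, which also guarantees that data written before the flag is set is visible once the flag is seen. flag, result, setter and wait_for_value are hypothetical names, and POSIX threads stand in for whatever threading API the real code uses.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int flag;   /* shared flag: atomic rather than merely volatile */
static int result;        /* data published before the flag is raised */

/* Other thread: write the data, then raise the flag with release order. */
static void *setter(void *arg)
{
    result = *(int *)arg;
    atomic_store_explicit(&flag, 1, memory_order_release);
    return NULL;
}

/* Starts a thread that sets the flag, spins like "while(!flag);" and
   returns the value the other thread published (-1 if it can't start). */
int wait_for_value(int value)
{
    pthread_t t;
    atomic_store(&flag, 0);
    if (pthread_create(&t, NULL, setter, &value) != 0)
        return -1;
    while (!atomic_load_explicit(&flag, memory_order_acquire))
        ;   /* an atomic load is re-done on every iteration */
    int seen = result;   /* acquire above makes the setter's write visible */
    pthread_join(t, NULL);
    return seen;
}
```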
>
> xxxxx@lists.osr.com wrote on 11/10/2004 01:31:54 PM:
>
> > Yes, sure, it doesn’t replace the InterlockedXXX() functions (or other
> > means of synchronization); I mentioned this only as an addition to the
> > reactions already posted. Of course, if only InterlockedXXX() functions
> > are used, then making the array volatile is optional. Modern compilers are
> > very aware of “volatile unaware programmers”, but there’s absolutely no
> > guarantee the compiler will guess it right every time.
> > And as I understand volatile, it’s required if some variable can be
> > changed by anything other than the current thread of execution. I think
> > C++ does not require the compiler to create working code if such a
> > required volatile is omitted. AFAIK even a simple loop like
> > “while(!flag);” without “flag” being volatile is not required to work as
> > expected.
> >
> > Regards,
> >
> > Paul Groke
> >
> > Mats PETERSSON
> > Sent by: xxxxx@lists.osr.com
> > 10.11.2004 14:13
> > Please respond to “Windows System Software Devs Interest List”
> >
> > To: “Windows System Software Devs Interest List”
> >
> > Cc:
> > Subject: Re: [ntdev] Is this Multithread safe, or does it require
> > Interlocked access?
> >
> > Paul, that’s a good point. That makes the compiler more strict about the
> > operations.
> >
> > But it still doesn’t guarantee that two threads will not operate on the
> > same element in parallel. It does, however, prevent the compiler from
> > “caching” the value of the GlobalArray in a register for some amount of
> > time during the processing.
> >
> > The volatile keyword essentially tells the compiler that “Something else
> > may change this data at ANY time, so don’t save it away expecting that you
> > know what’s happening to it”.
> >
> > –
> > Mats
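To make the point about parallel operations concrete: volatile forces every access to go to memory, but “GlobalArray[i] += i” is still a separate load, add and store. The following small, deterministic simulation (a hypothetical helper, not real thread code) replays one unlucky interleaving of the two threads and shows the update being lost even though the cell is volatile.

```c
/* Replays, step by step, one possible interleaving of two threads that each
   execute "cell += inc" as load / add / store. Returns the final value. */
long lost_update_demo(long start, long inc)
{
    volatile long cell = start;   /* every access below really touches memory */
    long reg_a, reg_b;            /* stand-ins for each CPU's register */

    reg_a = cell;                 /* thread A: load                         */
    reg_b = cell;                 /* thread B: load, sees the same value    */
    reg_a += inc;                 /* thread A: add                          */
    reg_b += inc;                 /* thread B: add                          */
    cell = reg_a;                 /* thread A: store                        */
    cell = reg_b;                 /* thread B: store, overwrites A's result */
    return cell;                  /* start + inc, not start + 2 * inc       */
}
```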
> > xxxxx@lists.osr.com wrote on 11/10/2004 12:56:17 PM:
> >
> > > > char * GlobalArray[20];
> > >
> > > should probably be
> > > char * volatile GlobalArray[20];
> > > or even
> > > volatile char * volatile GlobalArray[20];
> > > depending on what you touch in those threads. In 90% of all cases it
> > > will work without, but I’d prefer to have it defined as volatile.
> > > bye,
> > >
> > > Paul Groke
> > >
> > > Mats PETERSSON
> > > Sent by: xxxxx@lists.osr.com
> > > 10.11.2004 11:18
> > > Please respond to “Windows System Software Devs Interest List”
> > >
> > > To: “Windows System Software Devs Interest List”
> > >
> > > Cc:
> > > Subject: Re: [ntdev] Is this Multithread safe, or does it require
> > > Interlocked access?
> > >
> > > Like others have said, although there is an atomic (LOCK-prefixed) ADD
> > > operation, the compiler may well dole out a separate MOV, ADD, MOV to
> > > compute the value, because it has some reason to do that, such as “it’s
> > > more optimized”.
> > >
> > > I spent several days looking for a bug in an RTOS that was caused by the
> > > fact that in this version of the OS “x++” wasn’t atomic, whilst on all
> > > the other CPUs that used the same source code, it was. In this case it
> > > was an interrupt that also modified the same variable that caused the
> > > problem. This only happened rarely, and only in the stress test. Funny
> > > thing was that it only became clear that it was a bug when we got a
> > > faster version of the CPU, because prior to that the 24-hour stress test
> > > would finish before we ran out of memory ;-)
> > >
> > > Anyway, if you have two threads accessing the same data, you need to use
> > > specific “Locked” operations, or use SpinLocks. Which one is better for
> > > your particular case depends on what you want to achieve.
> > >
> > > The advantage of the InterlockedXXX functions is that you don’t have to
> > > call the OS to get hold of a lock. But only one processor in the system
> > > may hold the LOCK at any given time, so you’re essentially forcing the
> > > add to be single-processor anyway.
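A sketch of the Interlocked approach applied to the original example. To keep it portable it uses C11 atomic_fetch_add; in Windows code the corresponding call would be InterlockedExchangeAdd on a LONG. The element type is changed from char * to a counter (the original adds integers to pointers), adder and run_two_adders are hypothetical names, and POSIX threads stand in for the two Launch() calls.

```c
#include <pthread.h>
#include <stdatomic.h>

#define ARRAY_LEN 20

/* Counters instead of char pointers: the original adds integers to them. */
static atomic_long GlobalArray[ARRAY_LEN];

/* Thread body: the same loop as Thread1/Thread2, but each += is a single
   indivisible read-modify-write (a LOCK-prefixed instruction on x86). */
static void *adder(void *unused)
{
    (void)unused;
    for (int i = 0; i < ARRAY_LEN; i++)
        atomic_fetch_add(&GlobalArray[i], i);
    return NULL;
}

/* Runs both threads to completion. Returns 1 if every element ends up as
   2 * i, 0 if not, and -1 if a thread could not be started. */
int run_two_adders(void)
{
    pthread_t t1, t2;

    for (int i = 0; i < ARRAY_LEN; i++)
        atomic_store(&GlobalArray[i], 0);

    if (pthread_create(&t1, NULL, adder, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, adder, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    for (int i = 0; i < ARRAY_LEN; i++)
        if (atomic_load(&GlobalArray[i]) != 2L * i)
            return 0;
    return 1;
}
```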
> > >
> > > On the other hand, if you grab a spinlock, the code to perform the actual
> > > calculation will most likely run a bit quicker, but you lose some time
> > > actually grabbing the lock, and of course, the lock has a somewhat
> > > coarser granularity.
> > >
> > > I suspect that your thread architecture in a real situation will be a
> > > lot more complex, and if all you’re doing to the data is adding to it,
> > > and the rest of the operations are not touching that data (such as some
> > > sort of statistics counters), then that would be a good case for
> > > InterlockedXXX. On the other hand, if you’re adding to the data, then
> > > checking its value and taking some sort of decision based on its value,
> > > you probably need a spinlock or similar.
> > >
> > > –
> > > Mats
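For the “add, then decide based on the value” case, the check and the update have to happen under one lock; with separate Interlocked calls, two threads could both pass the check and overshoot together. Below is a sketch with a minimal test-and-set spinlock built on C11 atomic_flag. Kernel code would use a KSPIN_LOCK with KeAcquireSpinLock/KeReleaseSpinLock instead. The budget names and the limit of 100 are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A minimal test-and-set spinlock built on C11 atomic_flag. */
typedef struct { atomic_flag held; } simple_spinlock;
#define SIMPLE_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void spin_acquire(simple_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* busy-wait until the current holder releases it */
}

static void spin_release(simple_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static simple_spinlock budget_lock = SIMPLE_SPINLOCK_INIT;
static long budget_used;                  /* protected by budget_lock */
static const long budget_limit = 100;     /* hypothetical limit */

/* Adds amount only if the total stays within the limit. The check and the
   update are done while holding the same lock, so they can't interleave
   with another thread's check and update. */
bool try_consume_budget(long amount)
{
    bool ok;
    spin_acquire(&budget_lock);
    ok = budget_used + amount <= budget_limit;
    if (ok)
        budget_used += amount;
    spin_release(&budget_lock);
    return ok;
}
```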
> > >
> > >
> > >

This is not safe. Guard the array with a lock.

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com
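What “guard the array with a lock” could look like for the original example, as a user-mode sketch: a POSIX pthread_mutex_t stands in for the lock (in a driver this would typically be a spinlock), the element type is changed to long, and worker and run_locked are hypothetical names. Taking the lock once around the whole loop keeps the locking overhead low, at the cost of fully serializing the two threads.

```c
#include <pthread.h>

#define ARRAY_LEN 20

static long GlobalArray[ARRAY_LEN];   /* plain integers, no longer char * */
static pthread_mutex_t array_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: only one thread at a time runs the update loop. */
static void *worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&array_lock);
    for (int i = 0; i < ARRAY_LEN; i++)
        GlobalArray[i] += i;
    pthread_mutex_unlock(&array_lock);
    return NULL;
}

/* Returns 1 if GlobalArray[i] == 2 * i for every i after both threads
   finish, 0 if not, and -1 if a thread could not be started. */
int run_locked(void)
{
    pthread_t t1, t2;

    for (int i = 0; i < ARRAY_LEN; i++)
        GlobalArray[i] = 0;

    if (pthread_create(&t1, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker, NULL) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    for (int i = 0; i < ARRAY_LEN; i++)
        if (GlobalArray[i] != 2L * i)
            return 0;
    return 1;
}
```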


> required volatile is omitted. AFAIK even a simple loop like
> “while(!flag);” without “flag” being volatile is not required to work as
> expected.

Exactly. The load of “flag” can be optimized away, leaving an endless loop.

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com