> Below is the MSDN documentation for InterlockedExchange():
http://msdn.microsoft.com/en-us/library/windows/desktop/ms683590(v=vs.85).aspx
It says “This function is atomic with respect to calls to other
interlocked functions.” That seems reasonable because the caller should
not make any assumptions about the implementation. On most architectures
there is HW support for the interlocked functions and they are also atomic
with respect to simple reads and writes. However, if HW support is
unavailable, the interlocked functions would need to use spinlocks. Why,
then, is there no InterlockedRead() or InterlockedWrite()? Consider the
following example:
NTSTATUS try_lock(ULONG *x)
{
return(InterlockedExchange(x, 1) == 0 ? STATUS_SUCCESS :
STATUS_UNSUCCESSFUL);
}
void release_lock(ULONG *x)
{
*x = 0; // InterlockedWrite() is not available
}
Now suppose the follow sequence of events from two different processes (P2
already owns the lock):
P1: calls try_lock()
P1: acquires spinlock
The above sequence makes no sense, because it requires a read, whose
outcome is untested, so it is just a way to waste time. It accomplishes
nothing. In fact, whether it gets the lock or not in no way impacts any
of the remaining code.
P1: reads *x == 1
P2: calls release_lock()
P2: writes *x = 0
P1: writes *x = 1
P1: releases spinlock
Just to clarify: processes are irrelevant to this discussion. Only
threads matter, and they can belong to the same process. Note that there
are two-stage acquisition algorithms that work even on machines with no
interlocks, where an interrupt and context switch can occur between any
two instructions. It was even a question on a final exam in the OS course
I took in 1968, but I’m not going to try to rediscover it 46 years later.
But in the above code, you are right; the state of x is wrong. The reason
it is wrong is that your code is wrong; it makes no sense to allow thread
P2 to write to x without any synchronization, when thread P1believes it
has exclusive access. You are releasing te lock before doing te write.
There is this about sychronization: it doesn’t do any good if you don’t
use it correctly. Back in 1968, Edsgar Dijkstra wrote a now-famous paper
on the use of synchronization primitives P() and V(), and we have made
virtually no progress in understanding how to use these in a robust
fashion in the intervening 46 years. The “synchronized” attribute of Javs
and C# is just a P/V in sheep’s clothing. A tiny number of
mostly-academic languages have introduced meta-concepts such as “condition
variables”, but these languages tend to exist in only one compiler in one
university. The closest language to mainstream that exhibited anything
more sophisticated than P/V was Ada, with its “rendezvous” mechanism.
The reason your code example doesn’t work is simply that it is wrong, so
of course it looks like nonsense. That’s because it is. Consider the
race condition where P2 sets the value to 0 and P1sets it to 1. If those
threads are running on distinct cores, a timing change of a couple hundred
PICOseconds makes the difference between the sequence you show and a
sequence which leaves *x as 0. When you see something like this, you know
immediately that your code is just plain WRONG.
If, to get te address in the pointer x, you use &Schroedinger as the name
of the variable, this should give you a hint as to what you have done
wrong. The first access to the variable collapses the quantum states and
gives you A value, you just don’t know which one you have until you read
it.
Generally, for one value, InterlockedCompareAndExchange is all the locking
you need. No locking is required for a store. But when you have a tuple
of values, you want to change all members of the tuple to maintain some
correctness invariant. In this case you need a lock, because you want to
treat the N non-atomic modifications as a single atomic transaction. So
unlinking a block from a doubly-linked list requires a lock that prohibits
any other thread from either reading information for which the correctness
inariant is false, or concurrently modifying the tuple such that two
threads each think they have each left the tuple in a correct state. You
demonstrate none of that in your example. Note that you can create
locking structures that prevent concurrent access while still creating
completely erroneous results, such as the example you have given.
P1: try_lock() returns STATUS_UNSUCCESSFUL
Now the lock is left in a bad state and all future calls to try_lock()
will fail.
NTDEV is sponsored by OSR
Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev
OSR is HIRING!! See http://www.osr.com/careers
For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars
To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer