> Peter, I chose this forum for two reasons: (1) I know there are some sharp
and experienced engineers (some from MSFT) who frequent this forum and can
provide some useful insight. (2) I figured other developers might benefit
from the discussion.
The answers to your questions can be found in the original post but I’ll
repeat them here.
> An interlocked WRITE? An interlocked READ? What would those BE?
They would be reads and writes that are guaranteed to be atomic with
respect to the InterlockedXxx APIs. The documentation for the
InterlockedXxx APIs says only that they are guaranteed to be atomic with
respect to *each other*.
> Can somebody give me an example of an architecture where a single store
> instruction of a native-sized value is NOT atomic?
Most architectures cannot do an atomic read of a value that is not aligned
to its native size, so the limitation typically includes “aligned”.
Atomicity of the read or write is not the issue. The issue is atomicity
with respect to the InterlockedXxx APIs. In other words, can the read or
write happen *during* an InterlockedXxx operation that is manipulating the
same memory?
It has been pointed out that (a) for architectures with a
hardware-implemented bus lock (either explicit, such as the LOCK prefix,
or implicit, as the PDP-11 and VAX with “read-pause-write” instructions,
such as INC, DEC, and a few others), the atomicity is implicit and (b) for
any example you can construct that “proves” an InterlockedRead/Write is
necessary to get the answer you think should be correct, the example
represents erroneous code that would not work correctly even if they DID
exist. The example originally posed demonstrated that one particular
sequence would result in what was referred to as a “BAD” value, but even
in the presence of an InterlockedWrite, I can demonstrate a possible
sequence that STILL produces a “BAD” answer. You showed only one possible
sequence that gave a “BAD” result, and claimed that InterlockedWrite would
make THAT particular sequence come up with a different answer. What you
failed to examine were ALL the possible sequences that could execute with
InterlockedWrite, and you therefore failed to notice that the code is
erroneous, because there exist different possible sequences which will
produce different answers. Demonstrating that it fails in one case does
not demonstrate that adding InterlockedWrite will make EVERY sequence
correct. So demanding a hardware feature that guarantees only that under
SOME execution patterns erroneous code will produce a “GOOD” result does
not make sense.
Below is the ARMv7 assembly for the try_lock() and release_lock()
functions from the original post. I changed the return value of
try_lock() in the failure case to STATUS_TIMEOUT because it makes the
assembly a little shorter.
try_lock:
00010BB0: F3BF8F5B dmb #11
00010BB4: 2101 movs r1,#1
00010BB6: E8502F00 ldrex r2,[r0]
00010BBA: E8401300 strex r3,r1,[r0]
00010BBE: 2B00 cmp r3,#0
00010BC0: D1F9 bne 0x00010bb6
00010BC2: F3BF8F5B dmb #11
00010BC6: B90A cbnz r2,0x00010bcc
00010BC8: 2000 movs r0,#0
00010BCA: 4770 bx r14
00010BCC: F44F7081 mov.w r0,#258 ; 0x102
00010BD0: 4770 bx r14
release_lock:
00010BE0: 2300 movs r3,#0
00010BE2: 6003 str r3,[r0,#0]
00010BE4: 4770 bx r14
As you can see, the InterlockedExchange() call uses LDREX/STREX to perform
the read/write. If the hardware detects a conflicting access to that
memory between these two instructions, STREX fails to update memory and
returns the value 1 in r3. The code loops and repeats the operation until
STREX succeeds and returns the value 0 in r3.
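For readers who do not follow ARM assembly, the listing above corresponds
roughly to the C sketch below. It is only an illustration: it uses the
portable C11 <stdatomic.h> calls in place of InterlockedExchange(), and the
lock_t type and the locally defined status values are assumptions, not the
code from the original post.

```c
#include <stdatomic.h>

/* Assumed values: 0 for success, 258 (0x102) as in "mov.w r0,#258". */
#define STATUS_SUCCESS 0x000L
#define STATUS_TIMEOUT 0x102L

/* Hypothetical lock type: 0 means free, 1 means held. */
typedef atomic_long lock_t;

/* Atomically swap in 1 and inspect the previous value. On ARMv7 a
 * sequentially consistent exchange is typically compiled to the
 * dmb / ldrex / strex / retry / dmb pattern shown in the listing. */
long try_lock(lock_t *lock)
{
    long previous = atomic_exchange(lock, 1L);
    return previous == 0 ? STATUS_SUCCESS : STATUS_TIMEOUT;
}

/* Store 0 with no ordering guarantee, mirroring the bare STR in the
 * listing's release_lock. */
void release_lock(lock_t *lock)
{
    atomic_store_explicit(lock, 0L, memory_order_relaxed);
}
```

A caller would declare `lock_t lock = 0;`, treat STATUS_SUCCESS from
try_lock(&lock) as ownership, and call release_lock(&lock) when done.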
Can you prove (in the sense of an irrefutable mathematical proof) that
this code (and I don’t know ARM assembler, so I don’t know what the
instruction specs are) works under EVERY possible instruction
interleaving? And if these are the functions you call in your first
example, can you prove (again, as a rigorous mathematical proof) that
under EVERY possible interleaving of instructions, your result is
absolutely deterministic? Changing the outcome of erroneous code in a
single case does not constitute correctness.
This prevents the failure scenario I outlined in the original post.
However, release_lock() is still not quite correct because it needs a
memory barrier before the store instruction. Therefore, an
InterlockedWrite() would be useful here for that reason.
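The missing barrier can be sketched in portable C11 as a store with release
ordering. This is an assumption about what a corrected release_lock() might
look like, not the original code; the names lock_t and
release_lock_ordered are hypothetical. On ARMv7 a compiler would typically
emit a DMB before the STR for this store.

```c
#include <stdatomic.h>

/* Hypothetical lock type: 0 means free, 1 means held. */
typedef atomic_long lock_t;

/* Release ordering keeps every earlier read and write made by the
 * lock holder from being reordered after the store that frees the lock. */
void release_lock_ordered(lock_t *lock)
{
    atomic_store_explicit(lock, 0L, memory_order_release);
}
```

The store itself is still an ordinary aligned word store; only the ordering
relative to the preceding accesses changes.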
But a memory-barrier write is not the same as the InterlockedWrite you are
requesting, and again, you have to PROVE that under EVERY POSSIBLE timing
sequence your code produces an identical result. If ANY sequence
involving InterlockedWrite can result in a “BAD” value, then the code is
simply wrong and the whole concept is pointless.
Also, this is just an example of how InterlockedExchange() is implemented
on one architecture (ARMv7). ARM architectures before ARMv6 did not have
LDREX/STREX. All they had was the SWP instruction. You can implement
InterlockedExchange() using SWP but not InterlockedIncrement() and others.
Therefore, on pre-ARMv6 all the InterlockedXxx APIs would need to use an
internal mutex to guarantee atomicity between themselves. Simple reads
and writes would need to use the same mutex, hence the need for an
InterlockedRead and InterlockedWrite.
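A minimal sketch of that situation, assuming the only atomic primitive is a
swap (modelled here by C11 atomic_flag test-and-set). Every name below is
hypothetical; this is not how any real pre-ARMv6 kernel is known to
implement these APIs.

```c
#include <stdatomic.h>

/* Hypothetical internal guard shared by every emulated operation. */
static atomic_flag g_interlocked_guard = ATOMIC_FLAG_INIT;

static void guard_acquire(void)
{
    /* Test-and-set is an atomic swap, the one thing SWP can do. */
    while (atomic_flag_test_and_set_explicit(&g_interlocked_guard,
                                             memory_order_acquire)) {
        /* spin until the previous value was "clear" */
    }
}

static void guard_release(void)
{
    atomic_flag_clear_explicit(&g_interlocked_guard, memory_order_release);
}

/* Increment cannot be built from a bare swap, so it runs under the guard. */
long emulated_increment(volatile long *target)
{
    guard_acquire();
    long result = ++*target;
    guard_release();
    return result;
}

/* A plain load or store could land in the middle of emulated_increment,
 * so the read and write must take the same guard to be atomic with it. */
long emulated_read(volatile long *target)
{
    guard_acquire();
    long value = *target;
    guard_release();
    return value;
}

void emulated_write(volatile long *target, long value)
{
    guard_acquire();
    *target = value;
    guard_release();
}
```

With this design a write through emulated_write() can never fall between
the load and the store inside emulated_increment(), which is the guarantee
a plain store cannot give when the APIs are built on an internal lock.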
Yes. The /360 series had a similar problem.
joe