InterlockedRead / InterlockedWrite

> Acquiring the InterlockedXxx spinlock (if it exists) before the write operation in release_lock()

guarantees it will occur before or after (not during) the read/write operation in try_lock()

Now I got it - you are describing a logically faulty scenario when the same variable gets updated by both
InterlockedXXX and “regular” writes with MOV instruction. Therefore, your question basically is “why don’t hardware designers want to introduce a feature that would allow me to write my logically faulty code safely”…

The easiest way to realize that such approach is logically faulty is, indeed, to start thinking of it in terms of spinlocks. In order to make it safe you would have to share this lock across InterlockedXXX and “regular” writes. However, it is (hopefully) understandable that InterlockedXXX has to acquire and release this lock
transparently to the rest of the world. Otherwise it is already going to be something other than interlocked operation - in terms of hardware it implies keeping the bus locked across multiple instructions…

Anton Bassov

Alex:

  1. On what lock try_lock() and release_lock() operate?

They operate on whatever lock the caller supplies (parameter x).

  1. What is *x in pseudo-code? Is it related in any way to the lock for try_lock() and release_lock()?

The try_lock() and release_lock() functions are not pseudo-code. They are valid compilable C code. The parameter x is the lock.

  1. If *x is the same lock as try_lock() and release_lock() operate on, WHY THE
    HELL you read and modify it outside of those functions?

I did not intend to imply the modification of the lock outside of try_lock() and release_lock(). The entire sequence of events in the failure scenario occur inside try_lock() for P1 and inside release_lock() for P2.

Anton:

Now I got it - you are describing a logically faulty scenario when the
same variable gets updated by both InterlockedXXX and “regular” writes
with MOV instruction. Therefore, your question basically is “why don’t
hardware designers want to introduce a feature that would allow me to
write my logically faulty code safely”…

No, my question is “Why isn’t there an InterlockedWrite()”? Possible answers include:

  1. It is not needed because the interlocked functions never use a spinlock internally. They are always (and will always be) atomic with respect to simple reads and writes. If so, then MSDN is misleading when it says “This function is atomic with respect to calls to other interlocked functions.” Why no mention of atomicity with respect to simple reads and writes?

  2. It is expected that you use an existing interlocked function to simulate InterlockedWrite(). For example, if you discard the result of InterlockedExchange() it is equivalent to InterlockedWrite(). However, this is less efficient. I count 137 documented InterlockedXxx() APIs on MSDN and most of these are variants that improve efficiency (omitting memory barriers for example). So why leave out InterlockedWrite()?

> -----Original Message-----

From: xxxxx@lists.osr.com [mailto:bounce-554856-
xxxxx@lists.osr.com] On Behalf Of xxxxx@qualcomm.com
Sent: Tuesday, April 01, 2014 12:35 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] InterlockedRead / InterlockedWrite

Below is the MSDN documentation for InterlockedExchange():

http://msdn.microsoft.com/en-
us/library/windows/desktop/ms683590(v=vs.85).aspx

It says “This function is atomic with respect to calls to other
interlocked functions.” That seems reasonable because the caller should
not make any assumptions about the implementation. On most architectures
there is HW support for the interlocked functions and they are also
atomic with respect to simple reads and writes. However, if HW support
is unavailable, the interlocked functions would need to use spinlocks.
Why, then, is there no InterlockedRead() or InterlockedWrite()? Consider
the following example:

NTSTATUS try_lock(ULONG *x)
{
return(InterlockedExchange(x, 1) == 0 ? STATUS_SUCCESS :
STATUS_UNSUCCESSFUL);
}

void release_lock(ULONG *x)
{
*x = 0; // InterlockedWrite() is not available
}

Now suppose the follow sequence of events from two different processes
(P2 already owns the lock):

P1: calls try_lock()
P1: acquires spinlock
P1: reads *x == 1
P2: calls release_lock()
P2: writes *x = 0
P1: writes *x = 1
P1: releases spinlock
P1: try_lock() returns STATUS_UNSUCCESSFUL

Now the lock is left in a bad state and all future calls to try_lock()
will fail.

I don’t believe that Windows runs on any hardware that requires a separate spinlock for the InterlockedExchange, so your scenario won’t happen.
There is no “P1: acquires spinlock” and “P1: releases spinlock”.
Additionally, “P1: reads *x == 1” and “P1: writes *x = 1” are atomic with respect to “P2: writes *x = 0”.
The CPU will not be interrupted, and the thread won’t be pre-empted, between the load and the store, on any supported architecture.

I’m sure Jake, Ken (or maybe PGV) will correct me if I’m wrong about that.

Phil

Not speaking for LogRhythm
Phil Barila | Senior Software Engineer
720.881.5364 (w)
The Security Intelligence Company
A LEADER 2013 SIEM Magic Quadrant
Perfect 5-Star Rating in SC Magazine for 5 Consecutive Years

> Anton:

> Now I got it - you are describing a logically faulty scenario when the
> same variable gets updated by both InterlockedXXX and “regular” writes
> with MOV instruction. Therefore, your question basically is “why don’t
> hardware designers want to introduce a feature that would allow me to
> write my logically faulty code safely”…

No, my question is “Why isn’t there an InterlockedWrite()”? Possible
answers include:

  1. It is not needed because the interlocked functions never use a spinlock
    internally. They are always (and will always be) atomic with respect to
    simple reads and writes. If so, then MSDN is misleading when it says
    “This function is atomic with respect to calls to other interlocked
    functions.” Why no mention of atomicity with respect to simple reads and
    writes?

What, exactly, would InterlockedWrite lock AGAINST? If I write *x = 0,
then either (a) the memory access required by the Interlockedxxx function
write will have to wait for the current store operation to finish or (b)
the store instruction would wait for the current Interlockedxxx operation
to finish. There is a fundamental piece of information you are completely
missing here: no memory location can be accessed concurrently in any way,
shape, or form by any two paths of execution in any arrangement of cores
from single-core to scores-of-cores. This is known to all programmers (or
so I thought). Note that caching has to tap-dance carefully to maintain
the illusion that caching does not exist, and therefore caching is
irrelevant to this discussion.

So there is no need for InterlockedRead ot InterlockedWrite because the
atomicity is implicit in the implementation of the hardware. Do note that
if the architecture requires, say, that you have to set a spin lock (e.g.,
the IBM/360 series had exactly ONE interlocked instruction, Test-and-Set,
TS, so InterlockedIncrement would be done by

spin:
TS globalLock
branch-if-not-gotten spin
L 0, Value
ADD 0, #1
ST 0, value
XOR 0,0
SB 0, globalLock

as I recollect, L would load a (32-bit) word, ST would store a (32-bit)
word, ADD would add a value to a register (0 would be Register 0). There
were no immediate instructions, so #1 created a literal in the literal
pool, and the instruction referenced that location (I will not go into the
details that no instruction could directly address memory, only give a
12-bit offset from a base register, not relevant to this discussion). SB
stored an 8-bit byte. 16-bit values were called “halfwords”, in case you
were eondering. Note that an “interlocked store” would also make no sense
in this architecture (I cannot come up with a scenario in which this could
possibly make sense, and I suspect you can’t, either). If you look at the
consequences of a store at any point, the outcome does not change in a
deterministic way if InterlockedWrite exists.

  1. It is expected that you use an existing interlocked function to
    simulate InterlockedWrite(). For example, if you discard the result of
    InterlockedExchange() it is equivalent to InterlockedWrite(). However,
    this is less efficient. I count 137 documented InterlockedXxx() APIs on
    MSDN and most of these are variants that improve efficiency (omitting
    memory barriers for example). So why leave out InterlockedWrite()?

Because it is not needed. It’s that simple.
joe


NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

> Joe, if you didn’t comprehend the original question then you should have

asked for clarification or kept quiet.

What part of the original question did I fail to comprehend? Other than
the horrific dysfunctional example which was used to “prove” the need for
an InterlockedWrite.

I’ve been reasoning about concurrency for over four decades. I think I
can not only comprehend a badly-formed question, but comprehend an
artificially bogus answer which includes incorrect code, and claims that
there is a problem because the incorrect code operates, well, incorrectly.
joe


NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

>to use spinlocks. Why, then, is there no InterlockedRead() or InterlockedWrite()?

InterlockedWrite is InterlockedExchange with result ignore


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

> missing here: no memory location can be accessed concurrently in any way,

…unless it spans the cache line boundary, since in this case there will be 2 accesses at physical level.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

> So there is no need for InterlockedRead ot InterlockedWrite because the atomicity is

implicit in the implementation of the hardware.

I am afraid the OP would not accept this plainly obvious answer anyway. Therefore, I am out of it - any further attempts to explain something to the OP seem to be just a waste of time…

Anton Bassov

@Dr Joe:

Lack of atomic modification (such as in IBM360) will cause the following problem even with a spinlock:

Suppose A=1.
The increment protected with a spinlock will fetch A. Then a different thread will set it to 3. Then the interlocked increment thread sets it to 2.

Assignment to 3 will be lost. This is the hazard of architectures without hardware atomic guarantee.

And a spinlock in a time slice OS is a lousy solution.

This situation to tackle this arised before there were HW support like
most anything (

Paging HW MMU - that was necessity due to overlay, segmentation etc on
software
Paging HW for host machines that is supposed to handle VM monitor
Interrupt remapper, DMA remapper
[…]
U name it !

That is why test-and-set instruction came to being …

I’m sure Joe know this very well, no need to mention that. Just too
verbose answers from him got noticed to spearhead the logics he was
trying to put forward I think!

-pro

On 4/2/2014 7:23 AM, xxxxx@broadcom.com wrote:

@Dr Joe:

Lack of atomic modification (such as in IBM360) will cause the following problem even with a spinlock:

Suppose A=1.
The increment protected with a spinlock will fetch A. Then a different thread will set it to 3. Then the interlocked increment thread sets it to 2.

Assignment to 3 will be lost. This is the hazard of architectures without hardware atomic guarantee.

And a spinlock in a time slice OS is a lousy solution.


NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

xxxxx@qualcomm.com wrote:

It is surprising how much time some people will spend writing responses and how little time reading and understanding the original question.

There are TWO locks in the example. The first lock (which I refer to as “lock”) is a trylock. This is what try_lock() and release_lock() manipulate. The second lock (which I refer to as “spinlock”) is a *hypothetical* lock that the interlocked functions *may* be using to guarantee atomicity between each other. It’s hypothetical because we don’t know how the interlocked functions are implemented on every architecture we may encounter.

The try_lock() and release_lock() functions would be safe and correct if I used InterlockedWrite() instead of a simple write inside release_lock(). But InterlockedWrite() doesn’t exist.

I think you misunderstand how the interlocked primitives work.
InterlockedExchange does not guarantee that all other interlocked
operations will block while it executes. There is no global “Interlock”
lock at work here. It merely guarantees that the exchange operation
itself will happen atomically – that no other processor can execute any
instruction in the middle of that exchange operation.

That’s why InterlockedRead and InterlockedWrite doesn’t make sense –
Windows only operates on machines where writes are already atomic.

Let’s look at your example.

P1: calls try_lock()
P1: acquires spinlock
P1: reads *x == 1
P2: calls release_lock()
P2: writes *x = 0
P1: writes *x = 1
P1: releases spinlock
P1: try_lock() returns STATUS_UNSUCCESSFUL

Now the lock is left in a bad state and all future calls to try_lock() will fail.

The sequence you show cannot happen, because P2 **CANNOT** execute
between those two steps of P1. There will be no interrupts between
them, so there cannot be a context switch, and all other CPUs will be
temporarily put on hold for the few cycles this requires.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Heres’s what I think: I think this is all silly.

An interlocked WRITE? And interlocked READ? What would those could BE?

Can somebody give me an example of an architecture were a single store instruction of a native-sized value is NOT atomic? Because I can’t think of one. If there WERE such a thing, then EVERY SINGLE store operation would have to be MADE to be atomic. You could never trust anything you write to or read from memory.

Or, perhaps I’m missing some deeper question.

Mr. Raymond… is this just another random academic question or does it have any bearing on a real problem? If it’s the latter I’m happy to try to help. If it’s the former, it should go to NTTALK.

Peter
OSR
@OSRDrivers

Peter, I chose this forum for two reasons: (1) I know there are some sharp and experienced engineers (some from MSFT) who frequent this forum and can provide some useful insight. (2) I figured other developers might benefit from the discussion.

The answers to your questions can be found in the original post but I’ll repeat them here.

An interlocked WRITE? And interlocked READ? What would those could BE?

They would be reads and writes that are guaranteed to be atomic with respect to the InterlockedXxx APIs. The documentation for the InterlockedXxx APIs say only that they are guaranteed to be atomic with respect to *each other*.

Can somebody give me an example of an architecture were a single store instruction of a native-sized value is NOT atomic?

Atomicity of the read or write is not the issue. It is atomicity with respect to the InterlockedXxx APIs. In other words can the read or write happen *during* an InterlockedXxx operation that is manipulating the same memory?

Below is the ARMv7 assembly for the try_lock() and release_lock() functions from the original post. I changed the return value of release_lock() in the failure case to STATUS_TIMEOUT for clarity because it makes the assembly a little shorter.

try_lock:
00010BB0: F3BF8F5B dmb #11
00010BB4: 2101 movs r1,#1
00010BB6: E8502F00 ldrex r2,[r0]
00010BBA: E8401300 strex r3,r1,[r0]
00010BBE: 2B00 cmp r3,#0
00010BC0: D1F9 bne 0x00010bb6
00010BC2: F3BF8F5B dmb #11
00010BC6: B90A cbnz r2,0x00010bcc
00010BC8: 2000 movs r0,#0
00010BCA: 4770 bx r14
00010BCC: F44F7081 mov.w r0,#258 ; 0x102
00010BD0: 4770 bx r14

release_lock:
00010BE0: 2300 movs r3,#0
00010BE2: 6003 str r3,[r0,#0]
00010BE4: 4770 bx r14

As you can see the InterlockedExchange() call uses LDREX/STREX to perform the read/write. If the HW detects a memory access between these instructions then STREX will fail to update memory and it will return the value 1 in r3. The code will loop and repeat the operation until STREX succeeds and returns the value 0 in r3.

This prevents the failure scenario I outlined in the original post. However, release_lock() is still not quite correct because it needs a memory barrier before the store instruction. Therefore, an InterlockedWrite() would be useful here for that reason.

Also, this is just an example of how InterlockedExchange() is implemented on one architecture (ARMv7). ARM architectures before ARMv6 did not have LDREX/STREX. All they had was the SWP instruction. You can implement InterlockedExchange() using SWP but not InterlockedIncrement() and others. Therefore, on pre-ARMv6 all the InterlockedXxx APIs would need to use an internal mutex to guarantee atomicity between themselves. Simple reads and writes would need to use the same mutex hence the need for an InterlockedRead and InterlockedWrite.

> Peter, I chose this forum for two reasons: (1) I know there are some sharp

and experienced engineers (some from MSFT) who frequent this forum and can
provide some useful insight. (2) I figured other developers might benefit
from the discussion.

The answers to your questions can be found in the original post but I’ll
repeat them here.

> An interlocked WRITE? And interlocked READ? What would those could BE?

They would be reads and writes that are guaranteed to be atomic with
respect to the InterlockedXxx APIs. The documentation for the
InterlockedXxx APIs say only that they are guaranteed to be atomic with
respect to *each other*.

> Can somebody give me an example of an architecture were a single store
> instruction of a native-sized value is NOT atomic?

Most architectures cannot do an atomic read of a non-native-size-aligned
value. So the limitation typically includes “aligned”

Atomicity of the read or write is not the issue. It is atomicity with
respect to the InterlockedXxx APIs. In other words can the read or write
happen *during* an InterlockedXxx operation that is manipulating the same
memory?

It has been pointed out that (a) for architectures with a
hardware-implemented bus lock (either explicit, such as the LOCK prefix,
or implicit, as the PDP-11 and VAX with “read-pause-write” instructions,
such as INC, DEC, and a few others), the atomicity is implicit and (b) for
any example you can construct that “proves” an InterlockedRead/Write is
necessary to get the answer you think should be correct, the example
represents erroneous code that would not work correctly even if they DID
exist. The example originally posed demonstrated that one particular
sequence would result in what was referred to as a “BAD” value, but in the
presence of an InterlockedWrite, I can demostrate a possible sequence that
STILL produces a “BAD” answer. You showed only one possible sequence that
gave a “BAD” result, and claimed that InerlockedWrite would make THAT
particular sequence come up with a different answer. What you failed to
demonstrate were ALL possible sequences that could execute with
InterlockedWrite, a d therefore failed to notice that the code is
erroneous because there exist different possible sequences which will
produce different answers. Demonstrating that it fails in one case does
not demonstrate that the requirement for InterlockedWrite will result in
EVERY sequence of code being correct. So demanding a hardware feature
that guarantees that under SOME execution patterns the result of executing
erroneous code will result in a “GOOD” result does not make sense.

Below is the ARMv7 assembly for the try_lock() and release_lock()
functions from the original post. I changed the return value of
release_lock() in the failure case to STATUS_TIMEOUT for clarity because
it makes the assembly a little shorter.

try_lock:
00010BB0: F3BF8F5B dmb #11
00010BB4: 2101 movs r1,#1
00010BB6: E8502F00 ldrex r2,[r0]
00010BBA: E8401300 strex r3,r1,[r0]
00010BBE: 2B00 cmp r3,#0
00010BC0: D1F9 bne 0x00010bb6
00010BC2: F3BF8F5B dmb #11
00010BC6: B90A cbnz r2,0x00010bcc
00010BC8: 2000 movs r0,#0
00010BCA: 4770 bx r14
00010BCC: F44F7081 mov.w r0,#258 ; 0x102
00010BD0: 4770 bx r14

release_lock:
00010BE0: 2300 movs r3,#0
00010BE2: 6003 str r3,[r0,#0]
00010BE4: 4770 bx r14

As you can see the InterlockedExchange() call uses LDREX/STREX to perform
the read/write. If the HW detects a memory access between these
instructions then STREX will fail to update memory and it will return the
value 1 in r3. The code will loop and repeat the operation until STREX
succeeds and returns the value 0 in r3.

Can you prove (in the sense of an irrefutable mathematical proof) that
this code (and I don’t know ARM assembler, so I don’t know what the
instruction specs are) works under EVERY possible instruction
interleaving? And if these are the functions you call in your first
example, can you prove (again, as a rigorous mathematical proof)" that
under EVERY possible interleaving of instructions, your result is
absolutely deterministic? Changing the outcome of erroneous code in a
single case does not constitute correctness.

This prevents the failure scenario I outlined in the original post.
However, release_lock() is still not quite correct because it needs a
memory barrier before the store instruction. Therefore, an
InterlockedWrite() would be useful here for that reason.

But a memory-barrier write is not the same as the InterlockedWrite you are
requesting, and again, you have to PROVE that under EVERY POSSIBLE timing
sequence your code produces an identical result. If ANY sequence
involving InterlockedWrite can result in a “BAD” value, then the code is
simply wrong and the whole concept is pointless.

Also, this is just an example of how InterlockedExchange() is implemented
on one architecture (ARMv7). ARM architectures before ARMv6 did not have
LDREX/STREX. All they had was the SWP instruction. You can implement
InterlockedExchange() using SWP but not InterlockedIncrement() and others.
Therefore, on pre-ARMv6 all the InterlockedXxx APIs would need to use an
internal mutex to guarantee atomicity between themselves. Simple reads
and writes would need to use the same mutex hence the need for an
InterlockedRead and InterlockedWrite.

Yes. The /360 series had a similar problem.
joe


NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

xxxxx@flounder.com wrote:

> Below is the ARMv7 assembly for the try_lock() and release_lock()
> functions from the original post. I changed the return value of
> release_lock() in the failure case to STATUS_TIMEOUT for clarity because
> it makes the assembly a little shorter.
>
> try_lock:
> 00010BB0: F3BF8F5B dmb #11
> 00010BB4: 2101 movs r1,#1
> 00010BB6: E8502F00 ldrex r2,[r0]
> 00010BBA: E8401300 strex r3,r1,[r0]
> 00010BBE: 2B00 cmp r3,#0
> 00010BC0: D1F9 bne 0x00010bb6
> 00010BC2: F3BF8F5B dmb #11
> 00010BC6: B90A cbnz r2,0x00010bcc
> 00010BC8: 2000 movs r0,#0
> 00010BCA: 4770 bx r14
> 00010BCC: F44F7081 mov.w r0,#258 ; 0x102
> 00010BD0: 4770 bx r14
>
> release_lock:
> 00010BE0: 2300 movs r3,#0
> 00010BE2: 6003 str r3,[r0,#0]
> 00010BE4: 4770 bx r14
>
> As you can see the InterlockedExchange() call uses LDREX/STREX to perform
> the read/write. If the HW detects a memory access between these
> instructions then STREX will fail to update memory and it will return the
> value 1 in r3. The code will loop and repeat the operation until STREX
> succeeds and returns the value 0 in r3.
Can you prove (in the sense of an irrefutable mathematical proof) that
this code (and I don’t know ARM assembler, so I don’t know what the
instruction specs are) works under EVERY possible instruction
interleaving?

It may be that you have lost the original thread, but what he’s posting
there is the ARM7 code for the InterlockedExchange primitive as
generated by Microsoft’s ARM compiler. So, in this case, it’s not the
poster you’re challenging, but Microsoft. Microsoft coders are not
infallible, but I’m guessing this sequence has been pretty well reviewed.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> @Dr Joe:

Lack of atomic modification (such as in IBM360) will cause the following
problem even with a spinlock:

Suppose A=1.
The increment protected with a spinlock will fetch A. Then a different
thread will set it to 3. Then the interlocked increment thread sets it to
2.

Assignment to 3 will be lost. This is the hazard of architectures without
hardware atomic guarantee.

And a spinlock in a time slice OS is a lousy solution.

Note that exactly the same problem arises if you use a spinlock at
DISPATCH_LEVEL in a multicore environment, and note also that the same
failure occurs if you have Interlocked instructions including
InterlockedWrite. The reason is that the code is fundamentally wrong, and
expecting some kind of memory magic will make this erroneous example work
is a futile wish.
joe


NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

>> So there is no need for InterlockedRead ot InterlockedWrite because the

> atomicity is
> implicit in the implementation of the hardware.

I am afraid the OP would not accept this plainly obvious answer anyway.
Therefore, I am out of it - any further attempts to explain something to
the OP seem to be just a waste of time…

Anton Bassov

+1
joe


NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

> xxxxx@qualcomm.com wrote:

> It is surprising how much time some people will spend writing responses
> and how little time reading and understanding the original question.
>
> There are TWO locks in the example. The first lock (which I refer to as
> “lock”) is a trylock. This is what try_lock() and release_lock()
> manipulate. The second lock (which I refer to as “spinlock”) is a
> *hypothetical* lock that the interlocked functions *may* be using to
> guarantee atomicity between each other. It’s hypothetical because we
> don’t know how the interlocked functions are implemented on every
> architecture we may encounter.
>
> The try_lock() and release_lock() functions would be safe and correct if
> I used InterlockedWrite() instead of a simple write inside
> release_lock(). But InterlockedWrite() doesn’t exist.

I think you misunderstand how the interlocked primitives work.
InterlockedExchange does not guarantee that all other interlocked
operations will block while it executes. There is no global “Interlock”
lock at work here. It merely guarantees that the exchange operation
itself will happen atomically – that no other processor can execute any
instruction in the middle of that exchange operation.

That’s why InterlockedRead and InterlockedWrite doesn’t make sense –
Windows only operates on machines where writes are already atomic.

Let’s look at your example.

> P1: calls try_lock()
> P1: acquires spinlock
> P1: reads *x == 1
> P2: calls release_lock()
> P2: writes *x = 0
> P1: writes *x = 1
> P1: releases spinlock
> P1: try_lock() returns STATUS_UNSUCCESSFUL
>
> Now the lock is left in a bad state and all future calls to try_lock()
> will fail.

The sequence you show cannot happen, because P2 **CANNOT** execute
between those two steps of P1. There will be no interrupts between
them, so there cannot be a context switch, and all other CPUs will be
temporarily put on hold for the few cycles this requires.

Not quite. In Windows, acquiring the spin lock raises the acquiring core
to DISPATCH_LEVEL, so the thread P2 cannot execute, but in a multicore
system, P2 might be running on a different core. But in that case, the
code is erroneous, EVEN if thread P2 acquires the same spinlock, because
the ordering the code can be different, and will therefore result in
different outcomes, meaning the code is erroneous anyway, independent of
the means of trying to lock the operations.

But I promised +1, but this was just too far off the mark to let pass.
joe


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.


NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Mr. Newcomer, since you are not the only one who was confused by the failure scenario I described in the original post I will repeat it here with a little more detail to clarify. Keep in mind that this scenario assumes that all the InterlockedXxx() APIs are implemented using an internal mutex that provides the needed atomicity. This is a hypothetical assumption and it may not be true for any architecture currently running Windows.

P1: enters try_lock(x)
P1: enters InterlockedExchange(x,1)
P1: acquires InterlockedXxx mutex
P1: reads *x == 1
P2: enters release_lock(x)
P2: writes *x = 0
P2: exits release_lock()
P1: writes *x = 1
P1: releases InterlockedXxx mutex
P1: exits InterlockedExchange()
P1: exits try_lock() which returns STATUS_UNSUCCESSFUL