InterlockedRead / InterlockedWrite

xxxxx@flounder.com wrote:

>> P1: calls try_lock()
>> P1: acquires spinlock
>> P1: reads *x == 1
>> P2: calls release_lock()
>> P2: writes *x = 0
>> P1: writes *x = 1
>> P1: releases spinlock
>> P1: try_lock() returns STATUS_UNSUCCESSFUL
>>
>> Now the lock is left in a bad state and all future calls to try_lock()
>> will fail.
> The sequence you show cannot happen, because P2 **CANNOT** execute
> between those two steps of P1. There will be no interrupts between
> them, so there cannot be a context switch, and all other CPUs will be
> temporarily put on hold for the few cycles this requires.
Not quite. In Windows, acquiring a spin lock raises the acquiring core's IRQL
to DISPATCH_LEVEL, so thread P2 cannot be scheduled on that core; but in a
multicore system, P2 might be running on a different core.

You are completely missing the point. InterlockedExchange DOESN’T USE A
SPINLOCK. That is the fundamental flaw in his initial reasoning, and in
your followups. It uses instruction-level primitives that trigger
interprocessor communication paths to LOCK OUT the other processors
while the instruction runs. P2 cannot execute EVEN ON ANOTHER PROCESSOR.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
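For concreteness, a minimal sketch of the try_lock()/release_lock() pair the scenario implies; the names and the 0/1 lock convention are assumptions reconstructed from the thread, not the OP's actual code:

#include <windows.h>

/*
 * Reconstruction (assumed, not the OP's code) of the try_lock/release_lock
 * pair under discussion.  On x86/x64, InterlockedExchange compiles to a
 * LOCK-prefixed XCHG: the exchange is a single locked instruction, with no
 * hidden spinlock or mutex inside it.
 */
static volatile LONG g_lock = 0;            /* 0 = free, 1 = held */

BOOLEAN try_lock(volatile LONG *x)
{
    /* Atomically store 1 and get the previous value back. */
    return InterlockedExchange(x, 1) == 0;  /* TRUE if we took the lock */
}

void release_lock(volatile LONG *x)
{
    /* The point of contention in this thread: is this plain aligned store
       safe against the locked exchange above?  On x86/x64 it is; the store
       itself is atomic and lands either before or after the exchange. */
    *x = 0;
}

/* Usage: if (try_lock(&g_lock)) { ... critical work ... release_lock(&g_lock); } */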

xxxxx@qualcomm.com wrote:

> Mr. Newcomer, since you are not the only one who was confused by the failure scenario I described in the original post, I will repeat it here with a little more detail to clarify. Keep in mind that this scenario assumes that all the InterlockedXxx() APIs are implemented using an internal mutex that provides the needed atomicity.

That assumption is FALSE, and it is that failed assumption that stands
at the very core of this entire misguided thread.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> An interlocked WRITE? An interlocked READ? What could those BE?

Well, the whole thread is starting to remind me of the one where a poster was asking how to handle interrupts in a USB driver and was simply refusing to accept the fact that USB drivers don’t handle interrupts. We have more or less the same situation here - the OP just refuses to accept the fact that accessing the same memory location by both “regular” reads/writes and interlocked ops is just logically faulty…

For example, x86 and x86_64 ensure atomicity of interlocked ops (i.e. with LOCK prefix) in respect of both one another and “regular” reads/writes by virtue of bus locking. So what??? What is reading a variable good for if this variable’s state may be changed by the time your code makes any practical use of this value??? This is what may happen if you access the same memory location by both “regular” reads/writes and interlocked ops. The only scenario when it may be useful is a “simple” check of the variable’s state before attempting an interlocked operation. However, this is nothing more than a simple optimization that tries to avoid performing unnecessary interlocked ops - it does not guarantee the consistency of the variable’s state, for understandable reasons…

Anton Bassov

@Tim, I did not miss your point. You stated that InterlockedXxx() never uses an internal mutex on any architecture that runs Windows. You may be right. But *my* point is that we shouldn’t be making assumptions about the internal implementation of InterlockedXxx(). We, as clients of the API, should be relying on the API documentation which states:

“This function is atomic with respect to calls to other interlocked functions.”

Notably absent from this quote is whether the function is atomic with respect to simple reads and writes.

Could it be that the OP is trying to see if there is a hole in the write
operation (even if it is on a natural boundary)?

Is a write (Read, Modify, then Write), or does it just snap the updated
value in atomically?

If it is Read, Modify, Write, and another Read, Modify, Write is interleaved
on the same timeline, one of the writes could be lost. For example, two
threads trying to increment a naturally aligned memory location (say,
word-aligned and one word long, so that there is no multiple read, assemble
and forward at the HW level).

-pro
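The lost-update case Prokash describes is easy to demonstrate; here is a small user-mode sketch (thread and iteration counts are arbitrary demo choices):

#include <windows.h>
#include <stdio.h>

/* A plain "x = x + 1" is a read-modify-write, so two threads can read the
 * same old value and one increment is lost.  InterlockedIncrement performs
 * the whole read-modify-write as one atomic step. */
static volatile LONG g_plain = 0;
static volatile LONG g_locked = 0;

static DWORD WINAPI worker(LPVOID unused)
{
    UNREFERENCED_PARAMETER(unused);
    for (int i = 0; i < 1000000; i++) {
        g_plain = g_plain + 1;            /* read, modify, write: updates can be lost */
        InterlockedIncrement(&g_locked);  /* atomic read-modify-write */
    }
    return 0;
}

int main(void)
{
    HANDLE h[2];
    for (int i = 0; i < 2; i++)
        h[i] = CreateThread(NULL, 0, worker, NULL, 0, NULL);
    WaitForMultipleObjects(2, h, TRUE, INFINITE);

    /* g_locked always ends at 2000000; g_plain is usually lower. */
    printf("plain = %ld, interlocked = %ld\n", g_plain, g_locked);
    return 0;
}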


But if it is a race between a read and a write, maybe it is better to think
of it from a transactional point of view: whatever is there at the point in
time the read occurs is what the reader gets. That could be before the write
is fully committed, or after.

-prokash


@Anton,

> the OP just refuses to accept the fact that accessing the same memory location by both “regular” reads/writes and interlocked ops is just logically faulty…

The whole purpose of the failure scenario in my original post was to demonstrate what can go wrong when you mix InterlockedExchange() with a simple write. That is why I asked about the omission of InterlockedWrite().

If the interlocked exchange has started (which, on Intel, means the bus is
locked), the write cannot race with it…

IMO, you have enough info in this thread to conclude that…

-pro


OK, OK… Slow down everyone. Mr. Raymond can be, well, trying with some of these conceptual challenges (he and I have butted heads before). But in this case, he has a point… albeit a small one.

The docs DO say: “This function is atomic with respect to calls to other interlocked functions.”

He merely wants to know “OK then… Is it guaranteed to be atomic with respect to anything else?”

STRICTLY speaking, according to the docs, we don’t know if the read/increment/write operation could be interleaved with… something. A load, for example. Or a store.

My answer: You’re reading too much into the docs, putting too much faith in the specifics of the wording and in the care with which that wording is sometimes crafted. I’ll bet dollars to donuts that the writer was simply trying to make the point that doing an interlocked increment doesn’t do anything other than… you know… interlock the increment operation.

But, in any case, think of the consequences, regardless of atomicity of the interlocked increment instruction:

Let’s say read/inc/write *can* be interleaved with a read.

So, you start with a value of 1 and it’s interlocked incremented to 2.

In all cases, your read will either get a value of 1 or 2. Either is, in fact, perfectly acceptable… because your code COULD have read the value just BEFORE the read/inc/write or just AFTER the read/inc/write.

So, “locking” or “being atomic” against the interlocked operation doesn’t get you anything.

Write… the same way. You either update the value BEFORE the read/inc/write operation or after it. In either case, the result has got to be acceptable to you.

And in all cases, you can’t simply ++ the value… because that operation doesn’t represent a single read or write.

Peter
OSR
@OSRDrivers
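Peter's read case can be sketched as a statistics-style counter; the helper names here are illustrative, not anyone's actual code:

#include <windows.h>

/* Writers bump the counter with InterlockedIncrement; a reader just loads
 * the aligned LONG.  The read returns either the value before a concurrent
 * increment or the value after it, and for this kind of counter either is
 * acceptable.  What the reader must NOT do is "g_count++", because that is
 * its own read-modify-write and can lose a concurrent increment. */
static volatile LONG g_count = 0;

void note_event(void)
{
    InterlockedIncrement(&g_count);   /* atomic increment */
}

LONG sample_count(void)
{
    return g_count;                   /* plain aligned read: old value or new value */
}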

Some years ago, when we were doing a multiprocessor system, one of the
problems we were running was a partial-differential-equation solver. We
had 16 “cores” (actually, PDP-11 systems), so the obvious thing to do was
to partition the input matrix into 4x4 submatrices, and have each core
work on one partition. But here’s the problem: at the boundaries between
the submatrices, the convolution algorithm required the value in an
adjacent cell. But that cell was in the partition being worked on by
another core. So, of course, we had to synchronize. Any time a border
cell was being computed, its access was locked with a mutex. The
computation crawled. Really crawled. Until one of our researchers
realized what was happening, and threw the synchronization away. So
sometimes the computation was made with the value of the previous
iteration, and sometimes the computation was made with the value of the
current iteration. Since the algorithm converges, it didn’t matter which
of the two values was used. The algorithm experienced a
greater-than-order-of-magnitude improvement. This is because the
“correctness” of the algorithm did not depend upon whether the value was
the latest, or the latest-minus-one. The issue is that the correctness of
the computation was not affected by this factoid. Not quite the same as,
say, managing accurate reference counts in a shared-multicore-memory. To
demonstrate this, I once created a multithreaded app that did selective
sleep operations (this was in app space) and did operations like

Fetch(); Sleep(); Increment(); Sleep(); Store(); Sleep();

One sequence ran with a couple threads doing this; another did the same,
but did a

WaitForSingleObject(mutex); …above sequence…; ReleaseMutex();

It is fun to watch the two counters, which I display on progress bars.
One progress bar goes up linearly, the other is lagging behind, moving
sporadically, but constantly falling behind. The idea of the Sleep()
calls is to simulate intercore timing effects at the
hundreds-of-milliseconds granularity instead of the
hundreds-of-picoseconds granularity usually seen at the instruction
level.

My observation was that the code illustrated as the “problem” exhibits the
nondeterminism that you can get with or without InterlockedRead/Write, and
the OP does not understand that if a deterministic result is required, the
code is simply wrong. It can only be correct if a deterministic result is
not required.
joe
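Here is a console-mode sketch of the experiment Joe describes (thread counts, iteration counts and Sleep durations are assumed demo values; the original used progress bars):

#include <windows.h>
#include <stdio.h>

/* Each "increment" is stretched out with Sleep() so the intercore timing
 * window becomes hundreds of milliseconds wide.  The unsynchronized counter
 * can lose updates; the mutex-protected counter cannot, at the cost of
 * serializing the threads. */
static volatile LONG g_racy = 0;
static volatile LONG g_safe = 0;
static HANDLE g_mutex;

static DWORD WINAPI worker(LPVOID unused)
{
    UNREFERENCED_PARAMETER(unused);
    for (int i = 0; i < 20; i++) {
        /* Fetch(); Sleep(); Increment(); Sleep(); Store(); Sleep(); */
        LONG v = g_racy;  Sleep(10);
        v = v + 1;        Sleep(10);
        g_racy = v;       Sleep(10);

        /* The same sequence, but under the mutex. */
        WaitForSingleObject(g_mutex, INFINITE);
        v = g_safe;  Sleep(10);
        v = v + 1;   Sleep(10);
        g_safe = v;  Sleep(10);
        ReleaseMutex(g_mutex);
    }
    return 0;
}

int main(void)
{
    HANDLE h[4];
    g_mutex = CreateMutex(NULL, FALSE, NULL);
    for (int i = 0; i < 4; i++)
        h[i] = CreateThread(NULL, 0, worker, NULL, 0, NULL);
    WaitForMultipleObjects(4, h, TRUE, INFINITE);

    /* Expected: g_safe == 80 exactly; g_racy is typically less. */
    printf("unsynchronized = %ld, mutex-protected = %ld\n", g_racy, g_safe);
    return 0;
}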


> @Anton,
>
>> the OP just refuses to accept the fact that accessing the same memory location by both “regular” reads/writes and interlocked ops is just logically faulty…
>
> The whole purpose of the failure scenario in my original post was to demonstrate what can go wrong when you mix InterlockedExchange() with a simple write. That is why I asked about the omission of InterlockedWrite().

Since all writes are implicitly interlocked writes, the example proves
nothing.
joe



> @Tim, I did not miss your point. You stated that InterlockedXxx() never uses an internal mutex on any architecture that runs Windows. You may be right. But *my* point is that we shouldn’t be making assumptions about the internal implementation of InterlockedXxx(). We, as clients of the API, should be relying on the API documentation which states:
>
> “This function is atomic with respect to calls to other interlocked functions.”
>
> Notably absent from this quote is whether the function is atomic with respect to simple reads and writes.

And, as I keep pointing out, IT DOESN’T MATTER! Since all reads and
writes are implicitly interlocked, the nature of the failure does not
change EXCEPT IN ONE SPECIAL CASE which you selected. It does not
guarantee correctness in EVERY POSSIBLE CASE.
joe



> lock at work here. It merely guarantees that the exchange operation itself will happen atomically – that no other processor can execute any instruction in the middle of that exchange operation.

No other processor can perform an UPDATE TO THE SAME CACHE LINE while this instruction executes.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

> An interlocked WRITE? An interlocked READ? What could those BE?

Correct.

The very notion of “interlocked” arises when the operation has 2 or more phases, so that the data cannot be updated by some other party sneaking in between the phases.

Read and write (which do not cross the cache line), are single-phase operations.

So, what kind of “sneak-in” can interrupt a read? None.

What kind of “sneak-in” can interrupt a write? None.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com
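A related caveat to “single-phase”: it assumes the access is one machine-level load or store. In a 32-bit x86 build, a plain 64-bit load or store is typically split into two 32-bit accesses, so a racing reader can see a torn value even though nothing crosses a cache line. A compare-exchange of the value with itself gives an untorn 64-bit read; the helper names below are illustrative, not real Win32 APIs:

#include <windows.h>

LONGLONG read_wide_untorn(volatile LONGLONG *p)
{
    /* Compare-exchange 0 with 0: never changes *p, but returns its current
       value as a single atomic 64-bit read (LOCK CMPXCHG8B underneath). */
    return InterlockedCompareExchange64(p, 0, 0);
}

void write_wide_untorn(volatile LONGLONG *p, LONGLONG value)
{
    LONGLONG old;
    do {
        old = read_wide_untorn(p);
    } while (InterlockedCompareExchange64(p, value, old) != old);
}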

> They would be reads and writes that are guaranteed to be atomic with respect to the InterlockedXxx APIs. The documentation for the InterlockedXxx APIs say only that they are guaranteed to be atomic with respect to *each other*.

I think we can file a docerr to MS.

If a usual write can break InterlockedXxx - then this is sheer nonsense, which means that InterlockedXxx is mis-implemented.

If InterlockedXxx can break the usual write - then this is a normal race condition. If write wins - then you see the write result. If Interlocked wins - then you see the Interlocked result. You cannot see anything other.

Same race occurs between read and Interlocked. Read will either see the new value set by Interlocked, or the old value before Interlocked. Nothing else.

You can disassemble the queued spinlock stuff in the kernel to see how both Interlocked and usual operations are used on the same location. More so, usual spinlock is also such - it has a mix of Interlocked and a read on the same location.
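A sketch of the mix Maxim refers to: illustrative user-mode C in the shape of a test-and-test-and-set spinlock, not the kernel's actual KeAcquireSpinLock or queued-spinlock code:

#include <windows.h>

/* The spin loop uses plain reads of the lock word (cheap, stays in the
 * local cache) and only retries the LOCK-prefixed exchange once the plain
 * read says the lock looks free: interlocked ops and ordinary reads on the
 * same location, by design. */
void acquire_spinlock(volatile LONG *lock)
{
    for (;;) {
        if (InterlockedExchange(lock, 1) == 0)   /* took it: 0 -> 1 */
            return;

        while (*lock != 0)                       /* plain read while held */
            YieldProcessor();                    /* PAUSE hint on x86/x64 */
    }
}

void release_spinlock(volatile LONG *lock)
{
    *lock = 0;   /* plain aligned store; InterlockedExchange would also work
                    and adds a full barrier */
}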

> Atomicity of the read or write is not the issue. It is atomicity with respect to the InterlockedXxx APIs.

If R/W operations are atomic, then they are atomic with respect to anything, including the Interlocked APIs (actually LOCK-prefixed CPU opcodes).

> In other words can the read or write happen *during* an InterlockedXxx operation that is manipulating the same memory?

Read can. It will read either the old value, or the new one.

Write cannot. Otherwise, InterlockedXxx would be totally broken. Write will either occur before Interlocked or after it.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

> P1: enters try_lock(x)
> P1: enters InterlockedExchange(x,1)
> P1: acquires InterlockedXxx mutex
> P1: reads *x == 1
> P2: enters release_lock(x)
> P2: writes *x = 0
> P2: exits release_lock()
> P1: writes *x = 1
> P1: releases InterlockedXxx mutex
> P1: exits InterlockedExchange()
> P1: exits try_lock() which returns STATUS_UNSUCCESSFUL

…and we have the lock in the acquired state while try_lock() returns FALSE. A bug.

To avoid this, release_lock() would really need to acquire the mutex, to guarantee it does not run between the read and the write of InterlockedExchange.

With HW support for Interlocked (x86), this is guaranteed by the HW frontside bus ownership/cache coherency protocol.

Without such support, the song is different.

The simplest solution is to use InterlockedExchange instead of a plain write in release_lock(). This would be InterlockedWrite.

The need for InterlockedRead is still to be proven.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com
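What Maxim calls InterlockedWrite here (and an InterlockedRead, if one wanted it) can be spelled with the existing primitives; the names below are hypothetical, not Win32 APIs:

#include <windows.h>

/* On a platform where InterlockedXxx were mutex-based, a write that must be
 * atomic with respect to the other InterlockedXxx calls can be an exchange
 * whose result is ignored, and a read can be a compare-exchange that never
 * changes the value. */
void InterlockedWrite_sketch(volatile LONG *target, LONG value)
{
    (void)InterlockedExchange(target, value);      /* atomic store, full barrier */
}

LONG InterlockedRead_sketch(volatile LONG *source)
{
    /* Compares *source with 0 and "replaces" it with 0 only if it was
       already 0, so the value never changes; returns the current value. */
    return InterlockedCompareExchange(source, 0, 0);
}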

Sorry, my message below is only correct when there is HW support for Interlocked (bus locking).

If an internal mutex is used for them, due to the lack of a LOCK CPU opcode - then sorry, you’re correct. A usual write can sneak in between the phases of InterlockedExchange, in which case InterlockedExchange will violate its contract and will be broken.

In this case, you need to take a mutex around the write too, which is simply done by using InterlockedExchange instead of a write.

The need for an InterlockedRead is still to be proven.

a) read does not alter anything and thus cannot break any code racing with it. This is obvious.
b) so, the only way in which read can be broken is by reading a wrong value.
c) and, provided the memory stores are atomic (they are such on any CPU if they do not cross a cache line), the read can either take the old value (which is not a bug) or the new one (which is not a bug again).

So, looks like InterlockedRead is proven to be not needed.

“Maxim S. Shatskih” wrote in message news:xxxxx@ntdev…
> They would be reads and writes that are guaranteed to be atomic with respect to the InterlockedXxx
>APIs. The documentation for the InterlockedXxx APIs say only that they are guaranteed to be atomic
>with respect to each other.

I think we can file a docerr to MS.

If a usual write can break InterlockedXxx - then this is sheer nonsense, which means that InterlockedXxx is mis-implemented.

If InterlockedXxx can break the usual write - then this is a normal race condition. If write wins - then you see the write result. If Interlocked wins - then you see the Interlocked result. You cannot see anything other.

Same race occurs between read and Interlocked. Read will either see the new value set by Interlocked, or the old value before Interlocked. Nothing else.

You can disassemble the queued spinlock stuff in the kernel to see how both Interlocked and usual operations are used on the same location. More so, usual spinlock is also such - it has a mix of Interlocked and a read on the same location.

> Atomicity of the read or write is not the issue. It is atomicity with respect to the InterlockedXxx APIs.

If R/W operations are atomic, then they are atomic with respect to anything, including the Interlocked APIs (actually LOCK-prefixed CPU opcodes).

>In other words can the read or write happen during an InterlockedXxx operation that is manipulating
>the same memory?

Read can. It will read either the old value, or the new one.

Write cannot. Otherwise, InterlockedXxx would be totally broken. Write will either occur before Interlocked or after it.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

> this scenario assumes that all the InterlockedXxx() APIs are implemented using an internal mutex that provides the needed atomicity.
>
> That assumption is FALSE

Well, on some ancient ARM CPUs…


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

> of both one another and “regular” reads/writes by virtue of bus locking. So what??? What is reading a variable good for if this variable’s state may be changed by the time your code makes any practical use of this value???

At least you have the previous sane value, not some junk.

This is the basis of spinlock and queued spinlock implementation.

> This is what may happen if you access the same memory location by both “regular” reads/writes and interlocked ops.

…which spinlocks do (a mix of usual reads and interlocked ops).

> The only scenario when it may be useful is a “simple” check of the variable’s state before attempting interlocked operation.

Yes, a spinlock.

> avoid performing unnecessary interlocked ops - it does not guarantee the consistency of the variable’s state

It guarantees that both the read result AND the variable’s state are sane values.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

> Fetch(); Sleep(); Increment(); Sleep(); Store(); Sleep();

Even a single Sleep here can produce an arbitrarily long delay, due to starvation effects.

Doing this under a mutex will immediately cause priority inversion and major starvation effects on the other threads.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com