Re: [ntdev] InterlockedRead / InterlockedWrite

IMHO this documentation is misleading. The reason InterlockedRead functions do not exist is that aligned reads are implicitly assumed to be atomic on all platforms Windows runs on. Unaligned variables can’t be used for Interlocked operations, so regardless of the actual implementation of the interlocked function, a read may be performed atomically.

Sent from Surface Pro

From: xxxxx@qualcomm.com
Sent: ‎Tuesday‎, ‎April‎ ‎01‎, ‎2014 ‎2‎:‎36‎ ‎PM
To: Windows System Software Devs Interest List

Below is the MSDN documentation for InterlockedExchange():

http://msdn.microsoft.com/en-us/library/windows/desktop/ms683590(v=vs.85).aspx

It says “This function is atomic with respect to calls to other interlocked functions.” That seems reasonable because the caller should not make any assumptions about the implementation. On most architectures there is HW support for the interlocked functions and they are also atomic with respect to simple reads and writes. However, if HW support is unavailable, the interlocked functions would need to use spinlocks. Why, then, is there no InterlockedRead() or InterlockedWrite()? Consider the following example:

NTSTATUS try_lock(volatile LONG *x)
{
    // Atomically set *x to 1; the lock was free if the old value was 0.
    return (InterlockedExchange(x, 1) == 0) ? STATUS_SUCCESS : STATUS_UNSUCCESSFUL;
}

void release_lock(volatile LONG *x)
{
    *x = 0; // InterlockedWrite() is not available
}

Now suppose the following sequence of events from two different processes (P2 already owns the lock):

P1: calls try_lock()
P1: acquires spinlock
P1: reads *x == 1
P2: calls release_lock()
P2: writes *x = 0
P1: writes *x = 1
P1: releases spinlock
P1: try_lock() returns STATUS_UNSUCCESSFUL

Now the lock is left in a bad state and all future calls to try_lock() will fail.
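
For comparison, here is a minimal sketch of the same try-lock written with portable C11 atomics rather than the Windows interlocked API. The type and function names are hypothetical. It assumes the exchange is a single indivisible read-modify-write in hardware, which is what later replies in this thread say Windows relies on; under that assumption the store in release_lock() cannot land between the read and the write of the exchange.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 = free, 1 = held. */
typedef struct {
    atomic_int state;
} try_lock_t;

/* Returns true if the caller acquired the lock. The exchange reads the
   old value and writes 1 as one indivisible operation. */
static bool try_lock(try_lock_t *l)
{
    return atomic_exchange_explicit(&l->state, 1, memory_order_acquire) == 0;
}

/* Releases the lock with an atomic store that has release ordering, so
   writes made while holding the lock are visible to the next owner. */
static void release_lock(try_lock_t *l)
{
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```

The release here is still a simple store; the only thing added over a plain `*x = 0` is the ordering guarantee.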


NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

The question is not nonsense. The problem the OP is worried about does not exist, but that does not make the question nonsense.

Sent from Surface Pro

From: xxxxx@hotmail.com
Sent: ‎Tuesday‎, ‎April‎ ‎01‎, ‎2014 ‎7‎:‎15‎ ‎PM
To: Windows System Software Devs Interest List

It is surprising how much time some people will spend writing responses and how little time
reading and understanding the original question

Well, we are just trying to explain to you that the original question is nonsensical in itself, but you don’t want to listen to us, do you…

But InterlockedWrite() doesn’t exist.

OK, let me put it simply:

A. void InterlockedWrite(int *x, int val)
{
    *x = val;
}

B. void InterlockedWrite(int *x, int val, spinlock_t *lock)
{
    spin_lock(lock);
    *x = val;
    spin_unlock(lock);
}

Could you please explain the purpose of a spinlock in example B? If you somehow realize that it serves no purpose whatsoever and that, for all practical purposes, A and B are equivalent, you will (hopefully) realize that there is no reason for interlocked writes.

The try_lock() and release_lock() functions would be safe and correct if I used InterlockedWrite()
instead of a simple write inside release_lock().

If you handle the first part well, you will also realize that try_lock() and release_lock() functions would NOT be safe and correct in a multithreaded environment, no matter how you implement your hypothetical InterlockedWrite()…

Anton Bassov


Any architecture that supports atomic reads of pointer-sized values (aligned or otherwise) can be used to implement interlocked compare exchange (efficiency is another question). To recapitulate: to run Windows (or any other OS I know of that supports multiple CPUs), the hardware must at least be able to read and write pointer-sized values at aligned memory locations atomically.

Note that by atomically, I mean that in a single memory operation, the entire two, four or eight bytes will be written or read with no opportunity for any intervening operation to read partially committed values or generate a partial overwrite. All platforms that I have ever worked on have this property (Alpha, x86, x64, IA64), but it is not required for conformance to the C specifications, nor does the documentation for the interlocked functions specifically mention this property.

Sent from Surface Pro

From: Skywing
Sent: ‎Tuesday‎, ‎April‎ ‎01‎, ‎2014 ‎9‎:‎14‎ ‎PM
To: Windows System Software Devs Interest List

To be pedantic, ARM has non-cache-coherent DMA that may require flushes & also lacks hardware compare-exchange primitives as amd64 or x86 have (the compiler will synthesize the requisite intrinsic interlocked operation using an appropriate ldrex/strex sequence).

It would be fair to say that we presently require hardware upon which an interlocked compare exchange can be synthesized.

  • Ken (MSFT)
    (Occasional ARM kernel guy.)
    TwC Security

From: xxxxx@broadcom.com
Sent: ‎4/‎1/‎2014 13:58
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] InterlockedRead / InterlockedWrite

Your example is very confusing, if you’re talking about two different spinlocks. I suggest you rewrite it more explicitly.

If we’re talking about Windows, there is an assumption of some hardware guarantees. For example, cache coherency, and interlocked compare-exchange availability.

Interlocked compare-exchange allows you to implement the full range of atomic interlocked operations - increment, decrement, add, OR, AND, XOR, you name it.
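
To illustrate that point, here is a rough sketch of two such operations built from nothing but a compare-exchange retry loop. It uses portable C11 atomics as a stand-in for the Windows primitives, and the helper names are made up for this example. The return conventions are chosen to mirror InterlockedOr (old value) and InterlockedIncrement (new value).

```c
#include <stdatomic.h>

/* Hypothetical helper: atomic OR built only from compare-exchange.
   Returns the value observed before the update. */
static long cas_fetch_or(atomic_long *target, long mask)
{
    long old = atomic_load(target);
    /* On failure, atomic_compare_exchange_weak stores the current value
       into `old`, so each retry computes the result from fresh data. */
    while (!atomic_compare_exchange_weak(target, &old, old | mask)) {
        /* retry */
    }
    return old;
}

/* Hypothetical helper: atomic increment built the same way.
   Returns the value after the update. */
static long cas_increment(atomic_long *target)
{
    long old = atomic_load(target);
    while (!atomic_compare_exchange_weak(target, &old, old + 1)) {
        /* retry */
    }
    return old + 1;
}
```

Any other read-modify-write (add, AND, XOR, decrement) follows the same pattern with a different expression for the new value.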


I am really wondering if any of my posts are making it to the list the way this discussion is carrying on …

All architectures Windows has ever supported have guaranteed that aligned reads and writes of pointer-sized values are atomic. More generally, any OS that uses multiple CPUs normally assumes the same, because locking primitives depend on this kind of assurance for performance of any reasonable kind.

On x86/x64 the hardware will guarantee atomicity of unaligned accesses too, but that isn’t the issue. The big problem is ensuring an atomic operation can include an entire WORD / DWORD / QWORD so that readers cannot get some bits from one value and other bits from another value. The interlocked operations make this externally possible by eliminating the race in the load, test, set sequence, but that doesn’t obviate the need to consider other effects.

Certainly, the documentation is incomplete in MSDN. Perhaps Diane can help to improve it.

Sent from Surface Pro

From: xxxxx@qualcomm.com
Sent: ‎Thursday‎, ‎April‎ ‎03‎, ‎2014 ‎5‎:‎40‎ ‎PM
To: Windows System Software Devs Interest List

I was able to dig up some MSDN documentation that is relevant to this discussion:

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686355(v=vs.85).aspx

Here they recommend using InterlockedCompareExchange() for simple reads and InterlockedExchange() for simple writes when operations on the same data can be interleaved between different threads. This is not for atomicity but for the memory barriers. This usage is not very efficient, however. There are several unnecessary reads, writes, and memory barriers. Better choices here would be InterlockedReadAcquire() and InterlockedWriteRelease() which of course don’t exist.
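
As a rough illustration of the acquire/release read and write being asked for, here is a sketch in portable C11 atomics. It is not the Windows API, and the mailbox type and function names are invented for the example. The writer fills in ordinary data and then publishes it with a release store; the reader checks the flag with an acquire load before touching the data. No read-modify-write is involved on either side.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-slot mailbox shared between two threads. */
typedef struct {
    int payload;        /* ordinary data, written before publishing */
    atomic_bool ready;  /* flag that publishes the payload */
} mailbox_t;

/* Writer side: the release store keeps the payload write from being
   reordered after the flag write. */
static void publish(mailbox_t *m, int value)
{
    m->payload = value;
    atomic_store_explicit(&m->ready, true, memory_order_release);
}

/* Reader side: the acquire load keeps the payload read from being
   reordered before the flag read. Returns false if nothing is published. */
static bool try_consume(mailbox_t *m, int *out)
{
    if (!atomic_load_explicit(&m->ready, memory_order_acquire)) {
        return false;
    }
    *out = m->payload;
    return true;
}
```

The atomicity of the flag access itself comes from the aligned load and store; the memory-order arguments only constrain reordering around it.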


But that is the point - that because of the guarantee of atomicity of aligned reads and writes, the need for InterlockedRead and InterlockedWrite is obviated. The documentation is incomplete, but the conclusion is the same.

Sent from Surface Pro

From: Scott Noone
Sent: ‎Friday‎, ‎April‎ ‎04‎, ‎2014 ‎9‎:‎05‎ ‎AM
To: Windows System Software Devs Interest List

This thread has wandered a bit (ha!), but I think we’re all in agreement
that there is no chance of aligned reads or writes tearing on a Windows
platform. There is a separate issue of compiler and processor reordering
though, which is why we potentially need the acquire/release semantics that
Mr. Raymond was referring to below.

-scott
OSR
@OSRDrivers

“Marion Bond” wrote in message news:xxxxx@ntdev…
I am really wondering if any of my posts are making it to the list the way
this discussion is carrying on …

All architectures Windows has ever supported have guaranteed that aligned
reads and writes of pointer-sized values are atomic. More generally, any OS
that uses multiple CPUs normally assumes the same, because locking primitives
depend on this kind of assurance for performance of any reasonable kind.

On x86/x64 the hardware will guarantee atomicity of unaligned accesses too,
but that isn’t the issue. The big problem is ensuring an atomic operation
can include an entire WORD / DWORD / QWORD so that readers cannot get some
bits from one value and other bits from another value. The interlocked
operations make this externally possible by eliminating the race in the
load, test, set sequence, but that doesn’t obviate the need to consider
other effects.

Certainly, the documentation is incomplete in MSDN. Perhaps Diane can help
to improve it.

Sent from Surface Pro

From: xxxxx@qualcomm.com
Sent: ‎Thursday‎, ‎April‎ ‎03‎, ‎2014 ‎5‎:‎40‎ ‎PM
To: Windows System Software Devs Interest List

I was able to dig up some MSDN documentation that is relevant to this
discussion:

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686355(v=vs.85).aspx

Here they recommend using InterlockedCompareExchange() for simple reads and
InterlockedExchange() for simple writes when operations on the same data can
be interleaved between different threads. This is not for atomicity but for
the memory barriers. This usage is not very efficient, however. There are
several unnecessary reads, writes, and memory barriers. Better choices here
would be InterlockedReadAcquire() and InterlockedWriteRelease() which of
course don’t exist.


M M wrote:

QnV0IHRoYXQgaXMgdGhlIHBvaW50IC0gdGhhdCBiZWNhdXNlIG9mIHRoZSBndWFyYW50ZWUgb2Yg
YXRvbWljaXR5IG9mIGFsaWduZWQgcmVhZHMgYW5kIHdyaXRlcywgdGhlIG5lZWQgZm9yIEludGVy
bG9ja2VkUmVhZCBhbmQgSW50ZXJsb2NrZWRXcml0ZSBpcyBvYnZpYXRlZC4gIFRoZSBkb2N1bWVu

Can nothing be done about this? After all I want to make sure I don’t miss a single post about interlocked reads and writes.

We’re working on it, but it’s low on our list.

These posts are in Base64, and we’re working on proper decoding for the web page.

Peter
OSR