InterlockedRead / InterlockedWrite

Dan_Raymond Member Posts: 51
Below is the MSDN documentation for InterlockedExchange():

http://msdn.microsoft.com/en-us/library/windows/desktop/ms683590(v=vs.85).aspx

It says "This function is atomic with respect to calls to other interlocked functions." That seems reasonable because the caller should not make any assumptions about the implementation. On most architectures there is HW support for the interlocked functions and they are also atomic with respect to simple reads and writes. However, if HW support is unavailable, the interlocked functions would need to use spinlocks. Why, then, is there no InterlockedRead() or InterlockedWrite()? Consider the following example:

NTSTATUS try_lock(ULONG *x)
{
return(InterlockedExchange(x, 1) == 0 ? STATUS_SUCCESS : STATUS_UNSUCCESSFUL);
}

void release_lock(ULONG *x)
{
*x = 0; // InterlockedWrite() is not available
}

Now suppose the following sequence of events from two different processes (P2 already owns the lock):

P1: calls try_lock()
P1: acquires spinlock
P1: reads *x == 1
P2: calls release_lock()
P2: writes *x = 0
P1: writes *x = 1
P1: releases spinlock
P1: try_lock() returns STATUS_UNSUCCESSFUL

Now the lock is left in a bad state and all future calls to try_lock() will fail.
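
The failure sequence above can be replayed deterministically. The following is a minimal sketch, not how any real InterlockedExchange() is implemented: it assumes a hypothetical software-emulated exchange that takes an internal lock, and splits that exchange into its read step and its write step so that the plain store from release_lock() can be placed between them. All names (model_t, p1_exchange_begin, and so on) are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: an "interlocked exchange" emulated in software under
   an internal lock, split into two steps so that a plain store from another
   thread can be replayed between them. */
typedef struct {
    unsigned long x;          /* the try-lock word: 0 = free, 1 = held      */
    bool internal_lock_held;  /* lock used only by the emulated exchange    */
    unsigned long p1_old;     /* value P1 read during step 1                */
} model_t;

/* Step 1 of the emulated exchange: take the internal lock, read *x. */
static void p1_exchange_begin(model_t *m)
{
    assert(!m->internal_lock_held);
    m->internal_lock_held = true;
    m->p1_old = m->x;
}

/* Step 2: write the new value, drop the internal lock, return the old value. */
static unsigned long p1_exchange_end(model_t *m, unsigned long value)
{
    m->x = value;
    m->internal_lock_held = false;
    return m->p1_old;
}

/* release_lock() with a plain store: it never consults the internal lock. */
static void p2_release_plain(model_t *m)
{
    m->x = 0;
}

/* Replays the sequence from the post. Returns true if the lock word ends up
   stuck at 1 even though P1's try_lock() failed and P2 released the lock. */
static bool replay_bad_interleaving(void)
{
    model_t m = { .x = 1, .internal_lock_held = false, .p1_old = 0 }; /* P2 owns the lock */

    p1_exchange_begin(&m);                       /* P1: reads *x == 1  */
    p2_release_plain(&m);                        /* P2: writes *x = 0  */
    unsigned long old = p1_exchange_end(&m, 1);  /* P1: writes *x = 1  */

    bool p1_acquired = (old == 0);               /* false: try_lock() fails */
    return !p1_acquired && m.x == 1;             /* nobody owns it, yet x == 1 */
}
```

replay_bad_interleaving() returns true: P1's try_lock() fails, P2 has released, and the lock word is still 1. On hardware that provides the exchange as an atomic instruction (or as an LL/SC retry loop), a plain store cannot slip between the read and the write in this way.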

Comments

  • Don_Burn Member - All Emails Posts: 1,764
    You might want to look at Doron's blog from 8 years ago, in particular
    http://blogs.msdn.com/b/doronh/archive/2006/12/06/creating-your-own-interlockedxxx-operation.aspx


    Don Burn
    Windows Filesystem and Driver Consulting
    Website: http://www.windrvr.com
    Blog: http://msmvps.com/blogs/WinDrvr




    -----Original Message-----
    From: [email protected]
    [mailto:[email protected]] On Behalf Of
    [email protected]
    Sent: Tuesday, April 01, 2014 2:35 PM
    To: Windows System Software Devs Interest List
    Subject: [ntdev] InterlockedRead / InterlockedWrite

    ---
    NTDEV is sponsored by OSR

    Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

    OSR is HIRING!! See http://www.osr.com/careers

    For our schedule of WDF, WDM, debugging and other seminars visit:
    http://www.osr.com/seminars

    To unsubscribe, visit the List Server section of OSR Online at
    http://www.osronline.com/page.cfm?name=ListServer
  • anton_bassov Member MODERATED Posts: 5,281
    > Why, then, is there no InterlockedRead() or InterlockedWrite()


    Simply because interlocking does not apply to simple reads and writes......


    It applies to operations that can be described by the following sequence that has to appear atomic

    A. Read data from memory to some internal temporary register
    B. Do some operation on this temporary register ( increment, decrement, test-and-set, exchange with some other register, etc)
    C. Write register contents back to memory


    As you can see, it applies only to those operations that involve both read and write cycles to memory that have to appear atomic, and simple reads and writes are not among them....

    Anton Bassov
  • Dan_Raymond Member Posts: 51
    @Don, thanks for the link.
    @Anton, did you read my entire post?
  • Alex_Grig Member Posts: 3,238
    >However, if HW support is unavailable, the interlocked functions would need to use spinlocks.

    How would you implement spinlocks, if interlocked HW commands are not available?
  • anton_bassov Member MODERATED Posts: 5,281
    > @Anton, did you read my entire post?

    Well, to be honest, I did not understand it at all. For example, could you please explain the statements below:

    <quote>
    Now suppose the follow sequence of events from two different processes (P2 already owns the lock):

    P1: calls try_lock()
    P1: acquires spinlock
    P1: reads *x == 1
    P2: calls release_lock()

    </quote>


    You seem to be considering the scenario when P1 acquires a spinlock that is already owned by P2....


    Anton Bassov
  • Dan_Raymond Member Posts: 51
    @Alex, there might be HW support sufficient for a spinlock but insufficient to directly implement all of the interlocked APIs. Or there might be no HW support at all and mutual exclusion would need to be implemented using something like Peterson's algorithm.

    @Anton, in my example there are two locks. One that is acquired/released via try_lock()/release_lock() and another that is used internally by the interlocked functions to guarantee atomicity between themselves.
  • anton_bassov Member MODERATED Posts: 5,281
    >> However, if HW support is unavailable, the interlocked functions would need to use spinlocks.

    > How would you implement spinlocks, if interlocked HW commands are not available?

    Please note that an arch may limit itself to providing HW support for only a single type of interlocked operation (test-and-set is the most likely candidate). On such an arch, interlocked ops like, say, increment or compare_exchange have to implement bus-locking semantics in software, using a spinlock abstraction that is, in turn, implemented on top of test-and-set (i.e. the interlocked operation that is implemented by hardware). Check the Linux sources for arches other than x86 and x86_64, and you will see it with your own eyes. Therefore, in this respect the OP's question is perfectly reasonable.....


    Anton Bassov
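
A minimal sketch of the arrangement described above, written with C11 atomics as a portable stand-in: the only primitive assumed is test-and-set (atomic_flag), and an increment is emulated in software under a global spinlock built on it. The names emu_lock, emu_unlock and emu_increment are hypothetical and do not correspond to any Windows or Linux API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Global lock built on the only hardware primitive assumed here: test-and-set. */
static atomic_flag g_emulation_lock = ATOMIC_FLAG_INIT;

static void emu_lock(void)
{
    /* test_and_set returns the previous state; spin while it was already set. */
    while (atomic_flag_test_and_set_explicit(&g_emulation_lock, memory_order_acquire)) {
        /* spin */
    }
}

static void emu_unlock(void)
{
    atomic_flag_clear_explicit(&g_emulation_lock, memory_order_release);
}

/* Software-emulated interlocked increment: read, add, write under the lock.
   Returns the incremented value. */
static long emu_increment(volatile long *addend)
{
    assert(addend != NULL);
    emu_lock();
    long result = *addend + 1;
    *addend = result;
    emu_unlock();
    return result;
}
```

emu_increment() is atomic only with respect to other code that takes the same lock. A plain store to *addend from another thread bypasses the lock entirely, which is the situation the original question asks about.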
  • anton_bassov Member MODERATED Posts: 5,281
    > @Anton, in my example there are two locks. One that is acquired/released via try_lock()/release_lock()
    > and another that is used internally by the interlocked functions to guarantee atomicity between themselves.


    If you think about it carefully you will (hopefully) realize that a single operand that is guarded by 2 separate
    locks is equivalent to one that is not guarded by any locks at all. Therefore, your original assumption is just logically faulty, and you prove it yourself by pointing out how such a "guard" can be defeated.

    In terms of stupidity such a scheme is equivalent to the "masterpiece" below:

    if (irql < DISPATCH_LEVEL)
    {
        ExAcquireFastMutex();
        do_things_with_variable_X();
        ExReleaseFastMutex();
    }
    else
    {
        KeAcquireSpinLock();
        do_things_with_variable_X();
        KeReleaseSpinLock();
    }


    With such a "guard" variable X may be updated by two separate paths simultaneously. The scheme that you have mentioned earlier is from the same field.....


    Anton Bassov
  • Alex_Grig Member Posts: 3,238
    Your example is very confusing, if you're talking about two different spinlocks. I suggest you rewrite it more explicitly.

    If we're talking about Windows, there is an assumption of some hardware guarantees. For example, cache coherency, and interlocked compare-exchange availability.

    Interlocked compare-exchange allows you to implement the full range of atomic interlocked operations - increment, decrement, add, OR, AND, XOR, you name it.
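
As an illustration of that last point, here is a sketch of two read-modify-write operations built from nothing but compare-exchange. It uses C11 atomic_compare_exchange_weak as a portable stand-in for InterlockedCompareExchange (an assumption made so the example compiles anywhere); the helper names cas_multiply and cas_or are made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Atomic multiply built from compare-exchange alone. Returns the old value. */
static long cas_multiply(_Atomic long *target, long factor)
{
    assert(target != NULL);
    long expected = atomic_load(target);
    /* If *target changed since it was read, the call fails, stores the
       current value into 'expected', and the product is recomputed. */
    while (!atomic_compare_exchange_weak(target, &expected, expected * factor)) {
        /* retry with the refreshed 'expected' */
    }
    return expected;
}

/* Atomic OR built the same way. Returns the old value. */
static long cas_or(_Atomic long *target, long mask)
{
    assert(target != NULL);
    long expected = atomic_load(target);
    while (!atomic_compare_exchange_weak(target, &expected, expected | mask)) {
        /* retry with the refreshed 'expected' */
    }
    return expected;
}
```

The same loop shape works for any operation that can be computed from the old value, which is why a single compare-exchange primitive is enough to build the rest.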
  • OSR_Community_User Member Posts: 110,217
    > Below is the MSDN documentation for InterlockedExchange():
    >
    > http://msdn.microsoft.com/en-us/library/windows/desktop/ms683590(v=vs.85).aspx
    >
    > It says "This function is atomic with respect to calls to other
    > interlocked functions." That seems reasonable because the caller should
    > not make any assumptions about the implementation. On most architectures
    > there is HW support for the interlocked functions and they are also atomic
    > with respect to simple reads and writes. However, if HW support is
    > unavailable, the interlocked functions would need to use spinlocks. Why,
    > then, is there no InterlockedRead() or InterlockedWrite()? Consider the
    > following example:
    >
    > NTSTATUS try_lock(ULONG *x)
    > {
    > return(InterlockedExchange(x, 1) == 0 ? STATUS_SUCCESS :
    > STATUS_UNSUCCESSFUL);
    > }
    >
    > void release_lock(ULONG *x)
    > {
    > *x = 0; // InterlockedWrite() is not available
    > }
    >
    > Now suppose the follow sequence of events from two different processes (P2
    > already owns the lock):
    >
    > P1: calls try_lock()
    > P1: acquires spinlock

    The above sequence makes no sense, because it requires a read, whose
    outcome is untested, so it is just a way to waste time. It accomplishes
    nothing. In fact, whether it gets the lock or not in no way impacts any
    of the remaining code.
    > P1: reads *x == 1
    > P2: calls release_lock()
    > P2: writes *x = 0
    > P1: writes *x = 1
    > P1: releases spinlock
    Just to clarify: processes are irrelevant to this discussion. Only
    threads matter, and they can belong to the same process. Note that there
    are two-stage acquisition algorithms that work even on machines with no
    interlocks, where an interrupt and context switch can occur between any
    two instructions. It was even a question on a final exam in the OS course
    I took in 1968, but I'm not going to try to rediscover it 46 years later.
    But in the above code, you are right; the state of x is wrong. The reason
    it is wrong is that your code is wrong; it makes no sense to allow thread
    P2 to write to x without any synchronization, when thread P1 believes it
    has exclusive access. You are releasing the lock before doing the write.

    There is this about synchronization: it doesn't do any good if you don't
    use it correctly. Back in 1968, Edsger Dijkstra wrote a now-famous paper
    on the use of synchronization primitives P() and V(), and we have made
    virtually no progress in understanding how to use these in a robust
    fashion in the intervening 46 years. The "synchronized" attribute of Java
    and C# is just a P/V in sheep's clothing. A tiny number of
    mostly-academic languages have introduced meta-concepts such as "condition
    variables", but these languages tend to exist in only one compiler in one
    university. The closest language to mainstream that exhibited anything
    more sophisticated than P/V was Ada, with its "rendezvous" mechanism.

    The reason your code example doesn't work is simply that it is wrong, so
    of course it looks like nonsense. That's because it is. Consider the
    race condition where P2 sets the value to 0 and P1 sets it to 1. If those
    threads are running on distinct cores, a timing change of a couple hundred
    PICOseconds makes the difference between the sequence you show and a
    sequence which leaves *x as 0. When you see something like this, you know
    immediately that your code is just plain WRONG.

    If, to get the address in the pointer x, you use &Schroedinger as the name
    of the variable, this should give you a hint as to what you have done
    wrong. The first access to the variable collapses the quantum states and
    gives you A value, you just don't know which one you have until you read
    it.

    Generally, for one value, InterlockedCompareExchange is all the locking
    you need. No locking is required for a store. But when you have a tuple
    of values, you want to change all members of the tuple to maintain some
    correctness invariant. In this case you need a lock, because you want to
    treat the N non-atomic modifications as a single atomic transaction. So
    unlinking a block from a doubly-linked list requires a lock that prohibits
    any other thread from either reading information for which the correctness
    invariant is false, or concurrently modifying the tuple such that two
    threads each think they have each left the tuple in a correct state. You
    demonstrate none of that in your example. Note that you can create
    locking structures that prevent concurrent access while still creating
    completely erroneous results, such as the example you have given.
    > P1: try_lock() returns STATUS_UNSUCCESSFUL
    >
    > Now the lock is left in a bad state and all future calls to try_lock()
    > will fail.
    >
  • OSR_Community_User Member Posts: 110,217
    >>However, if HW support is unavailable, the interlocked functions would
    >> need to use spinlocks.
    >
    > How would you implement spinlocks, if interlocked HW commands are not
    > available?

    We had to implement them using a hypothetical "multithreaded Algol
    compiler" which had absolutely no guarantees of atomicity. It can be done
    with a two-phase acquisition strategy. But it's simpler and more
    efficient if you have interlocked instructions.
    joe

  • OSR_Community_User Member Posts: 110,217
    > Your example is very confusing, if you're talking about two different
    > spinlocks. I suggest you rewrite it more explicitly.
    >
    > If we're talking about Windows, there is an assumption of some hardware
    > guarantees. For example, cache coherency, and interlocked compare-exchange
    > availability.
    >
    > Interlocked compare-exchange allows you to implement the full range of
    > atomic interlocked operations - increment, decrement, add, OR, AND, XOR,
    > you name it.

    And if you really want to see an amazing sequence of code, look at the
    code the C compiler creates for an atomic *= operation

    #pragma omp atomic
    x *= y;

    It uses a fascinating variation on the concept of "spin lock" (in reading
    the code, ignore the code executed if the operands are not DWORD-aligned;
    it is obvious and trivial, albeit slow with potential high lock conflict;
    look at the main path for DWORD-aligned operands)
    joe
  • Dan_Raymond Member Posts: 51
    It is surprising how much time some people will spend writing responses and how little time reading and understanding the original question.

    There are TWO locks in the example. The first lock (which I refer to as "lock") is a trylock. This is what try_lock() and release_lock() manipulate. The second lock (which I refer to as "spinlock") is a *hypothetical* lock that the interlocked functions *may* be using to guarantee atomicity between each other. It's hypothetical because we don't know how the interlocked functions are implemented on every architecture we may encounter.

    The try_lock() and release_lock() functions would be safe and correct if I used InterlockedWrite() instead of a simple write inside release_lock(). But InterlockedWrite() doesn't exist.
  • OSR_Community_User Member Posts: 110,217
    > It is surprising how much time some people will spend writing responses
    > and how little time reading and understanding the original question.

    I read the original question. It was largely nonsense, because it did a
    try-lock without testing to see if it succeeded or not, and there is, as
    has been pointed out, absolutely no need for either an InterlockedRead or
    InterlockedWrite, since that is how memory works anyway.
    joe

    >
    > There are TWO locks in the example. The first lock (which I refer to as
    > "lock") is a trylock. This is what try_lock() and release_lock()
    > manipulate. The second lock (which I refer to as "spinlock") is a
    > *hypothetical* lock that the interlocked functions *may* be using to
    > guarantee atomicity between each other. It's hypothetical because we
    > don't know how the interlocked functions are implemented on every
    > architecture we may encounter.
    >
    > The try_lock() and release_lock() functions would be safe and correct if I
    > used InterlockedWrite() instead of a simple write inside release_lock().
    > But InterlockedWrite() doesn't exist.
  • Dan_Raymond Member Posts: 51
    Joe, if you didn't comprehend the original question then you should have asked for clarification or kept quiet.
  • anton_bassov Member MODERATED Posts: 5,281
    > It is surprising how much time some people will spend writing responses and how little time
    > reading and understanding the original question

    Well, we are just trying to explain to you that the original question is nonsensical in itself, but you don't want to listen to us, do you......




    > But InterlockedWrite() doesn't exist.

    OK, let me put it simply:


    A. void InterlockedWrite(int *x, int val)
       {
           *x = val;
       }


    B. void InterlockedWrite(int *x, int val, spinlock_t lock)
       {
           spin_lock(lock);
           *x = val;
           spin_unlock(lock);
       }


    Could you please explain the purpose of the spinlock in example B? If you somehow realize that it serves no purpose whatsoever and that, for all practical purposes, A and B are equivalent, you will (hopefully) realize that there is no reason for interlocked writes.

    > The try_lock() and release_lock() functions would be safe and correct if I used InterlockedWrite()
    > instead of a simple write inside release_lock().

    If you handle the first part well, you will also realize that try_lock() and release_lock() functions would NOT be safe and correct in multithreaded environment, no matter how you implement your hypothetical InterlockedWrite().....


    Anton Bassov
  • Dan_Raymond Member Posts: 51
    @Anton:

    Acquiring the InterlockedXxx spinlock (if it exists) before the write operation in release_lock() guarantees it will occur before or after (not during) the read/write operation in try_lock().

    1) If the write in release_lock() happens before the read/write in try_lock() then try_lock() will return STATUS_SUCCESS and *x==1 (GOOD).

    2) If the write in release_lock() happens after the read/write in try_lock() then try_lock() will return STATUS_UNSUCCESSFUL and *x==0 (GOOD).

    3) If the write in release_lock happens during the read/write in try_lock() then try_lock() will return STATUS_UNSUCCESSFUL and *x==1 (BAD).
  • Ken_Johnson Member - All Emails Posts: 1,559
    To be pedantic, ARM has non-cache-coherent DMA that may require flushes, and it also lacks hardware compare-exchange primitives such as amd64 or x86 have (the compiler will synthesize the requisite intrinsic interlocked operation using an appropriate ldrex/strex sequence).

    It would be fair to say that we presently require hardware upon which an interlocked compare exchange can be synthesized.

    - Ken (MSFT)
    (Occasional ARM kernel guy.)
    TwC Security
    ________________________________
    From: [email protected]
    Sent: 4/1/2014 13:58
    To: Windows System Software Devs Interest List
    Subject: RE:[ntdev] InterlockedRead / InterlockedWrite

    Your example is very confusing, if you're talking about two different spinlocks. I suggest you rewrite it more explicitly.

    If we're talking about Windows, there is an assumption of some hardware guarantees. For example, cache coherency, and interlocked compare-exchange availability.

    Interlocked compare-exchange allows you to implement the full range of atomic interlocked operations - increment, decrement, add, OR, AND, XOR, you name it.

  • Alex_Grig Member Posts: 3,238
    OK, let's ask that again:

    1. On what lock do try_lock() and release_lock() operate?
    2. What is *x in pseudo-code? Is it related in any way to the lock for try_lock() and release_lock()?
    3. If *x is the same lock that try_lock() and release_lock() operate on, WHY THE HELL do you read and modify it outside of those functions?
  • anton_bassov Member MODERATED Posts: 5,281
    > Acquiring the InterlockedXxx spinlock (if it exists) before the write operation in release_lock()
    > guarantees it will occur before or after (not during) the read/write operation in try_lock()

    Now I got it - you are describing a logically faulty scenario when the same variable gets updated by both
    InterlockedXXX and "regular" writes with MOV instruction. Therefore, your question basically is "why don't hardware designers want to introduce a feature that would allow me to write my logically faulty code safely"........

    The easiest way to realize that such approach is logically faulty is, indeed, to start thinking of it in terms of spinlocks. In order to make it safe you would have to share this lock across InterlockedXXX and "regular" writes. However, it is (hopefully) understandable that InterlockedXXX has to acquire and release this lock
    transparently to the rest of the world. Otherwise it is already going to be something other than interlocked operation - in terms of hardware it implies keeping the bus locked across multiple instructions.....


    Anton Bassov
  • Dan_Raymond Member Posts: 51
    Alex:

    1) On what lock do try_lock() and release_lock() operate?

    They operate on whatever lock the caller supplies (parameter x).

    2) What is *x in pseudo-code? Is it related in any way to the lock for try_lock() and release_lock()?

    The try_lock() and release_lock() functions are not pseudo-code. They are valid compilable C code. The parameter x is the lock.

    3) If *x is the same lock that try_lock() and release_lock() operate on, WHY THE
    HELL do you read and modify it outside of those functions?

    I did not intend to imply the modification of the lock outside of try_lock() and release_lock(). The entire sequence of events in the failure scenario occurs inside try_lock() for P1 and inside release_lock() for P2.
  • Dan_Raymond Member Posts: 51
    Anton:

    > Now I got it - you are describing a logically faulty scenario when the
    > same variable gets updated by both InterlockedXXX and "regular" writes
    > with MOV instruction. Therefore, your question basically is "why don't
    > hardware designers want to introduce a feature that would allow me to
    > write my logically faulty code safely"........

    No, my question is "Why isn't there an InterlockedWrite()"? Possible answers include:

    1) It is not needed because the interlocked functions never use a spinlock internally. They are always (and will always be) atomic with respect to simple reads and writes. If so, then MSDN is misleading when it says "This function is atomic with respect to calls to other interlocked functions." Why no mention of atomicity with respect to simple reads and writes?

    2) It is expected that you use an existing interlocked function to simulate InterlockedWrite(). For example, if you discard the result of InterlockedExchange() it is equivalent to InterlockedWrite(). However, this is less efficient. I count 137 documented InterlockedXxx() APIs on MSDN and most of these are variants that improve efficiency (omitting memory barriers for example). So why leave out InterlockedWrite()?
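
Option 2 can be sketched as follows, using C11 atomics as a stand-in for the Interlocked API (an assumption; the thread is about Windows). release_lock_exchange() discards the result of an exchange, which is the simulated InterlockedWrite() described above. release_lock_store() is the cheaper alternative, an atomic store with release ordering. The type and function names are illustrative only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef _Atomic unsigned long trylock_t;  /* 0 = free, 1 = held */

/* Acquire attempt: succeeds only if the previous value was 0. */
static bool try_lock(trylock_t *x)
{
    assert(x != NULL);
    return atomic_exchange(x, 1UL) == 0;
}

/* Release by exchange, ignoring the old value: a full read-modify-write,
   so it is serialized against the exchange in try_lock() by construction. */
static void release_lock_exchange(trylock_t *x)
{
    (void)atomic_exchange(x, 0UL);
}

/* Release by an atomic store with release ordering: no read-modify-write,
   typically cheaper where the hardware makes aligned stores atomic. */
static void release_lock_store(trylock_t *x)
{
    atomic_store_explicit(x, 0UL, memory_order_release);
}
```

Both release variants leave the lock word at 0; they differ only in cost and in which guarantees they rely on. The exchange variant depends solely on interlocked operations being atomic with respect to one another, which is what the quoted MSDN sentence promises.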
  • Phil_Barila Member - All Emails Posts: 159
    > -----Original Message-----
    > From: [email protected] [mailto:bounce-554856-
    > [email protected]] On Behalf Of [email protected]
    > Sent: Tuesday, April 01, 2014 12:35 PM
    > To: Windows System Software Devs Interest List
    > Subject: [ntdev] InterlockedRead / InterlockedWrite
    >
    > Below is the MSDN documentation for InterlockedExchange():
    >
    > http://msdn.microsoft.com/en-
    > us/library/windows/desktop/ms683590(v=vs.85).aspx
    >
    > It says "This function is atomic with respect to calls to other
    > interlocked functions." That seems reasonable because the caller should
    > not make any assumptions about the implementation. On most architectures
    > there is HW support for the interlocked functions and they are also
    > atomic with respect to simple reads and writes. However, if HW support
    > is unavailable, the interlocked functions would need to use spinlocks.
    > Why, then, is there no InterlockedRead() or InterlockedWrite()? Consider
    > the following example:
    >
    > NTSTATUS try_lock(ULONG *x)
    > {
    > return(InterlockedExchange(x, 1) == 0 ? STATUS_SUCCESS :
    > STATUS_UNSUCCESSFUL);
    > }
    >
    > void release_lock(ULONG *x)
    > {
    > *x = 0; // InterlockedWrite() is not available
    > }
    >
    > Now suppose the follow sequence of events from two different processes
    > (P2 already owns the lock):
    >
    > P1: calls try_lock()
    > P1: acquires spinlock
    > P1: reads *x == 1
    > P2: calls release_lock()
    > P2: writes *x = 0
    > P1: writes *x = 1
    > P1: releases spinlock
    > P1: try_lock() returns STATUS_UNSUCCESSFUL
    >
    > Now the lock is left in a bad state and all future calls to try_lock()
    > will fail.

    I don't believe that Windows runs on any hardware that requires a separate spinlock for the InterlockedExchange, so your scenario won't happen.
    There is no "P1: acquires spinlock" and "P1: releases spinlock".
    Additionally, "P1: reads *x == 1" and "P1: writes *x = 1" are atomic with respect to "P2: writes *x = 0".
    The CPU will not be interrupted, and the thread won't be pre-empted, between the load and the store, on any supported architecture.

    I'm sure Jake, Ken (or maybe PGV) will correct me if I'm wrong about that.

    Phil

    Not speaking for LogRhythm
    Phil Barila | Senior Software Engineer
  • OSR_Community_User Member Posts: 110,217
    > Anton:
    >
    >> Now I got it - you are describing a logically faulty scenario when the
    >> same variable gets updated by both InterlockedXXX and "regular" writes
    >> with MOV instruction. Therefore, your question basically is "why don't
    >> hardware designers want to introduce a feature that would allow me to
    >> write my logically faulty code safely"........
    >
    > No, my question is "Why isn't there an InterlockedWrite()"? Possible
    > answers include:
    >
    > 1) It is not needed because the interlocked functions never use a spinlock
    > internally. They are always (and will always be) atomic with respect to
    > simple reads and writes. If so, then MSDN is misleading when it says
    > "This function is atomic with respect to calls to other interlocked
    > functions." Why no mention of atomicity with respect to simple reads and
    > writes?

    What, exactly, would InterlockedWrite lock AGAINST? If I write *x = 0,
    then either (a) the memory access required by the Interlockedxxx function
    write will have to wait for the current store operation to finish or (b)
    the store instruction would wait for the current Interlockedxxx operation
    to finish. There is a fundamental piece of information you are completely
    missing here: no memory location can be accessed concurrently in any way,
    shape, or form by any two paths of execution in any arrangement of cores
    from single-core to scores-of-cores. This is known to all programmers (or
    so I thought). Note that caching has to tap-dance carefully to maintain
    the illusion that caching does not exist, and therefore caching is
    irrelevant to this discussion.

    So there is no need for InterlockedRead or InterlockedWrite because the
    atomicity is implicit in the implementation of the hardware. Do note what
    happens if the architecture requires, say, that you set a spin lock. The
    IBM/360 series, for example, had exactly ONE interlocked instruction,
    Test-and-Set (TS), so InterlockedIncrement would be done by

    spin:
    TS globalLock
    branch-if-not-gotten spin
    L 0, value
    ADD 0, #1
    ST 0, value
    XOR 0,0
    SB 0, globalLock

    as I recollect, L would load a (32-bit) word, ST would store a (32-bit)
    word, ADD would add a value to a register (0 would be Register 0). There
    were no immediate instructions, so #1 created a literal in the literal
    pool, and the instruction referenced that location (I will not go into the
    details that no instruction could directly address memory, only give a
    12-bit offset from a base register, not relevant to this discussion). SB
    stored an 8-bit byte. 16-bit values were called "halfwords", in case you
    were wondering. Note that an "interlocked store" would also make no sense
    in this architecture (I cannot come up with a scenario in which this could
    possibly make sense, and I suspect you can't, either). If you look at the
    consequences of a store at any point, the outcome does not change in a
    deterministic way if InterlockedWrite exists.
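
The assembly listing above can be sketched in portable C. This is only an illustration, not how Windows implements anything: it assumes C11 `<stdatomic.h>`, uses `atomic_flag` as a stand-in for the TS instruction, and the names `global_lock` and `emulated_interlocked_increment` are hypothetical.

```c
#include <stdatomic.h>

/* One global lock guards every emulated interlocked operation,
   mirroring globalLock in the listing above. */
static atomic_flag global_lock = ATOMIC_FLAG_INIT;

/* Hypothetical emulation of InterlockedIncrement on a machine whose
   only atomic primitive is test-and-set. Returns the new value. */
long emulated_interlocked_increment(long *value)
{
    /* TS globalLock / branch-if-not-gotten spin */
    while (atomic_flag_test_and_set_explicit(&global_lock,
                                             memory_order_acquire)) {
        /* spin until the flag was previously clear */
    }

    long result = *value + 1;   /* L 0, Value ; ADD 0, #1 */
    *value = result;            /* ST 0, Value */

    /* XOR 0,0 ; SB 0, globalLock */
    atomic_flag_clear_explicit(&global_lock, memory_order_release);
    return result;
}
```

Only code that goes through `global_lock` is serialized here; a plain `*value = 0` elsewhere never touches the lock, which is the case the original question was asking about.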

    >
    > 2) It is expected that you use an existing interlocked function to
    > simulate InterlockedWrite(). For example, if you discard the result of
    > InterlockedExchange() it is equivalent to InterlockedWrite(). However,
    > this is less efficient. I count 137 documented InterlockedXxx() APIs on
    > MSDN and most of these are variants that improve efficiency (omitting
    > memory barriers for example). So why leave out InterlockedWrite()?

    Because it is not needed. It's that simple.
    joe
  • OSR_Community_UserOSR_Community_User Member Posts: 110,217
    > Joe, if you didn't comprehend the original question then you should have
    > asked for clarification or kept quiet.

    What part of the original question did I fail to comprehend? Other than
    the horrific dysfunctional example which was used to "prove" the need for
    an InterlockedWrite.

    I've been reasoning about concurrency for over four decades. I think I
    can not only comprehend a badly-formed question, but comprehend an
    artificially bogus answer which includes incorrect code, and claims that
    there is a problem because the incorrect code operates, well, incorrectly.
    joe

  • Maxim_S._ShatskihMaxim_S._Shatskih Member Posts: 10,396
    >to use spinlocks. Why, then, is there no InterlockedRead() or InterlockedWrite()?

    InterlockedWrite is InterlockedExchange with the result ignored.
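
As a rough sketch of that idea, here is the try_lock/release_lock pair from the original question with the release done by an exchange whose return value is thrown away. It assumes C11 `atomic_exchange` as a portable stand-in for InterlockedExchange, and it returns int rather than NTSTATUS so that it compiles outside a driver.

```c
#include <stdatomic.h>

/* Returns 1 if the lock was acquired, 0 if it was already held. */
int try_lock(atomic_long *x)
{
    return atomic_exchange(x, 1) == 0;
}

/* "InterlockedWrite": an exchange whose previous value is discarded. */
void release_lock(atomic_long *x)
{
    (void)atomic_exchange(x, 0);
}
```

Because both paths now use the same family of operations, the release stays atomic with respect to try_lock even on a hypothetical platform where the interlocked functions are built on a hidden lock.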

    --
    Maxim S. Shatskih
    Microsoft MVP on File System And Storage
    [email protected]
    http://www.storagecraft.com
  • Maxim_S._ShatskihMaxim_S._Shatskih Member Posts: 10,396
    > missing here: no memory location can be accessed concurrently in any way,

    ...unless it spans the cache line boundary, since in this case there will be 2 accesses at physical level.
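
A small helper makes the condition concrete. This is a sketch under an assumption: it hard-codes a 64-byte cache line, which is common but not universal, and the names `CACHE_LINE_SIZE` and `spans_cache_line` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u  /* assumption: adjust for the target CPU */

/* Returns 1 if the `size` bytes starting at `addr` cross a
   cache-line boundary, 0 otherwise. */
int spans_cache_line(uintptr_t addr, size_t size)
{
    if (size == 0) {
        return 0;
    }
    uintptr_t first_line = addr / CACHE_LINE_SIZE;
    uintptr_t last_line = (addr + size - 1) / CACHE_LINE_SIZE;
    return first_line != last_line;
}
```

For example, an 8-byte value at offset 60 touches lines 0 and 1, while the same value at offset 56 or 64 stays within one line. Naturally aligned variables never hit this case.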

    --
    Maxim S. Shatskih
    Microsoft MVP on File System And Storage
    [email protected]
    http://www.storagecraft.com
  • anton_bassovanton_bassov Member MODERATED Posts: 5,281
    > So there is no need for InterlockedRead or InterlockedWrite because the atomicity is
    > implicit in the implementation of the hardware.

    I am afraid the OP would not accept this plainly obvious answer anyway. Therefore, I am out of it - any further attempts to explain something to the OP seem to be just a waste of time.....


    Anton Bassov
  • Alex_GrigAlex_Grig Member Posts: 3,238
    @Dr Joe:

    Lack of atomic modification (such as in IBM360) will cause the following problem even with a spinlock:

    Suppose A=1.
    The increment protected with a spinlock will fetch A. Then a different thread will set it to 3. Then the interlocked increment thread sets it to 2.

    Assignment to 3 will be lost. This is the hazard of architectures without hardware atomic guarantee.

    And a spinlock in a time slice OS is a lousy solution.
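
The interleaving described above can be replayed deterministically on a single thread. This is only an illustration of the ordering, not real concurrent code; `replay_lost_update` is a hypothetical name, and the comments mark which logical thread each step belongs to.

```c
/* Replays the sequence step by step: the increment path holds its
   spinlock throughout, but the other thread's plain store does not
   take that lock, so it lands between the load and the store. */
long replay_lost_update(void)
{
    long a = 1;

    long fetched = a;   /* increment thread: fetch A (sees 1) */
    a = 3;              /* other thread: plain store of 3 */
    a = fetched + 1;    /* increment thread: stores 2, overwriting 3 */

    return a;           /* the assignment to 3 has been lost */
}
```

The final value is 2, which matches neither "store 3, then increment" (4) nor "increment, then store 3" (3).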
  • Prokash_Sinha-1Prokash_Sinha-1 Member - All Emails Posts: 1,214
    The need to tackle this situation arose before there was HW support for
    it, as with almost everything else:

    Paging HW (MMU) - a necessity because overlays, segmentation, etc. were
    done in software
    Paging HW for host machines that are supposed to run a VM monitor
    Interrupt remapper, DMA remapper
    [...]
    You name it!

    That is why the test-and-set instruction came into being ...

    I'm sure Joe knows this very well, so there is no need to mention it. I
    think his answers were just too verbose, and that got noticed ahead of
    the logic he was trying to put forward!

    -pro

    On 4/2/2014 7:23 AM, [email protected] wrote:
    > @Dr Joe:
    >
    > Lack of atomic modification (such as in IBM360) will cause the following problem even with a spinlock:
    >
    > Suppose A=1.
    > The increment protected with a spinlock will fetch A. Then a different thread will set it to 3. Then the interlocked increment thread sets it to 2.
    >
    > Assignment to 3 will be lost. This is the hazard of architectures without hardware atomic guarantee.
    >
    > And a spinlock in a time slice OS is a lousy solution.