Interleaved synchronization objects

Hi, I need to synchronize access to shared resources with a mutex and a spinlock, which are acquired and released in an interleaved fashion.

The code would look more or less like this:
ExAcquireFastMutex(&mutex);
[…]
oldirql = ExAcquireSpinLockExclusive(&spinlock);
[…]
ExReleaseFastMutex(&mutex);
[…]
ExReleaseSpinLockExclusive(&spinlock, oldirql);

While the code is logically correct, it has the unwanted side effect of not properly restoring the original IRQL.
Assuming we enter the block at IRQL = PASSIVE_LEVEL, after acquiring the mutex we reach IRQL = APC_LEVEL.
The spinlock acquire call then raises it further to DISPATCH_LEVEL while returning APC_LEVEL, which is stored into oldirql.
Releasing the mutex reverts the IRQL to the one we had on entry: PASSIVE_LEVEL.
Finally, releasing the spinlock restores the IRQL back to the value of oldirql, i.e. APC_LEVEL.
What’s the best and cleanest way to properly restore the original IRQL at the end of the block?

I also have a second, marginally related question: can an exclusive lock acquired on a spinlock be downgraded to shared? I saw that the opposite is possible via ExTryConvertSharedSpinLockExclusive, but I couldn’t find any downgrade function.
Thanks a lot!

ExReleaseFastMutex can only be called at APC_LEVEL.

It’s generally advised that locks be released in the reverse order in which they were acquired.

What problem are you trying to solve?

>While the code is logically correct…

I don’t think so. Alex is right, the first lock to be released is the last one that was acquired. You wouldn’t have IRQL issues in that case.
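For example (untested, but the shape is what matters), with the releases nested inside the acquires the IRQL unwinds by itself:

KIRQL oldirql;

ExAcquireFastMutex(&mutex);                       // PASSIVE_LEVEL -> APC_LEVEL
oldirql = ExAcquireSpinLockExclusive(&spinlock);  // APC_LEVEL -> DISPATCH_LEVEL, oldirql = APC_LEVEL

// ... work that needs both locks ...

ExReleaseSpinLockExclusive(&spinlock, oldirql);   // DISPATCH_LEVEL -> APC_LEVEL
ExReleaseFastMutex(&mutex);                       // APC_LEVEL -> PASSIVE_LEVEL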

But why do you think you need two locks?

That will definitely not work. The spinlock release will always set the IRQL to APC_LEVEL, which is not what you want as you leave the sync logic. Usually you want to leave it at the IRQL it was at before acquiring the first lock.

If you need shared access, just use resources (ERESOURCE).

You need to describe your situation better. Why this scenario? How did you come to this solution?

Cheers,
Gabriel
www.kasardia.com

Just preserve the lock hierarchy on unlock. If for historical reasons the unlock can happen in reverse order somewhere deep in a call stack, then add some unlock management. For example, allocate on the stack or from the pool a structure like this:

typedef struct _LockManagement {
    LIST_ENTRY  ListEntry;        // links the record into the global list
    PETHREAD    Thread;           // thread that owns this record
    PKSPIN_LOCK SpinLockToUnlock; // spinlock currently held by the thread, if any
    KIRQL       OldIrql;          // IRQL returned when the spinlock was acquired
    PKMUTEX     MutexToUnlock;    // mutex currently held by the thread, if any
} LockManagement;

Add it to a global list (protected by a spinlock) at mutex acquisition. Set MutexToUnlock to the address of the mutex being acquired.

Check for its presence before spinlock acquisition by comparing PsGetCurrentThread() with the Thread field value. Set the SpinLockToUnlock pointer to the spinlock being acquired.

On mutex release, check for the structure’s presence in the list, again via PsGetCurrentThread() == LockManagement.Thread, or just check that KeGetCurrentIrql() == DISPATCH_LEVEL. If found, check whether a spinlock has been acquired and delay the mutex release until the spinlock is released.

On spinlock release, check again for the structure in the list. If found, check whether there is a mutex being held and release it after the spinlock release. Remove the structure from the list when nothing is left to release.
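A rough sketch of the release side, just to show the shape. All names are mine, the record is assumed to have been inserted when the fast mutex was acquired, and accesses to the global list should really be guarded by its spinlock (omitted here). I also assume MutexToUnlock is the fast mutex from the original snippet (PFAST_MUTEX rather than PKMUTEX).

LIST_ENTRY g_LockList;   // global list of LockManagement records (initialized elsewhere)

static LockManagement *LookupCurrentThreadRecord(VOID)
{
    PETHREAD self = PsGetCurrentThread();
    PLIST_ENTRY entry;

    for (entry = g_LockList.Flink; entry != &g_LockList; entry = entry->Flink) {
        LockManagement *rec = CONTAINING_RECORD(entry, LockManagement, ListEntry);
        if (rec->Thread == self)
            return rec;
    }
    return NULL;
}

VOID ManagedReleaseFastMutex(PFAST_MUTEX Mutex)
{
    LockManagement *rec = LookupCurrentThreadRecord();

    if (rec != NULL && rec->SpinLockToUnlock != NULL) {
        rec->MutexToUnlock = Mutex;   // spinlock still held: defer the mutex release
        return;
    }
    ExReleaseFastMutex(Mutex);
}

VOID ManagedReleaseSpinLockExclusive(EX_SPIN_LOCK *SpinLock)
{
    LockManagement *rec = LookupCurrentThreadRecord();

    ExReleaseSpinLockExclusive(SpinLock, rec->OldIrql);  // back to the IRQL saved at acquire (APC_LEVEL)
    rec->SpinLockToUnlock = NULL;

    if (rec->MutexToUnlock != NULL) {                    // a deferred mutex release is pending
        ExReleaseFastMutex(rec->MutexToUnlock);
        rec->MutexToUnlock = NULL;
    }
    // Remove the record from the list and free it once nothing is left to release (not shown).
}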

Thanks everyone for your quick replies.

So the idea is that there is a generic shared object: the main object.
The main object consists of an expanding set of slots which are, with one single exception described later[*], fully independent.
The main object is created with no slots.
A number of concurrent threads each obtain access to a dedicated, unique slot allocated by the main object.
Since it’s uniquely assigned, each thread can and shall treat its assigned slot as its exclusive[*] property.

[*] There is an exception to the slot independence rule: under certain circumstances (maximum number of slots reached, timed events, user requests, etc.) the main object (and all its slots) needs to be handled as a whole, static thing. The main object is analyzed and finally it is reset to the original empty state (i.e. all slots are discarded). During this analyze_and_reset process, access to the main object is exclusive.

So, in pseudocode (I’m still toying around with the design, so there is no actual code), the workers would work like this:
W1. Obtain slot mutex (to guarantee unique allocation of slots)
W2. Get the next available slot (from now on this thread is the absolute owner of the slot)
W3. Get a shared lock (communicate that work is being done on some slots - so analyze_and_reset must wait)
W4. Release slot mutex (let others take their slot too)
W5. Operate on the slot in exclusive mode…
W6. Release shared lock (communicate that we are done operating on the slot - unblocks analyze_and_reset)

The analyze_and_reset code instead would look like this:
R1. Obtain slot mutex (to guarantee no further slot is assigned until we’re done with the reset, workers have to wait)
R2. Get an exclusive lock (wait until no more slots are being operated on, i.e. all workers are done with W5)
R3. Do business on the slots…
R4. Deallocate all the slots and reset the “next available slot” (i.e. reset the main object to its empty state)
R5. Release the exclusive lock (nothing really happens here since access to the main object is still guarded by the mutex)
R6. Release the mutex (resume access to the main object)
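In very rough C, the main object I have in mind would be something like this (pure sketch, nothing is written yet, names are provisional):

typedef struct _SLOT {
    // per-worker state lives here
    BOOLEAN InUse;
} SLOT;

typedef struct _MAIN_OBJECT {
    FAST_MUTEX   SlotMutex;      // W1/R1: serializes slot allocation and the reset
    EX_SPIN_LOCK SlotsLock;      // W3/R2: shared while a slot is worked on, exclusive for analyze_and_reset
    ULONG        NextFreeSlot;   // index of the next slot to hand out
    ULONG        SlotCount;      // number of slots currently allocated
    SLOT        *Slots;          // expanding array of slots
} MAIN_OBJECT;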

The IRQL problem arises from the fact that W3 and W4 are “reversed”. However, that is required in order to prevent analyze_and_reset from discarding a slot which is being operated on by a worker.
Hope that makes the picture clearer.
Thanks.

The set is discrete because a program is made of sequential instructions and a set is made of elements inserted in a sequential way. So a set is a list and your main object is just a list.

> A number of concurrent threads each obtain access to a dedicated, unique slot allocated by the main object.
> Since it’s uniquely assigned, each thread can and shall treat its assigned slot as its exclusive[*] property.

The main object does not allocate anything. This is done by the program or driver.

Each thread needs shared access (read-only) to the list to grab its dedicated slot. If that slot does not exist then that thread must create and insert the slot and therefore that thread needs exclusive access (write) to the list.

> The main object is analyzed and finally it is reset to the original empty state (i.e. all slots are discarded).

Removing all the elements of a list or just adding one element makes no difference, as you need exclusive access to the list in both cases.

I think you need an ERESOURCE, unless you need to acquire the lock at DISPATCH_LEVEL, in which case you need a shared spin lock. At device IRQL (DIRQL), you must synchronize with an ISR and any access is exclusive.
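Something along these lines (sketch only, not compiled; ListResource is an already-initialized ERESOURCE):

// Reader path: walk the list to find this thread's slot.
KeEnterCriticalRegion();                              // ERESOURCE requires normal kernel APCs disabled
ExAcquireResourceSharedLite(&ListResource, TRUE);
// ... read-only traversal ...
ExReleaseResourceLite(&ListResource);
KeLeaveCriticalRegion();

// Writer path: the slot was missing, or everything is being discarded.
KeEnterCriticalRegion();
ExAcquireResourceExclusiveLite(&ListResource, TRUE);
// ... insert or remove list entries ...
ExReleaseResourceLite(&ListResource);
KeLeaveCriticalRegion();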

There is a much easier way to do what you describe that does not involve locking hierarchies or issues with IRQL.

For whatever is stored in each “slot”, maintain a list of free “slots”. When a thread needs to acquire a “slot”, dequeue the head of the list, and when a thread is finished with a “slot”, requeue the now-free “slot” (either at the head or the tail depending on whether you want LIFO or FIFO reuse; usually LIFO, to work well with LRU-based cache flush algorithms). When a thread detects that the list of free “slots” is empty, rather than a stop-the-world reallocation process, simply have it dynamically allocate a new “slot” (possibly with a hard limit where you fail the request) and, when it is finished, requeue it. Thus there is an automatic balance between workload demand and resource consumption.

The synchronization scopes become smaller, the hard-to-solve problems with lock hierarchies and IRQL disappear, and performance improves. What more can you ask?
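In code terms, something like this (sketch only, all names invented; SLOT is whatever lives in a slot, with a LIST_ENTRY member for linking):

LIST_ENTRY g_FreeSlots;       // head of the free-slot list (initialized with InitializeListHead)
KSPIN_LOCK g_FreeSlotsLock;   // guards the free-slot list (initialized with KeInitializeSpinLock)

SLOT *AcquireSlot(VOID)
{
    PLIST_ENTRY entry = ExInterlockedRemoveHeadList(&g_FreeSlots, &g_FreeSlotsLock);

    if (entry != NULL)
        return CONTAINING_RECORD(entry, SLOT, ListEntry);

    // Free list empty: grow on demand instead of a stop-the-world reset
    // (optionally enforce a hard limit here and fail the request).
    return (SLOT *)ExAllocatePoolWithTag(NonPagedPoolNx, sizeof(SLOT), 'tolS');
}

VOID ReleaseSlot(SLOT *Slot)
{
    // LIFO reuse: put the slot back at the head of the list.
    ExInterlockedInsertHeadList(&g_FreeSlots, &Slot->ListEntry, &g_FreeSlotsLock);
}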

Hi all,
in the end ERESOURCE did the trick.
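
Roughly what the two paths ended up looking like (simplified sketch from memory; SlotMutex is a FAST_MUTEX, SlotsResource an ERESOURCE, GetNextAvailableSlot is my own helper):

// Worker (W1-W6): the ERESOURCE outlives the fast mutex, which is fine
// because releasing an ERESOURCE does not touch the IRQL.
KeEnterCriticalRegion();                                // keep normal APCs disabled while the resource is held
ExAcquireFastMutex(&SlotMutex);                         // W1: PASSIVE_LEVEL -> APC_LEVEL
slot = GetNextAvailableSlot();                          // W2
ExAcquireResourceSharedLite(&SlotsResource, TRUE);      // W3: blocks analyze_and_reset
ExReleaseFastMutex(&SlotMutex);                         // W4: back to PASSIVE_LEVEL
// W5: operate on the slot
ExReleaseResourceLite(&SlotsResource);                  // W6
KeLeaveCriticalRegion();

// analyze_and_reset (R1-R6):
KeEnterCriticalRegion();
ExAcquireFastMutex(&SlotMutex);                         // R1: no new slots are handed out
ExAcquireResourceExclusiveLite(&SlotsResource, TRUE);   // R2: waits until all workers finish W5
// R3/R4: analyze and discard the slots
ExReleaseResourceLite(&SlotsResource);                  // R5
ExReleaseFastMutex(&SlotMutex);                         // R6
KeLeaveCriticalRegion();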

Thanks a lot everyone!