mutex in SMP

Hi all,
Is there any issue while using mutex in multiprocessor systems?
Any scenario will be helpful.

Thanks !!!

Yes. It should be used whenever it is necessary, and not used when it is
not.

You must not create potential deadlock situations.

You might also consider FAST_MUTEX in certain contexts depending on your
goals.

Since the purpose of a mutex is to prevent concurrent access, particularly
in multiprocessor systems, there is certainly “an issue” (or possibly two
dozen issues) but the question is so ill-formed as to be unanswerable.
joe

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@yahoo.com
Sent: Tuesday, December 30, 2008 10:16 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] mutex in SMP

Hi all,
Is there any issue while using mutex in multiprocessor systems?
Any scenario will be helpful.

Thanks !!!


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer


This message has been scanned for viruses and dangerous content by
MailScanner, and is believed to be clean.

What kind of issues do you suspect? There are no known bugs with mutexes. The one quirk is that they can be acquired recursively on the same thread.

d

Sent from my phone with no t9, all spilling mistakes are not intentional.
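
To make the recursive-acquisition quirk concrete, here is a toy model in plain C of the bookkeeping a recursive mutex has to do. It is only a sketch: the struct, the function names and the integer thread ids are made up for illustration, and this is not the layout or the code of the kernel mutex. What it tries to show is that the owning thread can acquire again without blocking, and that the mutex only becomes free after a matching number of releases.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a recursive mutex's bookkeeping (hypothetical, not the kernel's). */
struct recursive_mutex_model {
    int owner;      /* hypothetical thread id; 0 means unowned */
    unsigned depth; /* how many times the owner has acquired it */
};

/* Returns true if `thread` now holds the mutex; false means it would have to wait. */
static bool model_try_acquire(struct recursive_mutex_model *m, int thread)
{
    assert(thread != 0);
    if (m->depth == 0) {
        m->owner = thread;
        m->depth = 1;
        return true;
    }
    if (m->owner == thread) {
        m->depth++;          /* recursive acquisition by the owner succeeds */
        return true;
    }
    return false;            /* held by another thread */
}

/* Each acquisition needs a matching release; the mutex is free only at depth 0. */
static bool model_release(struct recursive_mutex_model *m, int thread)
{
    if (m->depth == 0 || m->owner != thread)
        return false;        /* releasing a mutex you do not own is an error */
    if (--m->depth == 0)
        m->owner = 0;
    return true;
}
```

In this model a thread that acquires twice and releases once still owns the mutex, which is the situation that tends to surprise people.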


Why can't we use only mutexes on SMP? Why do many books suggest that a spin lock should be used instead of a mutex on SMP?

Is there any possibility of deadlock when using one?

Mutexes only work at IRQL < DISPATCH_LEVEL and are fairly heavyweight calls.
There is also the problem of fast mutexes raising IRQL to APC_LEVEL, which can
impact things. Spin locks work at DISPATCH_LEVEL (or higher with interrupt
spin locks) and are very lightweight (assuming that you do not have heavy
contention).

One of the most powerful things about Windows is the wide variety of
synchronization methods the system provides. A wise driver developer takes
advantage of the many capabilities the system gives you in this area.


Don Burn (MVP, Windows DDK)
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr

> Is there any issue while using mutex in multiprocessor systems?

If you mean some mutex-related “issue” that is unique to an MP system and not present on an SP one, the only one that comes to mind is the performance one. Since all dispatcher objects on an MP system are protected by a single dispatcher spinlock, frequently acquiring and releasing a mutex on an MP system may have some impact on overall system performance: while a given CPU does some dispatcher-related operation, all other CPUs that attempt dispatcher-related operations (like a context switch or a mutex acquisition) have to spin on the dispatcher spinlock.

This is the only answer to your question that may make some sense - otherwise, your question seems to be pretty meaningless in itself …

Anton Bassov

Oh, that’s a very different question. You cannot use a mutex at elevated
IRQL levels; in particular, you cannot use it at DISPATCH_LEVEL, and
therefore you cannot use it to synchronize access to an object that is used
both at PASSIVE_LEVEL and DISPATCH_LEVEL (for example, to protect a queue).
It has NOTHING to do with multiprocessor systems, and EVERYTHING to do with
the notion of thread suspension, which is not permitted at DISPATCH_LEVEL or
above.

It has nothing to do with deadlock. If you have a problem with deadlock, a
mutex vs. a fast mutex vs. a spin lock is probably not going to make much
difference.

It is suggested that spin locks be used because only spin locks (preferably
queued spin locks) can synchronize DISPATCH_LEVEL with PASSIVE_LEVEL, or
with another DISPATCH_LEVEL.

Using the principle “Lock the smallest amount of data for the shortest
possible time”, look at the cost of doing a spin lock (incredibly low if
there is a conflict) vs. the cost of a mutex (incredibly high if there is a
conflict) and realize that the average spin time is far lower than the cost
of calling KeWaitForSingleObject even if you were going to get acquisition,
let alone what happens if the thread is descheduled. Do you really want to
invoke the scheduler to deal with an operation that might take 10ns of
locked interval to complete? Doesn’t make sense.

So it is suggested that spin locks be used because (a) mutexes won’t work
and (b) they don’t even make sense most of the time.

Note that you can also synchronize an ISR with other levels using
KeSynchronizeExecution, but that’s a much deeper topic to go into.
joe
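
As a rough illustration of the "few pointer operations under a spin lock" case described above, here is a user-mode sketch using C11 atomics. Every name in it (struct locked_queue, spin_acquire and so on) is hypothetical, and it is a plain test-and-set lock, not the Windows spin lock or queued spin lock implementation. In particular it does not raise IRQL, which only the kernel routines do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; a user-mode model, not the kernel's spin lock. */
struct node { struct node *next; };

struct locked_queue {
    atomic_flag lock;     /* clear = free, set = held */
    struct node *head;
    struct node *tail;
};

static void queue_init(struct locked_queue *q)
{
    atomic_flag_clear(&q->lock);
    q->head = q->tail = NULL;
}

/* One atomic test-and-set; returns true if the caller now owns the lock. */
static bool spin_try_acquire(atomic_flag *lock)
{
    return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

static void spin_acquire(atomic_flag *lock)
{
    while (!spin_try_acquire(lock)) {
        /* busy-wait: the thread is never suspended, it just retries */
    }
}

static void spin_release(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* The locked region is only a handful of pointer writes. */
static void queue_push(struct locked_queue *q, struct node *n)
{
    assert(n != NULL);
    n->next = NULL;
    spin_acquire(&q->lock);
    if (q->tail != NULL)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    spin_release(&q->lock);
}

/* Removes and returns the oldest node, or NULL if the queue is empty. */
static struct node *queue_pop(struct locked_queue *q)
{
    spin_acquire(&q->lock);
    struct node *n = q->head;
    if (n != NULL) {
        q->head = n->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    spin_release(&q->lock);
    return n;
}
```

The interval between spin_acquire and spin_release in queue_push and queue_pop is three or four pointer writes, which is the kind of locked region the cost argument is about. In a driver you would use the kernel's own spin lock routines rather than anything like this.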


“Joseph M. Newcomer” wrote in message
news:xxxxx@ntdev…
> Note that you can also synchronize an ISR with other levels using
> KeSynchronizeExecution, but that’s a much deeper topic to go into.

Actually, for XP and later OSes you can use
KeAcquireInterruptSpinLock/KeReleaseInterruptSpinLock, so
KeSynchronizeExecution is mainly used when you need backwards
compatibility.


Don Burn (MVP, Windows DDK)
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr

> So it is suggested that spin locks be used because (a) mutexes won’t work and
> (b) they don’t even make sense most of the time.

Well, the statement (b) is at least questionable…

If we ignore possible IRQL constraints, everything depends on how long you plan to occupy a critical section - spinlocks may be VERY expensive if used improperly…

Anton Bassov

If you follow the requirement of locking the smallest amount of data for the
shortest possible time, you find that a lot of situations involve doing
things like manipulating a few pointers (e.g., queue operations) then the
overhead of a mutex is a killer. Most of the time, that’s all you need. So
you look at the likelihood of L2 cache hits, and realize that most queue
operations take < 10ns. There’s no way that a mutex is going to improve
that performance.

So I say “most of the time” and you say “but in some cases spin locks have
high cost”. Why do you need to appear to disagree when you just restate
what I stated in the first place? In case you hadn’t noticed, this is all
about ENGINEERING, which means that some intelligence is required on the
part of the programmer. A programmer who uses spinlocks inappropriately
will produce a bad product. But I’ve seen people say that spin locks are
bad simply because they are spin locks, without any attempt to evaluate the
engineering aspects of the task; some professor told them, in their
sophomore year, that Mutexes Are Good and Spin Locks Are Bad, and they never
outgrew it. To counteract this, I give them the observation that Mutexes
Are Almost Always A Bad Choice, and leave it up to their engineering skills
to then understand what is going on. I give explanations, in detail, about
the relative merits of spin lock Classic, queued spin locks, fast mutexes
and mutexes, so they see the entire picture.

But when someone asks a question so mind-bogglingly naïve about there being
“issues” about the use of mutexes in multiprocessor systems, it suggests
that this person doesn’t actually understand any of the issues. Short of a
30-minute lecture, there’s no good way to condense a complex explanation
into a simple set of criteria other than saying that mutexes do not make
sense most of the time. This implies that *some* of the time, they do make
sense, and you had to chime in and say, in effect “some of the time, they do
make sense”, so what did you add here?
joe


Much of what you’re trying to learn is summarized in the papers in the
following link:

http://www.microsoft.com/whdc/driver/kernel/default.mspx

Read the “Kernel Mode Basics” section.


Jake Oshins
Hyper-V I/O Architect
Windows Kernel Team

This post implies no warranties and confers no rights.



> Is there any issue while using mutex in multiprocessor systems?

No issues. Works very well.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

> the only one that comes to mind is the performance one. Since all dispatcher objects on an MP system
> are protected by a single dispatcher spinlock

Win7 got rid of this (and of the single MmPfnLock as well).

I don’t know the details - possibly they re-implemented some parts of Ke and Mm lock-free, or split these locks into smaller ones.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

> Most of the time, that’s all you need. So you look at the likelihood of L2 cache hits, and
> realize that most queue operations take < 10ns.

I just wonder how cache hits may be related to this discussion…

Look - a spinlock operation by its very definition always involves bus locking (i.e. it involves a locked test-and-set, a locked XADD, etc. - whatever instruction you use in your particular spinlock’s implementation, it has to be prefixed with LOCK), and when the #LOCK signal gets issued on the bus, the cache automatically gets invalidated. Therefore, don’t expect to hit the cache right after having acquired a spinlock…

> So I say “most of the time” and you say “but in some cases spin locks have high cost”.
> Why do you need to appear to disagree when you just restate what I stated in the first place?

Simply because you put it in a way that may easily be misunderstood by a newbie like the OP - he is more than likely to come to the conclusion that a kernel mutex is an almost useless thing, and to start using spinlocks everywhere, including for operations that may take quite a while to complete. For example, if you want to change the state of a global variable before and after an operation and ensure that no one sneaks in meanwhile (which is a pretty common scenario, don’t you think), you are more than likely to be better off either with a mutex (if the operation is not that frequent), or even with a dedicated thread, so that the whole thing synchronizes itself, if operations are frequent…

I would suggest the following logic. In order to meet MSFT certification requirements you should not spend more than 100 microseconds in a DPC routine. Let’s say all the stuff is protected by a spinlock. This leads us to the immediate conclusion that the ABSOLUTE MAXIMUM for any spinlock-protected operation is 100 microseconds (in practical terms, I would not allow more than 25% of it) - if your operation may take longer than that, protecting it with a spinlock becomes unreasonable…

Anton Bassov

I was specifically referring to the L2 caching the pointers to the list
objects.

According to The Unabridged Pentium 4 (Mindshare series) pp 1180-1188, a
lock prefix does NOT invalidate the cache; it merely invalidates the cache
*line* that contains the memory target of the instruction. So the rest of
the cache, including very likely the pointers to the list, remains intact.

Note that spin locks are already specified as having a limited time budget:
10us. Anything longer requires a mutex and therefore requires passive
level.
joe
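
A small sketch of the cache-line point, assuming 64-byte lines (an assumption for illustration; the real size depends on the processor). The helper and the struct are hypothetical names. Going by the description above, only data that shares a line with the lock word is affected by a locked operation on it, and alignment can give the lock word a line of its own.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE_BYTES 64u

static_assert((CACHE_LINE_BYTES & (CACHE_LINE_BYTES - 1)) == 0,
              "line size must be a power of two");

/* True when both addresses fall in the same cache line. */
static bool same_cache_line(const void *a, const void *b)
{
    return ((uintptr_t)a / CACHE_LINE_BYTES) == ((uintptr_t)b / CACHE_LINE_BYTES);
}

/* Hypothetical layout: the lock word gets a line to itself, so a locked
   operation on it does not touch the line holding the list pointers. */
struct guarded_queue {
    alignas(CACHE_LINE_BYTES) long lock;
    alignas(CACHE_LINE_BYTES) void *head;
    void *tail;
};
```

With this layout, same_cache_line reports that the lock and the head pointer sit in different lines, while head and tail share one, so the list pointers can stay cached while the lock word is being hammered.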


> According to The Unabridged Pentium 4 (Mindshare series) pp 1180-1188, a lock prefix
> does NOT invalidate the cache; it merely invalidates the cache *line* that contains the memory
> target of the instruction.

Correct. Now let’s look at it from the practical standpoint. IIRC, a cache line is 64K, i.e. 16 pages, which is more than likely to cover all of your driver’s data section. Therefore, if the target data and the spinlock that protects it are in your data section, by accessing the spinlock you will invalidate the cache line with all your data…

Anton Bassov

A small correction: cache line sizes are usually smaller (about 512 bytes,
off the top of my head)!

Also, pointer manipulation on a list (created purely out of dynamic
allocations) is a boon for cache misses.

Well, as if I’ve to sign something !
Prokash Sinha
http://prokash.squarespace.com
Success has many fathers, but failure is an orphan.


Sorry, they did a big increase with the P4, from a 32-byte to a 64-byte cache
line size. Your numbers are way off. On a related note, why people use the
Mindshare books when the Intel manuals are online has always confused me.
Twice in the last 10 years I have seen serious derailment of projects thanks
to Mindshare: once a software problem repeated throughout some code due to
flaws in their Pentium manual, and the really serious one, a PCI board
that messed up any system it was inserted in, since their book on PCI has
some serious errors.


Don Burn (MVP, Windows DDK)
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr
Remove StopSpam to reply


Unless you’re not talking about the per-CPU cache, cache lines are usually 32 (or 64 on newer processors) bytes long.

S


> Your numbers are way off.

Duh!!! I got suspicious almost immediately after typing my post, then double-checked with the Intel manual… and realized that I was off by a factor of 1024(!!!)…

In other words, I put my foot in my mouth yet again - nothing particularly new here…

Anton Bassov
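
For anyone checking the numbers in this exchange, the arithmetic can be written down as a few lines of C. The constants are the figures quoted in the posts (a claimed 64K line, the corrected 64-byte line) plus a 4096-byte page, which is the usual x86 small page size. The names are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum {
    PAGE_BYTES         = 4096,      /* x86 small page */
    CLAIMED_LINE_BYTES = 64 * 1024, /* the "64K" figure from the earlier post */
    ACTUAL_LINE_BYTES  = 64         /* P4 cache line size, per the correction */
};

/* How many cache lines of `line_bytes` are needed to cover `region_bytes`. */
static size_t lines_covering(size_t region_bytes, size_t line_bytes)
{
    assert(line_bytes != 0);
    return (region_bytes + line_bytes - 1) / line_bytes;
}
```

CLAIMED_LINE_BYTES / PAGE_BYTES is 16 (the "16 pages") and CLAIMED_LINE_BYTES / ACTUAL_LINE_BYTES is 1024 (the "1024 times off"). With 64-byte lines, lines_covering(PAGE_BYTES, ACTUAL_LINE_BYTES) is 64, so a single page of driver data already spans 64 separate lines, and a locked operation on the spin lock touches only the one it lives in.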