Actually, deadlocks are a well-known CS topic. There are (in essence)
TWO ways of handling deadlocks: detection or avoidance.
In a deadlock detection system, the ability to detect a deadlock is an
inherent part of the system. When a deadlock is detected a corrective
action is taken to break the deadlock. For example, in a transactional
system, one transaction is forced to abort and this allows the other to
continue. Such systems are really essential when we have large, complex
interaction schemes where there is no agreement upon cooperation. This
is a very common solution in the database world, for instance, where
random new applications can be written to access various records of
various tables and lock them.
In a deadlock avoidance system, the order of lock acquisition is modeled
in a hierarchy (or more formally a directed acyclic graph). Since there
is a canonical order for lock acquisition, there is no possibility of
deadlock (assuming correct implementation).
In my experience, OS developers tend to prefer the latter implementation
(deadlock avoidance) because the OS seldom has an embedded transactional
system to ease the pain of handling runtime abort (in other words, our
“runtime abort” is a reboot). However, there are some interesting new
transactional technologies appearing within the Windows OS so perhaps
this will change in the future.
But, for the time being, driver writers (device or file system) will
stick with the deadlock avoidance system.
Of course what is great fun is when you combine this with a reentrant
OS!
Regards,
Tony
Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com
Looking forward to seeing you at the Next OSR File Systems Class October
18, 2004 in Silicon Valley!
-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Mats PETERSSON
Sent: Friday, October 29, 2004 9:45 AM
To: ntdev redirect
Subject: Re:[ntdev] Locks and Deadlocks
Hi Rod,
You haven’t got anything wrong as such. You just have to make sure that
locks are done in a structured, hierarchical, consistant order.
So, in this particular case, you always need to lock A first, then B if
you
need both. Of course, if you ONLY need A, or ONLY need B, you can lock
only
one of them.
The problem occurs when you have two places that lock both A and B, and
one
of them locks A then B, the other locks B then A. If you have the right
amount of (un)luck, this will deadlock, because Thread 1 locks A, Thread
2
locks B and then Thread 1 tries to lock B and waits forever, whilst
Thread
2 is waiting for A to be released.
There are of course other solutions:
- Not having infiinite wait for the lock to be available and if not
successful in the first X microseconds, release any current lock and
start
over.
- Use only one lock for all A + B data structures.
But any solution which locks two (or more) different locks in different
order is very likely to cause deadlocks if it’s not guaranteed that at
all
times, no other thread will EVER take hold of more than one lock at
any
given time.
The BIG problem with this, is that you may well be running the software
in
the lab for DAYS or WEEKS, and never find a problem. Only once you ship
it,
and some customer with exactly the right setup to cause a deadlock will
find the “perfect” timing to cause a deadlock every so often. And guess
what: This is almost certain to be an important customer who doesn’t
like
to see his machine(s) lock up every now and then…
And by the way, as someone pointed out, you need to make sure that you
understand how the OS deals with things too. We’ve had this happen to
the
displaydriver… Windows is supposed to only allow one thread at any
given
time to access the display driver (except when the PDEV is disabled,
which
is a side point). Unfortunately, this is broken. So the simple approach
is
to just add a lock on the entry into the driver. NBD. Except that if we
call back into Windows (EngXXXX calls), we have to let go of the lock,
because Windows may well call back into our driver, which means that
we’re
currently holding the lock and trying to acquire it at the same time…
Deadlock
–
Mats
xxxxx@lists.osr.com wrote on 10/29/2004 02:21:23 PM:
OK, I’ll bite, I never could avoid a good rat-hole and this is a
mink-hole.
So I have some code in a single thread which goes:
while (1)
{
lock(a)
manipulate(A); // A is protected by A
lock(b);
manipulate(B);
manipulate_some_more(A); // A still locked for good reasons
unlock(B);
unlock(A);
// time passes
lock(b);
mung(B);
lock(a);
mung(A, B); // both locked
unlock(a);
unlock(b);
}
OK, so this may trigger some warnings but there’s just one thread and
all
is
OK.
But
- we are locking a and locking b. So, there are other threads of
execution
which can manipulate A and B
- there are also good reasons for holding both locks (say we put A
into
a
transitional state while we play with B)
This is still OK, but what happens if the other threads of execution
ALSO
need to hold both locks - say because they want to do the same
manipulation
as above. There is no safe order for them to acquire their locks (not
unless they are incredibly deeply involved in the logic of the thread
above).
OK, so what have I got wrong?
Questions? First check the Kernel Driver FAQ at http://www.
osronline.com/article.cfm?id=256
You are currently subscribed to ntdev as: xxxxx@3dlabs.com
To unsubscribe send a blank email to xxxxx@lists.osr.com
ForwardSourceID:NT00006606
Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256
You are currently subscribed to ntdev as: xxxxx@osr.com
To unsubscribe send a blank email to xxxxx@lists.osr.com