Re: Problem using queued spinlocks on single cpu computer

Pure insanity!

Jamey Kirby, Windows DDK MVP
StorageCraft Inc.
xxxxx@storagecraft.com
http://www.storagecraft.com

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Moreira, Alberto
Sent: Thursday, September 25, 2003 11:55 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] Re: Problem using queued spinlocks on single cpu computer

It all depends on what you want to do with the bits and with the uchars as
you define them. If the functionality is complicated enough, yes, it may
make sense to encapsulate things that way. It all depends on what you want
to accomplish.

Furthermore, we have bit functionality in C, we have “unsigned char”, and so
on - why do we need BIT, UCHAR ? Waste of time, no ? Or maybe there’s some
additional effects people want to achieve that can’t be done the old way ?

And you know, who needs structs if you can make do with arrays ? Long live
Fortran II. But hey, at some point we might want to retire that Model T.

Alberto.

-----Original Message-----
From: Jamey Kirby [mailto:xxxxx@storagecraft.com]
Sent: Thursday, September 25, 2003 1:44 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] Re: Problem using queued spinlocks on single cpu computer

Oh, I got it… Let’s encapsulate the actual bits… That should make for
some very manageable and robust driver code; NOT!

class BIT
{
};

class UCHAR : BIT
{
};

class WCHAR : UCHAR
{
};

Oooo… Ahhhh… Things are becoming more clear and more robust; don’t you
agree? Heck, my grandmother could maintain this code.

Jamey Kirby, Windows DDK MVP
StorageCraft Inc.
xxxxx@storagecraft.com
http://www.storagecraft.com

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Moreira, Alberto
Sent: Thursday, September 25, 2003 10:17 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] Re: Problem using queued spinlocks on single cpu computer

In object oriented design, resources are no longer pure data, they’re now
objects ! The use of an object is internal to that object, only interfaces
are published. Otherwise you must give up having a healthy encapsulation,
just look at that “friend” statement you need in the class. Another problem
I see with this design, as I told Max, is that the queue handle is now
attached to the Acquirer and not to the Lock: if two different pieces of
code try to acquire the same spinlock, you will end up with two queues,
which defeats the purpose of using a queued lock.

Alberto.

-----Original Message-----
From: Chuck Batson [mailto:xxxxx@cbatson.com]
Sent: Thursday, September 25, 2003 12:46 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] Re: Problem using queued spinlocks on single cpu computer

> That’s an interesting statement, Chuck, what advantages do you see in
> separating these two?

The conceptual separation of a resource from its use. We can use the
current discussion as the basis for an example:

class CSpinLockAcquire;

// The resource: owns only the KSPIN_LOCK, which must live in shared memory.
class CSpinLock
{
public:
    CSpinLock() { KeInitializeSpinLock(&SpinLock); }
private:
    // Only CSpinLockAcquire may lock and unlock.
    void Lock(KLOCK_QUEUE_HANDLE & h) {
        KeAcquireInStackQueuedSpinLock(&SpinLock, &h); }
    // The release routine takes only the queue handle.
    void Unlock(KLOCK_QUEUE_HANDLE & h) {
        KeReleaseInStackQueuedSpinLock(&h); }
    KSPIN_LOCK SpinLock;
    friend class CSpinLockAcquire;
};

// One acquisition: holds the per-acquirer queue handle.
class CSpinLockAcquire
{
public:
    CSpinLockAcquire(CSpinLock & s) : SpinLock(s) {
        SpinLock.Lock(Handle); }
    ~CSpinLockAcquire() { SpinLock.Unlock(Handle); }
private:
    CSpinLock & SpinLock;
    KLOCK_QUEUE_HANDLE Handle;
};

This conceptual separation is useful any time you have a resource which
must be acquired/released, locked/unlocked, etc. There are a few
concrete benefits:

  1. Exception safety. Since locking and unlocking is handled by the
    constructor/destructor of CSpinLockAcquire, you can never have a
    situation of a lock not being matched with a corresponding unlock, even
    in the presence of exceptions.

  2. Human error and maintainability. For the same reason, it’s
    impossible for the programmer to “forget” to unlock a resource.
    Granted, if you forget to release a spin lock, you’re likely to know
    pretty quickly, but it still wastes time and effort diagnosing the
    problem. The same general principle applies to other types of resources
    (e.g. not necessarily spin locks) where usage may be more involved
    and/or subtle and hence more prone to difficult-to-diagnose “forgetting
    to unlock” errors.

  3. Acquisition/release data is separate from resource data. This is
    good for two reasons:

a. Memory storage for the resource object itself may be precious.
Consider the “single class” implementation where the spin lock and its
queue handle are stored together. The spin lock has to go into shared
memory which is accessible to all those who might potentially acquire
it. There’s no reason for the queue handle to be shared, since it’s
only used by one acquirer at a time. So separating them is beneficial
when use of shared memory has performance implications.

b. Resource acquisition data sometimes must be separate from the
resource, especially when it’s possible for more than one entity to
acquire a given resource at the same time. This is less useful in this
case, when only one processor can acquire a spinlock at any given time.
(Though it’s still a bit aesthetically displeasing to me, when you think
about two different processors calling KeAcquireInStackQueuedSpinLock()
with a pointer to the same queue handle; “luckily” only one of them will
succeed in acquiring the spin lock, so there will never be more than one
user of the queue handle… but still makes me shudder.) But there are
situations in which this is necessary. For example, a resource which
has both read and write acquires, where multiple readers are allowed but
only a single writer (with no readers) is allowed. Since you can have
multiple readers, obviously it won’t do to put the acquisition handle
(or whatever relevant acquisition-related information) in the resource
object itself.
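
As a rough user-mode illustration of point 3b (not kernel code, and not part of the classes above), here is a minimal sketch using standard C++17. It assumes std::shared_mutex as the multiple-reader/single-writer resource; the class name SharedCounter and its members are hypothetical. The point it tries to show is that each acquisition is a separate local guard object, so nothing about a particular acquirer is stored in the resource itself.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Hypothetical resource: the lock lives next to the data it protects.
class SharedCounter {
public:
    // Reader: the std::shared_lock is this caller's private acquisition
    // state. Several readers may each hold their own at the same time.
    int read_value() const {
        std::shared_lock<std::shared_mutex> guard(lock_);
        return value_;
    }

    // Writer: exclusive acquisition, again held in a local guard object
    // that releases the lock when it goes out of scope.
    void increment() {
        std::unique_lock<std::shared_mutex> guard(lock_);
        ++value_;
    }

private:
    mutable std::shared_mutex lock_;  // shared by all acquirers
    int value_ = 0;                   // the protected data
};
```

Because every reader constructs its own guard, there is no single "acquisition handle" slot in SharedCounter that concurrent readers would have to fight over.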

Chuck


Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256

You are currently subscribed to ntdev as: xxxxx@compuware.com
To unsubscribe send a blank email to xxxxx@lists.osr.com

The contents of this e-mail are intended for the named addressee only. It
contains information that may be confidential. Unless you are the named
addressee or an authorized designee, you may not copy or use it, or disclose
it to anyone else. If you received it in error please notify us immediately
and then destroy it.



The problem is that you have added hidden complexities. Isn’t the ideal
objective in this field to simplify? We are talking device drivers here, not
the application used to process the national income taxes or the
applications used for military simulations.

I was given a device driver to fix once. It was a SCSI HBA miniport. The
code I was given had no DMA support, no DISCONNECT support and the binary
driver image was around 65K.

I spent three days, read the 400 page HBA specification and wrote a driver
that supported 32 bit DMA and DISCONNECT (among other chipset features) and
the binary was 4K. The actual code section was 1.2K

My point is that you must stop thinking in terms of complex systems. If you
simplify (and this is an art too), the upper level complexities will work
themselves out.

Jamey Kirby, Windows DDK MVP
StorageCraft Inc.
xxxxx@storagecraft.com
http://www.storagecraft.com


> real compiler that allows us to write drivers, and with automatic garbage
> collection you don’t want to mix object destruction with anything else. So,

Oh yes, garbage collector is great for drivers. I just imagined garbage
collecting the NT’s IRPs :-)

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

Alberto,

Here’s some C code (for illustrative purposes only):

static KSPIN_LOCK MyGlobalSpinLock;

void Thread1(void)
{
    KLOCK_QUEUE_HANDLE qh;

    // do some non-critical stuff

    // grab the spin lock (assume KeInitializeSpinLock() has already
    // been called)
    KeAcquireInStackQueuedSpinLock(&MyGlobalSpinLock, &qh);

    // do some critical stuff

    // release the spin lock (the release routine takes only the handle)
    KeReleaseInStackQueuedSpinLock(&qh);

    // do more non-critical stuff
}

void Thread2(void)
{
    KLOCK_QUEUE_HANDLE qh;

    // do some non-critical stuff

    // grab the spin lock (assume KeInitializeSpinLock() has already
    // been called)
    KeAcquireInStackQueuedSpinLock(&MyGlobalSpinLock, &qh);

    // do some critical stuff

    // release the spin lock (the release routine takes only the handle)
    KeReleaseInStackQueuedSpinLock(&qh);

    // do more non-critical stuff
}

Now again in C++, using the CSpinLock and CSpinLockAcquire classes I
posted earlier:

static CSpinLock MyGlobalSpinLock;

void Thread1(void)
{
    // do some non-critical stuff

    {
        // grab the spin lock
        CSpinLockAcquire acq(MyGlobalSpinLock);
        // do some critical stuff
        // release the spin lock (done in ~CSpinLockAcquire())
    }

    // do more non-critical stuff
}

void Thread2(void)
{
    // do some non-critical stuff

    {
        // grab the spin lock
        CSpinLockAcquire acq(MyGlobalSpinLock);
        // do some critical stuff
        // release the spin lock (done in ~CSpinLockAcquire())
    }

    // do more non-critical stuff
}

The point is NOT to compare or weigh the merits/pitfalls of C or C++,
but to point out that:

  1. If your C++ compiler is decent, the code generated from the C++
    example ends up being the same as that generated from the C example.
    This C++ design certainly has the potential to generate exactly the
    same code, and if your compiler doesn’t, it should. So I don’t
    understand what you mean by “taxing the spinlock logic”. Your use of
    the phrase “create and destroy an object every time you use the
    Spinlock” sounds as if there’s some heavy-duty, performance-sapping,
    mysterious hidden C++ machinery working behind your back, when in fact
    nothing more or less is being done than in the C example. The only
    difference is that with the C++ dual-class design you get exception
    safety and protection against programmer error.

  2. You only need one CSpinLock for any given resource, same as you need
    only one KSPIN_LOCK for any given resource. You can reuse them (both)
    over and over like a file handle. The transient data is the
    CSpinLockAcquire or KLOCK_QUEUE_HANDLE, which is needed on a
    per-acquisition basis. I haven’t had opportunity to check what
    sizeof(KLOCK_QUEUE_HANDLE) is but I imagine it’s small, on the order of
    4 or 8 bytes. And there’s no initialization that needs to be done for a
    KLOCK_QUEUE_HANDLE, so there’s no performance penalty for having a new
    KLOCK_QUEUE_HANDLE for every acquisition. There IS, however, a
    performance penalty with bundling the KSPIN_LOCK and KLOCK_QUEUE_HANDLE
    together IF memory for shared objects is a precious, limited resource.

  3. Regardless of your personal preference about where to stick the
    KLOCK_QUEUE_HANDLE object (i.e., whether it should go in the CSpinLock
    class or the CSpinLockAcquire helper class), you should still use a
    second class for acquisition/release to enjoy the benefits of exception
    safety and preventing programmer human error.

  4. In another post, you seemed to object to the idea of “being dictated
    when to release”, using the example of not being able to acquire a lock
    in one thread, pass the lock to another thread, and terminate the
    original thread. This is, of course, indeed possible in C++ using the
    above class design. Again, there is in fact no conceptual difference
    between using C and C++ in this case – NOT (for the flame war paranoid,
    I repeat NOT) to compare C and C++, but to illustrate that there’s
    nothing “mysterious” going on behind the scenes in C++. If you want to
    pass a CSpinLockAcquire object to another thread and terminate, allocate
    it on the heap using new, and pass the pointer to the other thread.
    When the other thread is done with it, delete the object – at which
    point the spin lock will be released by the ~CSpinLockAcquire()
    destructor. Better yet, use std::auto_ptr and get the exact same thing
    with exception safety and prevention of programmer human error. C++
    does not take control away from you, the programmer, as you seem to
    suggest.
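
To make points 1 and 2 concrete without needing a kernel build, here is a minimal user-mode sketch. It is NOT the Windows implementation: it assumes the textbook MCS-style queued lock, and QueueNode, QueuedLock, QueuedLockGuard and released_after_throw are hypothetical names standing in for KLOCK_QUEUE_HANDLE, KSPIN_LOCK and CSpinLockAcquire. It uses default (sequentially consistent) atomics to keep the sketch simple. What it tries to show is that every acquisition brings its own node, yet all nodes link into the one queue hanging off the shared lock, and that the guard releases the lock even when the critical section throws.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for KLOCK_QUEUE_HANDLE: one node per acquisition.
struct QueueNode {
    std::atomic<QueueNode*> next{nullptr};
    std::atomic<bool> waiting{false};
};

// Hypothetical stand-in for KSPIN_LOCK: a single tail pointer shared by all.
class QueuedLock {
public:
    void acquire(QueueNode& me) {
        me.next.store(nullptr);
        me.waiting.store(true);
        // Append our node to the one queue that belongs to this lock.
        QueueNode* prev = tail_.exchange(&me);
        if (prev != nullptr) {
            prev->next.store(&me);
            while (me.waiting.load()) { /* spin on our own node only */ }
        }
    }

    void release(QueueNode& me) {
        QueueNode* succ = me.next.load();
        if (succ == nullptr) {
            QueueNode* expected = &me;
            // Nobody queued behind us: mark the lock free.
            if (tail_.compare_exchange_strong(expected, nullptr)) return;
            // A successor is mid-enqueue; wait until it links itself in.
            while ((succ = me.next.load()) == nullptr) { /* spin */ }
        }
        succ->waiting.store(false);  // hand the lock to the next waiter
    }

    bool is_free() const { return tail_.load() == nullptr; }

private:
    std::atomic<QueueNode*> tail_{nullptr};
};

// RAII acquirer: the node is transient, the lock is reused over and over.
class QueuedLockGuard {
public:
    explicit QueuedLockGuard(QueuedLock& l) : lock_(l) { lock_.acquire(node_); }
    ~QueuedLockGuard() { lock_.release(node_); }
    QueuedLockGuard(const QueuedLockGuard&) = delete;
    QueuedLockGuard& operator=(const QueuedLockGuard&) = delete;

private:
    QueuedLock& lock_;
    QueueNode node_;
};

// Returns true if the lock was released although the critical section threw.
bool released_after_throw(QueuedLock& lock) {
    try {
        QueuedLockGuard guard(lock);
        throw std::runtime_error("failure inside the critical section");
    } catch (const std::runtime_error&) {
        // The guard's destructor has already run by the time we get here.
    }
    return lock.is_free();
}
```

In this sketch the per-acquisition node needs no separate setup call by the user, and two acquirers with two different nodes still end up in the same queue, because the queue is anchored at the lock's tail pointer rather than in any one node.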

Chuck

P.S. To the general public – NOTHING in this post should be construed
as flame war material. NOTHING. If you scream “flame war” you’re
missing the point entirely. We are currently discussing particular C++
designs within the scope of the C++ language. Some C code was
introduced for comparison of generated code ONLY. If your particular
preference is to avoid C++, please feel free to avoid this post, this
thread, and screams of “flame war.” Thank you.

----- Original Message -----
From: “Moreira, Alberto”
To: “Windows System Software Devs Interest List”
Sent: Friday, September 26, 2003 4:51 AM
Subject: [ntdev] Re: Problem using queued spinlocks on single cpu computer

> You can grab the LOCK_QUEUE_HANDLE structure inside the Acquire, store it in
> a member variable in the spinlock. Come release time, delete the structure.
> All invisible to all but the spinlock object, and no assumptions about the
> caller’s structures. But you see, this whole design already taxes the
> spinlock logic beyond what it should be, because now you have to create and
> destroy an object every time you use the Spinlock, be it the Acquisitor or
> the Queue Handle. I would have expected the facility to have been designed
> so that I could grab whatever data structures I needed at object creation
> time, and just reuse it over and over, like a file handle.
>
> Alberto.
>
>
> -----Original Message-----
> From: Maxim S. Shatskih [mailto:xxxxx@storagecraft.com]
> Sent: Thursday, September 25, 2003 4:09 PM
> To: Windows System Software Devs Interest List
> Subject: [ntdev] Re: Problem using queued spinlocks on single cpu computer
>
>
> > Must be a question of personal preference, because I don’t see why
> >
> > CSpinLockAcquisitor(&SomeGlobalLock);
> >
> > is better than
> >
> > SomeGlobalLock.Acquire( );
>
> Because, for queued spinlock, you will need a LOCK_QUEUE_HANDLE.
>
> Maxim Shatskih, Windows DDK MVP
> StorageCraft Corporation
> xxxxx@storagecraft.com
> http://www.storagecraft.com