Problem using queued spinlocks on a single-CPU computer

We were using regular spinlocks until we found a problem on a multi-CPU
computer (the spinlocks were not granted on a first-come, first-served
basis). After reading the DDK we found the queued spinlocks. We tested them
and they fixed our multi-CPU bug, but running the same working code on a
single-CPU computer made the machine noticeably slower. The CPU usage is
always ~60% and the indicator is all red, which means kernel time. Can we
use queued spinlocks on single-CPU computers? Do I have to detect the number
of CPUs before choosing which spinlocks to use?

Thanks

“Daniel Simard” wrote in message
>
> always ~60% and the indicator is all red, which means kernel time. Can we
> use queued spinlocks on single-CPU computers?
>

Yes, of course.

>
> Do I have to detect the number
> of CPUs before choosing which spinlocks to use?
>

No.

Sounds to me like you have a bug. We're talking in-stack queued spin locks
here, right? Don't forget that you still have to initialize the spin lock
with KeInitializeSpinLock before its first use. Also, realize that a spin
lock is either an in-stack queued spin lock OR a "regular" executive spin
lock. In other words, you can't call KeAcquireSpinLock on it in one place and
KeAcquireInStackQueuedSpinLock on it in another place. Though the same
KSPIN_LOCK data structure is used, these are actually two entirely different
locking mechanisms.
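
For what it's worth, the usual in-stack pattern looks something like this
(just a sketch; the device extension and function names here are made up,
and the wdm.h/ntddk.h declarations are assumed):

// Somewhere at init time, exactly once, before the first acquire:
//
//     KeInitializeSpinLock(&devExt->StateLock);
//
// Every acquire/release site then keeps its KLOCK_QUEUE_HANDLE on its own
// stack, which is what makes the lock "in-stack" queued:
VOID
TouchSharedState(PMY_DEVICE_EXTENSION devExt)
{
    KLOCK_QUEUE_HANDLE lockHandle;

    KeAcquireInStackQueuedSpinLock(&devExt->StateLock, &lockHandle);

    // ... access the data that StateLock protects, at DISPATCH_LEVEL ...

    KeReleaseInStackQueuedSpinLock(&lockHandle);
}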

Peter
OSR

I never use regular and queued spinlocks at the same time (mixed together);
I use one type or the other for all my spinlocks. We use spinlocks in our
drivers through a generic object shared by all of them, so I can decide for
every driver which type of spinlock we use. I do, of course, initialize the
spinlocks with KeInitializeSpinLock before calling the acquire/release
functions. I found that my spinlock object (in queued mode) works fine and
fixes my problem on multi-CPU computers, but slows down single-CPU
computers. All my drivers are always tested under Driver Verifier with all
checks enabled.

Here is my new code:

CSpinLock::CSpinLock()
{
KeInitializeSpinLock(&m_oSpinLock);
}

void CSpinLock::Lock()
{
KLOCK_QUEUE_HANDLE* psLockQueue;

psLockQueue = new KLOCK_QUEUE_HANDLE;
KeAcquireInStackQueuedSpinLock(&m_oSpinLock, psLockQueue);
m_psLockQueue = psLockQueue;
}

void CSpinLock::Unlock()
{
KLOCK_QUEUE_HANDLE* psLockQueue;

psLockQueue = m_psLockQueue;
m_psLockQueue = 0;
KeReleaseInStackQueuedSpinLock(psLockQueue);
delete psLockQueue;
}

Here is my old code:

CSpinLock::CSpinLock()
{
KeInitializeSpinLock(&m_oSpinLock);
}

void CSpinLock::Lock()
{
KeAcquireSpinLock(&m_oSpinLock, &m_oOldIrql);
}

void CSpinLock::Unlock()
{
KeReleaseSpinLock(&m_oSpinLock, m_oOldIrql);
}

> void CSpinLock::Lock()

> {
> KLOCK_QUEUE_HANDLE* psLockQueue;
>
> psLockQueue = new KLOCK_QUEUE_HANDLE;

Red flag. Why are you using new here? You only have one
KLOCK_QUEUE_HANDLE object at a time for each CSpinLock. Make it a
member variable. Memory allocation overhead can be a significant
performance drain.

> KeAcquireInStackQueuedSpinLock(&m_oSpinLock, psLockQueue);
> m_psLockQueue = psLockQueue;

Whoa. This will leak memory if you Lock() twice without an intervening
Unlock().

> }

> void CSpinLock::Unlock()
> {
> KLOCK_QUEUE_HANDLE* psLockQueue;
>
> psLockQueue = m_psLockQueue;
> m_psLockQueue = 0;
> KeReleaseInStackQueuedSpinLock(psLockQueue);
> delete psLockQueue;

Why do you use the intermediate variable psLockQueue?

> }
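
Folding both of those comments into your existing class would look roughly
like this (just a sketch, assuming there is never more than one outstanding
Lock() per CSpinLock, which your single m_psLockQueue member already
implies; m_oLockQueue would be a new KLOCK_QUEUE_HANDLE member replacing the
pointer):

CSpinLock::CSpinLock()
{
    KeInitializeSpinLock(&m_oSpinLock);
}

void CSpinLock::Lock()
{
    // No heap allocation: the handle is a member, reused for each acquire.
    KeAcquireInStackQueuedSpinLock(&m_oSpinLock, &m_oLockQueue);
}

void CSpinLock::Unlock()
{
    KeReleaseInStackQueuedSpinLock(&m_oLockQueue);
}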

Chuck

“Daniel Simard” wrote in message
news:xxxxx@ntdev…
>
> mode) works fine and fixes my problem on multi-CPU computers, but slows
> down single-CPU computers.

Well, I don't know what to tell you… Just like KeAcquireSpinLock(…),
KeAcquireInStackQueuedSpinLock does nothing but raise IRQL on the
uniprocessor build of the O/S. If you trace into the code, it should be
very straightforward.

>
> Here is my new code:
>
> CSpinLock::CSpinLock()
> {
> KeInitializeSpinLock(&m_oSpinLock);
> }
>

Ah, C++ – Sorry, I don’t speak this language.

Peter
OSR

> CSpinLock::CSpinLock()

By the way, a better design is to use two separate classes: one for the
spin lock object itself, and another for acquisition/release.
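
Something along these lines, for example (just a sketch; the guard class
and its name are made up):

// The lock object owns only the KSPIN_LOCK and is initialized once.
class CSpinLock
{
public:
    CSpinLock() { KeInitializeSpinLock(&m_oSpinLock); }
    PKSPIN_LOCK Get() { return &m_oSpinLock; }
private:
    KSPIN_LOCK m_oSpinLock;
};

// The acquisition object lives on the caller's stack, so the
// KLOCK_QUEUE_HANDLE really is in-stack, as the DDK intends, and the lock
// is released automatically when the guard goes out of scope.
class CQueuedLockGuard
{
public:
    CQueuedLockGuard(CSpinLock& oLock)
    {
        KeAcquireInStackQueuedSpinLock(oLock.Get(), &m_oLockQueue);
    }
    ~CQueuedLockGuard()
    {
        KeReleaseInStackQueuedSpinLock(&m_oLockQueue);
    }
private:
    KLOCK_QUEUE_HANDLE m_oLockQueue;
};

Each critical section then just declares a CQueuedLockGuard local and never
calls Lock()/Unlock() explicitly.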

Chuck

Thanks to Chuck, that was indeed the problem. I wrote the new code based on
this DDK statement: "Drivers should normally allocate the structure on the
stack each time they acquire the lock". I had tried working queued spinlock
code before and it crashed, but that was because I was building with
DRIVERTYPE=WDM at the time, and I never retried my first code after
switching to the LEGACY driver type.

I know that you guys at OSR don't like C++, but we have been using it here
for 3 years across over 40 different drivers that do "real-time audio/video
editing", and it has helped us a lot, since we reuse C++ objects everywhere
(a big time saving every time you write a new driver, because you start
from already-debugged objects).

Thanks again, OSR!!!