My oh my it’s educational to debug Windows driver code written by Linux
people.
The driver I’m debugging has some wrapper functions around DDK functions.
The following two are my current focus, as I had a crash that verifier said
was caused by releasing a spinlock and lowering the IRQL to an impossible
value (30 specifically).
CL_INLINE void
cl_spinlock_acquire(
IN cl_spinlock_t* const p_spinlock )
{
CL_ASSERT( p_spinlock );
KeAcquireSpinLock( &p_spinlock->lock, &p_spinlock->irql );
}
CL_INLINE void
cl_spinlock_release(
IN cl_spinlock_t* const p_spinlock )
{
CL_ASSERT( p_spinlock );
KeReleaseSpinLock( &p_spinlock->lock, p_spinlock->irql );
}
The structure cl_spinlock_t contains the Windows spinlock and the old IRQL.
Notice the old IRQL value to KeAquireSpinLock as shared between all callers
of this function. When I write Windows driver code I always make it a local
variable. The DDK docs say “The current IRQL is saved in OldIrql. Then, the
current IRQL is reset to DISPATCH_LEVEL, and the specified spin lock is
acquired.” Which if accurate, means the old IRQL value is stored BEFORE the
lock is acquired?
This would imply:
CPU 1 running at PASSIVE_LEVEL calls cl_spinlock_acquire passing the lock
structure, which saves PASSIVE_LEVEL in the shared old IRQL variable and
gets the spinlock
CPU 2 running at DISPATCH_LEVEL calls cl_spinlock_acquire passing the lock
structure, which saves DISPATCH_LEVEL in the shared old IRQL variable and
spins on the lock
CPU 1 calls cl_spinlock_release passing the lock structure, which
incorrectly restores DISPATCH_LEVEL from the shared old IRQL variable and
releases the spinlock
CPU 2 acquires the lock and continues
CPU 2 calls cl_spinlock_release passing the lock structure, which correctly
restores DISPATCH_LEVEL from the shared old IRQL variable and releases the
spinlock
.bad things happen as CPU 1 is not supposed to be at DISPATCH_LEVEL.
So the question is, does the old IRQL variable passed to KeAquireSpinLock
need to be unique to each caller or not? I looked at the assembler for both
OS functions and it seems like the DDK documentation isn’t correct for my
x86 processor, although perhaps on some processor it is correct. For an x86,
the lock prefix in those OS functions should cause a memory barrier, and
prevent the old IRQL from getting updated if the lock isn’t acquired. The
semantics of the DDK docs is safer, although not helping me determine if
this code is broken. The DDK docs don’t say the old IRQL needs to be a
local, even though the description implies it does to prevent the above
problem.
Opinions?
- Jan