Inlined assembler and volatile APIC TPR register accesses running MPS HAL

I’m doing some CPU performance analysis with VTUNE on some drivers (perhaps 700K lines’ worth), and I’m seeing a substantial
amount of time spent in HAL and NT primitives that deal with locks and IRQLs. Since NT (and therefore VTUNE) samples only
the current PC and doesn’t walk the stack when taking samples, it’s impossible to know who is calling
these functions.

So I wrote my own inline spin-lock functions and ran a Perl script that renamed the Ke calls to my functions, e.g. KeAcquireSpinLock
to InlinedKeAcquireSpinLock. Everything still works, and now VTUNE charges the serialized instruction and cache miss
to the function that acquires the lock. I’m happy, but I still have IRQL manipulation overhead I don’t understand.
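
For readers who want something concrete, here is a minimal user-mode sketch of an inlinable spin lock. It is not the code from this post: it assumes C11 <stdatomic.h> as a portable stand-in for the lock-prefixed inline assembler, every Sketch* name is hypothetical, and it leaves out the IRQL raise that the real KeAcquireSpinLock performs.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for KSPIN_LOCK; this is NOT the NT definition. */
typedef struct {
    atomic_flag held;
} SketchSpinLock;

/* Put the lock into the "free" state. */
static inline void SketchInitLock(SketchSpinLock *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_relaxed);
}

/* One atomic test-and-set attempt; returns 1 if the lock was acquired. */
static inline int SketchTryAcquire(SketchSpinLock *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

/* Spin until the lock is acquired. */
static inline void SketchAcquire(SketchSpinLock *lock)
{
    while (!SketchTryAcquire(lock)) {
        /* busy-wait; a real implementation would likely pause here */
    }
}

/* Release with release ordering so earlier writes become visible first. */
static inline void SketchRelease(SketchSpinLock *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

Because the functions are inlined into each caller, a PC-sampling profiler would attribute the cost of the atomic operation to the calling function, which is the effect described above.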

Using the HDK, I figured out the vector/IRQL translation used when running the APIC (MPS) HAL, and I ended up with functions that look
like:

__inline KIRQL InlineKeGetCurrentIrql()
{
    KIRQL irql;

#define TPR 0xffef0080 /* some address like this */

    irql = VectorToIrql[(*(volatile ULONG *)TPR)>>4];
    return irql;
}

__inline VOID InlineKeLowerIrql(KIRQL irql)
{
    ULONG vector = IrqlToVector[irql];
    (*(volatile ULONG *)TPR) = vector;
}

__inline VOID InlineKeReleaseSpinLock(PSPIN_LOCK lock, KIRQL irql)
{
    lock->forgotFieldname = 0; // release lock
    InlineKeLowerIrql(irql);
}

Checked builds work fine, but free builds die within a few seconds. I’m assuming the optimizer is nailing me in some subtle
way, perhaps in conjunction with the inline assembler that I’m using for the spin locks so that I can use the lock prefix.
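
One thing an optimizer is permitted to do in code shaped like this: C only orders volatile accesses relative to other volatile accesses, so a plain store that releases a lock may legally be moved past a following volatile register write. The sketch below illustrates one way to pin that order down at the compiler level. It is a user-mode simulation under stated assumptions, not a diagnosis of the failure: FakeTpr is an ordinary variable standing in for the memory-mapped TPR, the table values are made up, all Sketch* names are hypothetical, and atomic_signal_fence (C11) acts only as a compiler barrier. It says nothing about what the hardware requires.

```c
#include <assert.h>
#include <stdatomic.h>

typedef unsigned char SKETCH_IRQL;
typedef unsigned long SKETCH_ULONG;

/* Simulated TPR: a plain variable, NOT the real memory-mapped register. */
static volatile SKETCH_ULONG FakeTpr;

/* Hypothetical IRQL-to-vector table; real values are HAL-specific. */
#define SKETCH_IRQL_COUNT 4
static const SKETCH_ULONG SketchIrqlToVector[SKETCH_IRQL_COUNT] = {
    0x00, 0x30, 0x40, 0x50
};

/* Write the vector for the requested IRQL to the simulated TPR. */
static inline void SketchLowerIrql(SKETCH_IRQL irql)
{
    assert(irql < SKETCH_IRQL_COUNT);
    FakeTpr = SketchIrqlToVector[irql];
}

/* Release the lock, then lower the IRQL, in that order. */
static inline void SketchReleaseSpinLock(volatile SKETCH_ULONG *lock,
                                         SKETCH_IRQL irql)
{
    *lock = 0;                                 /* volatile store: release */
    atomic_signal_fence(memory_order_seq_cst); /* compiler-only barrier   */
    SketchLowerIrql(irql);                     /* volatile store to "TPR" */
}
```

Making the lock word volatile is already enough to keep the two stores in program order as far as the compiler is concerned; the fence is included to also stop ordinary (non-volatile) accesses in the critical section from drifting past the release.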

How does the CPU know that the Local APIC TPR register access should be a serialized instruction? Is this a function of the
address? I previously used inline assembler, with the same symptoms; it used the same instruction as the HAL, which explicitly
specifies “DS:”, but isn’t that the default anyway? Am I brain-dead in my use of volatile?

Has anyone had compiler problems with nesting inline functions that use volatile pointers, perhaps mixed with inline assembler? I think
I’ve followed all the register saving rules.

I’ve seen two symptoms:

  • A very simple thread is marked as “Running” and is using CPU time, but it never comes out of the context switch routine.
  • Sometimes I panic because I expect to be at PASSIVE_LEVEL, but am in fact at DISPATCH_LEVEL.

What happens if you call KeWaitForSingleObject at DISPATCH_LEVEL on a free build? Would you get the first symptom?

-DH


Dave Harvey, System Software Solutions, Inc.
617-964-7039, FAX 208-361-9395, xxxxx@syssoftsol.com, http://www.syssoftsol.com
Creators of RedunDisks - Robust RAID 1 for embedded systems.




You are currently subscribed to ntdev as: $subst(‘Recip.EmailAddr’)
To unsubscribe send a blank email to leave-ntdev-$subst(‘Recip.MemberIDChar’)@lists.osr.com

> to the function that acquires the lock. I’m happy, but I still have IRQL manipulation overhead I
> don’t understand.
>
> Using the HDK, I figure out the vector/IRQL translation when running the APIC (MPS) Hal,
> and I end up with functions that look

I think it is wrong to implement your own IRQL change mechanism, since it is HAL-dependent, and you will end up with
a not-so-compatible driver.

Max



----- Original Message -----
From: “Maxim S. Shatskih”
To: “NT Developers Interest List”
Sent: Tuesday, January 22, 2002 2:09 AM
Subject: [ntdev] Re: Inlined assembler and volatile APIC TPR register accesses running MPS HAL

> > to the function that acquires the lock. I’m happy, but I still have IRQL manipulation overhead I
> > don’t understand.
> >
> > Using the HDK, I figure out the vector/IRQL translation when running the APIC (MPS) Hal,
> > and I end up with functions that look
>
> I think this is wrong to implement your own IRQL change mechanism, since it is HAL-dependent, and you will end with
> not-so-compatible driver.

I’m not trying to replace the HAL; I’m just trying to get reasonable performance numbers.
-DH

>
> Max
>