Not to advertise, but have you tried TrueTime? It’s not a sampling
tool; it hooks the real code, and you get as much detail as your
filters allow. You can also put in your own probe points.
On your specific problem, have you looked at the machine code generated for
your inline functions? Sometimes there’s code in there that you don’t
suspect. Also, because VTune is a sampling tool, you have to run for a while
before the results really make sense.
Hope this helps!
Alberto.
-----Original Message-----
From: Dave Harvey [mailto:xxxxx@syssoftsol.com]
Sent: Monday, January 21, 2002 11:17 PM
To: NT Developers Interest List
Subject: [ntdev] Inlined assembler and volatile APIC TPR register
accesses running MPS HAL
I’m doing some CPU performance analysis with VTUNE on some drivers (perhaps
700K lines worth), and I’m getting substantial amounts of time spent in HAL
and NT in primitives dealing with locks and IRQLs. Since NT (and therefore
VTUNE) only samples the current PC, and doesn’t walk the stack when taking
samples, it’s impossible to know who is calling these functions.
So I wrote my own inline spin-lock functions, and ran a perl script that
renamed Ke calls to my functions, e.g. KeAcquireSpinLock
to InlinedKeAcquireSpinLock. Everything still works, and now VTUNE chalks
up the serialized instruction and cache miss to the function that acquires
the lock. I’m happy, but I still have IRQL manipulation overhead I don’t
understand.
Using the HDK, I figured out the vector/IRQL translation when running the
APIC (MPS) HAL, and I ended up with functions that look like:
#define TPR 0xffef0080 /* some address like this */

__inline KIRQL InlineKeGetCurrentIrql()
{
    KIRQL irql;
    /* the upper four bits of the TPR byte select the table entry */
    irql = VectorToIrql[(*(volatile ULONG *)TPR)>>4];
    return irql;
}

__inline VOID InlineKeLowerIrql(KIRQL irql)
{
    ULONG vector = IrqlToVector[irql];
    (*(volatile ULONG *)TPR) = vector;
}

__inline VOID InlineKeReleaseSpinLock(PSPIN_LOCK lock, KIRQL irql)
{
    lock->forgotFieldname = 0; // release lock
    InlineKeLowerIrql(irql);
}
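For readers who want to experiment with the table round trip outside the kernel, here is a minimal user-mode sketch. Everything in it is an assumption for illustration: `fake_tpr` stands in for the memory-mapped register, the tables hold a made-up one-to-one mapping of 16 levels (the real HAL mapping is not given in this post), and the `sketch_` names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-in for the memory-mapped TPR so this runs in user mode. */
static volatile uint32_t fake_tpr;

/* Made-up tables: one level per priority class. The real HAL mapping differs. */
enum { SKETCH_LEVELS = 16 };
static uint32_t sketch_irql_to_vector[SKETCH_LEVELS];
static uint8_t  sketch_vector_to_irql[SKETCH_LEVELS];

void sketch_init_tables(void)
{
    for (int level = 0; level < SKETCH_LEVELS; level++) {
        /* The priority class lives in bits 7:4 of the TPR value. */
        sketch_irql_to_vector[level] = (uint32_t)level << 4;
        sketch_vector_to_irql[level] = (uint8_t)level;
    }
}

uint8_t sketch_get_current_irql(void)
{
    /* Read the register once, then index by its priority class. */
    uint32_t raw = fake_tpr;
    return sketch_vector_to_irql[(raw >> 4) & 0xF];
}

void sketch_set_irql(uint8_t level)
{
    /* Mask the index so an out-of-range level cannot run off the table. */
    fake_tpr = sketch_irql_to_vector[level & 0xF];
}
```

Calling `sketch_init_tables()`, then `sketch_set_irql(2)`, then `sketch_get_current_irql()` reads back the level that was written, which makes it easy to check a candidate table pair before trying it against real hardware.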
Checked builds work fine, but free builds die within a few seconds. I’m
assuming the optimizer is nailing me in some subtle
way, perhaps in conjunction with the inline assembler that I’m using for the
spin locks so I can use the lock prefix.
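One way to take the optimizer out of the picture while debugging is to state the ordering explicitly. The following is a portable C11 sketch, not the code from this post and not kernel code: `sketch_lock`, `sketch_tpr`, and the function names are hypothetical, and `sketch_tpr` is an ordinary variable standing in for the register. It assumes a compiler with `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical lock and register stand-in, for illustration only. */
static atomic_flag sketch_lock = ATOMIC_FLAG_INIT;
static volatile uint32_t sketch_tpr;

void sketch_acquire(void)
{
    /* Atomic test-and-set; on x86 compilers typically emit a locked
       instruction for this. Acquire ordering keeps the protected
       accesses from being hoisted above the lock. */
    while (atomic_flag_test_and_set_explicit(&sketch_lock,
                                             memory_order_acquire)) {
        /* spin until the holder clears the flag */
    }
}

void sketch_release(uint32_t old_tpr_value)
{
    /* Release ordering keeps the protected accesses from sinking
       below the unlock. */
    atomic_flag_clear_explicit(&sketch_lock, memory_order_release);

    /* Compiler-only barrier: asks the compiler not to move the
       priority write ahead of the unlock. It emits no CPU fence. */
    atomic_signal_fence(memory_order_seq_cst);

    sketch_tpr = old_tpr_value;
}
```

If a free build behaves differently with and without the explicit barrier, that points at compiler reordering rather than at the hardware.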
How does the CPU know that the Local APIC TPR register access should be a
serialized instruction? Is this a function of the
address? I previously had inline assembler with the same symptoms, which
used the same instruction as the HAL, which explicitly
specified “DS:”, but isn’t that the default anyway? Am I brain-dead in my
use of volatile?
Anyone had compiler problems with nesting inline functions with volatile
pointers perhaps mixed with inline assembler? I think
I’ve followed all the register saving rules.
I’ve gotten two symptoms:
- A very simple thread is marked as “Running”, is using CPU time, but it
never comes out of the context switch routine.
- Sometimes I panic because I expect to be at passive level, but am in fact
at DISPATCH_LEVEL.
What happens if you do a KeWaitForSingleObject at DISPATCH_LEVEL on a free
build? Would you get the first symptom?
-DH
Dave Harvey, System Software Solutions, Inc.
617-964-7039, FAX 208-361-9395, xxxxx@syssoftsol.com,
http://www.syssoftsol.com
Creators of RedunDisks - Robust RAID 1 for embedded systems.
You are currently subscribed to ntdev as: xxxxx@compuware.com
To unsubscribe send a blank email to leave-ntdev-$subst(‘Recip.MemberIDChar’)@lists.osr.com