RE: Inlined assembler and volatile APIC TPR register accesses running MPS HAL

It turns out that the compiler will move loads through non-volatile pointers across stores through volatile pointers.

Consider:
struct X {
    int Foo;
    volatile int Bar;
};

void f2(struct X *p);

void f1(struct X *p)
{
    int foo = p->Foo;
    f2(p);
    /* ... foo ... */
}

void f2(struct X *p) { p->Bar = 0; }

When f2 is inlined, the compiler may store to Bar before reading Foo.

Adding “volatile” in front of “int Foo” works. Why doesn’t
“int foo = (volatile int)p->Foo;”?
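
For what it is worth: the cast is applied to the value after p->Foo has already been loaded through an ordinary lvalue, and in C the result of a cast is a value rather than an object, so a qualifier on a non-pointer cast type does not make the load a volatile access. What has to be volatile is the lvalue used for the access. A minimal sketch, with a copy of struct X from the example and hypothetical function names (read_foo_cast, read_foo_volatile); note that volatile constrains only the compiler, not the processor:

```c
#include <assert.h>

struct X {
    int Foo;
    volatile int Bar;
};

/* The load of p->Foo is an ordinary access; the (volatile int) cast only
   qualifies the resulting value, so it does not affect ordering. */
int read_foo_cast(struct X *p)
{
    return (volatile int)p->Foo;
}

/* The access itself goes through a volatile lvalue, so the compiler must
   perform it and keep it in order with other volatile accesses. */
int read_foo_volatile(struct X *p)
{
    return *(volatile int *)&p->Foo;
}

void f2(struct X *p) { p->Bar = 0; }

int f1(struct X *p)
{
    int foo = *(volatile int *)&p->Foo;  /* volatile read of Foo ...      */
    f2(p);                               /* ... stays before the volatile */
    return foo;                          /* store to Bar, even if inlined */
}
```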

If the instructions are ordered as:
p->Bar = 0;
foo = p->Foo;

then, given x86 processor ordering, the CPU can read a value of Foo that predates the store to Bar.
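
If the store to Bar really must be visible to other processors before Foo is read, a compiler-level fix is not enough; a full fence is needed between the store and the load. The sketch below substitutes C11 <stdatomic.h> fields for the volatile ones (struct XAtomic and store_bar_then_load_foo are hypothetical names). On x86, compilers typically implement the seq_cst fence with MFENCE or a LOCK-prefixed instruction:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct X with C11 atomic fields. */
struct XAtomic {
    atomic_int Foo;
    atomic_int Bar;
};

/* Store Bar, then read Foo. The seq_cst fence keeps the load from being
   satisfied before the store is visible to other processors, which a
   volatile qualifier alone cannot guarantee. */
int store_bar_then_load_foo(struct XAtomic *p)
{
    atomic_store_explicit(&p->Bar, 0, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&p->Foo, memory_order_relaxed);
}
```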

My reading of the Intel documents is that if the instructions are ordered like this:
foo = p->Foo;
p->Bar = 0;
the read of Foo will always precede the write of Bar, because the write is not executed until
all previous instructions have retired. Correct?
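
Even if the processor keeps that read ahead of the later write (the Intel manuals state that writes are not reordered with older reads), the compiler may still sink the ordinary load of Foo below the volatile store, which is the inlined f2 problem. One way to forbid that without making Foo volatile is a compiler-only barrier. This sketch uses C11 atomic_signal_fence, which GCC and Clang typically treat as a compiler barrier that emits no instruction (MSVC offers _ReadWriteBarrier for the same purpose); read_foo_then_clear_bar is a hypothetical name:

```c
#include <assert.h>
#include <stdatomic.h>

struct X {
    int Foo;
    volatile int Bar;
};

/* Read Foo, then clear Bar. The signal fence stops the compiler from moving
   the load of p->Foo below the store to p->Bar; ordering on the CPU side is
   left to the hardware's load-then-store guarantee. */
int read_foo_then_clear_bar(struct X *p)
{
    int foo = p->Foo;
    atomic_signal_fence(memory_order_seq_cst);
    p->Bar = 0;
    return foo;
}
```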

-DH

----- Original Message -----
From: “Moreira, Alberto”
To: “NT Developers Interest List”
Sent: Tuesday, January 22, 2002 10:29 AM
Subject: [ntdev] RE: Inlined assembler and volatile APIC TPR register accesses running MPS HAL

> Not to do advertising, but have you tried TrueTime? It’s not a sampling
> tool; it hooks the real code, and you will get as much detail as you set
> your filters to get. You will also be able to put in your own probe points.
>
> On your specific problem, have you looked at the machine code generated by
> your inline functions? Sometimes there’s code in there that you don’t
> suspect. Also, because VTune is a sampling tool, you have to run for a while
> before the results really make sense.
>
> Hope this helps!
>
> Alberto.
>
> -----Original Message-----
> From: Dave Harvey [mailto:xxxxx@syssoftsol.com]
> Sent: Monday, January 21, 2002 11:17 PM
> To: NT Developers Interest List
> Subject: [ntdev] Inlined assembler and volatile APIC TPR register accesses running MPS HAL
>
>
> I’m doing some CPU performance analysis with VTune on some drivers (perhaps
> 700K lines’ worth), and I’m seeing substantial amounts of time spent in HAL
> and NT primitives dealing with locks and IRQLs. Since NT (and therefore
> VTune) only samples the current PC, and doesn’t walk the stack when taking
> samples, it’s impossible to know who is calling these functions.
>
>
> So I wrote my own inline spin-lock functions, and ran a Perl script that
> renamed Ke calls to my functions, e.g. KeAcquireSpinLock to
> InlinedKeAcquireSpinLock. Everything still works, and now VTune chalks up
> the serialized instruction and cache miss to the function that acquires the
> lock. I’m happy, but I still have IRQL manipulation overhead I don’t
> understand.
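
For comparison, an inline spin lock does not strictly need inline assembler to get a LOCKed instruction. A minimal sketch using C11 atomics in place of lock-prefixed inline assembly; the type and function names are hypothetical, and unlike KeAcquireSpinLock this does not raise IRQL, which a driver would still have to do:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock type; not the NT KSPIN_LOCK layout. */
typedef struct {
    atomic_int held;  /* 0 = free, 1 = held */
} inline_spin_lock;

static inline void inline_spin_acquire(inline_spin_lock *lock)
{
    /* Interlocked test-and-set: on x86 this is typically compiled to XCHG
       with a memory operand, which is implicitly LOCKed. Acquire ordering
       keeps the critical section's accesses after the lock is taken. */
    while (atomic_exchange_explicit(&lock->held, 1, memory_order_acquire) != 0) {
        /* Wait with plain loads until the lock looks free, so waiting
           processors do not keep pulling the cache line exclusive. */
        while (atomic_load_explicit(&lock->held, memory_order_relaxed) != 0)
            ;
    }
}

static inline void inline_spin_release(inline_spin_lock *lock)
{
    assert(atomic_load_explicit(&lock->held, memory_order_relaxed) == 1);
    /* Release ordering keeps the critical section's accesses before the
       unlock, for both the compiler and the CPU. */
    atomic_store_explicit(&lock->held, 0, memory_order_release);
}
```

The inner read-only loop is the usual test-and-test-and-set refinement; only the exchange in the outer loop is a locked operation.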
>
> Using the HDK, I figured out the vector/IRQL translation when running the
> APIC (MPS) HAL, and I ended up with functions that look like:
>
> __inline KIRQL InlineKeGetCurrentIrql()
> {
>     KIRQL irql;
> #define TPR 0xffef0080 /* some address like this */
>     irql = VectorToIrql[(*(volatile ULONG *)TPR) >> 4];
>     return irql;
> }
>
> __inline VOID InlineKeLowerIrql(KIRQL irql)
> {
>     ULONG vector = IrqlToVector[irql];
>     *((volatile ULONG *)TPR) = vector;
> }
>
> __inline VOID InlineKeReleaseSpinLock(PSPIN_LOCK lock, KIRQL irql)
> {
>     lock->forgotFieldname = 0; // release lock
>     InlineKeLowerIrql(irql);
> }
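
A user-mode sketch of the same three functions, with a plain variable standing in for the TPR and made-up tables (fake_tpr, TPR_REG, irql_to_tpr, class_to_irql and the sketch_ functions are all hypothetical). It shows the register accessed through a dereferenced volatile pointer, and it adds a compiler barrier before the lock is cleared, since if the lock field is an ordinary (non-volatile) field the compiler may move critical-section accesses past the release store:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical: a plain variable stands in for the local APIC TPR so the
   sketch runs in user mode; in a driver TPR_REG would dereference the
   mapped register address instead. */
static uint32_t fake_tpr;
#define TPR_REG (*(volatile uint32_t *)&fake_tpr)

/* Hypothetical tables modeling only PASSIVE (0), APC (1) and DISPATCH (2);
   real values come from the HAL's vector assignments. */
static const uint8_t irql_to_tpr[3] = { 0x00, 0x30, 0x40 };
static const uint8_t class_to_irql[16] = {
    0, 0, 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2
};

static inline uint8_t sketch_get_irql(void)
{
    /* TPR bits 7:4 hold the task-priority class, hence the shift. */
    return class_to_irql[(TPR_REG >> 4) & 0xF];
}

static inline void sketch_lower_irql(uint8_t irql)
{
    assert(irql <= 2);             /* only IRQLs 0..2 are modeled */
    TPR_REG = irql_to_tpr[irql];   /* volatile store to the "register" */
}

static inline void sketch_release(volatile long *lock, uint8_t irql)
{
    /* Compiler-only barrier: keeps the critical section's ordinary loads
       and stores above the unlock. On x86 the CPU does not move a store
       ahead of earlier loads or stores to ordinary write-back memory, so
       the compiler is the remaining concern. */
    atomic_signal_fence(memory_order_seq_cst);
    *lock = 0;                 /* volatile store: release the lock */
    sketch_lower_irql(irql);   /* volatile store: stays after the unlock */
}
```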
>
>
> Checked builds work fine, but free builds die within a few seconds. I’m
> assuming the optimizer is nailing me in some subtle way, perhaps in
> conjunction with the inline assembler that I’m using for the spin locks so
> I can use the lock prefix.
>
> How does the CPU know that the local APIC TPR register access should be a
> serialized instruction? Is this a function of the address? I previously had
> inline assembler, with the same symptoms, that used the same instruction as
> the HAL and explicitly specified “DS:”, but isn’t that the default anyway?
> Am I brain-dead in my use of volatile?
>
> Has anyone had compiler problems with nesting inline functions with volatile
> pointers, perhaps mixed with inline assembler? I think I’ve followed all the
> register-saving rules.
>
> I’ve gotten two symptoms:
> - A very simple thread is marked as “Running” and is using CPU time, but it
> never comes out of the context-switch routine.
> - Sometimes I panic because I expect to be at PASSIVE_LEVEL, but am in fact
> at DISPATCH_LEVEL.
>
> What happens if you do a KeWaitForSingleObject at DISPATCH_LEVEL on a free
> build? Would you get the first symptom?
>
> -DH
>
>
>
> ----------------------------------------------------------------------------------------------------------
> Dave Harvey, System Software Solutions, Inc.
> 617-964-7039, FAX 208-361-9395, xxxxx@syssoftsol.com,
> http://www.syssoftsol.com
> Creators of RedunDisks - Robust RAID 1 for embedded systems.
> —
> You are currently subscribed to ntdev as: xxxxx@compuware.com
> To unsubscribe send a blank email to leave-ntdev-$subst(‘Recip.MemberIDChar’)@lists.osr.com