Mark,
There have been a few comments on this already. I’ll add my 0.02p worth…
I am writing an ndis 5.1 driver specifically targeting machines that
support SSE2 ie Pentium4. I have 2 questions:
- The version of cl.exe packaged with the DDK does not support /G7
  as does the .net version of visual C++. Is it possible to build a
  driver using the .net version of the compiler? I know it is not
  generally qualified for all drivers, but I am targeting an extremely
  small subset of the processor space.
- If my interrupt code will call functions that use SSE2, I will need
  to save/restore the SSE2 context. The provided instructions (FSAVE,
  FRSTOR) save a lot of extra stuff like the FPU and MMX contexts. If
  only SSE2 is used, is there a quicker, more compact, method?
You probably want to use FXSAVE and FXRSTOR, first of all. FSAVE and FRSTOR
only save the x87 FP/MMX registers; they don’t touch the xmm registers at all.
Second, you don’t know what format the xmmX register was loaded with, so
you’ll have penalties for “reformatting” the register, should you store it
with a different format than the previous load. (What I mean is: Say the
xmm2 is loaded as a pair of double precision floats, and you store it as a
set of 32-bit ints, the processor will have to “reformat” the contents to
match integer values. Next when you restore the info, you’ll load it up as
32-bit ints (assuming symmetrical store/restore), and when the code
eventually returns to the user of xmm2, it will have to reformat the
contents again.) This type of reformatting is a performance penalty,
although if you’re “lucky” you’ll see little of it at the store phase, and
most of it will happen at the application that is actually using the
register.
Now, it’s probably a bad idea to call SSE2 code from an interrupt in the
first place, but if you really have to, use FXSAVE and FXRSTOR. This ensures
that the processor internal state is preserved (including format info, for
example), and that any other data that the processor may want to store is
kept intact. It’s also more future proof: for instance, if you decide to
compile this for AMD64, which has 16 SSE registers, it would still preserve
all 16 registers, and you don’t have to keep track of which case it is in
the code that saves the registers.
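For illustration, here is a minimal user-mode sketch of the FXSAVE/FXRSTOR
pair. It assumes GCC or Clang inline assembly on x86/x86-64, so it is not
written for the DDK compiler and it is not driver code. The names fx_area_t,
fx_save, fx_restore and demo_save_restore are made up for this example. The
facts it relies on are that the save area is 512 bytes, that it must be
16-byte aligned, and that MXCSR sits at byte offset 24 of the saved image.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* FXSAVE/FXRSTOR operate on a 512-byte memory image that must be
   16-byte aligned. fx_area_t is a hypothetical wrapper type. */
typedef struct {
    uint8_t bytes[512];
} __attribute__((aligned(16))) fx_area_t;

_Static_assert(sizeof(fx_area_t) == 512, "FXSAVE image is 512 bytes");

/* Save x87/MMX state, all xmm registers and MXCSR into *area. */
static void fx_save(fx_area_t *area)
{
    assert(((uintptr_t)area & 15u) == 0);   /* misalignment would fault */
    __asm__ __volatile__("fxsave %0" : "=m"(*area));
}

/* Reload everything that fx_save() stored. */
static void fx_restore(const fx_area_t *area)
{
    assert(((uintptr_t)area & 15u) == 0);
    __asm__ __volatile__("fxrstor %0" : : "m"(*area));
}

static uint32_t read_mxcsr(void)
{
    uint32_t v;
    __asm__ __volatile__("stmxcsr %0" : "=m"(v));
    return v;
}

static void write_mxcsr(uint32_t v)
{
    __asm__ __volatile__("ldmxcsr %0" : : "m"(v));
}

/* Save, disturb the SSE control state, restore, and check that the
   original state came back. Returns 1 on success, 0 otherwise. */
int demo_save_restore(void)
{
    fx_area_t area;
    uint32_t before = read_mxcsr();
    uint32_t saved;

    fx_save(&area);
    write_mxcsr(before ^ 0x8000u);          /* flip the flush-to-zero bit */
    fx_restore(&area);

    memcpy(&saved, area.bytes + 24, sizeof saved);  /* MXCSR in the image */
    return read_mxcsr() == before && saved == before;
}
```

The write_mxcsr() call only stands in for whatever SSE work would happen
between the save and the restore. Note that nothing in the calling code
needs to know how many xmm registers the processor has.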
Btw, when you say SSE2, do you actually refer to the SSE2 double-precision
instructions, or are you just going to use SSE instructions in general? I’m
not sure where you’d get double-precision calculation needs in an NDIS
driver. SSE itself only added a few integer operations, and those work on
the 64-bit MMX registers; the 128-bit integer operations on the xmm
registers came with SSE2, so if those are what you’re after, SSE2 is the
right name (I’ve never REALLY used SSE in anger). Just curious…
I hope this helps.
–
Mats
Thanks, Mark
Mark Roths
SoftAir Microsystems