It’s a strange calling convention, unfortunately:
You can use the SSE intrinsics that the compiler provides without fear on x64 Windows in any context (XMM registers are part of the calling convention just like the standard GPRs, so you can treat them as such). Don’t try to use non-default rounding/exception modes by changing the standard value of the MXCSR register.
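As a rough illustration of the intrinsics-only approach, here is a minimal sketch that sums a float array using XMM registers. The name SumFloatsSse is made up for this example. It uses unaligned loads so it assumes nothing about buffer alignment, and it never touches MXCSR.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, ... */

/* Hypothetical helper: sums `count` floats using XMM registers only.
 * It never modifies MXCSR, so the default rounding/exception modes stay in effect. */
static float SumFloatsSse(const float *values, size_t count)
{
    __m128 acc = _mm_setzero_ps();
    float lanes[4];
    float total;
    size_t i = 0;

    /* Main loop: four floats per iteration, using unaligned loads. */
    for (; i + 4 <= count; i += 4) {
        acc = _mm_add_ps(acc, _mm_loadu_ps(values + i));
    }

    /* Horizontal sum of the four accumulator lanes. */
    _mm_storeu_ps(lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Scalar tail for the remaining 0..3 elements. */
    for (; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```

Nothing here needs a save/restore call on x64, since only XMM registers are involved.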
If you want to use YMM registers, you have to call KeSaveExtendedProcessorState(AVX_MASK) first, and KeRestoreExtendedProcessorState when you are done. In between, you can use the compiler-provided intrinsics to do the processing you need. I don’t think you can call the save/restore APIs above DISPATCH_LEVEL, but you can use them in DPCs, etc.
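To make the save/restore bracket concrete, here is a minimal sketch, not a definitive implementation. It assumes that "AVX_MASK" corresponds to the WDK constant XSTATE_MASK_AVX, and that the driver is built with /kernel so that _KERNEL_MODE is defined. RunWithAvxStateSaved, YMM_WORK_ROUTINE and ExampleYmmWork are names invented for the example. The #else branch holds user-mode stand-ins (not the real WDK definitions) so the control flow can be compiled outside a driver build.

```c
#ifdef _KERNEL_MODE
#include <ntddk.h>
#else
/* User-mode stand-ins, NOT the real WDK definitions. They exist only so
 * the control flow below can be compiled and exercised outside a driver. */
#include <stdint.h>
typedef int32_t NTSTATUS;
typedef uint64_t ULONG64;
typedef struct { int Placeholder; } XSTATE_SAVE;
#define NT_SUCCESS(s) ((s) >= 0)
#define XSTATE_MASK_AVX ((ULONG64)1 << 2)   /* assumed value, illustration only */
static int g_SaveDepth = 0;                 /* outstanding saves in the stand-in */
static NTSTATUS KeSaveExtendedProcessorState(ULONG64 Mask, XSTATE_SAVE *Save)
{ (void)Mask; (void)Save; ++g_SaveDepth; return 0; }
static void KeRestoreExtendedProcessorState(XSTATE_SAVE *Save)
{ (void)Save; --g_SaveDepth; }
#endif

/* Hypothetical callback type for the code that actually uses YMM registers. */
typedef void (*YMM_WORK_ROUTINE)(void *Context);

/* Saves the AVX state, runs the work routine, then restores the state.
 * If the save fails, the work routine is not called at all. */
static NTSTATUS RunWithAvxStateSaved(YMM_WORK_ROUTINE Work, void *Context)
{
    XSTATE_SAVE save;
    NTSTATUS status = KeSaveExtendedProcessorState(XSTATE_MASK_AVX, &save);

    if (!NT_SUCCESS(status)) {
        return status;  /* state was not saved: do not touch YMM registers */
    }

    Work(Context);      /* AVX intrinsics belong inside this routine */

    KeRestoreExtendedProcessorState(&save);
    return status;
}

/* Placeholder work routine: a real driver would use AVX intrinsics here. */
static void ExampleYmmWork(void *Context)
{
    *(int *)Context += 1;
}
```

A caller would write something like RunWithAvxStateSaved(ExampleYmmWork, &counter) and check the returned status before relying on the result.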
If you want to write assembly code, you have to follow the calling convention, including emitting the proper unwind codes. What this means is that you have to call KeSaveExtendedProcessorState, and then also save/restore the corresponding XMM register to any YMM register you use.
I wouldn’t recommend going beyond compiler intrinsics, since it is nontrivial to write proper, unwindable x64 assembly code, and getting it right is important in kernel mode. If it is still considered valuable, I can give an example of proper SSE asm code, however.
-Neeraj
Windows Kernel Core Team
-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@osr.com
Sent: Monday, July 19, 2010 7:59 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] Use of SSE instructions from an arbitrary thread context
Hmmmm…
The SSE registers are separate from the X87/FPP registers, so on x64 calling KeSaveFloatingPointState is unnecessary and, IIRC, actually does nothing.
You’re coding these SSE operations yourself, right? And you’re using Win7? Are you using the XMM registers or the YMM (AVX) registers? If you’re using the YMM registers, you have to explicitly call KeSaveExtendedProcessorState before the operations and KeRestoreExtendedProcessorState after them.
I wouldn’t be terribly surprised to find that you have to save/restore the SSE registers in your DPC. If you think about it, to avoid this Windows would have to save/restore the XMM registers around every DPC… and I’d be surprised to find it did that. I definitely don’t remember seeing any code that does it, now that I think of it.
Since you’re on Win7, you can try calling RtlGetEnabledExtendedFeatures during your driver’s initialization, and saving away the returned feature mask. Then call KeSaveExtendedProcessorState using that same feature mask around your SSE/AVX operations – both of these functions are new starting in Win7.
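A minimal sketch of that suggestion follows. It assumes the driver is built with /kernel (so _KERNEL_MODE is defined) and that passing an all-ones mask to RtlGetEnabledExtendedFeatures is an acceptable way to ask about every feature. g_ExtendedFeatureMask, CacheExtendedFeatureMask and SaveExtendedState are names invented for the example. The #else branch contains user-mode stand-ins, not the real WDK definitions.

```c
#ifdef _KERNEL_MODE
#include <ntddk.h>
#else
/* User-mode stand-ins, NOT the real WDK definitions. */
#include <stdint.h>
typedef int32_t NTSTATUS;
typedef uint64_t ULONG64;
typedef struct { ULONG64 SavedMask; } XSTATE_SAVE;
static ULONG64 RtlGetEnabledExtendedFeatures(ULONG64 FeatureMask)
{ return FeatureMask & 0x7; }  /* pretend three low feature bits are enabled */
static NTSTATUS KeSaveExtendedProcessorState(ULONG64 Mask, XSTATE_SAVE *Save)
{ Save->SavedMask = Mask; return 0; }
static void KeRestoreExtendedProcessorState(XSTATE_SAVE *Save)
{ Save->SavedMask = 0; }
#endif

/* Hypothetical driver global holding the mask captured at initialization. */
static ULONG64 g_ExtendedFeatureMask;

/* Call once during driver initialization (for example from DriverEntry).
 * The all-ones argument asks about every feature; the return value is the
 * subset that is actually enabled. */
static void CacheExtendedFeatureMask(void)
{
    g_ExtendedFeatureMask = RtlGetEnabledExtendedFeatures((ULONG64)-1);
}

/* Saves the extended state using the cached mask. On success, the caller
 * must pair this with KeRestoreExtendedProcessorState(Save) once the
 * SSE/AVX work is finished. */
static NTSTATUS SaveExtendedState(XSTATE_SAVE *Save)
{
    return KeSaveExtendedProcessorState(g_ExtendedFeatureMask, Save);
}
```

Caching the mask once avoids querying the enabled features on every call into the SSE/AVX path.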
Sorry to provide such a hazy reply… Yours is an interesting question, and the WDK docs are particularly bad in their handling of XMM/YMM information. I’m curious to see the outcome.
Peter
OSR
NTDEV is sponsored by OSR