Floating-point vs. fixed-point arithmetic

Alyah_Nihal · June 5, 2007, 11:53am

Hello,

I’d like to hear your opinions about floating-point operations in kernel mode.

I’m currently writing a filter driver, which applies some rather simple algorithms to incoming data (mostly mul & div, but also one hypot/sqrt).
While precision is not that important, all of them require decimal fractions.
Theoretically, I could switch to fixed-point arithmetic, but then I’d often have to use int64s instead of float32s because of potential overflows.
I think floating-point arithmetic is faster on today’s processors.

According to Microsoft’s documentation of KeSaveFloatingPointState, “drivers should avoid doing any floating-point operations unless absolutely necessary”.
How do you interpret the “absolutely necessary” part?

Thanks, Alyah.
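
(Editorial illustration, not part of the original post.) The overflow concern above can be made concrete with a small Q16.16 fixed-point sketch. The type name `q16_16` and the helper names are hypothetical, chosen only for this example; the point is that multiplication and division need a 64-bit intermediate even though the operands and results fit in 32 bits.

```c
#include <stdint.h>

/* Q16.16: 16 integer bits, 16 fractional bits, stored in a 32-bit integer. */
typedef int32_t q16_16;

#define Q16_ONE 65536 /* 1.0 in Q16.16 */

/* Convert a small integer (|v| < 32768) to Q16.16. */
static q16_16 q16_from_int(int32_t v)
{
    return (q16_16)((int64_t)v * Q16_ONE);
}

/* Multiply: the raw product carries 32 fractional bits and can need up to
   64 bits, so widen first, then drop 16 fractional bits (truncating). */
static q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * (int64_t)b) / Q16_ONE);
}

/* Divide: pre-scale the dividend by 2^16 in 64 bits so the quotient keeps
   16 fractional bits. The caller must ensure b != 0. */
static q16_16 q16_div(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * Q16_ONE) / (int64_t)b);
}
```

For example, multiplying 300.0 by 100.0 produces a raw product of roughly 1.3e14 before rescaling, far outside the 32-bit range, although the final result (30000.0) still fits in Q16.16.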

Doron_Holan · June 5, 2007, 12:02pm

What stack are you filtering? Is it possible to perform this in a user
mode helper application?

d

Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Alyah_Nihal · June 5, 2007, 12:14pm

The filter is on the Mouclass stack.
I guess you could do the computations in user mode, but that’d be even more inefficient (and inelegant).
Some mouse devices send input every millisecond.

Thanks for your reply.

Tim_Roberts · June 5, 2007, 12:41pm

xxxxx@hushmail.com wrote:

I’d like to hear your opinions about floating-point operations in kernel mode.

I’m currently writing a filter driver, which applies some rather simple algorithms to incoming data (mostly mul & div, but also one hypot/sqrt).
While precision is not that important, all of them require decimal fractions.
Theoretically, I could switch to fixed-point arithmetic, but then I’d often have to use int64s instead of float32s because of potential overflows.
I think floating-point arithmetic is faster on today’s processors.

According to Microsoft’s documentation of KeSaveFloatingPointState, “drivers should avoid doing any floating-point operations unless absolutely necessary”.
How do you interpret the “absolutely necessary” part?

The key question to answer is whether the overhead of
KeSave/RestoreFloatingPointState will overwhelm the added cost of doing
the computations in fixed point. Saving the floating point state is
somewhat expensive. If each request needs three floating point
instructions, then it’s probably better to rewrite it. But if you have
60 lines of floating point code in an inner loop somewhere, then the
overhead will be neatly amortized.

I interpret “unless absolutely necessary” to mean “unless it is terribly
inconvenient to do otherwise.” It certainly works; I just did a
beam-forming and noise-reduction filter driver for a client where I was not
allowed to have the source for their computation engine – just a
library. It worked fine. I saved/restored the state and wrapped the
calls in __try/__except and everyone was happy.

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
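
(Editorial illustration, not code from the thread.) A minimal sketch of the save/compute/restore pattern described above, for 32-bit x86 drivers. `KeSaveFloatingPointState`, `KeRestoreFloatingPointState`, `KFLOATING_SAVE` and `NT_SUCCESS` are the real WDK names; the macro `SKETCH_KERNEL_BUILD`, the helper `scale_batch`, and the user-mode stand-ins in the `#else` branch are assumptions of this example, added only so the sketch compiles outside the WDK. The `__try/__except` wrapper mentioned above is omitted because it is compiler-specific.

```c
#ifdef SKETCH_KERNEL_BUILD          /* hypothetical switch for a real driver build */
#include <ntddk.h>
#else
/* User-mode stand-ins (assumptions) that mimic the WDK declarations. */
typedef long NTSTATUS;
typedef struct { int unused; } KFLOATING_SAVE;
#define STATUS_SUCCESS ((NTSTATUS)0)
#define NT_SUCCESS(s) ((s) >= 0)
static NTSTATUS KeSaveFloatingPointState(KFLOATING_SAVE *s) { (void)s; return STATUS_SUCCESS; }
static NTSTATUS KeRestoreFloatingPointState(KFLOATING_SAVE *s) { (void)s; return STATUS_SUCCESS; }
#endif

/* Hypothetical helper: scales each sample by gain_percent / 100 using floats.
   All floating-point work happens between the save and the restore, and the
   save cost is paid once per batch rather than once per sample. */
static NTSTATUS scale_batch(const int *in, int count, int gain_percent, int *out)
{
    KFLOATING_SAVE save;
    NTSTATUS status = KeSaveFloatingPointState(&save);
    int i;

    if (!NT_SUCCESS(status)) {
        return status;              /* no FP may be used if the save failed */
    }

    {
        float gain = (float)gain_percent / 100.0f;
        for (i = 0; i < count; i++) {
            out[i] = (int)((float)in[i] * gain);   /* truncates toward zero */
        }
    }

    KeRestoreFloatingPointState(&save);   /* always pair with a successful save */
    return STATUS_SUCCESS;
}
```

The gain is passed as an integer and converted inside the saved region so that the caller never touches a floating-point value; whether that precaution is sufficient for a given compiler and optimization level is something to verify in the generated code.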

Doron_Holan · June 5, 2007, 12:41pm

Having written exactly such a driver, you should use fixed point math
and forgo FP, especially in a timing sensitive stack like mouse input.

d

Peter_Viscarola_OSR · June 5, 2007, 1:56pm

Unless you’re writing a very special driver for a very special, limited use, device you should forget about using traditional floating point: It’s not supported on x64 (and if you intend to WHQL this driver, you care… cuz you can no longer WHQL just a 32-bit driver).

Of course, for things like signal processing, you *can* use the SIMD (SSE) instructions. But that seems to be waaaaay beyond what we’re talking about here.

Peter
OSR

Alyah_Nihal · June 5, 2007, 3:28pm

Tim Roberts wrote:

The key question to answer is whether the overhead of
KeSave/RestoreFloatingPointState will overwhelm the added cost of doing
the computations in fixed point. Saving the floating point state is
somewhat expensive. If each request needs three floating point
instructions, then it’s probably better to rewrite it. But if you have
60 lines of floating point code in an inner loop somewhere, then the
overhead will be neatly amortized.

Yes, I have had similar thoughts.
If you choose floating-point arithmetic, you have the KeSave/RestoreFloatingPointState overhead and inefficient float-to-int truncation (the x87 default rounding mode is round-to-nearest, so every truncating conversion has to switch modes).
If you choose fixed-point arithmetic, you have inefficient multiplication, division and, most importantly, square root computation.
(According to some postings on the Intel forums, modern FPUs are faster at multiplication and division than modern ALUs, especially with SIMD.)

Doron Holan wrote:

Having written exactly such a driver, you should use fixed point math
and forgo FP, especially in a timing sensitive stack like mouse input.

I have done some tests with relatively weak machines (P3, P4) and couldn’t measure any significant difference between the two methods.

Peter Viscarola wrote:

Unless you’re writing a very special driver for a very special, limited use,
device you should forget about using traditional floating point: It’s not
supported on x64 (and if you intend to WHQL this driver, you care… cuz you can
no longer WHQL just a 32-bit driver).

I’m afraid I don’t understand your objection; the source code compiles just fine for AMD64 (using SSE2, of course).
Additionally, the registers are automatically context switched, so no more overhead.
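
(Editorial illustration, not code from the thread.) The fixed-point square root mentioned above can be done with integer operations only. The sketch below uses the classic bit-by-bit integer square root and builds a Q16.16 hypot on top of it; the function names `isqrt64` and `q16_hypot` are hypothetical. It relies on the fact that the square of a Q16.16 value has 32 fractional bits, so the integer square root of the sum of squares is again in Q16.16.

```c
#include <stdint.h>

/* Floor of the square root of n, computed one result bit at a time. */
static uint32_t isqrt64(uint64_t n)
{
    uint64_t root = 0;
    uint64_t bit = (uint64_t)1 << 62;   /* highest power of 4 in 64 bits */

    while (bit > n) {
        bit >>= 2;
    }
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)root;
}

/* hypot for Q16.16 inputs: sqrt(a*a + b*b), result in Q16.16.
   Each square fits in 62 bits, so the sum fits in an unsigned 64-bit value. */
static uint32_t q16_hypot(int32_t a, int32_t b)
{
    uint64_t aa = (uint64_t)((int64_t)a * (int64_t)a);
    uint64_t bb = (uint64_t)((int64_t)b * (int64_t)b);
    return isqrt64(aa + bb);
}
```

The loop runs at most 32 iterations of shifts, adds and compares, which gives a rough idea of the cost being compared against a single hardware square root instruction.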

Tim_Roberts · June 5, 2007, 4:16pm

xxxxx@osr.com wrote:

Unless you’re writing a very special driver for a very special, limited use, device you should forget about using traditional floating point: It’s not supported on x64

Where do you see that? I thought that this whole problem went away on
x64, specifically because the floating point registers ARE now saved
during a kernel context switch, so that KeSaveFloatingPointState is a no-op.

As another poster said, the empirical evidence suggests that it works
just fine. Our floating-point based audio filter runs happily on Vista
64. It is in WHQL now, and we expect approval any time.

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Peter_Viscarola_OSR · June 5, 2007, 4:47pm

No. Sorry. We’ve talked about this before on this list. Several times.

X87 style floating point IS NOT supported in the kernel on x64. Neither are MMX instructions.

SSE/SSE2? Yes. x87 floating point, MMX, 3DNow? No.

If it works on Vista, you’re just lucky that nobody else is using the FPU.

From OSR Online:

http://www.osronline.com/article.cfm?article=244

(though I don’t suppose citing OSR Online will be convincing…)

So:

http://msdn2.microsoft.com/en-us/library/a32tsf7t(vs.80).aspx

http://www.amd.com/us-en/assets/content_type/DownloadableAssets/dwamd_Porting_Win_DD_to_AMD64_Sept24.pdf

Peter
OSR

Tim_Roberts · June 5, 2007, 5:23pm

xxxxx@osr.com wrote:

No. Sorry. We’ve talked about this before on this list. Several times.

X87 style floating point IS NOT supported in the kernel on x64. Neither are MMX instructions.

SSE/SSE2? Yes. x87 floating point, MMX, 3DNow? No.

If it works on Vista, you’re just lucky that nobody else is using the FPU.

I think we are talking about two separate things here. I don’t care
about x87-style floating point instructions or inline assembler, which
is what your references mention. Those are no-brainers. When I say
“floating point in the kernel”, what I’m talking about is whatever the
amd64 compiler generates when I write C code to do floating point
computations. As near as I can tell, the amd64 compiler generates
XMM-based instruction sequences.

Given that, the restriction against floating point C code in the kernel
simply no longer applies in x64.

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
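
(Editorial illustration, not code from the thread.) The kind of plain C floating-point code under discussion might look like the sketch below: a per-axis gain applied to a mouse delta, with the fractional remainder carried forward so that slow movements are not lost to truncation. The function name `scale_axis` and the remainder scheme are assumptions of this example. According to the posts above, the amd64 compiler should turn code like this into XMM-based instructions; the generated code has not been inspected here.

```c
/* Hypothetical helper: scales one axis delta by a float gain.
   *rem holds the fraction left over from earlier calls and is updated. */
static int scale_axis(int delta, float gain, float *rem)
{
    float scaled = (float)delta * gain + *rem;
    int whole = (int)scaled;            /* C conversion truncates toward zero */

    *rem = scaled - (float)whole;       /* keep the fraction for the next call */
    return whole;
}
```

With a gain of 0.5, two consecutive one-count movements produce outputs 0 and then 1, instead of 0 and 0 as plain truncation would give.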

Peter_Viscarola_OSR · June 5, 2007, 5:27pm

Ah, perhaps (talking about different things)… that’d account for it. “What you see is a function of where you sit” I always say.

I do know that the CRTL was changed for x64 to use SSE/SSE2 (SIMD) instructions instead of native FPP x87 style instructions. Assuming your C code appropriately incorporates these libraries, you are indeed completely okey dokey.

Peter
OSR

OSR_Community_User · June 5, 2007, 10:03pm

Just out of curiosity, why aren’t the floating point and MMX
registers context switched? I understand from a historical
perspective, but I wonder why it hasn’t changed.

-J

OSR_Community_User · June 5, 2007, 10:06pm

I think that answers my question… you can do floating point in
your x64 C code, just not using any legacy methodology. Is that right?

-J

OSR_Community_User · June 5, 2007, 10:31pm

There are a large number of them, and saving registers on every context switch is expensive. While user-mode code can’t be trusted to maintain its own processor state, the theory was that kernel-mode code could be relied on to indicate when it was dirtying the registers.

-p

Mike_Kemp · June 6, 2007, 3:48am

Let me throw in another warning. There is a problem with Intel FPUs when
either operand is denormal. They fully support denormal values but can take
more than 100 times longer to do the calculation than with normalised values.
Denormal values occur frequently, and in recursive filters especially they
can persist forever. All processes should be tested with denormal input to
see if it slows the driver below acceptable levels. A Google search will
find workarounds, mostly a little painful.

M
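
(Editorial illustration, not code from the thread.) One common workaround for the denormal problem described above is to flush tiny filter state to zero by hand. The sketch below shows a one-pole recursive filter whose state decays into the denormal range when the input goes silent, plus an optional flush step. The names `is_denormal` and `one_pole_step` are hypothetical, and the comparison against `FLT_MIN` is only one of several possible approaches.

```c
#include <float.h>

/* True if x is nonzero but smaller in magnitude than the smallest normal float. */
static int is_denormal(float x)
{
    return x != 0.0f && x < FLT_MIN && x > -FLT_MIN;
}

/* One step of y = coeff * y + (1 - coeff) * x.
   If flush is nonzero, a result below the normal range is replaced by 0. */
static float one_pole_step(float state, float input, float coeff, int flush)
{
    float next = coeff * state + (1.0f - coeff) * input;

    if (flush && next < FLT_MIN && next > -FLT_MIN) {
        next = 0.0f;    /* stop the state from lingering as a denormal */
    }
    return next;
}
```

Feeding zero input with `coeff = 0.5f` halves the state on every step; starting from 1.0 and without the flush, the state is a denormal after about 127 steps and stays in that range for roughly 20 more. With the flush enabled it becomes exactly zero at that point, at the cost of two extra comparisons per step.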

Peter_Viscarola_OSR · June 6, 2007, 10:15am

Peter’s answer is, of course, correct for the general case – Why would we want to save all the FP registers on every context switch when so few processes actually use them?

In terms of why “legacy” FPP is not supported in native x64 code: According to the (very senior) folks at both AMD and MSFT I talked to about this, they specifically decided not to support MMX/3DNow/x87 under Windows in x64 code. I seem to recall that this had *something* to do with the relative performance of legacy x87 FPP code on the x64, and how the SSE-type instructions were highly optimized.

Correct.
As long as the code you generate uses SSE/SSE2 for all FP operations, you’re absolutely safe.

Peter
OSR

Tim_Roberts · June 6, 2007, 1:08pm

zeppelin@io.com wrote:

I think that answers my question… you can do floating point in your
x64 C code, just not using any legacy methodology. Is that right?

I believe that to be correct, yes. “Legacy methodology”, in this case,
would have to mean a separate .asm file, since there is no other way of
generating x87 FPU instructions in x64 code.

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.