Let me throw in another warning. There is a problem with Intel FPUs when
either operand is denormal. They fully support denormal values but can take
more than 100 times longer to do the calculation than with normalised values.
Denormal values occur frequently, and if you are working with recursive
filters they can persist forever. All processes should be tested with
denormal input to see whether it slows the driver below acceptable levels. A
Google search will find workarounds, most of them a little painful.
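To make the recursive-filter case concrete, here is a minimal sketch in plain C. It is not taken from any real driver: the names is_denormal, flush_tiny, count_denormal_outputs and the threshold FLUSH_THRESHOLD are all made up for illustration, and snapping small values to zero is only one of the commonly described workarounds. Whether it is acceptable depends on your signal.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Assumed threshold for this sketch: anything smaller is treated as silence. */
#define FLUSH_THRESHOLD 1.0e-20f

/* Nonzero, but smaller in magnitude than the smallest normal float. */
static int is_denormal(float x)
{
    return x != 0.0f && x > -FLT_MIN && x < FLT_MIN;
}

/* Workaround: snap values too small to matter to exactly zero,
   long before they can decay into the denormal range. */
static float flush_tiny(float x)
{
    return (x > -FLUSH_THRESHOLD && x < FLUSH_THRESHOLD) ? 0.0f : x;
}

/* One-pole recursive filter y[n] = x[n] + decay * y[n-1], fed with silence.
   Returns how many of the `steps` outputs were denormal. With decay close
   to 1 the state can get stuck at a denormal value instead of reaching 0. */
static size_t count_denormal_outputs(float state, float decay,
                                     size_t steps, int use_flush)
{
    size_t hits = 0;

    assert(decay > 0.0f && decay < 1.0f);

    for (size_t i = 0; i < steps; i++) {
        state = 0.0f + decay * state;      /* input sample is zero */
        if (use_flush)
            state = flush_tiny(state);
        if (is_denormal(state))
            hits++;
    }
    return hits;
}
```

For example, count_denormal_outputs(1.0f, 0.99f, 20000, 0) should report a nonzero count on hardware using default IEEE 754 rounding, while the same call with use_flush set to 1 should report none. Timing the two variants on the target machine is the real test.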
M
----- Original Message -----
From: xxxxx@hushmail.com
To: Windows System Software Devs Interest List
Sent: Tuesday, June 05, 2007 8:29 PM
Subject: RE:[ntdev] Floating-point vs. fixed-point arithmetic
Tim Roberts wrote:
The key question to answer is whether the overhead of
KeSave/RestoreFloatingPointState will overwhelm the added cost of doing
the computations in fixed point. Saving the floating point state is
somewhat expensive. If each request needs three floating point
instructions, then it’s probably better to rewrite it. But if you have
60 lines of floating point code in an inner loop somewhere, then the
overhead will be neatly amortized.
Yes, I have had similar thoughts.
If you choose floating-point arithmetic, you have the
KeSave/RestoreFloatingPointState overhead and inefficient float to int
truncation (x87’s default is rounding).
If you choose fixed-point arithmetic, you have inefficient multiplication,
division and, most importantly, square root computation.
(According to some postings on the Intel forums, modern FPUs are faster at
multiplication and division than modern ALUs, especially with SIMD.)
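To show where the fixed-point cost comes from, here is a rough sketch in C, assuming a signed Q16.16 format. The format and the names q16_t, q16_mul, q16_div, q16_sqrt and isqrt64 are my own choices for illustration, not anything from the driver under discussion. Each operation needs a 64-bit intermediate, and the square root is a bit-by-bit loop rather than a single instruction.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t q16_t;                 /* Q16.16: 16 integer, 16 fraction bits */
#define Q16_ONE ((q16_t)65536)

/* Convert a small integer to Q16.16 (caller keeps n within 16 bits). */
static q16_t q16_from_int(int32_t n)
{
    return (q16_t)(n * Q16_ONE);
}

/* Multiply: widen to 64 bits so the product cannot overflow, then
   divide out the extra 2^16 scale (truncates toward zero). */
static q16_t q16_mul(q16_t a, q16_t b)
{
    return (q16_t)(((int64_t)a * b) / Q16_ONE);
}

/* Divide: pre-scale the dividend by 2^16 in 64 bits to keep the fraction. */
static q16_t q16_div(q16_t a, q16_t b)
{
    assert(b != 0);
    return (q16_t)(((int64_t)a * Q16_ONE) / b);
}

/* Classic bitwise integer square root: floor(sqrt(v)). */
static uint32_t isqrt64(uint64_t v)
{
    uint64_t result = 0;
    uint64_t bit = (uint64_t)1 << 62;  /* highest power of four in 64 bits */

    while (bit > v)
        bit >>= 2;
    while (bit != 0) {
        if (v >= result + bit) {
            v -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)result;
}

/* sqrt in Q16.16: sqrt(raw / 2^16) * 2^16 == sqrt(raw * 2^16). */
static q16_t q16_sqrt(q16_t a)
{
    assert(a >= 0);
    return (q16_t)isqrt64((uint64_t)a * 65536u);
}
```

For instance, q16_sqrt(q16_from_int(9)) yields q16_from_int(3) exactly, and q16_mul(q16_from_int(3), Q16_ONE / 2) yields the Q16.16 encoding of 1.5. Whether this beats floating point in practice is exactly what would need measuring on the target hardware.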
Doron Holan wrote:
Having written exactly such a driver, you should use fixed point math
and forgo FP, especially in a timing sensitive stack like mouse input.
I have done some tests on relatively weak machines (P3, P4) and couldn’t
measure any significant difference between the two methods.
Peter Viscarola wrote:
Unless you’re writing a very special driver for a very special, limited use,
device you should forget about using traditional floating point: It’s not
supported on x64 (and if you intend to WHQL this driver, you care… cuz you can
no longer WHQL just a 32-bit driver).
I’m afraid I don’t understand your objection; the source code compiles just
fine for AMD64 (using SSE2, of course).
Additionally, those registers are automatically context switched, so there is
no extra save/restore overhead.