Issues with SSE operations

Hello, I'm running into issues with using double types for floating-point arithmetic. When I debug my code with godbolt, the operations on double types seem to revert to cvtsi2sd, and it affects the entire driver.

To work around this, I had to create a separate thread using IoCreateSystemThread and perform all floating-point operations within that thread. This solution works, but I'm curious why this issue occurs and if there are any better practices or insights into handling floating-point arithmetic in kernel-mode drivers. Can anyone explain why this happens and if there are any recommended approaches to handle this more effectively? Thanks!

I have no idea what you mean by "affects the entire driver". In 64-bit code, the compiler should use SSE for floating point, and the SSE registers are saved and restored on context switch.
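For illustration, here is a minimal sketch of the kind of conversion where cvtsi2sd typically shows up in x64 compiler output. The function name and parameters are hypothetical, not taken from the driver in question. cvtsi2sd is itself an ordinary SSE2 instruction that converts a signed integer to a double, so seeing it in the listing is consistent with the compiler using SSE.

```c
/* Hypothetical helper for illustration only. */
double cycles_to_seconds(long long cycles, long long frequency_hz)
{
    /* On x64, each signed integer-to-double cast below is typically
       compiled to cvtsi2sd, and the division to divsd. */
    return (double)cycles / (double)frequency_hz;
}
```

Calling cycles_to_seconds(6000000000LL, 3000000000LL) returns 2.0.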

Sorry if I didn't explain it correctly. When I say it affects the entire driver, I mean that the floating-point operations using SSE impact system-wide behavior, particularly causing issues with other drivers. In 64-bit code, the compiler does use SSE for floating point, and the SSE registers are indeed saved and restored on context switch. However, despite this, performing floating-point operations directly in the kernel can still cause stability issues and affect other drivers. To mitigate this, I've resorted to creating a separate thread using IoCreateSystemThread to handle all floating-point operations.

A quick search suggests that godbolt is some kind of online tool that shows likely assembly from snippets of C++ (and other languages) from a list of compilers. This does not seem like a good tool to look into a problem like this, because it is unlikely that the displayed output actually matches the instruction stream generated by the real compiler.

As to why your code sequence works differently when run in a dedicated thread, we're going to need a bunch more information before anyone can help very much. Probably the first thing is what type of driver is this? Which are the entry points where it doesn't work? What compiler & linker options are you using?

What I'm working on involves dividing the cycle count by the CPU's frequency. This division yields two values: time_seconds (the whole seconds) and time_fraction (the fractional part of a second). To avoid precision loss due to very high cycle counts, I've implemented a mechanism where the timer wraps more frequently (every year). This ensures that the fractional part of the time remains accurate. The driver is a kernel-mode driver for timing functions, utilizing the rdtsc (Read Time-Stamp Counter) instruction. Regarding the issues:

  • Type of Driver: Kernel-mode driver for timing functions.
  • Entry Points: The problems occur during timing calculations that rely on floating-point arithmetic.
  • Compiler & Linker Options:
    • Compiler: MSVC (Microsoft Visual C++)
    • Options: `/D "USE_LIBCNTPR=1" /std:c17 /GR- /Gz /Oy- /Oi /MT /std:c++20 /fp:precise`
    • Linker: Standard kernel-mode driver linker settings provided by the WDK, plus libcntpr.lib

Normally, performance monitoring software will collect data in KM, but then store it or transfer it to UM for analysis. This helps to reduce the effect of the measuring on the values that you are trying to measure, and simplifies the implementation of the software.
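As a rough sketch of that split, assuming a hypothetical record layout: the driver would only store raw counter values, and the conversion to seconds would happen on the UM side, where floating point is unproblematic. The struct and function names here are made up for illustration, and how the records get from KM to UM is not shown.

```c
#include <stdint.h>

/* Hypothetical record: the driver fills in raw counter values only. */
typedef struct {
    uint64_t start_cycles;
    uint64_t end_cycles;
} raw_sample;

/* User-mode side: convert one raw record to elapsed seconds. */
double sample_seconds(const raw_sample *s, uint64_t frequency_hz)
{
    /* Unsigned subtraction stays correct even if the counter wrapped once. */
    uint64_t elapsed = s->end_cycles - s->start_cycles;
    return (double)elapsed / (double)frequency_hz;
}
```

With this arrangement the kernel code never touches a floating-point register for the measurement itself.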

In KM, you can't just say 'places I want to watch'. You need to understand the thread context. Standard entry points can help us to understand where you are expecting your code to execute. If they are not standard, because say, you are trying to benchmark arbitrary code, then that's a factor too.

Also, I would be remiss if I didn't suggest that you probably don't want to be using floating point at all for this kind of calculation. And /fp:precise isn't going to help you.

All IEEE floating-point formats suffer from numeric instability when performing operations on values that differ significantly in scale. The error is hard to quantify, and so for many years fixed-point alternatives have existed. Even .NET has included the decimal type since its inception about 20 years ago because of this problem.
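To make this concrete, here is a minimal integer-only sketch of splitting a cycle count into whole seconds and a fractional part. It is one possible approach, not a description of your driver: the type and function names are hypothetical, the fraction is expressed in nanoseconds as an assumption, and the frequency is assumed to be known and nonzero. The second function shows where a double stops representing cycle counts exactly.

```c
#include <stdint.h>

/* Hypothetical result type: whole seconds plus a nanosecond fraction. */
typedef struct {
    uint64_t seconds;      /* whole seconds */
    uint32_t nanoseconds;  /* fractional part, 0..999999999 */
} split_time;

/* Integer-only conversion; frequency_hz must be nonzero. */
split_time cycles_to_time(uint64_t cycles, uint64_t frequency_hz)
{
    split_time t;
    uint64_t remainder = cycles % frequency_hz; /* cycles into the current second */

    t.seconds = cycles / frequency_hz;
    /* remainder < frequency_hz, so the product fits in 64 bits as long
       as frequency_hz stays below roughly 18.4 GHz. */
    t.nanoseconds = (uint32_t)(remainder * 1000000000ULL / frequency_hz);
    return t;
}

/* Returns 1 if the cycle count survives a round trip through double. */
int survives_double(uint64_t cycles)
{
    return (uint64_t)(double)cycles == cycles;
}
```

A double has a 53-bit significand, so survives_double((1ULL << 53) + 1) returns 0. At an assumed 3 GHz that count is reached after about 35 days, whereas the integer version stays exact for the full 64-bit range.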

Thank you very much. I will stick with creating a separate thread, since I want to do everything inside the kernel :hugs:

Well, if you are happy, then sure. But it seems likely that you will have to deal with the other points eventually to produce a successful product.