KeQueryPerformanceCounter and RDTSC

Hello,

I’ve been experimenting with high-precision timing for gaming and have been using both KeQueryPerformanceCounter and RDTSC for this purpose. Both seem to deliver fantastic precision at the nanosecond level.

However, I heard that KeQueryPerformanceCounter may have slightly more overhead compared to RDTSC. Can anyone confirm or deny this from their own experience?

Also, I’m interested in using IA32_TSC_DEADLINE in Windows. I checked CPUID (leaf 1, ECX) and bit 24 returns 1, indicating that the feature is available, but I am not sure whether Windows supports it. Has anyone been able to use IA32_TSC_DEADLINE in a Windows environment, or can anyone provide insights about this?
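For reference, the CPUID check described above can be sketched roughly as follows. This is only an illustration: the helper names cpuid and has_tsc_deadline are made up for this example, and it assumes an x86 target built with MSVC, GCC, or Clang. It reports what the CPU advertises, not whether a driver may safely use the feature.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <cpuid.h>
#endif

// Registers returned by a single CPUID query.
struct CpuidRegs {
    std::uint32_t eax, ebx, ecx, edx;
};

// Hypothetical helper: run CPUID for the given leaf with subleaf 0.
inline CpuidRegs cpuid(std::uint32_t leaf) {
    CpuidRegs r{};
#if defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};
    __cpuidex(regs, static_cast<int>(leaf), 0);
    r.eax = static_cast<std::uint32_t>(regs[0]);
    r.ebx = static_cast<std::uint32_t>(regs[1]);
    r.ecx = static_cast<std::uint32_t>(regs[2]);
    r.edx = static_cast<std::uint32_t>(regs[3]);
#else
    __cpuid_count(leaf, 0, r.eax, r.ebx, r.ecx, r.edx);
#endif
    return r;
}

// CPUID leaf 1, ECX bit 24: the local APIC timer supports TSC-deadline mode.
inline bool has_tsc_deadline() {
    return ((cpuid(1).ecx >> 24) & 1u) != 0;
}
```

Inside a virtual machine the hypervisor may hide or emulate this bit, so the result is not necessarily the same as on bare metal.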

Lastly, I have heard that AMD’s RDPRU instruction can be used as a timer and is supposed to be faster than RDTSC. Is this correct, and has anyone used it for game timing?

Looking forward to your insights. Thanks.

I’ve used it only sparingly, but here is a detailed article which may provide the answers you seek: [https://learn.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps]

The __rdtsc() intrinsic compiles to exactly one CPU instruction. It is not possible to find anything faster.
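To put a rough number on the overhead question, something like the sketch below could be used. KeQueryPerformanceCounter is only callable in kernel mode, so this user-mode illustration substitutes std::chrono::steady_clock as a stand-in. That substitution is an assumption of the example (as far as I know, MSVC's standard library builds steady_clock on QueryPerformanceCounter), and the function names are invented here.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <x86intrin.h>
#endif

// Average number of TSC ticks consumed by one back-to-back __rdtsc() call.
inline double ticks_per_rdtsc(int iterations) {
    assert(iterations > 0);
    std::uint64_t sink = 0;
    const std::uint64_t start = __rdtsc();
    for (int i = 0; i < iterations; ++i) {
        sink += __rdtsc();
    }
    const std::uint64_t end = __rdtsc();
    volatile std::uint64_t keep = sink;  // keep the loop's result observable
    (void)keep;
    return static_cast<double>(end - start) / iterations;
}

// Average number of TSC ticks consumed by one steady_clock::now() call.
// steady_clock is a user-mode stand-in here, not KeQueryPerformanceCounter.
inline double ticks_per_clock_read(int iterations) {
    assert(iterations > 0);
    std::int64_t sink = 0;
    const std::uint64_t start = __rdtsc();
    for (int i = 0; i < iterations; ++i) {
        sink += std::chrono::steady_clock::now().time_since_epoch().count();
    }
    const std::uint64_t end = __rdtsc();
    volatile std::int64_t keep = sink;  // keep the loop's result observable
    (void)keep;
    return static_cast<double>(end - start) / iterations;
}
```

Calling both with, say, a million iterations and comparing the two averages gives a ballpark figure. The results depend heavily on the CPU, the OS, and whether the code runs under a hypervisor.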

The problem with RDTSC is and has always been that the register is not synchronized across multiple cores. If your process happens to change CPUs, you might get a discrepancy. The other problem is that you don’t necessarily know at what frequency it is running.
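On the frequency point, one common workaround is to calibrate the TSC against a clock whose units are known. The sketch below is a minimal illustration, assuming the TSC ticks at a constant rate during the measurement window and that the thread does not move to a core with an unsynchronized counter. The name estimate_tsc_hz is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <x86intrin.h>
#endif

// Estimate the TSC frequency in Hz by counting ticks over a known interval.
inline double estimate_tsc_hz(std::chrono::milliseconds window) {
    using clock = std::chrono::steady_clock;
    assert(window.count() > 0);

    const clock::time_point t0 = clock::now();
    const std::uint64_t c0 = __rdtsc();

    // Busy-wait so the thread is less likely to be descheduled mid-measurement.
    while (clock::now() - t0 < window) {
    }

    const std::uint64_t c1 = __rdtsc();
    const clock::time_point t1 = clock::now();

    const double seconds = std::chrono::duration<double>(t1 - t0).count();
    return static_cast<double>(c1 - c0) / seconds;
}
```

A longer window reduces the relative error from the two clock reads, at the cost of a slower startup.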

Correct. Or, more precisely, you don’t know if rdtsc synchronizes properly among all the cores in your system.

For a long time, rdtsc has worked “correctly” across all cores on one chip/socket. The problem now is that in systems with multiple sockets, the tsc is not synchronized.
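Whether the TSC ticks at a constant rate can at least be queried from the CPU. The sketch below checks the "invariant TSC" bit (CPUID leaf 0x80000007, EDX bit 8). The function name is made up for the example, and the bit only says the counter runs at a constant rate across power states. It does not by itself prove that the counters on different sockets are synchronized.

```cpp
#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <cpuid.h>
#endif

// Returns true if the CPU advertises an invariant TSC
// (CPUID leaf 0x80000007, EDX bit 8).
inline bool has_invariant_tsc() {
#if defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};
    // Leaf 0x80000000 reports the highest supported extended leaf in EAX.
    __cpuid(regs, static_cast<int>(0x80000000u));
    if (static_cast<unsigned>(regs[0]) < 0x80000007u) {
        return false;
    }
    __cpuid(regs, static_cast<int>(0x80000007u));
    return ((static_cast<unsigned>(regs[3]) >> 8) & 1u) != 0;
#else
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 when the requested leaf is not supported.
    if (!__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return ((edx >> 8) & 1u) != 0;
#endif
}
```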

Also… if you need to do this right, you need to be very sure you know what is being timed. For example, whether prefetches are included. Google around… there is a ton of really interesting info on this question out there.
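One piece of "knowing what is being timed" is keeping the CPU from reordering the timestamp reads relative to the work being measured. A commonly used pattern is LFENCE around RDTSC at the start and RDTSCP followed by LFENCE at the end. The sketch below illustrates that pattern. The name timed_ticks is hypothetical, and the code assumes the CPU supports RDTSCP and that LFENCE orders instruction execution (the usual configuration on current Intel and AMD parts).

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <x86intrin.h>
#endif

// Measure how many TSC ticks elapse while work() runs.
template <typename F>
inline std::uint64_t timed_ticks(F&& work) {
    unsigned aux = 0;  // receives IA32_TSC_AUX (often a CPU/node id), unused here

    _mm_lfence();                          // let earlier instructions finish first
    const std::uint64_t start = __rdtsc();
    _mm_lfence();                          // do not start work() before the read

    work();

    const std::uint64_t end = __rdtscp(&aux);  // waits for prior instructions
    _mm_lfence();                              // keep later code from moving up

    return end - start;
}
```

This only controls instruction ordering. Cache and prefetcher state, interrupts, and frequency changes still influence the result, so repeated runs and looking at the minimum or median are usually necessary.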

Also, I’m interested in implementing IA32_TSC_DEADLINE in Windows.

You need to forget about that. The lowest-level timer interrupts are under the complete control of the operating system scheduler. If you dink with it, it’s likely that scheduling will come crashing down.

That bit is designed for people writing their own operating system.

I see, so Windows can’t handle that scheduling. Thanks!