Interesting issue with (Ke)QueryPerformanceCounter

Bummer

On Sep 1, 2010 5:16 PM, “Gianluca Varenni”
wrote:
> Just checked with the customer. Nope. It’s an HP 8000 Elite.
>
> Have a nice day
> GV
>
> --------------------------------------------------
> From: “M. M. O’Brien”
> Sent: Wednesday, September 01, 2010 1:49 PM
> To: “Windows System Software Devs Interest List”
> Subject: RE: RE:[ntdev] Interesting issue with (Ke)QueryPerformanceCounter
>
>> Is this running in a VM?
>>
>>
>> mm
>>
>> -----Original Message-----
>> From: xxxxx@lists.osr.com
>> [mailto:xxxxx@lists.osr.com] On Behalf Of Gianluca Varenni
>> Sent: Wednesday, September 01, 2010 4:27 PM
>> To: Windows System Software Devs Interest List
>> Subject: Re: RE:[ntdev] Interesting issue with
(Ke)QueryPerformanceCounter
>>
>> One thing is sleeping less than exactly one second, one other thing is
>> reporting numbers that are almost 10 times off.
>>
>> The test runs for about 10 seconds. The difference between the first and
>> the
>> last sample is around 9 seconds (remember that I print the result of
>> GetSystemTime). QueryPerformanceCounter reports that the time difference
>> is
>> 1.17 seconds.
>>
>> GV
>>
>>
>> --------------------------------------------------
>> From:
>> Sent: Wednesday, September 01, 2010 1:17 PM
>> To: “Windows System Software Devs Interest List”
>> Subject: RE:[ntdev] Interesting issue with (Ke)QueryPerformanceCounter
>>
>>>> Has anyone ever seen this?
>>>
>>> We had a discussion with Michal around a week ago about something
>>> similar.
>>
>>> Basically, you can sleep less than the amount of time you have specified
>>> under some circumstances. You can check "Some doubts about
>>> KeStallExecutionProcessor and KeDelayExecutionThread’ thread
>>> (http://www.osronline.com/showThread.cfm?link=187626) for more info…
>>>
>>>
>>> Anton Bassov
>>>
>>> —
>>> NTDEV is sponsored by OSR
>>>
>>> For our schedule of WDF, WDM, debugging and other seminars visit:
>>> http://www.osr.com/seminars
>>>
>>> To unsubscribe, visit the List Server section of OSR Online at
>>> http://www.osronline.com/page.cfm?name=ListServer
>>
>>
>> —
>> NTDEV is sponsored by OSR
>>
>> For our schedule of WDF, WDM, debugging and other seminars visit:
>> http://www.osr.com/seminars
>>
>> To unsubscribe, visit the List Server section of OSR Online at
>> http://www.osronline.com/page.cfm?name=ListServer
>>
>>
>> —
>> NTDEV is sponsored by OSR
>>
>> For our schedule of WDF, WDM, debugging and other seminars visit:
>> http://www.osr.com/seminars
>>
>> To unsubscribe, visit the List Server section of OSR Online at
>> http://www.osronline.com/page.cfm?name=ListServer
>
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Gianluca Varenni wrote:

A customer of mine reported a strange issue, and I tracked the issue down to
wrong values returned by KeQueryPerformanceCounter (or the user counterpart
QueryPerformanceCounter/Frequency).

What I get on some specific customer machine is something that is not “ticks
per second”. Either the frequency is bogus, or the counter itself.

If you make some computations, the counter seems to say that every iteration
is about 0.12 seconds apart from the previous one, while it should be more
or less 1 second.
The failing machine is running XP SP3 (hal.dll 5.1.2600.5687, ntoskrnl.exe
5.1.2600.5938) on an Intel Core 2 Duo E8500.

Has anyone ever seen this?

QPC is, in this case, directly reading the processor’s cycle counter.
The processor is running at 3.1 GHz. If the machine is not doing very
much, it’s possible for the cycle frequency to change because of power
management. I have not seen differences quite this dramatic, but it
would be interesting to change your program to do __readtsc() during
each loop and see if the results are identical to QPC.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Hello,

I can’t offer an explanation either, but have you already tried running
‘bcdedit /set useplatformclock yes’ (or setting /USEPMTIMER in boot.ini),
the standard workaround for QPC-related problems?

On Wed, 01 Sep 2010 22:34:23 +0300, Gianluca Varenni
wrote:
> Hi all.
>
> A customer of mine reported a strange issue, and I tracked the issue
> down to wrong values returned by KeQueryPerformanceCounter (or the user
> counterpart QueryPerformanceCounter/Frequency).
>
> The MSDN docs clearly state that
>
> KeQueryPerformanceCounter always returns a 64-bit integer that
> represents the current value of a monotonically nondecreasing counter.
> The resolution of the timer used to accumulate the current count can be
> obtained by specifying a pointer value for PerformanceFrequency. The
> value that is returned is the number of ticks per second. The count
> starts accumulating when the computer is started up.
>
>
> What I get on some specific customer machine is something that is not
> “ticks per second”. Either the frequency is bogus, or the counter itself.
>
> I wrote a small user app to test it:
>
> int _tmain(int argc, _TCHAR* argv)
> {
> int i;
> LARGE_INTEGER freq, counter;
> SYSTEMTIME systemTime;
>
> for (i = 0; i < 10; i++)
> {
> GetSystemTime(&systemTime);
>
> if (QueryPerformanceCounter(&counter) == FALSE ||
> QueryPerformanceFrequency(&freq) == FALSE)
> {
> printf(“Error getting performance counters\n”);
> }
>
> printf(“Counter = %I64d Freq = %I64d”, counter.QuadPart,
> freq.QuadPart);
> printf(" system: %02d:%02d:%02d\n",
> systemTime.wHour,
> systemTime.wMinute,
> systemTime.wSecond);
>
> Sleep(1000);
> }
>
> return 0;
> }
>
> And this is the output on the failing machine:
>
> Counter = 56497408135970 Freq = 3158750000 system: 01:18:57
> Counter = 56497818110218 Freq = 3158750000 system: 01:18:58
> Counter = 56498206404250 Freq = 3158750000 system: 01:18:59
> Counter = 56498715528269 Freq = 3158750000 system: 01:19:00
> Counter = 56499113984213 Freq = 3158750000 system: 01:19:01
> Counter = 56499489120527 Freq = 3158750000 system: 01:19:02
> Counter = 56499867623469 Freq = 3158750000 system: 01:19:03
> Counter = 56500252073478 Freq = 3158750000 system: 01:19:04
> Counter = 56500713252945 Freq = 3158750000 system: 01:19:05
> Counter = 56501096560332 Freq = 3158750000 system: 01:19:06
>
> If you make some computations, the counter seems to say that every
> iteration is about 0.12 seconds apart from the previous one, while it
> should be more or less 1 second.
> The failing machine is running XP SP3 (hal.dll 5.1.2600.5687,
> ntoskrnl.exe 5.1.2600.5938) on an Intel Core 2 Duo E8500.
>
> Has anyone ever seen this?
>
>
> Have a nice day
> GV

I suggest you instead of sleep() run a tight loop waiting for the time to jump to the next second and see if the performance counter will behave better.

In other words, the PerfCount may run with the claimed speed only when you’re actually doing something. If the processor is idle, then power management may kick in

Newer AMD and Intel processors, including the Core 2 family, update the
TSC at a constant rate, so CPU throttling cannot be the culprit here.

(Sorry for mentioning BCDEdit in the first place. I had originally missed
that the machine is running Windows XP.)

On Thu, 02 Sep 2010 05:41:09 +0300, wrote:
> I suggest you instead of sleep() run a tight loop waiting for the time
> to jump to the next second and see if the performance counter will
> behave better.
>
> In other words, the PerfCount may run with the claimed speed only when
> you’re actually doing something. If the processor is idle, then power
> management may kick in

I think the problem is that QPC is broken on Windows XP and Vista. The CPU
guides clearly state that RDTSC is not a serializing instruction and is
subject to out of order execution among other things and the various
implementations on Windows don’t do that. (On Win7 RDTSC is no longer used
possibly depending on available hardware features).

On XP SP3:
hal!KeQueryPerformanceCounter:
806ebb94 8bff mov edi,edi
806ebb96 55 push ebp
806ebb97 8bec mov ebp,esp
806ebb99 5d pop ebp
806ebb9a ff2548156f80 jmp dword ptr [hal!HalpHeapStart+0xc
(806f1548)]
0: kd> t
hal!HalpAcpiTimerQueryPerfCount:
806e4c78 a0dcd06e80 mov al,byte ptr [hal!HalpUse8254 (806ed0dc)]
806e4c78 a0dcd06e80 mov al,byte ptr [hal!HalpUse8254 (806ed0dc)]
806e4c7d 0ac0 or al,al
806e4c7f 752d jne hal!HalpAcpiTimerQueryPerfCount+0x36
(806e4cae)
806e4c81 8b4c2404 mov ecx,dword ptr [esp+4]
806e4c85 0bc9 or ecx,ecx
806e4c87 7412 je hal!HalpAcpiTimerQueryPerfCount+0x23
(806e4c9b)
806e4c89 64a1a4000000 mov eax,dword ptr fs:[000000A4h]
806e4c8f 648b15a8000000 mov edx,dword ptr fs:[0A8h]
806e4c96 8901 mov dword ptr [ecx],eax
806e4c98 895104 mov dword ptr [ecx+4],edx
806e4c9b 0f31 rdtsc
806e4c9d 640305ac000000 add eax,dword ptr fs:[0ACh]
806e4ca4 641315b0000000 adc edx,dword ptr fs:[0B0h]
806e4cab c20400 ret 4

There is no serializing instruction here making RDTSC subject to out of
order execution among other things.

What I would try is issue a CPUID serializing instruction before calling QPC
(__CPUID is included in intrin.h) and see if that solves the problem.

You can read more about the difficulties getting RDTSC right in this
article:
http://www.ccsl.carleton.ca/~jamuir/rdtscpm1.pdf

but you can read about these as well in the latest intel and AMD manuals.

//Daniel

“Gianluca Varenni” wrote in message
news:xxxxx@ntdev…
> Hi all.
>
> A customer of mine reported a strange issue, and I tracked the issue down
> to wrong values returned by KeQueryPerformanceCounter (or the user
> counterpart QueryPerformanceCounter/Frequency).
>
> The MSDN docs clearly state that
>
> KeQueryPerformanceCounter always returns a 64-bit integer that represents
> the current value of a monotonically nondecreasing counter. The resolution
> of the timer used to accumulate the current count can be obtained by
> specifying a pointer value for PerformanceFrequency. The value that is
> returned is the number of ticks per second. The count starts accumulating
> when the computer is started up.
>
>
> What I get on some specific customer machine is something that is not
> “ticks per second”. Either the frequency is bogus, or the counter itself.
>
> I wrote a small user app to test it:
>
> int _tmain(int argc, _TCHAR* argv)
> {
> int i;
> LARGE_INTEGER freq, counter;
> SYSTEMTIME systemTime;
>
> for (i = 0; i < 10; i++)
> {
> GetSystemTime(&systemTime);
>
> if (QueryPerformanceCounter(&counter) == FALSE ||
> QueryPerformanceFrequency(&freq) == FALSE)
> {
> printf(“Error getting performance counters\n”);
> }
>
> printf(“Counter = %I64d Freq = %I64d”, counter.QuadPart, freq.QuadPart);
> printf(" system: %02d:%02d:%02d\n",
> systemTime.wHour,
> systemTime.wMinute,
> systemTime.wSecond);
>
> Sleep(1000);
> }
>
> return 0;
> }
>
> And this is the output on the failing machine:
>
> Counter = 56497408135970 Freq = 3158750000 system: 01:18:57
> Counter = 56497818110218 Freq = 3158750000 system: 01:18:58
> Counter = 56498206404250 Freq = 3158750000 system: 01:18:59
> Counter = 56498715528269 Freq = 3158750000 system: 01:19:00
> Counter = 56499113984213 Freq = 3158750000 system: 01:19:01
> Counter = 56499489120527 Freq = 3158750000 system: 01:19:02
> Counter = 56499867623469 Freq = 3158750000 system: 01:19:03
> Counter = 56500252073478 Freq = 3158750000 system: 01:19:04
> Counter = 56500713252945 Freq = 3158750000 system: 01:19:05
> Counter = 56501096560332 Freq = 3158750000 system: 01:19:06
>
> If you make some computations, the counter seems to say that every
> iteration is about 0.12 seconds apart from the previous one, while it
> should be more or less 1 second.
> The failing machine is running XP SP3 (hal.dll 5.1.2600.5687, ntoskrnl.exe
> 5.1.2600.5938) on an Intel Core 2 Duo E8500.
>
> Has anyone ever seen this?
>
>
> Have a nice day
> GV
>
>
>
>

The mention of /USEPMTIMER got me looking and lead me to this KB article (http://support.microsoft.com/kb/895980) which describes “Programs that use the QueryPerformanceCounter function may perform poorly in Windows Server 2000, in Windows Server 2003, and in Windows XP”. It sounds exactly like what you’re describing.

However, that particular issue is only for AMD CPUs, and according to this (http://h10010.www1.hp.com/wwpc/sg/en/sm/WF06a/12454-12454-64287-3328898-3328898-4065889.html) the HP Elite 8000 uses Intel CPUs.

But I think that the solution is still worth trying: update to the latest hardware/cpu drivers from Intel.

Athough it doesn’t explain the behavior of regular intervals of about 100ms
for the sake of sanity I would also make sure you are running always on the
same CPU by using SetThreadAffinityMask because each core has its own time
stamp counter and they can be out of sync. Also disable variable speed
features such as Intel Speedstep in the BIOS and read the rest of the
recommendations in this article:
http://msdn.microsoft.com/en-us/library/ee417693(VS.85).aspx

//Daniel

wrote in message news:xxxxx@ntdev…
> I think the problem is that QPC is broken on Windows XP and Vista. The CPU
> guides clearly state that RDTSC is not a serializing instruction and is
> subject to out of order execution among other things and the various
> implementations on Windows don’t do that. (On Win7 RDTSC is no longer used
> possibly depending on available hardware features).
>
> On XP SP3:
> hal!KeQueryPerformanceCounter:
> 806ebb94 8bff mov edi,edi
> 806ebb96 55 push ebp
> 806ebb97 8bec mov ebp,esp
> 806ebb99 5d pop ebp
> 806ebb9a ff2548156f80 jmp dword ptr [hal!HalpHeapStart+0xc
> (806f1548)]
> 0: kd> t
> hal!HalpAcpiTimerQueryPerfCount:
> 806e4c78 a0dcd06e80 mov al,byte ptr [hal!HalpUse8254 (806ed0dc)]
> 806e4c78 a0dcd06e80 mov al,byte ptr [hal!HalpUse8254 (806ed0dc)]
> 806e4c7d 0ac0 or al,al
> 806e4c7f 752d jne hal!HalpAcpiTimerQueryPerfCount+0x36
> (806e4cae)
> 806e4c81 8b4c2404 mov ecx,dword ptr [esp+4]
> 806e4c85 0bc9 or ecx,ecx
> 806e4c87 7412 je hal!HalpAcpiTimerQueryPerfCount+0x23
> (806e4c9b)
> 806e4c89 64a1a4000000 mov eax,dword ptr fs:[000000A4h]
> 806e4c8f 648b15a8000000 mov edx,dword ptr fs:[0A8h]
> 806e4c96 8901 mov dword ptr [ecx],eax
> 806e4c98 895104 mov dword ptr [ecx+4],edx
> 806e4c9b 0f31 rdtsc
> 806e4c9d 640305ac000000 add eax,dword ptr fs:[0ACh]
> 806e4ca4 641315b0000000 adc edx,dword ptr fs:[0B0h]
> 806e4cab c20400 ret 4
>
> There is no serializing instruction here making RDTSC subject to out of
> order execution among other things.
>
> What I would try is issue a CPUID serializing instruction before calling
> QPC (__CPUID is included in intrin.h) and see if that solves the problem.
>
> You can read more about the difficulties getting RDTSC right in this
> article:
> http://www.ccsl.carleton.ca/~jamuir/rdtscpm1.pdf
>
> but you can read about these as well in the latest intel and AMD manuals.
>
> //Daniel
>
>
>
>
> “Gianluca Varenni” wrote in message
> news:xxxxx@ntdev…
>> Hi all.
>>
>> A customer of mine reported a strange issue, and I tracked the issue down
>> to wrong values returned by KeQueryPerformanceCounter (or the user
>> counterpart QueryPerformanceCounter/Frequency).
>>
>> The MSDN docs clearly state that
>>
>> KeQueryPerformanceCounter always returns a 64-bit integer that represents
>> the current value of a monotonically nondecreasing counter. The
>> resolution of the timer used to accumulate the current count can be
>> obtained by specifying a pointer value for PerformanceFrequency. The
>> value that is returned is the number of ticks per second. The count
>> starts accumulating when the computer is started up.
>>
>>
>> What I get on some specific customer machine is something that is not
>> “ticks per second”. Either the frequency is bogus, or the counter itself.
>>
>> I wrote a small user app to test it:
>>
>> int _tmain(int argc, _TCHAR* argv)
>> {
>> int i;
>> LARGE_INTEGER freq, counter;
>> SYSTEMTIME systemTime;
>>
>> for (i = 0; i < 10; i++)
>> {
>> GetSystemTime(&systemTime);
>>
>> if (QueryPerformanceCounter(&counter) == FALSE ||
>> QueryPerformanceFrequency(&freq) == FALSE)
>> {
>> printf(“Error getting performance counters\n”);
>> }
>>
>> printf(“Counter = %I64d Freq = %I64d”, counter.QuadPart, freq.QuadPart);
>> printf(" system: %02d:%02d:%02d\n",
>> systemTime.wHour,
>> systemTime.wMinute,
>> systemTime.wSecond);
>>
>> Sleep(1000);
>> }
>>
>> return 0;
>> }
>>
>> And this is the output on the failing machine:
>>
>> Counter = 56497408135970 Freq = 3158750000 system: 01:18:57
>> Counter = 56497818110218 Freq = 3158750000 system: 01:18:58
>> Counter = 56498206404250 Freq = 3158750000 system: 01:18:59
>> Counter = 56498715528269 Freq = 3158750000 system: 01:19:00
>> Counter = 56499113984213 Freq = 3158750000 system: 01:19:01
>> Counter = 56499489120527 Freq = 3158750000 system: 01:19:02
>> Counter = 56499867623469 Freq = 3158750000 system: 01:19:03
>> Counter = 56500252073478 Freq = 3158750000 system: 01:19:04
>> Counter = 56500713252945 Freq = 3158750000 system: 01:19:05
>> Counter = 56501096560332 Freq = 3158750000 system: 01:19:06
>>
>> If you make some computations, the counter seems to say that every
>> iteration is about 0.12 seconds apart from the previous one, while it
>> should be more or less 1 second.
>> The failing machine is running XP SP3 (hal.dll 5.1.2600.5687,
>> ntoskrnl.exe 5.1.2600.5938) on an Intel Core 2 Duo E8500.
>>
>> Has anyone ever seen this?
>>
>>
>> Have a nice day
>> GV
>>
>>
>>
>>
>

Sorry for following up to myself again but what I forgot about is that QPC
has its own implementation in usermode which boils down to
ntdll.RtlQueryPerformanceCounter. Nonethelesss it suffers from the same
problems as the kernel implementations, no instruction which flushes the
pipeline, so unreliable.

//Daniel

wrote in message news:xxxxx@ntdev…
> Athough it doesn’t explain the behavior of regular intervals of about
> 100ms for the sake of sanity I would also make sure you are running always
> on the same CPU by using SetThreadAffinityMask because each core has its
> own time stamp counter and they can be out of sync. Also disable variable
> speed features such as Intel Speedstep in the BIOS and read the rest of
> the recommendations in this article:
> http://msdn.microsoft.com/en-us/library/ee417693(VS.85).aspx
>
> //Daniel
>
>
> wrote in message news:xxxxx@ntdev…
>> I think the problem is that QPC is broken on Windows XP and Vista. The
>> CPU guides clearly state that RDTSC is not a serializing instruction and
>> is subject to out of order execution among other things and the various
>> implementations on Windows don’t do that. (On Win7 RDTSC is no longer
>> used possibly depending on available hardware features).
>>
>> On XP SP3:
>> hal!KeQueryPerformanceCounter:
>> 806ebb94 8bff mov edi,edi
>> 806ebb96 55 push ebp
>> 806ebb97 8bec mov ebp,esp
>> 806ebb99 5d pop ebp
>> 806ebb9a ff2548156f80 jmp dword ptr [hal!HalpHeapStart+0xc
>> (806f1548)]
>> 0: kd> t
>> hal!HalpAcpiTimerQueryPerfCount:
>> 806e4c78 a0dcd06e80 mov al,byte ptr [hal!HalpUse8254 (806ed0dc)]
>> 806e4c78 a0dcd06e80 mov al,byte ptr [hal!HalpUse8254 (806ed0dc)]
>> 806e4c7d 0ac0 or al,al
>> 806e4c7f 752d jne hal!HalpAcpiTimerQueryPerfCount+0x36
>> (806e4cae)
>> 806e4c81 8b4c2404 mov ecx,dword ptr [esp+4]
>> 806e4c85 0bc9 or ecx,ecx
>> 806e4c87 7412 je hal!HalpAcpiTimerQueryPerfCount+0x23
>> (806e4c9b)
>> 806e4c89 64a1a4000000 mov eax,dword ptr fs:[000000A4h]
>> 806e4c8f 648b15a8000000 mov edx,dword ptr fs:[0A8h]
>> 806e4c96 8901 mov dword ptr [ecx],eax
>> 806e4c98 895104 mov dword ptr [ecx+4],edx
>> 806e4c9b 0f31 rdtsc
>> 806e4c9d 640305ac000000 add eax,dword ptr fs:[0ACh]
>> 806e4ca4 641315b0000000 adc edx,dword ptr fs:[0B0h]
>> 806e4cab c20400 ret 4
>>
>> There is no serializing instruction here making RDTSC subject to out of
>> order execution among other things.
>>
>> What I would try is issue a CPUID serializing instruction before calling
>> QPC (__CPUID is included in intrin.h) and see if that solves the problem.
>>
>> You can read more about the difficulties getting RDTSC right in this
>> article:
>> http://www.ccsl.carleton.ca/~jamuir/rdtscpm1.pdf
>>
>> but you can read about these as well in the latest intel and AMD manuals.
>>
>> //Daniel
>>
>>
>>
>>
>> “Gianluca Varenni” wrote in message
>> news:xxxxx@ntdev…
>>> Hi all.
>>>
>>> A customer of mine reported a strange issue, and I tracked the issue
>>> down to wrong values returned by KeQueryPerformanceCounter (or the user
>>> counterpart QueryPerformanceCounter/Frequency).
>>>
>>> The MSDN docs clearly state that
>>>
>>> KeQueryPerformanceCounter always returns a 64-bit integer that
>>> represents the current value of a monotonically nondecreasing counter.
>>> The resolution of the timer used to accumulate the current count can be
>>> obtained by specifying a pointer value for PerformanceFrequency. The
>>> value that is returned is the number of ticks per second. The count
>>> starts accumulating when the computer is started up.
>>>
>>>
>>> What I get on some specific customer machine is something that is not
>>> “ticks per second”. Either the frequency is bogus, or the counter
>>> itself.
>>>
>>> I wrote a small user app to test it:
>>>
>>> int _tmain(int argc, _TCHAR* argv)
>>> {
>>> int i;
>>> LARGE_INTEGER freq, counter;
>>> SYSTEMTIME systemTime;
>>>
>>> for (i = 0; i < 10; i++)
>>> {
>>> GetSystemTime(&systemTime);
>>>
>>> if (QueryPerformanceCounter(&counter) == FALSE ||
>>> QueryPerformanceFrequency(&freq) == FALSE)
>>> {
>>> printf(“Error getting performance counters\n”);
>>> }
>>>
>>> printf(“Counter = %I64d Freq = %I64d”, counter.QuadPart, freq.QuadPart);
>>> printf(" system: %02d:%02d:%02d\n",
>>> systemTime.wHour,
>>> systemTime.wMinute,
>>> systemTime.wSecond);
>>>
>>> Sleep(1000);
>>> }
>>>
>>> return 0;
>>> }
>>>
>>> And this is the output on the failing machine:
>>>
>>> Counter = 56497408135970 Freq = 3158750000 system: 01:18:57
>>> Counter = 56497818110218 Freq = 3158750000 system: 01:18:58
>>> Counter = 56498206404250 Freq = 3158750000 system: 01:18:59
>>> Counter = 56498715528269 Freq = 3158750000 system: 01:19:00
>>> Counter = 56499113984213 Freq = 3158750000 system: 01:19:01
>>> Counter = 56499489120527 Freq = 3158750000 system: 01:19:02
>>> Counter = 56499867623469 Freq = 3158750000 system: 01:19:03
>>> Counter = 56500252073478 Freq = 3158750000 system: 01:19:04
>>> Counter = 56500713252945 Freq = 3158750000 system: 01:19:05
>>> Counter = 56501096560332 Freq = 3158750000 system: 01:19:06
>>>
>>> If you make some computations, the counter seems to say that every
>>> iteration is about 0.12 seconds apart from the previous one, while it
>>> should be more or less 1 second.
>>> The failing machine is running XP SP3 (hal.dll 5.1.2600.5687,
>>> ntoskrnl.exe 5.1.2600.5938) on an Intel Core 2 Duo E8500.
>>>
>>> Has anyone ever seen this?
>>>
>>>
>>> Have a nice day
>>> GV
>>>
>>>
>>>
>>>
>>
>


From:
Sent: Wednesday, September 01, 2010 7:41 PM
To: “Windows System Software Devs Interest List”
Subject: RE:[ntdev] Interesting issue with (Ke)QueryPerformanceCounter

> I suggest you instead of sleep() run a tight loop waiting for the time to
> jump to the next second and see if the performance counter will behave
> better.

That would prove that it’s a power management issue. It still doesn’t solve
the problem of QueryPerformanceCounter returning bogus numbers :frowning:

Have a nice day
GV

>
> In other words, the PerfCount may run with the claimed speed only when
> you’re actually doing something. If the processor is idle, then power
> management may kick in
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer

You’re right… that IS a known bug. But Gianluca is reporting here isn’t that the time being returned is ragged, highly variable, or off “a little”… he’s reporting that it’s off by almost A FACTOR OF MAGNITUDE.

I’m having trouble believing out of order execution, or reading the TSC on two different processors which are not kept in sync, could be to blame for THAT level of error. Or, do you have a scenario in mind that would explain this.

This finding is really some pretty scary stuff, at least in my mind.

Peter
OSR

That makes sense but this could be in combination with a power management issue going on like Alex mentioned. If a CPU is idle, instead of doing a zero page thread it might call directly into intelppm.sys doing some magic to save consumption of the processor it is running. If RDTSC is issued after that instead of before that should make a huge difference.

Also with processors, their caches and pipelines growing larger and more sophisticated, OOE might mean something that makes sense only if you understand the design of the CPU.

The issues with QPC and RDTSC is rather long so to say and I would say it’s better not to draw any conclusions from empirical observation only.

//Daniel

Have we tried disabling speedstep?

On Sep 2, 2010 11:23 AM, wrote:
> That makes sense but this could be in combination with a power management
issue going on like Alex mentioned. If a CPU is idle, instead of doing a
zero page thread it might call directly into intelppm.sys doing some magic
to save consumption of the processor it is running. If RDTSC is issued after
that instead of before that should make a huge difference.
>
> Also with processors, their caches and pipelines growing larger and more
sophisticated, OOE might mean something that makes sense only if you
understand the design of the CPU.
>
> The issues with QPC and RDTSC is rather long so to say and I would say
it’s better not to draw any conclusions from empirical observation only.
>
> //Daniel
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

QueryPerformanceFrequency() is documented as not changing while the system
is running. If that’s true it can only really indicate the design speed of
the CPU, which appears correct for the CPU specified in this test.

Since it seems BIOSes these days throttle back the CPU, it does not sound
unreasonable that the cycle counter is running slow.

It seems the obvious test is to determine what the actual CPU clock speed is
during this test. If it is being clocked at ~10% of the rated speed on
average during this progam execution, then the counter is being read
correctly, and there is no problem.

In fact it would be surprising to observe anything else unless the actual
clock speed is fixed at the design speed (which may be a setup option which
could work around the issue for the time being).

The various documents cited so far suggest that as a cycle counter QPC can
be useful (if used with care) but it is not much use for any real time
timing.

I think the logic is that if you want to time items for external purposes,
then you should use timing linked to the interface. e.g. if your external
interface is video, then the actual frame rate is the finest resolution you
need because what you do in between frames is irrelevant. If your external
interface is audio, you should use the audio sample rate (or some
subdivision based on blocks of samples) for the same reason. Or whatever the
interface is. These timings must be known to the program since it is
interfacing to them.

But it is a major nuisance, and will mean rewriting some of my library code
for new projects!

Mike

----- Original Message -----
From: xxxxx@osr.com
To: Windows System Software Devs Interest List
Sent: Thursday, September 02, 2010 4:13 PM
Subject: RE:[ntdev] Interesting issue with (Ke)QueryPerformanceCounter

You’re right… that IS a known bug. But Gianluca is reporting here isn’t
that the time being returned is ragged, highly variable, or off “a
little”… he’s reporting that it’s off by almost A FACTOR OF MAGNITUDE.

I’m having trouble believing out of order execution, or reading the TSC on
two different processors which are not kept in sync, could be to blame for
THAT level of error. Or, do you have a scenario in mind that would explain
this.

This finding is really some pretty scary stuff, at least in my mind.

Peter
OSR


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Cay Bremer wrote:

Newer AMD and Intel processors, including the Core 2 family, update the
TSC at a constant rate, so CPU throttling cannot be the culprit here.

His machine has a Core i5, which is newer than the Core 2. I, at least,
don’t know whether this family has the guaranteed cycle counter or not


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

By the way - as your disassembly shows - KeQPC immediately jumps
thru some pointer. The default implementation does rdtsc,
but “cpu drivers” may replace it to something better.

–pa

wrote in message news:xxxxx@ntdev…
> Sorry for following up to myself again but what I forgot about is that QPC
> has its own implementation in usermode which boils down to
> ntdll.RtlQueryPerformanceCounter. Nonethelesss it suffers from the same
> problems as the kernel implementations, no instruction which flushes the
> pipeline, so unreliable.
>
> //Daniel
>
>
>
>
> wrote in message news:xxxxx@ntdev…
>> Athough it doesn’t explain the behavior of regular intervals of about
>> 100ms for the sake of sanity I would also make sure you are running
>> always on the same CPU by using SetThreadAffinityMask because each core
>> has its own time stamp counter and they can be out of sync. Also disable
>> variable speed features such as Intel Speedstep in the BIOS and read the
>> rest of the recommendations in this article:
>> http://msdn.microsoft.com/en-us/library/ee417693(VS.85).aspx
>>
>> //Daniel
>>
>>
>> wrote in message news:xxxxx@ntdev…
>>> I think the problem is that QPC is broken on Windows XP and Vista. The
>>> CPU guides clearly state that RDTSC is not a serializing instruction and
>>> is subject to out of order execution among other things and the various
>>> implementations on Windows don’t do that. (On Win7 RDTSC is no longer
>>> used possibly depending on available hardware features).
>>>
>>> On XP SP3:
>>> hal!KeQueryPerformanceCounter:
>>> 806ebb94 8bff mov edi,edi
>>> 806ebb96 55 push ebp
>>> 806ebb97 8bec mov ebp,esp
>>> 806ebb99 5d pop ebp
>>> 806ebb9a ff2548156f80 jmp dword ptr [hal!HalpHeapStart+0xc
>>> (806f1548)]
>>> 0: kd> t
>>> hal!HalpAcpiTimerQueryPerfCount:
>>> 806e4c78 a0dcd06e80 mov al,byte ptr [hal!HalpUse8254
>>> (806ed0dc)]
>>> 806e4c78 a0dcd06e80 mov al,byte ptr [hal!HalpUse8254
>>> (806ed0dc)]
>>> 806e4c7d 0ac0 or al,al
>>> 806e4c7f 752d jne hal!HalpAcpiTimerQueryPerfCount+0x36
>>> (806e4cae)
>>> 806e4c81 8b4c2404 mov ecx,dword ptr [esp+4]
>>> 806e4c85 0bc9 or ecx,ecx
>>> 806e4c87 7412 je hal!HalpAcpiTimerQueryPerfCount+0x23
>>> (806e4c9b)
>>> 806e4c89 64a1a4000000 mov eax,dword ptr fs:[000000A4h]
>>> 806e4c8f 648b15a8000000 mov edx,dword ptr fs:[0A8h]
>>> 806e4c96 8901 mov dword ptr [ecx],eax
>>> 806e4c98 895104 mov dword ptr [ecx+4],edx
>>> 806e4c9b 0f31 rdtsc
>>> 806e4c9d 640305ac000000 add eax,dword ptr fs:[0ACh]
>>> 806e4ca4 641315b0000000 adc edx,dword ptr fs:[0B0h]
>>> 806e4cab c20400 ret 4
>>>
>>> There is no serializing instruction here making RDTSC subject to out of
>>> order execution among other things.
>>>
>>> What I would try is issue a CPUID serializing instruction before calling
>>> QPC (__CPUID is included in intrin.h) and see if that solves the
>>> problem.
>>>
>>> You can read more about the difficulties getting RDTSC right in this
>>> article:
>>> http://www.ccsl.carleton.ca/~jamuir/rdtscpm1.pdf
>>>
>>> but you can read about these as well in the latest intel and AMD
>>> manuals.
>>>
>>> //Daniel
>>>
>>>
>>>
>>>
>>> “Gianluca Varenni” wrote in message
>>> news:xxxxx@ntdev…
>>>> Hi all.
>>>>
>>>> A customer of mine reported a strange issue, and I tracked the issue
>>>> down to wrong values returned by KeQueryPerformanceCounter (or the user
>>>> counterpart QueryPerformanceCounter/Frequency).
>>>>
>>>> The MSDN docs clearly state that
>>>>
>>>> KeQueryPerformanceCounter always returns a 64-bit integer that
>>>> represents the current value of a monotonically nondecreasing counter.
>>>> The resolution of the timer used to accumulate the current count can be
>>>> obtained by specifying a pointer value for PerformanceFrequency. The
>>>> value that is returned is the number of ticks per second. The count
>>>> starts accumulating when the computer is started up.
>>>>
>>>>
>>>> What I get on some specific customer machine is something that is not
>>>> “ticks per second”. Either the frequency is bogus, or the counter
>>>> itself.
>>>>
>>>> I wrote a small user app to test it:
>>>>
>>>> int _tmain(int argc, _TCHAR* argv)
>>>> {
>>>> int i;
>>>> LARGE_INTEGER freq, counter;
>>>> SYSTEMTIME systemTime;
>>>>
>>>> for (i = 0; i < 10; i++)
>>>> {
>>>> GetSystemTime(&systemTime);
>>>>
>>>> if (QueryPerformanceCounter(&counter) == FALSE ||
>>>> QueryPerformanceFrequency(&freq) == FALSE)
>>>> {
>>>> printf(“Error getting performance counters\n”);
>>>> }
>>>>
>>>> printf(“Counter = %I64d Freq = %I64d”, counter.QuadPart,
>>>> freq.QuadPart);
>>>> printf(" system: %02d:%02d:%02d\n",
>>>> systemTime.wHour,
>>>> systemTime.wMinute,
>>>> systemTime.wSecond);
>>>>
>>>> Sleep(1000);
>>>> }
>>>>
>>>> return 0;
>>>> }
>>>>
>>>> And this is the output on the failing machine:
>>>>
>>>> Counter = 56497408135970 Freq = 3158750000 system: 01:18:57
>>>> Counter = 56497818110218 Freq = 3158750000 system: 01:18:58
>>>> Counter = 56498206404250 Freq = 3158750000 system: 01:18:59
>>>> Counter = 56498715528269 Freq = 3158750000 system: 01:19:00
>>>> Counter = 56499113984213 Freq = 3158750000 system: 01:19:01
>>>> Counter = 56499489120527 Freq = 3158750000 system: 01:19:02
>>>> Counter = 56499867623469 Freq = 3158750000 system: 01:19:03
>>>> Counter = 56500252073478 Freq = 3158750000 system: 01:19:04
>>>> Counter = 56500713252945 Freq = 3158750000 system: 01:19:05
>>>> Counter = 56501096560332 Freq = 3158750000 system: 01:19:06
>>>>
>>>> If you make some computations, the counter seems to say that every
>>>> iteration is about 0.12 seconds apart from the previous one, while it
>>>> should be more or less 1 second.
>>>> The failing machine is running XP SP3 (hal.dll 5.1.2600.5687,
>>>> ntoskrnl.exe 5.1.2600.5938) on an Intel Core 2 Duo E8500.
>>>>
>>>> Has anyone ever seen this?
>>>>
>>>>
>>>> Have a nice day
>>>> GV
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

“Cay Bremer” wrote in message news:xxxxx@ntdev…
> Newer AMD and Intel processors, including the Core 2 family, update the
> TSC at a constant rate, so CPU throttling cannot be the culprit here.

??? But this would defeat the main purpose of rdtsc instruction?

– pa

> (Sorry for mentioning BCDEdit in the first place. I had originally missed
> that the machine is running Windows XP.)
>
>
> On Thu, 02 Sep 2010 05:41:09 +0300, wrote:
>> I suggest you instead of sleep() run a tight loop waiting for the time
>> to jump to the next second and see if the performance counter will
>> behave better.
>>
>> In other words, the PerfCount may run with the claimed speed only when
>> you’re actually doing something. If the processor is idle, then power
>> management may kick in
>

Pavel A. wrote:

“Cay Bremer” wrote in message news:xxxxx@ntdev…
>> Newer AMD and Intel processors, including the Core 2 family, update the
>> TSC at a constant rate, so CPU throttling cannot be the culprit here.
> ??? But this would defeat the main purpose of rdtsc instruction?

That depends entirely on what you believe its “main purpose” is. Many
people believe the main purpose of rdtsc is to provide finely-grained
elapsed time. I would say that the people who care how many CPU cycles
have actually been executed are a very small minority.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

On Fri, 03 Sep 2010 00:23:17 +0300, Pavel A. wrote:
> ??? But this would defeat the main purpose of rdtsc instruction?

You could have checked this claim yourself instead:
http://www.intel.com/Assets/PDF/manual/253668.pdf (16.11 TIME-STAMP
COUNTER)

On Thu, 02 Sep 2010 19:16:44 +0300, Tim Roberts wrote:
> His machine has a Core i5, which is newer than the Core 2. I, at least,
> don’t know whether this family has the guaranteed cycle counter or not

This stands in contradiction with his original statement:
“The failing machine is running XP […] on an Intel Core 2 Duo E8500.”

In any case, “[t]his [constant TSC] is the architectural behavior moving
forward”, according to Intel.

That’s a useful Intel reference. If there are any hardware designers here
can they confirm that the CPU is always fed a constant clock so a constant
rate counter is possible, or whether some motherboards alter the master
clock to the CPU to control its speed (which would make the constant rate
update impossible). This may be processor dependent but it would be useful
to know whether it is possible on any or all of processors from core duo
onward. Also the intel doc says that asserting DPSLP pin may stop the TSC
increment. Does anyone know whether this is used even briefly in normal
operation for power saving? Either of these would render the TSC unusable
for realtime and may explain the OP observation. thx. M

You could have checked this claim yourself instead:
http://www.intel.com/Assets/PDF/manual/253668.pdf (16.11 TIME-STAMP
COUNTER)<