Interesting issue with (Ke)QueryPerformanceCounter

I suggest you instead of sleep() run a tight loop waiting for the time to jump to the next second and see if the performance counter will behave better.

In other words, the PerfCount may run with the claimed speed only when you’re actually doing something. If the processor is idle, then power management may kick in

Newer AMD and Intel processors, including the Core 2 family, update the
TSC at a constant rate, so CPU throttling cannot be the culprit here.

(Sorry for mentioning BCDEdit in the first place. I had originally missed
that the machine is running Windows XP.)

On Thu, 02 Sep 2010 05:41:09 +0300, wrote:
> I suggest you instead of sleep() run a tight loop waiting for the time
> to jump to the next second and see if the performance counter will
> behave better.
>
> In other words, the PerfCount may run with the claimed speed only when
> you’re actually doing something. If the processor is idle, then power
> management may kick in

I think the problem is that QPC is broken on Windows XP and Vista. The CPU
guides clearly state that RDTSC is not a serializing instruction and is
subject to out of order execution among other things and the various
implementations on Windows don’t do that. (On Win7 RDTSC is no longer used
possibly depending on available hardware features).

On XP SP3:
hal!KeQueryPerformanceCounter:
806ebb94 8bff mov edi,edi
806ebb96 55 push ebp
806ebb97 8bec mov ebp,esp
806ebb99 5d pop ebp
806ebb9a ff2548156f80 jmp dword ptr [hal!HalpHeapStart+0xc
(806f1548)]
0: kd> t
hal!HalpAcpiTimerQueryPerfCount:
806e4c78 a0dcd06e80 mov al,byte ptr [hal!HalpUse8254 (806ed0dc)]
806e4c78 a0dcd06e80 mov al,byte ptr [hal!HalpUse8254 (806ed0dc)]
806e4c7d 0ac0 or al,al
806e4c7f 752d jne hal!HalpAcpiTimerQueryPerfCount+0x36
(806e4cae)
806e4c81 8b4c2404 mov ecx,dword ptr [esp+4]
806e4c85 0bc9 or ecx,ecx
806e4c87 7412 je hal!HalpAcpiTimerQueryPerfCount+0x23
(806e4c9b)
806e4c89 64a1a4000000 mov eax,dword ptr fs:[000000A4h]
806e4c8f 648b15a8000000 mov edx,dword ptr fs:[0A8h]
806e4c96 8901 mov dword ptr [ecx],eax
806e4c98 895104 mov dword ptr [ecx+4],edx
806e4c9b 0f31 rdtsc
806e4c9d 640305ac000000 add eax,dword ptr fs:[0ACh]
806e4ca4 641315b0000000 adc edx,dword ptr fs:[0B0h]
806e4cab c20400 ret 4

There is no serializing instruction here making RDTSC subject to out of
order execution among other things.

What I would try is issue a CPUID serializing instruction before calling QPC
(__CPUID is included in intrin.h) and see if that solves the problem.

You can read more about the difficulties getting RDTSC right in this
article:
http://www.ccsl.carleton.ca/~jamuir/rdtscpm1.pdf

but you can read about these as well in the latest intel and AMD manuals.

//Daniel

“Gianluca Varenni” wrote in message
news:xxxxx@ntdev…
> Hi all.
>
> A customer of mine reported a strange issue, and I tracked the issue down
> to wrong values returned by KeQueryPerformanceCounter (or the user
> counterpart QueryPerformanceCounter/Frequency).
>
> The MSDN docs clearly state that
>
> KeQueryPerformanceCounter always returns a 64-bit integer that represents
> the current value of a monotonically nondecreasing counter. The resolution
> of the timer used to accumulate the current count can be obtained by
> specifying a pointer value for PerformanceFrequency. The value that is
> returned is the number of ticks per second. The count starts accumulating
> when the computer is started up.
>
>
> What I get on some specific customer machine is something that is not
> “ticks per second”. Either the frequency is bogus, or the counter itself.
>
> I wrote a small user app to test it:
>
> int _tmain(int argc, _TCHAR* argv)
> {
> int i;
> LARGE_INTEGER freq, counter;
> SYSTEMTIME systemTime;
>
> for (i = 0; i < 10; i++)
> {
> GetSystemTime(&systemTime);
>
> if (QueryPerformanceCounter(&counter) == FALSE ||
> QueryPerformanceFrequency(&freq) == FALSE)
> {
> printf(“Error getting performance counters\n”);
> }
>
> printf(“Counter = %I64d Freq = %I64d”, counter.QuadPart, freq.QuadPart);
> printf(" system: %02d:%02d:%02d\n",
> systemTime.wHour,
> systemTime.wMinute,
> systemTime.wSecond);
>
> Sleep(1000);
> }
>
> return 0;
> }
>
> And this is the output on the failing machine:
>
> Counter = 56497408135970 Freq = 3158750000 system: 01:18:57
> Counter = 56497818110218 Freq = 3158750000 system: 01:18:58
> Counter = 56498206404250 Freq = 3158750000 system: 01:18:59
> Counter = 56498715528269 Freq = 3158750000 system: 01:19:00
> Counter = 56499113984213 Freq = 3158750000 system: 01:19:01
> Counter = 56499489120527 Freq = 3158750000 system: 01:19:02
> Counter = 56499867623469 Freq = 3158750000 system: 01:19:03
> Counter = 56500252073478 Freq = 3158750000 system: 01:19:04
> Counter = 56500713252945 Freq = 3158750000 system: 01:19:05
> Counter = 56501096560332 Freq = 3158750000 system: 01:19:06
>
> If you make some computations, the counter seems to say that every
> iteration is about 0.12 seconds apart from the previous one, while it
> should be more or less 1 second.
> The failing machine is running XP SP3 (hal.dll 5.1.2600.5687, ntoskrnl.exe
> 5.1.2600.5938) on an Intel Core 2 Duo E8500.
>
> Has anyone ever seen this?
>
>
> Have a nice day
> GV
>
>
>
>

The mention of /USEPMTIMER got me looking and lead me to this KB article (http://support.microsoft.com/kb/895980) which describes “Programs that use the QueryPerformanceCounter function may perform poorly in Windows Server 2000, in Windows Server 2003, and in Windows XP”. It sounds exactly like what you’re describing.

However, that particular issue is only for AMD CPUs, and according to this (http://h10010.www1.hp.com/wwpc/sg/en/sm/WF06a/12454-12454-64287-3328898-3328898-4065889.html) the HP Elite 8000 uses Intel CPUs.

But I think that the solution is still worth trying: update to the latest hardware/cpu drivers from Intel.

Athough it doesn’t explain the behavior of regular intervals of about 100ms
for the sake of sanity I would also make sure you are running always on the
same CPU by using SetThreadAffinityMask because each core has its own time
stamp counter and they can be out of sync. Also disable variable speed
features such as Intel Speedstep in the BIOS and read the rest of the
recommendations in this article:
http://msdn.microsoft.com/en-us/library/ee417693(VS.85).aspx

//Daniel

wrote in message news:xxxxx@ntdev…
> I think the problem is that QPC is broken on Windows XP and Vista. The CPU
> guides clearly state that RDTSC is not a serializing instruction and is
> subject to out of order execution among other things and the various
> implementations on Windows don’t do that. (On Win7 RDTSC is no longer used
> possibly depending on available hardware features).
>
> On XP SP3:
> hal!KeQueryPerformanceCounter:
> 806ebb94 8bff mov edi,edi
> 806ebb96 55 push ebp
> 806ebb97 8bec mov ebp,esp
> 806ebb99 5d pop ebp
> 806ebb9a ff2548156f80 jmp dword ptr [hal!HalpHeapStart+0xc
> (806f1548)]
> 0: kd> t
> hal!HalpAcpiTimerQueryPerfCount:
> 806e4c78 a0dcd06e80 mov al,byte ptr [hal!HalpUse8254 (806ed0dc)]
> 806e4c78 a0dcd06e80 mov al,byte ptr [hal!HalpUse8254 (806ed0dc)]
> 806e4c7d 0ac0 or al,al
> 806e4c7f 752d jne hal!HalpAcpiTimerQueryPerfCount+0x36
> (806e4cae)
> 806e4c81 8b4c2404 mov ecx,dword ptr [esp+4]
> 806e4c85 0bc9 or ecx,ecx
> 806e4c87 7412 je hal!HalpAcpiTimerQueryPerfCount+0x23
> (806e4c9b)
> 806e4c89 64a1a4000000 mov eax,dword ptr fs:[000000A4h]
> 806e4c8f 648b15a8000000 mov edx,dword ptr fs:[0A8h]
> 806e4c96 8901 mov dword ptr [ecx],eax
> 806e4c98 895104 mov dword ptr [ecx+4],edx
> 806e4c9b 0f31 rdtsc
> 806e4c9d 640305ac000000 add eax,dword ptr fs:[0ACh]
> 806e4ca4 641315b0000000 adc edx,dword ptr fs:[0B0h]
> 806e4cab c20400 ret 4
>
> There is no serializing instruction here making RDTSC subject to out of
> order execution among other things.
>
> What I would try is issue a CPUID serializing instruction before calling
> QPC (__CPUID is included in intrin.h) and see if that solves the problem.
>
> You can read more about the difficulties getting RDTSC right in this
> article:
> http://www.ccsl.carleton.ca/~jamuir/rdtscpm1.pdf
>
> but you can read about these as well in the latest intel and AMD manuals.
>
> //Daniel
>
>
>
>
> “Gianluca Varenni” wrote in message
> news:xxxxx@ntdev…
>> Hi all.
>>
>> A customer of mine reported a strange issue, and I tracked the issue down
>> to wrong values returned by KeQueryPerformanceCounter (or the user
>> counterpart QueryPerformanceCounter/Frequency).
>>
>> The MSDN docs clearly state that
>>
>> KeQueryPerformanceCounter always returns a 64-bit integer that represents
>> the current value of a monotonically nondecreasing counter. The
>> resolution of the timer used to accumulate the current count can be
>> obtained by specifying a pointer value for PerformanceFrequency. The
>> value that is returned is the number of ticks per second. The count
>> starts accumulating when the computer is started up.
>>
>>
>> What I get on some specific customer machine is something that is not
>> “ticks per second”. Either the frequency is bogus, or the counter itself.
>>
>> I wrote a small user app to test it:
>>
>> int _tmain(int argc, _TCHAR* argv)
>> {
>> int i;
>> LARGE_INTEGER freq, counter;
>> SYSTEMTIME systemTime;
>>
>> for (i = 0; i < 10; i++)
>> {
>> GetSystemTime(&systemTime);
>>
>> if (QueryPerformanceCounter(&counter) == FALSE ||
>> QueryPerformanceFrequency(&freq) == FALSE)
>> {
>> printf(“Error getting performance counters\n”);
>> }
>>
>> printf(“Counter = %I64d Freq = %I64d”, counter.QuadPart, freq.QuadPart);
>> printf(" system: %02d:%02d:%02d\n",
>> systemTime.wHour,
>> systemTime.wMinute,
>> systemTime.wSecond);
>>
>> Sleep(1000);
>> }
>>
>> return 0;
>> }
>>
>> And this is the output on the failing machine:
>>
>> Counter = 56497408135970 Freq = 3158750000 system: 01:18:57
>> Counter = 56497818110218 Freq = 3158750000 system: 01:18:58
>> Counter = 56498206404250 Freq = 3158750000 system: 01:18:59
>> Counter = 56498715528269 Freq = 3158750000 system: 01:19:00
>> Counter = 56499113984213 Freq = 3158750000 system: 01:19:01
>> Counter = 56499489120527 Freq = 3158750000 system: 01:19:02
>> Counter = 56499867623469 Freq = 3158750000 system: 01:19:03
>> Counter = 56500252073478 Freq = 3158750000 system: 01:19:04
>> Counter = 56500713252945 Freq = 3158750000 system: 01:19:05
>> Counter = 56501096560332 Freq = 3158750000 system: 01:19:06
>>
>> If you make some computations, the counter seems to say that every
>> iteration is about 0.12 seconds apart from the previous one, while it
>> should be more or less 1 second.
>> The failing machine is running XP SP3 (hal.dll 5.1.2600.5687,
>> ntoskrnl.exe 5.1.2600.5938) on an Intel Core 2 Duo E8500.
>>
>> Has anyone ever seen this?
>>
>>
>> Have a nice day
>> GV
>>
>>
>>
>>
>

Sorry for following up to myself again but what I forgot about is that QPC
has its own implementation in usermode which boils down to
ntdll.RtlQueryPerformanceCounter. Nonethelesss it suffers from the same
problems as the kernel implementations, no instruction which flushes the
pipeline, so unreliable.

//Daniel

wrote in message news:xxxxx@ntdev…
> Athough it doesn’t explain the behavior of regular intervals of about
> 100ms for the sake of sanity I would also make sure you are running always
> on the same CPU by using SetThreadAffinityMask because each core has its
> own time stamp counter and they can be out of sync. Also disable variable
> speed features such as Intel Speedstep in the BIOS and read the rest of
> the recommendations in this article:
> http://msdn.microsoft.com/en-us/library/ee417693(VS.85).aspx
>
> //Daniel
>
>
> wrote in message news:xxxxx@ntdev…
>> I think the problem is that QPC is broken on Windows XP and Vista. The
>> CPU guides clearly state that RDTSC is not a serializing instruction and
>> is subject to out of order execution among other things and the various
>> implementations on Windows don’t do that. (On Win7 RDTSC is no longer
>> used possibly depending on available hardware features).
>>
>> On XP SP3:
>> hal!KeQueryPerformanceCounter:
>> 806ebb94 8bff mov edi,edi
>> 806ebb96 55 push ebp
>> 806ebb97 8bec mov ebp,esp
>> 806ebb99 5d pop ebp
>> 806ebb9a ff2548156f80 jmp dword ptr [hal!HalpHeapStart+0xc
>> (806f1548)]
>> 0: kd> t
>> hal!HalpAcpiTimerQueryPerfCount:
>> 806e4c78 a0dcd06e80 mov al,byte ptr [hal!HalpUse8254 (806ed0dc)]
>> 806e4c78 a0dcd06e80 mov al,byte ptr [hal!HalpUse8254 (806ed0dc)]
>> 806e4c7d 0ac0 or al,al
>> 806e4c7f 752d jne hal!HalpAcpiTimerQueryPerfCount+0x36
>> (806e4cae)
>> 806e4c81 8b4c2404 mov ecx,dword ptr [esp+4]
>> 806e4c85 0bc9 or ecx,ecx
>> 806e4c87 7412 je hal!HalpAcpiTimerQueryPerfCount+0x23
>> (806e4c9b)
>> 806e4c89 64a1a4000000 mov eax,dword ptr fs:[000000A4h]
>> 806e4c8f 648b15a8000000 mov edx,dword ptr fs:[0A8h]
>> 806e4c96 8901 mov dword ptr [ecx],eax
>> 806e4c98 895104 mov dword ptr [ecx+4],edx
>> 806e4c9b 0f31 rdtsc
>> 806e4c9d 640305ac000000 add eax,dword ptr fs:[0ACh]
>> 806e4ca4 641315b0000000 adc edx,dword ptr fs:[0B0h]
>> 806e4cab c20400 ret 4
>>
>> There is no serializing instruction here making RDTSC subject to out of
>> order execution among other things.
>>
>> What I would try is issue a CPUID serializing instruction before calling
>> QPC (__CPUID is included in intrin.h) and see if that solves the problem.
>>
>> You can read more about the difficulties getting RDTSC right in this
>> article:
>> http://www.ccsl.carleton.ca/~jamuir/rdtscpm1.pdf
>>
>> but you can read about these as well in the latest intel and AMD manuals.
>>
>> //Daniel
>>
>>
>>
>>
>> “Gianluca Varenni” wrote in message
>> news:xxxxx@ntdev…
>>> Hi all.
>>>
>>> A customer of mine reported a strange issue, and I tracked the issue
>>> down to wrong values returned by KeQueryPerformanceCounter (or the user
>>> counterpart QueryPerformanceCounter/Frequency).
>>>
>>> The MSDN docs clearly state that
>>>
>>> KeQueryPerformanceCounter always returns a 64-bit integer that
>>> represents the current value of a monotonically nondecreasing counter.
>>> The resolution of the timer used to accumulate the current count can be
>>> obtained by specifying a pointer value for PerformanceFrequency. The
>>> value that is returned is the number of ticks per second. The count
>>> starts accumulating when the computer is started up.
>>>
>>>
>>> What I get on some specific customer machine is something that is not
>>> “ticks per second”. Either the frequency is bogus, or the counter
>>> itself.
>>>
>>> I wrote a small user app to test it:
>>>
>>> int _tmain(int argc, _TCHAR* argv)
>>> {
>>> int i;
>>> LARGE_INTEGER freq, counter;
>>> SYSTEMTIME systemTime;
>>>
>>> for (i = 0; i < 10; i++)
>>> {
>>> GetSystemTime(&systemTime);
>>>
>>> if (QueryPerformanceCounter(&counter) == FALSE ||
>>> QueryPerformanceFrequency(&freq) == FALSE)
>>> {
>>> printf(“Error getting performance counters\n”);
>>> }
>>>
>>> printf(“Counter = %I64d Freq = %I64d”, counter.QuadPart, freq.QuadPart);
>>> printf(" system: %02d:%02d:%02d\n",
>>> systemTime.wHour,
>>> systemTime.wMinute,
>>> systemTime.wSecond);
>>>
>>> Sleep(1000);
>>> }
>>>
>>> return 0;
>>> }
>>>
>>> And this is the output on the failing machine:
>>>
>>> Counter = 56497408135970 Freq = 3158750000 system: 01:18:57
>>> Counter = 56497818110218 Freq = 3158750000 system: 01:18:58
>>> Counter = 56498206404250 Freq = 3158750000 system: 01:18:59
>>> Counter = 56498715528269 Freq = 3158750000 system: 01:19:00
>>> Counter = 56499113984213 Freq = 3158750000 system: 01:19:01
>>> Counter = 56499489120527 Freq = 3158750000 system: 01:19:02
>>> Counter = 56499867623469 Freq = 3158750000 system: 01:19:03
>>> Counter = 56500252073478 Freq = 3158750000 system: 01:19:04
>>> Counter = 56500713252945 Freq = 3158750000 system: 01:19:05
>>> Counter = 56501096560332 Freq = 3158750000 system: 01:19:06
>>>
>>> If you make some computations, the counter seems to say that every
>>> iteration is about 0.12 seconds apart from the previous one, while it
>>> should be more or less 1 second.
>>> The failing machine is running XP SP3 (hal.dll 5.1.2600.5687,
>>> ntoskrnl.exe 5.1.2600.5938) on an Intel Core 2 Duo E8500.
>>>
>>> Has anyone ever seen this?
>>>
>>>
>>> Have a nice day
>>> GV
>>>
>>>
>>>
>>>
>>
>


From:
Sent: Wednesday, September 01, 2010 7:41 PM
To: “Windows System Software Devs Interest List”
Subject: RE:[ntdev] Interesting issue with (Ke)QueryPerformanceCounter

> I suggest you instead of sleep() run a tight loop waiting for the time to
> jump to the next second and see if the performance counter will behave
> better.

That would prove that it’s a power management issue. It still doesn’t solve
the problem of QueryPerformanceCounter returning bogus numbers :frowning:

Have a nice day
GV

>
> In other words, the PerfCount may run with the claimed speed only when
> you’re actually doing something. If the processor is idle, then power
> management may kick in
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer

You’re right… that IS a known bug. But Gianluca is reporting here isn’t that the time being returned is ragged, highly variable, or off “a little”… he’s reporting that it’s off by almost A FACTOR OF MAGNITUDE.

I’m having trouble believing out of order execution, or reading the TSC on two different processors which are not kept in sync, could be to blame for THAT level of error. Or, do you have a scenario in mind that would explain this.

This finding is really some pretty scary stuff, at least in my mind.

Peter
OSR

That makes sense but this could be in combination with a power management issue going on like Alex mentioned. If a CPU is idle, instead of doing a zero page thread it might call directly into intelppm.sys doing some magic to save consumption of the processor it is running. If RDTSC is issued after that instead of before that should make a huge difference.

Also with processors, their caches and pipelines growing larger and more sophisticated, OOE might mean something that makes sense only if you understand the design of the CPU.

The issues with QPC and RDTSC is rather long so to say and I would say it’s better not to draw any conclusions from empirical observation only.

//Daniel

Have we tried disabling speedstep?

On Sep 2, 2010 11:23 AM, wrote:
> That makes sense but this could be in combination with a power management
issue going on like Alex mentioned. If a CPU is idle, instead of doing a
zero page thread it might call directly into intelppm.sys doing some magic
to save consumption of the processor it is running. If RDTSC is issued after
that instead of before that should make a huge difference.
>
> Also with processors, their caches and pipelines growing larger and more
sophisticated, OOE might mean something that makes sense only if you
understand the design of the CPU.
>
> The issues with QPC and RDTSC is rather long so to say and I would say
it’s better not to draw any conclusions from empirical observation only.
>
> //Daniel
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

QueryPerformanceFrequency() is documented as not changing while the system
is running. If that’s true it can only really indicate the design speed of
the CPU, which appears correct for the CPU specified in this test.

Since it seems BIOSes these days throttle back the CPU, it does not sound
unreasonable that the cycle counter is running slow.

It seems the obvious test is to determine what the actual CPU clock speed is
during this test. If it is being clocked at ~10% of the rated speed on
average during this progam execution, then the counter is being read
correctly, and there is no problem.

In fact it would be surprising to observe anything else unless the actual
clock speed is fixed at the design speed (which may be a setup option which
could work around the issue for the time being).

The various documents cited so far suggest that as a cycle counter QPC can
be useful (if used with care) but it is not much use for any real time
timing.

I think the logic is that if you want to time items for external purposes,
then you should use timing linked to the interface. e.g. if your external
interface is video, then the actual frame rate is the finest resolution you
need because what you do in between frames is irrelevant. If your external
interface is audio, you should use the audio sample rate (or some
subdivision based on blocks of samples) for the same reason. Or whatever the
interface is. These timings must be known to the program since it is
interfacing to them.

But it is a major nuisance, and will mean rewriting some of my library code
for new projects!

Mike

----- Original Message -----
From: xxxxx@osr.com
To: Windows System Software Devs Interest List
Sent: Thursday, September 02, 2010 4:13 PM
Subject: RE:[ntdev] Interesting issue with (Ke)QueryPerformanceCounter

You’re right… that IS a known bug. But Gianluca is reporting here isn’t
that the time being returned is ragged, highly variable, or off “a
little”… he’s reporting that it’s off by almost A FACTOR OF MAGNITUDE.

I’m having trouble believing out of order execution, or reading the TSC on
two different processors which are not kept in sync, could be to blame for
THAT level of error. Or, do you have a scenario in mind that would explain
this.

This finding is really some pretty scary stuff, at least in my mind.

Peter
OSR


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Cay Bremer wrote:

Newer AMD and Intel processors, including the Core 2 family, update the
TSC at a constant rate, so CPU throttling cannot be the culprit here.

His machine has a Core i5, which is newer than the Core 2. I, at least,
don’t know whether this family has the guaranteed cycle counter or not


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

By the way - as your disassembly shows - KeQPC immediately jumps
thru some pointer. The default implementation does rdtsc,
but “cpu drivers” may replace it to something better.

–pa

wrote in message news:xxxxx@ntdev…
> Sorry for following up to myself again but what I forgot about is that QPC
> has its own implementation in usermode which boils down to
> ntdll.RtlQueryPerformanceCounter. Nonethelesss it suffers from the same
> problems as the kernel implementations, no instruction which flushes the
> pipeline, so unreliable.
>
> //Daniel
>
>
>
>
> wrote in message news:xxxxx@ntdev…
>> Athough it doesn’t explain the behavior of regular intervals of about
>> 100ms for the sake of sanity I would also make sure you are running
>> always on the same CPU by using SetThreadAffinityMask because each core
>> has its own time stamp counter and they can be out of sync. Also disable
>> variable speed features such as Intel Speedstep in the BIOS and read the
>> rest of the recommendations in this article:
>> http://msdn.microsoft.com/en-us/library/ee417693(VS.85).aspx
>>
>> //Daniel
>>
>>
>> wrote in message news:xxxxx@ntdev…
>>> I think the problem is that QPC is broken on Windows XP and Vista. The
>>> CPU guides clearly state that RDTSC is not a serializing instruction and
>>> is subject to out of order execution among other things and the various
>>> implementations on Windows don’t do that. (On Win7 RDTSC is no longer
>>> used possibly depending on available hardware features).
>>>
>>> On XP SP3:
>>> hal!KeQueryPerformanceCounter:
>>> 806ebb94 8bff mov edi,edi
>>> 806ebb96 55 push ebp
>>> 806ebb97 8bec mov ebp,esp
>>> 806ebb99 5d pop ebp
>>> 806ebb9a ff2548156f80 jmp dword ptr [hal!HalpHeapStart+0xc
>>> (806f1548)]
>>> 0: kd> t
>>> hal!HalpAcpiTimerQueryPerfCount:
>>> 806e4c78 a0dcd06e80 mov al,byte ptr [hal!HalpUse8254
>>> (806ed0dc)]
>>> 806e4c78 a0dcd06e80 mov al,byte ptr [hal!HalpUse8254
>>> (806ed0dc)]
>>> 806e4c7d 0ac0 or al,al
>>> 806e4c7f 752d jne hal!HalpAcpiTimerQueryPerfCount+0x36
>>> (806e4cae)
>>> 806e4c81 8b4c2404 mov ecx,dword ptr [esp+4]
>>> 806e4c85 0bc9 or ecx,ecx
>>> 806e4c87 7412 je hal!HalpAcpiTimerQueryPerfCount+0x23
>>> (806e4c9b)
>>> 806e4c89 64a1a4000000 mov eax,dword ptr fs:[000000A4h]
>>> 806e4c8f 648b15a8000000 mov edx,dword ptr fs:[0A8h]
>>> 806e4c96 8901 mov dword ptr [ecx],eax
>>> 806e4c98 895104 mov dword ptr [ecx+4],edx
>>> 806e4c9b 0f31 rdtsc
>>> 806e4c9d 640305ac000000 add eax,dword ptr fs:[0ACh]
>>> 806e4ca4 641315b0000000 adc edx,dword ptr fs:[0B0h]
>>> 806e4cab c20400 ret 4
>>>
>>> There is no serializing instruction here making RDTSC subject to out of
>>> order execution among other things.
>>>
>>> What I would try is issue a CPUID serializing instruction before calling
>>> QPC (__CPUID is included in intrin.h) and see if that solves the
>>> problem.
>>>
>>> You can read more about the difficulties getting RDTSC right in this
>>> article:
>>> http://www.ccsl.carleton.ca/~jamuir/rdtscpm1.pdf
>>>
>>> but you can read about these as well in the latest intel and AMD
>>> manuals.
>>>
>>> //Daniel
>>>
>>>
>>>
>>>
>>> “Gianluca Varenni” wrote in message
>>> news:xxxxx@ntdev…
>>>> Hi all.
>>>>
>>>> A customer of mine reported a strange issue, and I tracked the issue
>>>> down to wrong values returned by KeQueryPerformanceCounter (or the user
>>>> counterpart QueryPerformanceCounter/Frequency).
>>>>
>>>> The MSDN docs clearly state that
>>>>
>>>> KeQueryPerformanceCounter always returns a 64-bit integer that
>>>> represents the current value of a monotonically nondecreasing counter.
>>>> The resolution of the timer used to accumulate the current count can be
>>>> obtained by specifying a pointer value for PerformanceFrequency. The
>>>> value that is returned is the number of ticks per second. The count
>>>> starts accumulating when the computer is started up.
>>>>
>>>>
>>>> What I get on some specific customer machine is something that is not
>>>> “ticks per second”. Either the frequency is bogus, or the counter
>>>> itself.
>>>>
>>>> I wrote a small user app to test it:
>>>>
>>>> int _tmain(int argc, _TCHAR* argv)
>>>> {
>>>> int i;
>>>> LARGE_INTEGER freq, counter;
>>>> SYSTEMTIME systemTime;
>>>>
>>>> for (i = 0; i < 10; i++)
>>>> {
>>>> GetSystemTime(&systemTime);
>>>>
>>>> if (QueryPerformanceCounter(&counter) == FALSE ||
>>>> QueryPerformanceFrequency(&freq) == FALSE)
>>>> {
>>>> printf(“Error getting performance counters\n”);
>>>> }
>>>>
>>>> printf(“Counter = %I64d Freq = %I64d”, counter.QuadPart,
>>>> freq.QuadPart);
>>>> printf(" system: %02d:%02d:%02d\n",
>>>> systemTime.wHour,
>>>> systemTime.wMinute,
>>>> systemTime.wSecond);
>>>>
>>>> Sleep(1000);
>>>> }
>>>>
>>>> return 0;
>>>> }
>>>>
>>>> And this is the output on the failing machine:
>>>>
>>>> Counter = 56497408135970 Freq = 3158750000 system: 01:18:57
>>>> Counter = 56497818110218 Freq = 3158750000 system: 01:18:58
>>>> Counter = 56498206404250 Freq = 3158750000 system: 01:18:59
>>>> Counter = 56498715528269 Freq = 3158750000 system: 01:19:00
>>>> Counter = 56499113984213 Freq = 3158750000 system: 01:19:01
>>>> Counter = 56499489120527 Freq = 3158750000 system: 01:19:02
>>>> Counter = 56499867623469 Freq = 3158750000 system: 01:19:03
>>>> Counter = 56500252073478 Freq = 3158750000 system: 01:19:04
>>>> Counter = 56500713252945 Freq = 3158750000 system: 01:19:05
>>>> Counter = 56501096560332 Freq = 3158750000 system: 01:19:06
>>>>
>>>> If you make some computations, the counter seems to say that every
>>>> iteration is about 0.12 seconds apart from the previous one, while it
>>>> should be more or less 1 second.
>>>> The failing machine is running XP SP3 (hal.dll 5.1.2600.5687,
>>>> ntoskrnl.exe 5.1.2600.5938) on an Intel Core 2 Duo E8500.
>>>>
>>>> Has anyone ever seen this?
>>>>
>>>>
>>>> Have a nice day
>>>> GV
>>>>
>>>>
>>>>
>>>>
>>>
>>
>

“Cay Bremer” wrote in message news:xxxxx@ntdev…
> Newer AMD and Intel processors, including the Core 2 family, update the
> TSC at a constant rate, so CPU throttling cannot be the culprit here.

??? But this would defeat the main purpose of rdtsc instruction?

– pa

> (Sorry for mentioning BCDEdit in the first place. I had originally missed
> that the machine is running Windows XP.)
>
>
> On Thu, 02 Sep 2010 05:41:09 +0300, wrote:
>> I suggest you instead of sleep() run a tight loop waiting for the time
>> to jump to the next second and see if the performance counter will
>> behave better.
>>
>> In other words, the PerfCount may run with the claimed speed only when
>> you’re actually doing something. If the processor is idle, then power
>> management may kick in
>

Pavel A. wrote:

“Cay Bremer” wrote in message news:xxxxx@ntdev…
>> Newer AMD and Intel processors, including the Core 2 family, update the
>> TSC at a constant rate, so CPU throttling cannot be the culprit here.
> ??? But this would defeat the main purpose of rdtsc instruction?

That depends entirely on what you believe its “main purpose” is. Many
people believe the main purpose of rdtsc is to provide finely-grained
elapsed time. I would say that the people who care how many CPU cycles
have actually been executed are a very small minority.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

On Fri, 03 Sep 2010 00:23:17 +0300, Pavel A. wrote:
> ??? But this would defeat the main purpose of rdtsc instruction?

You could have checked this claim yourself instead:
http://www.intel.com/Assets/PDF/manual/253668.pdf (16.11 TIME-STAMP
COUNTER)

On Thu, 02 Sep 2010 19:16:44 +0300, Tim Roberts wrote:
> His machine has a Core i5, which is newer than the Core 2. I, at least,
> don’t know whether this family has the guaranteed cycle counter or not

This stands in contradiction with his original statement:
“The failing machine is running XP […] on an Intel Core 2 Duo E8500.”

In any case, “[t]his [constant TSC] is the architectural behavior moving
forward”, according to Intel.

That’s a useful Intel reference. If there are any hardware designers here
can they confirm that the CPU is always fed a constant clock so a constant
rate counter is possible, or whether some motherboards alter the master
clock to the CPU to control its speed (which would make the constant rate
update impossible). This may be processor dependent but it would be useful
to know whether it is possible on any or all of processors from core duo
onward. Also the intel doc says that asserting DPSLP pin may stop the TSC
increment. Does anyone know whether this is used even briefly in normal
operation for power saving? Either of these would render the TSC unusable
for realtime and may explain the OP observation. thx. M

You could have checked this claim yourself instead:
http://www.intel.com/Assets/PDF/manual/253668.pdf (16.11 TIME-STAMP
COUNTER)<

“Cay Bremer” wrote in message news:xxxxx@ntdev…
> On Fri, 03 Sep 2010 00:23:17 +0300, Pavel A. wrote:
>> ??? But this would defeat the main purpose of rdtsc instruction?
>
> You could have checked this claim yourself instead:
> http://www.intel.com/Assets/PDF/manual/253668.pdf (16.11 TIME-STAMP
> COUNTER)

Thanks Cay, it is a great idea. The fine System programming manual is so
amusing reading.
Every year it is not like a year ago…
So they now call that “architectural behavior moving forward”.

There still are two delicate moments:

1. the “constant TSC” rate depends on combination of other obscure
parameters:
“The specific processor configuration determines the behavior.” (page 16-48,
more esoteric details in 30.10.5) - and this is exactly what
defines the PerformanceFrequency value. Unless the OS or a “CPU driver”
measures
the frequency in runtime, it may be possible to get wrong number - which may
lead to results of the OP.

2. What is “Invariant TSC” then? (16.11.1)

About “constant TSC”, they wrote:
“Constant TSC behavior ensures that the duration of each clock tick is
uniform and supports the
use of the TSC as a wall clock timer even if the processor core changes
frequency.”

But then goes:

“The invariant TSC will run at a constant rate in all ACPI P-, C-. and
T-states. This is
the architectural behavior moving forward. On processors with invariant TSC
support, the OS may use the TSC for wall clock timer services (instead of
ACPI or
HPET timers). …
Processor’s support for invariant TSC is indicated by
CPUID.80000007H:EDX[8].”

Now I’m confused. My Core 2 cpu does not have this cpuid bit set - but,
according to 16.11,
Core 2 has constant TSC. So can I rely on TSC being uniform if C states
change?
– pa

>
> On Thu, 02 Sep 2010 19:16:44 +0300, Tim Roberts wrote:
>> His machine has a Core i5, which is newer than the Core 2. I, at least,
>> don’t know whether this family has the guaranteed cycle counter or not
>
> This stands in contradiction with his original statement:
> “The failing machine is running XP […] on an Intel Core 2 Duo E8500.”
>
> In any case, “[t]his [constant TSC] is the architectural behavior moving
> forward”, according to Intel.
>

On Sat, 04 Sep 2010 00:31:19 +0300, Pavel A. wrote:
> 1. the “constant TSC” rate depends on combination of other obscure
> parameters:
> […]
> Unless the OS or a “CPU driver” measures the frequency in runtime,it may
> be possible to get wrong number - which may lead to results
> of the OP.

I don’t understand what you are trying to debate with me here. First, you
question their decision of changing the TSC behaviour, even though that is
obviously beyond my control, and now, you speculate about the intricacies
of frequency retrieval/measurement, which I haven’t even commented on. In
fact, I am the one who suggested to try out a different QPC timer.
In my reply that you have made the source of your controversy, I only
addressed dynamic frequency scaling (but “CPU throttling” may have been
ambiguous), which was, given the original behaviour of the TSC, actually a
reasonable guess.

(English comma rules are killing me.)

> 2. What is “Invariant TSC” then? (16.11.1)
>
> […]
>
> Now I’m confused. My Core 2 cpu does not have this cpuid bit set - but,
> according to 16.11,
> Core 2 has constant TSC. So can I rely on TSC being uniform if C states
> change?

If the documentation is to be trusted, a mere constant TSC “may” only be
stopped in C3 and deeper states:

“Following a RESET, the counter increments even when the processor is
halted by the HLT instruction [C1] or the external STPCLK# pin [C2]. Note
that the assertion of the external DPSLP# pin may cause the time-stamp
counter to stop [C3].”

[…back from a long weekend vacation…]

Some update to this. I got the results of using the __rdtsc intrisic. It
shows the same behavior of QueryPerformanceCounter (the values of QPC and
__rdtsc are basically the same).

It looks like it’s CPU throttling. I’ve asked the customer to try using the
boot.ini option /USEPMTIMER, but still haven’t heard back.

Have a nice day
GV


From: “Tim Roberts”
Sent: Thursday, September 02, 2010 9:16 AM
To: “Windows System Software Devs Interest List”
Subject: Re: [ntdev] Interesting issue with (Ke)QueryPerformanceCounter

> Cay Bremer wrote:
>> Newer AMD and Intel processors, including the Core 2 family, update the
>> TSC at a constant rate, so CPU throttling cannot be the culprit here.
>
> His machine has a Core i5, which is newer than the Core 2. I, at least,
> don’t know whether this family has the guaranteed cycle counter or not
>
> –
> Tim Roberts, xxxxx@probo.com
> Providenza & Boekelheide, Inc.
>
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer