KeQueryPerformanceCounter on AMD64 DualCore machines issue

Loren Wilton wrote:

> So it appears that the only thing that can be done if you need
> timestamps below 1ms on a multiproc system is to implement a routine
> that has a thread per proc at realtime priority, and will periodically
> all sync together burning processor until they can get a
> near-simultaneous RDTSC value from each processor. You can then be
> careful of which processor you read the timestamp counter from and use
> a correction factor based on the last known offset and speed for that
> processor.
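
The correction factor described above can be sketched roughly as follows. This is only an illustration under assumptions: the `cpu_calib_t` record, its field names, and `tsc_to_reference` are hypothetical, and the code does not perform the sync pass itself (that would need pinned, high-priority threads and RDTSC). It only shows how a raw reading from one processor could be mapped onto a reference processor's timeline once an offset and a relative speed are known.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-processor calibration record, refreshed by the
   periodic sync pass. All names here are illustrative assumptions. */
typedef struct {
    int64_t  offset_cycles; /* this CPU's TSC minus the reference CPU's TSC at last sync */
    uint64_t sync_tsc;      /* this CPU's TSC value at last sync */
    double   rate_ratio;    /* reference cycles per local cycle since last sync (1.0 = same speed) */
} cpu_calib_t;

/* Map a raw TSC value read on one CPU onto the reference CPU's timeline. */
static uint64_t tsc_to_reference(const cpu_calib_t *c, uint64_t raw_tsc)
{
    assert(c != 0 && raw_tsc >= c->sync_tsc);

    /* Where the reference counter stood at the moment of the last sync. */
    uint64_t ref_at_sync = (uint64_t)((int64_t)c->sync_tsc - c->offset_cycles);

    /* Local cycles since the sync, rescaled to reference cycles (rounded). */
    uint64_t elapsed = raw_tsc - c->sync_tsc;
    return ref_at_sync + (uint64_t)((double)elapsed * c->rate_ratio + 0.5);
}
```

The longer the interval since the last sync, the more any error in `rate_ratio` accumulates, which is why the resync period matters.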

The time-stamp counter is a writable register. A suitably motivated
person should be able to write a kernel driver to sync them on the fly.

> How often you have to get the base sync values probably depends on the
> environment. I deal with datacenter servers, so I can probably get
> away with getting a sync value every 100 seconds or so. On a laptop
> with active power management you might have to get sync values several
> times per second to have actual accuracies in the millisecond range.

I’m surprised at how dynamic this delta is. I have a little test app
that I wrote many years ago to test the counter synchronization. I ran
it a few minutes ago on my desktop AMD 64 X2 3800+, and they were
11,976,000 cycles apart. I ran it two minutes later, and they were
11,951,000 cycles apart. Right now, they are 12,058,000 cycles apart.
The population is large enough that I believe these to be accurate to 5
places.

The clock runs at 2GHz, which means they are about 6ms apart. But even
more interesting, the delta between them varied by as much as 50us over
a period of 10 minutes.
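
The arithmetic behind those figures can be checked with a small helper. This is a sketch that assumes the nominal 2 GHz clock quoted in the post; `cycles_to_us` is a name made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a cycle count to whole microseconds at a given clock rate.
   The multiplication can overflow for very large cycle counts. */
static uint64_t cycles_to_us(uint64_t cycles, uint64_t hz)
{
    assert(hz > 0);
    return cycles * 1000000u / hz;
}
```

At 2,000,000,000 Hz, the three readings of 11,976,000, 11,951,000 and 12,058,000 cycles come out to 5988, 5975 and 6029 microseconds, i.e. about 6 ms. The spread between the smallest and largest reading is 107,000 cycles, or about 53 microseconds, which is in line with the roughly 50us variation mentioned.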

> It appears that, short of a special hardware timer plugin card, it is
> no longer possible to get reliable timestamps that are good to less
> than 1ms, and even those might be questionable in a number of
> circumstances.

Yes. I am actually quite surprised that a simple, free-running
microsecond counter has not become part of the standard PC by now.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> ----------

From: xxxxx@lists.osr.com[SMTP:xxxxx@lists.osr.com] on behalf of Tim Roberts[SMTP:xxxxx@probo.com]
Reply To: Windows System Software Devs Interest List
Sent: Thursday, February 15, 2007 9:23 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] KeQueryPerformanceCounter on AMD64 DualCore machines issue

Michal Vodicka wrote:
> The only remaining function is KeQueryPerformanceCounter(), which uses RDTSC on my dual Opteron machine, which means it is susceptible to the problems mentioned in this thread, and I can see them.
>
> So what is the right solution? I can’t believe such a basic thing just doesn’t work…
>

One answer is to use affinity to make sure your code always runs on the
same CPU.

My code usually runs in arbitrary thread context…

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]

This has been a most interesting thread indeed. It seems there is no right
way; there are only varying degrees of wrongness …

“Justin Schoenwald” wrote in message news:xxxxx@ntdev…
> At 11:51 PM 2/14/2007, you wrote:
>>… so better to write your code once the right way instead of once the
>>wrong way and then later the right way after customers experience
>>problems.
>
> Ok, what is the right way?
> …assuming that “the right way” works.
>
>
>

Platforms where KeQueryPerformanceCounter returns a non-monotonic counter
are simply broken (in either the hardware or the Windows HAL).
IMHO you don’t have any good solution there, besides just avoiding crashes
in your code.

–PA
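
One way to read "just avoiding crashes" is to make code tolerate a counter that steps backward. The following is a minimal sketch under assumptions: `safe_elapsed`, `mono_clamp_t` and `mono_clamp` are hypothetical names, the counter values are assumed non-negative (as performance counter readings are), and the clamp state is not protected against concurrent access, which a real driver would need to handle (for example with interlocked operations).

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks between two counter readings. A backward step is
   reported as zero rather than as a huge unsigned difference. */
static uint64_t safe_elapsed(int64_t start, int64_t end)
{
    return (end > start) ? (uint64_t)(end - start) : 0;
}

/* Remembers the largest value handed out so far. */
typedef struct {
    int64_t last;
} mono_clamp_t;

/* Return a value that never decreases from one call to the next. */
static int64_t mono_clamp(mono_clamp_t *m, int64_t raw)
{
    assert(m != 0);
    if (raw > m->last) {
        m->last = raw;
    }
    return m->last;
}
```

This hides the backward jump but does not make the timestamps more accurate; readings taken during the jump simply repeat the last good value.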

“Michal Vodicka” wrote in message news:xxxxx@ntdev…
> ----------
> From: xxxxx@lists.osr.com[SMTP:xxxxx@lists.osr.com] on behalf of Gianluca
> Varenni[SMTP:xxxxx@gmail.com]
> Reply To: Windows System Software Devs Interest List
> Sent: Thursday, February 15, 2007 5:21 PM
> To: Windows System Software Devs Interest List
> Subject: Re: Re:[ntdev] KeQueryPerformanceCounter on AMD64 DualCore machines issue
>
> As a matter of fact, whether you use that macro (ReadTimeStampCounter) or
> KeQueryPerformanceCounter, both are almost useless for timestamping
> a piece of information that is not CPU-bound. So the only reliable function
> is KeQuerySystemTime (and similar), which gives timestamps with
> millisecond-range precision.
>
On most platforms I have seen it is about 15.620 ms, which is ages on a modern CPU.

> BTW, I just discovered yesterday from a log sent by my user that DebugView
> probably uses the same function, as it shows the same identical problem with
> timestamps.
>
Yes, it does. I have the same experience with it.

Coincidentally, I just needed to improve time measurement in my driver. Originally I used KeQueryTickCount(), but I need better
resolution now. So I tried KeQueryInterruptTime(), which claims to return a finer-grained measurement (according to the WDK docs). I
wondered how that is possible if KeQueryTimeIncrement() is documented to return the timer increment for both functions, but
I tried it nevertheless. Nothing changed: 15.620 ms resolution. For completeness, I tried KeQuerySystemTime() with the same
result.
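
The relationship between the tick count and the time increment can be sketched as below. This is an illustration, not driver code: `increment_100ns` stands in for the value KeQueryTimeIncrement() returns (a count of 100-ns units per clock tick), the helper names are hypothetical, and the value 156,200 is simply the increment implied by the 15.620 ms figure above.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed time, in 100-ns units, represented by a tick-count difference.
   increment_100ns stands in for the KeQueryTimeIncrement() result. */
static uint64_t ticks_to_100ns(uint64_t tick_delta, uint32_t increment_100ns)
{
    assert(increment_100ns > 0);
    return tick_delta * (uint64_t)increment_100ns;
}

/* Smallest nonzero interval observable through a clock that only
   advances once per tick, in microseconds. */
static uint64_t resolution_us(uint32_t increment_100ns)
{
    return ticks_to_100ns(1, increment_100ns) / 10;
}
```

With an increment of 156,200 units, `resolution_us` gives 15,620 microseconds. Any time source that only advances on the clock interrupt would show that same step, whatever units it reports in.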

The only remaining function is KeQueryPerformanceCounter(), which uses RDTSC on my dual Opteron machine, which means it is
susceptible to the problems mentioned in this thread, and I can see them.

So what is the right solution? I can’t believe such a basic thing just doesn’t work…

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]

On Sun, Feb 18, 2007 at 07:27:48AM +0200, Pavel A. wrote:

> Platforms where KeQueryPerformanceCounter returns a non-monotonic counter
> are simply broken (in either the hardware or the Windows HAL).

I would agree, but I think you'll find that virtually every multiprocessor
machine that runs Windows XP qualifies as "broken" under your definition.

Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

wrote in message news:xxxxx@ntdev…
> On Sun, Feb 18, 2007 at 07:27:48AM +0200, Pavel A. wrote:
>>
>> Platforms where KeQueryPerformanceCounter returns a non-monotonic counter
>> are simply broken (in either the hardware or the Windows HAL).
>
> I would agree, but I think you’ll find that virtually every multiprocessor
> machine that runs Windows XP qualifies as “broken” under your definition.

Unfortunately XP HALs seem to use the wrong time source;
the Intel HPET spec arrived too late (?)
Why there is no standard time source hardware is indeed a riddle;
perhaps Mark Russinovich knows the answer but won’t tell us :(

–PA

----- Original Message -----
From: “Pavel A.”
Newsgroups: ntdev
To: “Windows System Software Devs Interest List”
Sent: Saturday, February 17, 2007 9:27 PM
Subject: Re:[ntdev] Re:KeQueryPerformanceCounter on AMD64 DualCore machines
issue

> Platforms where KeQueryPerformanceCounter returns a non-monotonic counter
> are simply broken (in either the hardware or the Windows HAL).

I know. And this would explain the note for QueryPerformanceCounter (i.e.
the user-level counterpart of KQPC):
“On a multiprocessor machine, it should not matter which processor is called.
However, you can get different results on different processors due to bugs
in the BIOS or the HAL. To specify processor affinity for a thread, use the
SetThreadAffinityMask function.”

> IMHO you don’t have any good solution there, besides just avoiding
> crashes in your code.

It’s just used for timestamping received packets in WinPcap (and in other
drivers), so no crashes. It’s just not fun when users report to you that the
timestamps of the captured packets are screwed up. And telling them “your
hardware is broken” is even worse.

I hoped to have some “official” answer from some MS folk, but…

GV

> —
> Questions? First check the Kernel Driver FAQ at
> http://www.osronline.com/article.cfm?id=256
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer

----- Original Message -----
From: “Pavel A.”
Newsgroups: ntdev
To: “Windows System Software Devs Interest List”
Sent: Saturday, February 17, 2007 11:00 PM
Subject: Re:[ntdev] Re:KeQueryPerformanceCounter on AMD64 DualCore machines
issue

> wrote in message news:xxxxx@ntdev…
>> On Sun, Feb 18, 2007 at 07:27:48AM +0200, Pavel A. wrote:
>>>
>>> Platforms where KeQueryPerformanceCounter returns a non-monotonic counter
>>> are simply broken (in either the hardware or the Windows HAL).
>>
>> I would agree, but I think you’ll find that virtually every
>> multiprocessor
>> machine that runs Windows XP qualifies as “broken” under your definition.
>
> Unfortunately XP HALs seem to use the wrong time source;
> the Intel HPET spec arrived too late (?)

Well, I actually discovered the issue on an XP64 machine, i.e. a win2003
kernel…

GV

> Why there is no standard time source hardware is indeed a riddle;
> perhaps Mark Russinovich knows the answer but won’t tell us :(
>
> --PA

“Gianluca Varenni” wrote in message news:xxxxx@ntdev…

> this would explain the note for QueryPerformanceCounter (i.e. the user level counterpart of KQPC).
> On a multiprocessor machine, it should not matter which processor is called. However, you can get different results on
> different processors due to bugs in the BIOS or the HAL. To specify processor affinity for a thread, use the
> SetThreadAffinityMask function.

Yep.

> I hoped to have some “official” answer from some MS folk, but…

So did I, but I never got anything. Even Mark Russinovich prefers to tell us
about cycle counting and the multimedia scheduler in Vista, but has never explained
the very basic details of the timer resolution (like why the XP default
is 15.6 ms on some platforms and 10 ms on others).

Regards,
–PA