Timer for Synchronization

My driver needs to send an Ethernet packet every 250ms for synchronization with other hardware. I have a user-mode app for testing which gives very good performance using the chrono high-resolution clock in C++. But when I try to bring the timer into my driver, using KeQueryUnbiasedInterruptTime to query the time and KeDelayExecutionThread for the delay, the resolution is far from acceptable and varies by several milliseconds (intervals of 247ms to 258ms). I also tried KeQueryPerformanceCounter and had about the same results. Is there a better approach, or would you expect this to work as well as the user-mode app and assume that I’m not implementing it correctly in the driver?

Windows NT is not a realtime OS. You already know this, but that fact is going to color every bit of this discussion. You can often get “pretty okay” results on NTOS, and in very constrained setups, you can get “surprisingly good” results – at least, surprisingly good for a non-RTOS. But you cannot get perfect results all the time.

If you have hard millisecond requirements on the timing of the packets, you cannot use NTOS to do this. You’ll need an RTOS. It’s possible that you already have one: expensive NICs typically have an internal clock, a general-purpose ARM or RISC CPU, and an SDK that lets you move your program onto the NIC’s CPU. (Note that even this has an asterisk: Ethernet itself is a shared medium, which means that if anyone else on the network transmits a packet at the same time, your packet gets trashed. You can retransmit, but now you’ve lost a hundred nanoseconds.) Another option is to buy a little board (like a Raspberry Pi) that has an Ethernet jack and can run an RTOS.

If you must run on NTOS (or any non-RTOS like macOS or vanilla Linux), then you have to accept unbounded imprecision.

Okay, the big disclaimer is out of the way. Now I suggest you take a break from reading my reply to read through this excellent page: https://docs.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps

Regarding your specific question: KeQueryUnbiasedInterruptTime is not an exact equivalent for usermode’s QueryPerformanceCounter. The actual equivalent is KeQueryPerformanceCounter. If you switch to that, you should be able to get your driver’s numbers up to the same level of quality that you see in your usermode version.

KeQueryInterruptTime (and similarly KeQueryTickCount) is only updated once every clock tick. That means that on typical PCs, the value can be as much as 16ms stale. The Unbiased versions are similar; they’re just not artificially advanced when the system goes to sleep. For network protocols, you probably want the biased clock: for other devices on the network, time continues to advance even if the local host has gone to low power.

There’s also a Precise variant of these clocks. These use a quick LERP to improve on the 16ms granularity. On x64, the CPU has a cycle counter (RDTSC) that is much more granular than the 16ms interrupt. So you can LERP the cycle counter with the clock interrupt to claw back a bit more precision.
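If it helps to see the difference, here is a minimal sketch (the Precise variants need Windows 8.1 or later; the function name is just for illustration) that reads the tick-granular and interpolated flavors side by side:

#include <ntddk.h>

// Both values are 100-ns units since boot; 'precise' interpolates between
// clock ticks using the performance counter, 'coarse' only moves at each tick.
VOID CompareInterruptTime(VOID)
{
    ULONG64 qpcStamp;
    ULONGLONG coarse  = KeQueryInterruptTime();                  // advances once per clock tick
    ULONGLONG precise = KeQueryInterruptTimePrecise(&qpcStamp);  // interpolated between ticks

    // With the default ~15.6ms tick, 'precise' can run ahead of 'coarse' by up to ~156,000 units.
    DbgPrint("coarse=%I64u precise=%I64u delta=%I64u\n",
             coarse, precise, precise - coarse);
}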

But you don’t need to worry about any of that: if you really need high precision, use KeQueryPerformanceCounter. This method doesn’t have one specific implementation. Instead, it automatically selects the implementation that has the best precision on the current CPU. So if KeQueryInterruptTimePrecise is the best you can do, then KeQPC will do it. If RDTSC is the best, then KeQPC will use that. If there’s some future technology that gives an amazing clock source, KeQPC will use that. So by using KeQPC, you’re saying “only the best is good enough for me: give me the best!” without having to worry about the details.
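For example, here is a minimal sketch of measuring an interval with it (nothing here is specific to your driver; the stall is just something convenient to time):

#include <ntddk.h>

// Time an interval with KeQueryPerformanceCounter and convert the tick delta
// to microseconds using the frequency it reports, rather than assuming a rate.
VOID MeasureStall(VOID)
{
    LARGE_INTEGER freq;                                    // counter ticks per second
    LARGE_INTEGER start = KeQueryPerformanceCounter(&freq);

    KeStallExecutionProcessor(1000);                       // the thing being measured: a 1ms busy stall

    LARGE_INTEGER end = KeQueryPerformanceCounter(NULL);
    LONGLONG elapsedUs = ((end.QuadPart - start.QuadPart) * 1000000LL) / freq.QuadPart;

    DbgPrint("elapsed: %I64d us\n", elapsedUs);            // expect a number near 1000
}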

Summary of time sources:

  • InterruptTime / TickCount: only ~16ms resolution
  • Precise: interpolated, much finer than the 16ms tick
  • Unbiased: stops advancing while the system sleeps
  • KeQueryPerformanceCounter: probably the correct answer

One more caveat: note that the NIC has its own internal queue of packets. If the NIC has a huge queue of packets to be transmitted, your packet will sit and wait for an unbounded time. You may be able to avoid this problem by using 802.1p priority tags: the better NIC drivers have a separate queue for high priority packets. Of course, putting the tag on the packet will change what goes out on the wire, so make sure the other side of this connection is okay with seeing the extra header on the packet.
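For reference, the priority rides in the 3-bit PCP field of the 802.1Q tag: four extra bytes inserted between the source MAC and the original EtherType. A rough sketch of just those bytes follows (in an NDIS driver you would more likely request tagging via the per-NBL 802.1Q out-of-band info and let the miniport insert it, but this is what the peer sees on the wire):

#include <ntddk.h>

// Build the 4-byte 802.1Q tag that carries an 802.1p priority.
// TCI layout: PCP (3 bits) | DEI (1 bit, left 0) | VLAN ID (12 bits).
static VOID BuildVlanTag(UCHAR Tag[4], UCHAR Priority /* 0..7 */, USHORT VlanId /* 0..4095 */)
{
    USHORT tci = (USHORT)(((Priority & 0x7) << 13) | (VlanId & 0x0FFF));

    Tag[0] = 0x81;                  // TPID 0x8100 (network byte order)
    Tag[1] = 0x00;
    Tag[2] = (UCHAR)(tci >> 8);     // TCI, high byte first
    Tag[3] = (UCHAR)(tci & 0xFF);
}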

Alternatively, you can solve the problem by just legislating it away: demand that your solution only runs on a dedicated Ethernet port, with no other contention from other traffic.

Thank you for your reply. I have read through the link you provided. I understand that Windows is not a real time OS, but I am puzzled by the significant difference in results that I am getting between the user mode app approach and the kernel mode driver approach. Here is some additional information. The first code snippet below is from the user mode application. The accuracy of the interval that I observe in Wireshark between packet transmissions is 250.0ms +/- .02ms. The second code snippet is from the driver. From this one, I observe 250ms to 262ms in Wireshark. I will study your comments further to see if there is another approach that I should try.

App approach:

std::chrono::duration<double> time_span;
std::chrono::high_resolution_clock::time_point tstart;
std::chrono::high_resolution_clock::time_point tnow;

while (1)
{
    tstart = std::chrono::high_resolution_clock::now();
    tnow = std::chrono::high_resolution_clock::now();
    time_span = std::chrono::duration_cast<std::chrono::duration<double>>(tnow - tstart);
    while (time_span.count() <= .25)
        // ... (body omitted)
    send Ethernet packet
}

Driver approach:
while (syncing)
{
    Interval = KeQueryPerformanceCounter(&PerformanceFrequency).QuadPart - startTime.QuadPart;
    if (Interval < 2500000)
    {
        LARGE_INTEGER T250MS;
        // Negative = relative wait in 100-ns units; this reuses QPC ticks
        // directly, which assumes a 10 MHz performance counter.
        T250MS.QuadPart = -(2500000 - Interval);
        KeDelayExecutionThread(KernelMode, FALSE, &T250MS);
    }
    startTime = KeQueryPerformanceCounter(&PerformanceFrequency);
    send Ethernet packet and wait for completion
}

Your user-mode example is incomplete. What do you do “while (time_span.count() <= .25)”? Do you Sleep, or are you in a hard CPU loop?

It’s a hard CPU loop. All it does is send a packet every 250ms.

Well, that explains the difference.

In one case you’re running continually; in the other case, you’re waiting for the timer and then your thread needs to be scheduled.

Peter

Yes, that explains it. You surely could not expect to deliver a production product that wastes 100% of a CPU, could you?

Your solution, however, does seem pretty clear. You use KeDelayExecutionThread to get close, and then use KeStallExecutionProcessor for the final wait. KeDelayExecutionThread gives up the CPU, but KeStallExecutionProcessor does a tight CPU loop for short waits. That should get you the resolution you need, at a much lower CPU impact.
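Something along these lines, as a rough sketch (SendSyncPacket is a placeholder for the real transmit, and the ~16ms of slack, one default clock tick, is an assumption you would tune on your hardware):

#include <ntddk.h>

VOID SendSyncPacket(VOID);   // placeholder, implemented elsewhere

// Hybrid wait: sleep through most of each 250ms period, then burn the last
// stretch in short stalls so the send lands close to the deadline.
VOID SyncSenderLoop(volatile BOOLEAN *Syncing)
{
    LARGE_INTEGER freq;
    LARGE_INTEGER deadline = KeQueryPerformanceCounter(&freq);

    while (*Syncing) {
        deadline.QuadPart += freq.QuadPart / 4;           // next deadline: 250ms later

        // 1) Give the CPU back for all but the last ~16ms. Relative waits are
        //    negative and expressed in 100-ns units.
        LONGLONG remainingTicks = deadline.QuadPart - KeQueryPerformanceCounter(NULL).QuadPart;
        LONGLONG remaining100ns = (remainingTicks * 10000000LL) / freq.QuadPart;
        if (remaining100ns > 160000) {
            LARGE_INTEGER coarse;
            coarse.QuadPart = -(remaining100ns - 160000);
            KeDelayExecutionThread(KernelMode, FALSE, &coarse);
        }

        // 2) Spin out the remainder in small KeStallExecutionProcessor chunks;
        //    the CPU cost is bounded by the slack instead of the whole 250ms.
        while (KeQueryPerformanceCounter(NULL).QuadPart < deadline.QuadPart) {
            KeStallExecutionProcessor(10);                // 10 microseconds per stall
        }

        SendSyncPacket();
    }
}

Because each new deadline is computed from the previous one rather than from “now”, a late send does not accumulate drift over many periods.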

Thanks for your replies. As it turns out, the driver actually does give the same results as the app on my laptop and home computer, but is 100 times worse, as described above, on my main computer at work. I compared network settings and that was not the issue. I suspected the Windows version, since that was the only computer running 2004, but I just tried it on another computer running 2004 and it was fine. So this may be an unexplained anomaly on one computer, but since it is a very capable computer, I expect I will see it elsewhere. I’ll need to do more testing. Thanks again for your feedback.

This is exactly the sort of problem you expect here. Some of the time, on some machines, it will work well enough. But take the same code to different hardware, or run it enough times, and the problems pointed out by others surface. It is likely that the ‘better’ the machine is, the more likely you are to fail.

FYI you can potentially get more insight into the timing differences across systems by using Xperf. This article might give you a starting point: Happiness is Xperf.