We sell a stable driver for CAN Bus adapters.
One feature is time-controlled packet transmission, driven by a WDFTIMER callback that fires every 1 millisecond.
The timer setup is, in outline (simplified here, with illustrative names):
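WDF_TIMER_CONFIG      config;
WDF_OBJECT_ATTRIBUTES attributes;
WDFTIMER              timer;
NTSTATUS              status;

// 1 ms periodic timer; EvtCanTxTimer is the transmit callback (name illustrative).
WDF_TIMER_CONFIG_INIT_PERIODIC(&config, EvtCanTxTimer, 1);

WDF_OBJECT_ATTRIBUTES_INIT(&attributes);
attributes.ParentObject = device;   // the WDFDEVICE that owns the timer

status = WdfTimerCreate(&config, &attributes, &timer);
if (NT_SUCCESS(status)) {
    WdfTimerStart(timer, WDF_REL_TIMEOUT_IN_MS(1));
}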
The callback's firing rate had been very precise since pre-KMDF times, for almost 20 years.
A big customer now complains about a loss of precision, causing jitter in transmitted message timestamps. I traced it down to the following:
the callback period is indeed jittering between approximately 0.9 and 1.3 milliseconds,
and this has occurred since Win11 24H2, build 10.0.26100 (inclusive).
The driver under test was compiled with WDK 16299.
I did tests with
config.TolerableDelay = 1 ;
and
config.UseHighResolutionTimer = WdfTrue ;
The article by Bruce Dawson, "Windows Timer Resolution: The Great Rule Change" (sorry, no links allowed in this post)
The article on GitHub by user "plankeeee": "BetterTimerResolution"
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\kernel]
"GlobalTimerResolutionRequests"=dword:00000001
ExSetTimerResolution()
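A sketch of how such an ExSetTimerResolution() request looks (the argument is the desired resolution in 100-ns units; the return value is the resolution the kernel actually granted):

ULONG granted;

// Request 0.5 ms system timer resolution (5000 * 100 ns).
granted = ExSetTimerResolution(5000, TRUE);

// ... and release the request again when done:
ExSetTimerResolution(0, FALSE);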
All without effect. The timer resolution does improve to 500 µs, but the callback precision remains bad.
The test machine was an i7-6700K (4 cores / 8 threads).
I understand that the "WDFTIMER callback jitter" may be an epiphenomenon, perhaps caused by other system services or power policies.
Anyhow, I am under pressure to "get this right again".
Any hints on what changed in 10.0.26100, what to try next, or what docs to read further?
I would be surprised if this ever worked reliably in any version of Windows.
I know that doesn't help your problem, but as a general rule, if you have real hardware and need an 'accurate' timer, you want an interrupt. And if you need to do it in software only, you have to spin wait. Even for something as coarse as 1 ms. To use scheduler-based timing, your required interval should be an order of magnitude longer than the scheduler granularity - about 100 ms and longer intervals.
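To make the spin-wait idea concrete, a minimal kernel-mode sketch (illustrative only: wake a little early, then burn CPU on the performance counter until the intended deadline):

#include <ntddk.h>

// Sketch only: busy-wait until KeQueryPerformanceCounter() reaches Deadline
// (a QPC tick value computed by the caller, e.g. start + n * frequency / 1000).
// This trades CPU time at the caller's IRQL for timing precision.
VOID SpinUntilDeadline(_In_ LONGLONG Deadline)
{
    LARGE_INTEGER now;

    do {
        KeStallExecutionProcessor(10);      // re-check roughly every 10 microseconds
        now = KeQueryPerformanceCounter(NULL);
    } while (now.QuadPart < Deadline);
}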
Precision is another question for any implementation.
While none of this has changed, the problem is worse today because the scheduler now handles asymmetric multiprocessing, that is, P cores and E cores in the same system, and it has far more capable timing hardware available to it too.
Thanks for the info, MBond2. I'm also seeing similar issues with WDFTIMER jitter on newer Windows versions. Even with UseHighResolutionTimer and ExSetTimerResolution(), the timing isn't consistent anymore. This seems to have started with the recent Windows 11 builds, possibly due to changes in how the scheduler handles P and E cores. If anyone has found a workaround, like setting CPU affinity or using MMCSS, I'd appreciate any tips.
I did more tests and found that the KMDF version also matters.
First, let's quantify the amount of "jitter":
I send a single CAN bus message repeated at 1-millisecond intervals,
then build the time difference to the previous message,
then compute the mean and standard deviation of those time differences.
Without any jitter, I expect (mean, standard deviation) to be (1000, 0) µs.
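The computation itself is nothing special; a user-mode sketch (timestamps in microseconds, names illustrative):

#include <math.h>
#include <stddef.h>

// Sketch: mean and standard deviation of the period between consecutive
// message timestamps ts[0..n-1] (in microseconds, n >= 2).
void jitter_stats(const double *ts, size_t n, double *mean, double *stddev)
{
    double sum = 0.0, sumsq = 0.0;

    for (size_t i = 1; i < n; i++) {
        double dt = ts[i] - ts[i - 1];   // period between consecutive messages
        sum   += dt;
        sumsq += dt * dt;
    }
    *mean   = sum / (double)(n - 1);
    *stddev = sqrt(sumsq / (double)(n - 1) - (*mean) * (*mean));
}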
In Windows build 10.0.22631, I get (1000, 132), so 68% of my periods are within 1000 ± 132 µs.
This is considered "good".
Under Windows build 10.0.26100, I get (1000, 296), so 68% of my periods are within roughly 1000 ± 300 µs.
Besides the Windows build number, another, even more important factor is the KMDF version number.
It is selected in the driver project's .vcxproj via the KMDF_VERSION_MAJOR and KMDF_VERSION_MINOR properties.
Under 10.0.22631, compiling with SDK/WDK = 16299 and KMDF version 1.9:
(mean, standard deviation) = (1000, 132) µs.
Going to KMDF version 1.15:
now (mean, standard deviation) = (1000, 800) µs, about 6 times worse jitter.
So for the same SDK/WDK 16299, changing the KMDF version introduces much more jitter. (It would be nice to have data for all KMDF versions up to 1.33, and for different SDKs/WDKs.)
Now, questions:
Does the same KMDF library code (here 16299) act differently depending on the KMDF version selected at compile time?
The KMDF source code should reveal functional differences in WDFTIMER handling depending on the KMDF version. Where should I look?
Since, iirc, Windows 8.1, and definitely Windows 10, you don't get to install or determine the WDF runtime version. You get what's in the OS. So you can remove that variable from your testing on Win10. The WDF version does change with every Win10 and 11 release (at least it used to before I left).
There are a bunch of things here, but the first is your measuring technique.
The normal or Gaussian distribution is the most common form of statistical measurement. It is taught in schools all over, well documented on the Internet, easy to calculate, and totally unsuitable for analyzing timing in a general-purpose OS like Windows. The reason is that, unlike an analog system where samples can measure too short as well as too long, none of the samples you collect will ever be too short. So instead of the familiar symmetric bell curve, the samples will fall into a one-sided pattern: none of them faster than your target time, most of them slightly longer than your target time, and then a very long tail of outliers that are much longer than your target time. If you ever measure an interval smaller than your target, you should mistrust how you are measuring.
The standard deviation calculation that you have done indicates that there is a wider spread of values in your second test case, but it doesn't tell you anything more than that. A histogram will give much more insightful information.
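For example, a sketch of such a histogram (bucket width and range chosen arbitrarily here; dt[] holds the measured periods in microseconds):

#include <stddef.h>
#include <stdio.h>

#define FIRST_US  900      // nothing should be faster than the 1 ms target
#define BUCKET_US  50
#define NBUCKETS   23      // 900..2050 us; the last bucket also collects the long tail

void histogram(const double *dt, size_t n)
{
    unsigned long counts[NBUCKETS] = { 0 };

    for (size_t i = 0; i < n; i++) {
        long b = (long)((dt[i] - FIRST_US) / BUCKET_US);
        if (b < 0) b = 0;                     // faster than target: mistrust the measurement
        if (b >= NBUCKETS) b = NBUCKETS - 1;  // long tail of outliers
        counts[b]++;
    }
    for (int i = 0; i < NBUCKETS; i++)
        printf("%4d us: %lu\n", FIRST_US + i * BUCKET_US, counts[i]);
}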
Then you can look at what your code does or could do to handle the inevitable situation where the timings are distributed this way.
Note that the same considerations also apply to interrupt latency and the 'turbo lag' caused by CPU power state changes, but those involve orders-of-magnitude smaller units of time.