I created a thread a couple of months ago about a specific issue relating to DPC scheduling, and got some helpful replies to it. Apart from getting an explanation of why the issue was occurring, I was told to use XPERF. That thread is here:
So I’ve been gathering XPERF data using the “Latency” set of flags. The metric that I am particularly interested in is the real-time delay from KeInsertQueueDpc() to the DPC actually beginning to execute. I will call this delay the “DPC scheduling latency”. Unfortunately, the nearest thing that I can see, when visualizing the .etl file in the WPA GUI, is min / max / mean DPC execution time.
Is it possible to see DPC scheduling latency in the WPA GUI? I’m wondering if this requires us to create custom events and somehow match them up to DPC fragment begin events. Or maybe there is some transformation I can do on the .etl file to get what I want?
Am I going about this the wrong way?
The issue that I am trying to resolve is that sometimes, in a particular customer’s system (a 2-node NUMA system running Windows Server 2008 R2), it seems that a DPC scheduled by the driver’s ISR has DPC scheduling latency in the order of 5 - 15 ms (perhaps more), from KeInsertQueueDpc to actually getting going on a CPU. This happens only once in a blue moon, as in a few times per couple of hours’ running. The customer’s application is not robust to this, and their data capture device loses data due to FIFO overflow.
My current goal is to use XPERF to confirm that excessive DPC scheduling latency is really happening when data loss events occur.
> it seems that a DPC scheduled by the driver’s ISR has DPC scheduling
> latency in the order of 5 - 15 ms (perhaps more), from KeInsertQueueDpc to
> actually getting going on a CPU
XPERF won’t report that as far as I know. You could use LatencyMon and
configure it to report “interrupt to DPC latency”, but it won’t report it
for the particular DPC you are executing.
Alternatively, you could do the measuring yourself from within your code.
We already put a timestamping mechanism into the driver, which is what indicated that we (appear to) experience occasional excessive DPC scheduling latency. For this we use KeQueryPerformanceCounter. I had some reservations about using this for the purpose of timestamps, and I wondered if it might be subject to step changes on e.g. changes of processor power state. In that case, the excessive DPC scheduling latency might be an artifact of the behaviour of KeQueryPerformanceCounter. That’s why I was looking to confirm that what I am seeing is real using an approved method such as XPERF.
I’ve read that early multi-core CPUs had issues with TSC registers not always being synchronised across all cores or subject to change in frequency due to technologies such as speedstep, but I would imagine that such issues have long since been resolved. The customer’s system has fairly bleeding edge Intel Xeon processors in it, so I wouldn’t expect that to be an issue.
Do you have any recommendations for a method of measuring real-time intervals that is reliable and unlikely to introduce artifacts into measurements? The candidates seem to be
- KeQueryPerformanceCounter
- KeQueryInterruptTimePrecise (not available in Windows Server 2008 R2, though)
- KeQueryUnbiasedInterruptTime would appear to be unsuitable for the purpose of measuring intervals of real time.
- KeQueryInterruptTime appears to have a resolution equal to the system tick rate, so it’s too coarse to be much use.
From: xxxxx@alpha-data.com
Sent: March 10, 2016 1:43 PM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] XPERF and “DPC scheduling latency”

Daniel, thanks for the reply.
> I’ve read that early multi-core CPUs had issues with TSC registers not
> always being synchronised across all cores
TSC synchronization is handled by the HAL, and the OS knows various policies
for this. For accurate measurements, it is suggested that you always measure
on the same processor.
> or subject to change in frequency due to technologies such as speedstep,
> but I would imagine that such issues have long since been resolved.
An invariant TSC counts at a constant rate, independently of the current CPU
clock speed. Since the clock speed can vary greatly, it’s suggested that you
disable TurboBoost in either the CMOS setup or the power configuration
settings.
> TSC synchronization is handled by the HAL, and the OS knows various
> policies for this. For accurate measurements, it is suggested that you
> always measure on the same processor.
It doesn’t apply to you in this case since the DPC runs on the same
processor it was requested on by default. Perhaps I should have posted this
to the other thread.
I suppose now the next step is to use our homebrew DPC scheduling latency measurements to identify times at which excessive scheduling latency occurs, and check various things in the XPERF logs at those particular times. I’m thinking that I should look for:
This is a good case for adding your own custom ETW events to the trace which
will then serve as a guide for where to look in WPA. Check out this article
for a description:
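A minimal sketch of what such a custom kernel-mode ETW provider might look like, using the documented EtwRegister / EtwWrite API (available since Vista, so usable on Server 2008 R2). The GUID and event ID below are placeholders, and this is WDK-only code, so treat it as an outline rather than something compilable here:

```c
/* Sketch only -- kernel-mode (WDK) code.  Placeholder GUID/event ID. */
#include <ntddk.h>
#include <evntrace.h>

static const GUID MyProviderGuid =   /* placeholder -- generate your own */
    { 0x12345678, 0x1234, 0x1234,
      { 0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc, 0xde, 0xf0 } };

static REGHANDLE MyRegHandle;

/* Event 1: "DPC latency measured", payload = latency in microseconds. */
static const EVENT_DESCRIPTOR LatencyEvent =
    { 1, 0, 0, TRACE_LEVEL_INFORMATION, 0, 0, 0 };

NTSTATUS RegisterProvider(void)          /* call at PASSIVE_LEVEL, e.g. DriverEntry */
{
    return EtwRegister(&MyProviderGuid, NULL, NULL, &MyRegHandle);
}

VOID LogDpcLatency(LONGLONG LatencyUs)   /* EtwWrite is callable from the DPC */
{
    EVENT_DATA_DESCRIPTOR Data;
    EventDataDescCreate(&Data, &LatencyUs, sizeof(LatencyUs));
    EtwWrite(MyRegHandle, &LatencyEvent, NULL, 1, &Data);
}
```

Capturing this provider's GUID alongside the "Latency" kernel flags would let you line the driver's own latency events up against DPC/ISR activity on the WPA timeline.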