XPERF and "DPC scheduling latency"

Hi all,

I created a thread a couple of months ago about a specific issue relating to DPC scheduling, and got some helpful replies. Apart from getting an explanation of why the issue was occurring, I was advised to use XPERF. That thread is here:

http://www.osronline.com/showthread.cfm?link=273859

So I’ve been gathering XPERF data using the “Latency” set of flags. The metric that I am particularly interested in is the real-time delay from KeInsertQueueDpc() to the DPC actually beginning to execute. I will call this delay the “DPC scheduling latency”. Unfortunately, the nearest thing that I can see, when visualizing the .etl file in the WPA GUI, is min / max / mean DPC execution time.

Is it possible to see DPC scheduling latency in the WPA GUI? I’m wondering if this requires us to create custom events and somehow match them up to DPC fragment begin events. Or maybe there is some transformation I can do on the .etl file to get what I want?

Am I going about this the wrong way?

The issue that I am trying to resolve is that sometimes, on a particular customer’s system (a 2-node NUMA system running Windows Server 2008 R2), a DPC scheduled by the driver’s ISR has DPC scheduling latency on the order of 5 - 15 ms (perhaps more), from KeInsertQueueDpc to actually getting going on a CPU. This happens only once in a blue moon, perhaps a few times over a couple of hours of running. The customer’s application is not robust to this, and their data capture device loses data due to FIFO overflow.

My current goal is to use XPERF to confirm that excessive DPC scheduling latency is really happening when data loss events occur.

regards
Tomas

> it seems that a DPC scheduled by the driver’s ISR has DPC scheduling
> latency in the order of 5 - 15 ms (perhaps more), from KeInsertQueueDpc to
> actually getting going on a CPU

XPERF won’t report that as far as I know. You could run LatencyMon and
configure it to report “interrupt to DPC latency”, but it won’t report on the
particular DPC you are executing.

Alternatively, you could do the measuring yourself from within your code.

//Daniel

Daniel, thanks for the reply.

We already put a timestamping mechanism into the driver, which is what indicated that we (appear to) experience occasional excessive DPC scheduling latency. For this we use KeQueryPerformanceCounter. I had some reservations about using it for timestamping, and I wondered if it might be subject to step changes on, for example, changes of processor power state. In that case, the excessive DPC scheduling latency might be an artifact of the behaviour of KeQueryPerformanceCounter. That’s why I was looking to confirm that what I am seeing is real, using an approved method such as XPERF.

I’ve read that early multi-core CPUs had issues with TSC registers not always being synchronised across all cores or subject to change in frequency due to technologies such as speedstep, but I would imagine that such issues have long since been resolved. The customer’s system has fairly bleeding edge Intel Xeon processors in it, so I wouldn’t expect that to be an issue.

Do you have any recommendations for a method of measuring real-time intervals that is reliable and unlikely to introduce artifacts into measurements? The candidates seem to be

  • KeQueryPerformanceCounter
  • KeQueryInterruptTimePrecise (not available in Windows Server 2008 R2, though)

KeQueryUnbiasedInterruptTime would appear to be unsuitable for the purpose of measuring intervals of real time.

KeQueryInterruptTime appears to have a resolution equal to the system tick rate, so it’s too coarse to be much use.

regards
Tomas

For a platform-independent approach, (Ke)QueryPerformanceCounter is the obvious best choice. There is a much longer discussion here:

https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408(v=vs.85).aspx

but if you only care about one specific machine, you can throw out most of the warnings.

Sent from Mail for Windows 10

From: xxxxx@alpha-data.com
Sent: March 10, 2016 1:43 PM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] XPERF and “DPC scheduling latency”



NTDEV is sponsored by OSR

MONTHLY seminars on crash dump analysis, WDF, Windows internals and software drivers!

To unsubscribe, visit the List Server section of OSR Online.

> I’ve read that early multi-core CPUs had issues with TSC registers not
> always being synchronised across all cores

TSC synchronization is handled by the HAL, and the OS knows various policies
for this. For accurate measurements, it is suggested that you always measure
on the same processor.

> or subject to change in frequency due to technologies such as speedstep,
> but I would imagine that such issues have long since been resolved.

An invariant TSC counts at a constant rate, independent of the current CPU
clock speed. Since the clock speed can vary greatly, it’s suggested that you
disable TurboBoost in either the BIOS/CMOS setup or the power configuration
settings.

//Daniel

> TSC synchronization is handled by the HAL, and the OS knows various
> policies for this. For accurate measurements, it is suggested that you
> always measure on the same processor.

It doesn’t apply to you in this case since the DPC runs on the same
processor it was requested on by default. Perhaps I should have posted this
to the other thread.

//Daniel

Thank you for your replies, Marion & Daniel.

I suppose now the next step is to use our homebrew DPC scheduling latency measurements to identify times at which excessive scheduling latency occurs, and check various things in the XPERF logs at those particular times. I’m thinking that I should look for:

  • DPCs that execute for long periods
  • Low-power processor states

Tomas

This is a good case for adding your own custom ETW events to the trace which
will then serve as a guide for where to look in WPA. Check out this article
for a description:

https://www.osr.com/nt-insider/2015-issue1/happiness-xperf/

-scott
OSR
@OSRDrivers
