Question on using ETW for measuring DPC routine execution time

Hello.

I hope I’m not too far off topic here, as I will be asking a question on ETW in user mode, but since my question refers to measuring DPC routine execution times, I hope my question will be tolerated. Although I’ve been using WPP tracing for years, I’m very new to ETW. I would like to measure the maximum DPC routine execution time using ETW in user mode during a given period. Of course, advanced users can use xperf, or other applications like dpclat and dpcmon. We do not intend to try to replace those last two applications; we just want, if possible, a measurement of the maximum DPC execution time observed during the measurement period. I’m not sure if this could be easily done, but I have found a DPC MOF class which looks interesting:

[Dynamic,
Description("DPC") : amended,
EventType{66, 68, 69},
EventTypeName{"ThreadedDPC", "DPC", "TimerDPC"} : amended
]
class DPC : PerfInfo
{
[WmiDataId(1),
Description("Initial Time") : amended,
extension("WmiTime"),
read]
object InitialTime;
[WmiDataId(2),
Description("Routine") : amended,
pointer,
read]
uint32 Routine;
};

So far, by adapting MS sample ETW applications, I’ve been able to extract the member “Routine”, and the member “InitialTime”, which I gather is the 64-bit time since January 1, 1601, in 100-nanosecond units. The documentation says this is when the DPC routine in question was entered. There is also a timestamp in the EVENT_RECORD structure, which represents when the trace was taken; it too, I gather, is the 64-bit time since January 1, 1601, in 100-nanosecond units. I could subtract the measured DPC entry time from the timestamp in the EVENT_RECORD structure, but that only gives part of the current routine’s DPC execution time. How could I get that routine’s full DPC execution time? Is another approach necessary? For the time being, I’m not considering threaded DPCs. Any help/insights would be appreciated.
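(For concreteness, once InitialTime and the EVENT_RECORD timestamp are known to be in the same units, the subtraction and scaling could look like the sketch below; the function name and parameters are hypothetical, and the example assumes both values are raw QPC ticks with the frequency obtained from QueryPerformanceFrequency.)

```c
#include <stdint.h>

/* Convert a (timestamp - InitialTime) tick delta to microseconds.
   Assumes both timestamps use the same clock (e.g. raw QPC ticks)
   and qpc_frequency is the tick rate in counts per second. */
static uint64_t dpc_elapsed_us(uint64_t initial_time,
                               uint64_t event_timestamp,
                               uint64_t qpc_frequency)
{
    uint64_t delta_ticks = event_timestamp - initial_time;
    /* Scale to microseconds; multiply before dividing to keep precision. */
    return (delta_ticks * 1000000ull) / qpc_frequency;
}
```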

Thank you,
Philip Lukidis

The numbers that you will see can be either the performance counter, the system timer, or the CPU cycle counter, depending on the (PEVENT_TRACE_PROPERTIES) properties->Wnode.ClientContext passed to StartTrace. I believe you need to use a ClientContext of 3 to be able to subtract the initial time from the time stamp provided.
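For anyone following along, a sketch of the documented ClientContext values (the enum name is mine; the values are from the EVENT_TRACE_PROPERTIES documentation):

```c
/* ClientContext values for EVENT_TRACE_PROPERTIES.Wnode.ClientContext,
   as documented for StartTrace. Enum name is illustrative only. */
enum trace_clock_type {
    TRACE_CLOCK_QPC        = 1,  /* QueryPerformanceCounter ticks */
    TRACE_CLOCK_SYSTEMTIME = 2,  /* system time, 100 ns units */
    TRACE_CLOCK_CPUCYCLE   = 3   /* raw CPU cycle counter (TSC) */
};
```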

//Daniel

“Philip Lukidis” wrote in message news:xxxxx@ntdev…
I would like to measure the maximum DPC routine execution time using ETW in user mode during a given period. [snip]

Hi,

Thank you for the reply, though I’m a bit confused. Are you saying that the “InitialTime” member is necessarily in CPU ticks? I admit that I’ve been using ClientContext==1 (QPC), though I was not sure if that was the best option. In any case, doesn’t the CPU frequency change very often, making it unreliable? At least the documentation says so:

“CPU cycle counter. The CPU counter provides the highest resolution time stamp and is the least resource-intensive to retrieve. However, the CPU counter is unreliable and should not be used in production. For example, on some computers, the timers will change frequency due to thermal and power changes, in addition to stopping in some states.”
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364160(v=vs.85).aspx

Finally, is my approach viable, or would another approach be more effective (still using ETW from user mode)?

Thank you,
Philip Lukidis

From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@resplendence.com
Sent: June-16-14 3:43 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] Question on using ETW for measuring DPC routine execution time

The numbers that you will see can be either performance counter, system timer or CPU cycle counter depending on the ClientContext passed to StartTrace. [snip]

//Daniel

“Philip Lukidis” > wrote in message news:xxxxx@ntdev…
I would like to measure the maximum DPC routine execution time using ETW in user mode during a given period. [snip]


NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

>The CPU counter provides the highest resolution time stamp and is the least resource-intensive to retrieve

Some tests I did last year on the overhead of writing an ETW event showed it took something like 0.8 microseconds to log an event (on a couple-year-old SandyBridge-class processor). This puts a ballpark limit on how much timing resolution you can get between ETW events. You could have an event provider that reports groups of finer-granularity timing, like a single event, written when a request is completed, that contains the start timestamp, multiple intermediate step timestamps, and the completion timestamp (or arrays of these to reduce the ETW overhead even more). You could then take these compound events and, after collection, reprocess them into a stream of simpler events, so you could use some of the current tools at resolutions you can’t directly achieve.
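A post-processing sketch of that compound-event idea, assuming one event carries a monotonically increasing array of timestamps (start, intermediate steps, completion); all names here are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Expand one compound event (start, intermediates..., end) into
   per-step tick deltas; returns the number of deltas written.
   timestamps[] holds 'count' monotonically increasing tick values;
   deltas_out must have room for count - 1 entries. */
static size_t expand_compound_event(const uint64_t *timestamps,
                                    size_t count,
                                    uint64_t *deltas_out)
{
    if (count < 2)
        return 0;
    for (size_t i = 1; i < count; i++)
        deltas_out[i - 1] = timestamps[i] - timestamps[i - 1];
    return count - 1;
}
```

Each delta could then be re-emitted as a simpler per-step event for the existing tools to consume.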

Jan

What I have seen, when I developed this years ago, is that only if ClientContext is 3 do the reported initial time and time stamp use the same time stamp counter, so that you can calculate elapsed time. Perhaps this is dependent on the clock speed; if so, better to disable turbo boost and variable CPU speed settings in power management.

Note also that the whole affair of having an ETW provider, controller and consumer can have its own impact; it can cause lots of storage port interrupts to appear in case you are processing using a log file.

The interface, as I know it, has a lot of quirks and missing documentation. It looks like it was designed for in-house use only.

//Daniel

“Philip Lukidis” wrote in message news:xxxxx@ntdev…
Are you saying that the “InitialTime” member is necessarily in CPU ticks? I admit that I’ve been using ClientContext==1 (QPC), though I was not sure if that was the best option. In any case, doesn’t the CPU frequency change very often, making it unreliable? At least the documentation says so:

Hello,

Thanks for your reply (and to all the other replies). I believe I got it to work using QPC instead of the CPU cycle counter. I used the PROCESS_TRACE_MODE_RAW_TIMESTAMP flag in the ProcessTraceMode member of the EVENT_TRACE_LOGFILE structure (along with PROCESS_TRACE_MODE_REAL_TIME and PROCESS_TRACE_MODE_EVENT_RECORD). Then, if ClientContext==1 (QPC), the values of the timestamp and the InitialTime WMITIME member line up. I confirmed this by first starting xperf and then my test logging application; the max delta and its associated routine address are identical in both. My application does not log to a file, but rather is a real-time consumer.
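In case it helps others, the flag combination described above would look something like the sketch below; the numeric values mirror those in evntcons.h (copied here only so the snippet is self-contained, so treat them as illustrative):

```c
#include <stdint.h>

/* ProcessTraceMode flags, mirroring evntcons.h (illustrative copy). */
#define PROCESS_TRACE_MODE_REAL_TIME     0x00000100u
#define PROCESS_TRACE_MODE_RAW_TIMESTAMP 0x00001000u
#define PROCESS_TRACE_MODE_EVENT_RECORD  0x10000000u

/* Combined mode for a real-time consumer that wants raw (unconverted)
   timestamps delivered via the EVENT_RECORD callback. */
static const uint32_t kTraceMode = PROCESS_TRACE_MODE_REAL_TIME
                                 | PROCESS_TRACE_MODE_RAW_TIMESTAMP
                                 | PROCESS_TRACE_MODE_EVENT_RECORD;
```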
One more question, if I may: in order to resolve the routine address to the module name, must I use AuxKlibQueryModuleInformation in the kernel (or another kernel routine; please tell me if there is one), or can this be done in user mode using ETW or some other method?
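For illustration, whatever the source of the module list (AuxKlibQueryModuleInformation in the kernel, or EnumDeviceDrivers/GetDeviceDriverBaseName in user mode, which report driver image bases), the lookup itself is a range search for the highest image base not exceeding the routine address. A hypothetical sketch (all names are mine):

```c
#include <stddef.h>
#include <stdint.h>

/* Given driver image base addresses (e.g. from EnumDeviceDrivers,
   which reports bases but not sizes), find the index of the module
   most likely containing 'routine': the highest base <= routine.
   Returns (size_t)-1 if no base qualifies. */
static size_t find_owning_module(const uint64_t *bases, size_t count,
                                 uint64_t routine)
{
    size_t best = (size_t)-1;
    uint64_t best_base = 0;
    for (size_t i = 0; i < count; i++) {
        if (bases[i] <= routine && bases[i] >= best_base) {
            best = i;
            best_base = bases[i];
        }
    }
    return best;
}
```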

Thanks again,
Philip Lukidis

From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@resplendence.com
Sent: June-17-14 3:16 AM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] Re:Question on using ETW for measuring DPC routine execution time

What I have seen, when I developed this years ago is that only if ClientContext is 3, the reported initial time and time stamp use the same time stamp counter so that you can calculate elapsed time. [snip]



Daniel Terhell wrote:

Perhaps this is dependent on the clock speed, then better
disable turbo boost and variable CPU speed settings in
power management.

TSC has been fixed at a constant value irrespective of SpeedStep/turbo boost and other features for quite some time now (since approx. Core 2 Duo days I think). Look for constant_tsc in /proc/cpuinfo CPU flags.

>TSC has been fixed at a constant value irrespective of SpeedStep/turbo boost and other features for quite some time now (since approx. Core 2 Duo days I think). Look for constant_tsc in /proc/cpuinfo CPU flags.

I’m aware of that; that’s why I said “perhaps”. The warning, btw, comes from here:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364160(v=vs.85).aspx

But it doesn’t make it quite clear what these different methods of measurement actually boil down to.

//Daniel