Implementing a sampling based profiler using WDF

David_Yeager · July 20, 2010, 3:39pm

I am trying to write my own sampling-based profiler for Windows. It needs to have very low overhead and provide instruction-level profiling granularity. I believe these are some of the things it must do in order to work:

1) Register an interrupt that will be generated by each CPU’s local APIC timer.
2) Use an ISR that will determine the RIP of the thread that was interrupted, and the PID of that thread (maybe even through a DPC).
3) Communicate that information to a user mode client program.

Here are my main questions:

1) Is this possible to do under KMDF? If so, would you be able to point me to an example?
2) If it is possible, do I need to modify the interrupt descriptor table of each CPU directly? If so, is that even possible on 64-bit Windows? If not, how else can I register an interrupt service routine that is called immediately after the interrupt occurs, so that the interrupted thread’s stack contains the RIP in a fixed location? From what I understand, with the standard WdfInterruptCreate() technique Windows traverses a list of ISRs before calling the one you registered, which I imagine modifies the stack so that the interrupted RIP becomes impossible to find. If this is not an issue and WdfInterruptCreate() should be used, how can I obtain the interrupt vector that I should register with the local APIC timer?

Thanks,
David
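
For step 3 above, one common shape is a single-producer, single-consumer ring of fixed-size sample records: the sampling side pushes, the client side drains. The following is only a portable C11 sketch of that idea, not driver code. The names `sample_t`, `sample_ring_t`, `ring_push`, `ring_pop`, and the capacity are all hypothetical, and it assumes exactly one producer and one consumer per ring (for example, one ring per CPU). A real driver would use the kernel's own interlocked and barrier primitives rather than `<stdatomic.h>`, and would still need a way to expose the buffer to user mode.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_CAPACITY 1024u /* hypothetical size; must be a power of two */

static_assert((RING_CAPACITY & (RING_CAPACITY - 1u)) == 0u,
              "RING_CAPACITY must be a power of two");

/* One sample: where the CPU was interrupted and which process owned it. */
typedef struct {
    uint64_t rip;
    uint32_t pid;
} sample_t;

typedef struct {
    sample_t slots[RING_CAPACITY];
    atomic_uint head;    /* next slot to write; advanced only by the producer */
    atomic_uint tail;    /* next slot to read; advanced only by the consumer */
    atomic_uint dropped; /* samples discarded because the ring was full */
} sample_ring_t;

static void ring_init(sample_ring_t *r)
{
    atomic_init(&r->head, 0u);
    atomic_init(&r->tail, 0u);
    atomic_init(&r->dropped, 0u);
}

/* Producer side: never blocks; counts and drops the sample if the ring is full. */
static bool ring_push(sample_ring_t *r, uint64_t rip, uint32_t pid)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_CAPACITY) {
        atomic_fetch_add_explicit(&r->dropped, 1u, memory_order_relaxed);
        return false;
    }
    r->slots[head & (RING_CAPACITY - 1u)] = (sample_t){ rip, pid };
    /* Release: the slot contents become visible before the new head does. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side: returns false when there is nothing to read. */
static bool ring_pop(sample_ring_t *r, sample_t *out)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (tail == head) {
        return false;
    }
    *out = r->slots[tail & (RING_CAPACITY - 1u)];
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

Dropping samples when the ring is full, instead of waiting, keeps the producer's cost bounded, which matters if it runs at interrupt time; the `dropped` counter lets the client report how much data was lost.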

Gary_Little-3 · July 20, 2010, 4:08pm

I don’t think this is possible at any layer other than the HAL.

Gary G. Little

—
NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

David_Yeager · July 20, 2010, 5:19pm

Thanks for the response, Gary. Is there a development kit for working with the HAL? I can’t seem to find any material on how to do that.

However, I do think a driver is the place to do this, for the following reasons.

I noticed that my installation of Intel VTune for 64-bit Windows comes with a single driver named sepdrv.sys, and a quick search shows that Intel refers to it as the “Sampling driver”:

http://software.intel.com/en-us/articles/intel-vtune-performance-analyzer-for-windows-known-issues-on-windows-vista-and-windows-longhorn-server-systems/

Here’s an older paper by Intel about the implementation of VTune. It describes this technique of programming the local APIC and registering an ISR for it from a WDM driver to perform the sampling. However, it doesn’t actually show the details of how it was done:
http://www.computer.org/portal/web/csdl/doi/10.1109/RTTAS.2002.1137387

Thanks,
David

Don_Burn_1 · July 20, 2010, 5:42pm

I don’t know if it is still the case, but the actual timer sampling was done by a set of ZwXxxProfile calls from user space. I believe VTune used a driver for additional data collection, not the primary sampling, but it has been a long time since I used it, so I could be wrong. If you can get Gary Nebbett’s book “Windows NT/2000 Native API Reference”, it has a sample program for profiling the kernel.

Don Burn (MVP, Windows DKD)
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr
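
To make the bucket idea behind these calls concrete: as I understand the description in Nebbett's book (the API is undocumented, so treat this as an assumption), a profile covers an address range and the kernel counts each sampled instruction pointer into an array of 32-bit counters, one counter per 2^shift bytes of code. The sketch below is plain portable C that only models that mapping. It does not call the native API, and `profile_t`, `profile_buckets_needed`, and `profile_record` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of one profiled address range and its hit counters. */
typedef struct {
    uint64_t  base;          /* first address covered by the profile */
    uint64_t  size;          /* number of bytes covered */
    unsigned  bucket_shift;  /* log2 of the number of bytes per bucket */
    uint32_t *counters;      /* one hit counter per bucket */
    size_t    counter_count; /* number of entries in counters */
} profile_t;

/* Number of counters needed to cover size bytes, rounding up. */
static size_t profile_buckets_needed(uint64_t size, unsigned bucket_shift)
{
    assert(bucket_shift < 64);
    uint64_t bucket_bytes = (uint64_t)1 << bucket_shift;
    uint64_t whole = size >> bucket_shift;
    uint64_t partial = (size & (bucket_bytes - 1)) != 0;
    return (size_t)(whole + partial);
}

/* Count one sampled instruction pointer; returns 1 if it fell in range. */
static int profile_record(profile_t *p, uint64_t rip)
{
    if (rip < p->base || rip - p->base >= p->size) {
        return 0; /* sample landed outside the profiled range */
    }
    size_t index = (size_t)((rip - p->base) >> p->bucket_shift);
    if (index >= p->counter_count) {
        return 0; /* counter array too small for this address */
    }
    p->counters[index]++;
    return 1;
}
```

A smaller shift gives finer granularity at the cost of a larger counter array: covering 1 MB of code with a shift of 2 (4-byte buckets) needs 262,144 counters, while a shift of 4 (16-byte buckets) needs 65,536.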

OSR_Community_User · July 20, 2010, 11:59pm

You can enable time-based sampling through ETW: specify EVENT_TRACE_FLAG_PROFILE when starting the “NT Kernel Logger” session (http://msdn.microsoft.com/en-us/library/aa363784(VS.85).aspx). You can also enable call stack collection on this event. This is how the Windows Performance Toolkit gets its CPU sampling data.
Thanks,
Alex

Maxim_S_Shatskih · July 21, 2010, 1:44am

> 1) Register an interrupt that will be generated by each CPU’s local APIC timer.

It is already there in the kernel; use the ZwXxxProfile calls.

–
Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

David_Yeager · July 21, 2010, 3:01pm

The ZwXxxProfile calls seem to do what I want, to a certain extent. I think they’ll suffice for now. The example in “Windows NT/2000 Native API Reference” is quite useful. Thanks, everyone, for all the help.