What is the proper way to benchmark functions and code blocks in kernel mode (for time consumption)?

Hi,

We want to benchmark some of our functions and some of the code blocks inside them (for example, some of the for/while loops) in terms of how much time they consume over a long run, for example 1-2 hours. Minor errors of 2-3 ms up or down therefore don’t really matter. The target system is Windows 10+.

So I wanted to ask the experts here: how do you guys usually do it? Which APIs do you use, and how do you implement it to get a proper, good-quality summary report for each function and the target code blocks inside it? Is there any source-code-based tool available that helps with this task?

Note that our drivers are all written in C, so we cannot use a class here (such as a timer class).

Just a sidenote:

There is a great YouTube video by a young fella that teaches how to do this in C++ in user mode, and it generates a very professional visual benchmark. Title of the video: VISUAL BENCHMARKING in C++ (how to measure performance visually).

Without seeming presumptuous, if you need to mention that your C code can’t consume a C++ class (which it can, but probably shouldn’t), then you need to do some more research before you can solve your problem well. Also, 1-2 hours is a short time, not a long one.

Ultimately, you have two options: 1) using a tool to profile the performance of your code; or 2) implementing profiling within your code.

Normally method 1 is used. Typical tools include the Intel VTune suite (or whatever the current branding name is) for software profiling, and various hardware tools for lower-level analysis.

But when the performance of different parts of your application depends on more dynamic factors, self-profiling within the code can be a very important feature. This can be very complex to implement properly. SQL query plans are a good user-mode example: they include factors like IO latency, lock contention (generally called latch wait), logical contention (generally called lock wait), and more. As you can imagine, there is no API you can call to generate a report like this, notoriously difficult to understand as such reports are. Profiling in distributed systems is even more complex, but in any case you start with the performance counters built into the CPU / motherboard.
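
For the simple case in the question (accumulated time per function or loop over a long run), option 2 can be as small as a per-section accumulator in plain C. The following is only a rough sketch under stated assumptions, not a definitive implementation: every name in it (`prof_section`, `prof_init`, `prof_record`, `prof_ticks_to_us`) is hypothetical, it does no locking, and it leaves the tick source to the caller. In a Windows driver the natural tick source would be `KeQueryPerformanceCounter`, which returns the current counter value and optionally reports the counter frequency in ticks per second.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-section accumulator. One instance per function or
 * code block being measured. Not thread-safe as written: a real driver
 * would need interlocked updates or per-processor instances. */
typedef struct prof_section {
    const char *name;        /* label used in the summary report */
    uint64_t    calls;       /* number of recorded intervals */
    uint64_t    total_ticks; /* sum of all intervals, in raw ticks */
    uint64_t    min_ticks;   /* shortest interval seen */
    uint64_t    max_ticks;   /* longest interval seen */
} prof_section;

static void prof_init(prof_section *s, const char *name)
{
    s->name        = name;
    s->calls       = 0;
    s->total_ticks = 0;
    s->min_ticks   = UINT64_MAX;
    s->max_ticks   = 0;
}

/* Record one interval. start/end are raw tick values read by the caller
 * immediately before and after the measured block. In a driver they
 * could be the QuadPart of two KeQueryPerformanceCounter results, cast
 * to uint64_t (an assumption of this sketch, not shown here). */
static void prof_record(prof_section *s, uint64_t start, uint64_t end)
{
    uint64_t elapsed = end - start;

    s->calls++;
    s->total_ticks += elapsed;
    if (elapsed < s->min_ticks) s->min_ticks = elapsed;
    if (elapsed > s->max_ticks) s->max_ticks = elapsed;
}

/* Convert raw ticks to microseconds for reporting. The whole seconds and
 * the remainder are converted separately so the multiplication does not
 * overflow after hours of accumulated ticks. */
static uint64_t prof_ticks_to_us(uint64_t ticks, uint64_t ticks_per_second)
{
    assert(ticks_per_second != 0);
    return (ticks / ticks_per_second) * 1000000u
         + ((ticks % ticks_per_second) * 1000000u) / ticks_per_second;
}
```

The idea is to read the counter before and after the block, pass both values to `prof_record`, and dump `calls`, `total_ticks`, `min_ticks` and `max_ticks` for each section when the run ends (for example from the unload routine), converting with `prof_ticks_to_us`. Keeping the raw ticks and converting only at report time keeps the hot path down to a subtraction and a few compares, which matters if the measured loop is itself short.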