You are correct that performance engineering is not something you can do at the end of development.
Performance engineering is also not about applying every possible optimization to your code. That is a waste of development time, and it increases risk by adding branches, code paths that need to be tested, and edge cases that you’ll never be able to exercise properly.
Performance engineering involves understanding the system at the larger scale, knowing where the bottlenecks are and measuring to determine where you are spending large amounts of time in critical paths. You can easily optimize the code and end up with no benefit if your design isn’t prepared to run any faster.
So you should ask what percentage of the entire operation is consumed in calling KeSetEvent. If it is noticeable (which would mean the entire operation was short), then I would first ask “why are you doing this operation synchronously in the first place?” before I would suggest such a small change as a “performance improvement”. For a short, critical-path operation I would move the post-processing into the completion routine, spending a little more time at dispatch level in trade for removing the overhead of a potential context switch for each I/O operation.
-p
-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Misha Karpin
Sent: Wednesday, March 08, 2006 4:03 PM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] Documentation Bug: "Handling IRPs: What Every Driver Writer Needs to
Slava Imameyev wrote:
I do not understand what you meant by saying: “you must avoid calling KeSetEvent at IRQL == DISPATCH_LEVEL, as you would do while holding a lock.”
Why do you object to using KeSetEvent at DISPATCH_LEVEL? Do you know another method for synchronizing the completion routine with the thread that calls IoCallDriver and wants to wait for IRP completion at IRQL <= APC_LEVEL? What method does not use any synchronization object with a DISPATCHER_HEADER yet still allows the waiting thread to be rescheduled?
Response:
“Managing Hardware Priorities” in the WDK: “Driver code that runs at IRQL > PASSIVE_LEVEL should execute as quickly as possible. The higher the IRQL at which a routine runs, the more important it is for good overall performance to tune that routine to execute as quickly as possible.”
KeSetEvent documentation in the WDK: Calling KeSetEvent causes the event to attain a signaled state. If the event is a notification event, the system attempts to satisfy as many waits as possible on the event object before clearing the event. If the event is a synchronization event, one wait is satisfied before the event is cleared.
That is, the KeSetEvent routine will acquire the dispatcher lock, which could already be held by another processor, thereby hurting performance. Performance optimization is about avoiding resource contention. The KeSetEvent routine will then satisfy a wait, readying the waiting thread for execution.
I think all of this is a lot of needless work for a synchronous IRP.
Of course, the use of KeSetEvent is needed for the asynchronous case.
Peter Wieland wrote:
Must is an awfully strong word.
For a driver which is going to handle requests once or twice a second, this optimization wouldn’t even be noticeable.
Personally I’d rather every driver be as simple as possible than every driver be as fast as possible. Simple causes fewer system crashes.
Response:
Sorry for my English. I agree with you, but I think that “as simple as possible” does not just mean simple enough not to cause crashes. I also think a driver is a lot simpler if it only executes necessary instructions. One of the most common performance pitfalls is doing needless work.
Slava Imameyev wrote:
I agree with you, especially because the time spent processing a single page fault may exceed all of the performance gain.
Response:
It depends on the frequency of the page faults and on the routine being called.
Tim Roberts wrote:
Amen to that, or “word up”, as I hear the kids say. Most micro-optimizations, and many macro-optimizations, are misplaced. With some exceptions, CPUs are now infinitely fast and memory is infinitely large.
Build the product by the rules, and then use some tangible metric to decide whether it is “fast enough”. Only when it fails that test should one open a bag of tricks.
Response:
CPUs are not so infinitely fast, as we now have more and more processors in each computer. This is actually why resource contention is so important. Let me mention Amdahl’s law:
P(n, h) = 1 / (1 + (n - 1) * h), where n = number of processors and h = fraction of time the lock is held.
Now suppose a 32-processor computer with the dispatcher lock held 1% of the time. That gives P ≈ 0.7634; that is, each processor will spend roughly 24% of its time spinning on the lock. Then I have a good question: how are you going to use any tangible metric under so many different loads and scenarios? IMHO, performance engineering is not something that you can add at the end of development, because optimization is not doing tricky arithmetic, tuning loops, or using assembly language.
Thanks,
mK