Performance Problem

Hi All,

We have written a virtual Storport miniport driver in which we complete the SRBs from a system thread (PsCreateSystemThread). To test the driver we have a simple application that performs single-threaded random reads in overlapped mode with 1 request outstanding. We are using a multi-processor environment (with NUMA). The problem we are seeing is that the application performs badly when it runs on a certain set of CPUs and quite well when it runs on the other set.

We observe that the round-trip time for every IO request in our driver is almost the same in both of the cases mentioned above, but the CPU utilization is lower when the performance is bad. This seems to indicate that a sleep is being induced somewhere in the IO request handling path. Is there any mechanism with which I can figure out in which part of the stack the sleep is being induced?
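To put a number on the reasoning above, here is a rough sketch. With exactly one request outstanding, the end-to-end time per request as seen by the application is 1/IOPS, so whatever the miniport's round trip does not account for is spent elsewhere in the stack or in scheduling. The function name and the figures in the note below are hypothetical, not measurements from this setup.

```c
#include <assert.h>

/*
 * Per-request time spent outside the miniport, in microseconds.
 * Assumes a queue depth of exactly 1, so that end-to-end latency
 * per request equals 1/IOPS. Both inputs are values you measure.
 */
double outside_driver_us(double iops, double driver_rtt_us)
{
    assert(iops > 0.0);
    double end_to_end_us = 1e6 / iops;   /* application-visible latency */
    return end_to_end_us - driver_rtt_us; /* remainder above/below the driver */
}
```

For example, at 10,000 IOPS with a 60 microsecond driver round trip, outside_driver_us(10000.0, 60.0) gives 40 microseconds per request. Comparing this figure between the "good" and "bad" CPU sets shows how much extra delay has to be explained outside the driver.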

Regards,
AY

A “sleep”? No, I don’t think so. We rarely put “sleep” statements in our driver code…

So… maybe I’m missing something (you’re really not providing us much specific information)… but could the performance issue you’re seeing not be related to the driver/system thread/something running on a different NUMA node than that on which the application data buffer is located?
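A quick way to check the NUMA hypothesis is to see whether the CPU the application runs on and the CPU the completion thread runs on belong to the same node. The sketch below is only the bookkeeping part: it assumes you already have one 64-bit processor mask per node (on Windows these could come from GetNumaNodeProcessorMask) and that the machine has a single processor group. The function names are made up for illustration.

```c
#include <stdint.h>

/* Return the index of the node whose mask contains `cpu`, or -1 if none. */
int node_of_cpu(const uint64_t *node_masks, int node_count, unsigned cpu)
{
    if (cpu >= 64)
        return -1; /* outside a single 64-bit mask */
    for (int n = 0; n < node_count; n++) {
        if (node_masks[n] & ((uint64_t)1 << cpu))
            return n;
    }
    return -1;
}

/* Return 1 if both CPUs are found and sit on the same node, else 0. */
int same_numa_node(const uint64_t *node_masks, int node_count,
                   unsigned cpu_a, unsigned cpu_b)
{
    int a = node_of_cpu(node_masks, node_count, cpu_a);
    return a >= 0 && a == node_of_cpu(node_masks, node_count, cpu_b);
}
```

If the slow runs line up with the cases where this returns 0, cross-node traffic becomes the first suspect.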

Peter
OSR

Try using IoMeter for testing. It is not a perfect tool, but it gives you more options for testing. busTrace may also help you. I believe it shows the “time to live” of issued requests.

Igor Sharovar

@Peter: What I really intended to convey was that a “wait” is being induced somewhere in the IO path. Since the average round-trip time of the SRB in our virtual miniport driver remains constant, we suspect that the “wait” is being induced at one of the higher layers (Storport, Disk, NTFS, etc.).

@Igor: We initially tested with 64-bit IOMeter only. Because it was non-trivial to set the CPU affinity mask for the IOMeter threads, we ended up writing our own application, which uses IO completion ports. The performance variation was seen with IOMeter as well.
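For reference, pinning a test thread to a chosen set of CPUs can be sketched roughly as follows. This is not the actual test application. It assumes a 64-bit build and a single processor group (CPU indices 0 to 63), and the helper names are hypothetical. SetThreadAffinityMask is the Win32 call; on other platforms the sketch compiles but pins nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Build a mask with one bit set per CPU index in cpus[0..count-1]. */
uint64_t affinity_mask_from_cpus(const unsigned *cpus, size_t count)
{
    uint64_t mask = 0;
    for (size_t i = 0; i < count; i++) {
        assert(cpus[i] < 64); /* single processor group only */
        mask |= (uint64_t)1 << cpus[i];
    }
    return mask;
}

/* Pin the calling thread to `mask`. Returns 1 on success, 0 on failure. */
int pin_current_thread(uint64_t mask)
{
#ifdef _WIN32
    /* SetThreadAffinityMask returns the previous mask, or 0 on failure. */
    return SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)mask) != 0;
#else
    (void)mask; /* no-op outside Windows in this sketch */
    return 0;
#endif
}
```

Calling pin_current_thread(affinity_mask_from_cpus(cpus, n)) at the start of the IO thread, once per CPU set under test, keeps the "good" and "bad" runs reproducible.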

Will try out busTrace to see if we can unearth anything.