Several issues.
If this is a pure application, the only way you can suck down enough time to
make the overall system unresponsive is to be running it at a raised
priority. Windows makes no attempt at “fairness”, and the closest it comes
to anti-starvation is the Balance Set Manager, which boosts a starved thread
only after a lag of about 3-4 seconds (read Windows Internals for the gory
details of the computation, which is based on several system timing
parameters). Setting the priority to idle means that your app is now the one
most likely to be starved, increasing its overall completion time.
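For reference, here is what those knobs look like in code (a minimal sketch;
background mode is Vista and later):

#include <windows.h>

// Sketch: the two ways to make a worker thread yield to everyone else.
// Note that at idle priority this thread is the one that starves, not
// the one that starves others.
void MakeWorkerUnobtrusive()
{
    // Lowest scheduling priority: runs only when nothing else is ready.
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_IDLE);

    // Vista and later: also lowers the thread's I/O and memory priority.
    // Returns FALSE on XP, where background mode does not exist.
    SetThreadPriority(GetCurrentThread(), THREAD_MODE_BACKGROUND_BEGIN);
}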
You can’t compare the time of reading file A with the time of reading file B
based solely on file size; you need to know how fragmented each file is and
how many seek operations were required to read it in.
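If you want real numbers on fragmentation, FSCTL_GET_RETRIEVAL_POINTERS will
give them to you. A minimal sketch (a real version must loop while
DeviceIoControl fails with ERROR_MORE_DATA):

#include <windows.h>
#include <winioctl.h>

// Sketch: count a file's extents. 1 means contiguous; a large number
// means many seeks were needed to read the file.
DWORD CountExtents(const wchar_t* path)
{
    HANDLE h = CreateFileW(path, FILE_READ_ATTRIBUTES,
                           FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                           OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 0;

    STARTING_VCN_INPUT_BUFFER in = {};   // start from the first VCN
    BYTE out[64 * 1024];                 // room for many extents
    DWORD bytes = 0, extents = 0;
    if (DeviceIoControl(h, FSCTL_GET_RETRIEVAL_POINTERS, &in, sizeof(in),
                        out, sizeof(out), &bytes, NULL))
        extents = ((RETRIEVAL_POINTERS_BUFFER*)out)->ExtentCount;

    CloseHandle(h);
    return extents;
}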
The most common failure mode is people who try to “optimize the hashing
algorithm” without actually determining whether it IS the bottleneck. This
pointless flailing about in the hope of doing something that might improve
performance is like trying to get downtown from the suburbs by tossing a
coin at every intersection and hoping you eventually arrive.
If you have not gathered detailed performance numbers, you are just wasting
your time. Everything you have done indicates a futile attempt to guess at
what is going on in the total absence of any real performance data.
I did performance measurement for 15 years, and I know one thing: if you ask
someone where the time is going in their program, you will get the wrong
answer! I never found an exception in all those years (including my own
code!).
Without data, you can’t optimize.
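Even crude wall-clock timing of each phase counts as data. A minimal sketch
using QueryPerformanceCounter (HashChunk stands in for whatever your hash
routine is):

#include <windows.h>

// Sketch: accumulate per-phase totals across the whole run, then
// compare them. This alone tells you whether read, hash, or write
// dominates.
struct Stopwatch {
    LARGE_INTEGER freq, t0;
    Stopwatch()  { QueryPerformanceFrequency(&freq); }
    void start() { QueryPerformanceCounter(&t0); }
    double stop()                        // seconds since start()
    {
        LARGE_INTEGER t1;
        QueryPerformanceCounter(&t1);
        return double(t1.QuadPart - t0.QuadPart) / double(freq.QuadPart);
    }
};

// Usage:
//   sw.start(); ReadFile(...);  tRead  += sw.stop();
//   sw.start(); HashChunk(...); tHash  += sw.stop();
//   sw.start(); WriteFile(...); tWrite += sw.stop();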
Putting sleep calls in is generally silly. That is, the app is too slow, so
let’s artificially make it slower. It may even raise the chances that some
of your pages get paged out, because once the app comes out of its sleep,
its threads are just more threads in the ready queue, and could end up
behind a lot of others. They certainly will if you set the priority low.
My first suspect would be page faults, but you’ve pretty much disproven
that. Note that allocation in the debug version of the runtime library is
VASTLY more expensive than in the release version, and most performance data
you gather under a debug runtime is totally and completely useless for
predicting actual performance. So if you are allocating buffers for each
I/O, it’s going to kill you. If you allocate one buffer, once, and it is
only 4MB, it probably isn’t paging.
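The pattern you want is structural, not clever (a minimal sketch):

#include <windows.h>
#include <vector>

// Sketch: one buffer, allocated once, reused for every read. No
// allocator traffic (debug heap or otherwise) inside the loop.
void ProcessFile(HANDLE hFile)
{
    std::vector<BYTE> buffer(4 * 1024 * 1024);   // allocated exactly once

    DWORD got = 0;
    while (ReadFile(hFile, &buffer[0], (DWORD)buffer.size(), &got, NULL)
           && got > 0)
    {
        // hash/process buffer[0..got) here; no new/delete per pass
    }
}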
If your performance data shows minimal CPU utilization, why did you waste
time redoing hash tables? This is classic time-wasting “optimization”. You
have data that says the computation doesn’t matter, and your response is to
optimize the computation?
The profiling tools (a sampling profiler, as opposed to PGO, which is for
code generation) can tell you how much time you are spending in the kernel
and how much in user space. I think they use the performance counters that
already exist.
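Even without a profiler, GetProcessTimes gives you a coarse split (a minimal
sketch):

#include <windows.h>
#include <stdio.h>

// Sketch: kernel-vs-user CPU time for the current process. The
// FILETIME values here are durations in 100ns units.
void ReportCpuSplit()
{
    FILETIME ftCreate, ftExit, ftKernel, ftUser;
    if (GetProcessTimes(GetCurrentProcess(), &ftCreate, &ftExit,
                        &ftKernel, &ftUser))
    {
        ULARGE_INTEGER k, u;
        k.LowPart = ftKernel.dwLowDateTime; k.HighPart = ftKernel.dwHighDateTime;
        u.LowPart = ftUser.dwLowDateTime;   u.HighPart = ftUser.dwHighDateTime;
        printf("kernel: %.2fs  user: %.2fs\n",
               k.QuadPart / 1e7, u.QuadPart / 1e7);
    }
}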
When I have performance problems, the LAST thing I think about is rewriting
algorithms. The FIRST thing I think about is getting REAL DATA to tell me
where the time is going, so I don’t foolishly waste my time optimizing
things that never mattered at all.
I agree that the kernel should not be driving the system into catatonia.
However, Microsoft has not been known for building the best device drivers;
in an infamous problem some years ago, they shipped one that gratuitously
introduced a 1.5 SECOND dead time into disk I/O (something about ATAPI
protocols), but that was fixed in XP. That doesn’t mean it couldn’t happen
again, particularly if there are third-party drivers involved.
What does diskperf tell you about I/O times? That’s the next thing I’d try.
Also, there are some performance counters for the amount of time spent at
DPC level, which is going to impact the perceived response time. If you see
a high percentage of time at DPC level, it would be worthwhile to figure out
who is doing it.
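If you want to watch that counter programmatically instead of in perfmon,
PDH will do it. A minimal sketch (link with pdh.lib; PdhAddEnglishCounterW
is Vista and later, on XP use PdhAddCounter with the localized counter
name):

#include <windows.h>
#include <pdh.h>
#include <stdio.h>
#pragma comment(lib, "pdh.lib")

// Sketch: sample "% DPC Time" once a second for ten seconds.
void SampleDpcTime()
{
    PDH_HQUERY query;
    PDH_HCOUNTER counter;
    PdhOpenQuery(NULL, 0, &query);
    PdhAddEnglishCounterW(query, L"\\Processor(_Total)\\% DPC Time",
                          0, &counter);
    PdhCollectQueryData(query);          // baseline sample
    for (int i = 0; i < 10; i++)
    {
        Sleep(1000);
        PdhCollectQueryData(query);
        PDH_FMT_COUNTERVALUE value;
        PdhGetFormattedCounterValue(counter, PDH_FMT_DOUBLE, NULL, &value);
        printf("%% DPC Time: %5.1f\n", value.doubleValue);
    }
    PdhCloseQuery(query);
}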
joe
-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@NeoSmart.net
Sent: Tuesday, February 23, 2010 11:13 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] (Terribly) Degraded System Performance Running User Code
I have a program that loads a file (anywhere from 10MB to 5GB) a chunk at a
time (ReadFile), and for each chunk performs a set of mathematical
operations (basically calculates the hash).
After calculating the hash, it stores info about the chunk in an STL map
and then writes the chunk itself to another file (WriteFile).
That’s all it does. This program will cause certain PCs to choke and die.
The mouse begins to stutter, the task manager takes > 2 min to show,
ctrl+alt+del is unresponsive, running programs are slow… the works.
I’ve done literally everything I can think of to optimize the program, and
have triple-checked all objects.
What I’ve done:
- Tried different (less intensive) hashing algorithms.
- Switched all allocations to nedmalloc instead of the default new operator.
- Switched from std::map to unordered_set, found the performance to still be
abysmal, and switched again to Google’s dense_hash_map.
- Converted all objects to store pointers to objects instead of the objects
themselves.
- Cached all read and write operations: instead of reading a 16k chunk of
the file and performing the math on it, I read 4MB into a buffer and read
16k chunks from there. Same for all write operations; they are coalesced
into 4MB blocks before being written to disk (see the sketch after this
list).
- Ran extensive profiling with Visual Studio 2010, AMD CodeAnalyst, and
perfmon.
- Set the thread priority to THREAD_MODE_BACKGROUND_BEGIN.
- Set the thread priority to THREAD_PRIORITY_IDLE.
- Added a Sleep(100) call after every loop.
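Here is a sketch of that read-coalescing logic (simplified, error handling
elided):

#include <windows.h>
#include <vector>

// One 4MB ReadFile fills the buffer; 16k sub-chunks are then handed
// out from memory.
class CoalescedReader {
    HANDLE            file_;
    std::vector<BYTE> buf_;
    DWORD             valid_;   // bytes currently in buf_
    DWORD             pos_;     // next unread offset within buf_
public:
    explicit CoalescedReader(HANDLE file)
        : file_(file), buf_(4 * 1024 * 1024), valid_(0), pos_(0) {}

    // Returns a pointer to the next sub-chunk (up to 16k), NULL at EOF.
    const BYTE* NextChunk(DWORD* size)
    {
        if (pos_ == valid_) {            // buffer drained: refill it
            if (!ReadFile(file_, &buf_[0], (DWORD)buf_.size(),
                          &valid_, NULL) || valid_ == 0)
                return NULL;
            pos_ = 0;
        }
        *size = min(valid_ - pos_, (DWORD)(16 * 1024));
        const BYTE* chunk = &buf_[0] + pos_;
        pos_ += *size;
        return chunk;
    }
};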
Even after all this, the application still results in a system-wide hang on
certain machines under certain circumstances.
Perfmon and Process Explorer show minimal CPU usage (with the sleep), no
constant reads/writes from disk, few hard pagefaults (and only ~30k
pagefaults in the lifetime of the application on a 5GB input file), little
virtual memory (never more than 150MB), no leaked handles, no memory leaks.
The machines I’ve tested it on run Windows XP - Windows 7, x86 and x64
versions included. None have less than 2GB RAM, though the problem is always
exacerbated under lower memory conditions.
I’m at a loss as to what to do next. I don’t know what’s causing it; I’m
torn between CPU and memory as the culprit. CPU, because without the sleep
and under different thread priorities the system performance changes
noticeably. Memory, because there’s a huge difference in how often the issue
occurs when using unordered_set vs Google’s dense_hash_map.
What’s really weird? Obviously, the NT kernel design is supposed to prevent
this sort of behavior from ever occurring (a user-mode application driving
the system to such extreme poor performance!?)… but when I compile the code
and run it on OS X or Linux (it’s fairly standard C++ throughout) it
performs excellently, even on poor machines with little RAM and weaker CPUs.
What am I supposed to do next? How do I know what the hell it is that
Windows is doing behind the scenes that’s killing system performance, when
all the indicators are that the application itself isn’t doing anything
extreme?
Any advice would be most welcome.