This is a question about how to catch a specific page fault on Win2k,
although the problem occurs on other flavours of Windows (notably 98) as well.
I’m developing an application which uses OpenGL for visualization.
Sometimes, on some systems, this application will start hitting 100 page
faults per second and slow to a crawl. “Ah,” you will think, “a memory
leak.”
Well, my pool size isn’t going up (much) to account for this. Even more
curiously, if I quit my application, and re-start it, it will still be slow,
or very quickly again become slow. Re-booting the machine “solves” the
problem for 15-30 minutes. Thus, something at the kernel level that either
does not get a process termination message, or is happily ignoring the same,
has to be causing this problem.
Nothing else is running on the machine, and these page faults appear to be
the only ones happening (according to the process monitor, anyway).
My guess, for various reasons, is that the culprit is the texture management in
the OpenGL driver we’re using. However, that may just be slander, and even if it
isn’t, it does me no good unless I can prove specifically that this is
the case. My current idea for how to do this is to remote debug a system and
put a breakpoint in the page fault handler of the kernel when the problem
has started exhibiting itself, and then try to get a stack trace from there.
Repeat 10 times and hopefully the culprit will be statistically clear.
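
To make the "repeat and tally" step concrete, here is a minimal Python sketch of how the sampled stack traces might be counted. It assumes each trace has been saved by hand as a list of `module!symbol` frame strings, innermost frame first, and that kernel frames live in a module called `nt`. The function names and every frame in the sample data are hypothetical placeholders for illustration, not real captures.

```python
from collections import Counter


def module_of(frame):
    """Return the module part of a 'module!symbol' frame string, lowercased."""
    return frame.split("!", 1)[0].strip().lower()


def tally_culprits(traces, ignore=("nt",)):
    """Count, per module, how often it owns the first frame outside `ignore`.

    Each trace is a list of frame strings, innermost frame first. Frames
    belonging to modules listed in `ignore` (assumed here to be the kernel
    itself) are skipped, so the count lands on whoever called into the
    kernel when the fault was taken.
    """
    counts = Counter()
    for trace in traces:
        for frame in trace:
            module = module_of(frame)
            if module not in ignore:
                counts[module] += 1
                break
    return counts


# Hypothetical placeholder traces; none of these names come from a real system.
SAMPLE_TRACES = [
    ["nt!FaultHandler", "exampledrv!FrameA", "myapp!Render"],
    ["nt!FaultHandler", "exampledrv!FrameB", "myapp!Render"],
    ["nt!FaultHandler", "myapp!LoadMesh"],
]

if __name__ == "__main__":
    # Print modules from most to least frequently blamed.
    for module, hits in tally_culprits(SAMPLE_TRACES).most_common():
        print(module, hits)
```

With only 10 samples the tally would be suggestive rather than conclusive, so a lopsided count is what I would be hoping for.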
I’ve looked through the WinDbg documentation and Win2k DDK, as well as
searched MSDN, googled around the web and the archive of this list, but I
can’t find any good pointers on how to do this (or what else could be the
culprit). Any suggestions, pointers, help or insight you may be able to
share would be much appreciated!