I have about 30 Windows Server 2008 and 2003 systems that have gotten into a bad state where nobody can log in, either locally or over RDP. Services that are already running mostly keep working, but new processes cannot be started. The problem is highly correlated with an uptime of 50-60 days. Based on past experience, I expect some sort of memory leak to be the root cause, most likely in one of our own drivers or services (it’s an “embedded” system with custom hardware but a standard PC motherboard). [I originally raised this issue on WINDBG when I thought it was something else, but I now think it belongs in the driver or internals arena.]
I was able to get memory dumps from a few systems, and the WinDbg !vm command certainly points to memory exhaustion, judging by the failure counts it reports. However, every Usage value is far below its Max/Limit, so there’s no obvious leak.
*** Virtual Memory Usage ***
Physical Memory: 521669 ( 2086676 Kb)
Page File: \??\C:\pagefile.sys
Current: 2393876 Kb Free Space: 2357824 Kb
Minimum: 2393876 Kb Maximum: 6260028 Kb
Available Pages: 88804 ( 355216 Kb)
ResAvail Pages: 988943 ( 3955772 Kb)
Locked IO Pages: 0 ( 0 Kb)
Free System PTEs: 386694 ( 1546776 Kb)
******* 681703 system cache map requests have failed ******
Modified Pages: 736 ( 2944 Kb)
Modified PF Pages: 736 ( 2944 Kb)
NonPagedPool Usage: 15092 ( 60368 Kb)
NonPagedPool Max: 386063 ( 1544252 Kb)
PagedPool 0 Usage: 6444 ( 25776 Kb)
PagedPool 1 Usage: 6365 ( 25460 Kb)
PagedPool 2 Usage: 1002 ( 4008 Kb)
PagedPool 3 Usage: 907 ( 3628 Kb)
PagedPool 4 Usage: 650 ( 2600 Kb)
PagedPool Usage: 15368 ( 61472 Kb)
PagedPool Maximum: 523264 ( 2093056 Kb)
********** 825082 pool allocations have failed **********
Session Commit: 2486 ( 9944 Kb)
Shared Commit: 8514 ( 34056 Kb)
Special Pool: 0 ( 0 Kb)
Shared Process: 5850 ( 23400 Kb)
PagedPool Commit: 15382 ( 61528 Kb)
Driver Commit: 5060 ( 20240 Kb)
Committed pages: 4294962071 (17179848284 Kb)
Commit limit: 1108180 ( 4432720 Kb)
********** Number of committed pages is near limit ********
********** 10528464 commit requests have failed **********
Total Private: 456244 ( 1824976 Kb)
But look at the Committed pages! It is nearly 4000 times larger than the Commit limit. I’ve been doing a lot of reading, and it doesn’t seem possible to get into the state I see, where Committed pages sits just below the 32-bit maximum, about 16 TB (4294962071 == 0xFFFFEB97), while the system has a reasonable Commit limit of about 4 GB. How are my systems committing more than the commit limit?!?!
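As a sanity check on the arithmetic above, here is a small Python sketch that reinterprets the Committed pages value as a signed 32-bit integer. It assumes 4 KB pages (which is what the Kb columns in the !vm output imply) and that the counter is a 32-bit field on these systems. Reading the result as a counter underflow is only a hypothesis, not something the dump proves.

```python
import struct

PAGE_KB = 4  # assumption: 4 KB pages, consistent with the Kb columns in !vm

committed_pages = 4294962071   # "Committed pages" from the !vm output
commit_limit_pages = 1108180   # "Commit limit" from the !vm output

# Reinterpret the unsigned 32-bit value as a signed 32-bit value.
(signed_pages,) = struct.unpack("<i", struct.pack("<I", committed_pages))

# How far above the commit limit the unsigned reading is.
ratio = committed_pages / commit_limit_pages

print(hex(committed_pages))        # 0xffffeb97
print(committed_pages * PAGE_KB)   # 17179848284 (Kb, matches !vm)
print(signed_pages)                # -5225
print(f"{ratio:.0f}x the commit limit")
```

If the signed reading is the right one, the counter is about 5225 pages below zero, which would look like more commit being returned than was ever charged, rather than 16 TB of real commitment.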
Another interesting tidbit: my systems show 3 of the 4 types of commit request failure.
0: kd> dd nt!MiChargeCommitmentFailures
81d51f80 0093b93f 00000000 000c7189 00007c08
MiChargeCommitmentFailures[0] - If the system failed a commit request and an expansion of the pagefile has failed.
MiChargeCommitmentFailures[1] - If the system failed a commit and we have already reached the maximum pagefile size.
MiChargeCommitmentFailures[2] - If the system failed a commit while the pagefile lock is held.
MiChargeCommitmentFailures[3] - If the system failed a commit and the NewCommitValue is less than or equal to CurrentCommitValue.
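To make the dd output easier to read, this Python sketch decodes the four hex values into decimal counters and compares their sum with the total that !vm reports. It assumes the four DWORDs after the address are the four array elements in index order.

```python
# Raw line copied from the `dd nt!MiChargeCommitmentFailures` output.
dd_line = "81d51f80 0093b93f 00000000 000c7189 00007c08"

# First token is the address; the rest are the 32-bit counters (assumed order [0]..[3]).
address, *words = dd_line.split()
failures = [int(word, 16) for word in words]

total = sum(failures)
reported_by_vm = 10528464  # "commit requests have failed" from the !vm output

for index, count in enumerate(failures):
    print(f"MiChargeCommitmentFailures[{index}] = {count}")
print(f"sum = {total}, matches !vm: {total == reported_by_vm}")
```

The counters come out as 9681215, 0, 815497 and 31752, which add up to exactly the 10528464 failed commit requests in the !vm output. Index 1 (maximum pagefile size already reached) is the one type that never occurred.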
Also odd is the number of pool allocations that failed when the pool usages are so much less than the pool maximums, but I suspect that’s just a failure to grow the pool larger than the current size (far less than the maximum).
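To put numbers on how far the pools are from their limits, this short Python sketch computes usage as a percentage of the maximum, using the page counts from the !vm output above. The dictionary layout is just for illustration.

```python
# (usage pages, maximum pages) taken from the !vm output.
pools = {
    "NonPagedPool": (15092, 386063),
    "PagedPool": (15368, 523264),
}

# Percentage of each pool's maximum that is currently in use.
percent_used = {
    name: 100.0 * used / maximum for name, (used, maximum) in pools.items()
}

for name, percent in percent_used.items():
    print(f"{name}: {percent:.1f}% of maximum")
```

Both pools are below 5% of their maximums, so the 825082 failed pool allocations do not look like pool exhaustion in the usual sense.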
I’ve gone through each process, and they all have reasonable memory and virtual memory usage. No handle leaks, no pool leaks, and so on.
I cannot figure out what is wrong other than the 16 TB Committed Pages.
Any advice, troubleshooting ideas, or anything else would be appreciated!