A potentially interesting puzzler for a Tuesday...
We're looking at a series of crash dumps from a client and are hoping that
this corruption looks familiar to someone. This is an end-user installation,
and none of our software is running on these machines; it's strictly a crash
The crashes are spread out over 1,000+ "identical" machines. No one machine
crashes with great frequency, but across the whole install there are a
few a day. Systems survive anywhere from a few hours to six days before
crashing.
Just looking at the crash codes isn't helpful: we see just about every crash
code you could imagine (QUOTA_UNDERRUN??), and the crash is blamed on various
modules. However, digging deeper, a very consistent pattern emerges.
Specifically, we are consistently seeing one of two values "randomly" appear
Interestingly, when the corruption is discovered, the value very, very
frequently appears at physical memory page offset 0xFD8 (most common) or
0xD70 (less common).
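For reference, the page offset is just the low 12 bits of an address. For any mapping of 4 KB or larger, those 12 bits are carried unchanged from the virtual address into the physical address, so the offset can be read straight off the virtual addresses in the dumps. A minimal Python sketch, using the two addresses from the dumps quoted in this post (the helper name is ours, for illustration only):

```python
PAGE_SIZE = 0x1000  # standard 4 KB x64 page

def page_offset(addr: int) -> int:
    """Return the byte offset of addr within its 4 KB page.

    The low 12 bits are identical in the virtual and physical address
    for any page size of 4 KB or larger.
    """
    return addr & (PAGE_SIZE - 1)

print(hex(page_offset(0xfffff800826a6fd8)))  # 0xfd8 (corrupted mrxsmb20 code)
print(hex(page_offset(0xffffc00089adcd70)))  # 0xd70 (corrupted pool header)
```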
For example, in one crash the problem was that the in-memory MRXSMB20 image is corrupted:
3: kd> !chkimg -d mrxsmb20
fffff800826a6fd8-fffff800826a6fdd 6 bytes -
[ 89 7d 18 49 89 45:04 00 00 00 10 00 ]
fffff800826a6fdf - mrxsmb20!Smb2UpdateFileInfoCacheEntry+4cf (+0x07)
[ e8:00 ]
7 errors : mrxsmb20 (fffff800826a6fd8-fffff800826a6fdf)
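The !chkimg diff can be read back as a single value. The bytes written at fd8-fdd and fdf, plus the byte at fde (not listed by !chkimg because it matched the original image; we take 0x00 from the dq output that follows), form one 8-byte-aligned little-endian QWORD. A small Python sketch of that decoding, offered only as a way to check the arithmetic:

```python
import struct

# Bytes at fffff800826a6fd8..fdf after corruption, per the !chkimg output.
# Offset 0xfde is assumed to be 0x00 (it matched the image, and the dq
# dump shows 00000010`00000004 at this address).
corrupted = bytes([0x04, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00])

# x64 is little-endian: read the 8 bytes as one QWORD and as two DWORDs.
(qword,) = struct.unpack("<Q", corrupted)
low, high = struct.unpack("<II", corrupted)

print(f"{qword >> 32:08x}`{qword & 0xffffffff:08x}")  # 00000010`00000004
print(hex(low), hex(high))                             # 0x4 0x10
print(0xfd8 % 8 == 0)                                  # True: 8-byte aligned
```

This is consistent with (though it does not prove) a single aligned 8-byte write of 0x0000001000000004.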
Dumping the start of the corrupted range, we see our offset and value:
3: kd> dq fffff800826a6fd8
fffff800`826a6fd8 00000010`00000004 4c2b894c`0000e99c
fffff800`826a6fe8 ade901b6`41986d8b 850f02f8`83fffffd
fffff800`826a6ff8 8bc03345`fffffbb5 445e15ff`ce8b49d7
fffff800`826a7008 fb9f850f`c0840002 03fffffe`fee9ffff
In another crash, a pool header is corrupted:
2: kd> !pool ffffc00089adcd70
Pool page ffffc00089adcd70 region is Paged pool
ffffc00089adcc00 size: 170 previous size: b0 (Free ) MPsc
ffffc00089adcd70 doesn't look like a valid small pool allocation, checking
if the entire page is actually part of a large page allocation...
2: kd> dq ffffc00089adcd70
ffffc000`89adcd70 00000010`00000004 8d5eb149`4b83d33a
ffffc000`89adcd80 00000000`00000000 ffffe000`bdecfec0
ffffc000`89adcd90 ffffe000`bc728860 ffffc000`89adcd98
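The !pool output places the damage exactly on a header boundary: the free MPsc block at ffffc00089adcc00 is 0x170 bytes long, so the next header should start at ffffc00089adcd70. The sketch below checks that arithmetic and decodes the QWORD found there. The decoding is an assumption: it follows the x64 _POOL_HEADER layout as shown by `dt nt!_POOL_HEADER` on Windows 8.x-era systems (a ULONG with four 8-bit fields, sizes in 16-byte units, followed by a ULONG PoolTag); the helper name is hypothetical.

```python
import struct

POOL_GRANULARITY = 0x10  # assumption: x64 pool sizes are in 16-byte units

prev_block = 0xffffc00089adcc00  # free MPsc block reported by !pool
prev_size = 0x170                # its size in bytes, per !pool

# The next pool header should begin right where the previous block ends.
next_header = prev_block + prev_size
print(hex(next_header))  # 0xffffc00089adcd70

def decode_pool_header(raw: bytes) -> dict:
    """Decode the first 8 bytes of a pool header (assumed x64 layout)."""
    ulong1, tag = struct.unpack("<II", raw[:8])
    return {
        "PreviousSize": (ulong1 & 0xff) * POOL_GRANULARITY,
        "PoolIndex": (ulong1 >> 8) & 0xff,
        "BlockSize": ((ulong1 >> 16) & 0xff) * POOL_GRANULARITY,
        "PoolType": (ulong1 >> 24) & 0xff,
        "PoolTag": tag,
    }

# The QWORD found at ffffc000`89adcd70 in the dq output.
found = decode_pool_header(struct.pack("<Q", 0x0000001000000004))
print(found)
# PreviousSize decodes to 0x40 rather than the expected 0x170, and
# BlockSize decodes to 0, so the header has been overwritten.
```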
Because the corruption appears at random in different virtual address
ranges (paged pool, non-paged pool, code, proto PTEs, working set lists,
etc.) yet keeps landing at the same page offsets, we believe it must be
generated by a device in the system (or by the platform) writing to
physical memory.
We have tried various things to narrow this down further and have analyzed
the corruption across hundreds of dump files. The systems are generally idle
when the corruption is discovered, which makes it hard to go back in time
and figure out who might be using the value.
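Tallying offsets across hundreds of dumps amounts to bucketing each corrupted address by its in-page offset. The sketch below is a hypothetical helper, not the tooling we actually used; it assumes the corrupted address has already been pulled out of each dump (for example, the start of the !chkimg range or the bad pool header), and it uses only the two addresses quoted above as input.

```python
from collections import Counter

def page_offset(addr: int) -> int:
    # Low 12 bits: the same in the virtual and the physical address.
    return addr & 0xfff

def tally_offsets(corrupted_addrs):
    """Count how often each in-page offset appears across dumps."""
    return Counter(page_offset(a) for a in corrupted_addrs)

# Hypothetical input list; only the two addresses from this post are real.
addrs = [0xfffff800826a6fd8, 0xffffc00089adcd70]
for off, n in tally_offsets(addrs).most_common():
    print(f"0x{off:03x}: {n}")
```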
So, my question to you all is: does this LOOK like anything to you? Do those
corruption values hold any meaning to you? What about the offsets of 0xFD8
or 0xD70? I realize it's not much to go on, but stranger things have
happened.