Utility to detect corrupt SYSTEM file

Are there any utilities out there to detect if a Win2000 SYSTEM file
(located in \WINNT\SYSTEM32\CONFIG) is corrupt?

Thanks,

Daniel

> Are there any utilities out there to detect if a Win2000 SYSTEM file
> (located in \WINNT\SYSTEM32\CONFIG) is corrupt?

There are no publicly available utilities. I know; I searched for a long time.

If you just want to find out IF it is corrupt, you can try loading it into
REGEDT32 using the “Load Hive” command; that will give you a load failure if
it’s corrupt. There’s also an API to do the same thing (which returns
success or failure), but I never tried to get it working and I forget what
it was called.

However, Microsoft has an internal test tool called CHKREG that I was able
to get under NDA, so I cannot distribute it. (However, they are looking
into putting it into a future Resource Kit.) The support group there has
access to it, so they can diagnose these problems as well.

Additionally, someone anonymous reverse-engineered the Registry format, and
the description can be found at:
http://www.csdn.net/FORMAT/binary/winreg.htm

The errors reported on your SYSTEM file using CHKREG are:

Used free cell 0x2120a8
Fatal: Invalid signature (0x63d0) in Security cell 0x2120a8
Size too small (416) in Bin header of Bin (15)
Invalid signature (66676572) in Bin header of Bin (15)
Actual FileOffset [f000] and Bin FileOffset [1a0] do not match in Bin (15); Size = (1a0)

I don’t know about the first one, but the last four are something I’m very
familiar with. We had similar problems with our system, and I spent 2
months tracking them down. For us, we saw a system fail to boot due to a
corrupt “LastKnownGood Menu” about once every 25 machine-years (once a month
in a field of 300 systems), which is very rare and hard to debug!

Basically, the Registry is allocated from the same pool that normal drivers
use, although I forget if it was paged or non-paged. However, it is always
allocated in 4K chunks, which are called “bins”. Each bin starts with the
signature string “hbin”, and then has a small (64 bytes or so) header, which
contains offsets and counts of the data (stored in “cells”).

The Registry bins are paged in and out in a pseudo-virtual-memory scheme.
Thus, if the bins are modified while in memory, the new data will be written
to disk.

The SYSTEM and SYSTEM.ALT files are kept in sync to deal with crashes while
writing one or the other. The system first writes and flushes one, and then
the other.

When the system boots, it does a consistency check on the structures in
SYSTEM and SYSTEM.ALT (as well as SOFTWARE and so on), basically checking
the signatures and walking the offsets. If it finds ANY error, it will fail
the check, and thus will fail to boot.
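To make the "check the signatures and walk the offsets" idea concrete, here is a minimal sketch in Python. It is not the real boot-time check and not CHKREG. It assumes the layout described in the publicly reverse-engineered format notes: a 4K base block starting with "regf", followed by bins whose header begins with the signature "hbin", then a 32-bit offset relative to the first bin, then a 32-bit size. The names check_bins and make_hive are made up for this example, and it walks to the end of the buffer rather than trusting any length field in the base block.

```python
import struct

BASE_BLOCK_SIZE = 4096   # assumed size of the "regf" base block
BIN_ALIGN = 4096         # bins are assumed to be multiples of 4K
# Assumed start of a bin header: signature, offset from first bin, size.
BIN_HEADER = struct.Struct("<4sII")


def check_bins(hive):
    """Walk the bins of a hive image and return a list of problems found."""
    problems = []
    if bytes(hive[:4]) != b"regf":
        problems.append("invalid signature in base block")
    pos = BASE_BLOCK_SIZE
    index = 0
    while pos + BIN_HEADER.size <= len(hive):
        sig, offset, size = BIN_HEADER.unpack_from(hive, pos)
        actual = pos - BASE_BLOCK_SIZE
        if sig != b"hbin":
            problems.append(f"bin {index}: invalid signature {sig!r}")
        if offset != actual:
            problems.append(
                f"bin {index}: stored offset {offset:#x} != actual {actual:#x}")
        if size < BIN_ALIGN or size % BIN_ALIGN:
            problems.append(f"bin {index}: bad size {size}")
            size = BIN_ALIGN  # size is untrustworthy; step one 4K block
        pos += size
        index += 1
    return problems


def make_hive(bin_count):
    """Build a synthetic, empty hive image (demonstration only)."""
    hive = bytearray(BASE_BLOCK_SIZE + bin_count * BIN_ALIGN)
    hive[:4] = b"regf"
    for i in range(bin_count):
        BIN_HEADER.pack_into(hive, BASE_BLOCK_SIZE + i * BIN_ALIGN,
                             b"hbin", i * BIN_ALIGN, BIN_ALIGN)
    return hive
```

A clean synthetic hive from make_hive(16) produces an empty list. Overwriting the header of bin 15 (at offset 0xf000 from the first bin) with the values from the CHKREG output above (signature "regf", offset 0x1a0, size 0x1a0) makes check_bins report three problems, all for bin 15. To try it on a real hive, work on a copy, since the live file is locked by the system.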

What this comes down to is that if you have a driver that allocates memory
from the global pool and then writes to memory that is past its bounds, there
is a chance that it is writing to the Registry data. That data will then get
saved to the Registry, and the system may fail to boot the next time it is
rebooted. (Note that there is a very good chance that the problem will go
unnoticed until the next boot.)
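As a toy illustration of that failure mode (this is a model in Python, not the real NT pool allocator, and every name in it is made up), picture a driver buffer that happens to sit directly in front of a Registry bin. An unchecked copy that is four bytes too long lands on the bin's signature, which is assumed here to be the bytes "hbin":

```python
PAGE = 4096

# Toy "pool": page 0 holds a driver's buffer, page 1 holds a Registry bin.
pool = bytearray(2 * PAGE)
pool[PAGE:PAGE + 4] = b"hbin"   # assumed bin signature at the start of page 1

BUF_LEN = 64                    # the driver "allocated" 64 bytes...
BUF_START = PAGE - BUF_LEN      # ...which end exactly where the bin begins


def buggy_driver_write(data):
    """Copy data into the driver's buffer without checking its length."""
    pool[BUF_START:BUF_START + len(data)] = data


def checked_driver_write(data):
    """Same copy, but refuse anything that does not fit in the allocation."""
    if len(data) > BUF_LEN:
        raise ValueError("write exceeds allocation")
    pool[BUF_START:BUF_START + len(data)] = data


checked_driver_write(b"x" * 64)          # fits: the bin is untouched
buggy_driver_write(b"A" * 64 + b"regf")  # 4 bytes too long: clobbers the bin
print(bytes(pool[PAGE:PAGE + 4]))        # the bin signature is now b"regf"
```

Nothing fails at the moment of the bad write. The damage only shows up later, when something validates the bin, which is why the problem tends to surface at the next boot.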

It is more likely that you’ll see random crashes in the system which you
just cannot explain. Those would be due to your corrupting the pool’s
“linked list” data structures and corrupting the data of other drivers.

So, you have to start searching your drivers looking for the naughty code.

By far the easiest way to do this is to use the “special pool”. This is
part of the Windows 2000 driver verifier, but is also available on NT 4.0 SP
4 and above. Search the Knowledge Base for the relevant articles. NuMega’s
BoundsChecker for Drivers does the same thing, plus more, so if you have
that tool (or are willing to buy it), that’s a better general solution.
However, for you, I’d first use the special pool since it will catch more
extreme errors (and it looks like you have at least one of those). Both of
those tools (using different methods) catch overruns of allocated pool
memory.

In your case, you’ll notice that the invalid signature is also the ASCII
string “regf”, so I’d also start by searching your code base for that string
(which is surely a subset of a larger string). (Although a quick dictionary
search came up with 0 words with that sub-string.)
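If you want to check the "regf" reading yourself, a couple of lines of Python will do it. This is just a sketch; the helper name signature_to_ascii is made up, and it assumes CHKREG prints the signature as a 32-bit value read from a little-endian machine.

```python
import struct


def signature_to_ascii(value):
    """Lay a 32-bit value out as little-endian bytes and decode as ASCII."""
    return struct.pack("<I", value).decode("ascii")


print(signature_to_ascii(0x66676572))  # prints "regf"
```

The same helper works in reverse for sanity checks: a value of 0x6e696268 decodes to "hbin".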

You’re sure to spend a while on this problem, and at the end of it, you’ll
have found these huge bugs, be the hero of your group, and sit back and
wonder how the driver ever ran correctly in the first place… :)

Have fun!