Surprising bug in CHKDSK /R bad-sector checking on WinNT (and others?)!

Basically, the symptom was that whenever I ran CHKDSK /R on my IDE
drives, it would find a huge number of bad sectors, even on brand-new
drives. On a blank, freshly formatted drive, it would flag around 95% of
the sectors as bad, rendering the drive useless. It did this on all of
my WinNT systems.

I eventually tracked down the problem, but it’s actually an
architectural issue, and I’m shocked that nobody found this bug earlier,
given how many years WinNT has been out! Plus, for all I know, this bug
might also exist in Windows 2000+, but the source for ATAPI.sys no
longer seems to be in the DDK. :( If anyone is able to check that out,
I’d be very interested in the answer, and if the bug is there, I’d
pursue it with Microsoft (we have a Premier account).

When CHKDSK /R searches for bad sectors in free space, it writes its
pattern to a group of sectors and then sends down an IOCTL_DISK_VERIFY
to verify them. It tries to do this in chunks of 512 sectors (though
the chunks are usually smaller on non-blank drives, depending on how
the used sectors are interspersed with free ones). Class.sys gets the
request and passes it to Disk.sys, which passes it to ScsiPort.sys as
SCSIOP_VERIFY. Finally, it ends up in ATAPI.sys as a request to verify
512 sectors. As you may know, the ATA Read Verify Sectors (0x40)
command can only handle 255 sectors per command (256 if you write
0x00, which the command interprets as 256), since the Sector Count
register is only 8 bits wide.
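To make the 8-bit limit concrete, here is a minimal C sketch of encoding a transfer length into the Sector Count register. The function name `ata_encode_sector_count` is hypothetical (it is not taken from ATAPI.sys); the only assumption is the ATA convention, for 28-bit commands like this one, that a register value of 0x00 means 256 sectors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: encode a sector count for the 8-bit ATA Sector
 * Count register. A value of 0x00 means 256 sectors, so the legal range
 * is 1..256. Returns false if n cannot be encoded at all. */
static bool ata_encode_sector_count(unsigned n, uint8_t *reg_out)
{
    if (n == 0 || n > 256)
        return false;               /* e.g. CHKDSK's 512-sector verify */
    *reg_out = (uint8_t)(n & 0xFF); /* 256 wraps to 0x00 */
    return true;
}
```

With this encoding, a 512-sector verify simply has no valid register value, so a driver has to reject it, truncate it, or split it.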

The key to this problem, though, was discovering that my systems were
running the latest version of ATAPI.sys (4.0.1381.7247), which is
available through the non-public KB article 832002. In this version of
the driver, the routine IdeVerify checks whether it has been asked to
verify more than 255 sectors, and if so, returns
SRB_STATUS_INVALID_REQUEST. Unfortunately, the upper layers (I’m not
sure which) interpret that to mean that ALL 512 requested sectors are
bad, and mark them as such. Arg!

So at first it seemed they had simply introduced a bug, and I went back
to the previous version of ATAPI.sys, 4.0.1381.7210 from KB article
812780. There, IdeVerify also checks whether the number of sectors is
greater than 255, but in that driver (and in all the earlier ATAPI.sys
versions I looked at, including the DDK source), it just clamps the
sector count to 255! It makes no attempt to tell the requestor that
less was verified, nor does it break the request into separate I/O
operations. So even though CHKDSK asked for 512 sectors to be verified,
only 255 actually are, and the returned status covers only those. The
end result is that for every 512-sector request, CHKDSK /R only checks
255/512 = 49.8% of the sectors, so on a blank drive it’s really only
looking for bad sectors on about half of your drive. Arg!
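The 49.8% figure follows directly from the clamp. A small C sketch of the older driver's behavior as described above (hypothetical names; this models the behavior and is not the actual IdeVerify code):

```c
/* Hypothetical model of IdeVerify in 4.0.1381.7210 and earlier: requests
 * over 255 sectors are silently clamped, and the status covers only the
 * clamped count. Returns the number of sectors actually verified. */
static unsigned model_sectors_verified_7210(unsigned requested)
{
    return requested > 255 ? 255 : requested;
}

/* Fraction of a single request that actually gets verified. */
static double verified_fraction(unsigned requested)
{
    if (requested == 0)
        return 1.0; /* nothing requested, nothing missed */
    return (double)model_sectors_verified_7210(requested) / (double)requested;
}
```

For a 512-sector request this gives 255/512 = 0.498046875; requests of 255 sectors or fewer are fully verified.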

I also grabbed the newest versions of chkdsk.exe, disk.sys, ntfs.sys,
ftdisk.sys, class.sys, and scsiport.sys, but installing them made no
difference: ATAPI.sys still receives requests to verify 512 sectors at
a time.

Unfortunately, I need a solution for this issue, but I also need the
functionality the new driver provides. And since WinNT development and
bug fixing are all but dead at Microsoft, I’m stuck patching the new
ATAPI.sys to clamp any verify request larger than 255 sectors to 255,
which limits how much of the drive CHKDSK /R actually checks. (By the
way, does anyone know how to recompute the checksum of a .sys file
after hand-editing it?)
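On the checksum question: as commonly documented, the PE image checksum is a 16-bit sum of the file's little-endian words with end-around carry, with the optional header's 4-byte CheckSum field treated as zero, plus the file length. The imagehlp function MapFileAndCheckSum computes it directly; the C sketch below reimplements it only to show the steps. The function names are hypothetical, the offset arithmetic assumes a standard PE header (CheckSum at e_lfanew + 0x58), and any result should be cross-checked against MapFileAndCheckSum before trusting a patched driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of OptionalHeader.CheckSum: e_lfanew (stored at 0x3C) + 4 (PE
 * signature) + 20 (IMAGE_FILE_HEADER) + 0x40 = e_lfanew + 0x58.
 * Returns 0 if the buffer is too small (0 is never a valid offset). */
static size_t pe_checksum_offset(const uint8_t *img, size_t len)
{
    if (len < 0x40)
        return 0;
    uint32_t e_lfanew = (uint32_t)img[0x3C] | ((uint32_t)img[0x3D] << 8) |
                        ((uint32_t)img[0x3E] << 16) | ((uint32_t)img[0x3F] << 24);
    if (e_lfanew >= len || len - e_lfanew < 0x58 + 4)
        return 0;
    return (size_t)e_lfanew + 0x58;
}

/* Sum little-endian 16-bit words with end-around carry, skipping the
 * 4-byte CheckSum field (assumed to start at an even offset), then add
 * the file length. */
static uint32_t pe_checksum(const uint8_t *img, size_t len, size_t cksum_off)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i += 2) {
        if (i >= cksum_off && i < cksum_off + 4)
            continue;                         /* treat stored checksum as 0 */
        uint32_t word = img[i];
        if (i + 1 < len)
            word |= (uint32_t)img[i + 1] << 8; /* odd last byte: high byte 0 */
        sum += word;
        sum = (sum & 0xFFFF) + (sum >> 16);    /* fold carry back in */
    }
    return (sum & 0xFFFF) + (uint32_t)len;
}
```

To patch a file, compute the value over the edited image and write it back, little-endian, into the 4 bytes at the returned offset.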

As for where the bug actually lies, it’s hard to say, since it’s an
architectural issue: either CHKDSK needs to limit each verify to 255
sectors, or one of the drivers in the chain needs to break the request
into multiple smaller requests.
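If the splitting were done in a driver, its core could look like the C sketch below: break one large verify into commands the device can accept, so that a failure only implicates the chunk that failed. The names are hypothetical and this is not how any shipping Microsoft driver is known to do it; `max_per_cmd` is a parameter so it can be set to 255 (matching the driver's check) or 256 (the register's true limit).

```c
#include <stddef.h>
#include <stdint.h>

struct verify_chunk {
    uint32_t lba;   /* first sector of this command */
    unsigned count; /* sectors in this command, 1..max_per_cmd */
};

/* Split a verify of `count` sectors starting at `lba` into commands of at
 * most `max_per_cmd` sectors. Writes up to `cap` chunks to `out` and
 * returns the total number of chunks needed (which may exceed `cap`). */
static size_t plan_verify_chunks(uint32_t lba, unsigned count, unsigned max_per_cmd,
                                 struct verify_chunk *out, size_t cap)
{
    size_t n = 0;
    if (max_per_cmd == 0)
        return 0;
    while (count > 0) {
        unsigned c = count < max_per_cmd ? count : max_per_cmd;
        if (n < cap) {
            out[n].lba = lba;
            out[n].count = c;
        }
        n++;
        lba += c;
        count -= c;
    }
    return n;
}
```

For CHKDSK’s 512-sector request with a 255-sector limit, this yields three commands of 255, 255, and 2 sectors; with a 256-sector limit, two commands of 256.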

Well, I just needed to get that off my chest… Any comments or other
suggestions would be welcome (particularly regarding whether this bug
exists in Windows 2000+)…

(Thanks for listening!)