[OT] chkdsk eating CPU and stuck: trying to understand why and what it is currently doing

Hi,

This is technically not an FS-driver question, but this is probably the best place to ask. Yesterday I had a horrible issue with my external USB (SATA) drive, and it is still not fully fixed. For reasons unknown the disk turned into a raw drive: the OS recognized it as unformatted, no drive letter was assigned, and Disk Management could not determine its partition style (GPT or MBR). BTW, the system is 32-bit Vista. At first there was panic in the air, but I quickly calmed down and started looking for the cause. chkdsk won't work on raw disks, and using a mount point didn't help either, since chkdsk still reads the file system as raw. At this point it was pretty obvious that either the MBR or the $MFT, or both, had some serious problem. Knowing that there is a $MFTMirr, I gave the tool TestDisk a try, and indeed it was the $MFT that was corrupted. TestDisk recovered it from the backup and everything was fine again: the disk could be mounted as a basic MBR NTFS volume with a drive letter assigned, and at first sight it could be read without issues. That's the background so far.

The drive was accessible again, but knowing that it may have hardware or file-system integrity problems, I ran chkdsk on it with the parameters “chkdsk F: /F /V /R /X /B”. I know some of these are redundant (/R implies /F, etc.), but that's OK for me. Stages 1-3 finished fine, and chkdsk fixed some index and security entries along the way. From experience I know that stage 4 can take very long, depending on the drive size and the parameters (especially /B), but now stage 4 has been stuck at 525137 of 638768 for nearly 10 hours. Looking at the stack of chkdsk, it mostly shows something like this:

0, ntkrnlpa.exe!KiDeliverApc+0xce
1, ntkrnlpa.exe!KiSwapThread+0x472
2, ntkrnlpa.exe!KeWaitForSingleObject+0x492
3, ntkrnlpa.exe!KiSuspendThread+0x18
4, hal.dll!HalpDispatchSoftwareInterrupt+0x49
5, hal.dll!HalpCheckForSoftwareInterrupt+0x64
6, hal.dll!HalEndSystemInterrupt+0x73
7, hal.dll!HalpIpiHandler+0x189
8, untfs.dll!operator<+0x1d
9, untfs.dll!NTFS_BITMAP::IsFree+0x22
10, untfs.dll!NTFS_BITMAP::AllocateClusters+0xea
11, untfs.dll!NTFS_ATTRIBUTE::RecoverAttribute+0x5d1
12, untfs.dll!NTFS_FILE_RECORD_SEGMENT::RecoverFile+0x102
13, untfs.dll!RecoverAllUserFiles+0x158
14, untfs.dll!NTFS_SA::VerifyAndFix+0x1142
15, ifsutil.dll!VOL_LIODPDRV::ChkDsk+0xd2
16, untfs.dll!ChkdskEx+0x50c
17, chkdsk.exe!main+0xf62
18, chkdsk.exe!_initterm_e+0x163
19, kernel32.dll!BaseThreadInitThunk+0xe
20, ntdll.dll!__RtlUserThreadStart+0x23
21, ntdll.dll!_RtlUserThreadStart+0x1b
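
To make frames 8-10 a bit more concrete, here is a toy model of what I imagine a free-cluster search over an in-memory bitmap looks like. This is purely my guess from the symbol names: I do not know how untfs.dll actually implements NTFS_BITMAP, and the Python functions is_free and allocate_clusters below are hypothetical stand-ins, not the real code. The point is only that such a search touches memory and never the disk, so a long or looping search would burn CPU with no I/O.

```python
def is_free(bitmap: bytearray, start: int, count: int) -> bool:
    """Return True if clusters start .. start+count-1 are all unallocated.

    Assumption: one bit per cluster, bit set means "in use".
    """
    for cluster in range(start, start + count):
        if bitmap[cluster >> 3] & (1 << (cluster & 7)):
            return False
    return True


def allocate_clusters(bitmap: bytearray, total_clusters: int, count: int):
    """Naive first-fit search for `count` contiguous free clusters.

    Marks the run as allocated and returns its first cluster,
    or returns None if no such run exists. Everything happens
    in memory; no disk access is involved.
    """
    for start in range(0, total_clusters - count + 1):
        if is_free(bitmap, start, count):
            for cluster in range(start, start + count):
                bitmap[cluster >> 3] |= 1 << (cluster & 7)
            return start
    return None
```

On a nearly full or badly fragmented volume a search like this would rescan large parts of the bitmap for every allocation, which might be one explanation for what I am seeing, but again that is speculation on my side.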

Now come the questions:

  • What exactly is chkdsk doing here? Judging by the frames in the call stack, it is still operating (moving, fixing, or recovering data on that drive), but the drive's I/O counters show no activity and its access light is not blinking.
  • Is chkdsk stuck here, e.g. on a bad sector, and do I have to terminate the process, or should I wait (even after 10 hours) until it hopefully finishes? chkdsk is using 100% of the first CPU core on my dual-core system, and the fans are running at high speed.

I know that the /B parameter in particular can be extremely time-consuming, but how long should I wait before I abort by terminating chkdsk? Are there any known issues with chkdsk and bad sectors? And, judging by the call stack and the CPU consumption, is chkdsk still alive and working on that drive? What's your opinion: wait or kill, and if wait, for how long?

best

K.