>Actually at present I am not handling any bitmap file. I am just saving the bitmap in a buffer, and
>upon receiving a user-defined IOCTL from the user app, the bitmap buffer is sent to the user app
>to take the backup. I have also verified both bitmaps (kernel side and user app side); they are
>the same, and my backup and restore are working properly. The only thing I don't know is what is
>causing the chkdsk issue. Is there any other approach that I need to follow?
I worked on a product many years ago that somebody had designed around a dirty bitmap for disk replication. Unfortunately, it became clear that this is not a viable strategy unless all activity on the disk is stopped or the driver records the ordered stream of write data. The order of I/O completion becomes a complex issue on a multi-core, multi-threaded system. You can force everything into a serialized write stream, but doing so degrades performance.
You should write a little test program that does the following:
    Open a file for unbuffered writing of allocation size blocks
    blockIndex = 1
    while (blockIndex < maxBlockIndex) {
        write block index value at block blockIndex
        write block index value at block 0
        blockIndex++
    }
While this is running, repeatedly crash the system. On restart, verification should always find that block 0 points to a valid position in the file that contains the same index, because the writes are ordered: each block is written before block 0 is updated to point at it. This property is called crash consistency.
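A minimal sketch of that test in Python, in case it helps. The names `BLOCK_SIZE`, `run_writer`, and `verify` are made up for illustration, the 4096-byte block size is an assumption, and calling `os.fsync` after every write is only a portable stand-in for opening the file unbuffered and write-through on Windows (for example `FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH` with `CreateFile`).

```python
import os
import struct

BLOCK_SIZE = 4096  # assumed allocation size; query the real volume in practice


def make_block(index: int) -> bytes:
    """Return one block whose first 8 bytes hold `index`, zero-padded."""
    return struct.pack("<Q", index).ljust(BLOCK_SIZE, b"\0")


def write_block(f, block_no: int, index: int) -> None:
    """Write `index` into block `block_no` and flush it before returning."""
    f.seek(block_no * BLOCK_SIZE)
    f.write(make_block(index))
    os.fsync(f.fileno())  # stand-in for unbuffered, write-through I/O


def run_writer(path: str, max_block_index: int) -> None:
    """The loop from the post: data block first, then the pointer in block 0."""
    with open(path, "wb", buffering=0) as f:
        for block_index in range(1, max_block_index):
            write_block(f, block_index, block_index)
            write_block(f, 0, block_index)


def read_index(f, block_no: int):
    """Return the index stored in block `block_no`, or None if it is missing."""
    f.seek(block_no * BLOCK_SIZE)
    data = f.read(8)
    return struct.unpack("<Q", data)[0] if len(data) == 8 else None


def verify(path: str) -> bool:
    """True if block 0 points at a block that holds the same index."""
    with open(path, "rb") as f:
        pointer = read_index(f, 0)
        if pointer is None:
            return True  # crashed before anything reached the file
        # A pointer of 0 means block 0 was never updated; it matches itself.
        return read_index(f, pointer) == pointer
```

The idea would be to start `run_writer` at boot, crash the machine at random points, and call `verify` on the file after each restart. Running `verify` against the same file on the replicated disk is the interesting check for a dirty-bitmap copy.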
From the description of your algorithm, if the bitmap and the app are copying blocks from a disk that is still active, the ordering between writes may be lost, and that ordering is essential to data integrity. For example, in the test program above, block 0 will frequently be marked as dirty in the bitmap and will keep being updated with the latest end block index. The application code, which lags the disk writes slightly (or a lot), will read the latest value of block 0 but will be working from a past snapshot of the bitmap, so the two will not necessarily be in sync with the current last block in the file. The later blocks of the file, and the file system metadata that describes the disk allocation, may not be included in that snapshot of the dirty bitmap. The result is that copying disk blocks based on a dirty bitmap will create cases on the copied disk where block 0 does not point at a matching block in the file. It may in fact point past the end of the file, which means the disk is corrupt.
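The race can be reproduced without a real disk. The following is a toy simulation, not your driver's logic: the "disk" is a Python list of block values, the dirty bitmap is a set of block numbers, and `ordered_writes`, `copy_dirty_blocks`, and `simulate` are hypothetical names. It assumes the app fetches the bitmap once and then reads the live blocks some time later.

```python
NUM_BLOCKS = 16


def ordered_writes(disk, dirty, start, stop):
    """Apply the test program's writes: data block first, then block 0."""
    for i in range(start, stop):
        disk[i] = i
        dirty.add(i)
        disk[0] = i
        dirty.add(0)


def is_consistent(disk):
    """Block 0 must point at a block that holds the same index."""
    pointer = disk[0]
    return pointer == 0 or disk[pointer] == pointer


def copy_dirty_blocks(source, bitmap_snapshot):
    """Copy the *current* contents of every block marked in an old bitmap."""
    replica = [0] * NUM_BLOCKS
    for block in sorted(bitmap_snapshot):
        replica[block] = source[block]
    return replica


def simulate():
    source = [0] * NUM_BLOCKS
    dirty = set()
    ordered_writes(source, dirty, 1, 5)   # blocks 1..4 written, block 0 holds 4
    bitmap_snapshot = set(dirty)          # app fetches the bitmap here
    ordered_writes(source, dirty, 5, 9)   # disk keeps changing, block 0 holds 8
    replica = copy_dirty_blocks(source, bitmap_snapshot)
    return is_consistent(source), is_consistent(replica)


if __name__ == "__main__":
    source_ok, replica_ok = simulate()
    print("source consistent:", source_ok)    # True
    print("replica consistent:", replica_ok)  # False
```

The source is consistent at every instant, yet the replica ends up with block 0 holding 8 while block 8 was never copied, because block 8 was not dirty when the bitmap was fetched. That is the same shape of damage as block 0 pointing past the end of the file.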
You may be getting chkdsk errors because it is detecting real disk corruption: the copy process is not maintaining write ordering, so it is not copying a valid point-in-time snapshot of the disk. This is exactly why Windows has volume shadow copies (the Volume Shadow Copy Service, VSS), so backup software can see a crash-consistent view of an active disk. Shadow copies can also communicate with applications, allowing them to bring files to a known consistent state before the snapshot is taken. There is a system API for interacting with the shadow copy facilities. Among other things, the shadow copy APIs let a copy provider know when the disk has temporarily stopped write activity and shadow-copy-aware applications have flushed their files to an application-consistent state.
Jan