Facing CHKDSK problem

Peter_Haskin · June 9, 2014, 7:21am

Hi,

I am working on a backup application that consists of an upper volume filter driver based on the diskperf sample and a user-mode app. My driver monitors the sectors that are modified and sets the corresponding block bits in a bitmap. The user app backs up the data for the blocks whose bits are set in the bitmap.
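A minimal sketch of the tracking step follows. It assumes a fixed tracking block size, and the thread does not say which granularity the driver uses, so 64 KiB below is a placeholder. It also assumes each write is described by a volume-relative byte offset and length, as a write IRP carries in Parameters.Write.ByteOffset and Parameters.Write.Length. `mark_dirty` and `is_dirty` are hypothetical names. The code is plain user-mode C so the arithmetic can be tested. In a driver the bitmap update would also need a spin lock or interlocked operations, because writes complete concurrently.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE (64u * 1024u)  /* assumed tracking granularity: 64 KiB */

/* Set one bit per block touched by a write of `length` bytes at byte `offset`.
   Bit 0 of byte 0 represents block 0. Returns the number of bits newly set. */
static size_t mark_dirty(uint8_t *bitmap, size_t bitmap_bits,
                         uint64_t offset, uint64_t length)
{
    size_t newly_set = 0;
    if (length == 0)
        return 0;
    uint64_t first = offset / BLOCK_SIZE;
    uint64_t last = (offset + length - 1) / BLOCK_SIZE;  /* inclusive */
    for (uint64_t b = first; b <= last && b < bitmap_bits; b++) {
        uint8_t mask = (uint8_t)(1u << (b % 8));
        if (!(bitmap[b / 8] & mask)) {
            bitmap[b / 8] |= mask;
            newly_set++;
        }
    }
    return newly_set;
}

/* Report whether block `block` is marked dirty. */
static int is_dirty(const uint8_t *bitmap, uint64_t block)
{
    return (bitmap[block / 8] >> (block % 8)) & 1u;
}
```

A write that straddles a block boundary sets both blocks. This is the conservative choice: copying a little too much is harmless, while missing a partially written block is not.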

The problem I am facing is that whenever the system shuts down and restarts, there is a CHKDSK problem. I am handling “IRP_MJ_SHUTDOWN” to free the buffers I allocated with “ExAllocatePoolWithTag”. As far as I know, I am freeing almost all of the allocated buffers. What could be the cause?

Do I need to handle any of the following IOCTLs to solve this problem:
IOCTL_DISK_COPY_DATA, IOCTL_VOLSNAP_FLUSH_AND_HOLD_WRITES, FSCTL_MOVE_FILE, or some other IOCTL?

Awaiting a positive response.
Thanks in advance.

Peter_Haskin · June 10, 2014, 9:45am

Hi,

Does anyone know what the possible cause and solution could be?

Awaiting a positive response.

NtDev_Geek · June 11, 2014, 1:19am

As an upper volume filter, I think you should not care about IRP_MJ_SHUTDOWN.

As for IOCTL_DISK_COPY_DATA, you need to handle it only if you see some corruption in your bitmap. Basically, it is related to defrag.
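If the filter does need to account for IOCTL_DISK_COPY_DATA, one plausible approach is to treat the destination range as written, because the copy changes data on disk without a corresponding IRP_MJ_WRITE passing through the filter. This sketch assumes the request reaches the filter as a device-control IRP and that its offsets are relative to the same volume the bitmap covers; neither point is confirmed here. `copy_data_params` is a hypothetical user-mode stand-in that mirrors three fields of DISK_COPY_DATA_PARAMETERS, which uses LARGE_INTEGER fields in the WDK headers. `set_range` and `on_copy_data` are also hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE (64u * 1024u)  /* assumed tracking granularity */

/* Hypothetical stand-in for the fields of DISK_COPY_DATA_PARAMETERS used here. */
typedef struct {
    int64_t SourceOffset;
    int64_t DestinationOffset;
    int64_t CopyLength;
} copy_data_params;

/* Mark every block overlapped by [offset, offset + length) as dirty. */
static void set_range(uint8_t *bitmap, uint64_t nbits, uint64_t offset, uint64_t length)
{
    if (length == 0)
        return;
    uint64_t last = (offset + length - 1) / BLOCK_SIZE;
    for (uint64_t b = offset / BLOCK_SIZE; b <= last && b < nbits; b++)
        bitmap[b / 8] |= (uint8_t)(1u << (b % 8));
}

/* Hypothetical handler: only the destination changes, so only it is marked.
   The source range is read, not modified, and needs no tracking. */
static void on_copy_data(uint8_t *bitmap, uint64_t nbits, const copy_data_params *p)
{
    if (p->CopyLength > 0 && p->DestinationOffset >= 0)
        set_range(bitmap, nbits, (uint64_t)p->DestinationOffset, (uint64_t)p->CopyLength);
}
```

Marking the destination before forwarding the request errs on the side of backing up extra blocks if the copy later fails. That is safer than missing changed blocks.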

You will see IOCTL_VOLSNAP_FLUSH_AND_HOLD_WRITES only if your filter is above volsnap.sys. If it is, you need to handle it; otherwise there is no need, and you can't anyway.

In your case the cache is not getting flushed on shutdown, so what about IRP_MJ_FLUSH_BUFFERS?
Do a pass-through. Hope this helps.

./nT

Peter_Haskin · June 11, 2014, 2:26am

Hi,

Thanks @NtDev Geek for your reply.

> As an upper volume filter, I think you should not care about IRP_MJ_SHUTDOWN.

The reason I am handling IRP_MJ_SHUTDOWN is to free the buffers I allocated, and also to write the cached data to an appropriate sector so that it can be read back when the system restarts.

> As for IOCTL_DISK_COPY_DATA, you need to handle it only if you see some corruption in your bitmap. Basically, it is related to defrag.

While handling IOCTL_DISK_COPY_DATA, what should I actually do?

> You will see IOCTL_VOLSNAP_FLUSH_AND_HOLD_WRITES only if your filter is above volsnap.sys. If it is, you need to handle it; otherwise there is no need, and you can't anyway.

Sorry to ask, but how can I determine whether my filter driver is above volsnap.sys? Is there any documentation or sample that could help? Also, what should I do when handling IOCTL_VOLSNAP_FLUSH_AND_HOLD_WRITES?

> In your case the cache is not getting flushed on shutdown, so what about IRP_MJ_FLUSH_BUFFERS?

Before handling IRP_MJ_SHUTDOWN, I handled IRP_MJ_FLUSH_BUFFERS, but that flushed my cache continuously (every time this IRP arrived), right from system startup.
I only want to flush the cache/buffer when the system shuts down or if corruption occurs.

NtDev_Geek · June 11, 2014, 11:10pm

>>how can I determine whether my filter driver is above VolSnap.sys,

Use DevCon.exe from the WinDDK tools and run the command devcon stack =Volume.
It will show you the volume-level stack and correctly show your driver's position in it.

Chepuri · June 12, 2014, 3:18am

>> The problem I am facing is that whenever the system shuts down and restarts, there is a CHKDSK problem. I am handling “IRP_MJ_SHUTDOWN” to free the buffers I allocated with “ExAllocatePoolWithTag”. As far as I know, I am freeing almost all of the allocated buffers. What could be the cause?

Can you elaborate on this design? A chkdsk problem on what? Where are you writing the backup data?
It looks like something is wrong with in-flight I/O.


Peter_Haskin · June 12, 2014, 6:53am

Hi,

@NtDev Geek: I used DevCon.exe, and the output shows my driver above volsnap.sys.
So that means I need to handle “IOCTL_VOLSNAP_FLUSH_AND_HOLD_WRITES”. Can you please explain what I should do when handling the “IOCTL_VOLSNAP_FLUSH_AND_HOLD_WRITES” and “IOCTL_DISK_COPY_DATA” IOCTLs?

@Chepuri: Whenever there is a write, the bits for the affected blocks are set in our bitmap. The bitmap is then sent to the user app, which reads the data for those blocks and stores it on another system.
If no partitions are monitored, I don't get the chkdsk problem when the system restarts, but if any partition is being monitored, I do get the chkdsk problem when the system restarts.
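On the user-app side, one common way to read the dirty blocks efficiently is to coalesce consecutive set bits into runs and issue one read per run. This sketch uses hypothetical names (`block_run`, `collect_runs`) and assumes one bit per block, with bit 0 of byte 0 representing block 0.

```c
#include <stddef.h>
#include <stdint.h>

/* A run of consecutive dirty blocks. */
typedef struct {
    uint64_t first_block;
    uint64_t block_count;
} block_run;

static int bit_set(const uint8_t *bm, uint64_t b)
{
    return (bm[b / 8] >> (b % 8)) & 1u;
}

/* Coalesce consecutive dirty blocks into runs. Writes at most `max_runs`
   entries into `runs` and returns how many were written. */
static size_t collect_runs(const uint8_t *bm, uint64_t nbits,
                           block_run *runs, size_t max_runs)
{
    size_t n = 0;
    uint64_t b = 0;
    while (b < nbits && n < max_runs) {
        if (!bit_set(bm, b)) {
            b++;
            continue;
        }
        uint64_t start = b;
        while (b < nbits && bit_set(bm, b))
            b++;
        runs[n].first_block = start;
        runs[n].block_count = b - start;
        n++;
    }
    return n;
}
```

The byte range to read for a run starts at first_block times the block size and covers block_count blocks, using whatever block size the driver tracks.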

Alex_Grig · June 12, 2014, 1:00pm

Are you handling the Force Unit Access flag properly?

NtDev_Geek · June 13, 2014, 2:57am

>> ...IOCTL_VOLSNAP_FLUSH_AND_HOLD_WRITES”. Can you please explain what I should do when handling the “IOCTL_VOLSNAP_FLUSH_AND_HOLD_WRITES” and “IOCTL_DISK_COPY_DATA” IOCTLs?

First, tell us why you need to handle them; as a volume upper filter driver, there is no need to. But explain your design to us, and only then can I tell you.

There's no need to shoot arrows blindly into the air.

Peter_Haskin · June 13, 2014, 6:15am

Hi,

@Alex Grig: Thanks for your reply. I am not handling the Force Unit Access flag. Do I need to handle it?

@NtDev Geek: It's not necessary that I handle those IOCTLs. I searched on Google and found 2-3 posts where users referred to “IOCTL_DISK_COPY_DATA” when they were facing the chkdsk problem, so I am not sure whether it's the solution. I also read the “IOCTL_DISK_COPY_DATA” article on MSDN but didn't reach any conclusion, so I finally posted here.
Frankly, I don't know what's causing the problem. I checked my whole code for any buffer that I allocated and am not freeing, but I haven't found any. So basically, I want to know what I should check to find the cause of the problem, and what the possible solution could be.

NtDev_Geek · June 19, 2014, 2:06am

@Peter, there are numerous scenarios in which you can get check disk errors. I think you are approaching the problem incorrectly.
Try this, if you haven't already:
When you record writes in your bitmap file, compare each value written to the bitmap with what your app reads from that bitmap for the backup. If both values match, there is no problem in your driver. If they don't match, we need to look at other options.
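This comparison can be done mechanically. The following is a minimal sketch using a hypothetical helper, `first_mismatch`. It assumes both sides use one bit per block, with bit 0 of byte 0 for block 0. It also assumes the kernel-side copy was captured at the same moment the app fetched its copy, for example logged by the driver for debugging.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first block whose bit differs between the two
   bitmaps, or -1 if the first `nbits` bits match. */
static int64_t first_mismatch(const uint8_t *kernel_bm, const uint8_t *app_bm,
                              uint64_t nbits)
{
    for (uint64_t b = 0; b < nbits; b++) {
        unsigned k = (kernel_bm[b / 8] >> (b % 8)) & 1u;
        unsigned a = (app_bm[b / 8] >> (b % 8)) & 1u;
        if (k != a)
            return (int64_t)b;
    }
    return -1;
}
```

Reporting the first differing block index, rather than just pass or fail, points directly at the region to inspect.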

./Dev

Peter_Haskin · June 25, 2014, 3:13am

@NtDev Geek: Thanks for your reply.

At present I am not using any bitmap file. I am just keeping the bitmap in a buffer, and upon receiving a user-defined IOCTL from the user app, the bitmap buffer is sent to the user app to take the backup. I have also verified that both bitmaps (kernel side and user app side) are the same, and my backup and restore are working properly too. The only thing I don't know is what is causing the chkdsk issue. Is there any other approach I should follow?
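In a fetch-over-IOCTL design like this, the copy to the caller and the reset of the bitmap should happen in one critical section. Otherwise a write that completes between the two steps is cleared without ever being reported. The sketch assumes the driver clears the bitmap after each fetch, which the post does not state. It is user-mode code, with a pthread mutex standing in for the spin lock a driver would use. `tracker`, `tracker_mark`, and `tracker_fetch_and_clear` are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BITMAP_BYTES 16

/* Hypothetical tracker: the mutex stands in for a driver spin lock. */
typedef struct {
    pthread_mutex_t lock;
    uint8_t bits[BITMAP_BYTES];
} tracker;

/* Write path: set the bit for `block` under the lock. */
static void tracker_mark(tracker *t, size_t block)
{
    pthread_mutex_lock(&t->lock);
    t->bits[block / 8] |= (uint8_t)(1u << (block % 8));
    pthread_mutex_unlock(&t->lock);
}

/* IOCTL path: copy out and clear in one critical section, so no write can
   land between the copy and the clear and be lost. */
static void tracker_fetch_and_clear(tracker *t, uint8_t out[BITMAP_BYTES])
{
    pthread_mutex_lock(&t->lock);
    memcpy(out, t->bits, BITMAP_BYTES);
    memset(t->bits, 0, BITMAP_BYTES);
    pthread_mutex_unlock(&t->lock);
}
```

This only prevents lost updates. It does not make the fetched bitmap line up in time with the data the app later copies.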

Note: the backup is done by copying data from VSS, and the VSS snapshot is taken on the user side using the VSS client service.
Thanks again for your patience and for replying to my posts.

NtDev_Geek · June 25, 2014, 4:41am

Check the buffer you are allocating, and also check whether the locking and the partial MDL creation are in place.
If not, build a partial MDL (IoBuildPartialMdl), try it, and let me know.

Dev

Jan_Bottorff · June 25, 2014, 3:16pm

>At present I am not using any bitmap file. I am just keeping the bitmap in a buffer, and
>upon receiving a user-defined IOCTL from the user app, the bitmap buffer is sent to the user app
>to take the backup. I have also verified that both bitmaps (kernel side and user app side) are
>the same, and my backup and restore are working properly too. The only thing I don't know is
>what is causing the chkdsk issue. Is there any other approach I should follow?

I worked on a product many years ago that someone had designed around a dirty bitmap for disk replication. Unfortunately, it became clear this is not a viable strategy unless all activity on the disk is stopped, or the product records the ordered stream of write data. The order of I/O completion becomes a complex issue on a multi-core, multi-threaded system. You can force everything into a serialized write stream, although doing so also degrades performance.
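The alternative mentioned above, recording the ordered stream of write data, can be sketched as an append-only log. The source appends to it in completion order, and the target replays it in the same order. Any prefix of such a log reproduces a state the source disk actually passed through. This is an in-memory illustration with hypothetical names (`write_log`, `log_write`, `replay`), and a single 32-bit value stands in for each block's contents. A real implementation would also need to serialize concurrent writes into one order, which is where the performance cost comes from.

```c
#include <stddef.h>
#include <stdint.h>

#define NBLOCKS 8
#define LOG_CAP 64

/* One recorded write: which block, and the value written to it. */
typedef struct {
    size_t block;
    uint32_t value;
} write_rec;

typedef struct {
    write_rec recs[LOG_CAP];
    size_t count;
} write_log;

/* Source side: apply the write and append it to the log in completion order. */
static int log_write(uint32_t disk[NBLOCKS], write_log *wl, size_t block, uint32_t value)
{
    if (block >= NBLOCKS || wl->count >= LOG_CAP)
        return -1;
    disk[block] = value;
    wl->recs[wl->count++] = (write_rec){ block, value };
    return 0;
}

/* Target side: replay the first `n` records in their original order. */
static void replay(uint32_t target[NBLOCKS], const write_log *wl, size_t n)
{
    for (size_t i = 0; i < n && i < wl->count; i++)
        target[wl->recs[i].block] = wl->recs[i].value;
}
```

Replaying every prefix of a log produced by the data-then-pointer loop described next always leaves block 0 pointing at a block that holds the same index.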

You should write a little test program that does the following:

Open a file for unbuffered writing of allocation-size blocks
blockIndex = 1
while (blockIndex < maxBlockIndex) {
    write the value blockIndex into block blockIndex
    write the value blockIndex into block 0
    blockIndex++
}
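A portable sketch of the test program above, under some assumptions. It uses POSIX `open`, `pwrite`, and `fsync` rather than Win32. It uses a 4096-byte block as the allocation size, and it stores each index as a native-endian 64-bit value at the start of its block. Calling `fsync` after every write stands in for unbuffered, write-through I/O by making each write durable before the next one is issued. On Windows, the closer equivalent is opening the file with `CreateFile` and the `FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH` flags. `write_block` and `run_writer` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 4096  /* assumed allocation size */

/* Store `value` at the start of block `index` and make it durable. */
static int write_block(int fd, uint64_t index, uint64_t value)
{
    unsigned char buf[BLOCK_SIZE];
    memset(buf, 0, sizeof buf);
    memcpy(buf, &value, sizeof value);
    if (pwrite(fd, buf, sizeof buf, (off_t)(index * BLOCK_SIZE)) != (ssize_t)sizeof buf)
        return -1;
    return fsync(fd);  /* stand-in for unbuffered, write-through I/O */
}

/* The test loop: write the data block first, then the pointer in block 0. */
static int run_writer(const char *path, uint64_t max_block_index)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    for (uint64_t i = 1; i < max_block_index; i++) {
        if (write_block(fd, i, i) != 0 || write_block(fd, 0, i) != 0) {
            close(fd);
            return -1;
        }
    }
    return close(fd);
}
```

To use it, run the program against a file on the monitored volume and crash the machine partway through. The file it leaves behind is what the verification step examines.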

While this is running, repeatedly crash the system. On restart, verification should always find that block 0 points to a valid position in the file that contains that index, because the writes are ordered. This is called crash consistency.
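The verification step can be sketched as follows. It assumes each index is stored as a native-endian 64-bit value at the start of its block, with 4096-byte blocks. `verify_file` is a hypothetical helper: it reads the index stored in block 0 and checks that the block it names exists and holds the same index. A file where block 0 is still zero, or that is too short to contain block 0, is treated as consistent. That state corresponds to a crash before the first pointer update reached the disk.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 4096  /* must match the block size used by the writer */

/* Read the 64-bit value stored at the start of block `index`; 0 on success. */
static int read_block_value(FILE *f, uint64_t index, uint64_t *out)
{
    if (fseek(f, (long)(index * BLOCK_SIZE), SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof *out, 1, f) == 1 ? 0 : -1;
}

/* Returns 1 if the file is crash consistent, 0 if not, -1 if it can't be opened. */
static int verify_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    uint64_t last = 0, stored = 0;
    int ok;
    if (read_block_value(f, 0, &last) != 0 || last == 0)
        ok = 1;  /* no pointer written yet: trivially consistent */
    else
        ok = read_block_value(f, last, &stored) == 0 && stored == last;
    fclose(f);
    return ok;
}
```

A pointer that names a block past the end of the file fails the check, which is exactly the corruption pattern described in the next paragraph.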

From the description of your algorithm, if the bitmap and app are copying the blocks of a disk with activity, the ordering between writes may be lost, and that ordering is essential to data integrity. For example, block 0 will frequently be marked as dirty in the bitmap and will be updated with the latest end block index. The application code, which lags slightly (or a lot) behind the disk writes, will read the latest value of block 0 but will have a past snapshot of the bitmap, so it will not necessarily be in sync with the current last block in the file. The later blocks of the file, and the file system metadata that describes the disk allocation, may not be included in that snapshot of the dirty bitmap. As a result, copying the disk blocks based on a dirty bitmap will create cases on the copied disk where block 0 does not point at a matching block in the file; it may in fact point past the end of the file, which means the disk is corrupt.

You may be getting chkdsk errors because it is detecting disk corruption: the copy process is not maintaining write ordering and is not copying a valid moment-in-time snapshot of the disk. This is exactly why Windows has shadow copies, so backup software can see a crash-consistent view of an active disk. Shadow copies can also communicate with applications, allowing them to bring files to a known consistent state before the snapshot is taken. There is a system API to interact with the shadow copy facilities. Among other things, the shadow copy APIs let a copy provider know when write activity on the disk has been temporarily stopped and shadow-copy-aware applications have flushed their files to an application-consistent state.
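The timing problem described above can be reproduced with a small in-memory simulation that uses hypothetical names throughout. It runs the test loop against a simulated source disk with a one-byte-per-block dirty map, then fetches and clears the map after k iterations. The writer continues to iteration m, and the simulation then copies the blocks named in the fetched map. When the copy reads the live disk, block 0 points at a block that was never copied. When it reads a frozen image taken at the same instant as the map, which is the role a shadow copy plays, the result is consistent. It models only write ordering, not VSS itself.

```c
#include <stdint.h>
#include <string.h>

#define NBLOCKS 16

/* Simulated source disk: block contents plus one dirty byte per block. */
typedef struct {
    uint32_t data[NBLOCKS];
    uint8_t dirty[NBLOCKS];
} sim_disk;

/* One iteration of the test loop: data block first, then the pointer in block 0. */
static void writer_step(sim_disk *d, uint32_t i)
{
    d->data[i] = i;
    d->dirty[i] = 1;
    d->data[0] = i;
    d->dirty[0] = 1;
}

/* Fetch the dirty map after k iterations, keep writing through iteration m,
   then copy the fetched blocks. With point_in_time set, the copy reads a frozen
   image captured together with the map; otherwise it reads the live disk.
   Returns 1 if the copy is crash consistent, 0 if not, -1 on bad arguments. */
static int simulate(uint32_t k, uint32_t m, int point_in_time)
{
    if (k > m || m >= NBLOCKS)
        return -1;

    sim_disk src = {{0}, {0}};
    uint32_t target[NBLOCKS] = {0};  /* previous backup: empty */
    uint8_t fetched[NBLOCKS];
    uint32_t frozen[NBLOCKS];

    for (uint32_t i = 1; i <= k; i++)
        writer_step(&src, i);

    memcpy(fetched, src.dirty, sizeof fetched);  /* bitmap fetch...   */
    memcpy(frozen, src.data, sizeof frozen);     /* ...and snapshot   */
    memset(src.dirty, 0, sizeof src.dirty);

    for (uint32_t i = k + 1; i <= m; i++)        /* disk stays active */
        writer_step(&src, i);

    const uint32_t *source = point_in_time ? frozen : src.data;
    for (uint32_t b = 0; b < NBLOCKS; b++)
        if (fetched[b])
            target[b] = source[b];

    return target[0] == 0 || target[target[0]] == target[0];
}
```

For example, simulate(3, 7, 0) reports an inconsistent copy: block 0 holds 7, but block 7 was never copied. simulate(3, 7, 1) reports a consistent one.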

Jan