BSOD on NtfsFspCloseInternal

Hi everyone,

I am writing a software-only RAID driver. In this test case, I created a 4-disk RAID1 set and wrote data to it. In the middle of the write, I disconnected one of the disks and reconnected it. Everything works fine, but I get a BSOD on shutdown. The crash is either the one below, or it is in CcUnpinData or the Cc cache flush function.

My guess is that because I yank a disk and reconnect it, NTFS is not aware of any change to the RAID volume (by design) so it expects a pointer to be valid, when it isn’t. Is there a way to let Windows know that the underlying disk has changed so the cache pointers may not be valid any more?

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: fffff8010b1d4210, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000008, value 0 = read operation, 1 = write operation
Arg4: fffff8010b1d4210, address which referenced memory

... 

FAILED_INSTRUCTION_ADDRESS: 
Ntfs!NtfsFspCloseInternal$filt$0+0
fffff801`0b1d4210 4055            push    rbp

STACK_TEXT:  
ffffcb80`ccff8658 fffff801`09166dc2     : fffff801`0b1d4210 00000000`00000003 ffffcb80`ccff87c0 fffff801`08fe44d0 : nt!DbgBreakPointWithStatus
ffffcb80`ccff8660 fffff801`091664b7     : 00000000`00000003 ffffcb80`ccff87c0 fffff801`090961a0 00000000`000000d1 : nt!KiBugCheckDebugBreak+0x12
ffffcb80`ccff86c0 fffff801`09081c27     : 486f78c0`85480000 a5158d48`4024448b fffff801`0b0ba1dc 0000639c`0d8b4808 : nt!KeBugCheck2+0x947
ffffcb80`ccff8dc0 fffff801`09093929     : 00000000`0000000a fffff801`0b1d4210 00000000`00000002 00000000`00000008 : nt!KeBugCheckEx+0x107
ffffcb80`ccff8e00 fffff801`0908fc69     : ffffdd05`d8a2b000 ffffdd05`d8a31000 fffff801`00000000 00000000`00000000 : nt!KiBugCheckDispatch+0x69
ffffcb80`ccff8f40 fffff801`0b1d4210     : fffff801`0905c7d9 ffffdd05`00000003 fffff801`0b0ba1ec ffffdd05`d95f9000 : nt!KiPageFault+0x469
ffffcb80`ccff90d8 fffff801`0905c7d9     : ffffdd05`00000003 fffff801`0b0ba1ec ffffdd05`d95f9000 ffffdd05`d95ff000 : Ntfs!NtfsFspCloseInternal$filt$0
ffffcb80`ccff90e0 fffff801`0b06fb2e     : fffff801`0b0ba1ec ffffdd05`d95fe0d8 ffffdd05`d95fe750 fffff801`0b150cbc : nt!_C_specific_handler+0xa9
ffffcb80`ccff9150 fffff801`0908a90f     : ffffdd05`d95fe750 ffffcb80`ccff96f0 00000000`00000000 00000000`00000000 : Ntfs!_GSHandlerCheck_SEH+0x6a
ffffcb80`ccff9180 fffff801`08ed5885     : 00000000`00000000 00000000`00000000 ffffcb80`ccff96f0 00007fff`ffff0000 : nt!RtlpExecuteHandlerForException+0xf
ffffcb80`ccff91b0 fffff801`08ed3f1e     : ffffdd05`d95fe0d8 ffffcb80`ccff9e30 ffffdd05`d95fe0d8 ffffcb80`cd000180 : nt!RtlDispatchException+0x4a5
ffffcb80`ccff9900 fffff801`09082992     : 1f000000`00000000 0000e0ff`ffffffff 00000000`00000000 00000000`00000000 : nt!KiDispatchException+0x16e
ffffcb80`ccff9fb0 fffff801`09082960     : fffff801`09093a56 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxExceptionDispatchOnExceptionStack+0x12
ffffdd05`d95fdf98 fffff801`09093a56     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiExceptionDispatchOnExceptionStackContinue
ffffdd05`d95fdfa0 fffff801`0908f7e0     : ffffdd05`d95ff000 ffffdd05`d95f9000 00000000`00040293 fffff801`08f8100d : nt!KiExceptionDispatch+0x116
ffffdd05`d95fe180 fffff801`08f383e7     : ffff800c`ad802340 00000000`786674ff ffff800c`b6791330 00000000`00000000 : nt!KiGeneralProtectionFault+0x320
ffffdd05`d95fe310 fffff801`08f37f78     : ffff9683`00000000 ffff800c`b12dfda0 00000000`00000000 00000000`00000000 : nt!CcGetPartition+0xa7
ffffdd05`d95fe360 fffff801`08f38cd8     : 00000000`00000745 fffff801`08efd854 00000000`00000000 00000000`00000745 : nt!CcUnmapVacbArray+0x38
ffffdd05`d95fe3d0 fffff801`0b064b6b     : ffff800c`b18bee20 00000000`00000745 00000000`00000000 fffff801`0922c0a9 : nt!CcUnmapFileOffsetFromSystemCache+0x28
ffffdd05`d95fe410 fffff801`0b12fd20     : ffffdd05`d95fe540 ffffdd05`d95fe540 ffff9683`95681650 ffffdd05`d95fe8f0 : Ntfs!NtfsMftViewAddToDelayedUnmap+0x10f
ffffdd05`d95fe490 fffff801`0b054587     : ffffdd05`00000000 ffff9683`9d1a5b00 ffffdd05`0000047e ffff9683`00000000 : Ntfs!NtfsDeleteFcb+0x520
ffffdd05`d95fe510 fffff801`0b12f4ce     : ffffdd05`d95fe8f0 ffff800c`b18bd180 ffff9683`9c58ca10 ffff9683`9c58ce58 : Ntfs!NtfsTeardownFromLcb+0x267
ffffdd05`d95fe5b0 fffff801`0b0589aa     : ffffdd05`d95fe8f0 ffffdd05`d95fe6b1 ffff9683`9c58ce58 ffffdd05`d95fe8f0 : Ntfs!NtfsTeardownStructures+0xee
ffffdd05`d95fe630 fffff801`0b150cbc     : ffffdd05`d95fe700 ffff9683`00000000 ffffdd05`00000000 ffffdd05`d95fe8f0 : Ntfs!NtfsDecrementCloseCounts+0xaa
ffffdd05`d95fe670 fffff801`0b14fc01     : ffffdd05`d95fe8f0 ffff9683`9c58cb70 ffff9683`9c58ca10 ffff800c`b18bd180 : Ntfs!NtfsCommonClose+0x45c
ffffdd05`d95fe750 fffff801`0b185cb8     : 00000000`0000001c fffff801`0944b240 00000000`00000000 00000000`00000000 : Ntfs!NtfsFspCloseInternal+0x241
ffffdd05`d95fe8b0 fffff801`08f17d35     : ffff800c`adf069e0 ffff800c`adf06900 ffff800c`adf06900 ffff800c`b5619bb8 : Ntfs!NtfsFspClose+0x88
ffffdd05`d95feb70 fffff801`08ff1585     : ffff800c`b3eca040 00000000`00000080 ffff800c`ade71340 0000246f`b19bbdff : nt!ExpWorkerThread+0x105
ffffdd05`d95fec10 fffff801`09089128     : fffff801`07f72180 ffff800c`b3eca040 fffff801`08ff1530 48000000`c2e9d88a : nt!PspSystemThreadStartup+0x55
ffffdd05`d95fec60 00000000`00000000     : ffffdd05`d95ff000 ffffdd05`d95f9000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x28

Regards,
Mridul.

Hmmmmm… shouldn’t you be surfacing your RAID volume to NTFS in a way such that it doesn’t know anything about the underlying physical disks/partitions? So, when a physical disk/partition goes away, this is a private matter to your RAID driver, and has no effect at all on NTFS?

If that’s what you’re doing… the cause of your crash is “something else”…
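To make that concrete, here is a minimal user-mode sketch in C (not kernel code, and not the poster's driver) of a mirror set that absorbs a member being pulled and reconnected. All names here (mirror_t, mirror_write, mirror_reattach, the sizes) are hypothetical. The layer above only ever calls mirror_read and mirror_write, so a member going away or coming back never changes what it sees.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEMBERS    4   /* hypothetical 4-disk RAID1 set */
#define BLOCKS     8
#define BLOCK_SIZE 16

typedef struct {
    int     online;                      /* 0 while the member is detached */
    uint8_t data[BLOCKS][BLOCK_SIZE];
} member_t;

typedef struct {
    member_t member[MEMBERS];
    uint8_t  dirty[MEMBERS][BLOCKS];     /* blocks a member missed while offline */
} mirror_t;

void mirror_init(mirror_t *m)
{
    memset(m, 0, sizeof *m);
    for (unsigned i = 0; i < MEMBERS; i++)
        m->member[i].online = 1;
}

/* Write to every online member and remember which blocks the offline
   members missed. Returns 0 on success, -1 if nothing could be written. */
int mirror_write(mirror_t *m, unsigned block, const uint8_t *buf)
{
    int written = 0;
    if (block >= BLOCKS)
        return -1;
    for (unsigned i = 0; i < MEMBERS; i++) {
        if (m->member[i].online) {
            memcpy(m->member[i].data[block], buf, BLOCK_SIZE);
            m->dirty[i][block] = 0;
            written++;
        } else {
            m->dirty[i][block] = 1;
        }
    }
    return written > 0 ? 0 : -1;
}

/* Read from the first online member holding a current copy of the block. */
int mirror_read(const mirror_t *m, unsigned block, uint8_t *buf)
{
    if (block >= BLOCKS)
        return -1;
    for (unsigned i = 0; i < MEMBERS; i++) {
        if (m->member[i].online && !m->dirty[i][block]) {
            memcpy(buf, m->member[i].data[block], BLOCK_SIZE);
            return 0;
        }
    }
    return -1;
}

void mirror_detach(mirror_t *m, unsigned idx)
{
    if (idx < MEMBERS)
        m->member[idx].online = 0;
}

/* Bring a member back: copy every block it missed from a healthy member
   first, and only then mark it online so it can serve reads again. */
int mirror_reattach(mirror_t *m, unsigned idx)
{
    if (idx >= MEMBERS)
        return -1;
    for (unsigned b = 0; b < BLOCKS; b++) {
        int fixed = !m->dirty[idx][b];
        for (unsigned s = 0; s < MEMBERS && !fixed; s++) {
            if (s != idx && m->member[s].online && !m->dirty[s][b]) {
                memcpy(m->member[idx].data[b], m->member[s].data[b], BLOCK_SIZE);
                m->dirty[idx][b] = 0;
                fixed = 1;
            }
        }
        if (!fixed)
            return -1;               /* no good copy; leave the member offline */
    }
    m->member[idx].online = 1;
    return 0;
}
```

In a real driver the same separation would mean the volume device object stays put and keeps completing I/O while members come and go underneath it.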

ETA: the fault in the crash above is in CcGetPartition.

Peter

The call in NtfsFspCloseInternal is inside a try/except block. CcGetPartition raises an exception (KiGeneralProtectionFault), and execution continues in the NtfsFspCloseInternal exception filter (NtfsFspCloseInternal$filt$0), but that filter then hits an invalid pointer (KiPageFault).

The faulting instruction is a push rbp, so I’m guessing the stack pointer is invalid?

Are you sure you’ve set up your RAID system correctly?
Are you correctly loading your driver at boot?
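
One more data point from the dump: Arg1 and Arg4 are the same address, and Arg3 is 8. Microsoft's bug check 0xD1 reference lists 8 (and 2) as an execute access, and notes that when Arg1 equals Arg4 on an execute, the code being run was likely paged out. That would suggest the filter's code page was not resident at IRQL 2 rather than a bad stack pointer, though this is only a reading of the arguments, not a confirmed diagnosis. A small sketch of that check in C, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Access-type codes for Arg3 of bug check 0xD1, per Microsoft's bug check
   reference: 0 = read, 1 = write, 2 and 8 = execute. */
const char *d1_access_type(uint64_t arg3)
{
    switch (arg3) {
    case 0:  return "read";
    case 1:  return "write";
    case 2:
    case 8:  return "execute";
    default: return "unknown";
    }
}

/* Returns 1 when the arguments look like an instruction fetch from the
   faulting address itself: Arg1 (memory referenced) equals Arg4 (address
   which referenced memory) and Arg3 indicates an execute access. */
int d1_is_code_fetch_fault(uint64_t arg1, uint64_t arg3, uint64_t arg4)
{
    return arg1 == arg4 && strcmp(d1_access_type(arg3), "execute") == 0;
}

/* Usage with the values from the dump in this thread. */
void d1_check_this_dump(void)
{
    uint64_t addr = 0xfffff8010b1d4210ULL;   /* Arg1 and Arg4 */
    assert(d1_is_code_fetch_fault(addr, 8, addr));
}
```

The first fault in CcGetPartition is still the thing to chase; this only helps explain why the second fault turned it into a D1.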

The first obvious question is: why are you doing this at all? There has been an in-box software RAID solution from MSFT since Windows 2000, 20 years ago. There are many others as well, including hybrid software/hardware solutions. This seems like a problem that has been well solved.

In any event, it surely has nothing to do with NTFS. By design, whatever sort of disk object you expose, you cannot assume that it will even be formatted as NTFS. It could be FAT or ReFS or even something more exotic like CLFS. So if you have a crash after removing one of your drives, that is almost certainly your bug.

Thank you for your replies. I’m sorry I am responding so late.

I am working on a cross-platform RAID solution. I will check how I am presenting my RAID virtual drive to Windows.

Regards,
Mridul.

Cross-platform RAID that isn’t hardware or SAN based?