clusdisk problem.

Our driver uses a volume and a fs filter. When our driver is installed we
cannot do a fail-over on certain hardware. The volume that is trying to
fail-over is not mounting. As part of our mount handling we try to read
from the volume. Reading returns a device not ready error. We found that
our disk filter was not clearing the DO_DEVICE_INITIALIZING flag in its
device object. After clearing the flag we now get a BSOD. We’ve seen this
on Windows 2000 Advanced server SP3 and SP4. The BSOD is when clusdisk
calls IoReleaseRemoveLock and IoReleaseRemoveLockEx is calling KeSetEvent
which crashes when it tries to access the wait list in the event of the
IO_REMOVE_LOCK. The address is always 0x00040016 and is, of course,
invalid. The interesting thing is that the structure looks like it is
shifted off by 8 bytes: there are two identical addresses that point to the
address where they start like and empty LIST_HEAD. I removed our driver and
put a break point in the clusdisk code where it calls IoReleaseRemoveLockEx
and that break point is never hit without our driver installed.

Has anybody seen this before or have any ideas?

Thanks,

Jonathan

This is the analysis:

1: kd> !analyze -v

****************************************************************************
***

* *

* Bugcheck Analysis *

* *

****************************************************************************
***

IRQL_NOT_LESS_OR_EQUAL (a)

An attempt was made to access a pageable (or completely invalid) address at
an

interrupt request level (IRQL) that is too high. This is usually

caused by drivers using improper addresses.

If a kernel debugger is available get the stack backtrace.

Arguments:

Arg1: 00040016, memory referenced

Arg2: 0000001c, IRQL

Arg3: 00000000, value 0 = read operation, 1 = write operation

Arg4: 80432de6, address which referenced memory

Debugging Details:


READ_ADDRESS: 00040016

CURRENT_IRQL: 1c

FAULTING_IP:

nt!KiWaitTest+26

80432de6 66837f1601 cmp word ptr [edi+0x16],0x1

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0xA

TRAP_FRAME: be238aa8 – (.trap ffffffffbe238aa8)

ErrCode = 00000000

eax=00000001 ebx=80420200 ecx=884f8b3c edx=00000000 esi=884f8b34
edi=00040000

eip=80432de6 esp=be238b1c ebp=be238b30 iopl=0 nv up ei pl nz ac pe cy

cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010213

nt!KiWaitTest+0x26:

80432de6 66837f1601 cmp word ptr [edi+0x16],0x1 ds:0023:00040016=???

Resetting default scope

LAST_CONTROL_TRANSFER: from 8042b78b to 80432de6

STACK_TEXT:

be238b30 8042b78b 8850f388 884f8b34 8042020a nt!KiWaitTest+0x26

be238b44 804202e1 884f8b34 00000000 00000000 nt!KeSetEvent+0x57

be238b6c bf1c681f 884f8b2c 87ed9508 884f8b30 nt!IoReleaseRemoveLockEx+0xd7

be238bbc bf1c5a97 8850f2d0 884f8b2c 8850f2d0
clusdisk!ClusDiskRootDeviceControl+0x3a9

be238c0c 8041de41 8850f2d0 87ed9508 87ed9508
clusdisk!ClusDiskDeviceControl+0xa9

be238c20 804b06b8 87e824fc 00000000 87ed9508 nt!IopfCallDriver+0x35

be238c34 804b14d9 8850f2d0 87ed9508 87e7ef88
nt!IopSynchronousServiceTail+0x60

be238d00 804a93dc 000001fc 00000000 00000000 nt!IopXxxControlFile+0x5ab

be238d34 80466389 000001fc 00000000 00000000 nt!NtDeviceIoControlFile+0x28

be238d34 77f950df 000001fc 00000000 00000000 nt!KiSystemService+0xc9

0097f93c 7c5794c7 000001fc 00000000 00000000 ntdll!ZwDeviceIoControlFile+0xb

0097f9a0 738dca12 000001fc 0004d458 0097f9d0 KERNEL32!DeviceIoControl+0xf8

0097f9e4 738dcb33 00088940 000001fc 00000007 clusres!DoArbEscape+0x57

0097fa08 738dc8d1 00088940 000001fc 00000000 clusres!VerifySectorSize+0x26

0097fa4c 738dab85 00088940 000001fc 77f89103 clusres!DiskArbitration+0x75

0097fa64 0100a060 00000001 0100654f 7c57ed3d clusres!DisksArbitrate+0x6f

0097fa74 01008f80 00089f10 0100654f 0097fa98 resrcmon!Resmon_Arbitrate+0x15

0097fa94 77d77fb0 00000000 02020202 00000001
resrcmon!s_RmArbitrateResource+0x84

0097faac 77d95ad7 01008efc 0097fac0 00000001 RPCRT4!Invoke+0x30

0097fd0c 77d9547e 00000000 00000000 0007e4b0 RPCRT4!NdrStubCall2+0x655

0097fd28 77d52156 0007e4b0 0007c568 0007e4b0 RPCRT4!NdrServerCall2+0x17

0097fd60 77d37ee1 0100ab1e 0007e4b0 0097fe0c RPCRT4!DispatchToStubInC+0x32

0097fdb8 77d37db5 00000006 00000000 0097fe0c
RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0x100

0097fdd8 77d58bea 0007e4b0 00000000 0097fe0c
RPCRT4!RPC_INTERFACE::DispatchToStub+0x5e

0097fe10 77d5717a 0007c6d0 0007b150 80020000
RPCRT4!LRPC_SCALL::DealWithRequestMessage+0x1dd

0097fe28 77d57689 0007e3b8 0097fe50 0007c6d0
RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0x10c

0097ff74 77d56d9e 77d39a00 0007b150 00000000
RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0x229

0097ff78 77d39a00 0007b150 00000000 00000000
RPCRT4!RecvLotsaCallsWrapper+0x9

0097ffa8 77d41c6c 0007ce58 0097ffec 7c57b382
RPCRT4!BaseCachedThreadRoutine+0x4f

0097ffb4 7c57b382 0007e520 00000000 00000000 RPCRT4!ThreadStartRoutine+0x18

0097ffec 00000000 77d41c54 0007e520 00000000 KERNEL32!BaseThreadStart+0x52

FOLLOWUP_IP:

clusdisk!ClusDiskRootDeviceControl+3a9

bf1c681f eb10 jmp clusdisk!ClusDiskRootDeviceControl+0x3bb (bf1c6831)

FOLLOWUP_NAME: MachineOwner

SYMBOL_NAME: clusdisk!ClusDiskRootDeviceControl+3a9

MODULE_NAME: clusdisk

IMAGE_NAME: clusdisk.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 3ef26059

STACK_COMMAND: .trap ffffffffbe238aa8 ; kb

BUCKET_ID: 0xA_clusdisk!ClusDiskRootDeviceControl+3a9

Followup: MachineOwner


1: kd> dd 884f8b2c // This is the address of the IO_REMOVE_LOCK

884f8b2c 00000000 00000000 00000000 00000001

884f8b3c 00040000 00000001 884f8b44 884f8b44

884f8b4c 00000000 00000000 00000000 00000000

884f8b5c 00000000 00000000 00000000 00000000

884f8b6c 00000000 00000000 00000000 00000000

884f8b7c 00000000 884f8bc0 bfdc23fe 00000000

884f8b8c 00000000 bfdc2428 8041605c bfdc2a7a

884f8b9c bfdc2b26 00000000 bfdc7f00 00000010

// This is the LIST_HEAD looking structure. Notice that this is expected to
be 16 bytes into the IO_REMOVE_LOCK structure but is really 24 bytes in.
Also notice that the size value of DWORDS could be in the 0x00040000 DWORD
( 4 bytes) which is also 8 bytes too far in and it would seem that a value
of 1 is expected for SignalState. I don’t know much about the internals of
events though.

1: kd> dd 884f8b44

884f8b44 884f8b44 884f8b44 00000000 00000000

884f8b54 00000000 00000000 00000000 00000000

884f8b64 00000000 00000000 00000000 00000000

884f8b74 00000000 00000000 00000000 884f8bc0

884f8b84 bfdc23fe 00000000 00000000 bfdc2428

884f8b94 8041605c bfdc2a7a bfdc2b26 00000000

884f8ba4 bfdc7f00 00000010 00000000 00000000

884f8bb4 00000000 00000000 00000000 00000000

I made a few corrections:

“Jonathan Ludwig” wrote in message
news:xxxxx@ntdev…
> Our driver uses a volume and a fs filter. When our driver is installed we
> cannot do a fail-over on certain hardware. The volume that is trying to
> fail-over is not mounting. As part of our mount handling we try to read
> from the volume. Reading returns a device not ready error. We found that
> our disk filter was not clearing the DO_DEVICE_INITIALIZING flag in its
> device object. After clearing the flag we now get a BSOD. We’ve seen
this
> on Windows 2000 Advanced server SP3 and SP4. The BSOD is when clusdisk
> calls IoReleaseRemoveLock and IoReleaseRemoveLockEx is calling KeSetEvent
> which crashes when it tries to access the wait list in the event of the
> IO_REMOVE_LOCK. The address is always 0x00040016 and is, of course,
> invalid. The interesting thing is that the structure looks like it is
> shifted off by 8 bytes: there are two identical addresses that point to
the
> address where they start like and empty LIST_HEAD. I removed our driver
and
> put a break point in the clusdisk code where it calls
IoReleaseRemoveLockEx
> and that break point is never hit without our driver installed.
>
> Has anybody seen this before or have any ideas?
>
> Thanks,
>
> Jonathan
>
> This is the analysis:
>
> 1: kd> !analyze -v
>
>
*************************************************************************
>

>
> * *
>
> * Bugcheck Analysis *
>
> * *
>
>
*************************************************************************
>

>
> IRQL_NOT_LESS_OR_EQUAL (a)
>
> An attempt was made to access a pageable (or completely invalid) address
at
> an
>
> interrupt request level (IRQL) that is too high. This is usually
>
> caused by drivers using improper addresses.
>
> If a kernel debugger is available get the stack backtrace.
>
> Arguments:
>
> Arg1: 00040016, memory referenced
>
> Arg2: 0000001c, IRQL
>
> Arg3: 00000000, value 0 = read operation, 1 = write operation
>
> Arg4: 80432de6, address which referenced memory
>
> Debugging Details:
>
> ------------------
>
>
>
> READ_ADDRESS: 00040016
>
> CURRENT_IRQL: 1c
>
> FAULTING_IP:
>
> nt!KiWaitTest+26
>
> 80432de6 66837f1601 cmp word ptr [edi+0x16],0x1
>
> DEFAULT_BUCKET_ID: DRIVER_FAULT
>
> BUGCHECK_STR: 0xA
>
> TRAP_FRAME: be238aa8 – (.trap ffffffffbe238aa8)
>
> ErrCode = 00000000
>
> eax=00000001 ebx=80420200 ecx=884f8b3c edx=00000000 esi=884f8b34
> edi=00040000
>
> eip=80432de6 esp=be238b1c ebp=be238b30 iopl=0 nv up ei pl nz ac pe cy
>
> cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010213
>
> nt!KiWaitTest+0x26:
>
> 80432de6 66837f1601 cmp word ptr [edi+0x16],0x1 ds:0023:00040016=???
>
> Resetting default scope
>
> LAST_CONTROL_TRANSFER: from 8042b78b to 80432de6
>
> STACK_TEXT:
>
> be238b30 8042b78b 8850f388 884f8b34 8042020a nt!KiWaitTest+0x26
>
> be238b44 804202e1 884f8b34 00000000 00000000 nt!KeSetEvent+0x57
>
> be238b6c bf1c681f 884f8b2c 87ed9508 884f8b30 nt!IoReleaseRemoveLockEx+0xd7
>
> be238bbc bf1c5a97 8850f2d0 884f8b2c 8850f2d0
> clusdisk!ClusDiskRootDeviceControl+0x3a9
>
> be238c0c 8041de41 8850f2d0 87ed9508 87ed9508
> clusdisk!ClusDiskDeviceControl+0xa9
>
> be238c20 804b06b8 87e824fc 00000000 87ed9508 nt!IopfCallDriver+0x35
>
> be238c34 804b14d9 8850f2d0 87ed9508 87e7ef88
> nt!IopSynchronousServiceTail+0x60
>
> be238d00 804a93dc 000001fc 00000000 00000000 nt!IopXxxControlFile+0x5ab
>
> be238d34 80466389 000001fc 00000000 00000000 nt!NtDeviceIoControlFile+0x28
>
> be238d34 77f950df 000001fc 00000000 00000000 nt!KiSystemService+0xc9
>
> 0097f93c 7c5794c7 000001fc 00000000 00000000
ntdll!ZwDeviceIoControlFile+0xb
>
> 0097f9a0 738dca12 000001fc 0004d458 0097f9d0 KERNEL32!DeviceIoControl+0xf8
>
> 0097f9e4 738dcb33 00088940 000001fc 00000007 clusres!DoArbEscape+0x57
>
> 0097fa08 738dc8d1 00088940 000001fc 00000000 clusres!VerifySectorSize+0x26
>
> 0097fa4c 738dab85 00088940 000001fc 77f89103 clusres!DiskArbitration+0x75
>
> 0097fa64 0100a060 00000001 0100654f 7c57ed3d clusres!DisksArbitrate+0x6f
>
> 0097fa74 01008f80 00089f10 0100654f 0097fa98
resrcmon!Resmon_Arbitrate+0x15
>
> 0097fa94 77d77fb0 00000000 02020202 00000001
> resrcmon!s_RmArbitrateResource+0x84
>
> 0097faac 77d95ad7 01008efc 0097fac0 00000001 RPCRT4!Invoke+0x30
>
> 0097fd0c 77d9547e 00000000 00000000 0007e4b0 RPCRT4!NdrStubCall2+0x655
>
> 0097fd28 77d52156 0007e4b0 0007c568 0007e4b0 RPCRT4!NdrServerCall2+0x17
>
> 0097fd60 77d37ee1 0100ab1e 0007e4b0 0097fe0c RPCRT4!DispatchToStubInC+0x32
>
> 0097fdb8 77d37db5 00000006 00000000 0097fe0c
> RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0x100
>
> 0097fdd8 77d58bea 0007e4b0 00000000 0097fe0c
> RPCRT4!RPC_INTERFACE::DispatchToStub+0x5e
>
> 0097fe10 77d5717a 0007c6d0 0007b150 80020000
> RPCRT4!LRPC_SCALL::DealWithRequestMessage+0x1dd
>
> 0097fe28 77d57689 0007e3b8 0097fe50 0007c6d0
> RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0x10c
>
> 0097ff74 77d56d9e 77d39a00 0007b150 00000000
> RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0x229
>
> 0097ff78 77d39a00 0007b150 00000000 00000000
> RPCRT4!RecvLotsaCallsWrapper+0x9
>
> 0097ffa8 77d41c6c 0007ce58 0097ffec 7c57b382
> RPCRT4!BaseCachedThreadRoutine+0x4f
>
> 0097ffb4 7c57b382 0007e520 00000000 00000000
RPCRT4!ThreadStartRoutine+0x18
>
> 0097ffec 00000000 77d41c54 0007e520 00000000 KERNEL32!BaseThreadStart+0x52
>
>
>
> FOLLOWUP_IP:
>
> clusdisk!ClusDiskRootDeviceControl+3a9
>
> bf1c681f eb10 jmp clusdisk!ClusDiskRootDeviceControl+0x3bb (bf1c6831)
>
> FOLLOWUP_NAME: MachineOwner
>
> SYMBOL_NAME: clusdisk!ClusDiskRootDeviceControl+3a9
>
> MODULE_NAME: clusdisk
>
> IMAGE_NAME: clusdisk.sys
>
> DEBUG_FLR_IMAGE_TIMESTAMP: 3ef26059
>
> STACK_COMMAND: .trap ffffffffbe238aa8 ; kb
>
> BUCKET_ID: 0xA_clusdisk!ClusDiskRootDeviceControl+3a9
>
> Followup: MachineOwner
>
> ---------
>
> 1: kd> dd 884f8b2c // This is the address of the expected
IO_REMOVE_LOCK structure
>
> 884f8b2c 00000000 00000000 00000000 00000001
>
> 884f8b3c 00040000 00000001 884f8b44 884f8b44
>
> 884f8b4c 00000000 00000000 00000000 00000000
>
> 884f8b5c 00000000 00000000 00000000 00000000
>
> 884f8b6c 00000000 00000000 00000000 00000000
>
> 884f8b7c 00000000 884f8bc0 bfdc23fe 00000000
>
> 884f8b8c 00000000 bfdc2428 8041605c bfdc2a7a
>
> 884f8b9c bfdc2b26 00000000 bfdc7f00 00000010
>
> // This is the LIST_HEAD looking structure. Notice that this is expected
to
> be 16 bytes into the IO_REMOVE_LOCK structure but is really 24 bytes in.
> Also notice in the possible IO_REMOVE_LOCK structure the KEVENT structure
is expected to be 8 bytes into the lock structure and that the size value of
DWORDS could be in the 0x00040000 DWORD
> (4 the size of a KEVENT) which is also 8 bytes too far in and it would
seem that a value
> of 1 is expected for SignalState. I don’t know much about the internals
of
> events though.
>
> 1: kd> dd 884f8b44
>
> 884f8b44 884f8b44 884f8b44 00000000 00000000
>
> 884f8b54 00000000 00000000 00000000 00000000
>
> 884f8b64 00000000 00000000 00000000 00000000
>
> 884f8b74 00000000 00000000 00000000 884f8bc0
>
> 884f8b84 bfdc23fe 00000000 00000000 bfdc2428
>
> 884f8b94 8041605c bfdc2a7a bfdc2b26 00000000
>
> 884f8ba4 bfdc7f00 00000010 00000000 00000000
>
> 884f8bb4 00000000 00000000 00000000 00000000
>
>
>