BSOD debugging help needed

Hi,
I am asking this group for help in chasing down a very elusive BSOD.

First, some details:
1. My software consists of an upper volume filter driver plus a service application.
2. The BSOD occurs only when the test machines run win2k3 SP1 with an MSCS cluster installed.
3. My software is cluster-aware and has associated resources that move from node to node.
4. This issue never happens when I install Veritas VVM and use VVM's disk resource as the disk cluster resource for MSCS. It only happens without VVM, when the clustered disks are of the Physical Disk resource type instead of VVM's disk group resource type.
5. My daemon issues a lot of reads to volumes that reside on clustered disks. Due to unavoidable race conditions, the daemon may issue a read I/O even after a volume has gone offline. This is usually not a problem, since the daemon simply receives an error and knows how to handle it.
6. The BSOD happens while the disks are going offline (that is, while they are being switched to the second node).
7. Although read I/Os pass through my filter, my filter does not inspect them; it just passes them down the stack.
8. This BSOD never happens when the machines are not in a cluster configuration.

The first BSOD I got was LOCKED_PAGES_TRACKER_CORRUPTION, which indicated that the same MDL was inserted twice into the same process list (that of my daemon). After I was told to use Driver Verifier to pinpoint the problem, I enabled Driver Verifier on all the drivers on my machine with the following settings:
Special pool
Special irql
All pool allocations checked on unload
Io subsystem checking enabled
Deadlock detection enabled
DMA checking enabled

After a 20-minute run, I got a verifier assertion. I was very happy, until I saw that the assertion pinpointed ntfs and not my driver.
Now, I know that some of you are going to say that since it happens with my driver, my driver is at fault. My answer is that this is true 99% of the time; in the remaining 1%, my driver may change the I/O timing in a way that exposes a problem that would never occur without it.
Moreover, if my driver really were at fault, Driver Verifier should have caught it instead of ntfs.
Since the assertion also shows an IRP, I added debug statements to see whether that IRP ever reached my driver, but I never saw it.

Attached below are the assertion details.
I would be very glad to receive any help, as I am starting to become desperate.

Thanks,
Eran.

***********************************************************************
* THIS VALIDATION BUG IS FATAL AND WILL CAUSE THE VERIFIER TO HALT *
* WINDOWS (BUGCHECK) WHEN THE MACHINE IS NOT UNDER A KERNEL DEBUGGER! *
***********************************************************************

WDM DRIVER ERROR: [Ntfs.sys @ 0xF721A7FD] An IRP dispatch handler has returned without passing down or completing this Irp or someone forgot to return STATUS_PENDING. (Irp = 86E49008 ).
IRP_MJ_READ
[ DevObj=86EC8480, FileObject=86E7B1C0, Parameters=00800000 00000000 09000000 00000000 ]
http://www.microsoft.com/hwdq/bc/default.asp?os=5.2.3790&major=0xc9&minor=0x226&lang=0x9
Break, Ignore, Zap, Remove, Disable all (bizrd)? break
b
Breaking in… (press g to return to assert menu)
Break instruction exception - code 80000003 (first chance)
nt!DbgBreakPoint:
80833f89 cc int 3
3: kd> !analyze -v
Connected to Windows Server 2003 3790 x86 compatible target, ptr64 FALSE
Loading Kernel Symbols

Loading unloaded module list

Loading User Symbols

Unknown bugcheck code (0)
Unknown bugcheck description
Arguments:
Arg1: 00000000
Arg2: 00000000
Arg3: 00000000
Arg4: 00000000

Debugging Details:
------------------

PROCESS_NAME: trcuser.exe

FAULTING_IP:
nt!DbgBreakPoint+0
80833f89 cc int 3

EXCEPTION_RECORD: ffffffff -- (.exr ffffffffffffffff)
ExceptionAddress: 80833f89 (nt!DbgBreakPoint)
ExceptionCode: 80000003 (Break instruction exception)
ExceptionFlags: 00000000
NumberParameters: 3
Parameter[0]: 00000000
Parameter[1]: 8705d020
Parameter[2]: 00000039

ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION} Breakpoint A breakpoint has been reached.

DEFAULT_BUCKET_ID: DRIVER_FAULT

BUGCHECK_STR: 0x0

CURRENT_IRQL: 2

LAST_CONTROL_TRANSFER: from 809df3f9 to 80833f89

STACK_TEXT:
b9a1980c 809df3f9 00000000 b9a19af4 b9a198c7 nt!DbgBreakPoint
b9a19828 809df5cd b9a19b08 80a14119 00040000 nt!ViBugcheckPrompt+0xf2
b9a19acc 809df6a8 80a13248 00000226 b9a19af8 nt!VfBugcheckThrowException+0xb2
b9a19bbc 809e8330 00000226 00000041 f721a7fd nt!VfBugcheckThrowIoException+0xb5
b9a19bf0 809d458d 000bbeb8 86ec8480 86ec8480 nt!IovpCallDriver2+0x31a
b9a19c1c 80859657 f72d4c53 b9a19c50 f72d4c53 nt!IovCallDriver+0x122
b9a19c28 f72d4c53 86ec8480 80a78be4 86c78020 nt!IofCallDriver+0x13
b9a19c50 809d457d 86ec8480 86e49008 86e49008 fltmgr!FltpDispatch+0x6f
b9a19c80 80859657 8092d3b9 b9a19ca0 8092d3b9 nt!IovCallDriver+0x112
b9a19c8c 8092d3b9 86e491e0 86e49008 86e7b1c0 nt!IofCallDriver+0x13
b9a19ca0 8092d5c3 86ec8480 86e49008 86e7b1c0 nt!IopSynchronousServiceTail+0x10b
b9a19d38 80834d3f 000004ac 00000000 00000000 nt!NtReadFile+0x5cf
b9a19d38 7c82ed54 000004ac 00000000 00000000 nt!KiFastCallEntry+0xfc
WARNING: Stack unwind information not available. Following frames may be wrong.
0250f234 004829ef 000004ac 0417e000 00800000 ntdll!KiFastSystemCallRet
0250f350 00482547 015583e8 01010000 0250f4e8 trcuser!vols_ReadWriteSendIO_CK+0x46f [d:\projects\frea\alfa1\vols_io_win2k.c @ 875]
0250f3ac 00482106 015583e8 0040109b 0250f548 trcuser!vols_ReadWriteSendIO+0x47 [d:\projects\frea\alfa1\vols_io_win2k.c @ 726]
0250f4e8 0048169c 015583e8 01010000 0250f718 trcuser!vols_ReadWriteDoIO_CK+0x646 [d:\projects\frea\alfa1\vols_io_win2k.c @ 571]
0250f548 00480907 015583e8 00b149a8 00000000 trcuser!vols_ReadWriteDoIO+0xcc [d:\projects\frea\alfa1\vols_io.c @ 1731]
0250f718 004802c9 015583e8 01010000 0250f824 trcuser!vols_ReadWriteExtent_CK+0x5e7 [d:\projects\frea\alfa1\vols_io.c @ 1158]
0250f790 0048002a 015583e8 00b149a8 00000000 trcuser!vols_ReadWriteExtent+0xd9 [d:\projects\frea\alfa1\vols_io.c @ 937]
0250f824 0050523e 015583e8 00b149a8 00000000 trcuser!vols_ReadExtent+0x7a [d:\projects\frea\alfa1\vols_io.c @ 525]
0250f988 00504c77 015583e8 01010000 0250faf8 trcuser!vrrQReadAhead_ReadData_CK+0x40e [d:\projects\frea\alfa1\vrrq_read_ahead.c @ 808]
0250f9e4 00503fb8 015583e8 00503bd0 0250fb60 trcuser!vrrQReadAhead_GetNextReadyDataEntry+0x247 [d:\projects\frea\alfa1\vrrq_read_ahead.c @ 263]
0250faf8 00503b81 015583e8 01010000 0250fbec trcuser!vrrQReadAhead_Task_CK+0x268 [d:\projects\frea\alfa1\vrrq_read_ahead.c @ 450]
0250fb60 0043f086 015583e8 0250ffa0 0250fbf8 trcuser!vrrQPhysical_Dump+0x1c1 [d:\projects\frea\alfa1\vrrq_physical.c @ 2101]
0250fbec 00440b96 00000005 00000000 00000000 trcuser!ossTAJThread_MainLoop+0x156 [D:\projects\Frea\alfa1\oss_taj_thread.c @ 508]
0250ffb8 77e66063 005d32c8 00000000 00000000 trcuser!ossThrd_OSThreadEntry+0xb6 [d:\projects\frea\alfa1\oss_thrd.c @ 1090]
0250ffec 00000000 00440ae0 005d32c8 00000000 kernel32!BaseThreadStart+0x34

FOLLOWUP_IP:
fltmgr!FltpDispatch+6f
f72d4c53 e9df000000 jmp fltmgr!FltpDispatch+0x153 (f72d4d37)

SYMBOL_STACK_INDEX: 7

FOLLOWUP_NAME: MachineOwner

SYMBOL_NAME: fltmgr!FltpDispatch+6f

MODULE_NAME: fltmgr

IMAGE_NAME: fltmgr.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 42435ba1

STACK_COMMAND: kb

FAILURE_BUCKET_ID: 0x0_fltmgr!FltpDispatch+6f

BUCKET_ID: 0x0_fltmgr!FltpDispatch+6f

Followup: MachineOwner
---------

3: kd> !verifier 15
Verify Level bb … enabled options are:
Special pool
Special irql
All pool allocations checked on unload
Io subsystem checking enabled
Deadlock detection enabled
DMA checking enabled

Summary of All Verifier Statistics

RaiseIrqls 0x4f19e
AcquireSpinLocks 0x7f6ebb
Synch Executions 0x75845
Trims 0xc616b

Pool Allocations Attempted 0x154917
Pool Allocations Succeeded 0x154917
Pool Allocations Succeeded SpecialPool 0xc995
Pool Allocations With NO TAG 0x0
Pool Allocations Failed 0x0
Resource Allocations Failed Deliberately 0x0

Current paged pool allocations 0x2472 for 00493540 bytes
Peak paged pool allocations 0x2604 for 004B2DE8 bytes
Current nonpaged pool allocations 0x3b7f for 0233B864 bytes
Peak nonpaged pool allocations 0x3c38 for 02364564 bytes

Driver Verification List

Entry State NonPagedPool PagedPool Module

8a3ddd90 Loaded 000004c8 00000000 hal.dll
8a3dd370 Loaded 00000000 00000000 kdcom.dll
8a3dd2f8 Loaded 00000000 00000000 BOOTVID.dll
8a3dd288 Loaded 0002a04c 0000294c ACPI.sys
8a3dd210 Loaded 00000000 00000000 WMILIB.SYS
8a3dd1a0 Loaded 00002ac0 00021648 pci.sys
8a3dd128 Loaded 0000004c 000000a8 isapnp.sys
8a3dd0b0 Loaded 00000000 00000000 pciide.sys
8a3dd038 Loaded 0000006c 00000000 PCIIDEX.SYS
8a3c8008 Loaded 00005358 0000491c MountMgr.sys
8a3c8f90 Loaded 0000c930 00000d78 ftdisk.sys
8a3c8f18 Loaded 00000000 00000000 dmload.sys
8a3c8ea8 Loaded 0010d298 00000264 dmio.sys
8a3c8e30 Loaded 0007f31c 00000078 volsnap.sys
8a3c8db8 Loaded 00003a40 0000047c PartMgr.sys
8a3c8d40 Loaded 0000310c 00000000 atapi.sys
8a3c8cc8 Loaded 00000000 00000000 lp6nds35.sys
8a3c8c50 Loaded 003cb63c 000002cc SCSIPORT.SYS
8a3c8bd8 Loaded 00000000 00000000 nfrd960.sys
8a3c8b60 Loaded 00004a54 00000024 storport.sys
8a3c8ae8 Loaded 00000000 00000000 lpxnds.sys
8a3c8a78 Loaded 000000f0 00000088 disk.sys
8a3c8a00 Loaded 0001a598 000000e0 CLASSPNP.SYS
8a3c8988 Loaded 0000dff8 00000010 fltmgr.sys
8a3c8918 Loaded 00002148 00000930 Dfs.sys
8a3c88a0 Loaded 00001378 00000018 KSecDD.sys
8a3c8830 Loaded 000d3178 0024c840 Ntfs.sys
8a3c87c0 Loaded 000b95f8 00000378 NDIS.sys
8a3c8740 Loaded 0146dfe0 00000000 TDPS_R2_2.sys
8a3c86d0 Loaded 00000190 00004cb8 Mup.sys
8a3c8658 Loaded 00202000 00000000 lpxftr.sys
8a3c85e0 Loaded 00000000 00000000 crcdisk.sys
886118b0 Loaded 00000080 00000138 intelppm.sys
885c3400 Loaded 000001b0 00000000 watchdog.sys
885c4d08 Loaded 00000178 00001a10 VIDEOPRT.SYS
885c3088 Loaded 00000000 00000000 ati2mpad.sys
88563f90 Loaded 00000f60 00000000 i8042prt.sys
885412d0 Loaded 000009e8 00000000 kbdclass.sys
88541258 Loaded 00001348 00000000 mouclass.sys
88551ed8 Loaded 00000028 00000000 fdc.sys
8854af90 Loaded 00000060 00000030 cdrom.sys
88549f98 Loaded 000000d0 000001f0 ks.sys
885519a8 Loaded 000000d0 00000000 redbook.sys
8853cb20 Loaded 0000a838 00000028 USBPORT.SYS
8854e3e0 Loaded 00000000 00000000 usbohci.sys
88603880 Loaded 00000000 00000000 e1000325.sys
8861f8e8 Loaded 00000000 00000000 audstub.sys
885cff90 Loaded 00000000 00000000 rasl2tp.sys
88663d40 Loaded 000000a0 00000000 ndistapi.sys
8863f3f8 Loaded 000012a0 00000000 ndiswan.sys
8850f280 Loaded 00000000 00000000 raspppoe.sys
88506b90 Loaded 00001788 00000000 TDI.SYS
88572658 Loaded 00000000 00000000 raspptp.sys
88503a18 Loaded 00000000 00000120 ptilink.sys
8855c7e8 Loaded 00000000 00000000 raspti.sys
8856f960 Loaded 00000400 00000000 rdpdr.sys
886152a0 Loaded 000025b0 00000000 termdd.sys
8852c268 Loaded 00000000 00000000 swenum.sys
8856f588 Loaded 00000000 00000000 update.sys
885c47d0 Loaded 0000a0c8 00000078 mssmbios.sys
884a3f90 Loaded 00001ab0 00000000 NDProxy.SYS
884cfcd0 Loaded 00000000 00000000 dmboot.sys
88608130 Loaded 00000020 00000028 flpydisk.sys
884dc8b8 Loaded 00000000 00000000 USBD.SYS
8830c008 Loaded 000001c0 00000098 usbhub.sys
8840eca8 Unloaded 00000000 00000000 Sfloppy.SYS
8830e008 Loaded 00029d98 00000000 ClusDisk.sys
884ce388 Loaded 00000018 00000000 Fs_Rec.SYS
88476d20 Loaded 00000000 00000000 Null.SYS
88606f98 Loaded 00000000 00000000 Beep.SYS
88478008 Loaded 00000000 00000000 vga.sys
88662350 Loaded 00000000 00000000 mnmdd.SYS
8866aaa8 Loaded 00000000 00000000 RDPCDD.sys
88667570 Loaded 00000250 00000428 Msfs.SYS
8832e008 Loaded 000021f0 00004040 Npfs.SYS
8854d458 Loaded 00000430 00000000 rasacd.sys
884664a0 Loaded 000113e8 00000000 ipsec.sys
88662f90 Loaded 00000818 00000000 msgpc.sys
88333008 Loaded 001d8fc0 00000000 tcpip.sys
8831a228 Loaded 00013620 00000000 netbt.sys
884a1e10 Loaded 000003f0 00000000 wanarp.sys
886331b8 Loaded 000266d0 00002600 afd.sys
88566760 Loaded 000009e8 00000000 netbios.sys
884d92c8 Unloaded 00000000 00000000 serial.sys
88549418 Loaded 000024a0 000005f0 rdbss.sys
8834b0d8 Loaded 00000930 000004b8 mrxsmb.sys
88342908 Unloaded 00000000 00000000 imapi.sys
8842eb60 Loaded 00000520 00000000 Fips.SYS
88480738 Loaded 00001080 00003148 Fastfat.SYS
88456f80 Loaded 00000000 00000000 dump_storport.sys
88540f88 Loaded 00000000 00000000 dump_nfrd960.sys
88672f20 Loaded 00000000 00000000 Dxapi.sys
886737f0 Loaded 00002900 001ffa60 win32k.sys
88535240 Loaded 00000000 00000000 dxgthk.sys
88468df8 Loaded 00000000 00000000 dxg.sys
8857dc28 Loaded&Unloaded 00000000 00000000 ati2drad.dll
88481410 Unloaded 00000000 00000000 vga.dll
88343cc8 Loaded 00000000 00000000 ndisuio.sys
88433b80 Loaded 0000d138 00000000 clusnet.sys
876e86a0 Loaded 002ec970 000077c8 srv.sys
870ce618 Loaded 00000000 00000000 TDTCP.SYS
883f15c0 Loaded 00000578 00003890 RDPWD.SYS
-----------------------------------------------
Fault injection trace log
-----------------------------------------------
No fault injection traces found.
-----------------------------------------------
Verifier triage
-----------------------------------------------
Incorrect symbols (failed to get address of `nt!VerifierTriageActionTaken' variable.

3: kd> !irp 86E49008
Irp is active with 11 stacks 11 is current (= 0x86e491e0)
No Mdl Thread 8705d020: Irp stack trace.
cmd flg cl Device File Completion-Context
[0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000
[0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000
[0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000
[0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000
[0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000
[0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000
[0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000
[0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000
[0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000
[0, 0] 0 10 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000
>[3, 0] 0 0 86c78020 86e7b1c0 00000000-00000000
\FileSystem\Ntfs
Args: 00800000 00000000 09000000 00000000
3: kd> dt _IRP 86E49008
+0x000 Type : 6
+0x002 Size : 0x1fc
+0x004 MdlAddress : (null)
+0x008 Flags : 0x40000901
+0x00c AssociatedIrp : __unnamed
+0x010 ThreadListEntry : _LIST_ENTRY [0x8705d228 - 0x8705d228]
+0x018 IoStatus : _IO_STATUS_BLOCK
+0x020 RequestorMode : 1 ''
+0x021 PendingReturned : 0 ''
+0x022 StackCount : 11 ''
+0x023 CurrentLocation : 11 ''
+0x024 Cancel : 0 ''
+0x025 CancelIrql : 0 ''
+0x026 ApcEnvironment : 0 ''
+0x027 AllocationFlags : 0x81 ''
+0x028 UserIosb : 0x02c91f38
+0x02c UserEvent : (null)
+0x030 Overlay :__unnamed
+0x038 CancelRoutine : (null)
+0x03c UserBuffer : 0x0417e000
+0x040 Tail : __unnamed

!devobj 86EC8480
Device object (86ec8480) is for:
\FileSystem\FltMgr DriverObject 88746658
Current Irp 00000000 RefCount 0 Type 00000008 Flags 00000000
DevExt 86ec8538 DevObjExt 86ec8568
ExtensionFlags (0xc0000000) DOE_BOTTOM_OF_FDO_STACK, DOE_DESIGNATED_FDO
AttachedTo (Lower) 86c78020 \FileSystem\Ntfs
Device queue is not busy