FltQueryInformationFile causing bugcheck when called from PreCleanup

Hi,

I've a minifilter driver that's calling FltQueryInformationFile (FILE_INFORMATION_CLASS being FileBasicInformation) from PreCleanup.
It succeeds on most of the occasions (i.e. we have not been able to reproduce it in-house so far) however it looks like in the field there are no. of occasions
when systems (all client WinOS, win10 upwards) are crashing with SYSTEM_SERVICE_EXCEPTION bug check and our driver is found in the call stack making call to
FltQueryInformationFile(). Since the dumps are being reported under the MS analytics program, I only have dumps to work with and do not know exact steps
/ actions that lead to the crash. The instance of the filter is attached to an NTFS volume.

From the crash dump, it looks like a completely invalid address is being accessed.
However, it is unclear to me why that is the case - I'm passing all the valid pointers i.e. instance, file object (what I received in callback data) and also
a valid pointer to buffer that shall receive the information.

It's also worth noting that call stack is not always consistent - they differ but there are always couple of function frames corresponding to our driver that
call FltQueryInformationFile.

I don't think there is anything wrong with the call itself. Just thinking, are there any special circumstances under which this API should not be called?

Here's pasting important bits from !analyze -v:

9: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_SERVICE_EXCEPTION (3b)
An exception happened while executing a system service routine.
Arguments:
Arg1: 00000000c0000005, Exception code that caused the BugCheck
Arg2: fffff8028070b73f, Address of the instruction which caused the BugCheck
Arg3: ffff860375e43480, Address of the context record for the exception that caused the BugCheck
Arg4: 0000000000000000, zero.

Debugging Details:
------------------

BUGCHECK_CODE:  3b

BUGCHECK_P1: c0000005

BUGCHECK_P2: fffff8028070b73f

BUGCHECK_P3: ffff860375e43480

BUGCHECK_P4: 0

FILE_IN_CAB:  MEMORY.DMP

TAG_NOT_DEFINED_202b:  *** Unknown TAG in analysis list 202b


DUMP_FILE_ATTRIBUTES: 0x1800

CONTEXT:  ffff860375e43480 -- (.cxr 0xffff860375e43480)
rax=0000000000000000 rbx=ffff860375e44160 rcx=00007802805e0000
rdx=ffffb08f832b73c0 rsi=ffffc5875c978920 rdi=ffffc5875d426890
rip=fffff8028070b73f rsp=ffff860375e43ea0 rbp=ffffb08f80de61d0
 r8=ffffb08f5cdbc1b0  r9=0000000000000001 r10=ffffc58758a4b5a0
r11=ffff860375e43e70 r12=0000000000000000 r13=ffffb08f7ea44ae0
r14=ffffb08f80de61d0 r15=0000000000000000
iopl=0         nv up ei ng nz ac po cy
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00050297

Ntfs!NtfsCommonQueryInformation+0x8bf:
fffff802`8070b73f 8b948118c80700  mov     edx,dword ptr [rcx+rax*4+7C818h] ds:002b:00007802`8065c818=????????

Resetting default scope

BLACKBOXBSD: 1 (!blackboxbsd)


BLACKBOXNTFS: 1 (!blackboxntfs)


BLACKBOXPNP: 1 (!blackboxpnp)


BLACKBOXWINLOGON: 1

PROCESS_NAME:  Dropbox.exe

STACK_TEXT:  
Ntfs!NtfsCommonQueryInformation+0x8bf
Ntfs!NtfsFsdDispatchSwitch+0x156
Ntfs!NtfsFsdDispatchWait+0x40
nt!IofCallDriver+0x55
FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x15b
FLTMGR!FltPerformSynchronousIo+0x2d8
FLTMGR!FltQueryInformationFile+0x6c
mydriver!mydriver::My_QueryInformationFile+0xf1
mydriver!`anonymous namespace'::FileCleanupPre+0xae
FLTMGR!FltpPerformPreCallbacksWorker+0x37a
FLTMGR!FltpPassThroughInternal+0xd1
FLTMGR!FltpPassThrough+0x179
FLTMGR!FltpDispatch+0x8b
nt!IofCallDriver+0x55
nt!IopCloseFile+0x18f
nt!ObpCloseHandle+0x313
nt!NtClose+0x39
nt!KiSystemServiceCopyEnd+0x28
0x00007fff`8762fb34

Looks like that's part of the switch statement for handling the passed in FILE_INFORMATION_CLASS. The info class value should be in RAX, which is zero in your case and that's not a valid value:

0: kd> dt nt!_FILE_INFORMATION_CLASS
   FileDirectoryInformation = 0n1
   FileFullDirectoryInformation = 0n2
   ...

While this might look like an NTFS bug, the file systems expect that the inputs are sanitized by ZwQueryInformationFile (which would return a STATUS_INVALID_INFO_CLASS error here). But FltQueryInformationFile doesn't do any sanity checking and just sends whatever you pass to the file system which can lead to weird failures.

Now, if you're not obviously sending an info class of zero then this might just be a secondary failure (i.e. the real bug is that your state is corrupt and so you end up sending zero as the info class).

Sorry, I spoke too soon...

I decided to poke at my target Win11 system and RAX isn't the info class itself. Instead NTFS subtracts 4 before doing the calculation. So, looks like RAX would be zero here if you're asking for FileBasicInformation. I put a breakpoint on that instruction and sent my own just to be sure and RAX is indeed zero:

1: kd> r
rax=0000000000000000 rbx=ffff800f228fc6f0 rcx=fffff80611e70000
rdx=ffffaf062ceab470 rsi=ffffd3052e633a40 rdi=ffffd3052862a7c0
rip=fffff80611f9b63f rsp=ffff800f228fc430 rbp=fffff8060f98a1d6
 r8=ffffaf062671f1b0  r9=0000000000000003 r10=ffffd3052862a660
r11=ffff800f228fc400 r12=0000000000000000 r13=ffffaf062e61b820
r14=ffffaf062fe13000 r15=0000000000000000
iopl=0         nv up ei ng nz ac po cy
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040297
Ntfs!NtfsCommonQueryInformation+0x8bf:
fffff806`11f9b63f mov     edx,dword ptr [rcx+rax*4+7C818h] ds:002b:fffff806`11eec818=0012b64f

The set of instructions I see leading up to this is:

Ntfs!NtfsCommonQueryInformation+0x8a7:
fffff806`11f9b627 add     r15d,0FFFFFFFCh
fffff806`11f9b62b cmp     r15d,48h
fffff806`11f9b62f ja      Ntfs!NtfsCommonQueryInformation+0xc22 (fffff806`11f9b9a2)  Branch

Ntfs!NtfsCommonQueryInformation+0x8b5:
fffff806`11f9b635 movsxd  rax,r15d
fffff806`11f9b638 lea     rcx,[Ntfs!NtfsZeroEndOfSectorNonCached <PERF> (Ntfs+0x0) (fffff806`11e70000)]
fffff806`11f9b63f mov     edx,dword ptr [rcx+rax*4+7C818h]
fffff806`11f9b646 add     rdx,rcx
fffff806`11f9b649 jmp     rdx

That means RCX should be a kernel address within NTFS that's the base of the jump table.
In your crash it's 0x00007802805e0000 and that would be a problem...No idea where that could have come from though. Is this just a minidump?

Hey Scott,

Thanks for taking the time out to look into this (and also for the correct formatting), appreciate it!

So it is a full kernel memory dump and I too have no idea as to why rcx has that invalid address. What's weird is its always been there through out the frames of the stack i.e. when I switch to different function frames from the call stack and dump contents of registers, its always there - even in nt!NtClose() that triggered this flow.

9: kd> kPnL
 # Child-SP          RetAddr               Call Site
00 ffff8603`75e42b28 fffff802`7c42de69     nt!KeBugCheckEx
01 ffff8603`75e42b30 fffff802`7c42cf3c     nt!KiBugCheckDispatch+0x69
02 ffff8603`75e42c70 fffff802`7c4236d2     nt!KiSystemServiceHandler+0x7c
03 ffff8603`75e42cb0 fffff802`7c2ed293     nt!RtlpExecuteHandlerForException+0x12
04 ffff8603`75e42ce0 fffff802`7c2bd35e     nt!RtlDispatchException+0x2f3
05 ffff8603`75e43450 fffff802`7c42dfbc     nt!KiDispatchException+0x1ae
06 ffff8603`75e43b30 fffff802`7c4292df     nt!KiExceptionDispatch+0x13c
07 ffff8603`75e43d10 fffff802`8070b73f     nt!KiPageFault+0x45f
08 ffff8603`75e43ea0 fffff802`8070c6f6     Ntfs!NtfsCommonQueryInformation+0x8bf
09 ffff8603`75e44010 fffff802`8070bf60     Ntfs!NtfsFsdDispatchSwitch+0x156
0a ffff8603`75e44140 fffff802`7c229b45     Ntfs!NtfsFsdDispatchWait+0x40
0b ffff8603`75e44410 fffff802`7a13a1db     nt!IofCallDriver+0x55
0c ffff8603`75e44450 fffff802`7a13c418     FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x15b
0d ffff8603`75e444c0 fffff802`7a17490c     FLTMGR!FltPerformSynchronousIo+0x2d8
0e ffff8603`75e44570 fffff802`805c7ec1     FLTMGR!FltQueryInformationFile+0x6c
0f ffff8603`75e445b0 fffff802`805c85fe     mydriver!mydriver::My_QueryInformationFile(
			struct _FLT_INSTANCE * instance = 0xffffb08f`5f320aa0, 
			struct _FILE_OBJECT * fileObject = 0xffffb08f`7ea44ae0 "\Users\userxyz\.dropbox_bi\79f37303725e625447a3be2b4f78157dce28a9f80457cf9c7641527d9c4147db\DropboxBackup\LIFESTUDIO\LIFESTUDIO.dbx-external-drive" - Device for "\FileSystem\Ntfs")+0xf1
10 ffff8603`75e445f0 fffff802`7a13961a     mydriver!`anonymous namespace'::FileCleanupPre(
			struct _FLT_CALLBACK_DATA * data = 0xffffb08f`6b89f9f8, 
			struct _FLT_RELATED_OBJECTS * relatedObjects = 0xffff8603`75e446f0, 
			void ** __formal = 0x00000000`00000000)+0xae
11 ffff8603`75e44660 fffff802`7a1390a1     FLTMGR!FltpPerformPreCallbacksWorker+0x37a
12 ffff8603`75e44770 fffff802`7a138029     FLTMGR!FltpPassThroughInternal+0xd1
13 ffff8603`75e447c0 fffff802`7a137e0b     FLTMGR!FltpPassThrough+0x179
14 ffff8603`75e44860 fffff802`7c229b45     FLTMGR!FltpDispatch+0x8b
15 ffff8603`75e448c0 fffff802`7c79eb8f     nt!IofCallDriver+0x55
16 ffff8603`75e44900 fffff802`7c6c5b53     nt!IopCloseFile+0x18f
17 ffff8603`75e44990 fffff802`7c6c26b9     nt!ObpCloseHandle+0x313
18 ffff8603`75e44ab0 fffff802`7c42d538     nt!NtClose+0x39
19 ffff8603`75e44ae0 00007fff`8762fb34     nt!KiSystemServiceCopyEnd+0x28
1a 00000061`e65d8a58 00000000`00000000     0x00007fff`8762fb34

Contents of registers going all the way to nt!NtClose():

9: kd> .frame /c /r 0f
0f ffff8603`75e445b0 fffff802`805c85fe     mydriver!mydriver::My_QueryInformationFile+0xf1
rax=0000000000000000 rbx=0000000000000000 rcx=00007802805e0000
rdx=ffffb08f832b73c0 rsi=ffffb08f7ea44ae0 rdi=ffffb08f80de61d0
rip=fffff802805c7ec1 rsp=ffff860375e445b0 rbp=ffffb08f5f320aa0
 r8=ffffb08f5cdbc1b0  r9=0000000000000001 r10=ffffc58758a4b5a0
r11=ffff860375e43e70 r12=0000000000000004 r13=00000000000003b0
r14=ffffb08f6b89f9f8 r15=ffffb08f5f320d08
iopl=0         nv up ei ng nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040286
mydriver!mydriver::My_QueryInformationFile+0xf1:
fffff802`805c7ec1 448bc8          mov     r9d,eax
9: kd> .frame /c /r 10
10 ffff8603`75e445f0 fffff802`7a13961a     mydriver!`anonymous namespace'::FileCleanupPre+0xae
rax=0000000000000000 rbx=0000000000000000 rcx=00007802805e0000
rdx=ffffb08f832b73c0 rsi=ffff860375e446f0 rdi=0000000000000000
rip=fffff802805c85fe rsp=ffff860375e445f0 rbp=ffff860375e44709
 r8=ffffb08f5cdbc1b0  r9=0000000000000001 r10=ffffc58758a4b5a0
r11=ffff860375e43e70 r12=0000000000000004 r13=00000000000003b0
r14=ffffb08f6b89f9f8 r15=ffffb08f5f320d08
iopl=0         nv up ei ng nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040286
mydriver!`anonymous namespace'::FileCleanupPre+0xae:
fffff802`805c85fe 84c0            test    al,al
9: kd> .frame /c /r 15
15 ffff8603`75e448c0 fffff802`7c79eb8f     nt!IofCallDriver+0x55
rax=0000000000000000 rbx=ffffb08f7ea44ae0 rcx=00007802805e0000
rdx=ffffb08f832b73c0 rsi=ffffb08f780420c0 rdi=ffffb08f6b9e3690
rip=fffff8027c229b45 rsp=ffff860375e448c0 rbp=ffffb08f5cd67c80
 r8=ffffb08f5cdbc1b0  r9=0000000000000001 r10=ffffc58758a4b5a0
r11=ffff860375e43e70 r12=0000000000000000 r13=0000000000000000
r14=0000000000000001 r15=0000000000000000
iopl=0         nv up ei ng nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040286
nt!IofCallDriver+0x55:
fffff802`7c229b45 4883c438        add     rsp,38h
9: kd> .frame /c /r 18
18 ffff8603`75e44ab0 fffff802`7c42d538     nt!NtClose+0x39
rax=0000000000000000 rbx=0000000000008d6c rcx=00007802805e0000
rdx=ffffb08f832b73c0 rsi=00000061e65e8c60 rdi=0000000000000001
rip=fffff8027c6c26b9 rsp=ffff860375e44ab0 rbp=ffff860375e44b60
 r8=ffffb08f5cdbc1b0  r9=0000000000000001 r10=ffffc58758a4b5a0
r11=ffff860375e43e70 r12=00000061e65e8b80 r13=0000000000000016
r14=0000000000008d6c r15=0000000000000016
iopl=0         nv up ei ng nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040286
nt!NtClose+0x39:
fffff802`7c6c26b9 488b5c2430      mov     rbx,qword ptr [rsp+30h] ss:0018:ffff8603`75e44ae0=ffffb08f6debb080

No, that's just the debugger misleading you...The .frame /c command uses Unwind Data to restore the non-volatile registers as you work your way up the stack. The volatile ones (including RCX) aren't in the usual Unwind Data so they're not going to change in the debugger as you do the .frame /c. The list of volatile/non-volatile registers can be found here:

x64 calling convention | Microsoft Learn

You can demonstrate this behavior yourself by explicitly changing a volatile register value and moving frames. Note how RCX remains zero:

1: kd> r
rax=0000000000000000 rbx=ffff800f228fc6f0 rcx=fffff80611e70000
rdx=ffffaf062ceab470 rsi=ffffd3052e633a40 rdi=ffffd3052862a7c0
rip=fffff80611f9b63f rsp=ffff800f228fc430 rbp=fffff8060f98a1d6
 r8=ffffaf062671f1b0  r9=0000000000000003 r10=ffffd3052862a660
r11=ffff800f228fc400 r12=0000000000000000 r13=ffffaf062e61b820
r14=ffffaf062fe13000 r15=0000000000000000
iopl=0         nv up ei ng nz ac po cy
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040297
Ntfs!NtfsCommonQueryInformation+0x8bf:
fffff806`11f9b63f mov     edx,dword ptr [rcx+rax*4+7C818h] ds:002b:fffff806`11eec818=0012b64f

1: kd> r @rcx = 0

1: kd> r
rax=0000000000000000 rbx=ffff800f228fc6f0 rcx=0000000000000000
rdx=ffffaf062ceab470 rsi=ffffd3052e633a40 rdi=ffffd3052862a7c0
rip=fffff80611f9b63f rsp=ffff800f228fc430 rbp=fffff8060f98a1d6
 r8=ffffaf062671f1b0  r9=0000000000000003 r10=ffffd3052862a660
r11=ffff800f228fc400 r12=0000000000000000 r13=ffffaf062e61b820
r14=ffffaf062fe13000 r15=0000000000000000
iopl=0         nv up ei ng nz ac po cy
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040297
Ntfs!NtfsCommonQueryInformation+0x8bf:
fffff806`11f9b63f mov     edx,dword ptr [rcx+rax*4+7C818h] ds:002b:00000000`0007c818=????????


1: kd> .frame /c 3
03 ffff800f`228fc9a0 fffff806`104c221f     nt!IopfCallDriver+0x53
rax=0000000000000000 rbx=ffffaf0628c2eb20 rcx=0000000000000000
rdx=ffffaf062ceab470 rsi=ffffaf062671f030 rdi=ffffaf0628c2eec0
rip=fffff8060fcfc647 rsp=ffff800f228fc9a0 rbp=fffff8060f98a1d6
 r8=ffffaf062671f1b0  r9=0000000000000003 r10=ffffd3052862a660
r11=ffff800f228fc400 r12=0000000000000001 r13=0000000000000000
r14=ffffaf0628c2eb20 r15=0000000000000000
iopl=0         nv up ei ng nz ac po cy
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040297
nt!IopfCallDriver+0x53:
fffff806`0fcfc647 add     rsp,38h

Now do that again with a non-volatile register and you'll get different results as the non-volatile registers are restored from the Unwind Data:

1: kd> r @rbx = 0

1: kd> r
rax=0000000000000000 rbx=0000000000000000 rcx=fffff80611e70000
rdx=ffffaf062ceab470 rsi=ffffd3052e633a40 rdi=ffffd3052862a7c0
rip=fffff80611f9b63f rsp=ffff800f228fc430 rbp=fffff8060f98a1d6
 r8=ffffaf062671f1b0  r9=0000000000000003 r10=ffffd3052862a660
r11=ffff800f228fc400 r12=0000000000000000 r13=ffffaf062e61b820
r14=ffffaf062fe13000 r15=0000000000000000
iopl=0         nv up ei ng nz ac po cy
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040297
Ntfs!NtfsCommonQueryInformation+0x8bf:
fffff806`11f9b63f mov     edx,dword ptr [rcx+rax*4+7C818h] ds:002b:fffff806`11eec818=0012b64f

1: kd> .frame /c 3
03 ffff800f`228fc9a0 fffff806`104c221f     nt!IopfCallDriver+0x53
rax=0000000000000000 rbx=ffffaf0628c2eb20 rcx=fffff80611e70000
rdx=ffffaf062ceab470 rsi=ffffaf062671f030 rdi=ffffaf0628c2eec0
rip=fffff8060fcfc647 rsp=ffff800f228fc9a0 rbp=fffff8060f98a1d6
 r8=ffffaf062671f1b0  r9=0000000000000003 r10=ffffd3052862a660
r11=ffff800f228fc400 r12=0000000000000001 r13=0000000000000000
r14=ffffaf0628c2eb20 r15=0000000000000000
iopl=0         nv up ei ng nz ac po cy
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040297
nt!IopfCallDriver+0x53:
fffff806`0fcfc647 add     rsp,38h

The exception to this is if there's a Trap Frame on the stack, which causes the OS to also (indirectly) record the volatile register state in the Unwind Data. You can see which frames have Trap Frames with kv (output trimmed for clarity):


1: kd> kv
 # Call Site
00 nt!DbgBreakPointWithStatus
01 nt!KiBugCheckDebugBreak+0x12
02 nt!KeBugCheck2+0xba3
03 nt!KeBugCheckEx+0x107
04 nt!KiBugCheckDispatch+0x69
05 nt!KiSystemServiceHandler+0x7c
06 nt!RtlpExecuteHandlerForException+0xf
07 nt!RtlDispatchException+0x2f3
08 nt!KiDispatchException+0x1ae
09 nt!KiExceptionDispatch+0x13c
0a nt!KiPageFault+0x44e (TrapFrame @ ffff800f`228fc2a0)
0b Ntfs!NtfsCommonQueryInformation+0x8bf
0c Ntfs!NtfsFsdDispatchSwitch+0x156
0d Ntfs!NtfsFsdDispatchWait+0x40
0e nt!IopfCallDriver+0x53
0f nt!IovCallDriver+0x5f
10 nt!IofCallDriver+0x22ba19
11 FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x156
12 FLTMGR!FltpDispatch+0xa3
13 nt!IopfCallDriver+0x53
14 nt!IovCallDriver+0x5f
15 nt!IofCallDriver+0x22ba19
16 nt!NtQueryInformationFile+0x4d0
17 nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffff800f`228fce20)
18 ntdll!NtQueryInformationFile+0x14

.frame /c to a frame immediately before the Trap Frame (0xb and 0x18 above) will restore both the volatile and non-volatile registers. This is the only time that you're "guaranteed" that all the registers are correct (for some value of "guarantee"...it's a big scary world out there and maybe the Unwind Data or stack is corrupt :man_shrugging:).

Looking at your bad value again: 0x00007802`805e0000

Based on my target system this should be an address within NTFS. Note your faulting instruction address is within NTFS with an address of 0xfffff802'8070b73f, which makes me think your RCX should have been 0xfffff802'805e0000. What's the output of:

  1. ".frame /c 8 ; ub"
  2. !chkimg -d ntfs.sys

Thanks for sharing your insights.

9: kd> .frame /c 8; ub
08 ffff8603`75e43ea0 fffff802`8070c6f6     Ntfs!NtfsCommonQueryInformation+0x8bf
rax=0000000000000000 rbx=ffff860375e44160 rcx=00007802805e0000
rdx=ffffb08f832b73c0 rsi=ffffc5875c978920 rdi=ffffc5875d426890
rip=fffff8028070b73f rsp=ffff860375e43ea0 rbp=ffffb08f80de61d0
 r8=ffffb08f5cdbc1b0  r9=0000000000000001 r10=ffffc58758a4b5a0
r11=ffff860375e43e70 r12=0000000000000000 r13=ffffb08f7ea44ae0
r14=ffffb08f80de61d0 r15=0000000000000000
iopl=0         nv up ei ng nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00040286
Ntfs!NtfsCommonQueryInformation+0x8bf:
fffff802`8070b73f 8b948118c80700  mov     edx,dword ptr [rcx+rax*4+7C818h] ds:002b:00007802`8065c818=????????
Ntfs!NtfsCommonQueryInformation+0x89c:
fffff802`8070b71c 004183          add     byte ptr [rcx-7Dh],al
fffff802`8070b71f ff12            call    qword ptr [rdx]
fffff802`8070b721 0f84ac030000    je      Ntfs!NtfsCommonQueryInformation+0xc53 (fffff802`8070bad3)
fffff802`8070b727 4183c7fc        add     r15d,0FFFFFFFCh
fffff802`8070b72b 4183ff48        cmp     r15d,48h
fffff802`8070b72f 0f876d030000    ja      Ntfs!NtfsCommonQueryInformation+0xc22 (fffff802`8070baa2)
fffff802`8070b735 4963c7          movsxd  rax,r15d
fffff802`8070b738 488d0dc148edff  lea     rcx,[Ntfs!NtfsZeroEndOfSectorNonCached <PERF> (Ntfs+0x0) (fffff802`805e0000)]

It is loading the address of the NtfsZeroEndOfSectorNonCached function into the rcx register and it should have been what you expected. Also, there were few mismatched areas found in ntfs:

9: kd> !chkimg -d ntfs.sys
*** WARNING: Check Image - Checksum mismatch - Dump: 0x15061, File: 0x13ce5 - c:\symbols\Fs_Rec.sys\24813526f000\Fs_Rec.sys
*** WARNING: Check Image - Checksum mismatch - Dump: 0x15061, File: 0x13ce5 - c:\symbols\Fs_Rec.sys\24813526f000\Fs_Rec.sys
    fffff802806a0650-fffff802806a0653  4 bytes - Ntfs!_guard_check_icall_fptr
	[ 40 6a 61 80:b0 31 42 7c ]
    fffff802806a0658-fffff802806a065b  4 bytes - Ntfs!_guard_dispatch_icall_fptr (+0x08)
	[ 40 72 61 80:00 32 42 7c ]
8 errors : ntfs.sys (fffff802806a0650-fffff802806a065b)

Is it a case of ntfs somehow being corrupted?

Very curious, it's like the LEA instruction messed up and calculated the address wrong. Have you only seen this on one machine or are you having it across different systems? I'm not quick to blame hardware failures but it's hard to imagine how a driver would have caused this bug. Maybe if the NTFS image was corrupted at this instruction I could start to blame a DMA device or RAM, but seems like the CPU just flaked out.

No, those are the CFG implementation functions. I believe they start off as pointing to NOP functions and get rewritten if CFG is actually enabled on your system. I don't have CFG enabled on my test system and here's what mine look like:

1: kd> dps Ntfs!_guard_check_icall_fptr
fffff806`11f30650  fffff806`11ea6a40 Ntfs!guard_check_icall_nop
fffff806`11f30658  fffff806`11ea7240 Ntfs!guard_dispatch_icall_nop
fffff806`11f30660  fffff806`11ea6a40 Ntfs!guard_check_icall_nop
fffff806`11f30668  fffff806`11ea7260 Ntfs!guard_xfg_dispatch_icall_nop
fffff806`11f30670  fffff806`11ea7260 Ntfs!guard_xfg_dispatch_icall_nop

If !chkimg detected errors in normal functions I'd be a bit more interested.

yeah it is seen on multiple systems alright - the stack trace is not exactly the same across the systems however the bug check more or less occurs due to accessing invalid address. E.g. here's one from another system :

This time there is an attempt of accessing invalid address from nt!IoRetrievePriorityInfo:

6: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

SYSTEM_SERVICE_EXCEPTION (3b)
An exception happened while executing a system service routine.
Arguments:
Arg1: 00000000c0000005, Exception code that caused the BugCheck
Arg2: fffff80084077d05, Address of the instruction which caused the BugCheck
Arg3: ffffdf0c08c469b0, Address of the context record for the exception that caused the BugCheck
Arg4: 0000000000000000, zero.

BUGCHECK_CODE:  3b

BUGCHECK_P1: c0000005

BUGCHECK_P2: fffff80084077d05

BUGCHECK_P3: ffffdf0c08c469b0

BUGCHECK_P4: 0

FILE_IN_CAB:  MEMORY.DMP

TAG_NOT_DEFINED_202b:  *** Unknown TAG in analysis list 202b


DUMP_FILE_ATTRIBUTES: 0x1800

CONTEXT:  ffffdf0c08c469b0 -- (.cxr 0xffffdf0c08c469b0)
rax=0000000055a74003 rbx=ffffb7095665f080 rcx=ffffb70949c235e0
rdx=ffffb70958bbd2d0 rsi=ffffdf0c08c47460 rdi=0000000000000000
rip=fffff80084077d05 rsp=ffffdf0c08c473d0 rbp=ffffb709587c8450
 r8=ffffb7095665f080  r9=ffffdf0c08c47460 r10=fffff80084077bc0
r11=ffffb70949c23aa0 r12=0000000010000004 r13=ffffb70957fd3a88
r14=0000000000000002 r15=ffffb70945b1be88
iopl=0         nv up ei pl nz na po nc
cs=0010  ss=0018  ds=002b  es=002b  fs=0053  gs=002b             efl=00050206
nt!IoRetrievePriorityInfo+0x145:
fffff800`84077d05 8b4858          mov     ecx,dword ptr [rax+58h] ds:002b:00000000`55a7405b=????????
Resetting default scope

BLACKBOXBSD: 1 (!blackboxbsd)


BLACKBOXNTFS: 1 (!blackboxntfs)


BLACKBOXPNP: 1 (!blackboxpnp)


BLACKBOXWINLOGON: 1

PROCESS_NAME:  explorer.exe

STACK_TEXT:  
nt!IoRetrievePriorityInfo+0x145
FLTMGR!FltPerformSynchronousIo+0x284
FLTMGR!FltQueryInformationFile+0x6c
mydriver!mydriver::My_QueryInformationFile+0xf1
mydriver!`anonymous namespace'::FileCleanupPre+0xae
FLTMGR!FltpPerformPreCallbacksWorker+0x37b
FLTMGR!FltpPassThroughInternal+0xd1
FLTMGR!FltpPassThrough+0x179
FLTMGR!FltpDispatch+0x8b
nt!IofCallDriver+0x55
nt!IopCloseFile+0x18f
nt!ObpCloseHandle+0x313
nt!NtClose+0x39
nt!KiSystemServiceCopyEnd+0x25
0x00007ffd`5392f544

That one looks more tractable:

0: kd> uf nt!IoRetrievePriorityInfo
nt!IoRetrievePriorityInfo:
fffff802`0f71b2c0 mov     qword ptr [rsp+10h],rbx
fffff802`0f71b2c5 mov     qword ptr [rsp+18h],rsi
fffff802`0f71b2ca mov     qword ptr [rsp+20h],rdi
fffff802`0f71b2cf push    r14
fffff802`0f71b2d1 sub     rsp,20h
fffff802`0f71b2d5 mov     r14d,2
fffff802`0f71b2db mov     rsi,r9
fffff802`0f71b2de mov     dword ptr [r9+0Ch],r14d
fffff802`0f71b2e2 mov     rbx,r8
fffff802`0f71b2e5 test    rcx,rcx
fffff802`0f71b2e8 jne     nt!IoRetrievePriorityInfo+0x12c (fffff802`0f71b3ec)  Branch

nt!IoRetrievePriorityInfo+0x2e:
fffff802`0f71b2ee test    rdx,rdx
fffff802`0f71b2f1 je      nt!IoRetrievePriorityInfo+0xed (fffff802`0f71b3ad)  Branch

nt!IoRetrievePriorityInfo+0x37:
fffff802`0f71b2f7 mov     rax,qword ptr [rdx+0D0h]
fffff802`0f71b2fe test    rax,rax
fffff802`0f71b301 jne     nt!IoRetrievePriorityInfo+0x145 (fffff802`0f71b405)  Branch

...

nt!IoRetrievePriorityInfo+0x145:
fffff802`0f71b405 mov     ecx,dword ptr [rax+58h]

And based on a breakpoint RDX should be a File Object and RDX+D0 is the File Object Extension:

0: kd> !pool @rdx 2
Pool page ffffaa06538a8710 region is Nonpaged pool
*ffffaa06538a8690 size:  190 previous size:    0  (Allocated) *File
		Pooltag File : File objects
0: kd> dt nt!_file_object @rdx
   +0x000 Type             : 0n5
   +0x002 Size             : 0n216
   +0x008 DeviceObject     : 0xffffaa06`4fbab060 _DEVICE_OBJECT
   +0x010 Vpb              : 0xffffaa06`4fb91c00 _VPB
   +0x018 FsContext        : 0xffff838b`6011f170 Void
   +0x020 FsContext2       : (null) 
   +0x028 SectionObjectPointer : 0xffffaa06`53925dc8 _SECTION_OBJECT_POINTERS
   +0x030 PrivateCacheMap  : 0xffffaa06`4f3c0190 Void
   +0x038 FinalStatus      : 0n0
   +0x040 RelatedFileObject : (null) 
   +0x048 LockOperation    : 0 ''
   +0x049 DeletePending    : 0 ''
   +0x04a ReadAccess       : 0x1 ''
   +0x04b WriteAccess      : 0x1 ''
   +0x04c DeleteAccess     : 0 ''
   +0x04d SharedRead       : 0x1 ''
   +0x04e SharedWrite      : 0x1 ''
   +0x04f SharedDelete     : 0x1 ''
   +0x050 Flags            : 0x40100
   +0x058 FileName         : _UNICODE_STRING "\$Directory"
   +0x068 CurrentByteOffset : _LARGE_INTEGER 0x0
   +0x070 Waiters          : 0
   +0x074 Busy             : 0
   +0x078 LastLock         : (null) 
   +0x080 Lock             : _KEVENT
   +0x098 Event            : _KEVENT
   +0x0b0 CompletionContext : (null) 
   +0x0b8 IrpListLock      : 0
   +0x0c0 IrpList          : _LIST_ENTRY [ 0xffffaa06`538a87d0 - 0xffffaa06`538a87d0 ]
   +0x0d0 FileObjectExtension : (null)

But in your crash the FileObjectExtension is garbage. That feels like more of an every day use after free bug or something.

Have you tried enabling Verifier on your driver, FltMgr.sys, and ntoskrnl.exe? That will result in the File Objects coming out of Special Pool and might point to a bug faster.

You certainly appear to have a real mystery on your hands...

We always run test-suits with driver verifier enabled but only for our driver. Let me enable it for filter manager and ntoskernel.exe as well and see what comes out. Would you recon this should always be case? i.e. Driver verifier should always be enabled not only for your own binary but also for filter manager?

Thanks for all the inputs bdw, appreciate it.

Short Answer:

You should definitely have it enabled on your driver and FltMgr.sys. Enabling it for ntoskrnl.exe is more heavyweight, though I like to do it periodically for test runs and also in cases where I have a corruption that I can't pin down.

Long Answer:

Verifier hooks your Import Address Table and swaps OS functions for Verifier versions. So, for example, if you enable Special Pool your call to ExAllocatePoolWithTag becomes a call to VerifierExAllocatePoolWithTag. This then returns a Special Pool allocation that has all those Verifier things we love (buffer overflow checking, use after free, etc.).

But, there's an issue if you use a wrapper library that makes allocations on your behalf. For example, in a minifilter you're probably using contexts of some sort allocated using FltAllocateContext. This API is going to call into FltMgr and then FltMgr.sys is going to allocate the memory from somewhere. If you don't enable Verifier on FltMgr.sys then those allocates do NOT come from Special Pool (even if you've enabled Verifier on your driver!). That's why it's always good to enable Verifier for both your driver and FltMgr.sys.

Enabling on ntoskrnl.exe causes any internal kernel allocations to also come from Special Pool. For a file system filter, this is of interest because it means that the File Objects will come out of Special Pool. This is the only practical way to catch uses of File Objects after you've dereferenced them.

FileObjectExtension appears to be fine - its null. rax should have been set to null and so nt!IoRetrievePriorityInfo+0x145 should not have got called. It's as if this move instruction had no effect at all:

fffff802`0f71b2f7 mov     rax,qword ptr [rdx+0D0h]
6: kd> dt nt!_file_object FFFFB70958BBD2D0
   +0x000 Type             : 0n5
   +0x002 Size             : 0n216
   +0x008 DeviceObject     : 0xffffb709`454be8f0 _DEVICE_OBJECT
   +0x010 Vpb              : 0xffffb709`454ffc60 _VPB
   +0x018 FsContext        : 0xffffa60a`0642a170 Void
   +0x020 FsContext2       : 0xffffa60a`033ac2f0 Void
   +0x028 SectionObjectPointer : (null) 
   +0x030 PrivateCacheMap  : (null) 
   +0x038 FinalStatus      : 0n-2147483642
   +0x040 RelatedFileObject : (null) 
   +0x048 LockOperation    : 0 ''
   +0x049 DeletePending    : 0 ''
   +0x04a ReadAccess       : 0x1 ''
   +0x04b WriteAccess      : 0 ''
   +0x04c DeleteAccess     : 0 ''
   +0x04d SharedRead       : 0x1 ''
   +0x04e SharedWrite      : 0x1 ''
   +0x04f SharedDelete     : 0x1 ''
   +0x050 Flags            : 0xc0002
   +0x058 FileName         : _UNICODE_STRING "\Users\xxxxxxx\AppData\Roaming\Microsoft\Windows\Start Menu"
   +0x068 CurrentByteOffset : _LARGE_INTEGER 0x0
   +0x070 Waiters          : 0
   +0x074 Busy             : 1
   +0x078 LastLock         : (null) 
   +0x080 Lock             : _KEVENT
   +0x098 Event            : _KEVENT
   +0x0b0 CompletionContext : (null) 
   +0x0b8 IrpListLock      : 0
   +0x0c0 IrpList          : _LIST_ENTRY [ 0xffffb709`58bbd390 - 0xffffb709`58bbd390 ]
   +0x0d0 FileObjectExtension : (null)

Also I did run our unit tests a good number of times with DV enabled for our driver, FltMgr.sys and ntoskernel.exe but so far I haven't come across any BSOD.

It's increasingly looking like the problem itself may be external to the driver and somehow its just triggering it.

Wow, that's very strange...And I did confirm that routine doesn't loop or anything, that code just executes once based on the provided File Object.

Is this always the same system?

And any chance you can put the crash dump somewhere for me to look at? I'd be curious.

Sorry for the delay.

It is across various systems. It seems difficult put up the dump but will keep on looking into it and update the post if there is break through.

Thanks for all the pointers though.

1 Like