SYSTEM_THREAD_EXCEPTION_NOT_HANDLED after calling FltStartFiltering on a system with multiple frames

We hit a BSOD in our minifilter’s DriverEntry, inside the call to FltStartFiltering (!):

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common BugCheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffffc0000005, The exception code that was not handled
Arg2: fffff802e2da878c, The address that the exception occurred at
Arg3: ffff960049a86388, Exception Record Address
Arg4: ffff960049a85bd0, Context Record Address

Call stack:

nt!KeBugCheckEx
nt!PspSystemThreadStartup$filt$0+0x44
nt!_C_specific_handler+0x9f
nt!RtlpExecuteHandlerForException+0xd
nt!RtlDispatchException+0x421
nt!KiDispatchException+0x1e4
nt!KiExceptionDispatch+0xc2
nt!KiPageFault+0x406
FLTMGR!FltpInitInstance+0x324
FLTMGR!FltpCreateInstanceFromName+0x1ad
FLTMGR!FltpEnumerateRegistryInstances+0x145
FLTMGR!FltpDoVolumeNotificationForNewFilter+0xb9
FLTMGR!FltStartFiltering+0x2c
OurDriver

Attempt to write to address 0000000000000030

FLTMGR!FltpInitInstance+0x324:
lock and dword ptr [rax+30h],0FFFFFBFFh

The only unusual thing I found was that the system has multiple frames instead of just frame 0. MSDN and books such as Windows Kernel Programming don’t explain the reason for this split very well (beyond saying it is caused by a legacy filter driver), since it’s a very rare case.

!fltkd.filters

Filter List: ffffaa8b881d1930 "Frame 1" 
   FLT_FILTER: ffffaa8b881dec50 "Our Minifilter" "313234"
      FLT_INSTANCE: ffffaa8b881f8c70 "Our Minifilter Instance" "313234"
      FLT_INSTANCE: ffffaa8b881f6840 "Our Minifilter Instance" "313234"
      FLT_INSTANCE: ffffaa8b881f3c70 "Our Minifilter Instance" "313234"
      FLT_INSTANCE: ffffaa8b881f34c0 "Our Minifilter Instance" "313234"
      FLT_INSTANCE: ffffaa8b881f0c70 "Our Minifilter Instance" "313234"
      FLT_INSTANCE: ffffaa8b881f04c0 "Our Minifilter Instance" "313234"
      FLT_INSTANCE: ffffaa8b881dc490 "Our Minifilter Instance" "313234"
Filter List: ffffaa8b83bea300 "Frame 0" 
   FLT_FILTER: ffffaa8b827f2c40 "VirtFile" "280700"
      FLT_INSTANCE: ffffaa8b84463c50 "VirtFile Instance" "280700"
      FLT_INSTANCE: ffffaa8b87174490 "VirtFile Instance" "280700"
   FLT_FILTER: ffffaa8b8739b010 "storqosflt" "244000"
   FLT_FILTER: ffffaa8b8739d010 "wcifs" "189900"
   FLT_FILTER: ffffaa8b8445e8b0 "FileCrypt" "141100"
   FLT_FILTER: ffffaa8b873a8010 "luafv" "135000"
      FLT_INSTANCE: ffffaa8b873a54b0 "luafv" "135000"
   FLT_FILTER: ffffaa8b84496b10 "npsvctrig" "46000"
      FLT_INSTANCE: ffffaa8b84a61010 "npsvctrig" "46000"
   FLT_FILTER: ffffaa8b83bfd460 "Wof" "40700"
      FLT_INSTANCE: ffffaa8b84a69b90 "Wof Instance" "40700"

My questions are :

  1. How can I find the legacy filter driver that is causing this split?
  2. What causes our minifilter to go to frame 1 instead of 0?
  3. Could this BSOD be related to this split? If not, how can I find the root cause of it, considering that we are simply calling FltStartFiltering?

Edit:
Based on reversing the functions in the stack, I think this could be related to pFltVolume->FrameZeroVolume being NULL (pFltVolume is a PFLT_VOLUME), with FltMgr then trying to access its Flags member at offset 0x30. That would fit the bugcheck: we are on frame 1, and for some reason FltMgr is accessing FrameZeroVolume (I have no idea what FrameZeroVolume even is…).
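
To make the arithmetic behind that guess explicit, here is a small user-mode sketch. It only models the numbers from the faulting instruction (`lock and dword ptr [rax+30h],0FFFFFBFFh`); the helper names are made up for illustration and nothing here touches FltMgr itself.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement used by the faulting instruction: [rax+30h]. */
#define FLAGS_OFFSET 0x30u

/* Immediate used by the faulting instruction. */
#define AND_MASK 0xFFFFFBFFu

/* Address the instruction writes to for a given value of rax. */
static uint64_t write_target(uint64_t rax)
{
    assert(rax <= UINT64_MAX - FLAGS_OFFSET); /* no wrap-around */
    return rax + FLAGS_OFFSET;
}

/* The single bit that AND-ing with the mask clears. */
static uint32_t cleared_bit(void)
{
    return ~AND_MASK;
}
```

With rax equal to zero, `write_target(0)` gives 0x30, which matches the "Attempt to write to address 0000000000000030" line, and `cleared_bit()` gives 0x400, so the instruction is clearing one flag bit through a NULL base pointer.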

I came to this conclusion because of the following code :

__int64 __fastcall FltpDoVolumeNotificationForNewFilter(PFLT_FILTER ourFilter)
{
  Frame = ourFilter->Frame;
  ourFilter->Flags |= 2u;
  v3 = 0;
  KeEnterCriticalRegion();
  p_rLock = &Frame->AttachedVolumes.rLock;
  ExAcquireResourceSharedLite(&Frame->AttachedVolumes.rLock, 1u);
  p_rList = &Frame->AttachedVolumes.rList;
  Flink = Frame->AttachedVolumes.rList.Flink;

Flink[-1] is then passed to FltpInitInstance as the second argument, which seems to be a PFLT_VOLUME. (Note that these are just guesses; I don’t even know what FrameZeroVolume is.)
Someone seems to have faced the same issue years ago:

https://community.osr.com/discussion/274512/flt-volume-with-a-null-framezerovolume
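
The Flink[-1] expression in the decompiled snippet is most likely the decompiler's rendering of the usual "list entry back to containing structure" step (CONTAINING_RECORD in the WDK). Below is a user-mode sketch of that idea. The `mock_volume` layout is a hypothetical stand-in, not the real FLT_VOLUME; the 0x10-byte header is only inferred from the dump, where each PrimaryLink value points 0x10 past the start of the neighbouring FLT_VOLUME.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's LIST_ENTRY: two pointers. */
typedef struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
} list_entry;

/* Hypothetical stand-in for the start of FLT_VOLUME: a header the size
   of two pointers, followed by the list linkage. */
typedef struct mock_volume {
    unsigned char header[2 * sizeof(void *)];
    list_entry primary_link;
    const char *name;
} mock_volume;

/* Same idea as the WDK CONTAINING_RECORD macro. */
#define CONTAINER_OF(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* Recover the owning structure from a pointer to its list linkage. */
static mock_volume *volume_from_link(list_entry *link)
{
    assert(link != NULL);
    return CONTAINER_OF(link, mock_volume, primary_link);
}

/* What a decompiler shows as Flink[-1]: one LIST_ENTRY size backwards. */
static void *flink_minus_one(list_entry *link)
{
    assert(link != NULL);
    return (char *)link - sizeof(list_entry);
}
```

In this model both helpers return the same address, which is why Flink[-1] can be read as "the FLT_VOLUME that owns this list entry".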

!fltkd.volumes

Volume List: ffffaa8b881d19b0 "Frame 1" 
   FLT_VOLUME: ffffaa8b881fe010 "\Device\Mup"
      FLT_INSTANCE: ffffaa8b881f8c70 "Our Minifilter Instance" "313234"
   FLT_VOLUME: ffffaa8b881fd010 "\Device\NamedPipe"
   FLT_VOLUME: ffffaa8b881fc010 "\Device\Mailslot"
   FLT_VOLUME: ffffaa8b881fb010 "\Device\HarddiskVolume1"
      FLT_INSTANCE: ffffaa8b881f6840 "Our Minifilter Instance" "313234"
   FLT_VOLUME: ffffaa8b881fa010 "\Device\HarddiskVolume7"
      FLT_INSTANCE: ffffaa8b881f3c70 "Our Minifilter Instance" "313234"
   FLT_VOLUME: ffffaa8b881f9010 "\Device\HarddiskVolume5"
      FLT_INSTANCE: ffffaa8b881f34c0 "Our Minifilter Instance" "313234"
   FLT_VOLUME: ffffaa8b881f8010 "\Device\HarddiskVolume3"
      FLT_INSTANCE: ffffaa8b881f0c70 "Our Minifilter Instance" "313234"
   FLT_VOLUME: ffffaa8b881f7010 "\Device\HarddiskVolume2"
      FLT_INSTANCE: ffffaa8b881f04c0 "Our Minifilter Instance" "313234"
   FLT_VOLUME: ffffaa8b881f6010 "\Device\Tape1"
      FLT_INSTANCE: ffffaa8b881dc490 "Our Minifilter Instance" "313234"
   FLT_VOLUME: ffffaa8b881f5010 "\Device\Tape0"
Volume List: ffffaa8b83bea380 "Frame 0" 
   FLT_VOLUME: ffffaa8b84680010 "\Device\Mup"
   FLT_VOLUME: ffffaa8b84bfc010 "\Device\HarddiskVolume3"
   FLT_VOLUME: ffffaa8b84866010 "\Device\HarddiskVolume2"
      FLT_INSTANCE: ffffaa8b84463c50 "VirtFile Instance" "280700"
      FLT_INSTANCE: ffffaa8b873a54b0 "luafv" "135000"
      FLT_INSTANCE: ffffaa8b84a69b90 "Wof Instance" "40700"
   FLT_VOLUME: ffffb284cc2d07e0 "\Device\NamedPipe"
      FLT_INSTANCE: ffffaa8b84a61010 "npsvctrig" "46000"
   FLT_VOLUME: ffffb284cc2cf7e0 "\Device\Mailslot"
   FLT_VOLUME: ffffaa8b870e1010 "\Device\HarddiskVolume5"
      FLT_INSTANCE: ffffaa8b87174490 "VirtFile Instance" "280700"
   FLT_VOLUME: ffffaa8b8717f4c0 "\Device\HarddiskVolume7"
   FLT_VOLUME: ffffaa8b81aa6010 "\Device\HarddiskVolume1"

Another thing I noticed, indeed just like this question:

https://community.osr.com/discussion/274512/flt-volume-with-a-null-framezerovolume

Only the FLT_VOLUMEs for \Device\TapeX (Tape1 in this case) seem to have an empty FrameZeroVolume.

Why is VolumeInNextFrame empty for \Device\Tape1? And what kind of device does \Device\Tape even represent? I have never seen one before.

I assume one solution is for us to not attach to tape devices?

   FLT_VOLUME: ffffaa8b881f6010 "\Device\Tape1"
      FLT_OBJECT: ffffaa8b881f6010  [04000000] Volume
         RundownRef               : 0x0000000000000006 (3)
         PointerCount             : 0x00000001 
         PrimaryLink              : [ffffaa8b881f5020-ffffaa8b881f7020] 
      Frame                    : ffffaa8b881d1880 "Frame 1" 
      Flags                    : [00000044] SetupNotifyCalled FilterAttached
      FileSystemType           : [00000001] FLT_FSTYPE_RAW
      VolumeLink               : [ffffaa8b881f5020-ffffaa8b881f7020] 
      DeviceObject             : ffffaa8b881fa9b0 
      DiskDeviceObject         : ffffaa8b84b5b060 
      FrameZeroVolume          : 0000000000000000 
      VolumeInNextFrame        : 0000000000000000 
      Guid                     : "" 
      CDODeviceName            : "\Device\RawDisk" 
      CDODriverName            : "\FileSystem\RAW" 
      TargetedOpenCount        : 0 
      Callbacks                : (ffffaa8b881f6130)
      ContextLock              : (ffffaa8b881f6518)
      VolumeContexts           : (ffffaa8b881f6520)
Could not get field size for CONTEXT_LIST_LLCACHE_ENTRY, assuming old version
  Count=0
      StreamListCtrls          : (ffffaa8b881f6528)  rCount=0 
      FileListCtrls            : (ffffaa8b881f65a8)  rCount=0 
      NameCacheCtrl            : (ffffaa8b881f6628)
      InstanceList             : (ffffaa8b881f60b0)
         FLT_INSTANCE: ffffaa8b881dc490 "Our Minifilter Instance" "313234"
   ---
   FLT_VOLUME: ffffaa8b881f5010 "\Device\Tape0"
      FLT_OBJECT: ffffaa8b881f5010  [04000000] Volume
         RundownRef               : 0x0000000000000002 (1)
         PointerCount             : 0x00000001 
         PrimaryLink              : [ffffaa8b881d19b0-ffffaa8b881f6020] 
      Frame                    : ffffaa8b881d1880 "Frame 1" 
      Flags                    : [00000004] SetupNotifyCalled
      FileSystemType           : [00000001] FLT_FSTYPE_RAW
      VolumeLink               : [ffffaa8b881d19b0-ffffaa8b881f6020] 
      DeviceObject             : ffffaa8b881f9df0 
      DiskDeviceObject         : ffffaa8b84b5d060 
      FrameZeroVolume          : 0000000000000000 
      VolumeInNextFrame        : 0000000000000000 
      Guid                     : "" 
      CDODeviceName            : "\Device\RawDisk" 
      CDODriverName            : "\FileSystem\RAW" 
      TargetedOpenCount        : 0 
      Callbacks                : (ffffaa8b881f5130)
      ContextLock              : (ffffaa8b881f5518)
      VolumeContexts           : (ffffaa8b881f5520)
Could not get field size for CONTEXT_LIST_LLCACHE_ENTRY, assuming old version
  Count=0
      StreamListCtrls          : (ffffaa8b881f5528)  rCount=0 
      FileListCtrls            : (ffffaa8b881f55a8)  rCount=0 
      NameCacheCtrl            : (ffffaa8b881f5628)
      InstanceList             : (ffffaa8b881f50b0)
   ---

So, this system has a legacy file system filter AND is using tape drives? That’s a pretty unique configuration…

You need to look at the file system stack to find the legacy filter(s). I think this should do it:

!devstack ffffaa8b881f9df0

If there’s anything there other than FltMgr then it’s a legacy filter.

The frame layering is based on the install class of the legacy filter and the altitudes of the minifilters.

But why is FrameZeroVolume NULL in the FLT_VOLUME for that tape device object?
FltMgr seems to always assume FrameZeroVolume is non-NULL when attaching a minifilter, so I assume this is a bug in the driver for that tape device, right? And I guess the only way to avoid this bugcheck is to not attach to FLT_FSTYPE_RAW volumes?

Interestingly, !devstack on the device object of the \Device\Tape0 or Tape1 volume returns just two drivers, FltMgr and \FileSystem\RAW:

 !devstack ffffaa8b881f9df0
  !DevObj           !DrvObj            !DevExt           ObjectName
  ffffaa8b881f9df0  \FileSystem\FltMgr ffffaa8b881f9f40  
  ffffb284cd161cf0  \FileSystem\RAW    ffffb284cd161e40  

Interesting. Can you make the crash dump available? I don’t have tape drives around anymore (in a previous decade/life I ported LTFS to Windows and had plenty of them around… Interesting project!) so I can’t confirm/deny what you’re seeing.

I will say that it’s entirely possible that there’s a bug in FltMgr’s handling of this incredibly unusual situation. It’s also possible that it’s a red herring and there’s a bug elsewhere. Without being able to dig I can’t say for certain.

Also, for completeness: if simply ignoring these drives makes your problem go away I’m not going to argue that you’re doing something wrong by ignoring the raw filesystem. As I noted in the other thread lots of filters are fine doing this (and possibly a big reason why this issue is not widespread!)
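
For reference, "ignoring the raw filesystem" is normally done in the minifilter's InstanceSetup callback, which receives the volume's file system type and can return STATUS_FLT_DO_NOT_ATTACH to decline the attachment. The sketch below models only that decision in user mode. The type and constant names are stand-ins so it builds without the WDK (in a driver they would be FLT_FILESYSTEM_TYPE, FLT_FSTYPE_RAW and the NTSTATUS codes), and the thread does not establish whether the faulting dereference happens before or after InstanceSetup runs, so treat this as an untested workaround rather than a confirmed fix.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for NTSTATUS so the sketch builds outside the WDK. */
typedef uint32_t status_t;
#define SKETCH_STATUS_SUCCESS           0x00000000u
#define SKETCH_STATUS_FLT_DO_NOT_ATTACH 0xC01C000Fu

/* Stand-in for the first few FLT_FILESYSTEM_TYPE values. */
typedef enum fs_type {
    SKETCH_FSTYPE_UNKNOWN = 0,
    SKETCH_FSTYPE_RAW     = 1, /* matches "[00000001] FLT_FSTYPE_RAW" in the dump */
    SKETCH_FSTYPE_NTFS    = 2,
    SKETCH_FSTYPE_FAT     = 3
} fs_type;

/* Decision an InstanceSetup callback could make from its
   VolumeFilesystemType argument: decline raw volumes, accept the rest. */
static status_t instance_setup_decision(fs_type volume_fs_type)
{
    if (volume_fs_type == SKETCH_FSTYPE_RAW) {
        return SKETCH_STATUS_FLT_DO_NOT_ATTACH;
    }
    return SKETCH_STATUS_SUCCESS;
}
```

Both tape volumes in the dump report FLT_FSTYPE_RAW, so a check like this would skip them while still attaching to the NTFS and FAT volumes.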

Unfortunately I’m not allowed to share any crash dump :(

But the biggest question is: why is FrameZeroVolume zero for that particular FLT_VOLUME?
Who is responsible for filling in this value? Is it set solely by FltMgr, or do the developers of the tape device driver need to call an API that sets it? How can I find the root cause of this value being zero?

It seems that FrameZeroVolume is basically a pointer to the corresponding FLT_VOLUME in frame 0 for that device.
For example, FrameZeroVolume for the FLT_VOLUME of “\Device\HarddiskVolume3” in frame 1 points to the FLT_VOLUME of “\Device\HarddiskVolume3” in frame 0. But for some strange reason, this tape device doesn’t have a corresponding FLT_VOLUME in frame 0. Why?!
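
The missing counterparts can be read straight off the !fltkd.volumes output above. As a sanity check, this user-mode sketch compares the two name lists (copied from that output, in the listed order) and reports the frame 1 volumes that have no frame 0 entry with the same name. Matching by device name is an assumption made for illustration; it says nothing about how FltMgr actually links the structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Volume names copied from the !fltkd.volumes output. */
static const char *const FRAME1_VOLUMES[] = {
    "\\Device\\Mup", "\\Device\\NamedPipe", "\\Device\\Mailslot",
    "\\Device\\HarddiskVolume1", "\\Device\\HarddiskVolume7",
    "\\Device\\HarddiskVolume5", "\\Device\\HarddiskVolume3",
    "\\Device\\HarddiskVolume2", "\\Device\\Tape1", "\\Device\\Tape0",
};
static const char *const FRAME0_VOLUMES[] = {
    "\\Device\\Mup", "\\Device\\HarddiskVolume3", "\\Device\\HarddiskVolume2",
    "\\Device\\NamedPipe", "\\Device\\Mailslot", "\\Device\\HarddiskVolume5",
    "\\Device\\HarddiskVolume7", "\\Device\\HarddiskVolume1",
};
#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Returns 1 if name appears in list, 0 otherwise. */
static int has_name(const char *const *list, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Stores up to cap names from upper that are absent from lower into out
   and returns how many were absent in total. */
static size_t find_missing_counterparts(const char *const *upper, size_t n_upper,
                                        const char *const *lower, size_t n_lower,
                                        const char **out, size_t cap)
{
    size_t missing = 0;
    assert(out != NULL || cap == 0);
    for (size_t i = 0; i < n_upper; i++) {
        if (!has_name(lower, n_lower, upper[i])) {
            if (missing < cap) {
                out[missing] = upper[i];
            }
            missing++;
        }
    }
    return missing;
}
```

Run against the two lists, the only frame 1 volumes without a frame 0 counterpart are \Device\Tape1 and \Device\Tape0, the same two volumes whose FrameZeroVolume is zero.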

It’s a field in an internal FltMgr structure, so no idea why it would or wouldn’t be set…If you want to spelunk further you can use access breakpoints (ba w8 ) to see how/where it’s set in the case of the volume and compare that to the tape drive.

Alright, so this field tells FltMgr which FLT_VOLUME in frame 0 corresponds to this FLT_VOLUME. The confusing part is why there is no corresponding frame 0 FLT_VOLUME for the \Device\TapeX FLT_VOLUMEs. I thought that each real volume gets a corresponding FLT_VOLUME in every frame…

It almost feels like a bug in FltMgr itself. Why is the filter manager creating a FLT_VOLUME for these tape devices in frame 1 but not in frame 0?! I thought that when a frame split happens, FltMgr creates a new FLT_VOLUME for each volume in frame 0.

Also, what steps are required to find the root cause of the frame split (the legacy filter driver)? That is, given a dump file, what do you usually do to find the legacy driver that caused the split? Surely the split can’t be caused by \FileSystem\RAW.

Right now the split seems to happen at the VirtFile FLT_FILTER’s altitude: frame 0 covers altitudes from 0 up to 280700 (VirtFile’s altitude), and the second frame covers everything above 280700. But VirtFile looks like a minifilter (it has instances and an altitude, so surely it isn’t a legacy filter, right?), so why is the split happening at VirtFile’s altitude?!

@brad_H said:

It almost feels like a bug in FltMgr itself.

Entirely possible/likely. This configuration is pretty “out there” so I wouldn’t be surprised if there was a bug in FltMgr.

@brad_H said:
Also what are the steps that are required to find the root cause of the frame split (the legacy filter driver) ? Meaning when you have a dump file, what do you usually do to find the legacy driver that caused the split? Surely the split can’t be happening because of \Filesystem\raw

On a live system you could try blocking legacy file system filters and then checking the event log:

https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/blocking-file-system-filter-drivers

I don’t have a system with a legacy filter, so I’m taking shots in the dark… But legacy filters call IoRegisterFsRegistrationChange and get put on a list somewhere so they can be notified when file systems register (via IoRegisterFileSystem). So I did a quick disassembly to see if I could find the list of registered file system filters:

6: kd> vertarget
Windows 10 Kernel Version 22000 MP (16 procs) Free x64
Edition build lab: 22000.1.amd64fre.co_release.210604-1628
Machine Name:
Kernel base = 0xfffff801`69c16000 PsLoadedModuleList = 0xfffff801`6a83fb60
Debug session time: Wed May 10 17:19:53.124 2023 (UTC - 4:00)
System Uptime: 0 days 0:08:54.803

6: kd> .asm no_code_bytes
Assembly options: no_code_bytes

6: kd> uf IoRegisterFsRegistrationChange 
nt!IoRegisterFsRegistrationChange:
fffff801`6a54e260 sub     rsp,28h
fffff801`6a54e264 xor     r8d,r8d
fffff801`6a54e267 call    nt!IoRegisterFsRegistrationChangeMountAware (fffff801`6a469520)
fffff801`6a54e26c add     rsp,28h
fffff801`6a54e270 ret

6: kd> uf IoRegisterFsRegistrationChangeMountAware
nt!IoRegisterFsRegistrationChangeMountAware:
<clip>
nt!IoRegisterFsRegistrationChangeMountAware+0x72:
fffff801`6a469592 lea     r15,[nt!IopFsNotifyChangeQueueHead (fffff801`6a85cf90)]
fffff801`6a469599 cmp     qword ptr [nt!IopFsNotifyChangeQueueHead (fffff801`6a85cf90)],r15
fffff801`6a4695a0 je      nt!IoRegisterFsRegistrationChangeMountAware+0xce (fffff801`6a4695ee)  Branch
<clip>

Assuming IopFsNotifyChangeQueueHead is “The List” I dumped it out and poked around with the results. In the end I made this command to dump the currently registered file system filters:

6: kd> !list -x ".if (@$extret != nt!IopFsNotifyChangeQueueHead) {dps @$extret L4 ; !drvobj poi(@$extret+10)}" nt!IopFsNotifyChangeQueueHead 

ffffe78e`d02a84d0  fffff801`6a85cf90 nt!IopFsNotifyChangeQueueHead
ffffe78e`d02a84d8  fffff801`6a85cf90 nt!IopFsNotifyChangeQueueHead
ffffe78e`d02a84e0  ffffb38f`285f28e0
ffffe78e`d02a84e8  fffff801`6af58e50 FLTMGR!FltpFsNotification
Driver object (ffffb38f285f28e0) is for:
 \FileSystem\FltMgr

Driver Extension List: (id , addr)

Device Object list:
ffffb38f332ec9f0  ffffb38f2c079b80  ffffb38f28fc0b50  ffffb38f28fc5ba0
ffffb38f28fc9920  ffffb38f287718d0  ffffb38f28793b80  ffffb38f287938d0
ffffb38f28852cf0  ffffb38f2884cd40  ffffb38f28414d30  ffffb38f285f3c60
ffffb38f285f4ca0  ffffb38f285f2d20  ffffb38f285f2af0  

(Note that !drvobj takes forever for some reason so be patient.)

I’d expect that if there’s a legacy filter, there would be something else on that list (assuming it’s still registered).
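
For anyone unfamiliar with what the !list command above is doing, here is a user-mode model of the same walk. The entry layout (list linkage, then a driver object pointer at +0x10, then a notification routine at +0x18) is only inferred from the `dps @$extret L4` output and is not a documented structure; the type and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's LIST_ENTRY. */
typedef struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
} list_entry;

/* Layout inferred from the dps output: Flink, Blink, driver object,
   notification routine. Hypothetical, for illustration only. */
typedef struct notify_entry {
    list_entry link;            /* +0x00 on x64 */
    const void *driver_object;  /* +0x10, what poi(@$extret+10) reads */
    const void *notify_routine; /* +0x18 */
} notify_entry;

/* Walks the circular list starting after the head, stores up to cap
   driver object pointers in out, and returns the number of entries. */
static size_t collect_registered(list_entry *head, const void **out, size_t cap)
{
    size_t count = 0;
    assert(head != NULL);
    for (list_entry *e = head->flink; e != head; e = e->flink) {
        /* link is the first member, so the entry starts at e. */
        const notify_entry *ne = (const notify_entry *)e;
        if (count < cap) {
            out[count] = ne->driver_object;
        }
        count++;
    }
    return count;
}
```

The debugger command visits the list head as well, which is why it wraps its body in `.if (@$extret != nt!IopFsNotifyChangeQueueHead)`; the loop above gets the same effect by starting at `head->flink` and stopping when it returns to the head.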

@“Scott_Noone_(OSR)” said:
6: kd> !list -x ".if (@$extret != nt!IopFsNotifyChangeQueueHead) {dps @$extret L4 ; !drvobj poi(@$extret+10)}" nt!IopFsNotifyChangeQueueHead

Thanks for this. I tried it on multiple machines and so far it has worked like a charm.

Nice! Did it actually show the legacy filter?

Yes, it was a third-party app that was not fully removed; removing it solved the problem.

One interesting observation: the frame split seems to be based on the load order group of the legacy filter. For example, if the group is “FSFilter Continuous Backup” and there are minifilters with both higher and lower altitudes than that group, it will cause a split (which makes sense). In this case the minifilters above the legacy filter get their own frame (which is similar to another legacy filter), and the same goes for the lower minifilters.

The funny part is that if the legacy filter doesn’t have a group in its service key (that is, in its INF file), Windows gives it a seemingly random altitude upon first installation (I think it was ~280000), which most of the time causes a frame split. BUT after the first reboot there is no longer a split, since Windows now seems to give it the highest altitude possible in order to avoid the split! Not sure what is happening exactly.
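
A rough way to picture the observation above is as a simplified model in which each legacy filter sitting below a minifilter's altitude pushes that minifilter one frame higher. This is only a model of what was observed in this thread, not FltMgr's actual algorithm, and the legacy altitude used in the usage note below is a made-up value.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: the frame index of a minifilter is the number of
   legacy filters positioned below its altitude. Altitudes are treated
   as plain numbers; each legacy filter is assumed to sit at a distinct
   position. */
static unsigned frame_for_altitude(double minifilter_altitude,
                                   const double *legacy_altitudes,
                                   size_t legacy_count)
{
    unsigned frame = 0;
    assert(legacy_altitudes != NULL || legacy_count == 0);
    for (size_t i = 0; i < legacy_count; i++) {
        if (legacy_altitudes[i] < minifilter_altitude) {
            frame++;
        }
    }
    return frame;
}
```

With one hypothetical legacy filter at 300000, a minifilter at 280700 lands in frame 0 and one at 313234 lands in frame 1, which mirrors the layout in the !fltkd.filters output. With no legacy filters at all, everything stays in frame 0.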
