Intermittent File System Driver Unload Failure on Windows 8.1

We have a File System (not a filter) which has supported driver
unload for many years now, but with Windows 8.1 we seem to be
seeing intermittent driver unload failures. I suspect the
problem is possible on other versions of Windows, perhaps all,
but that’s just a guess.

This condition caused me to question my understanding of what
is necessary to get a File System driver to unload (and in
particular whether it differs from other drivers).
My current understanding leads me to believe that the state
of the device and driver objects in the windbg output below
indicates that the driver is in a state to unload, but I
suspect I have something new to learn here and we’ve just
been lucky that this has worked up to now.

So my question is whether anything in the state of the device
object or driver object below indicates a reason for the driver
not to unload. The only non-zero references are the
“generic object” (PointerCount) references; the device object
ReferenceCount is zero. I thought that, together with
DOE_UNLOAD_PENDING|DOE_DELETE_PENDING being set, was enough to
cause my “DriverUnload” handler to be called, but that is not
occurring.

=================================================================
0: kd> !devobj 0xffffe00000a45bd0
Device object (ffffe00000a45bd0) is for:
cvfs \FileSystem\Cvfs DriverObject ffffe00003d0ee60
Current Irp 00000000 RefCount 0 Type 00000008 Flags 00000040
Dacl ffffc101001aeed0 DevExt ffffe00000a45d20 DevObjExt ffffe00000a45f98
ExtensionFlags (0x00000803) DOE_UNLOAD_PENDING, DOE_DELETE_PENDING,
DOE_DEFAULT_SD_PRESENT
Characteristics (0000000000)
Device queue is not busy.
0: kd> !object 0xffffe00000a45bd0
Object: ffffe00000a45bd0 Type: (ffffe000001bac60) Device
ObjectHeader: ffffe00000a45ba0 (new version)
HandleCount: 0 PointerCount: 1
Directory Object: 00000000 Name: cvfs
0: kd> !drvobj ffffe00003d0ee60
Driver object (ffffe00003d0ee60) is for:
\FileSystem\Cvfs
Driver Extension List: (id , addr)

Device Object list:
ffffe00000a45bd0
0: kd> !object ffffe00003d0ee60
Object: ffffe00003d0ee60 Type: (ffffe000001bab00) Driver
ObjectHeader: ffffe00003d0ee30 (new version)
HandleCount: 0 PointerCount: 3
Directory Object: ffffc0000007b2e0 Name: Cvfs
0: kd> dt DEVICE_OBJECT 0xffffe00000a45bd0
cvfs!DEVICE_OBJECT
+0x000 Type : 0n3
+0x002 Size : 0x3c8
+0x004 ReferenceCount : 0n0
+0x008 DriverObject : 0xffffe00003d0ee60 _DRIVER_OBJECT
+0x010 NextDevice : (null)
+0x018 AttachedDevice : (null)
+0x020 CurrentIrp : (null)
+0x028 Timer : (null)
+0x030 Flags : 0x40
+0x034 Characteristics : 0
+0x038 Vpb : (null)
+0x040 DeviceExtension : 0xffffe00000a45d20 Void
+0x048 DeviceType : 8
+0x04c StackSize : 1 ‘’
+0x050 Queue :
+0x098 AlignmentRequirement : 0
+0x0a0 DeviceQueue : _KDEVICE_QUEUE
+0x0c8 Dpc : _KDPC
+0x108 ActiveThreadCount : 0
+0x110 SecurityDescriptor : 0xffffc0000008ee30 Void
+0x118 DeviceLock : _KEVENT
+0x130 SectorSize : 0x200
+0x132 Spare1 : 1
+0x138 DeviceObjectExtension : 0xffffe00000a45f98 _DEVOBJ_EXTENSION
+0x140 Reserved : (null)
0: kd> dt DRIVER_OBJECT 0xffffe00003d0ee60
cvfs!DRIVER_OBJECT
+0x000 Type : 0n4
+0x002 Size : 0n336
+0x008 DeviceObject : 0xffffe00000a45bd0 _DEVICE_OBJECT
+0x010 Flags : 0x92
+0x018 DriverStart : 0xfffff80003a00000 Void
+0x020 DriverSize : 0x29a9000
+0x028 DriverSection : 0xffffe00001bdc120 Void
+0x030 DriverExtension : 0xffffe00003d0efb0 _DRIVER_EXTENSION
+0x038 DriverName : _UNICODE_STRING "\FileSystem\Cvfs"
+0x048 HardwareDatabase : 0xfffff80192722580 _UNICODE_STRING "\REGISTRY\MACHINE\HARDWARE\DESCRIPTION\SYSTEM"
+0x050 FastIoDispatch : 0xfffff8000639a578 _FAST_IO_DISPATCH
+0x058 DriverInit : 0xfffff800063a3064 long cvfs!GsDriverEntry+0
+0x060 DriverStartIo : (null)
+0x068 DriverUnload : 0xfffff80003b04590 void cvfs!CvNtDriverUnload+0
+0x070 MajorFunction : [28] 0xfffff80003b144d0 long cvfs!CvCreateDispatch+0
=================================================================

Assuming the above indicates the device/driver objects are in the
right state to unload, perhaps the following is relevant.

Traces indicate this problem occurs when we get an IRP_MN_MOUNT_VOLUME
call from Windows (in this case for a partition that is not ours so
we just return STATUS_UNRECOGNIZED_VOLUME) when we are in the middle
of the unload process (in particular, we are in the middle of a CLOSE
dispatch which has called IoUnregisterFileSystem, which I suspect is
the key to getting out of the file system lists that result in the
IRP_MN_MOUNT_VOLUME queries).

One of the possible issues here is that our driver should be doing
some operation that causes these IRP_MN_MOUNT_VOLUME requests to
quiesce, but I am unaware of any such mechanism. The ReferenceCount
on the device object at the time of the IRP_MN_MOUNT_VOLUME appears
to be 2 and I’m suspecting that the decrement of ReferenceCount to
zero is in the thread issuing the IRP_MN_MOUNT_VOLUME but I could be
wrong about that.

Thanks,

Eric

Just to highlight what is the core preliminary question from the above about the unloading
of a File System device driver and the state of the DEVICE_OBJECT representing that
File System:

– Does the “PointerCount” in the “generic object” have relevance to determining whether
a driver can be unloaded or is only the “ReferenceCount” of the DEVICE_OBJECT used
(together with particular devobj/devext flags)?

Note that the DEVICE_OBJECT this refers to is of type FILE_DEVICE_DISK_FILE_SYSTEM.
Also note that there are no mounted file systems at the time of this event; we are simply
dealing with the “control device” created in the DriverEntry routine.

I believe the answer is that the “PointerCount” does not contribute to the “gating conditions”
for a driver unload. If I’m wrong about that, I’ve just got to figure out why the PointerCount is
high on the DEVICE_OBJECT for the driver. But if I’m correct, then please consider the detailed
description of the DRIVER_OBJECT/DEVICE_OBJECT state above, with the following question
in mind:

– What in the above windbg dump points to a state in the DRIVER_OBJECT and/or DEVICE_OBJECT
that would prevent the driver from unloading?

My understanding is that the state indicates the driver should be unloadable at this point,
but for some reason the registered “DriverUnload” procedure is not being called.

At this point I’m just looking for a confirmation/correction of my current understanding (although
other insights are always welcome!).

Thanks in advance for any insight you have on this,

Eric

In general, the I/O Manager’s reference count (DevObj->ReferenceCount) is
what gates DriverUnload being called. However, I believe that FS drivers are
treated specially: you’re not going to be unloaded until the last device
object goes away. You should be able to confirm this by looking in the
debugger in your working case.

If I were you, I’d try to find out what that last Ob reference on your
device object is (i.e. Pointer Count in !object output). You can use
!obtrace and GFlags to help find these:

http://msdn.microsoft.com/en-us/library/windows/hardware/ff564594(v=vs.85).aspx

-scott
@OSRDrivers

> In general, the I/O Manager’s reference count (DevObj->ReferenceCount) is
> what gates DriverUnload being called.

Yes.

And I would also say I NEVER managed to make an NT4-style block device driver or FSD really unloadable without bugs and issues.

With PnP (w2k+) and a block device, this is simple.

And if we are talking about an FSD without the underlying standard block device and a network login, then implement it as a FltMgr minifilter plus namespace provider, mapping your FS to a subdirectory of the main NTFS volume.

This is how the WIM mounter works in Windows. Moreover, the WIM mounter has a tiny kernel-mode part (a FltMgr minifilter) and a user process that does all the work, so you can reuse the architecture.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

Thanks for the suggestion Scott. Based on that I gathered traces of the state of the devobj at the time of my “last shot” at it (when we’re in the last close and attempting to set up the conditions to allow the unload handler to be called).

The traces included the “PointerCount” as well as the “ReferenceCount” for the devobj. There is no difference in the values of “PointerCount” between the successful case and the failing case, only a difference in the “ReferenceCount”, even though the “ReferenceCount” is zero when I ultimately look at the dump after the failed unload.

I also did the !obtrace comparison and the cause of the difference between the two was the very last trace in the successful unload which was the following (and which did not appear in the !obtrace of the failing case):

4b797 -1 Dflt nt! ?? ::FNODOBFM::`string'+95ef
nt!IopCompleteUnloadOrDelete+a0
nt!IopDecrementDeviceObjectRef+ee
nt!IopDeleteFile+19b
nt!ObpRemoveObjectRoutine+64
nt!ObfDereferenceObjectWithTag+8f
nt!ObCloseHandleTableEntry+33f
nt!ExSweepHandleTable+ba
nt!ObKillProcess+31
nt!PspRundownSingleProcess+a4
nt!PspExitThread+4c8
nt!NtTerminateProcess+fd
nt!KiSystemServiceCopyEnd+13

I do not have access to the Windows Kernel source code, but this appears to be fairly normal handling of the devobj “ReferenceCount” going to zero and the unload handler being called. More specifically, I’m expecting (but do not know with certainty) that this is just the result of the processing right after the completion of the IRP_MJ_CLOSE dispatch into our file system, which set up the conditions for the unload to proceed.

If so, the reason this does not appear in the failing trace is that the devobj ReferenceCount is still high at that point (as the traces did indicate). The ReferenceCount (rather than the PointerCount) is later decremented to zero, but the processing that did that decrement did not result in the ultimate call to IopCompleteUnloadOrDelete to complete the unload processing. The key suspect in this regard is the thread doing the IRP_MN_MOUNT_VOLUME, as mentioned in the previous descriptions.

Thanks for these suggestions to help me dig deeper. This gives me stronger reason to believe that we need to look somewhere other than the maintenance of the “PointerCount” associated with the devobj for the reason why the Windows Kernel is not completing the driver unload.

Maxim:

Thanks for the alternative. Ultimately we might need to use it, and it’s always great to know what else we might wish to consider. I still have a lot to learn about the implications of that type of restructuring relative to our current implementation (read as: I’m totally ignorant on that approach, and I might need to eradicate that ignorance…).

At this point, I’m still hoping to find out what’s going on in the current structure, especially as our driver has been able to unload for 6+ years and we rely on that for software upgrades without a reboot. With that being the case, and with thousands of customers not having problems, I’m probably going to favor seeing if this can be run to ground as a first try, especially as we’ve only run into this with a “stress test” and even then only on Windows 8.1 (strangely, 2012 R2 works just fine, as has every other version of Windows we’ve run this on: XP, 2003 and up).

But thanks again for your input, as that might ultimately be the route we need to take,

Eric