Intermittent File System Driver Unload Failure on Windows 8.1

Eric_Berge Member Posts: 30
We have a File System (not a filter) which has supported driver
unload for many years now, but with Windows 8.1 we seem to be
seeing intermittent driver unload failures. I suspect the
problem is possible on other versions of Windows, perhaps all,
but that's just a guess.

This condition caused me to question my understanding of what
is necessary to get a File System driver to unload (and in
particular whether it is different than other drivers).
My current understanding would lead me to believe that the state
of the device and driver objects in the windbg output below
indicates that the driver is in a state to unload, but I
suspect I have something new to learn here and that we've just
been lucky that this has worked up to now.

So my question is whether anything in the state of the device
object or driver object below indicates a reason for the driver
to not unload. The only references that are non-zero
are the "generic object" (PointerCount) references. The
device object ReferenceCount is zero, and I thought that, together
with DOE_UNLOAD_PENDING|DOE_DELETE_PENDING being set, was
enough to cause my "DriverUnload" handler to be called, but
that is not occurring.
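
For readers following along, here is a minimal sketch in user-mode C of how the
ExtensionFlags value 0x00000803 from the !devobj output below breaks down into
the three flags the debugger prints. The numeric bit values are an assumption
taken from commonly published NT header definitions, not from this thread, and
the helper names (decode_doe_flags, teardown_pending) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed bit values; they are consistent with 0x803 decoding to the
 * three names shown by !devobj, but treat them as illustrative. */
#define DOE_UNLOAD_PENDING      0x00000001u
#define DOE_DELETE_PENDING      0x00000002u
#define DOE_REMOVE_PENDING      0x00000004u
#define DOE_REMOVE_PROCESSED    0x00000008u
#define DOE_START_PENDING       0x00000010u
#define DOE_DEFAULT_SD_PRESENT  0x00000800u

struct doe_name {
    unsigned int bit;
    const char *name;
};

static const struct doe_name doe_names[] = {
    { DOE_UNLOAD_PENDING,     "DOE_UNLOAD_PENDING" },
    { DOE_DELETE_PENDING,     "DOE_DELETE_PENDING" },
    { DOE_REMOVE_PENDING,     "DOE_REMOVE_PENDING" },
    { DOE_REMOVE_PROCESSED,   "DOE_REMOVE_PROCESSED" },
    { DOE_START_PENDING,      "DOE_START_PENDING" },
    { DOE_DEFAULT_SD_PRESENT, "DOE_DEFAULT_SD_PRESENT" },
};

/* Hypothetical helper: writes the space-separated names of the set
 * flags into out and returns any bits that have no name in the table. */
unsigned int decode_doe_flags(unsigned int flags, char *out, size_t out_len)
{
    unsigned int unknown = flags;
    size_t i;

    assert(out != NULL && out_len > 0);
    out[0] = '\0';
    for (i = 0; i < sizeof doe_names / sizeof doe_names[0]; i++) {
        if ((flags & doe_names[i].bit) != 0) {
            if (out[0] != '\0')
                strncat(out, " ", out_len - strlen(out) - 1);
            strncat(out, doe_names[i].name, out_len - strlen(out) - 1);
            unknown &= ~doe_names[i].bit;
        }
    }
    return unknown;
}

/* Hypothetical helper: nonzero when any flag requesting teardown of
 * the device object is set. */
int teardown_pending(unsigned int flags)
{
    return (flags & (DOE_UNLOAD_PENDING | DOE_DELETE_PENDING |
                     DOE_REMOVE_PENDING)) != 0;
}
```

Calling decode_doe_flags(0x803, buf, sizeof buf) with a 128-byte buffer leaves no
unnamed bits, and teardown_pending(0x803) is nonzero, which matches the reading
that an unload and a delete are both pending on the control device.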


=================================================================
0: kd> !devobj 0xffffe00000a45bd0
Device object (ffffe00000a45bd0) is for:
cvfs \FileSystem\Cvfs DriverObject ffffe00003d0ee60
Current Irp 00000000 RefCount 0 Type 00000008 Flags 00000040
Dacl ffffc101001aeed0 DevExt ffffe00000a45d20 DevObjExt ffffe00000a45f98
ExtensionFlags (0x00000803) DOE_UNLOAD_PENDING, DOE_DELETE_PENDING,
DOE_DEFAULT_SD_PRESENT
Characteristics (0000000000)
Device queue is not busy.
0: kd> !object 0xffffe00000a45bd0
Object: ffffe00000a45bd0 Type: (ffffe000001bac60) Device
ObjectHeader: ffffe00000a45ba0 (new version)
HandleCount: 0 PointerCount: 1
Directory Object: 00000000 Name: cvfs
0: kd> !drvobj ffffe00003d0ee60
Driver object (ffffe00003d0ee60) is for:
\FileSystem\Cvfs
Driver Extension List: (id , addr)

Device Object list:
ffffe00000a45bd0
0: kd> !object ffffe00003d0ee60
Object: ffffe00003d0ee60 Type: (ffffe000001bab00) Driver
ObjectHeader: ffffe00003d0ee30 (new version)
HandleCount: 0 PointerCount: 3
Directory Object: ffffc0000007b2e0 Name: Cvfs
0: kd> dt DEVICE_OBJECT 0xffffe00000a45bd0
cvfs!DEVICE_OBJECT
+0x000 Type : 0n3
+0x002 Size : 0x3c8
+0x004 ReferenceCount : 0n0
+0x008 DriverObject : 0xffffe000`03d0ee60 _DRIVER_OBJECT
+0x010 NextDevice : (null)
+0x018 AttachedDevice : (null)
+0x020 CurrentIrp : (null)
+0x028 Timer : (null)
+0x030 Flags : 0x40
+0x034 Characteristics : 0
+0x038 Vpb : (null)
+0x040 DeviceExtension : 0xffffe000`00a45d20 Void
+0x048 DeviceType : 8
+0x04c StackSize : 1 ''
+0x050 Queue :
+0x098 AlignmentRequirement : 0
+0x0a0 DeviceQueue : _KDEVICE_QUEUE
+0x0c8 Dpc : _KDPC
+0x108 ActiveThreadCount : 0
+0x110 SecurityDescriptor : 0xffffc000`0008ee30 Void
+0x118 DeviceLock : _KEVENT
+0x130 SectorSize : 0x200
+0x132 Spare1 : 1
+0x138 DeviceObjectExtension : 0xffffe000`00a45f98 _DEVOBJ_EXTENSION
+0x140 Reserved : (null)
0: kd> dt DRIVER_OBJECT 0xffffe000`03d0ee60
cvfs!DRIVER_OBJECT
+0x000 Type : 0n4
+0x002 Size : 0n336
+0x008 DeviceObject : 0xffffe000`00a45bd0 _DEVICE_OBJECT
+0x010 Flags : 0x92
+0x018 DriverStart : 0xfffff800`03a00000 Void
+0x020 DriverSize : 0x29a9000
+0x028 DriverSection : 0xffffe000`01bdc120 Void
+0x030 DriverExtension : 0xffffe000`03d0efb0 _DRIVER_EXTENSION
+0x038 DriverName : _UNICODE_STRING "\FileSystem\Cvfs"
+0x048 HardwareDatabase : 0xfffff801`92722580 _UNICODE_STRING
"\REGISTRY\MACHINE\HARDWARE\DESCRIPTION\SYSTEM"
+0x050 FastIoDispatch : 0xfffff800`0639a578 _FAST_IO_DISPATCH
+0x058 DriverInit : 0xfffff800`063a3064 long
cvfs!GsDriverEntry+0
+0x060 DriverStartIo : (null)
+0x068 DriverUnload : 0xfffff800`03b04590 void
cvfs!CvNtDriverUnload+0
+0x070 MajorFunction : [28] 0xfffff800`03b144d0 long
cvfs!CvCreateDispatch+0
=================================================================


Assuming the above indicates the device/driver objects are in the
right state to unload, perhaps the following is relevant.

Traces indicate this problem occurs when we get an IRP_MN_MOUNT_VOLUME
call from Windows (in this case for a partition that is not ours so
we just return STATUS_UNRECOGNIZED_VOLUME) when we are in the middle
of the unload process (in particular, we are in the middle of a CLOSE
dispatch which has called IoUnregisterFileSystem, which I suspect is
the key to getting out of the file system lists that result in the
IRP_MN_MOUNT_VOLUME queries).

One of the possible issues here is that our driver should be doing
some operation that causes these IRP_MN_MOUNT_VOLUME requests to
quiesce, but I am unaware of any such mechanism. The ReferenceCount
on the device object at the time of the IRP_MN_MOUNT_VOLUME appears
to be 2 and I'm suspecting that the decrement of ReferenceCount to
zero is in the thread issuing the IRP_MN_MOUNT_VOLUME but I could be
wrong about that.
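
The suspicion above can be stated as a small user-mode C model. This is only a
sketch of the hypothesis, not a description of what the Windows kernel actually
does: it assumes that one release path re-checks for a pending unload when
ReferenceCount reaches zero, and that the mount path merely decrements. All
names (model_device, release_with_unload_check, release_without_unload_check,
run_scenario) are hypothetical, as is the assumption that the two references
belong to the closing handle and the in-flight mount request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DOE_UNLOAD_PENDING 0x00000001u  /* assumed bit value */

/* Toy stand-in for the fields of interest on the control device. */
struct model_device {
    long reference_count;
    unsigned int extension_flags;
    bool unload_called;
};

/* Release path that notices a pending unload when the count hits zero. */
void release_with_unload_check(struct model_device *dev)
{
    assert(dev != NULL && dev->reference_count > 0);
    dev->reference_count--;
    if (dev->reference_count == 0 &&
        (dev->extension_flags & DOE_UNLOAD_PENDING) != 0) {
        dev->unload_called = true;
    }
}

/* Hypothetical release path that only decrements (the suspected
 * behaviour of the thread issuing IRP_MN_MOUNT_VOLUME). */
void release_without_unload_check(struct model_device *dev)
{
    assert(dev != NULL && dev->reference_count > 0);
    dev->reference_count--;
}

/* Starts with ReferenceCount 2 and the unload already pending, then
 * drops both references in one of the two possible orders. */
struct model_device run_scenario(bool mount_releases_last)
{
    struct model_device dev = { 2, DOE_UNLOAD_PENDING, false };

    if (mount_releases_last) {
        release_with_unload_check(&dev);     /* close finishes first */
        release_without_unload_check(&dev);  /* mount drops the last ref */
    } else {
        release_without_unload_check(&dev);  /* mount finishes first */
        release_with_unload_check(&dev);     /* close drops the last ref */
    }
    return dev;
}
```

In this model both orderings end with a reference count of zero, but only the
ordering where the close path drops the last reference sets unload_called. That
is the same end state as the dump above (RefCount 0, unload pending, handler
never invoked), which is why the mount thread is the suspect.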

Thanks,

Eric

Comments

  • Eric_Berge Member Posts: 30
    Just to highlight what is the core preliminary question from the above about the unloading
    of a File System device driver and the state of the DEVICE_OBJECT representing that
    File System:

    -- Does the "PointerCount" in the "generic object" have relevance to determining whether
    a driver can be unloaded or is only the "ReferenceCount" of the DEVICE_OBJECT used
    (together with particular devobj/devext flags)?

    Note that the DEVICE_OBJECT this refers to is of type FILE_DEVICE_DISK_FILE_SYSTEM.
    Also note that there were no mounted file systems at the time of this event; we are simply
    dealing with the "control device" created in the DriverEntry routine.

    I believe the answer is that the "PointerCount" does not contribute to the "gating conditions"
    for a driver unload. If I'm wrong about that, I've just got to figure out why the PointerCount is
    high on the DEVICE_OBJECT for the driver. But if I'm correct, then please consider the detailed
    description of the DRIVER_OBJECT/DEVICE_OBJECT state above, with the following question
    in mind:

    -- What in the above windbg dump points to a state in the DRIVER_OBJECT and/or DEVICE_OBJECT
    that would prevent the driver from unloading?

    My understanding is that the state indicates the driver should be unloadable at this point,
    but for some reason the registered "DriverUnload" procedure is not being called.

    At this point I'm just looking for a confirmation/correction of my current understanding (although
    other insights are always welcome!).

    Thanks in advance for any insight you have on this,

    Eric
  • Scott_Noone_(OSR) Administrator Posts: 3,142
    In general, the I/O Manager's reference count (DevObj->ReferenceCount) is
    what gates DriverUnload being called. However, I believe that FS drivers are
    treated specially: you're not going to be unloaded until the last device
    object goes away. You should be able to confirm this by looking in the
    debugger in your working case.

    If I were you, I'd try to find out what that last Ob reference on your
    device object is (i.e. Pointer Count in !object output). You can use
    !obtrace and GFlags to help find these:

    http://msdn.microsoft.com/en-us/library/windows/hardware/ff564594(v=vs.85).aspx



    -scott
    @OSRDrivers


  • OSR_Community_User Member Posts: 110,217
    > In general, the I/O Manager's reference count (DevObj->ReferenceCount) is
    > what gates DriverUnload being called.

    Yes.

    And I would also say I NEVER managed to make an NT4-style block device driver or FSD really unloadable without bugs and issues.

    With PnP (w2k+) and a block device, this is simple.

    And, if we are talking about an FSD without the underlying standard block device and a network login, then implement it as a FltMgr minifilter+namespace provider, mapping your FS to a subdir of the main NTFS volume.

    This is how the WIM mounter works in Windows. Moreover, the WIM mounter has a tiny kmode part (a FltMgr minifilter) and a user process which does all the work, so you can reuse the architecture.

    --
    Maxim S. Shatskih
    Microsoft MVP on File System And Storage
    xxxxx@storagecraft.com
    http://www.storagecraft.com
  • Eric_Berge Member Posts: 30
    Thanks for the suggestion Scott. Based on that I gathered traces of the state of the devobj at the time of my "last shot" at it (when we're in the last close and attempting to set up the conditions to allow the unload handler to be called).

    The traces included the "PointerCount" as well as the "ReferenceCount" for the devobj. There is no difference in the values of "PointerCount" between the successful case and the failing case, only a difference in the "ReferenceCount", even though the "ReferenceCount" is zero when I ultimately look at the dump after the failed unload.

    I also did the !obtrace comparison and the cause of the difference between the two was the very last trace in the successful unload which was the following (and which did not appear in the !obtrace of the failing case):

    4b797 -1 Dflt nt! ?? ::FNODOBFM::`string'+95ef
    nt!IopCompleteUnloadOrDelete+a0
    nt!IopDecrementDeviceObjectRef+ee
    nt!IopDeleteFile+19b
    nt!ObpRemoveObjectRoutine+64
    nt!ObfDereferenceObjectWithTag+8f
    nt!ObCloseHandleTableEntry+33f
    nt!ExSweepHandleTable+ba
    nt!ObKillProcess+31
    nt!PspRundownSingleProcess+a4
    nt!PspExitThread+4c8
    nt!NtTerminateProcess+fd
    nt!KiSystemServiceCopyEnd+13

    I do not have access to the Windows Kernel source code, but this appears to be what I would expect to be fairly normal handling of the devobj "ReferenceCount" going to zero and the unload handler being called. More specifically, I'm expecting (but do not know with certainty) that this is just the result of the processing just after the completion of the IRP_MJ_CLOSE dispatch into our file system, which set up the conditions for the unload to proceed.

    If so, the reason this does not appear in the failing trace is that the devobj ReferenceCount is still high at that point (as the traces did indicate). The ReferenceCount (rather than the PointerCount) is later decremented to zero, but the processing that did that decrement did not result in the ultimate call to IopCompleteUnloadOrDelete to complete the unload processing. The key suspect in this regard is the thread doing the IRP_MN_MOUNT_VOLUME, as mentioned in the previous descriptions.

    Thanks for these suggestions to help me dig deeper. This gives me stronger reason to believe that we need to look somewhere other than the maintenance of the "PointerCount" associated with the devobj for the reason why the Windows Kernel is not completing the driver unload.
  • Eric_Berge Member Posts: 30
    Maxim:

    Thanks for the alternative. Ultimately we might need to use it, and it's always great to know what else we might wish to consider. I still have a lot to learn about the implications of that type of restructuring relative to our current implementation (read as: I'm totally ignorant on that approach, and I might need to eradicate that ignorance...).

    At this point, I'm still hoping to find out what's going on in the current structure, especially as our driver has been able to unload for 6+ years and we rely on that for software upgrades without a reboot. With that being the case, and with thousands of customers not having problems, I'm probably going to favor seeing if this can be run to ground as a first try, especially as we've only run into this with a "stress test" and even then only on Windows 8.1 (strangely, 2012 R2 works just fine, as has every other version of Windows we've run this on: XP, 2003 and up).

    But thanks again for your input, as that might ultimately be the route we need to take,

    Eric