System thread and a driver reference count - possible race condition

wc2023 · June 16, 2025, 12:24pm

I have a KMDF driver that uses a system thread. It goes as such (pseudocode):

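WdfObjectReference(device);    // keep the WDFDEVICE alive while the thread runs
status = PsCreateSystemThread(&handle, THREAD_ALL_ACCESS,
                              NULL, NULL, NULL,
                              MyThread, device);
if (!NT_SUCCESS(status)) {
    WdfObjectDereference(device);    // thread never started; drop the reference
}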

and then:

void MyThread(PVOID p)
{
    WDFDEVICE device = (WDFDEVICE)p;

    //Do work
    //...

    // Drop the reference taken before PsCreateSystemThread.
    WdfObjectDereference(device);
}

So the idea is that I will increment the reference count on the device (driver) handle to prevent it from unloading when my system thread is running, and then decrement it when the thread is done, which should allow my driver to unload.

But then I started thinking about a possible race condition in the WdfObjectDereference call. During that call the reference count on the device may reach 0, in which case it will signal the driver to unload... but my system thread would still be running.

Doesn't it pose a rare race condition there? And if so, how do you handle it?

Slava_Imameev · June 16, 2025, 6:23pm

I think DriverUnload is called in the context of another thread: a PnP manager thread for WDM/WDF drivers, or the user-mode thread that called NtUnloadDriver for legacy drivers. Your driver needs to signal all of its threads to terminate and then wait in DriverUnload for them to exit, so that by the time DriverUnload returns no code from the driver's image is still executing.
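
A minimal sketch of that signal-and-wait pattern, assuming a single worker thread and a caller running at PASSIVE_LEVEL. WORKER_STATE, g_Worker, StartWorker and StopWorker are hypothetical names made up for illustration. StopWorker would be called from whatever teardown path the driver uses (for example EvtDriverUnload or a device cleanup callback).

```c
#include <ntddk.h>

// Hypothetical per-driver state; the names are illustrative only.
typedef struct _WORKER_STATE {
    KEVENT   StopEvent;     // set once to ask the thread to exit
    PKTHREAD ThreadObject;  // referenced thread object, waited on at teardown
} WORKER_STATE;

static WORKER_STATE g_Worker;

KSTART_ROUTINE WorkerThread;

VOID WorkerThread(_In_ PVOID Context)
{
    WORKER_STATE *state = (WORKER_STATE *)Context;
    LARGE_INTEGER interval;

    interval.QuadPart = -1000000LL;   // 100 ms, relative, in 100 ns units

    // Each timeout is one slice of periodic work; any other wait result
    // (the stop event being signaled) ends the loop.
    while (KeWaitForSingleObject(&state->StopEvent, Executive, KernelMode,
                                 FALSE, &interval) == STATUS_TIMEOUT) {
        // ... periodic work ...
    }

    PsTerminateSystemThread(STATUS_SUCCESS);
}

NTSTATUS StartWorker(VOID)
{
    OBJECT_ATTRIBUTES oa;
    HANDLE handle;
    NTSTATUS status;

    KeInitializeEvent(&g_Worker.StopEvent, NotificationEvent, FALSE);
    InitializeObjectAttributes(&oa, NULL, OBJ_KERNEL_HANDLE, NULL, NULL);

    status = PsCreateSystemThread(&handle, THREAD_ALL_ACCESS, &oa, NULL, NULL,
                                  WorkerThread, &g_Worker);
    if (!NT_SUCCESS(status)) {
        return status;
    }

    // Trade the handle for a pointer reference that can be waited on later.
    status = ObReferenceObjectByHandle(handle, THREAD_ALL_ACCESS, *PsThreadType,
                                       KernelMode,
                                       (PVOID *)&g_Worker.ThreadObject, NULL);
    if (!NT_SUCCESS(status)) {
        // Not expected for a handle just created; at least ask the thread to exit.
        KeSetEvent(&g_Worker.StopEvent, IO_NO_INCREMENT, FALSE);
    }
    ZwClose(handle);
    return status;
}

// Call at PASSIVE_LEVEL before the driver image is allowed to go away.
VOID StopWorker(VOID)
{
    KeSetEvent(&g_Worker.StopEvent, IO_NO_INCREMENT, FALSE);
    KeWaitForSingleObject(g_Worker.ThreadObject, Executive, KernelMode,
                          FALSE, NULL);
    ObDereferenceObject(g_Worker.ThreadObject);
    g_Worker.ThreadObject = NULL;
}
```

Waiting on the thread object, rather than on an event the thread sets just before it exits, is what closes the window: the thread object is only signaled once the thread has actually terminated, so none of its instructions from the driver image are still pending when the wait returns.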

Mark_Roddy · June 16, 2025, 7:00pm

You've abandoned workitems because you decided they could not wait on events (which is not true) in favor of system threads, but you have simply made your problem more complicated.

Your threads, either workitem threads or system threads, always have to wait on multiple events whenever they wait, and one of those events should be a termination signal set when you need to remove the related device or unload the related driver.
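
A rough sketch of such a wait, assuming two KEVENTs that the driver initializes elsewhere. WAIT_CONTEXT, WorkEvent, StopEvent and WaitForWorkOrStop are hypothetical names, not anything from the original poster's driver.

```c
#include <ntddk.h>

// Hypothetical wait context; both events are initialized by the driver elsewhere.
typedef struct _WAIT_CONTEXT {
    KEVENT StopEvent;   // set on device removal or driver unload
    KEVENT WorkEvent;   // set when there is work to do
} WAIT_CONTEXT;

// Returns TRUE if work is pending, FALSE if the caller should clean up and exit.
// Must be called at PASSIVE_LEVEL because the wait has no timeout.
BOOLEAN WaitForWorkOrStop(_In_ WAIT_CONTEXT *ctx)
{
    // Index 0 = stop request, index 1 = work available.
    PVOID objects[2];
    NTSTATUS status;

    objects[0] = &ctx->StopEvent;
    objects[1] = &ctx->WorkEvent;

    // With at most THREAD_WAIT_OBJECTS entries the wait block array may be NULL.
    status = KeWaitForMultipleObjects(2, objects, WaitAny, Executive,
                                      KernelMode, FALSE, NULL, NULL);

    // Anything other than "work event signaled" is treated as a reason to stop.
    return (status == STATUS_WAIT_1);
}
```

A thread or work item callback would then loop on `while (WaitForWorkOrStop(ctx)) { ... }` and fall out of the loop as soon as the termination event is set.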

wc2023 · June 16, 2025, 7:11pm

Mark, you're posting it in the wrong thread. The issue with work items is not about waiting for more than one event, but about signaling the event to stop it. I could not find a callback to signal it from.

But anyway, this question is not about work items.

Doron_Holan · June 16, 2025, 9:07pm

A WDF object reference is not an Ob object reference. From ntoskrnl's point of view it does nothing to keep your image in memory. You need an external Ob reference on your driver image to handle the return-after-dereference problem. Both WDFWORKITEM and IO work items do this; system threads do not have this external reference-counting feature.

Use EvtDeviceSelfManagedIoFlush to signal your work item to stop waiting and return early.
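
A minimal sketch of wiring that callback up, assuming the work item callback waits on a KEVENT kept in the device context. DEVICE_CONTEXT, StopEvent, MyEvtSelfManagedIoFlush and RegisterFlushCallback are hypothetical names. The sketch also assumes the context type is attached when the device is created and that StopEvent is initialized with KeInitializeEvent right after WdfDeviceCreate succeeds.

```c
#include <ntddk.h>
#include <wdf.h>

// Hypothetical device context holding the event the work item waits on.
typedef struct _DEVICE_CONTEXT {
    KEVENT StopEvent;
} DEVICE_CONTEXT;
WDF_DECLARE_CONTEXT_TYPE_WITH_NAME(DEVICE_CONTEXT, GetDeviceContext)

EVT_WDF_DEVICE_SELF_MANAGED_IO_FLUSH MyEvtSelfManagedIoFlush;

// Invoked by the framework during device removal; wakes any waiting work item.
VOID MyEvtSelfManagedIoFlush(_In_ WDFDEVICE Device)
{
    DEVICE_CONTEXT *ctx = GetDeviceContext(Device);

    KeSetEvent(&ctx->StopEvent, IO_NO_INCREMENT, FALSE);
}

// Call from EvtDriverDeviceAdd, before WdfDeviceCreate.
VOID RegisterFlushCallback(_Inout_ PWDFDEVICE_INIT DeviceInit)
{
    WDF_PNPPOWER_EVENT_CALLBACKS callbacks;

    WDF_PNPPOWER_EVENT_CALLBACKS_INIT(&callbacks);
    callbacks.EvtDeviceSelfManagedIoFlush = MyEvtSelfManagedIoFlush;
    WdfDeviceInitSetPnpPowerEventCallbacks(DeviceInit, &callbacks);
}
```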

wc2023 · June 17, 2025, 1:26am

Ok, thanks. That answers it about ref counting.

wc2023 · June 21, 2025, 9:43pm

You know, I thought about it and came back to this question. How does WDFWORKITEM do it internally?

In case of a system thread created with PsCreateSystemThread even if I hold an external Ob reference to the driver while that thread is running, like @Doron_Holan suggested, it will still pose an issue from the WDF perspective. Let me explain. And correct me if I'm wrong.

From what I can see, WDF is partially compiled statically into my module (i.e., the driver), which means that its code resides in my driver's address space. So even if I hold an external Ob reference to the driver itself while my system thread is running and then release it (externally) when the thread terminates, parts of the WDF framework may still be loaded in my driver's address space. How would that not cause an issue when the driver begins unloading after my system thread terminates and I release the last reference to the driver itself?

Tim_Roberts · June 21, 2025, 9:59pm

The WDFWORKITEM is a WDF object. All WDF objects are owned by another object. If you don't override it during WdfWorkItemCreate, it will be owned by the WDFDEVICE (not the driver). When the device shuts down, WDF will clean up the objects owned by that device. The device cleanup cannot proceed until all of the objects have a reference count of 0, so you'll block at that point. The driver cannot unload until all of the devices are released.
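
For illustration, a sketch of creating a work item whose owner is the device. It sets ParentObject explicitly rather than relying on any default. MyEvtWorkItem and CreateAndQueueWorkItem are hypothetical names.

```c
#include <ntddk.h>
#include <wdf.h>

EVT_WDF_WORKITEM MyEvtWorkItem;

// Runs at PASSIVE_LEVEL on a system worker thread.
VOID MyEvtWorkItem(_In_ WDFWORKITEM WorkItem)
{
    // The parent set at creation time is available inside the callback.
    WDFDEVICE device = (WDFDEVICE)WdfWorkItemGetParentObject(WorkItem);

    UNREFERENCED_PARAMETER(device);
    // ... work ...
}

NTSTATUS CreateAndQueueWorkItem(_In_ WDFDEVICE Device,
                                _Out_ WDFWORKITEM *WorkItem)
{
    WDF_WORKITEM_CONFIG config;
    WDF_OBJECT_ATTRIBUTES attributes;
    NTSTATUS status;

    *WorkItem = NULL;

    WDF_WORKITEM_CONFIG_INIT(&config, MyEvtWorkItem);

    // Tie the work item's lifetime to the device: when the device object is
    // deleted, the framework cleans up its children, including this work item.
    WDF_OBJECT_ATTRIBUTES_INIT(&attributes);
    attributes.ParentObject = Device;

    status = WdfWorkItemCreate(&config, &attributes, WorkItem);
    if (NT_SUCCESS(status)) {
        WdfWorkItemEnqueue(*WorkItem);
    }
    return status;
}
```

A driver that needs to be sure the callback has finished at a particular point can also call WdfWorkItemFlush on the handle from PASSIVE_LEVEL, as long as it does not do so from inside the callback itself.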

wc2023 · June 21, 2025, 10:31pm

Sure. So let's modify my code example in the first post. Say, if I increment the Ob reference to the driver when the system thread is starting, and also increment WDF device ref count to prevent it from unloading (since I may need WDF in my system thread as well):

ObReferenceObject(WdfDriverWdmGetDriverObject(WdfDeviceGetDriver(device)));
WdfObjectReference(device);

PsCreateSystemThread(&handle, THREAD_ALL_ACCESS,
                     NULL, NULL, NULL,
                     ThreadInAnotherModule, device);

and then hypothetically call MyThread from another module that will decrement the Ob reference count for it (from outside):

void ThreadInAnotherModule(PVOID p)
{
    PDRIVER_OBJECT drvr = WdfDriverWdmGetDriverObject(WdfDeviceGetDriver((WDFDEVICE)p));
    MyThread(p);

    ObDereferenceObject(drvr);
}

for the system thread in my driver:

void MyThread(PVOID p)
{
    //Do work
    //...

    WdfObjectDereference(p);
}

And let's assume that MyThread runs for a long time and my driver starts unloading first. Then MyThread exits and calls WdfObjectDereference to release the last reference on the WDF device, and ThreadInAnotherModule dereferences the last Ob reference on my driver, which then starts unloading. But some of the WDF objects in it may not be fully released yet.

Wouldn't that still create a race condition?

Doron_Holan · June 22, 2025, 12:10am

Yes, it would be a race condition for your example WRT threads and objects outside of those managed by pnp state (and the last FDO being deleted which leads to driver unload).

To your original question:

The WDF code in your driver is only a bootstrap (it calls the WDF loader) and a jump table (which the WDF loader fills in). WDF itself is not added to your driver; it is in its own driver.
WDFWORKITEM uses IO work items under the covers. IO work items rely on the kernel itself to hold the Ob reference on your device object for the lifetime of the work item callback, so that your driver stays in memory until the callback returns to the kernel.
Since WDF is in its own driver, and is essentially never unloaded because of the large set of drivers that rely on it, it can also safely hold a reference to your driver without concern about WDF itself unloading. At least when I was the WDF architect, WDF didn't rely on this behavior (in fact, WDF unloading was common and I had to fix a whole set of race conditions like the one you describe above), but that is a detail left to history that is no longer relevant.
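
For reference, a sketch of the underlying WDM IO work item pattern described above. This is not WDF's actual internal code, only the public API it is said to build on. MyWorkRoutine and QueueMyWork are hypothetical names, and passing the work item as its own context is just one way to let the routine free it.

```c
#include <ntddk.h>

IO_WORKITEM_ROUTINE MyWorkRoutine;

// Runs at PASSIVE_LEVEL on a system worker thread. The I/O manager keeps a
// reference on DeviceObject until this routine has returned to it.
VOID MyWorkRoutine(_In_ PDEVICE_OBJECT DeviceObject, _In_opt_ PVOID Context)
{
    PIO_WORKITEM workItem = (PIO_WORKITEM)Context;

    UNREFERENCED_PARAMETER(DeviceObject);

    // ... work ...

    // Free the work item as the last step; nothing below touches it again.
    IoFreeWorkItem(workItem);
}

NTSTATUS QueueMyWork(_In_ PDEVICE_OBJECT DeviceObject)
{
    PIO_WORKITEM workItem = IoAllocateWorkItem(DeviceObject);

    if (workItem == NULL) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    // The work item is passed as its own context so the routine can free it.
    IoQueueWorkItem(workItem, MyWorkRoutine, DelayedWorkQueue, workItem);
    return STATUS_SUCCESS;
}
```

The difference from PsCreateSystemThread is that the reference is taken and released by code outside the driver image, so the final instructions of the callback run while the image is still pinned.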

wc2023 · June 22, 2025, 12:58am

Thanks for confirming.

It's a little bit off topic, but I was always wondering about that jump table. Why did you guys decide to go with it instead of linking to WDF the way the user-mode image loader does?

Doron_Holan · June 22, 2025, 2:00am

We wanted to control the load and lifetime of the WDF runtime outside of the kernel loader, and also to bind to different versions of the WDF runtime if needed. All of this was put in place a decade before API sets, which solve the direct binding problem.