Is there a safe way to handle APCs you’ve queued when unloading your driver?

I’ve been getting reports of rare BSODs affecting our product. We queue APCs to trigger LoadLibrary() in processes, and the BSODs are all the result of an APC rundown function being invoked after our driver has been unloaded (during an upgrade of the product). In some cases, the rundown function is invoked 60 seconds or so after the unload has occurred.

I’ve spent the day reading through the ReactOS APC source code and then looking at the Windows APC functions in IDA. I’ve concluded that only half the APC APIs are available. The cancellation functions like KeRemoveQueueApc() are not exported. Even flushing is not possible because KeFlushQueueApc() is also not exported.

In the past, our product has been written to work around this by using an interlocked counter. When an APC is inserted into the queue, the counter is incremented, and when either its kernel routine or its rundown routine is invoked, the counter is decremented. When the driver is unloaded, we wait for the counter to fall to zero before allowing the unload to proceed. This wait has been given an arbitrary maximum of 10 seconds.
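
To make the counter scheme concrete, here is a minimal user-mode model of it in C11. It is only a sketch, not driver code: the names `apc_queued`, `apc_completed` and `wait_for_apcs` are hypothetical, C11 atomics stand in for InterlockedIncrement/InterlockedDecrement, and a bounded polling loop stands in for the 10 second wait.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: number of APCs queued but not yet run or run down.
   In the driver this would be a LONG updated with Interlocked* calls. */
static atomic_long g_pending_apcs = 0;

/* Call immediately before inserting an APC into the queue. */
void apc_queued(void)
{
    atomic_fetch_add(&g_pending_apcs, 1);
}

/* Call from the kernel routine or the rundown routine, whichever runs. */
void apc_completed(void)
{
    long prev = atomic_fetch_sub(&g_pending_apcs, 1);
    assert(prev > 0);   /* more completions than queued APCs is a bug */
    (void)prev;
}

/* Bounded wait used at unload time. Polls up to max_polls times and
   calls pause_fn (if not NULL) between polls. Returns true if the
   counter drained, false if the wait gave up with APCs still pending. */
bool wait_for_apcs(int max_polls, void (*pause_fn)(void))
{
    for (int i = 0; i < max_polls; i++) {
        if (atomic_load(&g_pending_apcs) == 0)
            return true;
        if (pause_fn != NULL)
            pause_fn();
    }
    return atomic_load(&g_pending_apcs) == 0;
}
```

A `false` return models the case where the timeout expires while an APC is still outstanding, which is exactly the situation in which a later rundown call lands in unloaded code.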

It’s been pointed out to me today that there are two problems with this:

  1. Suspended threads, or threads waiting in a non-alertable state, can cause APCs to be delayed for long periods of time. Without the ability to cancel the APCs, the unload routine could be waiting a long time. (I suspect this is the cause of the BSODs I’ve been seeing.)
  2. After InterlockedDecrement decrements the counter in an APC function, control still returns to that function, which is about to be unloaded, so there’s still a small race if the driver is unloaded between the decrement and the return. (I’ve heard tail-call optimization suggested as a possible solution for this.)

So my question is: Given the APIs which are available, is there any way to unload your driver safely if you’ve been queuing APCs?

Nope.

There’s a reason we tell people not to fool around with APCs. They’re not fully documented, and the functions you need to use them properly aren’t fully exported.

I can’t think of anything you can accomplish with APCs that you can’t manage to do some other way using documented methods.

Peter

Thanks Peter,

I’d come to more or less the same conclusion myself. Cheers for confirming.

Given the APIs which are available, is there any way to unload your driver safely if you’ve been queuing APCs?

Well, if we decide to overlook the fact that you are speaking about undocumented stuff here (actually, I don’t exclude the possibility of a “funny” reaction from the usual suspects), what you can do is increment the object refcount on some DO created by your driver before queuing an APC, and decrement it from your rundown function.

If you take this approach, you make use of the fact that, as long as the total Ob… refcount on all DOs that a driver creates is non-zero, the outstanding refcount on its corresponding DRIVER_OBJECT is going to be non-zero as well. As long as a DRIVER_OBJECT’s refcount is non-zero, its corresponding executable image has to stay loaded in RAM. The target module’s DrvUnload() may still get invoked, and the device in question may get deleted, but the deletion will not take effect until the Ob… refcount goes down to zero.
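
The deferred teardown described above can be illustrated with a toy, single-threaded user-mode model in C. This is only a sketch of the idea, not how the object manager is implemented: `struct image_model` and the three functions are hypothetical names, and a single counter stands in for the DO and DRIVER_OBJECT refcounts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a driver image whose teardown is deferred
   while references are outstanding. */
struct image_model {
    long refcount;          /* one reference per queued APC */
    bool unload_requested;  /* the unload routine has already run */
    bool loaded;            /* image is still mapped */
};

/* Take a reference before queuing an APC. */
void image_reference(struct image_model *img)
{
    assert(img->loaded);
    img->refcount++;
}

/* Model of the unload routine running: the image goes away
   immediately only if nothing references it. */
void image_request_unload(struct image_model *img)
{
    img->unload_requested = true;
    if (img->refcount == 0)
        img->loaded = false;
}

/* Drop a reference from the rundown (or kernel) routine. The last
   dereference after an unload request lets the image go. */
void image_dereference(struct image_model *img)
{
    assert(img->refcount > 0);
    if (--img->refcount == 0 && img->unload_requested)
        img->loaded = false;
}
```

In this model, an unload request made while one reference is outstanding leaves `loaded` true, and the final `image_dereference` is what flips it to false. In a real driver, the code that made that final call would at that point be executing inside an image that is free to disappear.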

Although it may look fine and dandy at first glance, in actuality there is still a “small” problem with this approach. Don’t forget that ObDereferenceObject() has to return control somewhere. If you make this call from your driver after its DrvUnload() has returned, there is no longer any guarantee that the code ObDereferenceObject() returns to stays loaded in RAM until your rundown function actually returns. A driver image may get unloaded at any moment after its corresponding DRIVER_OBJECT’s refcount has gone down to zero. Therefore, there is still a possibility of a race condition.

What you have to do in order to make it safe is to ensure that the call to ObDereferenceObject() never returns to your module. It means that you have to invoke ObDereferenceObject() from a special helper assembly routine that is called from your rundown function. This routine must play certain “dirty tricks” with the call stack before transferring execution to ObDereferenceObject(), which has to be done with a JMP, rather than a CALL, instruction.

Please note that I am not telling you that doing all the above is a wonderful idea - I am just answering your question as it has been presented…

Anton Bassov