Is there a safe way to handle APCs you’ve queued when unloading your driver?

BenStaniford Member Posts: 5

I’ve been getting reports of rare and very occasional BSODs which affect our product. We queue APCs to trigger LoadLibrary() in processes and the BSODs are all the result of an APC rundown function being invoked after our driver has been unloaded (during an upgrade of the product). In some cases, the rundown function is being invoked 60 seconds or so after the unload has occurred.

I’ve spent the day reading through the ReactOS APC source code and then looking at the Windows APC functions in IDA. I’ve concluded that only about half of the APC APIs are exported. The cancellation functions like KeRemoveQueueApc() are not exported. Even flushing is not possible, because KeFlushQueueApc() is also not exported.
In the past, our product has been written to work around this by using an interlocked counter. When APCs are inserted onto the queue, the counter is incremented and when either the kernel or rundown functions are invoked, the counter is decremented. When the driver is unloaded, we wait for the counter to fall to zero before allowing the driver to unload. An arbitrary maximum of 10 seconds has been given to this wait.
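
To make the counter scheme concrete, here is a minimal user-mode model of it in portable C11. It is only a sketch under stated assumptions: the names `apc_queued`, `apc_finished` and `wait_for_apcs` are hypothetical, C11 atomics stand in for InterlockedIncrement()/InterlockedDecrement(), and a bounded polling loop stands in for the 10 second wait. It is not driver code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* User-mode model only. All names here are hypothetical, not kernel APIs. */
static atomic_long g_outstanding_apcs; /* zero-initialized: no APCs queued */

/* Call just before an APC is inserted onto a thread's queue. */
static void apc_queued(void)
{
    atomic_fetch_add(&g_outstanding_apcs, 1);
}

/* Call from the kernel routine or the rundown routine, whichever runs. */
static void apc_finished(void)
{
    long prev = atomic_fetch_sub(&g_outstanding_apcs, 1);
    assert(prev > 0); /* more completions than queued APCs would be a bug */
    /* In a real driver, code in this module still executes after the
       decrement, so the counter alone does not close the unload race. */
}

/* Unload-side wait: check the counter up to max_polls times.
   Returns true if every APC finished, false if the wait gave up. */
static bool wait_for_apcs(int max_polls)
{
    for (int i = 0; i < max_polls; ++i) {
        if (atomic_load(&g_outstanding_apcs) == 0)
            return true;
        /* A real driver would sleep between checks instead of spinning. */
    }
    return atomic_load(&g_outstanding_apcs) == 0;
}
```

In this model, a `false` return from `wait_for_apcs` corresponds to the unload proceeding while an APC is still outstanding.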

It’s been pointed out to me today that there are two problems with this:

  1. Suspended threads which are waiting in a non-alertable state can cause APCs to be delayed for long periods of time. Without the ability to cancel the APCs, the unload routine could be waiting a long time. (I suspect this is the cause of the BSODs I’ve been seeing.)
  2. A call to InterlockedDecrement() made from an APC routine still returns into that routine, which is about to be unloaded, so there’s still a small race if the driver is unloaded between the decrement and the routine's return. (I’ve heard tail-call optimization suggested as a possible solution for this.)

So my question is: Given the APIs which are available, is there any way to unload your driver safely if you’ve been queuing APCs?

Comments

  • Peter_Viscarola_(OSR) Administrator Posts: 7,796

    Nope.

    There’s a reason we tell people not to fool around with APCs. They’re not fully documented, and the functions you need to use them properly aren’t fully exported.

    I can’t think of anything you can accomplish with APCs that you can’t manage to do some other way using documented methods.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • BenStaniford Member Posts: 5

    Thanks Peter,

    I'd come to more or less the same conclusion myself. Cheers for confirming.

  • anton_bassov Member Posts: 5,158

    Given the APIs which are available, is there any way to unload your driver safely if you’ve been queuing APCs?

    Well, if we decide to overlook the fact that you are speaking about undocumented stuff here (actually, I don't exclude the possibility of a "funny" reaction from the usual suspects), what you can do is increment the object refcount on some device object (DO) created by your driver before queuing an APC, and decrement it from your rundown function.

    If you take this approach, you are making use of the fact that, as long as the total Ob.... refcount on all DOs that a driver creates is non-zero, the outstanding refcount on its corresponding DRIVER_OBJECT is going to be non-zero as well. As long as a DRIVER_OBJECT's refcount is non-zero, its corresponding executable image has to stay loaded in RAM. The target module's DrvUnload() may still get invoked, and the device in question may get deleted, but this operation will not take effect until the Ob.... refcount goes down to zero.
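
The deferred-unload behaviour described above can be pictured with a small user-mode model in C. This is a sketch of the claim as stated in this post, not of the actual kernel implementation: `model_driver` and the `model_*` functions are hypothetical names, and a plain counter stands in for the object reference count that ObReferenceObject() and ObDereferenceObject() would manipulate.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-mode model; none of these names are kernel APIs. */
typedef struct {
    long refs;             /* stands in for the outstanding object refcount */
    bool unload_requested; /* set once the unload routine has been invoked  */
    bool image_loaded;     /* becomes false when the image would be removed */
} model_driver;

/* The image goes away only after unload was requested AND refs hit zero. */
static void model_try_unmap(model_driver *d)
{
    if (d->unload_requested && d->refs == 0)
        d->image_loaded = false;
}

/* Before queuing an APC: take a reference. */
static void model_reference(model_driver *d)
{
    d->refs++;
}

/* From the rundown function: drop the reference taken at queue time. */
static void model_dereference(model_driver *d)
{
    assert(d->refs > 0); /* dropping more than was taken would be a bug */
    d->refs--;
    model_try_unmap(d);
}

/* The unload routine runs, but removal is deferred while refs remain. */
static void model_unload(model_driver *d)
{
    d->unload_requested = true;
    model_try_unmap(d);
}
```

In the model, calling `model_unload` while a reference is held leaves `image_loaded` true, and the final `model_dereference` is what flips it to false.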

    Although it may look fine and dandy at first glance, in actuality there is still a "small" problem with this approach. Don't forget that ObDereferenceObject() has to return control somewhere. If you make this call from your driver after its DrvUnload() has returned control, there is no longer any guarantee that the code ObDereferenceObject() returns to stays loaded in RAM until your rundown function actually returns. Don't forget that a driver image may get unloaded at any moment after its corresponding DRIVER_OBJECT's refcount has gone down to zero. Therefore, there is still a possibility of a race condition left.

    What you have to do in order to make it safe is to ensure that the call to ObDereferenceObject() does not return to your module. It means that you have to invoke ObDereferenceObject() from a special helper assembly routine that is called from your rundown function. This routine must play certain "dirty tricks" with the call stack before transferring execution to ObDereferenceObject(), which has to be done with a JMP, rather than a CALL, instruction.

    Please note that I am not telling you that doing all the above is a wonderful idea - I am just answering your question as it has been presented.....

    Anton Bassov
