I’ve been getting reports of rare BSODs which affect our product. We queue APCs to trigger LoadLibrary() in processes, and the BSODs are all the result of an APC rundown function being invoked after our driver has been unloaded (during an upgrade of the product). In some cases, the rundown function is invoked 60 seconds or so after the unload has occurred.
I’ve spent the day reading through the ReactOS APC source code and then looking at the Windows APC functions in IDA. I’ve concluded that only some of the APC APIs are exported. The cancellation functions like KeRemoveQueueApc() are not exported. Even flushing is not possible because KeFlushQueueApc() is also not exported.
In the past, our product has been written to work around this by using an interlocked counter. When an APC is inserted onto the queue, the counter is incremented, and when either the kernel routine or the rundown routine is invoked, the counter is decremented. When an unload is requested, we wait for the counter to fall to zero before allowing the driver to unload. An arbitrary maximum of 10 seconds has been given to this wait.
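For anyone following along, here is a minimal user-mode model of that counter scheme. It is only a sketch: it uses C11 atomics as a stand-in for the kernel's interlocked operations, the function names (apc_queued, apc_completed, wait_for_apcs) are hypothetical and not taken from our driver, and the bounded poll loop stands in for the 10 second wait.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Number of APCs queued but not yet run or run down (hypothetical name). */
static atomic_long g_outstanding_apcs = 0;

/* Call immediately before inserting an APC onto the queue. */
static void apc_queued(void)
{
    atomic_fetch_add(&g_outstanding_apcs, 1);
}

/* Call from the kernel routine or the rundown routine, whichever runs. */
static void apc_completed(void)
{
    long previous = atomic_fetch_sub(&g_outstanding_apcs, 1);
    assert(previous > 0); /* a decrement without a matching increment is a bug */
}

/*
 * Model of the unload-time wait: poll the counter up to max_polls times.
 * Returns true if the counter reached zero, false if the wait gave up.
 * A real driver would sleep between polls instead of spinning.
 */
static bool wait_for_apcs(int max_polls)
{
    for (int i = 0; i < max_polls; ++i) {
        if (atomic_load(&g_outstanding_apcs) == 0) {
            return true;
        }
    }
    return atomic_load(&g_outstanding_apcs) == 0;
}
```

A false return from wait_for_apcs() is the case that bites us: the wait has given up while APCs are still outstanding, so a rundown function can still be invoked after the unload.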
It’s been pointed out to me today that there are two problems with this:
Comments
Nope.
There’s a reason we tell people not to fool around with APCs. They’re not fully documented, and the functions you need to use them properly aren’t fully exported.
I can’t think of anything you can accomplish with APCs that you can’t manage to do some other way using documented methods.
Peter
Peter Viscarola
OSR
@OSRDrivers
Thanks Peter,
I'd come to more or less the same conclusion myself. Cheers for confirming.
Well, if we decide to overlook the fact that you are speaking about undocumented stuff here (actually, I don't exclude the possibility of a "funny" reaction from the usual suspects), what you can do is increment an object refcount on some device object (DO) created by your driver before queuing an APC, and decrement it from your rundown function.
If you take this approach, you are going to make use of the fact that, as long as the total Ob.... refcount on all DOs that a driver creates is non-zero, the outstanding refcount on its corresponding DRIVER_OBJECT is going to be non-zero as well. It is understandable that, as long as a DRIVER_OBJECT's refcount is non-zero, its corresponding executable image has to stay loaded in RAM. The target module's DrvUnload() may still get invoked, and the device in question may get deleted, but this operation will not take effect until the Ob.... refcount goes down to zero.
Although it may look fine and dandy at first glance, in actuality there is still a "small" problem with this approach. Don't forget that ObDereferenceObject() has to return control somewhere. If you make this call from your driver after its DrvUnload() has returned control, there is already no guarantee that the code that ObDereferenceObject() returns control to stays loaded in RAM until your rundown function actually returns. Don't forget that a driver image may get unloaded at any moment after its corresponding DRIVER_OBJECT's refcount has gone down to zero. Therefore, there is still a possibility of a race condition left.
What you have to do in order to make it safe is to ensure that the call to ObDereferenceObject() does not return to your module. It means that you have to invoke ObDereferenceObject() from a special helper assembly routine that has to be called from your rundown function. This routine must play certain "dirty tricks" with the call stack before transferring execution to ObDereferenceObject(), which has to be done with a JMP, rather than a CALL, instruction.
Please note that I am not telling you that doing all the above is a wonderful idea - I am just answering your question as it has been presented.....
Anton Bassov