Device cleanup/close routines not called, and hence application does not exit

Hi,

We are developing an AV engine. Process create/delete calls are monitored in the driver using PsSetCreateProcessNotifyRoutineEx.

  • Each create/delete call is entered into a (single) queue, along with a per-call KEVENT.
  • Once inserted, the create/delete call waits on its KEVENT.
  • The user-mode process that opens the device makes IOCTL calls to read pending create/delete queue items.
  • User mode fetches the items one by one, processes each, and sends the decision back (with the PID as the key to tie both ends together).
  • The driver matches the corresponding queue item and sets its event to wake up the pending create/delete request.

The issue is that when I run this console program from CMD and then try to close CMD, my program hangs forever.
IOCTL calls are no longer issued to the driver, so all future process creations/deletions also hang.
The only option is to hard reboot the system.

Closing the CMD window generates a delete call, which is put into the queue. But the user-mode program's IOCTL read calls stop immediately after clicking on CMD close (???)
So nobody is reading the queue now. It's a kind of deadlock: CMD is not getting deleted, and hence my program is not getting a chance to terminate.

I would appreciate your help in understanding this, and any possible solutions.

What kind of a driver is this? Are you handling cancel in your Driver? Sounds like you’re not…

Peter

Hi Peter,

If there are no queue items in the driver, the IOCTL call will be made to block on a KEVENT in the driver. Whenever events are inserted into the queue, this event is set to wake up the IOCTL.

Our application generally runs as a service. In this case, when the service is being stopped, the code closes the device handle, and this triggers device cleanup. The cleanup routine sets the event that the IOCTL is blocking on, to wake up any pending IOCTL. This all works during a graceful shutdown of the service application.

The problem is only in the case when we run the application from CMD and if we try to force close it by clicking on X button. In this case, the cleanup routine is not called. My understanding is that OS closes handles in this case also (???).

When we run the application from CMD and click on the X button, what we are actually closing is ConHost.exe. I am indeed getting the terminate event for conhost in the driver, and it's getting queued there. But as far as our application is concerned, device cleanup is not called, so the IOCTL stays blocked in the driver (keeping the conhost terminate blocked in the driver as well).

-Vijay

Have you ever heard about a zombie process? Although this term is normally associated with UNIX environment, you can have a zombie process under Windows as well. A process structure cannot be freed/destroyed before all outstanding references to the target process are released. Therefore, if someone keeps an open handle to the process that has been terminated, this process cannot be destroyed until the handle to it is closed.

I think this is exactly what is happening in the scenario that you have described. Although the process has been terminated, it is still around, at least as far as the Object Manager is concerned - it has just entered a terminated state. However, the cleanup has not yet been done because there are still some outstanding references to the target process, so the handle that it had opened to your device is still sort of valid, at least as far as the IO Manager is concerned. Therefore, your driver does not receive the IRP_MJ_CLEANUP IRP…

Anton Bassov

Vijayabhaskarreddy_CH wrote:

The problem is only in the case when we run the application from CMD and if we try to force close it by clicking on X button. In this case, the cleanup routine is not called. My understanding is that OS closes handles in this case also (???).

The OS *tries* to close the handles, but it cannot do so if there are
outstanding I/O operations.  It will send a cancel signal to any
outstanding IRPs, but it’s entirely up to your driver to handle that. 
If you are putting the events into a WDFQUEUE, then it should all be
handled automatically.  If you are doing your own queue, then that’s the
problem.

If you are doing your own queue, then that’s the problem.

Well, in that case the OP would run into the same problem every time his app gets terminated, no matter whether he had started it from the command prompt or just clicked on its icon, right? However, if I got him right, he makes it clear that the problem arises only when he terminates the app that he had started from CMD. Therefore, I believe the problem may be somehow related to outstanding references that CMD may have to the target process…

Anton Bassov

Finally, it looks like I found the issue, and it is obviously related to cancelling IRPs.
I was busy the last two days reading a lot of material on the subject, and two articles helped me a lot.
http://www.osronline.com/article.cfm^article=78.htm
The Truth About Cancel - IRP Cancel Operations (Part I) and (Part II).

Some of my previous assumptions were wrong, hence the problem. My read calls are blocked in the driver on an event, and the event is set whenever there is a module load event callback.

The assumption I had was that device cleanup will be called whenever the application is closed (no matter how it is closed: gracefully, or non-gracefully by clicking on the X button of CMD).

As of now, graceful shutdown is only possible when the application is run as a service. We can do ‘sc stop service-name’ to stop the service. With this, the service code gets control and I close the device handle there. This in turn calls the cleanup routine of our driver, and in the cleanup routine we set the event to wake up any pending reads. This works fine.

The problem is when the app is run as a console application in CMD. When X on CMD is clicked, device cleanup is not called. And if no module load events are happening, the read call is stuck in the driver. So the OS cannot close the thread, as there is an outstanding read call.

Looks like the OS calls device cleanup only after all the outstanding IO calls are done???
This is what spoiled my game. I always thought cleanup is called first, even if there is pending IO in the driver during a non-graceful shutdown.

Now I have implemented a cancel routine for the read call. My driver design is that it can only have one read call pending at any time. So no queue is required, and the pending IRP is stored in the Device Context and accessed behind a mutex. This solved the issue and now things are buttery smooth.

Thanks Tim and Anton Bassov, your inputs helped.

Can somebody help me understand the sequence of driver calls in the case of this non-graceful shutdown? Why can't the OS close the user-mode handles first? If that could be done, the driver could drain any pending IO (without needing cancel routines).

HANDLEs are a process wide resource. Ignoring thread agnostic user I/O, when a thread submits an IRP the IRP is queued to the requesting thread.

When you kill a process the Process Manager goes through each thread and tries to make it go away. Part of making a thread go away is calling the I/O Manager and telling it to wait for the outstanding I/O requests submitted by the thread to complete. The I/O Manager then does its “I/O rundown” processing by dutifully attempting to cancel the IRPs and then waiting for them all to complete.

Once all the threads exit the Process Manager “sweeps” the HANDLE table (insert Karate Kid reference), thus triggering the IRP_MJ_CLEANUPs*.

Could this have been implemented differently? Sure! But, this is the way it works, so that doesn’t really matter at this point…

*For completeness: if you’re doing thread agnostic user I/O the I/O Manager does the I/O rundown processing before sending the IRP_MJ_CLEANUP. That means you still don’t get out of handling cancel.
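
For illustration only, the ordering described above can be modeled in plain user-mode C. This is a sketch, not kernel code: every name in it (ModelIrp, model_thread_rundown, model_terminate_process) is hypothetical, and where the model returns false a real thread would simply stay blocked in I/O rundown. It shows why a driver that never completes its pending request on cancel never gets to see IRP_MJ_CLEANUP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-mode model; none of these names are Windows APIs. */
typedef struct ModelIrp ModelIrp;
typedef void (*ModelCancelFn)(ModelIrp *irp);

struct ModelIrp {
    ModelCancelFn cancel_routine; /* NULL models a driver with no cancel support */
    bool cancel_requested;
    bool completed;
};

/* Models a driver cancel routine: complete the request as canceled. */
void model_complete_on_cancel(ModelIrp *irp) {
    irp->completed = true;
}

/* Step 1: I/O rundown for a dying thread. Request cancel on every pending
 * IRP, then report whether all of them have completed. */
bool model_thread_rundown(ModelIrp *irps, size_t count) {
    assert(irps != NULL || count == 0);
    bool all_done = true;
    for (size_t i = 0; i < count; i++) {
        if (irps[i].completed) {
            continue;
        }
        irps[i].cancel_requested = true;       /* cancel is only a request */
        if (irps[i].cancel_routine != NULL) {
            irps[i].cancel_routine(&irps[i]);  /* driver chooses to honor it */
        }
        if (!irps[i].completed) {
            all_done = false;                  /* nobody completed this one */
        }
    }
    return all_done;
}

/* Step 2: the handle table is swept (and cleanup delivered) only after
 * rundown has finished. Returns true if cleanup would be delivered. */
bool model_terminate_process(ModelIrp *irps, size_t count) {
    if (!model_thread_rundown(irps, count)) {
        return false; /* thread cannot exit, so the handle sweep never happens */
    }
    return true;      /* all threads gone: handles closed, cleanup delivered */
}
```

With a cancel routine in place the model reaches the handle sweep; with cancel_routine left NULL it stops in rundown, which matches the hang reported at the top of the thread.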

Thanks, Scott. That makes sense.

In the graceful shutdown scenario, my understanding is that when the device (file) handle is closed by user-mode code with CloseHandle(handle), the IO manager does not automatically cancel the pending IOs on the handle.

So, is it a better approach to complete pending IO in the driver’s cleanup routine, or is it better to make an explicit call to CancelIo? CancelIo only cancels the current thread's pending IO. I don’t think the thread gets a chance to do that, as it is waiting in GetOverlappedResult for the pending IO to finish.

And I cannot use CancelIoEx, as the application must also support XP (CancelIoEx is available from Vista onwards).

Of course, user-mode code can totally forget about pending IO while doing a graceful termination. The OS will attempt to cancel pending IO anyway.

Which is the better approach?

Cancel… always use cancel, anytime an I/O Request can pend or will take a long or indeterminate amount of time to complete.

Don’t implement cleanup unless you have process-specific state to undo (like mapping memory into a process context).

Drivers almost never need cleanup, and almost always need cancel. These routines are often confused. In 7 out of 8 drivers that I code review that have a cleanup routine, it’s the result of the dev not properly understanding cancel.

Peter

If we don’t cancel pending IO in cleanup, and there is an IO pending for an indefinite amount of time, who will cancel it? If completing the IO depends on a trigger from the OS which will not be satisfied in the near future, we need a way to cancel the pending IO.

Your recommendation will work well for cases where the IO will be completed eventually (it may take a little longer, but completion is guaranteed). But custom software drivers which depend on a trigger from the OS to complete the IO will be stuck. What do you recommend?

One more issue I see with not canceling pending IO in cleanup is that the user-mode application cannot stop the driver service it has started.

The sequence in the user-mode:

CloseHandle(DeviceHandle);
Stop Driver Service <<<< This cannot be done now as there is a pending IO (which basically means DeviceClose cannot be called, and hence service cannot be stopped as device is active).
Delete Driver Service
Exit Application.

When I tested, I saw that the IO is eventually cancelled by the OS during application termination. But the driver service did not get a chance to stop and be deleted.

On Mar 31, 2019, at 10:23 AM, Vijayabhaskarreddy_CH wrote:
>
> If we don’t cancel pending IO in cleanup, and there is an IO pending for an indefinite amount of time, who will cancel it?

As Peter said, the operating system does.

However, there are really several different things going on here. There are I/O requests that you RECEIVE, and there are I/O requests that you SEND. Your responsibilities are different. When a process dies, any outstanding I/O requests on any open file handles will have cancel requested. That’s automatic. It is, of course, up to your driver to HANDLE the cancellation. “Cancel” in Windows is just a polite suggestion. Nobody forces anything to happen; your driver has to see the request and act on it.

For I/O requests that your driver has submitted, you don’t HANDLE the cancellation, you REQUEST the cancellation. If each IRP you submit was generated from an IRP that you received, then you can request cancellation during your own cancel handling. But if your driver submits IRPs on its own independently of user-mode requests, then the situation is trickier. I would think it rare for such a driver-created IRP to be related to a file handle in the first place, so cleaning them up is usually a more global operation.

> If completing the IO depends on a trigger from the OS which will not be satisfied in the near future, we need a way to cancel the pending IO.

In this case, you are referring to IRPs that you submitted, right? Was that IRP submitted in response to a user-mode request? If so, then you cancel when you receive the cancel notification for the request you received.

> One more issue I see with not canceling pending IO in cleanup is that the user-mode application cannot stop the driver service it has started.
>
> The sequence in the user-mode:
>
> CloseHandle(DeviceHandle);

That triggers cancellation of all outstanding requests on that handle. If there are no outstanding requests, then what is the driver waiting for?

Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

If we don’t cancel pending IO in cleanup

You cancel Requests in a cancel routine, not in a Cleanup routine. I think you’re confusing the two, which is very common. You want a cancel routine. Not cleanup. Look up WdfRequestMarkCancelable, or put the request in a WDFQUEUE and register a Canceled on Queue event processing callback.

I don’t know how I can be more clear than that, I’m sorry.

Peter

Peter, sorry for dragging this out a little more.

Quote from the book “Developing Drivers with the Windows Driver Foundation”:

When the application closes the handle (assuming this was the only outstanding handle), your driver receives a cleanup request, which just means that the file object has no more outstanding clients. So all the I/O for that file object can be canceled. Thus, cleanup serves as a bulk cancel to give driver the opportunity to efficiently cancel all outstanding I/O for a file.

Quote from DispatchCleanup Routines in MSDN:

In general, a DispatchCleanup routine must process an IRP_MJ_CLEANUP request by doing the following for every IRP that is currently in the device queue (or in the driver’s internal queue of IRPs), for the target device object, and is associated with the file object:

  • Call IoSetCancelRoutine to set the Cancel routine pointer to NULL.
  • Cancel every IRP that is currently in the queue for the target device object, if the file object that is specified in the driver’s I/O stack location of the queued IRP matches the file object that was received in the I/O stack location of the IRP_MJ_CLEANUP request.
  • Call IoCompleteRequest to complete the IRP, and return STATUS_SUCCESS.

I am writing a custom software driver, and it is an exclusive driver in the sense that only one handle can be open at a time.

If I have pending IO on the device and CloseHandle is called on the handle from user mode, the pending IO remains there until the driver satisfies the IO request. If the driver does not satisfy the IO request for an indefinite amount of time, the pending IRP hangs around until my application terminates, which is not good.

So, how do we cancel this pending IO now? I can call CancelIoEx (before calling CloseHandle), but this function is not available on XP.

So, the only reliable way I can think of to cancel pending IO is to do it in the driver's cleanup routine. The documentation also suggests the same.

Can you elaborate on the thought process behind your statement above that IO cancellation should not be done in the driver's cleanup routine?

Thanks in advance.

Sigh. I can see where you’re being misled… Some of that documentation, well… it sucks.

If driver does not satisfy the IO request for indefinite amount of time, pending IRP is hanging around till my application terminates which is not good.

That’s the definition of a CANCEL scenario. There are THREE separate, different, unique events:

  1. Cancel
  2. Cleanup
  3. Close

They are related, but they are not interchangeable.

Read this, please, and tell me if you have any additional questions.

Peter

Vijayabhaskarreddy_CH wrote:

I am writing a custom software driver, and it is an exclusive driver in the sense that only one handle can be open at a time.

If I have pending IO on the device and CloseHandle is called on the handle from user mode, the pending IO remains there until the driver satisfies the IO request. If the driver does not satisfy the IO request for an indefinite amount of time, the pending IRP hangs around until my application terminates, which is not good.

Is this a KMDF driver?  If it is a KMDF driver, then KMDF will
automatically cancel any incoming requests that are sitting in WDFQUEUE
objects as part of the IRP_MJ_CLEANUP processing, which gets called when
CloseHandle is called.  See
https://www.osr.com/nt-insider/2017-issue2/handling-cleanup-close-cancel-wdf-driver/
If you are holding requests that are not in a queue, you shouldn’t.
That’s a design flaw.  WDFQUEUEs are cheap.  It’s completely reasonable
to create one just for holding one request at a time.

If this is a WDM driver, then you need to do it yourself.  Handle
IRP_MJ_CLEANUP, and cancel any incoming requests you are holding. Note
that “cancel” in both of these contexts means IoCompleteRequest( pIrp,
STATUS_CANCELED ).  You don’t IoCancelIrp on requests that you own.
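
As a rough illustration of the bookkeeping this implies, here is a plain user-mode C model, not WDM code. ModelRequest, model_cleanup, and the status constants are hypothetical names. The point it shows is that cleanup completes only the held requests that were issued through the file object being cleaned up, and that "cancel" here just means completing them with a canceled status.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-mode model; none of these names are Windows APIs. */
enum { MODEL_PENDING = 0, MODEL_CANCELED = 1 };

typedef struct {
    const void *file_object; /* which open handle instance issued the request */
    int status;              /* MODEL_PENDING until the driver completes it */
} ModelRequest;

/* Models cleanup for one file object: complete every request that is still
 * pending and belongs to that file object. Returns how many were completed. */
size_t model_cleanup(ModelRequest *held, size_t count, const void *file_object) {
    assert(held != NULL || count == 0);
    size_t completed = 0;
    for (size_t i = 0; i < count; i++) {
        if (held[i].status == MODEL_PENDING &&
            held[i].file_object == file_object) {
            held[i].status = MODEL_CANCELED; /* the driver owns it, so it completes it */
            completed++;
        }
    }
    return completed;
}
```

Requests issued through other file objects are left alone, and running the cleanup a second time for the same file object completes nothing further.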

So, how do we cancel this pending IO now. I can do CancelIoEx (before calling CloseHandle), but this function is not available on XP.

So?   XP is not relevant.  For cripes sakes, XP is old enough to vote
this year.  Software does not age like wine, it ages like milk.

Ours is a WDM driver, and the driver can only have one outstanding IO at any given time (this is our driver design). So the pending IO is stored in the DeviceContext behind a mutex.

So, since we are in WDM, we need to cancel pending IO (if any) from the Cleanup routine.
I will start educating myself on WDF and see if our driver can be migrated to get the benefit of WDFQUEUE (along with other advantages).

XP is not relevant.

Probably correct, and good advice, for this OP… but if I may digress a bit: XP is still very much with us.

I started a major project just a few weeks ago targeting XP specifically. It’s a port of a major industrial security control system to XP.

There’s lots of XP still around: Embedded systems, industrial control, even line of business apps running on thin clients that use XP Embedded (usually with some version of the Write Filter enabled). Recall that XP Embedded was, in my view at least, the absolute peak of the journey that has been Windows Embedded. Pick and choose exactly what you wanted in your system… it was complicated, and awesome.

Peter

Back to the OP’s topic, so I can rant some:

need to cancel pending IO (if any) from the Cleanup routine

Arrrgh! No, no, no, no, no, no.

I repeat (and here, I differ with Mr. Roberts, apparently):

Do not handle cleanup at all. Do. Not. Support. Cleanup. Full stop. Period. End of sentence.

The “hung app problem” is handled by cancel. It is not handled by Cleanup. People who handle this problem in cleanup are doing it wrong. I really hate to repeat myself so many times. It’s frustrating.

Before your IRP pends, call IoSetCancelRoutine. When the cancel routine is called, complete the IRP with STATUS_CANCELED.
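
The part that usually trips people up is the race between the cancel path and the normal completion path. The following is only a user-mode model of that handoff using C11 atomics, not driver code, and all of its names (ModelIrp, model_pend, model_request_cancel, model_try_complete) are hypothetical. It relies on one fact about the real API: IoSetCancelRoutine swaps the cancel routine pointer atomically and returns the previous value, so whichever side swaps out the non-NULL pointer owns completion of the IRP.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-mode model; none of these names are Windows APIs. */
enum { MODEL_PENDING = 0, MODEL_SUCCESS = 1, MODEL_CANCELED = 2 };

typedef struct ModelIrp ModelIrp;
typedef void (*ModelCancelFn)(ModelIrp *irp);

struct ModelIrp {
    _Atomic(ModelCancelFn) cancel_routine; /* non-NULL while cancelable */
    int status;
    int completions;                       /* must never exceed 1 */
};

void model_complete(ModelIrp *irp, int status) {
    irp->status = status;
    irp->completions++;
}

/* Models the driver's cancel routine: complete the request as canceled. */
void model_on_cancel(ModelIrp *irp) {
    model_complete(irp, MODEL_CANCELED);
}

/* Models the dispatch path: install the cancel routine before pending. */
void model_pend(ModelIrp *irp) {
    irp->status = MODEL_PENDING;
    irp->completions = 0;
    atomic_init(&irp->cancel_routine, model_on_cancel);
}

/* Models the canceling side: swap the routine out and call it if it was set.
 * Returns true if the cancel routine ran. */
bool model_request_cancel(ModelIrp *irp) {
    ModelCancelFn none = NULL;
    ModelCancelFn previous = atomic_exchange(&irp->cancel_routine, none);
    if (previous == NULL) {
        return false; /* the completion path already took ownership */
    }
    previous(irp);
    return true;
}

/* Models the driver's normal completion path (data became available).
 * Returns true if this path completed the request. */
bool model_try_complete(ModelIrp *irp) {
    ModelCancelFn none = NULL;
    if (atomic_exchange(&irp->cancel_routine, none) == NULL) {
        return false; /* the cancel path owns the request; do not touch it */
    }
    model_complete(irp, MODEL_SUCCESS);
    return true;
}
```

In a real WDM driver the same check would be made on the return value of IoSetCancelRoutine(Irp, NULL), the cancel routine would also have to release the cancel spin lock it is entered with, and the dispatch path would need to handle an IRP that was already canceled before the routine was installed. Those details are left out of this model.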

Stop thinking about cleanup. Ignore cleanup. Forget it exists. You don’t need cleanup. Nobody, almost, ever needs cleanup. I curse the WDK for not making this clear, because it’s resulted in a whole generation of drivers where people think that Cleanup is a substitute for cancel. It is not.

Go back and read all those articles in The NT Insider that we’ve referred you to again.

You gain absolutely nothing by rewriting your driver in KMDF, if all you care about is fixing this hanging app problem. Call IoSetCancelRoutine, handle the callback, and be done with it.

Peter