Device cleanup/close routines not called, and hence application does not exit

Peter_Viscarola_OSR · April 1, 2019, 3:43pm

Sigh. I can see where you’re being misled… Some of that documentation, well… it sucks.

> If the driver does not satisfy the I/O request for an indefinite amount of time, the pending IRP hangs around until my application terminates, which is not good.

That’s the definition of a CANCEL scenario. There are THREE separate, different, unique events:

Cancel
Cleanup
Close

They are related, but they are not interchangeable.

Read this, please, and tell me if you have any additional questions.

Peter

Tim_Roberts · April 1, 2019, 5:26pm

Vijayabhaskarreddy_CH wrote:

> I am writing a custom software driver, and it is an exclusive driver in the sense that only one handle can be opened at a time.
>
> If I have pending I/O on the device and CloseHandle is called on the handle from user mode, the pending I/O remains there until the driver satisfies the I/O request. If the driver does not satisfy the I/O request for an indefinite amount of time, the pending IRP hangs around until my application terminates, which is not good.

Is this a KMDF driver? If it is a KMDF driver, then KMDF will
automatically cancel any incoming requests that are sitting in WDFQUEUE
objects as part of the IRP_MJ_CLEANUP processing, which gets called when
CloseHandle is called. See
https://www.osr.com/nt-insider/2017-issue2/handling-cleanup-close-cancel-wdf-driver/
. If you are holding requests that are not in a queue, you shouldn’t.
That’s a design flaw. WDFQUEUEs are cheap. It’s completely reasonable
to create one just for holding one request at a time.

If this is a WDM driver, then you need to do it yourself. Handle
IRP_MJ_CLEANUP, and cancel any incoming requests you are holding. Note
that “cancel” in both of these contexts means completing the request
yourself: set Irp->IoStatus.Status to STATUS_CANCELLED and call
IoCompleteRequest. You don’t call IoCancelIrp on requests that you own.

> So, how do we cancel this pending I/O now? I can call CancelIoEx (before calling CloseHandle), but this function is not available on XP.

So? XP is not relevant. For cripes sakes, XP is old enough to vote
this year. Software does not age like wine, it ages like milk.

Vijayabhaskarreddy_CH · April 3, 2019, 4:00am

Ours is a WDM driver, and the driver can have only one outstanding I/O at any given time (this is our driver design). So the pending I/O is stored in the DeviceContext behind a mutex.

So, since we are in WDM, we need to cancel the pending I/O (if any) from the Cleanup routine.
I will start educating myself on WDF and see whether our driver can be migrated to get the benefit of WDFQUEUE (along with other advantages).

Peter_Viscarola_OSR · April 3, 2019, 12:01pm

> XP is not relevant.

Probably correct, and good advice, for this OP… but if I may digress a bit: XP is still very much with us.

I started a major project just a few weeks ago targeting XP specifically. It’s a port of a major industrial security control system to XP.

There’s lots of XP still around: Embedded systems, industrial control, even line of business apps running on thin clients that use XP Embedded (usually with some version of the Write Filter enabled). Recall that XP Embedded was, in my view at least, the absolute peak of the journey that has been Windows Embedded. Pick and choose exactly what you wanted in your system… it was complicated, and awesome.

Peter

Peter_Viscarola_OSR · April 3, 2019, 12:10pm

Back to the OP’s topic, so I can rant some:

> need to cancel pending IO (if any) from Cleanup routine

Arrrgh! No, no, no, no, no, no.

I repeat (and here, I differ with Mr. Roberts, apparently):

Do not handle cleanup at all. Do. Not. Support. Cleanup. Full stop. Period. End of sentence.

The “hung app problem” is handled by cancel. It is not handled by Cleanup. People who handle this problem in cleanup are doing it wrong. I really hate to repeat myself so many times. It’s frustrating.

Before your IRP pends, call IoSetCancelRoutine. When the cancel routine is called, complete the IRP with STATUS_CANCELLED.
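The ownership rule behind this advice can be illustrated with a small user-mode model. This is only a sketch: `ModelIrp`, the `model_*` functions, and the `MODEL_STATUS_*` constants are hypothetical stand-ins, not WDK definitions. The model is single-threaded, whereas the real IoSetCancelRoutine performs an interlocked exchange, and it leaves out the cancel spin lock and the Irp->Cancel check that a real driver has to deal with. What it does show is that whichever path clears the cancel routine first is the one that completes the IRP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these are WDK names. */
#define MODEL_STATUS_SUCCESS   0x00000000UL
#define MODEL_STATUS_PENDING   0x00000103UL
#define MODEL_STATUS_CANCELLED 0xC0000120UL

typedef struct ModelIrp ModelIrp;
typedef void (*model_cancel_fn)(ModelIrp *irp);

struct ModelIrp {
    model_cancel_fn cancel_routine; /* models Irp->CancelRoutine */
    unsigned long   status;         /* models Irp->IoStatus.Status */
    int             completions;    /* number of times the IRP was completed */
};

/* Models IoSetCancelRoutine: install a routine and return the previous one.
   The real routine is an interlocked exchange; this model is single-threaded. */
model_cancel_fn model_set_cancel_routine(ModelIrp *irp, model_cancel_fn fn)
{
    model_cancel_fn previous = irp->cancel_routine;
    irp->cancel_routine = fn;
    return previous;
}

/* Models completing the IRP with a final status. */
void model_complete(ModelIrp *irp, unsigned long status)
{
    assert(irp->completions == 0); /* completing an IRP twice is a bug */
    irp->status = status;
    irp->completions++;
}

/* The driver's cancel routine: complete the IRP as cancelled. */
void model_driver_cancel(ModelIrp *irp)
{
    model_complete(irp, MODEL_STATUS_CANCELLED);
}

/* Driver dispatch path: set the cancel routine before leaving the IRP pending. */
void model_pend(ModelIrp *irp)
{
    irp->status = MODEL_STATUS_PENDING;
    irp->completions = 0;
    model_set_cancel_routine(irp, model_driver_cancel);
}

/* Cancel path: clear the routine, then call it if one was set. */
void model_io_manager_cancel(ModelIrp *irp)
{
    model_cancel_fn fn = model_set_cancel_routine(irp, NULL);
    if (fn != NULL) {
        fn(irp);
    }
}

/* Driver's normal completion path. Returns 1 if it completed the IRP,
   0 if the cancel path had already taken ownership of it. */
int model_driver_finish(ModelIrp *irp)
{
    if (model_set_cancel_routine(irp, NULL) == NULL) {
        return 0; /* cancel routine already ran (or will run); hands off */
    }
    model_complete(irp, MODEL_STATUS_SUCCESS);
    return 1;
}
```

In this model, calling `model_pend` followed by `model_io_manager_cancel` leaves the IRP completed once with the cancelled status, and a later `model_driver_finish` returns 0 without touching it.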

Stop thinking about cleanup. Ignore cleanup. Forget it exists. You don’t need cleanup. Nobody, almost, ever needs cleanup. I curse the WDK for not making this clear, because it’s resulted in a whole generation of drivers where people think that Cleanup is a substitute for cancel. It is not.

Go back and read all those articles in The NT Insider that we’ve referred you to again.

You gain absolutely nothing by rewriting your driver in KMDF, if all you care about is fixing this hanging app problem. Call IoSetCancelRoutine, handle the callback, and be done with it.

Peter

Tim_Roberts · April 3, 2019, 9:27pm

Fair enough, and this presents an excellent opportunity to get a definitive answer in the public record, so please make sure the court stenographer is recording this.

I have been waffling about this issue, and that waffling exposes a deep-seated and fundamental hole in my understanding of Windows. I have scanned literally dozens of articles in the past few days as this conversation has gone on, trying to find the definitive answer to a key question, and I find nothing in either MSDN or driver sources or this forum or the hallowed OSR NT Insider knowledge base that unambiguously answers it.

There are sources that clearly say that when a thread terminates, all of its outstanding I/O operations are canceled. Fair enough, you’d expect that. But I have never been able to find a source that definitively says the same thing happens when the last handle to a file object is closed. In fact, the evidence suggests the exact opposite: many conversations on this forum have cautioned “when a handle is closed, it is the responsibility of the driver to cancel any outstanding I/O requests.” The KMDF documentation clearly says that the framework cancels outstanding I/O operations when the handle is closed. The framework – not the kernel.

So, which is it? If pending IRPs are not automatically canceled on the last close, then a WDM driver must handle IRP_MJ_CLEANUP so it can do the cancellations. If not, the handle will never be closed. But if the I/O manager is automatically cancelling pending IRPs, then it is absolutely true that having an IRP cancel handler is sufficient.

I HOPE the latter is true. If so, then many documentation pages are wrong. They should not say “I/O operations are canceled when a thread terminates”. They should say “thread termination triggers a closing of all file handles, and closing a file handle triggers cancellation of outstanding I/O operations.” It would be nice to have a page that describes in detail the process of closing a file handle.

anton_bassov · April 3, 2019, 11:16pm

> They should not say “I/O operations are canceled when a thread terminates”. They should say “thread termination triggers a closing of all file handles, and closing a file handle triggers cancellation of outstanding I/O operations.” It would be nice to have a page that describes in detail the process of closing a file handle

What about the scenario when someone calls ObReferenceXxx() on the target FILE_OBJECT? In this case it may still be in use by someone despite all the handles to it having been closed…

Anton Bassov

Peter_Viscarola_OSR · April 4, 2019, 2:44am

> If pending IRPs are not automatically canceled on the last close, then a WDM driver must handle IRP_MJ_CLEANUP so it can do the cancellations.

We know how it works in WDF. So, let’s restrict ourselves to WDM.

As far as I know, in WDM it’s entirely up to the driver to define what happens when a handle is closed. I’ll look it up (again) but I’m pretty sure closing a handle doesn’t necessarily imply terminating I/O on that handle. Why would it? You can easily argue that by closing the handle, the user is merely saying “I don’t intend to submit any further I/O operations via this handle”… this does not imply “I wish to terminate all pending I/O on this handle”… after all, there’s a different API for that.

If the I/O stays in progress after CloseHandle (that is, IRP_MJ_CLEANUP) those IRPs simply stay in progress, they each have a reference on the File Object. They complete in their turn, and when the ref count in the File Object goes to zero, the driver gets the Close.
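The reference counting described above can be sketched as a toy user-mode model. Everything here (`ModelFileObject` and the `fo_*` functions) is a hypothetical illustration, not an I/O Manager structure; it assumes one reference per open handle and one per in-flight IRP, with Cleanup delivered when the last handle goes away and Close delivered when the last reference goes away.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a File Object's lifetime; not a real kernel structure. */
typedef struct ModelFileObject {
    int  handles;      /* open user-mode handles */
    int  references;   /* one per open handle plus one per in-flight IRP */
    bool cleanup_sent; /* models delivery of IRP_MJ_CLEANUP */
    bool close_sent;   /* models delivery of IRP_MJ_CLOSE */
} ModelFileObject;

/* Drop one reference; the Close arrives when the count reaches zero. */
void fo_drop_reference(ModelFileObject *fo)
{
    assert(fo->references > 0);
    fo->references--;
    if (fo->references == 0) {
        fo->close_sent = true;
    }
}

/* Opening a handle takes a reference on the File Object. */
void fo_open_handle(ModelFileObject *fo)
{
    fo->handles++;
    fo->references++;
}

/* Each pending IRP holds its own reference. */
void fo_start_irp(ModelFileObject *fo)
{
    fo->references++;
}

/* Completing an IRP releases that IRP's reference. */
void fo_complete_irp(ModelFileObject *fo)
{
    fo_drop_reference(fo);
}

/* Closing the last handle triggers Cleanup; pending IRPs keep the object alive. */
void fo_close_handle(ModelFileObject *fo)
{
    assert(fo->handles > 0);
    fo->handles--;
    if (fo->handles == 0) {
        fo->cleanup_sent = true;
    }
    fo_drop_reference(fo);
}
```

With one handle and two pending IRPs, `fo_close_handle` sets `cleanup_sent` but not `close_sent`; the Close only shows up after both IRPs have been completed.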

Or the thread exits before all the IRPs complete and the IRPs get run down and canceled.

Either way… no harm, no foul. And no need to handle Cleanup.

Again, if the user wanted the I/O canceled, they would have called CancelIoEx.

Like I said, I’ll look it up… but that’s the way I understood it to work.

Peter

Tim_Roberts · April 4, 2019, 6:50am

On Apr 3, 2019, at 7:44 PM, Peter_Viscarola_(OSR) wrote:
>
> As far as I know, in WDM it’s entirely up to the driver to define what happens when a handle is closed. I’ll look it up (again) but I’m pretty sure closing a handle doesn’t necessarily imply terminating I/O on that handle. Why would it? You can easily argue that by closing the handle, the user is merely saying “I don’t intend to submit any further I/O operations via this handle”… this does not imply “I wish to terminate all pending I/O on this handle”… after all, there’s a different API for that.

I’m not sure I buy your argument. If I close a handle, then it seems to me that the handle should be DEAD. Invalid. Unusable. So, what right do I have to pass that handle to GetOverlappedResult to find their status?

You may be right – and I’m hoping we can find out for sure – but it seems to me that closing a handle tells the system “all I/O on this handle is finished.” By the time CloseHandle returns, I, personally, want to know that all I/O activity on that handle is completed.

> If the I/O stays in progress after CloseHandle (that is, IRP_MJ_CLEANUP) those IRPs simply stay in progress, they each have a reference on the File Object. They complete in their turn, and when the ref count in the File Object goes to zero, the driver gets the Close.

OK, but the original poster’s situation involved a long-term request in a software driver. Where does that get cleaned up?

> Again, if the user wanted the I/O canceled, they would have called CancelIoEx.

We cannot allow a misbehaving application to induce our driver to lock up a process.
—
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

anton_bassov · April 4, 2019, 7:54am

> it seems to me that closing a handle tells the system “all I/O on this handle is finished.”

Consider the scenario when you close a file handle immediately after having submitted an overlapped write request.
Do you really think that closing a handle should necessarily invalidate a request that is still pending, although the data to be written is still hanging out somewhere in a disk’s queue?

Anton Bassov

Peter_Viscarola_OSR · April 4, 2019, 12:29pm

> If I close a handle, then it seems to me that the handle should be DEAD. Invalid. Unusable. So, what right do I have to pass that handle to GetOverlappedResult to find their status?

You’re right. It’s a terrible API. And IIRC there are some warnings in the docs about this exact situation. Something about not being able to trust the returns in OVERLAPPED after Close? Somewhere? And some weirdness about completion ports during CloseHandle as well? That’s from memory, though, and what I know about Win32 can fit in a thimble.

> By the time CloseHandle returns, I, personally, want to know that all I/O activity on that handle is completed.

Hmmmm… with all due respect, that sounds to me like the guy who writes “I send 3 asynchronous reads, and I personally want to know that all three have completed when the third one gets completed.” OK, but that’s not how the system works.

Again, I will look this up… but the way I understand it, is that you can do what you’re saying you want to do… just not via close handle.

And there’s nothing preventing a driver from implementing the behavior you describe… and that is, we know, the behavior that WDF gives you.

> OK, but the original poster’s situation involved a long-term request in a software driver. Where does that get cleaned up?

When the process exits, during I/O rundown. That’s the very definition of cancel processing, right?

> We cannot allow a misbehaving application to induce our driver to lock up a process.

Agreed. And we don’t. The I/O request either completes of its own accord, or it gets Canceled (IoSetCancelRoutine) during thread exit.

More later…

Peter

Tim_Roberts · April 4, 2019, 4:26pm

anton_bassov wrote:

> it seems to me that closing a handle tells the system “all I/O on this handle is finished.”
> Consider the scenario when you close a file handle immediately after having submitted an overlapped write request.
>
> Do you really think that closing a handle should necessarily invalidate a request that is still pending, although the data to be written is still hanging out somewhere in a disk’s queue?

Absolutely, yes. That’s the only deterministic scenario, isn’t it? How
is it NOT a huge application bug to close a file handle when you know
the underlying file still has activity that you instituted? How can
that asynchronous request be completed when the application has shut the
door on responses?

I understand that I’m losing this battle, but I think the consensus
answer here is loony.

OK, you used the word “invalidate,” which has a different connotation
from “cancel”. If an operation has been committed, it is naturally
going to complete before cancellation is acknowledged. If it has not
been committed, then it should be canceled.

Tim_Roberts · April 4, 2019, 4:35pm

Peter_Viscarola_(OSR) wrote:

> Hmmmm… with all due respect, that sounds to me like the guy who writes “I send 3 asynchronous reads, and I personally want to know that all three have completed when the third one gets completed.” OK, but that’s not how the system works.

You may be right, and you present good evidence, but even in this
distinguished company, I’m still hearing a lot of “well, I think” and
“well, it seems to me” and not a lot of “it definitely says that…”.
My hope is that we can get the One True Answer here.

> And there’s nothing preventing a driver from implementing the behavior you describe… and that is, we know, the behavior that WDF gives you.

Yes, but implementing that behavior requires handling IRP_MJ_CLEANUP,
right? That’s exactly where KMDF does this operation.

> OK, but the original poster’s situation involved a long-term request in a software driver. Where does that get cleaned up?
> When the process exits, during I/O rundown. That’s the very definition of cancel processing, right?

Is it? Whose definition? I hear what you’re saying – I really do –
but why is it sensible that cleaning up I/O operations should be
triggered by thread exit? The process subsystem and the I/O subsystem
should not, in a rational design, have such a tight coupling. Thread
exit should close handles, and closing a handle should trigger I/O
cleanup on that handle.

Bill_Zissimopoulos · April 4, 2019, 4:37pm

I am fairly certain that when the last handle to a FILE_OBJECT is closed, IRP_MJ_CLEANUP will be sent but pending IRPs will not be cancelled (I believe an IRP_MN_UNLOCK_ALL may also be sent if there has been a lock operation on the FILE_OBJECT).

Why would they be cancelled? For example, in the context of a file system driver a user mode process may have memory mapped the file and then closed all handles to it. This is a supported scenario on Windows. So there may be legitimate user mode READ/WRITE IRPs in flight when the handle is closed that should not be cancelled.

I apologize if I have misunderstood the context of this discussion.

anton_bassov · April 4, 2019, 10:42pm

> How is it NOT a huge application bug to close a file handle when you know the underlying file still has activity that you instituted?
> How can that asynchronous request be completed when the application has shut the door on responses?

Actually, this was exactly my point…

Certainly, this approach is very obviously a sloppy one; who would even argue about it? However, by applying your logic, the kernel’s actions are going to depend on whether the app in question is properly written, which is most definitely not the way a properly designed kernel should work.

Just to give you an idea, in this particular example the outcome of the write operation would depend solely on the particular moment the app closed the handle. It does not really sound like a consistent approach on the kernel’s behalf, does it…

Anton Bassov

Peter_Viscarola_OSR · April 4, 2019, 10:52pm

> why is it sensible that cleaning up I/O operations should be triggered by thread exit?

Well, the rundown has to happen on process exit… and thread exit is really just an implementation detail, as we learned from the introduction of thread agnostic I/O. Handles have always been a process-wide resource. Architecturally, I suspect cleaning up I/O on thread exit was merely an implementation choice made a long time ago… and not the best one. The rundown should happen on Process exit, not thread exit.

> in the context of a file system driver

And we know that in the context of a file system, you can indeed get reads and writes after receiving the cleanup IRP. You can even get this in a device driver… provoking another common bug (“I got the cleanup, and I won’t get any more IRPs on this handle”… wrong!).

Peter

Peter_Viscarola_OSR · April 5, 2019, 5:13pm

So, as promised, I spent some time this morning looking this up, and I have what I believe to be the definitive answer.

For those of you playing along at home, the question is “When a user mode application closes the last handle to a File Object, are the IRPs associated with that handle canceled?” Where canceled specifically means “the cancel routine stored in the IRP at Irp->CancelRoutine is called by the I/O Manager” and where the scope of this question is restricted to WDM drivers.

Those who have been around a while won’t be surprised to know that the answer to this seemingly simple question turns out to be “It depends.”

When the last handle to a File Object is closed, the I/O Manager will send an IRP_MJ_LOCK_CONTROL, IRP_MN_UNLOCK_ALL (and wait for it to complete) if the process issuing the close has ever done a lock operation on the file. Thank you, Mr. Zissimopoulos… I didn’t recall ever reading this code before and it was fun to discover it.

The I/O Manager will then send an IRP_MJ_CLEANUP and wait for it to complete.

And now, finally, the answer to the pending question:

The I/O Manager will cancel pending IRPs that are queued to the File Object. This will be the case when the process is using Completion Ports for this File Object, or there is a “locked IOSB range” (which implies the process called SetFileIoOverlappedRange). The feature by which IRPs can be queued to the File Object is referred to as Thread Agnostic I/O, and it was introduced in Windows Vista. Like so many things, you can read Mr. Holan’s description of Thread Agnostic I/O in his excellent blog post from 2006 (which, thankfully, as of the current moment has not gone away).

IRPs that are not queued to the File Object will not be canceled. This was the universal behavior pre-Vista (when Thread Agnostic I/O was introduced).
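The “it depends” rule from the last two paragraphs can be written down as a small decision function. This is a sketch of one reading of that description, not I/O Manager code: `ModelHandleState`, `ModelIrpQueue`, and the `rule_*` functions are hypothetical names, and the model assumes Vista or later, where IRPs can be queued to the File Object at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the state that decides where an IRP is queued. */
typedef struct ModelHandleState {
    bool has_completion_port;   /* File Object is used with a completion port */
    bool has_locked_iosb_range; /* process called SetFileIoOverlappedRange */
} ModelHandleState;

typedef enum ModelIrpQueue {
    QUEUED_TO_THREAD,     /* the pre-Vista behavior for every IRP */
    QUEUED_TO_FILE_OBJECT /* Thread Agnostic I/O */
} ModelIrpQueue;

/* Where a pending IRP is queued, per the description above. */
ModelIrpQueue rule_irp_queue(const ModelHandleState *state)
{
    assert(state != NULL);
    if (state->has_completion_port || state->has_locked_iosb_range) {
        return QUEUED_TO_FILE_OBJECT;
    }
    return QUEUED_TO_THREAD;
}

/* Is a pending IRP canceled when the last handle to the File Object is closed? */
bool rule_cancelled_on_last_close(const ModelHandleState *state)
{
    return rule_irp_queue(state) == QUEUED_TO_FILE_OBJECT;
}
```

So for a plain overlapped handle with neither feature in use, `rule_cancelled_on_last_close` returns false, and a WDM driver’s cancel routine is not called by the I/O Manager at last handle close.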

And after reading all that code, which felt mighty familiar, I believe Mr. Noone and I had this exact conversation some time (years) ago… and we looked up this code… and came to the same conclusion.

So that, I think, puts to rest Mr. Roberts’ question.

Sort of. Because, I decided to take the next step, and look to see what WDF does in this situation… because the Framework docs are clear on the fact that after the Cleanup is issued, a Cancel occurs. And what I discovered surprised me.

It seems that, after calling EvtFileCleanup, the Framework only cancels IRPs that are on its own Queues. That is, it does not cancel IRPs that are actually pending in the driver. You can see the code, very clearly, here. And every time I can write that, I thank the WDF development and PM teams for making the sources to WDF public. Now if we can just get MSFT to do the same for the I/O Manager, we’ll all be happier. But I digress.

So… that’s the story.

Peter
(Source code access, and the resulting information being shared with the community, done under license/authorization from the Microsoft MVP program)

Peter_Viscarola_OSR · April 7, 2019, 10:20pm

@Tim_Roberts

Mr. Roberts… did you notice that I did finally answer your pending query on the whole cleanup / cancel mess?

P

Tim_Roberts · April 8, 2019, 3:49am

On Apr 7, 2019, at 3:20 PM, Peter_Viscarola_(OSR) wrote:
>
> @Tim_Roberts
>
> Mr. Roberts… did you notice that I did finally answer your pending query on the whole cleanup / cancel mess?

Yes, sir. I appreciate your efforts and am grateful to have an answer to a question that has nagged me for years.
—
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.

Peter_Viscarola_OSR · April 8, 2019, 11:48am

NP. Just wanted to be sure you had seen it go by in your email stream.

Sadly, the answer was far less than satisfying, at least from my point of view. Architectural mess: I don’t like “some are, some aren’t” … ugh.

Peter