USB device surprise removal issue

We have a WDM driver for a USB device. Our problem is that the remove-device IRP does not arrive until the application releases its handle.
There was an earlier query on this posted at:
http://www.osronline.com/showThread.cfm?link=144844

The issue is like this:
An application is reading from and writing to the device continuously. The device is surprise removed and then plugged back in while the writes continue. The surprise-remove IRP appears, but not the remove-device IRP. After the writes stop and the application exits, the remove-device IRP that was missing earlier finally appears and the device gets deregistered.

As Doron suggested earlier, we had a problem with a global device extension in our driver. We got rid of the globals and now always take the device extension from the device object. In spite of this, I still get the remove-device IRP when I close the application. Using the debugger, I find that its device object is the initial one (not the one newly created at the intermediate insertion). All the cleanup in the remove-device handler uses the device object that comes in as the function parameter (the old one). In spite of this, the device gets logically removed from the system. Any thoughts on what we are missing here?

Regards,
Venkateswaran C G

You won’t get a REMOVE_DEVICE until the application closes its handle, which
(it sounds like) is not happening after the first surprise removal.
Additionally, you should be cancelling any outstanding I/O when you get the
SURPRISE_REMOVE.

As to why you’re seeing the behavior you describe: your app’s handle is for
the original DEVICE_OBJECT, and the system can’t delete that object until
your app closes the handle. You would probably find that I/O directed to the
handle is failing without ever getting to your driver, too, because the
system is protecting you from using potentially stale resources.
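
The ordering described above (no REMOVE_DEVICE until the last handle is closed) can be pictured with a small portable C model. This is only a sketch of the sequencing, not WDM code: `device_model`, `model_surprise_remove` and `model_close_handle` are hypothetical names invented for illustration, and the real decision is made by the PnP manager, not by the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one device instance as the PnP manager sees it. */
typedef struct {
    int  open_handles;      /* handles the application still holds */
    bool surprise_removed;  /* surprise-remove IRP already delivered */
    bool remove_sent;       /* remove-device IRP delivered */
} device_model;

/* The remove-device IRP goes out only once both conditions hold. */
static void maybe_send_remove(device_model *dev) {
    if (dev->surprise_removed && dev->open_handles == 0)
        dev->remove_sent = true;
}

/* Device was unplugged: surprise removal is reported immediately. */
void model_surprise_remove(device_model *dev) {
    assert(dev != NULL);
    dev->surprise_removed = true;
    maybe_send_remove(dev);
}

/* Application closes one handle; the last close unblocks the remove. */
void model_close_handle(device_model *dev) {
    assert(dev != NULL && dev->open_handles > 0);
    dev->open_handles--;
    maybe_send_remove(dev);
}
```

With one handle open, calling `model_surprise_remove` leaves `remove_sent` false; it only becomes true after `model_close_handle`, which matches the sequence reported in the original post.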

Looking beyond your query, you may want to rearchitect your driver so that
the application opens a handle to a non-PNP device object which then
forwards requests to the real device (when it’s plugged in) or which fails
them when there’s no real device plugged in.

Walter Oney
Consulting and Training
www.oneysoft.com

Have the application call RegisterDeviceNotification with DBT_DEVTYPE_HANDLE for the device handle. When the application receives the removal notification, it MUST close the handle.

The OS does not start failing I/O when the device is in the surprise-removed state. There might be legitimate I/O that must go through, and the I/O manager has no way of knowing which requests should make it and which should not. It is up to the driver to fail I/O in this state.
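
As a rough illustration of "the driver fails I/O in this state", here is a portable C model of a per-device state check. It is a sketch under assumptions, not real WDM code: `device_ext`, `pnp_state`, `dispatch_io`, `on_surprise_removal` and the `ST_*` status values are hypothetical stand-ins for a device extension, the driver's own PnP state tracking, its read/write dispatch routine, and NTSTATUS codes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PnP states tracked inside each device extension. */
typedef enum { DEV_STARTED, DEV_SURPRISE_REMOVED, DEV_REMOVED } pnp_state;

/* Stand-in for a per-device extension (not the real WDM structure). */
typedef struct {
    pnp_state state;
    int       pending_io;  /* requests accepted but not yet completed */
} device_ext;

/* Simplified stand-ins for NTSTATUS values. */
enum { ST_SUCCESS = 0, ST_DEVICE_GONE = -1 };

/* Read/write dispatch: accept requests only while the device is started. */
int dispatch_io(device_ext *ext) {
    assert(ext != NULL);
    if (ext->state != DEV_STARTED)
        return ST_DEVICE_GONE;   /* the driver, not the OS, fails the request */
    ext->pending_io++;
    return ST_SUCCESS;
}

/* Surprise removal: change state first so new I/O is rejected,
 * then cancel whatever is still queued. Returns the number cancelled. */
int on_surprise_removal(device_ext *ext) {
    assert(ext != NULL);
    ext->state = DEV_SURPRISE_REMOVED;
    int cancelled = ext->pending_io;
    ext->pending_io = 0;
    return cancelled;
}
```

The state lives in the extension of the device object the request was sent to, so a stale handle to the old device object keeps failing even after a new instance has started.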

Why would you create a non-PnP devobj to “work around” this problem? To prevent a devobj from sticking around in the surprise-removed state for potentially a very long time? By introducing a non-PnP forwarding devobj you introduce more race conditions, because now you have to handle the case where the PnP devobj is gone while there is still a handle to the non-PnP devobj; a handle directly to the PnP devobj does not have this race. Not to mention that you now need an addressing scheme to support more than one active PnP device. Just stick with the PnP devobj and handle surprise removal, and all will be fine.

d

Sent from my phone with no t9, all spilling mistakes are not intentional.

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

From: “Doron Holan”
> Why would you create a non pnp devobj to “work around” this problem? To
> prevent a devobj from sticking around in the surprise removed state for
> potentially a very long time? By introducing a non pnp fwding devobj you
> introduce more race conditions b/c now you have to handle the condition
> where the pnp devobj is gone while there is still a handle to the non
> pnpdevobj, a handle directly to the pnp devobj does not have this race.
> not to mention you now need an addressing scheme to support more than one
> active pnp device. Just stick with the pnp devobj and handle surprise
> removal and all will be fine

A great many of my clients want just this kind of functionality. For
example, they have a POS terminal that mimics a serial port. They want to be
able to fire up an application that never shuts down and tolerates plugging
and unplugging the real device. The apps are often legacy code that can’t be
modified to handle WM_DEVICECHANGE.

Walter Oney
Consulting and Training
www.oneysoft.com

xxxxx@broadcom.com wrote:

Have the application call RegisterDeviceNotification with DBT_DEVTYPE_HANDLE for the device handle.
As the application receives that, it MUST close the handle.

So what would you do if the app just won’t close the handle
(applications, and their authors are like that… you know.)

Regards,
– pa

So this is more than just masking away unplug/replug, because the app has no idea that the device is gone. You now have to fake enough state that, while the USB device is unplugged, the POS app does not freak out on errors, and when the device is plugged back in, you restore it to its previous state (most likely the serial comm settings) so that the app does not have to deal with reinitialization. I would say this is a niche scenario where you can get away with faking the presence of hardware because you cannot control the app.

d

From: “Doron Holan”
> So this is more than just masking away unplug/replug b/c the app has no
> idea that the device is gone. You now have to fake state enough such that
> while the usb device is unplugged, the POS app does not freak out on error
> and when it is plugged back it, you restore the usb device to its previous
> state (e.g. serialcomm settings most likely) so that the app does not have
> to deal with reinit. I would say this is a niche scenario where you can
> get away with faking the presence of hardware and b/c you cannot control
> the app.

Yes, it requires a great deal of state fakery, and I agree it’s a niche
problem. So I’m perfectly happy for you guys not to solve it!

Walter Oney
Consulting and Training
www.oneysoft.com

Thanks to everyone for your responses.

I am in a situation where I do not control the application code. In any case I would like the driver to be robust irrespective of what the application does.

As Doron said, when the application does not release its handle and still attempts I/O after removal, it is the driver that fails the I/O, not the OS.

My problem is not with the I/O but with a REMOVE_DEVICE IRP that arrives out of turn because the app has not released its handle. Its cleanup caused the device to get logically removed.

Earlier I had fixed this by tracking states to identify the problem REMOVE_DEVICE IRP. In the normal course it should follow either a surprise removal or a query removal. If it did not, I simply completed the IRP without doing any processing. This caused Driver Verifier to bugcheck:

DRIVER_VERIFIER_IOMANAGER_VIOLATION (c9)

The IO manager has caught a misbehaving driver.

Arguments:

Arg1: 0000021d, (Fatal error) An IRP dispatch handler has not properly detached from the stack below it upon receiving a remove IRP. (DeviceObject, Dispatch Routine, and IRP specified.)

Today I came up with a new solution. After analyzing the device objects from successive insertions, I found that the only thing they have in common that I clean up is the symbolic link. I created a global pointer to hold the symbolic link and deferred setting it to false to the driver unload function. With this, the problematic REMOVE_DEVICE IRP handler only cleans up what the old device object refers to. The new device object is not affected, and hence the device does not get logically removed.

What do you think about this solution? Is this an acceptable way to deal with it?

Regards,
Venkateswaran C G

> I am in a situation where I do not control the application code.

Note that even with a 100% bug-free driver, if the app does not subscribe to the WM_DEVICECHANGE notification, detaching and re-plugging the hardware will require the app to be restarted.

The app-side fix for this is trivial: just add a WM_DEVICECHANGE handler and re-open the device.

The driver-side fix is rather complex: you will need a second shim driver (like the one Walter Oney described here) which listens for PnP arrival/removal and fixes the state issues across a device re-plug.

So it is a good idea to talk to the other department that controls the app’s code :-) This is a question of 1-2 man-days versus 1 man-month.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

There is a known way (or call it a hack)
to forcibly close app handles. For example, see
http://safelyremove.com/fullFeaturesList.htm

/* IMHO it’s a pity that Windows does not provide similar functionality,
so a misbehaving usermode app can interfere with unloading of PnP driver
(yes I know Doron’s opinion that a well designed driver can
cope with such bad apps …) */

Regards,
–pa

A couple of suggestions

  1. Do not keep ANY globals at all. Get rid of every last one. This will force you to keep state per device object and to maintain that state in the right spot.

  2. Read the WDK rules about PnP IRPs proactively and apply them to your driver. Do not randomly change behavior in the driver and hope that it sticks (and is correct).

…and of course, I would suggest you strongly consider KMDF for your driver instead of WDM. KMDF takes care of all of these details for you and lets you focus on what you want the driver to do instead of all the rules it must follow.
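
Suggestion 1 can be pictured with a portable C model in which everything the remove handler tears down lives in the per-device structure. This is a sketch, not WDM code: `device_ext`, `add_device`, `remove_device` and the `MyUsbDev%d` link name are hypothetical, and giving each instance its own link name is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a per-device extension: no state is shared between instances. */
typedef struct {
    char link_name[32];  /* symbolic link owned by this instance only */
    bool link_created;
    bool attached;       /* attached to the stack below */
} device_ext;

/* Model of AddDevice: each arrival initializes its own extension. */
void add_device(device_ext *ext, int instance) {
    assert(ext != NULL);
    snprintf(ext->link_name, sizeof ext->link_name, "MyUsbDev%d", instance);
    ext->link_created = true;
    ext->attached = true;
}

/* Model of the remove handler: always tears down, and touches only
 * what belongs to the device object it was called for. */
void remove_device(device_ext *ext) {
    assert(ext != NULL);
    ext->link_created = false;  /* delete this instance's link */
    ext->attached = false;      /* detach from the lower stack */
}
```

If the old instance is removed late, after a new one has already arrived, calling `remove_device` on the old extension leaves the new one untouched, and the handler never has to skip its teardown.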

d
