Time to complete PnP query remove

OSR_Community_User · October 2, 2012, 4:31am

For the sake of an easy example, let’s assume I’m implementing something similar to the iSCSI initiator. I have a root-enumerated virtual storage miniport (non-boot/non-paging disk) that talks to one of a couple of kinds of transport drivers. The storage driver and the transports exchange function pointers via QueryInterface (discovered via interface notifications), so they have a direct-call interface for best performance.
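
To make the direct-call arrangement concrete, here is a rough user-mode C sketch of the idea. It is only loosely shaped like a WDM INTERFACE header (size, version, context, reference/dereference routines); every name in it (TransportInterface, transport_query_interface, the send routine) is hypothetical and not taken from any real driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical direct-call table handed from the transport to the storage
   driver. The leading fields mimic the general shape of a WDM INTERFACE
   header; "send" is an invented routine for illustration only. */
typedef struct TransportInterface {
    unsigned short size;
    unsigned short version;
    void *context;                          /* transport's private state */
    void (*reference)(void *context);
    void (*dereference)(void *context);
    int  (*send)(void *context, const void *buf, size_t len);
} TransportInterface;

/* Hypothetical transport state: who still holds the interface, bytes moved. */
typedef struct TransportState {
    int    refcount;
    size_t bytes_sent;
} TransportState;

void transport_ref(void *ctx)
{
    TransportState *t = ctx;
    t->refcount++;
}

void transport_deref(void *ctx)
{
    TransportState *t = ctx;
    assert(t->refcount > 0);   /* a release without a matching reference is a bug */
    t->refcount--;
}

int transport_send(void *ctx, const void *buf, size_t len)
{
    TransportState *t = ctx;
    if (buf == NULL)
        return -1;
    t->bytes_sent += len;      /* stand-in for real transport work */
    return 0;
}

/* What the transport does when asked for its interface: fill in the table
   and take one reference on behalf of the caller. Returns 0 on success,
   -1 if the requested version is not supported. */
int transport_query_interface(TransportState *t, TransportInterface *out,
                              unsigned short version)
{
    if (version != 1)
        return -1;
    out->size        = (unsigned short)sizeof(*out);
    out->version     = 1;
    out->context     = t;
    out->reference   = transport_ref;
    out->dereference = transport_deref;
    out->send        = transport_send;
    out->reference(out->context);
    return 0;
}
```

The point of the reference count is that the transport can see that somebody still holds its function pointers, which is exactly what makes the query remove awkward: the storage driver has to call dereference before the transport can safely go away.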

So let’s say a user wants to update the transport driver. The PnP manager will send a query remove, and being a good PnP citizen the transport wants to agree, but it has to tell the storage driver that the path to the remote disk is becoming unavailable. Ideally, the storage driver would get a callback from the transport and try to do an eject operation on the disk, which, if everybody is PnP friendly (wishful thinking?), will cause the file system to flush its buffers so the disk is unmounted cleanly. Wait, it’s a miniport; can I even do a gentle eject of the child disks? I can certainly say the bus has changed and make the disks surprise vanish. Somebody remind me why we try to do virtual disks using a virtual Storport miniport instead of a KMDF bus driver that exposes disk PDOs?

Unmounting a disk is not always going to be an instant operation, so the question is: how long can the query remove to the transport take to complete before the PnP subsystem concludes a reboot will be required? At a minimum, any I/O queued to the transport will need to be completed (successfully or canceled), and on a big storage system you might have hundreds or thousands of outstanding requests across multiple LUNs. If you really are just updating a transport driver, the storage driver could in theory hold all the requests and restart I/O when the updated transport is available again (well, probably not, because the whole LUN discovery process may have to happen first, although I think there is an SRB return code that means try again in a little bit).
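
The drain step described above can be sketched roughly as follows. This is a single-threaded user-mode C model, not driver code: the Transport structure, the status values, and the function names are all made up for illustration, and a real driver would need locking (or something like a remove lock) around the counter. It also says nothing about how long PnP is willing to wait, which is the actual question.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status values returned to the storage driver. */
enum { IO_ACCEPTED = 0, IO_RETRY_LATER = 1 };

/* Hypothetical transport bookkeeping. */
typedef struct Transport {
    int  outstanding;     /* requests handed to the transport, not yet completed */
    bool remove_pending;  /* set once a query remove has arrived */
} Transport;

/* New I/O is refused with "retry later" once a query remove is pending,
   so the outstanding count can only go down from that point. */
int transport_submit(Transport *t)
{
    if (t->remove_pending)
        return IO_RETRY_LATER;
    t->outstanding++;
    return IO_ACCEPTED;
}

/* Called when one request finishes (success or cancel). Returns true when
   this was the last request and a pending query remove can now be completed. */
bool transport_complete_one(Transport *t)
{
    assert(t->outstanding > 0);
    t->outstanding--;
    return t->remove_pending && t->outstanding == 0;
}

/* Returns true if the query can be answered immediately, false if the
   answer has to be deferred until the outstanding requests drain. */
bool transport_query_remove(Transport *t)
{
    t->remove_pending = true;
    return t->outstanding == 0;
}

/* If the remove is cancelled, start accepting I/O again. */
void transport_cancel_remove(Transport *t)
{
    t->remove_pending = false;
}
```

With two requests in flight, transport_query_remove returns false, further submits come back as IO_RETRY_LATER, and the second call to transport_complete_one returns true, which is the point where the deferred query remove would be completed.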

Or perhaps the right thing is to fail the transport query remove if any volumes are online, although how do you tell whether a disk is online or offline from a storage miniport? Miniports don’t exactly get open/close IRPs.

The more I describe this, the more I think the question is: is it hopeless to try to be a good PnP citizen for a storage driver? Then again, if you pull the network cable on an iSCSI disk, it’s a while before the initiator declares the disk gone. It seems like the last time I debated this, a few years ago, the conclusion was that all you really could do was tell users to unmount their disks before doing anything that makes the disk surprise removed. Perhaps I should do some experiments on iSCSI disks, as I’m pretty sure you can disable a NIC in Device Manager even if it’s being used by the TCP transport for iSCSI traffic.

I’ve never had much faith in users following a procedure (even system management folks for servers), so I generally believe software just has to do the right thing and cope as best it can with whatever actions the user takes.

I believe it’s a requirement of WHQL certification that drivers can be started and stopped in any order, unless there is a bus/child relationship, in which case the PnP manager controls the ordering. If I don’t handle query remove, that will not be the case.

Jan

Alex_Grig · October 2, 2012, 11:19am

Your QUERY_REMOVE should not be concerned about disks. That’s up to the upper layers and the applications. I don’t think any stor miniport ever handles it.