Final IRPs before system/device power off

Hello.

According to the documentation here: https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/handling-device-power-down-irps

“The driver should queue any incoming I/O requests until the device has returned to the working state.”

Ok. But let’s assume this is the result of a PowerSystemShutdown, or a D3 cold IRP. Anything queued will be, well, lost. Because once I send the IRP down the stack, there’s no coming back. Those devices will never accept writes.

I’m unable to find any documentation about whether it’s acceptable to fail IRPs or just drop them past a certain point. One thing I’ve noticed is that even after I’ve received a shutdown IRP there are often new I/O requests still coming in (easy test: start a non-quick format on a volume and then shut down the system).

Presumably, there must be some “safe” cut-off point after which a device cannot reasonably be expected to process IRPs, but I can’t figure out if or where this is defined. I’ve followed the documentation and called PoSetPowerState, hoping that if I declare my upper-level driver to be powered off then new IRPs will no longer be sent, but this assumption appears to be incorrect.

It’s critical that my driver be the last writer to the device, but I can’t find any way to ensure such a guarantee without setting HOLD_NEW_REQUESTS, marking IRPs pending (and then never really processing them), or failing them outright. How can I accomplish this?

Any help is much appreciated.

Thanks

Every stack gets to make their own rules. If I’m writing the driver for the Bob device and want to fail requests during shutdown then no one can stop me. If I’m writing a keyboard driver maybe I just let keystrokes queue up even though they’ll be “lost”.

If you’re a filter then you’re subject to the rules of the stack that you’re in. In most cases, the rules are simply by way of implementation so it’s not like you can go look it up someplace (assuming that you don’t have access to source). Based on your post, I’m assuming that you’re filtering somewhere in one of the four storage related stacks, which of course have their own special rules.

I’m not sure where you are filtering, but here’s a quick drawing of the relevant stacks for discussion purposes:

The file system enforces that no write operations are sent to the volume stack after IRP_MJ_SHUTDOWN is processed (see the FastFat source and its handling of VCB_STATE_FLAG_SHUTDOWN). Note that there are in fact three different shutdown notifications. The shutdown that the file systems use is called in between IoRegisterShutdownNotification and IoRegisterLastChanceShutdownNotification and is passed along to the volume stack (again, see FastFat).

I don’t know of any other such “rules” for the other storage related stacks. At some point the power down will come and requests will get queued up. However, anyone that actually cared about their requests hitting the media before shutdown should have flushed to disk when they received the shutdown notification. There is no option provided to say, “I want to be the last writer” that will be honored up and down all of the related stacks (i.e. a filter beneath you could write to the volume after you and you’d never know it).

How much this matters depends on what you’re trying to do. This most commonly comes up when someone is trying to do Change Block Tracking (CBT) and they want to persist their changes across reboots. About the only thing you can go on in that case is that the file system is guaranteed consistent after the IRP_MJ_SHUTDOWN. I’m not sure why you would still see writes in the other stacks after the file system is shut down, but I’d need to know more about where you’re filtering and where the writes are coming from. It could just be an artifact of it being a format operation with the file system dismounted (I’ve never cared about writes during shutdown in this case).

Thanks a bunch for your response, Scott.

This is a disk filter and indeed it’s related to CBT. I’m not particularly afraid of the storport adapter performing IO after I’m gone, but this is a block level filter so I absolutely must get all the upper (fs/volume) writes recorded for the next reboot.

“anyone that actually cared about their requests hitting the media before shutdown should have flushed to disk when they received the shutdown notification”

That’s the thing about formatting a partition. Those IOs keep coming in, long after Shutdown, LastChanceShutdown, PowerSystemShutdown, and most recently discovered, even DevicePowerState! They just don’t quit. I’ve pushed my final write back as far as it seems possible, but it’s just not enough. Now obviously I’m being silly here and shutting down in the middle of a format operation, and that’s not a real use case, but I’m afraid it’s a red flag that I’m not doing something correctly.

My gut instinct is to mimic the behaviour of the underlying disk, but I don’t know what that is. Every time I’ve tried to send IRPs down after I’ve passed the IRP_MJ_POWER down, I basically get no reply. Completion routines don’t get called, etc. So if they are allowed to go AWOL with my IRPs, can I say the same for any new writes my driver gets during power off?

“start a non-quick format on a volume and then shutdown the system”

Yes, but unless the entire format finishes, the volume being formatted won’t be valid when the system is restarted (assuming a real shutdown)… regardless of what you do, right?

Bear in mind that Windows’ goal for FS consistency isn’t to “always save every I/O ever written” but rather to allow the file system to be recovered to some previous point in time. So, you need to aim for saving stuff through the last metadata write, then your write, then hold everything thereafter.

So, I’d say… you get Last Chance Shutdown notification, send your write, hold any further writes you get, and call it done.
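The sequence suggested above (track writes, do your own final write at last chance shutdown, hold everything after that) can be sketched as a tiny state machine. This is only a user-mode model for reasoning about the ordering, not driver code: `model_filter`, `model_write`, `model_last_chance_shutdown` and the 64-block "disk" are all made-up names and sizes for illustration, and in a real filter "held" would presumably mean queuing the IRP rather than bumping a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_BLOCKS 64u  /* hypothetical tiny disk: one bit per block */

typedef enum { PHASE_WORKING, PHASE_SEALED } model_phase;
typedef enum { MODEL_FORWARDED, MODEL_HELD, MODEL_REJECTED } model_result;

/* Hypothetical stand-in for a CBT disk filter's state. */
typedef struct {
    model_phase phase;
    uint64_t dirty;      /* in-memory change bitmap */
    uint64_t persisted;  /* bitmap as saved by the filter's final write */
    uint64_t on_disk;    /* blocks whose writes were passed down the stack */
    unsigned held;       /* writes parked after the cutoff */
} model_filter;

static void model_init(model_filter *f)
{
    assert(f != NULL);
    memset(f, 0, sizeof *f);
    f->phase = PHASE_WORKING;
}

/* A write arriving from above: track and forward it before the cutoff,
 * hold it (never forward it) once the final write has been done. */
static model_result model_write(model_filter *f, unsigned block)
{
    assert(f != NULL);
    if (block >= MODEL_BLOCKS)
        return MODEL_REJECTED;
    if (f->phase == PHASE_SEALED) {
        f->held++;
        return MODEL_HELD;
    }
    f->dirty |= 1ULL << block;
    f->on_disk |= 1ULL << block;
    return MODEL_FORWARDED;
}

/* Last chance shutdown: persist the bitmap, then stop forwarding writes. */
static void model_last_chance_shutdown(model_filter *f)
{
    assert(f != NULL);
    if (f->phase == PHASE_WORKING) {
        f->persisted = f->dirty;
        f->phase = PHASE_SEALED;
    }
}

/* Nonzero if every block that reached the disk is covered by the
 * persisted bitmap, which is the property a CBT filter cares about. */
static int model_consistent(const model_filter *f)
{
    assert(f != NULL);
    return (f->on_disk & ~f->persisted) == 0;
}
```

The point of holding rather than forwarding late writes is that nothing reaches the media that the saved bitmap does not know about, so `model_consistent` stays true after the cutoff even if more writes keep arriving.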

Mr. Noone, who’s actually designed and built a commercial CBT solution or two, can give you the definitive guidance.

Peter

OK, I understand the confusion now…This will likely be a repeat of things we already know, but I’ll start from the beginning:

IRP_MJ_SHUTDOWN is really only about “shutting down” the file system, which means putting the file system on the volume into some consistent state before the system is powered down. The file systems take this opportunity to flush cached files and metadata, snapshot journals, etc. Once the file system is shut down, it blocks any writes going to the volume, as they might leave the file system in an inconsistent state. You can be called before the file system shuts down (IoRegisterShutdownNotification) so you can write things out while the file system is still available, or after the file system shuts down (IoRegisterLastChanceShutdownNotification), at which point you know the file systems are in a consistent, read-only state.

Below the file system, there is no concept of “consistent state” as everyone is just a dumb block device ([“I lift blocks up and put them down”](https://www.youtube.com/watch?v=eOSeOieLh7s)). It’s assumed that the structure is maintained by a higher power such as the file system.

In the case of a format operation, the file system is dismounted and the application is allowed to write directly to the volume. There is no one to listen to/honor the shutdown notification, so the application is allowed to write until the volume (and its corresponding disk(s)) is powered down and the writes start queueing up. On reboot, you get what you get (in my case I got a corrupted volume because the format didn’t finish) because, unlike the file system, the application is not holding up the power state transition while it puts the volume into a consistent state.

At the LUN level I’m not sure that you can do anything but not make things any worse. By that I mean if you’re writing to a disk during shutdown, at some point the D3 IRP arrives and the disk is powered down. If I send 100 writes during that time, there are no guarantees about which requests complete before the power down. If your filter makes that cutoff determination before the driver beneath you does, I’m not sure that actually materially changes the behavior of the stack.

I’ve never done CBT at the LUN level so it’s possible there are other considerations (e.g. you’ll want to think about how this plays with Storage Spaces).

Thanks guys. I’ve implemented this as we’ve discussed.