IRP_MN_START_DEVICE and IRP_MN_STOP_DEVICE semantics

Ijor,

I can answer that question for you, or at least a form of it.

Today, the writer of a driver for a PCI device needs to be prepared for the possibility that power will go away across a Stop. The reason is that the ACPI driver will call into the firmware’s power down handling, which might do something to the platform power supplied to the device. The PCI driver doesn’t actually put the device in D3, but the ACPI driver’s behavior means that power down during Stop is something you have to handle. The DDK docs, crappy as they are, do call this out. Bus drivers are allowed to power down a device, and I don’t know enough about other common bus drivers to know their behavior for sure.
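
To make the "be prepared for power to go away across a Stop" point concrete, here is a minimal user-mode sketch in C. It is not WDM code: `sim_device`, `sim_on_stop`, `sim_platform_power_loss` and `sim_on_start` are hypothetical names invented for this illustration, and the "registers" are just an array. The only idea it shows is that the driver saves what it needs at Stop and reprograms the hardware from that copy at the next Start, without assuming the live state survived.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS 4

/* Hypothetical device model: a few registers that lose their contents
 * whenever platform power is removed. */
typedef struct {
    uint32_t regs[NUM_REGS];    /* live hardware state */
    uint32_t shadow[NUM_REGS];  /* driver's saved copy, kept in memory */
    int      shadow_valid;      /* nonzero once a Stop has saved state */
} sim_device;

/* Stop handling: save everything needed to reprogram the device later. */
void sim_on_stop(sim_device *dev)
{
    assert(dev != NULL);
    memcpy(dev->shadow, dev->regs, sizeof dev->regs);
    dev->shadow_valid = 1;
}

/* Something below the driver (firmware, for example) may cut power.
 * The driver is not told, so the registers simply come back as zero. */
void sim_platform_power_loss(sim_device *dev)
{
    assert(dev != NULL);
    memset(dev->regs, 0, sizeof dev->regs);
}

/* Start handling after a Stop: do not trust the live registers,
 * always reprogram from the saved copy. */
void sim_on_start(sim_device *dev)
{
    assert(dev != NULL);
    if (dev->shadow_valid) {
        memcpy(dev->regs, dev->shadow, sizeof dev->regs);
    }
}
```

Calling `sim_on_stop`, then `sim_platform_power_loss`, then `sim_on_start` leaves the registers as they were before the Stop. Skipping the power loss gives the same result, which is the point: the driver behaves the same whether or not power actually went away.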

Personally, I think you make a strong argument that bus drivers shouldn’t do this. That said, changing the ACPI driver’s behavior would, as you say, have unknown compatibility implications that make it unlikely we’d change this in the future. Plus, because of the existing DDK rules there is no telling what another filter or bus driver that entered into the mix might do. Maybe a Stop sent to a bus further up in the tree would involve a driver that - following the DDK rules - managed to pull main power from the entire tree. This is an esoteric example but you see what I’m saying.

Hope that at least partially clears up the situation.

Dave

Hi Dave,

Davis Walker wrote:

> I can answer that question for you, or at least a form of it
> …
> Hope that at least partially clears up the situation.

Thank you very much for your reply. Yes, I think you cleared up things very well.

So if I understand correctly, the rule should be: “Drivers should not power down the device on STOP, yet they must be prepared for the possibility that the device *might* lose power”.

I anticipated something like this for a moment, but then I checked this at the hardware level and I couldn’t find a real need for any power-down. I didn’t consider it could be related to compatibility reasons at the ACPI driver level.

I must say that I now think the KMDF behavior for a function driver seems to be perfect. It follows all the steps needed in anticipation of a *possible* power-down.

The only minor issue would be for a bus driver. The framework should, IMHO, somehow prevent the PDO’s EvtDeviceD0Exit callback on STOP from actually powering down the device. Then the behavior would be consistent with the in-box bus drivers.

Thank you. I designed that KMDF behavior with this specifically in mind.

Furthermore, and to add to what Davis Walker and Peter Viscarola said, I
believe that the correct behavior actually differs from bus architecture to
bus architecture. PCI shouldn’t power down a device unless the BIOS says it
has to (which is the same thing as saying that ACPI has to be compatible
with existing motherboards.) USB might be completely different.

KMDF was built to be a completely generic implementation of a device driver,
one that can be dropped into any bus and it will function. When building
such a generic thing, we constantly had to make choices between preserving
perfectly generic behavior and simplifying driver development. WDM allows a
perfectly generic implementation, one that can be tailored to almost any
possible situation. But the consequence is that it’s phenomenally
complicated to implement and many drivers end up supporting crippled
functionality just so that the author of the driver can meet deadlines and
test his or her code.

There are places where KMDF asks the driver writer to make decisions or
supply policy for it and other places where it arbitrarily chooses a
particular behavior. If we had exposed every possible decision, KMDF would
be a lot less comprehensible and thus it would be less likely to facilitate
the creation of rock-solid drivers. In order to allow a perfectly generic
implementation, we allowed the client drivers to hook any IRP coming into
KMDF and override its behavior. I’m not entirely sure if it would be
possible to override the power down in a STOP without throwing out much of
the rest of the PnP/Power model in KMDF, but I think it is.

This brings up the following question. Could we have (re)written our PCI
driver using KMDF? Probably not. We decided that, for KMDF 1.x at least,
we weren’t trying to target the most complex of bus drivers. If KMDF allows
you to write 95% of all possible bus drivers (excluding only PCI and ACPI)
with a fairly simple conceptual model, we’ve succeeded. That was the design
goal, by the way: all bus drivers except PCI and ACPI. Those two are
complex enough that we wouldn’t serve anybody by complicating KMDF and
delaying its release by another two years.

I had another goal, by the way, when I laid out this part of KMDF. I knew
that we rarely executed the rebalance code(*) and I wanted to be sure that
every KMDF driver automatically just worked when a rebalance came along. So
we modeled it as a normal partial tear-down so that the callback functions
invoked would be ones that were well tested. We went to great pains to make
sure that the contracts on those callbacks didn’t differ during a rebalance
from an uninstallation, for example. (They have to differ a little bit on
surprise-remove, as it’s not possible to know at what point access to the
hardware truly disappears.)
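
The "rebalance is just a partial tear-down" idea can be sketched as a toy model in C. This is a user-mode illustration, not KMDF code: `sim_trace`, `sim_rebalance` and `sim_remove` are hypothetical names, and the strings "d0exit", "release", "prepare" and "d0entry" only loosely stand in for the framework's power-down, release-hardware, prepare-hardware and power-up callbacks. The ordering shown is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical trace of which driver callbacks ran, in order. */
typedef struct {
    char log[128];
} sim_trace;

/* Append one callback name to the trace, separated by '>'. */
static void sim_record(sim_trace *t, const char *name)
{
    assert(t != NULL && name != NULL);
    /* Room for the existing text, a separator, the name and the NUL. */
    assert(strlen(t->log) + strlen(name) + 2 <= sizeof t->log);
    if (t->log[0] != '\0') {
        strcat(t->log, ">");
    }
    strcat(t->log, name);
}

/* The one and only tear-down path, shared by every caller. */
static void sim_tear_down(sim_trace *t)
{
    sim_record(t, "d0exit");
    sim_record(t, "release");
}

/* The one and only bring-up path. */
static void sim_bring_up(sim_trace *t)
{
    sim_record(t, "prepare");
    sim_record(t, "d0entry");
}

/* Rebalance: the ordinary tear-down followed by the ordinary bring-up. */
void sim_rebalance(sim_trace *t)
{
    sim_tear_down(t);
    sim_bring_up(t);
}

/* Removal: the same tear-down, with no bring-up afterwards. */
void sim_remove(sim_trace *t)
{
    sim_tear_down(t);
}
```

Because both paths go through `sim_tear_down`, the rarely exercised rebalance case cannot invoke a callback sequence that the common removal case has not already tested.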

- Jake Oshins
Windows Kernel Team

(*) As a separate discussion, we could talk at great length about whether
it would have been a good idea to do rebalances frequently from the outset
in Windows 2000 and follow-on systems. Suffice it to say that we tried that
and we discovered that there are a tremendous number of chipset-embedded
devices which were designed by people who thought that chipset devices were
excluded from many of the rules in the PCI spec. Thus we often couldn’t
know if a rebalance would result in a functioning system. Going to great
pains to trust the values that the BIOS put into the devices at startup
allowed BIOS guys to wallpaper over chipset bugs and machines ran better.
Thus we only do rebalance when a device is hot-plugged and we have to
disturb the BIOS’s settings. Those chipset bugs may be a thing of the past
now, though we have little way of knowing.

wrote in message news:xxxxx@ntdev…
> Hi Peter,
>

(snip)

>
> What I am asking is: “Should I expect that future versions of the in-box
> PCI bus driver will power-down my device on STOP automatically?”.
> That seems an odd idea to most people here, but that is the KMDF
> model. Again, a KMDF bus driver would do exactly that. It would
> power-down the device on STOP. And it would do it without asking
> the function driver.
>
> I am not worried too much about the KMDF behavior. Right or
> wrong, the practical implications are not too important. They are not
> too important because I can use classic WDM for my function driver
> if I need to. And because nobody is actually going to write a third
> party PCI bus driver using KMDF.
>
> So my main concern is if the KMDF model would be applied to the
> whole Pnp system. Then indeed a future PCI driver might implement
> the same behavior. I guess nobody here knows the answer. And I
> would guess that not even the Pnp devs know for sure. They might
> know what is technically better, but the compatibility implications
> might be more important than anything else.
>

I feel like I know the answer. We have no intention of changing the WDM
contract, as that would break some of the roughly 280,000 device drivers.
(That was the last count of live, in-the-field, drivers that I heard.)

> Then I am making two technical comments. One is that I
> believe that a STOP should not be considered an automatic/
> implicit power-down. This should be decided by the function
> driver and Power Policy Owner, not by the bus driver or by
> the Pnp model. Currently, this is the actual behavior of the
> PCI bus driver.
>
> The other comment is that a subsequent Pnp START should
> not be considered an implicit power-up. I don’t see any sense
> whatsoever in powering-up a device (if it was powered down
> for idle reasons), just for the purpose of rebalance. This is
> perhaps the most important comment, because everybody
> (including in-box function and bus drivers) does assume an
> implicit power-up on every START, not just the initial one.
>

If a device is powered down, but its interrupt is still enabled (though
probably masked) then we have to power the device back up to allow the
driver to disable that interrupt in the device hardware. This is why KMDF
transitions back to D0 when a device has idled out to a low power state and
then a new PnP state transition occurs. The situation gets even more
complicated when the device has a wake signal that is armed. Subsequent
events (during the rebalance, after uninstallation, whatever…) could cause
a PME interrupt or a device interrupt, neither of which can be handled
because the driver isn’t in place and ready to manipulate the device. So,
again to simplify KMDF, we just go through the entire power up sequence, then
tell the driver to disable any wake signals and interrupts, and then go
through the power down sequence.
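
As a rough illustration of that sequence, here is a small user-mode model in C. None of it is KMDF code: `sim_dev`, `sim_disable_sources` and `sim_pnp_transition` are hypothetical names, and the rule that device registers are only reachable in D0 is a simplifying assumption of the model.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SIM_D0, SIM_D3 } sim_power;

/* Hypothetical device state tracked by the model. */
typedef struct {
    sim_power power;
    int interrupt_enabled;  /* interrupt still enabled in the hardware */
    int wake_armed;         /* wake signal still armed */
    int power_ups;          /* number of transitions into D0 */
} sim_dev;

/* In this model the hardware can only be touched while in D0.
 * Returns 0 on success, -1 if the device is not powered. */
static int sim_disable_sources(sim_dev *d)
{
    if (d->power != SIM_D0) {
        return -1;
    }
    d->interrupt_enabled = 0;
    d->wake_armed = 0;
    return 0;
}

/* A PnP transition arriving at a device that may have idled out:
 * power up, quiesce interrupt and wake sources, then power down. */
void sim_pnp_transition(sim_dev *d)
{
    assert(d != NULL);
    if (d->power != SIM_D0) {
        d->power = SIM_D0;
        d->power_ups++;
    }
    int rc = sim_disable_sources(d);
    assert(rc == 0);
    (void)rc;
    d->power = SIM_D3;
}
```

Starting from a device in `SIM_D3` with its interrupt enabled and wake armed, a direct call to `sim_disable_sources` fails, while `sim_pnp_transition` ends with the device back in `SIM_D3` and both sources off, at the cost of one extra power-up.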

Again, this behavior has nothing to do with a WDM driver written to run on
top of pci.sys. There’s no KMDF in this picture. The only thing you have to
do is worry about the motherboard telling ACPI to power your device off when
it is shut down, which has nothing to do with idle behavior.

- Jake Oshins
Windows Kernel Team

Jake Oshins wrote:

> KMDF was built to be a completely generic implementation of a device driver,
> one that can be dropped into any bus and it will function.

I understand and I’m sure we all agree with that approach.

> I had another goal, by the way, when I laid out this part of KMDF. I knew
> that we rarely executed the rebalance code(*) and I wanted to be sure that
> every KMDF driver automatically just worked when a rebalance came along. So
> we modeled it as a normal partial tear-down so that the callback functions
> invoked would be ones that were well tested.

That was a very smart idea!

> We have no intention of changing the WDM contract,

I know, but you must agree with me that, at least in this specific case, the WDK is so ambiguous that it is almost impossible to deduce what the contract actually is.

>> The other comment is that a *subsequent* Pnp START should
>> not be considered an implicit power-up. I don’t see any sense
>> whatsoever in powering-up a device (if it was powered down
>> for idle reasons), just for the purpose of rebalance.

> If a device is powered down, but its interrupt is still enabled (though
> probably masked) then we have to power the device back up to allow the
> driver to disable that interrupt in the device hardware. This is why KMDF
> transitions back to D0 when a device has idled out to a low power state and
> then a new PnP state transition occurs. The situation gets even more
> complicated when the device has a wake signal that is armed. Subsequent
> events (during the rebalance, after uninstallation, whatever…) could cause
> a PME interrupt or a device interrupt, neither of which can be handled
> because the driver isn’t in place and ready to manipulate the device.

I’m not sure I follow your logic here. That is, I understand the KMDF approach. But I still don’t understand why pci.sys must put the device in D0 after a *second* START, except for possible compatibility reasons. Just in case I wasn’t clear, I am not talking here about the KMDF behavior on STOP and other Pnp state changes. I am talking about the pci.sys behavior on START (on any START following a STOP, not on the initial one).

If the device was in D3 before the STOP, then interrupts are usually already disabled, either because the hardware does it automatically or because the driver did it manually. Otherwise, if there is any risk of the hardware generating an interrupt, the driver would indeed need to power up to disable them. But in the worst case this would need to be done on STOP, not on START.

If wake-up was armed then, depending on the case, it might need to be disarmed and re-armed with the inherent power transitions. The driver would need to handle this.

And I mentioned idle because that’s the only possibility I see for the device being powered down when receiving a STOP. So the pci.sys behavior on a subsequent START is relevant only when the device was idle on STOP.

I didn’t press this issue when replying to Dave because, even though he didn’t answer it, I understood from his post that compatibility is a paramount goal. And changing the pci.sys behavior on START might be a bit risky. To be honest, I suspect very few drivers, if any, would break from this change alone. Most drivers that would break in a rebalance would probably break anyway. But I understand you won’t take chances, especially when KMDF is doing all those power transitions anyway.

Thanks for your post and all the “insider and historical” information. It was a very interesting read.

Btw, I understand you are “hal-man”. I had a good laugh the other day tracing kernel timers on one of my old machines and finding that it uses a function named … “HalAcpiBrokenPiiX” :)

>> The other comment is that a *subsequent* Pnp START should
>> not be considered an implicit power-up.

Replying to myself.

I realized this whole thing might be a non-issue considering the new information that Dave provided. If ACPI/BIOS might physically power cycle the card, then it might come back in D0 no matter what. And it wouldn’t make much sense for pci.sys to do a further power cycle just to put the card back in its previous Dx state.

In a nutshell, I would say I was wrong on all my points except about the poor documentation, and about the (non KMDF) samples not being ready for a rebalance. I learned quite a bit here, thanks :)

wrote in message news:xxxxx@ntdev…

> Btw, I understand you are “hal-man”. I had a good laugh the other day
> tracing kernel timers on one of my old machines, and finding it uses a
> function named … “HalAcpiBrokenPiiX” :)
>
>

I believe the function name (from memory) is actually HalAcpiBrokenPiix4.
And, yes, I wrote the early versions of it. The PIIX4 ACPI timer
occasionally returns garbage. I actually once spent an entire six months
working around bugs in the 440BX/PIIX4 chipset platform.

- Jake