Shutdown ordering and parallelism

I’ve been doing some more debugging of system shutdown issues on unified
I/O fabric drivers, and a few questions have come up.

Doron has said that driver stacks can be powered down in any order, and there
is no way to control this. The ONLY ordering is based on PnP bus
parent/child relationships. So, the question is: how does the software iSCSI
initiator (a storage port/miniport) ensure the TCP stack+NIC does not power
down BEFORE the disk buffers have been flushed? The TCP stack is NOT in the
PnP tree of the iSCSI stack, and therefore should have no ordering. This is
similar to ftdisk, which is root enumerated and creates PDOs that have the
volsnap function driver, with file systems layered on top. The whole upper
level storage stack is NOT in the PnP tree of the lower level storage
stacks, which have things like disk as the function driver on the PDOs
created by storage bus enumeration, yet the system seems to know not to
shut down stacks that are used by someone else?

Is it possible the file systems will ALL be flushed and unmounted BEFORE ANY
device stack is powered down, so volumes will not be corrupted? That way, it
will not be so bad if the iSCSI driver is still active after shutdown of
the TCP stack, although I assume iSCSI has some sort of session shutdown
protocol messages which may get lost.

Doron said it’s indeterminate whether stacks will be powered down in parallel
or serially, and I was looking for a way to test these cases. Offhand, it
looks like on a single-processor system there may be only a single shutdown
thread, so all shutdown IRP processing is serialized? Is this valid, or will
SMP systems sometimes have a single thread, and UP systems sometimes have
multiple threads? Having a shutdown thread per CPU makes sense if you
believe the power IRPs will not block and you want to maximize shutdown
performance. It seems like on UP systems, my I/O fabric sometimes gets stuck
during shutdown, because it blocks the power IRP thread, waiting for its
clients to shut down first, which will never happen if there is only a single
shutdown thread. Sometimes I can reproduce this hang 10 times in a row;
sometimes it doesn’t happen for days.

Thanks!

  • Jan

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:bounce-243680-
xxxxx@lists.osr.com] On Behalf Of Doron Holan
Sent: Monday, March 13, 2006 12:46 AM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] How to Catch Shutdown notification in Volume filter
driver

Yes, there is no guaranteed order between stacks. As for ftdisk, I can
ask around.

d

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Jan Bottorff
Sent: Sunday, March 12, 2006 7:52 PM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] How to Catch Shutdown notification in Volume filter
driver

It sounds like you’re saying that stacks MAY be powered down in parallel,
in any order, and they MAY be serialized using a single power IRP, and I
need to ensure I can handle it both ways.

That brings up the question: how does the system know it should shut down
the low-level disk stack AFTER it has shut down the higher-level volume
stack (which I believe starts at \root\ftdisk)? The higher-level stack is
NOT in the PnP tree of the disk drivers or the disk controller; it’s
attached by having a reference to the top of the lower-level disk stack.
There must be SOMETHING that causes the volumes to get flushed and unmounted
BEFORE the low-level disk stacks are shut down? From what you’re saying, I’m
assuming the low-level PartMgr driver does not delay the power shutdown IRP,
waiting for ftdisk to close open handles to the physical disk stacks.

  • Jan

> There are no ordering guarantees between stacks for power (or PnP for
> that matter). You will have to move to the parent/child relationship in
> your stack if you want any ordering guarantees, but even then, there is
> no ordering between children of the same parent, only that all children
> will be powered down by the time the parent gets the power down
> notification.
>
> Blocking an S irp is very bad. Under low resources, the OS could only
> have one S irp and you will effectively freeze the machine.

I asked around, at least about ftdisk and volume manager and the answer
is that they do a lot of nasty & undocumented operations to coordinate
the 2 stacks. I can ask about iSCSI.

d


Before a system power state change occurs, all file systems are flushed.

-p


> performance. It seems like on UP systems, my I/O fabric sometimes gets
> stuck during shutdown, because it blocks the power IRP thread, waiting
> for its clients to shut down first, which will never happen as there is
> only a single shutdown thread. Sometimes I can reproduce this hang 10
> times in a row; sometimes it doesn’t happen for days.

According to Oney’s book (page 432): power IRPs come to you in the context
of a system thread that you must not block. Blocking could lead to
deadlocks. You have to do everything with completion routines.

Kind regards,
Bruno van Dooren
xxxxx@hotmail.com
Remove only “_nos_pam”

> Doron has said the driver stacks can be powered down in any order, and
> there is no way to control this. The ONLY ordering is based on PnP bus
> parent/child relationships. So, the question is: how does the software
> iSCSI initiator (a storage port/miniport) ensure the TCP stack+NIC does
> not power down BEFORE the disk buffers have been flushed. The TCP stack
> is NOT in the PnP tree of the iSCSI stack, therefore should have no
> ordering.

TdiRegisterPnPHandlers is one of the possible solutions. It lets you
register a handler that is executed from the NIC’s MJ_POWER path.

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com