about BUS-RESET of SCSI miniport

Hi, everyone

I wrote a SCSI miniport driver that redirects all disk I/O to a remote storage server, and I found that the port driver above me passes down SRBs with a timeout of 10 seconds. If I fail to complete an SRB within 10 seconds, a bus reset occurs, and the driver (and the whole system) hangs.

How can I avoid the bus reset? Can I raise the 10-second timeout, e.g., to 1 minute or more? Or, how can I recover from the bus reset state?

Thanks in advance.



If you’re talking about a disk device, the timeout can be changed by adding a DWORD value named TimeOutValue under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Disk. The default is 10 seconds. This value is multiplied by the number of 64 KB segments (or parts thereof) in the transfer, so a 129 KB transfer gets 30 seconds with the default. Unfortunately, the value affects disk Read/Write commands, except possibly those sent via SCSI_PASS_THROUGH, for ALL disks in the system. (Some Fibre Channel HBAs change this value to 60 seconds (!). That means a 1 MB request will time out in 16 MINUTES!)
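The arithmetic described above can be sketched in plain C. This is only an illustration of the rule as stated in this post (base value times the number of 64 KB segments, rounding a partial segment up); the function name and the handling of a zero-length transfer are assumptions, not something taken from the port driver.

```c
#include <assert.h>
#include <stdint.h>

#define SEGMENT_BYTES (64u * 1024u)  /* 64 KB granularity described in the post */

/* Effective timeout in seconds: the base TimeOutValue multiplied by the
 * number of 64 KB segments, counting a partial segment as a whole one.
 * Hypothetical helper for illustration only. */
uint32_t effective_timeout_seconds(uint32_t base_seconds, uint32_t transfer_bytes)
{
    assert(base_seconds > 0);
    /* 64-bit intermediate so the round-up cannot overflow. */
    uint64_t segments = ((uint64_t)transfer_bytes + SEGMENT_BYTES - 1) / SEGMENT_BYTES;
    if (segments == 0) {
        segments = 1;  /* assumption: a zero-length transfer still gets one unit */
    }
    return (uint32_t)(base_seconds * segments);
}
```

With the default of 10 seconds, a 129 KB transfer spans three segments and gets 30 seconds; with a base of 60 seconds, a 1 MB transfer spans sixteen segments and gets 960 seconds, which is the 16 minutes mentioned above.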

You can avoid the bus reset by timing out the command in your miniport before the TimeOutValue specified in the SRB, aborting the command and setting SrbStatus to SRB_STATUS_TIMEOUT.
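A rough sketch of that idea, written as plain C with mock types so it can be compiled outside the DDK. Everything prefixed `wd_` or `WD_` is hypothetical; in a real miniport the request would be a SCSI_REQUEST_BLOCK, the status would be SRB_STATUS_TIMEOUT from srb.h, the tick would come from a timer callback, and completion would go through ScsiPortNotification. The two-second safety margin is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock status codes; a real driver uses the SRB_STATUS_* values from srb.h. */
enum wd_status { WD_PENDING, WD_SUCCESS, WD_TIMEOUT };

struct wd_srb {
    uint32_t timeout_seconds;  /* stands in for Srb->TimeOutValue */
    uint32_t age_seconds;      /* seconds since the miniport accepted the request */
    enum wd_status status;
};

#define WD_SAFETY_MARGIN_SECONDS 2u  /* hypothetical margin before the port driver's deadline */

/* Called once per second. Times out every pending request that is about to
 * reach the port driver's deadline and returns how many were timed out. */
size_t wd_tick(struct wd_srb *srbs, size_t count)
{
    size_t expired = 0;
    assert(srbs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        struct wd_srb *srb = &srbs[i];
        if (srb->status != WD_PENDING) {
            continue;
        }
        srb->age_seconds++;
        uint32_t deadline = srb->timeout_seconds > WD_SAFETY_MARGIN_SECONDS
                                ? srb->timeout_seconds - WD_SAFETY_MARGIN_SECONDS
                                : 1u;
        if (srb->age_seconds >= deadline) {
            /* Real driver: abort the request on the wire first, then
             * complete the SRB with a timeout status. */
            srb->status = WD_TIMEOUT;
            expired++;
        }
    }
    return expired;
}
```

The point of the margin is that the miniport gives up on the request itself, slightly before the port driver would, so the port driver never escalates to a bus reset.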
Jerry.

1. The bus reset operation should cause your miniport to complete all outstanding requests with SrbStatus set to SRB_STATUS_BUS_RESET. From what you are saying, your driver is apparently not doing this: once the bus reset is invoked, the outstanding requests are never completed.

(The DDK documentation says you can call ScsiPortCompleteRequest to complete all outstanding requests. Our drivers never do this - they always call ScsiPortNotification for each outstanding request, so I can't comment on the ScsiPortCompleteRequest method.)

2. I think you still need to support HwScsiResetBus. What happens if, for some reason, requests are lost and don't complete even within the 30 seconds you have set? Then the system will invoke your ResetBus function, and if there is a problem with it, you are going to see this all over again.

3. Minor clarification on TimeOutValue: when doing a bus scan, ScsiPort sends Inquiry commands with a fixed TimeOutValue of 5 seconds regardless of device type. Also, as far as I know, the registry entry only applies to disk Read/Write commands.
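Point 1 can be illustrated with a small mock in plain C. The `rb_` names and the request table are hypothetical stand-ins; a real HwScsiResetBus routine would walk the miniport's own list of outstanding SCSI_REQUEST_BLOCKs, set SrbStatus to SRB_STATUS_BUS_RESET, and complete each one via ScsiPortNotification.

```c
#include <assert.h>
#include <stddef.h>

/* Mock status codes; a real driver uses the SRB_STATUS_* values from srb.h. */
enum rb_status { RB_PENDING, RB_SUCCESS, RB_BUS_RESET };

struct rb_srb {
    int in_use;             /* nonzero while the miniport owns the request */
    enum rb_status status;
};

/* Sketch of the work a reset-bus routine has to do: complete EVERY
 * outstanding request, one by one, with a bus-reset status.
 * Returns the number of requests completed. */
size_t rb_reset_bus_complete_all(struct rb_srb *table, size_t count)
{
    size_t completed = 0;
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (table[i].in_use && table[i].status == RB_PENDING) {
            table[i].status = RB_BUS_RESET;  /* real driver: notify the port driver here */
            table[i].in_use = 0;             /* the miniport no longer owns the request */
            completed++;
        }
    }
    return completed;
}
```

After the routine returns, nothing should remain outstanding, which is what lets the port driver resume issuing requests.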
Jerry.
xxxxx@lists.osr.com wrote: -----

>To: "Windows System Software Devs Interest List"
>
>From: identifier scorpio
>Sent by: xxxxx@lists.osr.com
>Date: 06/05/2005 05:42AM
>Subject: [ntdev] Re: [ntdev] about BUS-RESET of SCSI miniport
>
>
>Thanks for Jerry's answer; it works.
>
>As you suggested, I added a TimeOutValue of 30 seconds under
>HKLM\System\CurrentControlSet\Services\Disk and found that all SRBs
>that SCSIport passed to the miniport had a timeout of 30 seconds.
>
>I also found an error in my previous question: the system hang is not
>due to the bus reset, but to the limit on outstanding SRBs in
>Microsoft's SCSIport. At least on my system, if the storage server
>fails to respond to a disk I/O request (maybe because the request
>packet is lost or the storage server is busy), the request is blocked
>and the SRB stays outstanding (not completed). SCSIport may continue
>to queue SRBs to the miniport, so the number of outstanding SRBs keeps
>increasing until it exceeds 254 (the maximum number of outstanding
>SRBs that SCSIport supports), and the system hangs.
>
>In my opinion, when the blocked disk I/O times out, a bus reset does
>occur, but the real reason for the hang is that the SRB queue reaches
>its limit of 254. So, if the storage server eventually manages to
>satisfy the blocked request (maybe the miniport retransmits its
>request, or the storage server becomes idle), the system recovers from
>the hang.
>
>Perhaps I need not handle bus reset at all, as long as the server can
>satisfy every disk I/O request. Any comments?

Jerry is correct about the handling of outstanding requests. You must always handle bus reset in your miniport and complete each request. The safest approach is to use ScsiPortNotification for each request; the same approach works with Storport (i.e., StorPortNotification). On Storport, StorPortCompleteRequest should never be called to complete all requests, because StartIo and the ISRs are deserialized (full duplex model). Storport also requires support for LUN Reset and Target Reset, neither of which is implemented by SCSIport (a miniport should use them to remove the commands from each LUN/target). In all cases, you must make sure the I/O actually does not come back after completion.
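For a miniport that forwards I/O over a network, one possible way to make sure a completed I/O "does not come back" is to tag each request with a slot index plus a generation counter and drop any server response whose tag no longer matches. This is only a sketch of that general technique, not something this thread or the DDK prescribes; all names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct req_slot {
    uint32_t generation;  /* bumped every time the slot is completed */
    int in_use;           /* nonzero while a request is outstanding */
};

/* Issue a request on a free slot and return the tag sent to the server. */
uint64_t slot_issue(struct req_slot *slots, uint32_t index)
{
    assert(!slots[index].in_use);
    slots[index].in_use = 1;
    return ((uint64_t)slots[index].generation << 32) | index;
}

/* Complete the slot toward the port driver (normal completion, timeout,
 * or bus reset). Bumping the generation invalidates the old tag. */
void slot_complete(struct req_slot *slots, uint32_t index)
{
    slots[index].in_use = 0;
    slots[index].generation++;
}

/* Returns 1 if a server response carrying this tag still matches a live
 * request, 0 if it is stale and must be discarded. */
int response_is_live(const struct req_slot *slots, size_t count, uint64_t tag)
{
    uint32_t index = (uint32_t)(tag & 0xFFFFFFFFu);
    uint32_t generation = (uint32_t)(tag >> 32);
    if (index >= count) {
        return 0;
    }
    return slots[index].in_use && slots[index].generation == generation;
}
```

A response that arrives after its request was completed during a reset fails the check and is dropped, even if the same slot has since been reused for a new request.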

SCSI miniports will see HwScsiResetBus on cluster systems in the event of a node failure as well as missing I/Os. This may not be relevant to your particular adapter.

Regarding #3: for Server 2003 we changed SCSIport to always use TimeOutValue for all commands other than passthroughs, which set their own timeout values. So an INQUIRY, READ CAPACITY, REPORT LUNS, MODE SENSE, etc. will now use this value as the default; Win2K used various timeout values for internally generated commands (generally in the 2-4 second range).

In general, the expectation is that normal I/Os complete promptly (in the millisecond range). The long timeout values are there to facilitate recovery from fabric/network-related events.

