SCSI chips performance

Hi all,

The DDK has two samples: AHA174x and NCR53C9X.

According to the source code:

  • the Adaptec chip has a hardware FIFO of SCSI commands and drives the bus in
    hardware, but does not support tagged queueing.
  • the NCR/Emulex chip is dumber: it cannot drive the bus without the aid of
    software (SCSI bus changes are reported via interrupts), but it supports
    tagged queueing.

Is there any data on which approach is better and which chip is faster?

Max

If you are talking to disks, and the disks are seeking, then
the interrupts, etc. are noise compared to seek scheduling.

The disks can do a better job of seek scheduling than
you can, but only if they have multiple requests outstanding,
i.e., tagged queuing.

-DH
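
To put a rough number on the seek-scheduling point, here is a minimal sketch that compares total head travel when requests are serviced in arrival order against a greedy nearest-first order, which is roughly what a drive could do once it holds several requests at once. The cylinder numbers, the starting position, and the function names are made up for illustration; real drives also account for rotational position, which this toy model ignores.

```python
def total_seek(start, order):
    """Total head travel (in cylinders) when servicing requests in `order`."""
    pos, travelled = start, 0
    for cyl in order:
        travelled += abs(cyl - pos)
        pos = cyl
    return travelled


def nearest_first(start, pending):
    """Greedy shortest-seek-first ordering of the pending cylinders."""
    remaining = list(pending)
    pos, order = start, []
    while remaining:
        nxt = min(remaining, key=lambda c: abs(c - pos))
        remaining.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order


# Hypothetical workload: five requests, head initially at cylinder 500.
requests = [900, 50, 870, 100, 820]
fifo = total_seek(500, requests)                            # one command at a time
reordered = total_seek(500, nearest_first(500, requests))   # drive sees all five
print(fifo, reordered)
```

With only one request outstanding, the drive has nothing to choose from, so the arrival-order figure is the best it can do no matter how clever the host adapter is.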

----- Original Message -----
From: “Maxim S. Shatskih”
To: “NT Developers Interest List”
Sent: Monday, October 30, 2000 6:29 PM
Subject: [ntdev] SCSI chips performance


> If you are talking to disks, and the disks are seeking, then
> the interrupts, etc. are noise compared to seek scheduling.
>
> The disks can do a better job of seek scheduling than
> you can, but only if they have multiple requests outstanding,
> i.e., tagged queuing.

The AHA174x miniport does not use Srb->QueueTag for any hardware activity.
Does that mean this card is inferior?

Max

There are “queued” commands and there are “tagged queued” commands.
Now, I strongly suggest double-checking anything I say here, coz I’m a bit rusty
on this stuff.

NT/SCSI makes use of queued commands with pretty much all SCSI controllers (a
prerequisite is that disconnections have to be enabled for the relevant devices,
of course).
Tagged-queued commands are a very rarely used feature where you can group a bunch
of queued commands so that they’re completed in a certain order (like grouping
sets of commands together). I think this can also be done to prioritize one group
of commands over another. Since it is possible that a SCSI device can complete
commands in a different order from the one in which they were sent, I believe it
modifies this behavior. I don’t think any version of NT uses tagged-queued
commands at all. It can be confusing though, since “tagged-queued” commands and
“queued” commands are used interchangeably: a bunch of commands all with the
same “tag” is the same as no tag at all.

The chipsets from LSI (which now owns the NCR/Symbios SCSI controllers) are the
nicest IMHO. They have very intelligent controllers able to support “scripts” in
hardware that the drivers download. They’re really off-board CPUs in their own
right. The 64-bit U3 controllers such as those made by www.tekram.com are a good
implementation of such a chipset.

It used to be common that a CPU was used to handle SCSI bus phases, and the SCSI
controller chip was really just a set of intelligent line-drivers. I’ve written
microcode to handle such phase transitions in Z80. Thank goodness that SCSI
controllers have come a long way since then.

The example code in the DDK for SCSI miniport drivers is woefully inadequate:
most (all?) of the samples are for ISA/EISA implementations, and therefore they
are really not representative of typical SCSI drivers that are in use today.
Anyway, I think the overhead of a few extra interrupts per SCSI I/O was
negligible back in the day when sustaining 3MB/sec was considered fast.

Regards,

Paul Bunn, UltraBac.com, 425-644-6000
Microsoft MVP - WindowsNT/2000
http://www.ultrabac.com


Er, not quite. Tagged queueing does not in general specify that
things complete in order. What the common tags say is either
“complete other outstanding operations before doing this one” or
“reorder this as you please”. They may be rarely used
in NT; VMS uses them when the device allows.
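
The two common tag meanings can be modelled in a few lines. This is a toy sketch under stated assumptions, not a description of any real firmware: every command is assumed to be queued in the LUN before execution starts, "reorder as you please" is approximated by sorting on a made-up cylinder number, and the head-of-queue tag (not discussed above, but part of the same SCSI-2 feature) is included for completeness. All names are hypothetical.

```python
SIMPLE, ORDERED, HEAD_OF_QUEUE = "simple", "ordered", "head_of_queue"


def execution_order(arrivals):
    """arrivals: (name, action, cylinder) tuples in the order the LUN received them.

    Returns command names in one execution order the tags permit.
    """
    head = []        # head-of-queue commands, most recently received first
    segments = [[]]  # runs of simple commands, separated by ordered barriers
    for name, action, cyl in arrivals:
        if action == HEAD_OF_QUEUE:
            head.insert(0, name)            # jumps ahead of everything queued
        elif action == ORDERED:
            segments.append(name)           # barrier: earlier commands finish first
            segments.append([])             # later commands start after it
        else:
            segments[-1].append((cyl, name))

    order = list(head)
    for seg in segments:
        if isinstance(seg, str):
            order.append(seg)               # the ordered command itself
        else:
            # Stand-in for the drive's own scheduling within a reorderable run.
            order.extend(name for _, name in sorted(seg))
    return order


arrivals = [
    ("A", SIMPLE, 900),
    ("B", SIMPLE, 100),
    ("C", ORDERED, 500),
    ("D", SIMPLE, 300),
    ("E", HEAD_OF_QUEUE, 700),
    ("F", SIMPLE, 200),
]
print(execution_order(arrivals))
```

In this model A and B may swap, and so may D and F, but nothing crosses the ordered command C. Firmware that treats every tag as "reorder at will" would in effect ignore the barrier.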

The ordering control that tagged queueing provides is valuable when some
error condition or cluster state transition makes it necessary to ensure
that you don’t send a new write to a device before all write commands
that are queued inside the device finish.

That said, it should also be mentioned that in practice firmware
writers often seem not to understand tagged queueing, and have been
known to write firmware in which all tags are handled the same, as
“reorder at will”.

There has to be a tag value that identifies the I/O operation inside
the device; I believe the legal range is 0-255. The modifiers are part
of the same functional suite, however. One bit in the INQUIRY data
indicates whether the device supports tagged queueing, and a bit in the
IDENTIFY message specifies whether disconnect is permitted. If tagged
queueing is not supported, the device can have one operation pending
inside it at a time, no more. Unfortunately devices often lie about
whether they can in fact support tagged queueing.

Support for this function in driver software is non-trivial, since when
errors occur there are intended ways you should ensure the device is
done. It makes handling errors more complex, makes read-after-write more
complex if your OS bothers supporting it, and makes cancelling I/O more
complicated, since some operations are inside the device itself, not
just waiting in nonpaged pool to be dispatched.

Where a device supports it, though, the feature can produce extreme
throughput and cover a multitude of sins. A device that has enough
commands queued internally to transfer an entire track can, for example,
get the whole track in one rotation rather than handling things in
order, if the tags allow (and there are several more options I have not
mentioned). The device can also schedule operations from several
initiators (processors) if your OS supports this. That kind of shared
access support is needed in VMS, not NT, as part of a shared-I/O
clustering model. If your system does not share devices, but just serves
them (or, worse, uses reserve/release to bracket operations), some of
the potential TCQ (tagged command queueing) benefits will not be
available.

There is some benefit though. A device that CORRECTLY supports TCQ
can have many commands in the device at a time and can internally
reorder them to gain performance. A device that does not must handle
commands one at a time in the order given. It is typically messy to
try to replicate the TCQ behavior in software layers above the disk,
though it could be done.

TCQ is mis-implemented so often that some OSs (in particular BSD) default
to off for it and allow it only on devices where it is known to work right.
Unfortunately the inquiry bit cannot be relied upon.
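
As an illustration of "default off, allow only on known-good devices", here is a small sketch that reads the CmdQue bit (byte 7, bit 1 of SCSI-2 standard INQUIRY data) and then still requires the vendor/product pair to appear on an allowlist. The allowlist entry, the helper names, and `make_inquiry` are all hypothetical; this mirrors the policy described above, not the code of any particular BSD.

```python
CMDQUE_BIT = 0x02  # byte 7, bit 1 of SCSI-2 standard INQUIRY data

# Hypothetical allowlist of (vendor, product) pairs known to handle tags correctly.
KNOWN_GOOD = {("EXAMPLE", "DISK-1")}


def claims_tagged_queueing(inquiry):
    """True if the device sets CmdQue; devices may set it and still misbehave."""
    return len(inquiry) > 7 and bool(inquiry[7] & CMDQUE_BIT)


def vendor_product(inquiry):
    """Vendor ID is bytes 8-15, product ID is bytes 16-31, space padded."""
    vendor = inquiry[8:16].decode("ascii", "replace").strip()
    product = inquiry[16:32].decode("ascii", "replace").strip()
    return vendor, product


def use_tagged_queueing(inquiry):
    """Enable TCQ only if the device claims it AND is on the allowlist."""
    return claims_tagged_queueing(inquiry) and vendor_product(inquiry) in KNOWN_GOOD


def make_inquiry(vendor, product, cmdque):
    """Build a fake 36-byte INQUIRY response for the example (illustration only)."""
    data = bytearray(36)
    data[7] = CMDQUE_BIT if cmdque else 0
    data[8:16] = vendor.ljust(8)[:8].encode("ascii")
    data[16:32] = product.ljust(16)[:16].encode("ascii")
    return bytes(data)


trusted = make_inquiry("EXAMPLE", "DISK-1", cmdque=True)
unknown = make_inquiry("OTHER", "DISK-2", cmdque=True)
print(use_tagged_queueing(trusted), use_tagged_queueing(unknown))
```

The second device sets the bit but is not on the list, so the sketch leaves tagged queueing off for it.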

-----Original Message-----
From: Paul Bunn [mailto:xxxxx@UltraBac.com]
Sent: Monday, October 30, 2000 9:01 PM
To: NT Developers Interest List
Subject: [ntdev] RE: SCSI chips performance


> Now, I strongly suggest double-checking anything I say here, coz I’m a bit
> rusty on this stuff.
>
> NT/SCSI makes use of queued commands with pretty much all SCSI controllers
> (a prerequisite is that disconnections have to be enabled for the relevant
> devices of course).
> Tagged-queued commands are a very rarely used feature where you can

Not exactly; this is just a terminological issue.
Tagged queueing is the only way in SCSI to have several I/O processes pending
on the same LUN.
This means keeping a request queue on the LUN itself (not in OS memory and
not in the HBA hardware), which gives the LUN the ability to choose an
execution order.
Without tagged queueing, a LUN can execute only one command at a time (it can
accept another command only after completing the current one), so the
execution order can be chosen only by the OS or the HBA, and the only queues
are the OS software queue and (possibly) one in the HBA hardware.
What you are describing is the ordered tagged queue.

> this behavior. I don’t think any version of NT uses tagged-queued
> commands at all.

The NT4 DDK describes the Srb->QueueAction field, which is intended especially for this:
SRB_SIMPLE_TAG_REQUEST
SRB_HEAD_OF_QUEUE_TAG_REQUEST
SRB_ORDERED_QUEUE_TAG_REQUEST
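
For illustration, here is a rough sketch of how a queue action and tag could end up on the wire: an IDENTIFY message followed by a two-byte SCSI-2 queue tag message (message code, then the tag). The numeric values below are the SCSI-2 queue tag message codes, which I believe the DDK constants share; treat that as an assumption and check srb.h. The function names are hypothetical and this is a model, not miniport code.

```python
# Assumed values: SCSI-2 queue tag message codes, believed to match srb.h.
SRB_SIMPLE_TAG_REQUEST = 0x20
SRB_HEAD_OF_QUEUE_TAG_REQUEST = 0x21
SRB_ORDERED_QUEUE_TAG_REQUEST = 0x22

QUEUE_ACTIONS = (
    SRB_SIMPLE_TAG_REQUEST,
    SRB_HEAD_OF_QUEUE_TAG_REQUEST,
    SRB_ORDERED_QUEUE_TAG_REQUEST,
)


def identify_message(lun, allow_disconnect):
    """IDENTIFY: bit 7 always set, bit 6 grants disconnect privilege, bits 0-2 are the LUN."""
    return 0x80 | (0x40 if allow_disconnect else 0x00) | (lun & 0x07)


def selection_messages(lun, queue_action, queue_tag, allow_disconnect=True):
    """Message bytes sent after selection for a tagged command."""
    if queue_action not in QUEUE_ACTIONS:
        raise ValueError("unknown queue action")
    if not 0 <= queue_tag <= 255:
        raise ValueError("queue tag must fit in one byte")
    return bytes([identify_message(lun, allow_disconnect), queue_action, queue_tag])


msg = selection_messages(lun=0, queue_action=SRB_SIMPLE_TAG_REQUEST, queue_tag=5)
print(msg.hex())
```

A miniport that never passes Srb->QueueTag to the hardware sends nothing like the last two bytes, so the LUN holds at most one command from it at a time.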

> It can be confusing though since “tagged-queued” commands and “queued”
> commands are used interchangeably since a bunch of commands all with

Yes, what you call “queued” is really “tagged-queued”, and what you call
“tagged-queued” is really “ordered-tagged-queued”.

> right. The 64bit U3 controllers such as made by www.tekram.com are a

Judging by what computer shops carry, it looks like Tekram is now (in the
160MB LVD era) the leader in SCSI HBAs, not Adaptec as it used to be.

> The example code on the DDK for SCSI miniport drivers is woefully
> inadequate – most (all?) are for ISA/EISA implementations, and therefore
> they are really

Yes, and it is very strange that the ISA card driver uses busmaster DMA.

Max