SCSI miniport reads are 5-6 times slower than writes

I’m working with a storage device that connects to the serial port. I
have a SCSI miniport driver which calls the serial port driver in order
to communicate with the drive. The SRBs are processed in a separate
thread created by my driver, which runs at PASSIVE_LEVEL so that it can
block waiting for the read and write IRPs sent to the serial port to
complete. The thread runs at a priority of LOW_REALTIME_PRIORITY+2. For
the moment, I complete the SRBs when they are done by setting a boolean
flag which is polled by a miniport timer routine, which can then
complete the request in miniport context (alternative means of
completing SRBs processed outside of the miniport context are the
subject of another thread of discussion).
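
To make the handoff described above concrete, here is a minimal user-mode sketch in C of the "worker sets a flag, timer routine polls it" pattern. It is only an illustration under stated assumptions: the names pending_request, request_init, worker_mark_done and timer_poll are hypothetical, and C11 atomics stand in for whatever synchronization the real driver uses. In the actual miniport, the polling side would be the timer routine, which would then complete the SRB in miniport context.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for one outstanding SRB. */
typedef struct {
    atomic_bool done;   /* set by the worker thread, consumed by the poller */
    int         status; /* result code, written before 'done' is set */
} pending_request;

static void request_init(pending_request *r)
{
    atomic_init(&r->done, false);
    r->status = 0;
}

/* Worker side (the PASSIVE_LEVEL thread in the post): publish the result,
 * then raise the flag. The release store orders the status write before
 * the flag becomes visible. */
static void worker_mark_done(pending_request *r, int status)
{
    r->status = status;
    atomic_store_explicit(&r->done, true, memory_order_release);
}

/* Poller side (the miniport timer routine in the post): returns true
 * exactly once per completed request and hands back its status.
 * The exchange clears the flag so the request is not completed twice. */
static bool timer_poll(pending_request *r, int *status_out)
{
    if (!atomic_exchange_explicit(&r->done, false, memory_order_acquire)) {
        return false; /* nothing finished since the last poll */
    }
    *status_out = r->status;
    return true;
}
```

Note that with this kind of design a finished request is not reported until the next poll, so the polling period adds to the latency of every SRB.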

Here’s the catch…

Writing data to the drive runs at about the expected performance given
the speed of the serial line and the overhead of the protocol used to
communicate with the drive.

However, reading from the drive is 5-6 times slower, even though the
protocol is essentially identical except that when the actual data is
transferred it goes the other direction.

A previous version of this driver (not my doing) had been written to do
all of the SRB processing and completion in the miniport StartIo
routine, bypassing the serial port driver and just accessing the serial
port registers directly, polling for each byte. Ignoring the obvious
issues with hardware conflicts (it didn’t claim the IO ports) and the
fact that the driver polled at raised IRQL (which rendered the system
totally unusable for the duration of any data transfers, which are not
exactly expedient over a serial port), that version of the driver was
able to read from the drive at the expected performance level,
consistent with the write operations.

You might think, “well, of course that would work faster, since it’s
basically taken over the machine running at DISPATCH_LEVEL”. However,
it’s not as if my version of the driver that utilizes the serial port
driver is being CPU starved. When doing data transfers with my version
of the driver, task manager only shows 20-25% CPU utilization, and
that’s on a slow (P5-166) system. Plus, that doesn’t explain why writes
still perform as expected, and only reads are slow for my version.

I instrumented the read operations in my driver with a bunch of calls to
KeQueryPerformanceCounter() to see where I had a bottleneck, and got
some very bizarre and confusing results. A single read operation will
sometimes get processed in the expected time, but other times it will
take orders of magnitude longer (sometimes on the order of seconds per
SRB), with an average performance about 5-6 times slower than expected.
It’s not clear to me why there is so much variation from the processing
of one SRB to the next. The machine is otherwise completely idle, doing
nothing but copying a file off the drive, yet when I’m reading from the
drive it’s as if my thread is getting preempted for extended periods of
time.
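
For reference, the kind of per-SRB timing bookkeeping described above can be sketched in portable C as follows. This is not the driver's actual instrumentation. It assumes the start and end tick counts come from KeQueryPerformanceCounter() and that the counter frequency is the value it returns through its optional frequency parameter. The names srb_stats, ticks_to_us and the srb_stats_* helpers are hypothetical.

```c
#include <stdint.h>

/* Running per-SRB latency statistics, in microseconds. */
typedef struct {
    uint64_t count;
    uint64_t total_us;
    uint64_t min_us;
    uint64_t max_us;
} srb_stats;

static void srb_stats_init(srb_stats *s)
{
    s->count = 0;
    s->total_us = 0;
    s->min_us = UINT64_MAX;
    s->max_us = 0;
}

/* Convert a tick interval to microseconds. Whole seconds and the
 * remainder are scaled separately so the multiplication cannot
 * overflow for long intervals. Returns 0 if freq_hz is 0. */
static uint64_t ticks_to_us(uint64_t start, uint64_t end, uint64_t freq_hz)
{
    uint64_t delta;
    if (freq_hz == 0) {
        return 0;
    }
    delta = end - start;
    return (delta / freq_hz) * 1000000u
         + ((delta % freq_hz) * 1000000u) / freq_hz;
}

/* Record one SRB's elapsed time. */
static void srb_stats_add(srb_stats *s, uint64_t us)
{
    s->count++;
    s->total_us += us;
    if (us < s->min_us) s->min_us = us;
    if (us > s->max_us) s->max_us = us;
}

/* Mean latency so far, or 0 if nothing has been recorded. */
static uint64_t srb_stats_mean_us(const srb_stats *s)
{
    return s->count ? s->total_us / s->count : 0;
}
```

Tracking the minimum and maximum alongside the mean makes the spread visible: a few SRBs taking seconds each can dominate the average even when most complete in the expected time.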

One interesting observation…

When I copy a file to the drive, the copy progress dialog box pops up
for just a second, the progress bar immediately shoots across to 100%,
and then it disappears, even though the data transfer to the drive
itself may take another few minutes to complete (remember, we’re talking
serial here, so it’s not very fast when copying files on the order of
1-2 MB in size).

However, when I copy a file from the drive, the copy progress dialog box
pops up and shows the real-time progress for the duration of the time it
takes to actually copy the file.
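
For a sense of scale on the "few minutes" figure, the raw wire time for a transfer can be estimated from the line speed. The sketch below is illustrative only: the post does not state the baud rate or framing, so 115200 baud and 8N1 framing (10 bits on the wire per data byte) are assumptions, and protocol overhead is ignored. The function names are hypothetical.

```c
#include <stdint.h>

/* Payload bytes per second on an asynchronous serial line.
 * bits_per_frame is the total number of bits sent per data byte,
 * e.g. 10 for 8N1 (1 start + 8 data + 1 stop). */
static uint32_t serial_bytes_per_sec(uint32_t baud, uint32_t bits_per_frame)
{
    if (bits_per_frame == 0) {
        return 0;
    }
    return baud / bits_per_frame;
}

/* Lower bound, in whole seconds (rounded up), for moving 'bytes' of
 * payload. Ignores all protocol overhead. Returns UINT32_MAX if the
 * computed rate is zero. */
static uint32_t min_transfer_seconds(uint32_t bytes, uint32_t baud,
                                     uint32_t bits_per_frame)
{
    uint32_t rate = serial_bytes_per_sec(baud, bits_per_frame);
    if (rate == 0) {
        return UINT32_MAX;
    }
    return bytes / rate + (bytes % rate != 0 ? 1u : 0u);
}
```

Under those assumed settings, a 1 MB file needs at least 92 seconds and a 2 MB file at least 183 seconds of wire time, which is in line with the "few minutes" observed for writes.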

So my only guess (and I’m really reaching here, cuz I’m not a filesystem
expert) is that there’s something going on at the filesystem level that
is different between when I’m reading and writing.

I suppose that when writing to the drive, it reads the file I’m copying
off of my hard disk and puts it all in filesystem cache almost
immediately (which might explain the quick pop up and disappearance of
the progress dialog), and then sends out the data to the drive at
whatever speed it can write it.

But when copying from the drive, it needs to take each block as it comes
off of the drive and write it to the hard disk. But, compared to the
speed of the drive on the serial port, I’d think the time and CPU cycles
that it would take to write to the hard disk would be virtually
insignificant compared to the time to read each block of data off of the
serial port drive. Besides, I would think that this would impact the
prior version of the driver as well, as it would also need to write the
blocks to the hard disk.

For those of you who have kept reading this far, any ideas? I’m at a
point where I’ve explored every avenue I can think of, and nothing seems
to make any difference on the performance.

One other comment… Running on a faster (P-II 333) dual CPU SMP system
doesn’t help. The performance is roughly the same on that system as
well.

I sure could use some bright ideas about now…

Thanks,

  • Jay

Jay Talbott
Principal Consulting Engineer
SysPro Consulting, LLC
3519 E. South Fork Drive
Suite 201
Phoenix, AZ 85044
(480) 704-8045
xxxxx@sysproconsulting.com
http://www.sysproconsulting.com


You are currently subscribed to ntdev as: $subst(‘Recip.EmailAddr’)
To unsubscribe send a blank email to leave-ntdev-$subst(‘Recip.MemberIDChar’)@lists.osr.com

Although I’m a far cry from being a “filesystems expert” too, I think
you’re on the right track with thinking this may have something to do
with the NT executive’s cache manager. In particular, I would go so far
as to conjecture that the older version of the driver is able to perform
equally for both reads and writes precisely because it bypasses the
cache manager and instead polls the serial registers (unless, of course,
I misread that part of your post!).

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Jay Talbott
Sent: Saturday, January 12, 2002 4:16 PM
To: NT Developers Interest List
Subject: [ntdev] SCSI miniport reads are 5-6 times slower than writes

Owen,

I don’t follow your thinking… Neither my version nor the old version
bypasses the cache manager. The filesystem and cache management stuff
would be the same in both cases, as what goes on above the miniport
shouldn’t change based on which version of the miniport is running. The
old miniport just bypasses the serial port driver and reads and writes
directly to the serial port hardware itself (polling at raised IRQL),
whereas my new version uses the serial port driver by constructing read
and write IRPs, sending them, and then waiting (at PASSIVE_LEVEL) for
IRP completion.

The real question is what is going on at the levels above the miniport
that is somehow hurting my performance when copying data from the drive
on the serial port.

  • Jay

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Owen T. Cunningham
Sent: Saturday, January 12, 2002 2:27 PM
To: NT Developers Interest List
Subject: [ntdev] RE: SCSI miniport reads are 5-6 times slower than writes
