scsi miniport performance issue

Hi folks,

I'm running into a performance problem while developing a SCSI miniport
driver.

I know that the scsiport driver can queue a large number of READ/WRITE
SCSI commands (IIRC at least more than 200 on Win2k3) and sends them to
the SCSI miniport driver at the appropriate time. As a SCSI miniport
driver, I call ScsiPortNotification(NextLuRequest…) to tell scsiport
that I can accept another SCSI command. Because of hardware and other
limitations, my driver sets MAX_CMD_DEPTH to 32. But when I test the
physical disk (with no volume mounted on it) with Iometer, performance
is very low. I found that my SCSI miniport driver receives at most 4
READ/WRITE commands. I am sure that I call
ScsiPortNotification(NextLuRequest…) for each SCSI command as soon as I
receive it and send it down, but scsiport doesn't seem to give me more
commands. Is that what is limiting my driver's performance?
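
To show what I mean, here is a stripped-down sketch of my StartIo path.
This is not my real code; the DEVICE_EXTENSION contents and the
HwQueueSrb() helper are just placeholders.

#include "miniport.h"
#include "scsi.h"

typedef struct _DEVICE_EXTENSION {
    ULONG OutstandingCount;     /* commands currently held by the hardware */
} DEVICE_EXTENSION, *PDEVICE_EXTENSION;

/* Placeholder: the real driver posts the SRB to its hardware queue here. */
static VOID
HwQueueSrb(PDEVICE_EXTENSION DevExt, PSCSI_REQUEST_BLOCK Srb)
{
    (VOID)DevExt;
    (VOID)Srb;
}

BOOLEAN
HwScsiStartIo(PVOID DeviceExtension, PSCSI_REQUEST_BLOCK Srb)
{
    PDEVICE_EXTENSION DevExt = (PDEVICE_EXTENSION)DeviceExtension;

    switch (Srb->Function) {
    case SRB_FUNCTION_EXECUTE_SCSI:
        HwQueueSrb(DevExt, Srb);        /* send the READ/WRITE to the device */
        DevExt->OutstandingCount++;
        break;

    default:
        Srb->SrbStatus = SRB_STATUS_INVALID_REQUEST;
        ScsiPortNotification(RequestComplete, DeviceExtension, Srb);
        break;
    }

    /* Tell SCSIPORT it may send the next request for this logical unit,
     * so several commands can be outstanding at once. */
    ScsiPortNotification(NextLuRequest, DeviceExtension,
                         Srb->PathId, Srb->TargetId, Srb->Lun);
    return TRUE;
}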

Thanks
Wayne

Did you run a trace to see if lots of requests are getting queued to the
miniport? Or is it that upper levels just aren’t sending requests down?

What does perfmon say the disk queue depth is?

I used to use a commercial software trace tool called BusHound for things
like this. I know some developers just like to take a filter driver and
munge it for their purposes.

The number 4 vaguely rings a bell as the number of buffers available as low
memory DMA bounce buffers, although it’s been years since I had to deal with
a non-64bit DMA device. Is your device declared to be 64-bit DMA capable?
Are you possibly running with verifier DMA checking, which forces the use of
bounce buffers (to check for DMA memory corruption)?

High performance storage generally uses storport, not scsiport.

Jan


Jan Bottorff wrote:

> Did you run a trace to see if lots of requests are getting queued to the
> miniport? Or is it that upper levels just aren't sending requests down?
>
> What does perfmon say the disk queue depth is?

Perfmon says the average disk queue length is less than 1, about 0.9.
Sorry, my mistake: I had set 'Outstanding IO' to 1 in IoMeter. If I set it
to 16, Perfmon says the average disk queue length is almost 16.

> I used to use a commercial software trace tool called BusHound for things
> like this. I know some developers just like to take a filter driver and
> munge it for their purposes.
>
> The number 4 vaguely rings a bell as the number of buffers available as low
> memory DMA bounce buffers, although it's been years since I had to deal with
> a non-64bit DMA device. Is your device declared to be 64-bit DMA capable?
> Are you possibly running with verifier DMA checking, which forces the use of
> bounce buffers (to check for DMA memory corruption)?

Can I enable 64-bit DMA on a 32-bit OS such as Win2k3 x86? I am not sure
about that.

> High performance storage generally uses storport, not scsiport.

I think so, but for now the scsiport driver is the plan for me.

Some more information on the performance issue: if I use the default
Windows driver for the same media on my test box and set 'Outstanding IO'
to 1 in IoMeter, performance is about 5-8 times higher than with my
driver. So I think my driver still has something wrong that is blocking
performance. If I make more progress, I will post it to the forum. Thanks
for your help.

Thanks
Wayne

> Can I enable 64-bit DMA on a 32-bit OS such as Win2k3 x86? I am not sure
> about that.

Have a look at my drivers… the code fragment is:

if (ConfigInfo->Dma64BitAddresses == SCSI_DMA64_SYSTEM_SUPPORTED)
{
    ConfigInfo->Master = TRUE;
    ConfigInfo->Dma64BitAddresses = SCSI_DMA64_MINIPORT_SUPPORTED;
    KdPrint((__DRIVER_NAME " Dma64BitAddresses supported\n"));
}
else
{
    ConfigInfo->Master = FALSE;
    KdPrint((__DRIVER_NAME " Dma64BitAddresses not supported\n"));
}

Basically you need to check if Dma64BitAddresses ==
SCSI_DMA64_SYSTEM_SUPPORTED, and if it is (and if you support 64 bit
DMA) you then go and set SCSI_DMA64_MINIPORT_SUPPORTED. I seem to
remember that SCSI_DMA64_SYSTEM_SUPPORTED will be set when running 64
bit or with PAE enabled (eg explicitly enabled or with 4G or more of
memory).

Setting Master is a bit of a quirk I think. I couldn’t get the driver to
load without the above settings (I’m using mapped buffers).

> High performance storage generally uses storport, not scsiport.
>
> I think so, but for now the scsiport driver is the plan for me.

I found storport too buggy to use… possibly because I was on a non-PCI
bus.


James

I am developing Windows para-virtual drivers for Xen. A storage device
can work in two modes, Qemu and PV. In Qemu mode, Windows uses the default
Intel ATAPI driver to service it. In PV mode, my SCSI miniport driver
services the device.

I ran some quick performance tests of my SCSI miniport driver with
IoMeter. The default 'Outstanding IO' setting for a physical disk in
IoMeter is 1. For 32K sequential 100% write operations I get very bad
performance, about 4 Mbps. If I use the default Windows driver with the
same media, performance is about 30 Mbps. According to Perfmon, the
average command queue length is about 1 in both cases.

If I set 'Outstanding IO' to 16 or 32, my driver gets about 20 Mbps,
still less than the Windows default driver. So now I think I have to
spend some time improving my miniport driver. My first question: does
the driver type degrade performance with the default setting in IoMeter?
I guess the Intel ATAPI driver has some command queue to optimize
read/write performance.

Searching the list archive, I found some hack methods to improve SCSI
miniport driver performance, such as, for a write command, saving the SRB
to a queue and completing the SRB immediately in the StartIo routine, then
processing the SRB later in a work item or on a system thread. But what
about read commands? Are there any other methods, and is that method
WHQL-safe? Please give me some tips for a deeper search of this forum, or
something I should read.
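
To check that I understand the hack, here is a rough sketch of what I
think is meant. The STAGED_WRITE structure and the AllocateStagedWrite()
and QueueStagedWrite() helpers are placeholder names of mine, and it
assumes mapped buffers (MapBuffers == TRUE) so that Srb->DataBuffer is a
usable virtual address in StartIo.

#include "miniport.h"
#include "scsi.h"

typedef struct _DEVICE_EXTENSION DEVICE_EXTENSION, *PDEVICE_EXTENSION;

typedef struct _STAGED_WRITE {
    UCHAR Cdb[16];              /* copy of the CDB                  */
    ULONG Length;               /* number of data bytes staged      */
    UCHAR Data[1];              /* variable-length staging buffer   */
} STAGED_WRITE, *PSTAGED_WRITE;

/* Placeholders: a pre-allocated staging pool and a queue that is drained
 * later (e.g. from the interrupt or timer routine) to issue the real
 * writes to the hardware. */
static PSTAGED_WRITE AllocateStagedWrite(PDEVICE_EXTENSION DevExt, ULONG Length);
static VOID QueueStagedWrite(PDEVICE_EXTENSION DevExt, PSTAGED_WRITE Stage);

static VOID
HandleWriteSrb(PDEVICE_EXTENSION DevExt, PSCSI_REQUEST_BLOCK Srb)
{
    PSTAGED_WRITE Stage = AllocateStagedWrite(DevExt, Srb->DataTransferLength);

    if (Stage != NULL) {
        /* Copy the CDB and the write data into the private staging buffer. */
        ScsiPortMoveMemory(Stage->Cdb, Srb->Cdb, Srb->CdbLength);
        ScsiPortMoveMemory(Stage->Data, Srb->DataBuffer,
                           Srb->DataTransferLength);
        Stage->Length = Srb->DataTransferLength;
        QueueStagedWrite(DevExt, Stage);    /* real write is issued later */

        /* Report success now, before the data has actually hit the media. */
        Srb->SrbStatus = SRB_STATUS_SUCCESS;
    } else {
        Srb->SrbStatus = SRB_STATUS_BUSY;   /* no staging room, ask for retry */
    }

    ScsiPortNotification(RequestComplete, DevExt, Srb);
    ScsiPortNotification(NextLuRequest, DevExt,
                         Srb->PathId, Srb->TargetId, Srb->Lun);
}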

Thanks
Wayne

> Basically you need to check if Dma64BitAddresses ==
> SCSI_DMA64_SYSTEM_SUPPORTED, and if it is (and if you support 64 bit
> DMA) you then go and set SCSI_DMA64_MINIPORT_SUPPORTED. I seem to
> remember that SCSI_DMA64_SYSTEM_SUPPORTED will be set when running 64
> bit or with PAE enabled (eg explicitly enabled or with 4G or more of
> memory).
>
> Setting Master is a bit of a quirk I think. I couldn't get the driver to
> load without the above settings (I'm using mapped buffers).

Thanks James, it’s very useful for me.

Thanks
Wayne

> …processing the SRB later in a work item or on a system thread. But
> what about read commands? Are there any other methods, and is that
> method WHQL-safe?

Actually I have doubts about WHQLing a full storage port driver on XP, or a
SCSIPORT miniport that uses such hackery.

With 2003 and later, you have STORPORT’s virtual miniports to do the job.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

> a SCSIPORT miniport that uses such hackery

The good ones don’t.

> With 2003 and later, you have STORPORT

That represents a fraction of the market so this is not a commercially reasonable approach.

Thanks for your help.

I've run into a strange thing. Using IoMeter to test my driver with
different test cases, I found that with a 100% read or 100% write workload,
performance is higher than with the Windows default driver. But with a 75%
read or 50% read mix, performance is very bad. Has anyone else seen the
same issue?

Thanks
Wayne