How to improve SCSI miniport driver performance

Hi,

I am writing a Xen paravirtualized driver for Windows. One of the most
important drivers is a SCSI miniport driver that services a virtual
block device. The driver now works well, but its performance is only
acceptable, so I am looking for ways to optimize it.

1. My SCSI miniport driver already supports MultipleRequestPerLu. When
the driver receives a SCSI read/write command, it puts the command on
the ring buffer (the mechanism Xen uses to exchange commands and data
with Dom0), then calls ScsiPortNotification(NextLuRequest,…) to tell the
port driver to send another SCSI command. I found that the port driver
only sends 1~4 commands at a time, but the ring buffer can hold more
SCSI read/write commands than that. How can I get more SCSI read/write
commands at a time so that the ring buffer is fully loaded?

2. Maximum transfer length for a SCSI command. At present the ring
buffer can carry only 44K of data per command. I can set
PORT_CONFIGURATION_INFORMATION.MaximumTransferLength to 44K so that each
SCSI command fits in a single ring buffer segment. Alternatively, I can
leave PORT_CONFIGURATION_INFORMATION.MaximumTransferLength at its
default, in which case the driver may have to split a large SCSI command
in two and send the pieces in two ring buffer segments, and each of
those segments then generates its own interrupt. Both approaches give
almost the same performance. Any ideas for optimizing them?

3. Command queue. I know that the port driver maintains a SCSI command
queue. Is it necessary to maintain a command queue in the miniport
driver as well? That is, when the miniport driver receives a command, it
just inserts it into a list and returns as quickly as possible, then
calls ScsiPortNotification(NextLuRequest,…) to get another command. A
thread services this command list: it takes a node from the list and
sends it to the ring buffer. The code for this approach is finished, but
I am not using it now because someone said that creating a system thread
in a miniport driver is dangerous for WHQL. Is that right? I think this
approach can improve performance. Do you agree?

4. Interrupts. The WDK documentation says that
ScsiPortNotification(CallEnableInterrupts…) can improve overall I/O
throughput. In my driver, when an interrupt arrives, I set some
information in the DeviceExtension and then call
ScsiPortNotification(CallEnableInterrupts…). The
HwEnableInterruptsCallback function then gets the data or command
results from the ring buffer and processes them. But I get the same
performance as when I do that work in the interrupt routine. Is anything
wrong?

Finally, thanks for reading.

Best regards,
Wayne

For any type of performance optimization, one should identify the bottlenecks and have some goals in mind before carrying out any actual work. Successful optimization should be backed by intensive quantitative analysis and modeling, so that the results are measurable, repeatable, predictable and explainable, and hence scientific. Also note that perf optimization can sometimes be an art. As an engineer by trade, I hate it when it comes to the art part. I prefer dealing with numbers and equations :-)

Do you know the bottlenecks of your performance problem?

Good luck,

Calvin Guan
Broadcom Corp.
Connecting Everything(r)

— On Wed, 11/5/08, Wayne Gong wrote:

> From: Wayne Gong
> Subject: [ntdev] How to improve scsi miniport driver performance
> To: “Windows System Software Devs Interest List”
> Received: Wednesday, November 5, 2008, 11:46 PM

Calvin Guan wrote:

> Do you know the bottlenecks of your performance problem?

I think the main bottleneck is that my SCSI miniport driver cannot
receive enough SCSI READ/WRITE commands to fully load the ring buffer,
which is used to send data to and get data from the disk media.
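
One way to put a number on this is to count how many commands are outstanding on the ring at any moment and record the peak. The sketch below is plain C, not tied to any WDK API. The structure, the function names, and the ring capacity of 32 are assumptions for illustration only.

```c
#include <stdbool.h>

#define RING_SLOTS 32u   /* hypothetical ring capacity */

struct ring_stats {
    unsigned in_flight;       /* commands currently on the ring */
    unsigned peak_in_flight;  /* highest value in_flight has reached */
};

/* Call when a command is placed on the ring. Returns false if the ring
 * is already full, in which case nothing is counted. */
static bool ring_stats_submit(struct ring_stats *s)
{
    if (s->in_flight >= RING_SLOTS)
        return false;
    s->in_flight++;
    if (s->in_flight > s->peak_in_flight)
        s->peak_in_flight = s->in_flight;
    return true;
}

/* Call when a response for a command is taken off the ring. */
static void ring_stats_complete(struct ring_stats *s)
{
    if (s->in_flight > 0)
        s->in_flight--;
}
```

If the recorded peak stays far below the ring capacity under a heavy I/O load, that supports the idea that the ring is being starved of commands. If it regularly reaches the capacity, the limit is more likely somewhere else.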

Thanks
Wayne