PCI Burst in SCSI Miniport Driver

Hi All,

Is there a way to do a PCI burst in a SCSI miniport driver?
We are writing a SCSI miniport driver and need to read 64 bits of data (a ULONGLONG) in a single PCI operation to improve performance.
We have tried:

  1. ScsiPortMoveMemory();
  2. ScsiPortReadRegisterBufferUlong();
  3. Dereferencing a memory pointer (*pullDest = *pullSrc;)

However, none of these work. The PCI analyzer trace shows that we actually do two 32-bit (ULONG) reads.

Thank you very much!
Vu

To the best of my knowledge, PCI burst size and patterns are completely up
to the hardware. This should be determined by the host bus controller from
what it thinks about bus widths and timings and the capabilities of
intermediate bridges and the final target chip.

Offhand, I’d say that some component either can’t handle a 64-bit data path
to the device or doesn’t think there is one, and so is using narrower
transfers.

Loren

Pretty much the only way to ensure burst on the PCI bus is to use bus
master DMA. In this configuration the PCI card is the master. If you are
trying to burst with the host as the master then it will not work.

Thanks,
Dale

I have run into this very same thing with our in-house PCI card. This card
is used to interface a PC to our high-speed tester. It is a 64-bit/66 MHz
PCI card. I have tried many things, including coding a move using the MMX
instructions in assembly (MOVQ), and the access is still broken up into two
32-bit transfers.

From my research in PC design and chipset design, it is basically a
chipset/PCI bridge issue, and there is no way to guarantee 64-bit transfers
with PIO-type accesses.
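To make the point concrete, here is a minimal sketch in portable C of a "single" 64-bit read next to an explicit pair of 32-bit reads. The `reg64_t` type and both function names are hypothetical, ordinary memory stands in for the device register, and a little-endian host (as on x86) is assumed. Both functions return the same value; which bus cycles actually appear is decided by the compiler, the CPU and the chipset, not by the C source. A 32-bit x86 compiler will typically emit two 32-bit loads for `read64_single` anyway.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of one 64-bit device register; in this sketch it is
   backed by ordinary memory, not by a mapped PCI BAR. */
typedef union {
    uint64_t u64;
    uint32_t u32[2];
} reg64_t;

/* Ask for the register as one 64-bit access. The C source only requests
   this; it cannot force a single 64-bit bus transaction. */
uint64_t read64_single(const volatile reg64_t *reg)
{
    assert(reg != 0);
    return reg->u64;
}

/* Read the register as two explicit 32-bit accesses, low dword first.
   Assumes a little-endian host, as on x86. */
uint64_t read64_split(const volatile reg64_t *reg)
{
    assert(reg != 0);
    uint64_t lo = reg->u32[0];
    uint64_t hi = reg->u32[1];
    return (hi << 32) | lo;
}
```

Checking the generated assembly for `read64_single` shows what the compiler asked for; only a bus analyzer shows what the bridge actually did with it.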

We solved the problem, though, by putting bus master DMA on our card for
transfers of large data. We have obtained real-world rates of over
250 MB/s using 64-bit bus master DMA. However, you will find that these
rates, even for bus master DMA, are highly dependent on the design of the
motherboard. We have found that the SuperMicro P4DSE motherboard, which uses
the ServerWorks GC-SL chipset, has poor performance, mainly because it is
unable to sustain large bursts, maybe 256 to 512 bytes at most. The other
system we use is based on an Intel E7501 chipset, and it gets better DMA
rates by being able to do bursts of up to 4096 bytes. Both use P4 Xeon
processors under Windows 2000.

The size of the burst is very important because it can take several hundred
nanoseconds for a master to acquire the bus. However, once the bus is
acquired, each transfer in a burst takes only 15 ns (even for a 64-bit
transfer). Standard PIO transfers generally take around 130 ns each, and it
seems that you can’t guarantee that these will burst either (I tried coding a
“REP MOVSD” in assembly). So if you need high-performance transfers, bus
master DMA is the only way to go.
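The arithmetic behind this can be sketched as a back-of-the-envelope model in C. The 15 ns and 130 ns figures are the ones quoted above; the 300 ns acquisition cost is an assumed stand-in for "several hundred nanoseconds", and all function names are hypothetical. The model ignores wait states, bridge latency and memory-side stalls, so treat the results as rough upper bounds rather than predictions.

```c
#include <assert.h>
#include <stdio.h>

/* Figures taken from the post; ACQUIRE_NS is an assumed value standing in
   for "several hundred nanoseconds". */
#define ACQUIRE_NS     300.0  /* assumption: bus acquisition cost per burst */
#define BURST_BEAT_NS   15.0  /* one data phase inside a burst (66 MHz) */
#define PIO_ACCESS_NS  130.0  /* one stand-alone PIO transfer */
#define BEAT_BYTES       8.0  /* 64-bit data path: 8 bytes per data phase */

/* Estimated bus master DMA throughput in MB/s (10^6 bytes/s) when every
   burst moves burst_bytes and pays the acquisition cost once. */
double dma_mbytes_per_sec(double burst_bytes)
{
    assert(burst_bytes >= BEAT_BYTES);
    double beats = burst_bytes / BEAT_BYTES;
    double total_ns = ACQUIRE_NS + beats * BURST_BEAT_NS;
    return burst_bytes / total_ns * 1000.0;  /* bytes per ns -> MB/s */
}

/* Estimated PIO throughput in MB/s when each access moves access_bytes. */
double pio_mbytes_per_sec(double access_bytes)
{
    assert(access_bytes > 0.0);
    return access_bytes / PIO_ACCESS_NS * 1000.0;
}

/* Print the estimate for a few burst sizes, including those mentioned above. */
void print_burst_table(void)
{
    static const double sizes[] = { 64.0, 256.0, 512.0, 4096.0 };
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; ++i)
        printf("burst %6.0f bytes: %6.1f MB/s\n",
               sizes[i], dma_mbytes_per_sec(sizes[i]));
    printf("32-bit PIO:         %6.1f MB/s\n", pio_mbytes_per_sec(4.0));
}
```

Calling `print_burst_table()` from a `main` with these assumed numbers gives roughly 330 MB/s for 256-byte bursts and roughly 510 MB/s for 4096-byte bursts, against about 31 MB/s for 32-bit PIO. The longer the burst, the more the fixed acquisition cost is amortized.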

Questions? First check the Kernel Driver FAQ at
http://www.osronline.com/article.cfm?id=256

You are currently subscribed to ntdev as: xxxxx@nptest.com
To unsubscribe send a blank email to xxxxx@lists.osr.com

Russ Poffenberger
NPTest, Inc.
xxxxx@NPTest.com

When I did graphics for a living, coaxing the PCI bus into bursting for long
stretches of time was an ongoing issue, even with DMA transfers!

I may be barking up the wrong tree, but on a P4 you may have to deal with
output buffering and write combining, which can introduce delays depending
on how you code your loops. Intel’s manual recommends that under certain
conditions the user force a flush of the output buffers, so it may be that
the way to do it is to transfer a little bit of data and then issue a
sequencing instruction. I don’t know. I have heard that the highest
throughput is achieved by using the SIMD extensions (not MMX or MOVS), but I
never tried it. I do remember that Intel distributed a turbocharged piece of
data movement code a few years ago, when the SIMD extensions were first
announced.
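For reference, a minimal user-mode sketch of that idea, assuming "SIMD extensions" means SSE2 and that the destination is mapped write-combining. It copies in 16-byte non-temporal stores and then issues SFENCE as the sequencing instruction. The function name `stream_copy16` is hypothetical, and nothing here guarantees that the chipset turns the stores into PCI bursts. In an x86 kernel driver the floating-point/SSE state would also have to be saved and restored around such code (for example with KeSaveFloatingPointState), which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Copy len bytes from src to dst using 16-byte non-temporal stores.
   dst must be 16-byte aligned and len a multiple of 16; src may be
   unaligned. Falls back to memcpy when SSE2 is not available. */
void stream_copy16(void *dst, const void *src, size_t len)
{
    assert(((uintptr_t)dst & 15u) == 0);
    assert((len & 15u) == 0);
#if HAVE_SSE2
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < len / 16; ++i) {
        __m128i v = _mm_loadu_si128(s + i);  /* unaligned 16-byte load */
        _mm_stream_si128(d + i, v);          /* non-temporal 16-byte store */
    }
    _mm_sfence();  /* make the streamed stores globally visible, in order */
#else
    memcpy(dst, src, len);
#endif
}
```

Whether this helps at all depends on the memory type of the mapping and on the bridge, so a bus analyzer trace is still the only way to confirm what happens on the PCI side.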

Another point to watch is how your PCI memory is configured. Intel’s book
says it should be configured as write-combining (WC) rather than uncacheable
(UC). It’s easy enough to fiddle with the appropriate MSR and see which
setting works better.

It should be instructive to put a scope on the input to the bridge and see
what kind of signaling the processor generates into the bridge. That should
tell you whether the problem is the bridge or the processor. But then, if
DMA speeds it up, chances are that the bridge is not the issue, right?
DMA transfers from system memory go through the bridge as well.

Alberto.


The problem with fiddling with bridge/chipset registers is that what works
for one motherboard probably won’t work for another :(

Russ Poffenberger
NPTest, Inc.
xxxxx@NPTest.com