PCI burst transfers

I’m trying to speed up my driver’s access to a PCI device that I’ve produced.

The major bottleneck seems to be the speed of access over the PCI bus. The reason it’s so slow is that the driver isn’t doing burst accesses to the device, so I can’t get anything close to the theoretical PCI bus bandwidth.

I’m currently transferring data to/from the device using the macros READ_REGISTER_ULONG(), etc. Each read of a 32-bit register is taking around 600ns to complete.

My device isn’t a PCI bus master, so I need to either read using the main processor, or using system DMA. The device does, however, support burst PCI transfers.

The amount of data that I’m transferring is fairly small, normally 5 to 20 words at a time, so DMA seems like overkill.

Is there any simple way to just initiate a burst transfer using the processor without getting DMA involved? I had assumed that this would be automatic when I read a bunch of consecutive addresses, but apparently it’s not.

Thanks,
Steve

xxxxx@embeddedintelligence.com wrote:

> I’m trying to speed up my driver’s access to a PCI device that I’ve produced.
>
> The major bottleneck seems to be the speed of access over the PCI bus. The reason it’s so slow is that the driver isn’t doing burst accesses to the device, so I can’t get anything close to the theoretical PCI bus bandwidth.
>
> I’m currently transferring data to/from the device using the macros READ_REGISTER_ULONG(), etc. Each read of a 32-bit register is taking around 600ns to complete.

You will never approach PCI bandwidth without bus mastering on the
device. Reading from a device is the worst-case scenario.

However, 600ns does seem a bit long: at 33 MHz, that’s 20 PCI clock
cycles. Have you used a logic analyzer to figure out where the delay
takes place?

> My device isn’t a PCI bus master, so I need to either read using the main processor, or using system DMA. The device does, however, support burst PCI transfers.

Remember that system DMA is an ISA bus thing. It is not PCI bus
mastering. It is limited to 24-bit addressing (the low 16 MB of
memory), and will generally run slower than what you are doing now.

> The amount of data that I’m transferring is fairly small, normally 5 to 20 words at a time, so DMA seems like overkill.

In that case, why be worried about the bandwidth at all? Do you know
that the performance is an issue?

> Is there any simple way to just initiate a burst transfer using the processor without getting DMA involved? I had assumed that this would be automatic when I read a bunch of consecutive addresses, but apparently it’s not.

No. How could it possibly do that? Your read instruction blocks until
the data comes back. Neither the processor nor the PCI chipset has any
idea that you are reading from consecutive addresses. Consecutive
WRITES can be combined into a burst, but reads cannot.

You might try using RtlMoveMemory instead of READ_REGISTER_ULONG as an
experiment. It compiles to a “rep movsd”, which is the best you can do
from the CPU’s end.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.