> -----Original Message-----
From: xxxxx@lists.osr.com [mailto:bounce-261229-
xxxxx@lists.osr.com] On Behalf Of nikolas stylianides
The maximum rate we achieved during early tests was 120MBits/sec.
With some modifications in PLX registers ( PCIMGR set to 6, PCIMLR set
to A ) we achieved to increase the rate at 150 MBits/sec.
How is this number measured (at bus level or at application level)?
I’m not familiar with the PLX 9054 or the terms used in its manual. I’m
assuming PCIMGR refers to “Minimum Grant” and PCIMLR refers to “Maximum
Latency,” as defined as “MIN_GNT” and “MAX_LAT” in the PCI Local Bus
spec. I further assume the device is operating in 33MHz/32-bit PCI mode.
Note that the following discussion applies only to the PCI bus (as
opposed to PCI-X or PCI Express).
With some modifications in PLX registers (PCIMGR set to 6, PCIMLR set
to A)
Basically, you are setting the length of the data phase in a single
burst to 6*250 ns = 1.5 us (about 50 PCI clocks at 33MHz), and telling
the system your device needs to access the bus every 10*250 ns = 2.5
us. How did you choose these values? How big is your DMA FIFO? How fast
can the device fill the DMA write FIFO? How fast can the target (the
host bridge, in your case) sink data from the master?
These values must be chosen to be bus friendly, such that:
- You don’t overrun your DMA FIFO because the target can’t consume the
data fast enough, forcing it to insert WAIT states, or to issue a
target retry or disconnect.
- You don’t underrun your DMA FIFO because your device isn’t feeding
the FIFO fast enough, so the master has to insert wait states (since
you may overbook the bus cycle).
- The master doesn’t request the bus until it has enough data in the
FIFO to sustain a long burst.
However, these are your goals; how the system arbiter actually handles
them may vary from platform to platform, so you’ve been warned.
You didn’t say what your device’s Latency Timer value is. Note that the
MIN_GNT and MAX_LAT settings should not conflict with the Latency
Timer.
In theory PCI bus can transfer 1GBit/sec.
On a 33MHz/32-bit bus, the theoretical max bandwidth is
133 Mbytes/sec == 1064 Mbps. You only get this number if the bus
carries nothing but data phases, with no handshaking, no termination,
nothing else. That’s of course impossible, because each transaction has
overhead. It depends on a number of factors, including but not limited
to:
- arbitration latency – the clks between the master asserting REQ# and
its GNT# being asserted.
- master data latency – the clks from the master asserting FRAME# to
asserting IRDY#.
- target initial latency – the clks from the master asserting FRAME# to
the target claiming or terminating the first data phase.
- target subsequent latency – a bit complicated to explain w/o a
whiteboard.
- DEVSEL# timing profile – on which clk after the master drives FRAME#
the device starts driving DEVSEL#.
- WAIT states inserted by either master or target for any reason.
- the PCI commands used during a burst.
- mean DMA burst size – longer bursts amortize the overhead but hurt
latency.
- cacheline size and the DMA boundary.
- the way transactions are terminated (by master or target; if by
target, whether it terminates with Disconnect, Retry, or something
else) and so on and so on…
It also depends on what the other master(s) on the bus are doing. PCI
is inherently a time-shared, half-duplex bus. E.g., I have a
full-duplex Gigabit NIC sending a lot of packets, but at the same time
the NIC’s DMA write engine cannot move received packets to host memory
until the DMA read engine yields the bus. Bad, eh?
Is this the maximum rate we must expect, considering that at the same
time the PCI bus is kept busy by the hard disk as well?
Once you figure out the overhead you can’t avoid, you pretty much know
how much to expect. A Gigabit NIC on a regular PCI slot delivering
about 700Mbps of TCP send traffic is considered a pretty good number.
It has already saturated the PCI bus bandwidth limit once you take into
account the rx DMA traffic, the PCI accesses to update the hw ring
indices, and moving the SGLs across the bus. BTW, GbE requires 2Gbps+
of bi-directional bandwidth at the bus level.
Is there something I can do in order to boost the performance?
You are asking a tough question here :)
Find out where the bottleneck is first. It could be in hw, in sw, or in
the explicit interactions between s/w components and hw (a classic
headache for NIC drivers). You can start by hooking up a bus analyzer
to watch your DMA. Also look into the driver for hot spots.
Understanding the behavior of the application is very important; at
some point you will need to trade off throughput against latency. In
general, with a general-purpose PCI controller such as the 9054 you may
not be able to achieve the same performance as an ASIC (an Ethernet
controller, FC HBA, etc.), because an ASIC’s DMA engines and internal
state machines/blocks are tightly coupled and its requests are often
pipelined, so it can do very aggressive DMA at the bus level. But
150Mbps seems a little low, so you should have room.
In the end, performance tuning is very interesting and *frustrating*
work. You might get a boost in some applications but break others,
depending on how complex your hw and driver are and on the applications
that use your hw.
Good luck!
Calvin Guan (DDK MVP) Sr. Staff Engineer
NetXtreme Vista/Longhorn Server Miniport
Broadcom Corporation, Irvine CA 92618
Connecting Everything(r)