>Initially you say “host processor basically does memory moves…”, then “PCI
master basically does its own transfers”, so how do you tell the difference?
Your PCI hardware has to have logic to implement bus mastering if that’s
what you want. Simpler PCI interface hardware only has a target interface.
You should look at the macro cell specs to determine what its capabilities
are.
I know how to make the CPU do the job, but how do you tell the PCI master to
make the transfers? And (just to be sure I understand), the PCI master is
the PCI controller on the host, right?
All data transfers across a PCI bus have a master and a target. The most
common master on PCs is the x86 CPU working with its bus interface
chipset. The CPU decides what addresses to present to the PCI bus, and what
operation (read/write). To be squeaky correct, actual PCI transfers have a
lot more details than just a read or a write.
A PCI transaction starts with some master requesting use of the bus. An
arbitration occurs to grant exclusive use of the bus signals. The host PCI
chipset has this arbitration logic in it. Once bus ownership is granted, a
master presents the desired starting address of a burst on the bus. A
target device recognizes the address and claims it. Only one target can
claim a transaction. To claim the transaction, the target raises an
appropriate signal on the bus, which moves the transaction into the data
phase (and often some wait states). In the data phase, the master and
target present data, and toggle PCI signals to strobe data values across
the bus (the PCI clock is really the strobe). Hopefully, you transfer lots
of data values on each PCI transaction. The master and target then release
the bus when they are done (this is a bit simplified). Any master is now
free to arbitrate a new transfer.
The CPU support chipset on a PCI system is both a PCI master and target. It
responds to addresses, usually in the host’s physical memory range, as a
target. It can also initiate master transactions.
A device card can also be a target, master, or both. A target-only device
responds to other masters but never initiates transfers. If the device is a
master, it will typically
respond at some control register addresses as a target, to set parameters
for master transfers.
If your hardware has PCI mastering support, your driver has to program the
device to do the bus master transfers. Once the device is programmed with
its bus master parameters, the host CPU is finished with the transfer,
until the device interrupts with status.
A device can try to become the bus master at any time, interleaving its
burst transfers with other masters (like the host CPU) based on
arbitration. Master devices typically round-robin bus ownership.
For the second case, how is the flow: a new irp triggers the PCI transfers,
and then waits for an interrupt to do whatever is needed after the data is
in place?
If the device is bus master capable, the flow would be: an irp passes into
the driver and, assuming the device is not busy, the driver programs the
device’s bus master registers with a physical address and length. It then
marks the irp as pending and returns to the OS. Sometime later, the device
decides it’s done with the transfer, and signals an interrupt to the host
CPU. The interrupt service routine decides if it’s the correct device
interrupting (PCI buses can share an interrupt line among multiple devices)
and typically queues a DPC.
Eventually, the DPC will start running, and the driver fills in status from
the device and marks the IRP as complete.
For a target device, the irp will come in and typically synchronously be
processed by reading/writing to the device memory as a target (it’s just
some addresses). To be squeaky clean again, this irp actually may have to
be queued to make PnP happy. After the data is transferred, the irp is
completed. Target mode drivers are often simpler than bus master drivers.
I plan to use a PCI macro cell from Xilinx. So far I have only seen the
marketing blabla, but they also promise a 130 MB/s max throughput.
PCI marketing literature almost ALWAYS says that. Sometimes they are
telling the truth, sometimes not. It depends on a lot of factors.
What I plan to do is something like this:
- the fpga writes data to memory
- near the middle of the buffer it interrupts, notifying data is available
- device driver gets relevant info from device registers and moves the data
(by whatever means)
- at move completion, the driver sets relevant info in device registers.
- during the whole transaction, the fpga keeps writing, unless that would
overwrite data not yet moved.
And how does the driver EXACTLY synchronize with the byte count? It seems like:
- the fpga will be writing to memory, updating a count register on every byte
- at 50% buffer full, it requests an interrupt
- somewhere between a few microseconds and a bunch of milliseconds later,
the cpu will respond to the interrupt and queue the dpc
- sometime later, the dpc will start running, and assuming target access
mode, will move data from addresses 0 to 50% of the buffer to a buffer
hopefully passed down by an application
- the fpga will keep writing to the buffer, and incrementing the count
- the cpu will finish its memory move
- at this point there is new data from the buffer 50% mark to someplace higher
- how does the driver now inform the fpga write logic that 0-50% of the
buffer is free, and the stuff it wrote since the interrupt request is still
there?
- the driver returns the buffer full of data to the application
You could set up the target interface to look like a fifo, with a status
bit to tell if ANY data is in the fifo. The driver can then poll the status
bit, and read a byte if it’s available. Note this will cause small PCI
bursts (like 1 byte) and cause your transfer rate to be a small number of
MBytes/sec.
You could also implement a fifo as a buffer and read/write pointers. The
driver keeps unloading the buffer, moving up the read pointer. The fpga
keeps writing at the write pointer, unless it’s the same as the read
pointer, in which case you have buffer overflow (which you might want to
signal in another bit). It’s usually appropriate for drivers to know when
data is lost. This strategy would allow you to do much larger burst
transfers, as you don’t have to poll the status on every byte. You have a
few strategies on knowing when to unload the fifo. One is you just have a
timer fire in the driver, and you look to see if there is any new data. If
the buffer is large, and you were not expecting interrupts that often
anyway, this is a fine way to do things. It also forces the hardware
designer to think about what happens under slow interrupt latency. Hardware
designers sometimes expect maximum latency to be less than it really is,
leaving you with a broken hardware design that can’t be made to work reliably.
This also simplifies the driver and probably the hardware, as no interrupts
are needed. If minimizing the latency from when the data is captured to
when an application can see it is important, then it’s not a good design,
as you might have to wait for the polling delay to get 1 byte. Think in
terms of polling delays of tens of milliseconds, which, if you’re producing
100 MBytes/sec of new data, means your buffers had better be multiple MBytes
in size (rather larger than on-chip fpga memory).
As your specific app may not be for general public consumption, you could
also use a strategy of just constantly polling, completing the IRP if there
is no data. Polling intervals may be rather less than many milliseconds.
You’re still not guaranteed a maximum latency, so big buffers may still be
required to guarantee correct operation.
If your data rate is high, and your available buffer space is small, you
don’t have much choice but to use bus mastering support. In theory, PCI bus
masters can have guaranteed bus access latency. So if your buffer is 1000
bytes, and you’re generating 100 MBytes/sec, you will need access to the bus
real fast and often (perhaps impossibly often). Also keep in mind the
difference between AVERAGE transfer rates across the PCI bus and MINIMUM
rates. For example, if you’re generating data at 50 MBytes/sec, your buffers
have to be large enough to handle the longest latency you might encounter.
On the average, you might be able to transfer 100 MBytes/sec, but if your
buffer is only 1000 bytes, you absolutely MUST transfer 1000 bytes each
and every 20 microseconds. You will NOT be able to request an interrupt
and be assured your bus mastering hardware gets programmed in 20
microseconds. The result may be that your hardware, which absolutely can
transfer an average of 100 MBytes/sec, will lose data at only 50
MBytes/sec. Drawing a time line of activity can often help a lot.
One solution is to have BIG hardware buffers if your data rates are high.
There are sample PCI drivers for both masters and targets in the DDK.
I hopefully have pointed out some of the deep holes it’s easy to fall into
when designing PCI hardware. It’s better to understand the issues up front
than to discover them a few weeks before you want to be shipping a
product, and find it drops data. No amount of driver magic can fix many of
these problems. As a driver writer, I don’t look forward to delivering the
news that a hardware design is broken.