PCI DMA reads are too slow

Fellow Device Driver Writers,

I have seen some good discussion about DMA on this
list. I have two issues I am hoping to get some advice on.

Background
***********
I am running Windows 2000. I have an INTEL IQ80303 evaluation board.
The 80303 is the successor to the INTEL 960 chip. This board is plugged
in to the PCI bus on an INTEL D815EEA motherboard. This motherboard
has an INTEL 815 chip set. My application, digital fluoroscopy,
requires me to transfer a 2MB image between my 80303 board and the PC
host system memory once every 33 milliseconds.

My First Question
*****************
When I set up the 80303 as a busmaster and write data to
host system memory I am able to transfer a 2MB image to
host memory in about 17 milliseconds. This is good!

When I set the 80303 up as busmaster and do a read from
system memory it takes 38 milliseconds to transfer 2MB.
This is bad!!!

On reads, my PCI bus analyzer shows 24 clock cycle (96 bytes)
bursts followed by over 30 clock cycles with TRDY high. In this
test the 815 motherboard is the target so it is the one
setting TRDY.

Does anyone out there have experience doing pci busmaster
writes and reads using the 815 chip set? If so, am I out
of luck or is there something I can do to get faster reads?

The 810 chip set is as lousy as the 815.
On other chip sets we have had better results.
The 820 does well.
The 840 is pretty good.
The INTEL SERVER STL2 motherboard we tried was wonderful.
We want to use the 815 because it is cheaper.

My Second Question
******************
One solution would be for my 80303 board to send a message
to a W2K driver and the driver pushes (writes) the data
to my 80303 card. I will say up front that I don’t really
like this solution. However, even if we do not do this, I still
have call to push data to another card and I would like to do
it quicker than 69 milliseconds.

I coded up a 2MB data push to my card. I am using NuMega DriverWorks.
The KMemoryRange::outd command ends up calling
WRITE_REGISTER_BUFFER_ULONG.
The MapIoSpace call in DriverWorks sets the mapped memory to noncached.
It takes 69 milliseconds to transfer 2MB of data. When I look at the
transfer on my PCI bus analyzer I see lots of single DWORD writes
and some 3 to 8 DWORD bursts. This is really bad.
Since the mapped memory was noncached I tried an RtlCopyMemory.
Performance was no better.

My hardware guy insists there must be a way to set up a host
to pci card dma transfer with the motherboard as the master.
I am not sure if there is a better way.

Is there a way to set up a motherboard to pci card busmastering
dma transfer with a motherboard resource acting as the master?

Thanks in advance

Chuck Rush
xxxxx@infimed.com


You are currently subscribed to ntdev as: $subst(‘Recip.EmailAddr’)
To unsubscribe send a blank email to leave-ntdev-$subst(‘Recip.MemberIDChar’)@lists.osr.com

>When I set the 80303 up as busmaster and do a read from
>system memory it takes 38 milliseconds to transfer 2MB.
>This is bad!!!

You might try plugging in an AGP video card, and see if the behavior
changes. The 815 shares main memory between the video frame buffer and
everything else, so I assume it must read bursts of data for the video
RAMDAC. As the 815 uses SDRAM at 133 MHz, maximum bandwidth must be about 1
GByte/sec (realistically less). Constantly refreshing a megapixel display
at 32 bits per pixel and 75 Hz must consume about 300 MByte/sec of memory
bandwidth. Add in some bandwidth for 3-D texture and z-buffering, and I
could easily believe you might get some read latency. You could also play
with the screen resolution and depth, and see if it has an effect on your
device bandwidth.

- Jan


Jan,
Thanks for the response. I already tried this. In addition I
disabled audio, LAN, and everything else hooked to the I/O hub.
Made no difference.
Chuck




Well… The graphics chip does much of the drawing, but that doesn’t go
through the PCI bus, but directly between video chip and video memory. What
goes through the bus is typically command streams. For example to draw one
3D triangle you need a command, three vertices (x, y, z, w, color, maybe two
texture coordinates) and possibly nothing else, independent of the size of
your triangle. If you have a strip or a fan, it’s one vertex per triangle.
Textures do go across the PCI bus on some cards, but not all video boards
can draw textures from host memory; on those, textures are buffered in video
memory and rendered without going through the PCI bus. The Z buffer is also
resident in video memory.

On the other hand, you may fully load the PCI bus, and maybe more, if you
turn off hardware acceleration and run some OpenGL app or game. It’s going
to be slow as molasses, but you’re going to be drawing everything from the
host, and your bus load will go up.

Hope this helps,

Alberto.

-----Original Message-----
From: Jan Bottorff [mailto:xxxxx@pmatrix.com]
Sent: Tuesday, February 13, 2001 5:43 PM
To: NT Developers Interest List
Subject: [ntdev] Re: PCI DMA reads are too slow

