A long list of random ideas that might help…
RtlCopyMemory is inappropriate for accessing mapped device memory; you
should use one of the transfer macros, like READ_REGISTER_BUFFER_ULONG64.
You might want to make sure your device BAR allows prefetchable read access.
Uncached access may force a full bus cycle for each 32/64 bits instead of
initiating a burst transfer. You might also use the write-combining attribute
on the memory mapping for your writes; this usually made writes to video
cards much faster. Make sure your transfers are aligned on a good boundary,
like a cache line.
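As a rough illustration of the fixed-width access and alignment points, here is a portable user-mode sketch of what a register-buffer write loop does conceptually. It is not the WDK implementation: the helper names are made up for the example, an ordinary array stands in for the mapped BAR, and the 64-byte cache line is an assumption. In a real driver you would call one of the WRITE_REGISTER_BUFFER_xxx / READ_REGISTER_BUFFER_xxx macros on the pointer returned by your mapping call (MmMapIoSpace or similar).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a WDK routine: copies `count` 64-bit words to a
 * device-style destination using one volatile store per word, so the
 * compiler cannot merge or drop the individual accesses. */
static void copy_to_device_u64(volatile uint64_t *dst, const uint64_t *src,
                               size_t count)
{
    /* Each access must be naturally aligned for its width. */
    assert(((uintptr_t)dst % sizeof(uint64_t)) == 0);
    assert(((uintptr_t)src % sizeof(uint64_t)) == 0);

    for (size_t i = 0; i < count; i++) {
        dst[i] = src[i];
    }
}

/* Returns nonzero if `p` starts on a cache-line boundary.
 * 64 bytes is assumed here; it is not universal. */
static int is_cache_line_aligned(const volatile void *p)
{
    return ((uintptr_t)p % 64u) == 0;
}
```

Whether such a loop actually bursts on the bus still depends on the mapping attributes and the chipset; the sketch only shows the access pattern.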
Some devices have a very long latency, but once a burst starts, can move a
good size block of data. A single PCI-e 1.1 lane has a theoretical bandwidth
of about 250 Mbytes/sec (per direction). To achieve the 100 Mbytes/sec you
asked about would require 40% of the theoretical bandwidth, which my gut
feeling (which may be totally wrong) says might be difficult for a target
mode device. The fact that you’re getting 18/3 Mbytes/sec (write/read) makes
me think your device probably has significant latency, so getting the burst
size up will be critical. Those numbers seem so low that I wonder whether
error retries are happening, although I don’t have any experience with x1
PCI-e devices.
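To put rough numbers on that, the sketch below computes what share of the roughly 250 Mbytes/sec a given rate uses, and the per-access time the measured rates would imply if every access moved a fixed number of bytes and that per-access time stayed constant. The 4-byte access width is purely an assumption for illustration (the original post does not say how the copy is broken up on the bus), and the function names are made up.

```c
#include <assert.h>

/* Theoretical PCI-e 1.1 x1 bandwidth per direction, in Mbytes/sec. */
#define X1_THEORETICAL_MBPS 250.0

/* Percentage of the theoretical x1 bandwidth that a given rate uses. */
static double link_utilization_percent(double rate_mbps)
{
    return 100.0 * rate_mbps / X1_THEORETICAL_MBPS;
}

/* Time per access in microseconds implied by a measured rate, assuming
 * every access moves `bytes_per_access` bytes. Taking 1 Mbyte as 10^6
 * bytes, 1 Mbyte/sec is exactly 1 byte per microsecond. */
static double usec_per_access(double rate_mbps, double bytes_per_access)
{
    assert(rate_mbps > 0.0);
    return bytes_per_access / rate_mbps;
}

/* Bytes each access would have to carry to reach `target_mbps` if the
 * time per access stayed fixed at `usec` microseconds. */
static double bytes_per_access_needed(double target_mbps, double usec)
{
    return target_mbps * usec;
}
```

Under those assumptions, 100 Mbytes/sec is 40% of the link and 18 Mbytes/sec is about 7%. Reads at 3 Mbytes/sec with 4 bytes each work out to about 1.3 microseconds per read, and reaching 100 Mbytes/sec at that per-access cost would take roughly 133 bytes per access. That is the sense in which burst size matters.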
There have been MANY threads here on ntdev over the years on how to do fast
PCI target mode access. The answers usually are: 1) it depends a LOT on your
motherboard chipset, 2) bus master transfers work a LOT better.
Debugging what’s really happening will probably require a PCI-e
bus analyzer. I remember the first time I used a PCI bus analyzer to
optimize a driver; it was VERY enlightening (or depressing might be more
correct).
SSE instructions are also capable of 128-bit wide transfers, which might
cause bursting when integer register reads/writes don’t.
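A minimal user-mode sketch of a 128-bit copy loop using SSE2 intrinsics follows. The function name is hypothetical, plain memory stands in for the device mapping, and it assumes an x86/x64 compiler that provides emmintrin.h. Whether a 16-byte load or store really turns into a single larger bus transaction depends on the mapping attributes and the chipset, so treat this as something to try and measure. In a 32-bit kernel driver the floating-point/SSE state would also need to be saved first (for example with KeSaveFloatingPointState).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: copies `bytes` bytes in 16-byte (128-bit) units.
 * Both pointers must be 16-byte aligned and `bytes` a multiple of 16. */
static void copy_sse2_128(void *dst, const void *src, size_t bytes)
{
    assert(((uintptr_t)dst % 16) == 0);
    assert(((uintptr_t)src % 16) == 0);
    assert((bytes % 16) == 0);

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < bytes / 16; i++) {
        __m128i v = _mm_load_si128(&s[i]);  /* aligned 128-bit load  */
        _mm_store_si128(&d[i], v);          /* aligned 128-bit store */
    }
}
```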
I’d also recommend you read a book on PCI-e or talk to a hardware engineer
and figure out what burst size you will need on an x1 interface to achieve
the bandwidth you need.
Another possibility would be to get some other bus master to use your card as
a target, which can also cause large burst transfers. Potential bus masters
include disk controllers, video controllers, and special memory copy
hardware like the Intel I/OAT on some motherboards.
Modern Intel (and I assume AMD) processors have a number of cache
prefetch/manipulation instructions that might alter the burst behavior.
Jan
-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@hotmail.com
Sent: Friday, May 30, 2008 9:33 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] How to copy memory more quickly?
Dear ALL:
I am developing a driver for a PCI-E x1 device. I
copy data from the driver’s buffer to the memory on the PCI-E
device. The device’s memory is mapped into the Windows address space.
I use the function RtlCopyMemory to do the copy.
However, the speed is not fast.
I measured the following transfer rates:
writing data to the device: 18 MB/sec
reading data from the device: 3 MB/sec
I use the following equipment.
Intel Core2 Duo E6750 2.66G
DDR2-800 1G
Intel P35
Is there any way to make the copy faster? Is it possible to
reach 100 MB/sec? Thanks!