Thanks, Tim. I am interested in copying data TO the device using the PC's CPU.
Regarding SSE2: I have already implemented an SSE2 copy, and I see almost
double the throughput when writing TO the device with it. Do you think that
is happening because of something else?
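For anyone following along, here is a minimal sketch of the kind of SSE2 copy being discussed. It is not the actual implementation from this thread, which was never posted. The function name copy_to_device_sse2 is hypothetical, an ordinary RAM buffer stands in for the mapped device region, and the destination is assumed to be 16-byte aligned.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copy len bytes to a
 * 16-byte-aligned destination using 128-bit non-temporal stores.
 * In a driver, dst would be the mapped device region; here it is
 * assumed to be any writable, 16-byte-aligned buffer. */
void copy_to_device_sse2(void *dst, const void *src, size_t len)
{
    /* Streaming stores require a 16-byte-aligned destination. */
    assert(((uintptr_t)dst & 15u) == 0);

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    size_t blocks = len / 16;

    for (size_t i = 0; i < blocks; i++) {
        __m128i v = _mm_loadu_si128(s + i);  /* source may be unaligned */
        _mm_stream_si128(d + i, v);          /* 16-byte non-temporal store */
    }

    /* Make the streaming stores globally visible before continuing. */
    _mm_sfence();

    /* Leftover bytes: plain memcpy is fine for this RAM stand-in; real
     * device memory would need the platform's register-access routines. */
    size_t tail = len % 16;
    if (tail != 0) {
        memcpy((uint8_t *)dst + blocks * 16,
               (const uint8_t *)src + blocks * 16, tail);
    }
}
```

Whether wide stores like these actually help depends on how the region is mapped (write-combined, for example), so it needs to be measured on the real hardware.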
When you say bus mastering, do you mean device DMA?
Thanks & Regards,
Abhishek Joshi
-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Tim Roberts
Sent: Wednesday, October 14, 2009 7:45 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] SSE2 v/s SSE3 - Performance
You wrote:
I am trying to improve the data copy performance from an Intel processor
to a memory-mapped area of a PCIe device.
What are you getting, and what do you expect?
I have the following things in mind.
- Make the memory mapped region “Cached”.
This will only help for copying data FROM the device. For copying TO
the device, you can try “write combining”, but you’ll need to do
experiments to make sure it works for you.
- Use SSE2 instructions
This is silly. The basic REP MOVSD runs WAAAAY faster than the
PCI Express bus. CPU instructions are not your bottleneck.
If you want maximum throughput, you must use bus mastering. There is no
alternative.
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.