Hi Jan, All,
Thank you for your detailed explanation.
I will test your solution soon and report back.
Best regards,
Z.V
-----Original Message-----
From: Jan Bottorff
Sent: Wednesday, October 14, 2015 1:55 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] Huge DMA size from FPGA to PC’s RAM
If you set the large page flag (MEM_LARGE_PAGES) on VirtualAlloc, you get
memory-resident 2 MB pages. To the driver everything looks the same, except
that when you feed the MDLs to the DMA adapter, the physical address
scatter-gather list still describes 4K chunks, BUT you will find runs of
contiguous 4K chunks that make up the 2M pages. It’s easy to write code to
coalesce these into 2M fragments for programming into the HW. If your hardware
is OK with bigger fragments, and your app is OK with non-pageable memory, it
can dramatically reduce the number of descriptors the HW needs to process.
TLB thrashing goes down too.
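The coalescing step described above could look roughly like the following. This is only a sketch: `sg_fragment` is a hypothetical stand-in for whatever address/length pairs the driver pulls out of the scatter-gather list it gets from the DMA adapter, and `coalesce_fragments` and its `max_len` cap are illustrative names, not part of any Windows API. It merges entries whose physical addresses are back to back, up to the largest fragment the hardware descriptor is assumed to accept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fragment type: one physically contiguous run. */
typedef struct {
    uint64_t phys_addr;  /* physical start address of the run */
    uint32_t length;     /* length of the run in bytes */
} sg_fragment;

/*
 * Merge physically adjacent fragments in place and return the new count.
 * max_len is the largest fragment one hardware descriptor is assumed to
 * handle (for example 2 MB). Fragments that are not adjacent, or that
 * would exceed max_len when merged, are left as separate entries, so a
 * buffer made of ordinary disjoint 4K pages passes through unchanged.
 */
static size_t coalesce_fragments(sg_fragment *frags, size_t count,
                                 uint32_t max_len)
{
    assert(frags != NULL || count == 0);
    if (count == 0)
        return 0;

    size_t out = 0;  /* index of the fragment currently being extended */
    for (size_t i = 1; i < count; i++) {
        sg_fragment *cur = &frags[out];
        int adjacent = (cur->phys_addr + cur->length == frags[i].phys_addr);
        int fits = ((uint64_t)cur->length + frags[i].length <= max_len);

        if (adjacent && fits)
            cur->length += frags[i].length;  /* extend the current run */
        else
            frags[++out] = frags[i];         /* start a new run */
    }
    return out + 1;
}
```

With a buffer backed by 2M large pages, each group of 512 contiguous 4K entries collapses into one 2M entry; with ordinary pages the list comes back the same length, so the same code path serves both cases. Note that the cap limits fragment size only; it does not force merged fragments to start on a 2M boundary.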
MDLs always describe 4K pages, but if an app uses 2M large pages, you get
clusters of physically contiguous pages: 512 of them make up the 2M of
contiguous space needed to map one large page. Drivers that expect
everything to be disjoint 4K pages work just fine with large pages, and
drivers that are large-page aware can coalesce the fragments when doing DMA.
A driver could, I suppose, fail a request if it didn’t come from large pages,
but making it still run, perhaps slower, is likely a better idea. Think of
large pages as an optimization opportunity just before doing the DMA, and
tell the application developer they might be happier with the performance if
they set that flag in VirtualAlloc.
When you talk about a custom FPGA and 3+ Gbyte/sec data rates, you’re likely
talking about a less mainstream system.
Jan
On 10/14/15, 12:10 AM, “xxxxx@lists.osr.com on behalf of
xxxxx@gmail.com” <xxxxx@gmail.com> wrote:
>Hi Jan, All,
>
>You asked: “Is it a hardware limitation that each descriptor can only
>process 4K …”
>
>It is not a hardware limitation. But I suspect that the descriptor list
>the kernel builds will contain many 4K descriptors.
>
>The application will allocate the buffer using VirtualAlloc. Is there a
>better Win32 allocation routine that would produce a shorter descriptor list?
>
>Tim, the FPGA has 8 Gen3 lanes = 8 * 8 Gb/s * ~75% = 48 Gb/s
>
>Best regards,
>Z.V
>
>—
>NTDEV is sponsored by OSR
>
>Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev
>
>OSR is HIRING!! See http://www.osr.com/careers
>
>For our schedule of WDF, WDM, debugging and other seminars visit:
>http://www.osr.com/seminars
>
>To unsubscribe, visit the List Server section of OSR Online at
>http://www.osronline.com/page.cfm?name=ListServer