

DMA - Exposing the commonbuffer to user sw


Comments

  • Peter_Viscarola_(OSR) Administrator Posts: 8,027

    I would not be hurt if you were to remove the post

    Thanks for that. With your permission, I did ultimately decide that the code should be removed.

    Maybe I'm overly concerned, but this particular topic comes up so often that I really didn't want yet another solution posted that doesn't address any of the complex issues inherent in mapping memory into a user's address space.

    Thanks for your support,

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • anton_bassov Member Posts: 5,169

    I will attempt the VirtualAlloc approach

    Well, judging from your latest post, you don't seem to be able to tell the difference between physical, logical, and virtual addresses.

    Therefore, what I would advise you to do is to take a step back and review the basics. I've got a weird feeling that, after having reviewed them, you may abandon the common buffer approach altogether, because you may realise that your target hardware is, in actuality, perfectly SG-capable.

    If you do, I can assure you that you are not going to be the first poster in this NG to do it. I vaguely recall a thread where the OP was bullshitting us about his common buffer requirement for a couple of weeks or so, but then, after our explanations, eventually "discovered" that his target board was, in actuality, SG-capable......

    Anton Bassov

  • Peter_Viscarola_(OSR) Administrator Posts: 8,027

    There are very few modern devices (including FPGA IPs) that do not support S/G for moving user data, simply because physically fragmented user data buffers are “the way of the world” and IOMMUs are not universally available.

    When I DO see requirements for logically contiguous blocks of memory for DMA, requiring the use of Common Buffers, this is most frequently due to the device using a “continuous” DMA scheme in which the Common Buffer holds a series of descriptors, each of which contains the logical address and length of a user data buffer fragment (thus giving you an S/G capability). These descriptors are often grouped into one or more rings.
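
    Purely by way of illustration (this is a made-up layout, not any particular device's descriptor format), such a ring entry often boils down to something like:

        /* Hypothetical descriptor for a "continuous" DMA ring kept in a Common Buffer.
           Each entry describes one physically contiguous fragment of a user data buffer;
           the device walks the ring on its own, which is what gives you S/G in practice. */
        typedef struct _RING_DESCRIPTOR {
            ULONGLONG FragmentLogicalAddress;  /* logical (bus) address of the fragment */
            ULONG     FragmentLength;          /* length of the fragment, in bytes      */
            ULONG     Flags;                   /* e.g. owned-by-hardware, IRQ-on-done   */
        } RING_DESCRIPTOR, *PRING_DESCRIPTOR;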

    When I see bus master DMA devices, FPGAs or otherwise, that do not support S/G for user data transfers they are almost always the result of a home-grown design (in which the device interface requirements somewhat exceeded the designer’s capabilities... either in terms of available time, insight, or experience). Or, sometimes, the device is a prototype or PoC that is never intended for use in production.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • OneKneeToe Member Posts: 42
    edited December 2019

    Hello @anton_bassov and @Peter_Viscarola_(OSR)

    Maybe this deserves a new thread; I don't see a DM option.

    Get that dirty shoe ready:

    The FPGA does support SG; however, the legacy external device the FPGA DMAs to does not. SW, by way of the external device's driver, allocates a buffer and in return gets a Page List suitable for DMA - for the sake of discussion, let's say the page size on the external device is 32KB. SW then passes that list on to the FPGA by programming a Page Table in the FPGA. The table has a max size of 64K entries, enough entries (pages) for 2GB. The FPGA DMAs data, a page at a time, going down the Page Table and circling back at the end. There is hand-shaking going on with SW via a Windows driver (interrupts), but that is the gist of it.

    For this exercise, "all I wanted to do" (hahaha) is move this buffer from the external device to system memory. I cannot change how the FPGA works; the data path out of this FPGA is via that page table.

    With the common buffer approach, since I am getting logical addresses that are guaranteed to be contiguous, it doesn't matter that the physical page size is 4K: SW does the math using the base logical address to come up with a new list of logical addresses aligned to 32KB pages.
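
    For example, here is a minimal sketch of that math (assuming a WDF common buffer; the BuildFpgaPageList name and the 64-bit entry format are just made up for illustration):

        #include <ntddk.h>
        #include <wdf.h>

        /* Sketch: build the FPGA's 32KB-page list from a contiguous common buffer.
           WdfCommonBufferGetAlignedLogicalAddress() returns the base of a logically
           contiguous region, so entry i is simply base + i * 32KB. */
        #define FPGA_PAGE_SIZE (32 * 1024)

        VOID
        BuildFpgaPageList(
            _In_ WDFCOMMONBUFFER CommonBuffer,
            _Out_writes_(MaxEntries) ULONGLONG *PageList,
            _In_ ULONG MaxEntries
            )
        {
            PHYSICAL_ADDRESS base = WdfCommonBufferGetAlignedLogicalAddress(CommonBuffer);
            size_t length = WdfCommonBufferGetLength(CommonBuffer);
            ULONG entries = (ULONG)(length / FPGA_PAGE_SIZE);

            for (ULONG i = 0; i < entries && i < MaxEntries; i++) {
                PageList[i] = (ULONGLONG)base.QuadPart + (ULONGLONG)i * FPGA_PAGE_SIZE;
            }
        }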

    Going with the SG List approach, I suppose if the list of entries is contiguous and uniform, then it's no different from the common buffer approach.

    If, however, I go with VirtualAlloc, I have no guarantee that the logical addresses will be contiguous. So I would need to go with Large Pages (2MB). Then SW would need to do the math to create a new list of logical addresses for the FPGA to use, splitting each 2MB page's logical address into the 64 addresses for the 32KB pages the FPGA thinks the destination has.
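
    The splitting itself is trivial arithmetic; roughly like this (a sketch only, assuming the driver has already translated each 2MB large page into a device-visible logical base address; the names are invented):

        /* Sketch: expand one 2MB large page into the 64 x 32KB entries the FPGA
           expects. A 2MB large page is physically contiguous, so each 32KB "page"
           is just an offset from the chunk's base address. */
        #define LARGE_PAGE_SIZE    (2ULL * 1024 * 1024)
        #define FPGA_PAGE_SIZE_32K (32 * 1024)
        #define ENTRIES_PER_CHUNK  (LARGE_PAGE_SIZE / FPGA_PAGE_SIZE_32K)   /* 64 */

        void ExpandLargePage(unsigned long long chunkLogicalBase,
                             unsigned long long entries[ENTRIES_PER_CHUNK])
        {
            for (unsigned int i = 0; i < ENTRIES_PER_CHUNK; i++) {
                entries[i] = chunkLogicalBase + (unsigned long long)i * FPGA_PAGE_SIZE_32K;
            }
        }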

    So, how far into the abyss have I fallen? How far can you throw that dirty shoe? :smiley:

    Juan

  • Erik_Dabrowsky Member - All Emails Posts: 25

    Here's an older article that talks about the two methods mentioned in this thread:

    https://www.osronline.com/article.cfm?article=39.htm

  • Peter_Viscarola_(OSR) Administrator Posts: 8,027
    edited December 2019

    That article is 20 years old... and refers to WDM.

    When I first wrote this post, I came in to say that the article was not useful. However... I see, looking at it further, that it was updated at some point, and it at least briefly describes the duplicated-handle problem. Which is good, I guess, as far as it goes.

    Still, not what I would call a terrific reference in the 21st Century (having written both the original article and the update, I’m kinda in a unique position to be able to say this).

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • Peter_Viscarola_(OSR) Administrator Posts: 8,027

    The FPGA does support SG, however, the legacy external device the FPGA DMAs to does not.

    But, but, but, but, but.... the “legacy external device” shouldn’t care, should it? It’s the FPGA that moves the data from the host to the “internal” memory that the “legacy external device” accesses, right? So... you only care that the buffers accessed by your “legacy external device” are contiguous. You don’t care about the host buffers, right?

    Of course, you haven’t really given us an architectural description of what you’re trying to do... so we are left to guess.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • anton_bassov Member Posts: 5,169

    If, however, I go with VirtualAlloc, I have no guarantee that the logical addresses will be contiguous. So I would need to go with Large Pages (2MB). Then SW would need to do the math to create a new list of logical addresses for the FPGA to use, splitting each 2MB page's logical address into the 64 addresses for the 32KB pages the FPGA thinks the destination has.

    Assuming that your FPGA really, really, really needs to do the transfers in 32K chunks that are physically contiguous, this approach, indeed, makes perfect sense. At the end of the day, it is much easier to find a contiguous 2M buffer than a 2G one, right? In fact, in order to improve your chances of success, you can even try splitting your allocation into multiple 2M chunks, each of them backed by a single large page, rather than allocating a single buffer.
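
    From user mode, that might look roughly like this (just a sketch; note that MEM_LARGE_PAGES requires the caller to hold SeLockMemoryPrivilege, and clean-up on failure is omitted):

        #include <windows.h>

        /* Sketch: allocate several independent chunks, each backed by a single
           large page, instead of one huge contiguous buffer. */
        #define CHUNK_COUNT 16

        BOOL AllocateLargePageChunks(PVOID chunks[CHUNK_COUNT])
        {
            SIZE_T largePageSize = GetLargePageMinimum();   /* typically 2MB on x64 */

            if (largePageSize == 0) {
                return FALSE;   /* large pages not supported on this system */
            }

            for (int i = 0; i < CHUNK_COUNT; i++) {
                chunks[i] = VirtualAlloc(NULL, largePageSize,
                                         MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                                         PAGE_READWRITE);
                if (chunks[i] == NULL) {
                    return FALSE;   /* freeing the earlier chunks is left out for brevity */
                }
            }
            return TRUE;
        }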

    However, for the reasons that have already been mentioned by Peter, I seriously doubt that the "contiguous 32K chunks are an absolute must" requirement is justified in your case.....

    Anton Bassov

  • OneKneeToe Member Posts: 42
    edited December 2019

    There is a limit on the page table in the FPGA. If the pages are 4K, then 2GB / 4K is 512K pages, which exceeds the 64K-entry capacity of the page table in the FPGA.

    Also, the FPGA basically DMAs data continuously. The driver is not processing SW read requests for data (SW is not pulling data). The FPGA is pushing the data and SW needs to keep up; that's the gist anyway. There is a stop/start mechanism, and the FPGA is programmed to push data in blocks - block size is configurable.

    I don't know if that helps.

    When I read about DMA, it all seems to be from the point of view that my SW wants to do a write-to or read-from: SW sends a request to the driver, and the driver sets up a DMA Transaction to execute a write or read. But that is not what the SW is trying to do. SW needs to allocate a landing spot for the FPGA to write to. Once the FPGA is told to go, it is the FPGA performing the DMA writes, not the SW. I suppose my Windows driver doesn't need to be involved in the DMA process, other than providing logical addresses for the buffer SW allocated - the landing spot for the FPGA to write into.

    Sorry to run you guys around in circles. I have said that I am thick-headed - it takes me a while :neutral:

    Thanks @Peter_Viscarola_(OSR) and @anton_bassov (and @Erik_Dabrowsky for the link).

  • Peter_Viscarola_(OSR) Administrator Posts: 8,027

    Either I’m confused or you’re confused or... you know... we’re both confused. Which is always a possibility.

    Your FPGA is moving data from HOST memory to a block of memory on the board (what we would call LOCAL memory) that it shares with this “legacy” device, right?

    If the legacy device doesn’t understand S/G, then it’s the LOCAL memory that needs to be contiguous. The organization of the HOST memory (whether it’s contiguous or not) depends solely on the capabilities of the FPGA.

    When I read about DMA, it all seems to be from the point of view that my SW wants to do a write-to or read-from. So SW send a request to the Driver and the driver sets up a DMA Transaction to execute a write or read. But that is not what the SW is trying to do. SW needs to allocate a landing spot for the FPGA to write-to.

    Yes. This is the distinction between what Windows calls “Packet Based” DMA (where one read or write Request in the host results in one DMA transaction), and “Continuous” DMA, where the host driver sets up a mapping once (more or less) and then the FPGA periodically does transfers on its own (again, more or less, depending on the details). There are also “Hybrid” DMA schemes that combine the two approaches.

    In Windows, we generally think of continuous DMA designs as being to/from common buffers... and packet based designs as being S/G because the data for each “packet” is coming directly from the user.

    If you’re sharing your continuous DMA buffer with the user, this is a perfect application for the scheme I outlined a long time ago, in which the user sends an IOCTL with the OutBuffer describing their data buffer, and your driver keeps this request in progress until the app is done with the device. Now, don’t get me wrong: There’s still complexity in that approach, but the major advantage is that once you get it to demonstrably work... it works. There are no hidden issues lurking. Contrast this with the “allocate some memory and map it back to the user application” approach, where getting it to work is pretty straightforward (as you’ve seen) but you’re really not halfway there at that point... there are still somewhat subtle problems lurking, the solutions to which can be hard to test.
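
    (For anyone who finds this thread later, the skeleton of that scheme looks roughly like the sketch below. The handler name, the device-context field, and the GetDeviceContext accessor are invented for illustration, and all of the real complexity, cancellation, file-close handling, teardown ordering, and so on, is elided.)

        #include <ntddk.h>
        #include <wdf.h>

        /* Hypothetical device context holding the long-lived request. */
        typedef struct _DEVICE_CONTEXT {
            WDFREQUEST LongLivedRequest;
        } DEVICE_CONTEXT, *PDEVICE_CONTEXT;

        WDF_DECLARE_CONTEXT_TYPE_WITH_NAME(DEVICE_CONTEXT, GetDeviceContext)

        /* The app sends an IOCTL whose OutBuffer is the data buffer it wants the
           device to fill. The driver pends the request (keeping the buffer's MDL
           locked and valid), programs the hardware from that MDL, and completes
           the request only when the app is finished with the device. */
        VOID
        EvtIoDeviceControl(
            WDFQUEUE Queue, WDFREQUEST Request,
            size_t OutputBufferLength, size_t InputBufferLength, ULONG IoControlCode)
        {
            PDEVICE_CONTEXT devCtx = GetDeviceContext(WdfIoQueueGetDevice(Queue));
            PMDL mdl;
            NTSTATUS status;

            UNREFERENCED_PARAMETER(OutputBufferLength);
            UNREFERENCED_PARAMETER(InputBufferLength);
            UNREFERENCED_PARAMETER(IoControlCode);

            /* With a METHOD_OUT_DIRECT IOCTL this returns an MDL describing the
               app's buffer, locked for the life of the request. */
            status = WdfRequestRetrieveOutputWdmMdl(Request, &mdl);
            if (!NT_SUCCESS(status)) {
                WdfRequestComplete(Request, status);
                return;
            }

            /* Hold on to the request; it stays in progress until the app is done. */
            devCtx->LongLivedRequest = Request;

            /* Use the MDL to build whatever the hardware needs (e.g. via the DMA
               enabler / WdfDmaTransactionInitialize), then start the device. */
        }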

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers
