DMA - Exposing the commonbuffer to user sw

Before we go too far down this rathole, can I ask a few questions? How big is the buffer you need to allocate for the device? How big is the typical update to the data? And how frequent are the updates?

I’ve been on both sides of this. While I am not in favor of mapping kernel memory to user space, I have done it because a device needed it. I was very careful to be sure I really needed it, because the device was being delivered to Microsoft Research. After I had done it, the next client insisted they needed the same thing, but once the above questions were answered there was no justification.

@anton_bassov No, I disagree. You need this mmap on Linux, because you don’t have Direct I/O. If you think about it, mmap() for “device files” is really a kinda dumb, hacks, idea. It means whatever the dev wants it to mean… unlike in a file system, where it gives you a view into the single, global, common, cache. For devices, it’s just a way to make up for the lack of Direct I/O… to allow you to do stuff without recopying.

In terms of ad hominem attacks… pointing out your lack of actual recent experience on Windows is just something a questioner here should consider… it’s just a fact and not ad hominem at all.

Peter

Bear in mind that Anton hasn’t written any code on Windows for about ten years. He just hangs out here for the abuse.

That’s a really funny remark, but I’m curious. What changed in the last 10 years in the area which is discussed here? Seriously. Windows development isn’t my main focus for about 6 years now so I wonder what I have missed.

NUMA support? Maybe a little bit more than 10 years? Couldn’t NUMA compatibility also be an argument in favor of the standard way, Direct I/O?

Marcel Ruedinger
datronicsoft

You need this mmap on Linux, because you don’t have Direct I/O. If you think about it, mmap() for “device files”
is really a kinda dumb, hacks, idea. It means whatever the dev wants it to mean… unlike in a file system,
where it gives you a view into the single, global, common, cache. For devices, it’s just a way to make up
for the lack of Direct I/O… to allow you to do stuff without recopying.

There are few observations to be made here

  1. I don’t mean Linux in particular when speaking about “the world of the OSes that support the mmap() call”. This call is closely related to the “everything is a file” concept. There is an entire family of OSes based upon this concept, and Linux happens to be just one of them. In fact, if you look at the whole thing from the architectural perspective, the particular implementation of mmap() that Linux provides does not really seem to be the most architecturally sound one in existence. If you have any doubts about this part, you can compare it to, say, the SunOS/Solaris memory model.

  2. The “really a kinda dumb, hacks, idea” of mmap() (i.e. of providing a uniform way of mapping memory that works exactly the same way for any kind of file in existence, be it a disk file or a device one) predates Linux 0.01 by more than two decades. Furthermore, it predates not only Linux but UNIX as well. This concept is rooted in MULTICS - this is where the very idea of user apps attaching/detaching memory segments that are backed up by a store to/from their address spaces comes from.

In actuality, mmap() as we know it is just a result of combining this concept with the “everything is a file” one. As you may have guessed, this “merger” was pioneered by SunOS. In fact, the very term “segment drivers” seems to make a very unambiguous reference to the MULTICS origins (at least the conceptual ones) of memory mapping, don’t you think…

  3. The very concept of METHOD_DIRECT has absolutely nothing to do with sharing a buffer between an app and a driver (at least it was not originally meant to be used this way). You must be confusing it with METHOD_NEITHER, which was, indeed, intended specifically for this purpose from the very beginning of the “Windows NT world”. What actually happened here is that some clever bloke (I guess it was you, right?) realised that, by pending a METHOD_DIRECT-based IOCTL indefinitely, one effectively gets a shared buffer while, at the same time, avoiding all the extra pain in the arse that METHOD_NEITHER implies. As a result, the concept of a “Big Honking IRP” was born.

  4. In case you just cannot imagine your life without the above-mentioned “Big Honking IRPs” (you seem to be particularly fond of them these days, aren’t you), I can assure you that you can implement them under Linux in a click of the fingers.

All you have to do is mlock() your target buffer before passing its address to a driver. In response to your app’s request, your driver will validate the address of every page in the target range, get its ‘struct page’ descriptor, reference it, and then map all the target pages to a virtually contiguous kernel range with vmap(). At this point your driver will be able to access the shared buffer in any context, and it is 100% safe. Certainly, taking into consideration the availability of mmap(), this is probably not the most efficient and reasonable way of doing things under Linux, but that is a different story.
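For what it is worth, the user-mode half of that recipe might look roughly like the sketch below. It only shows allocating a page-aligned buffer and asking the kernel to keep it resident with mlock(); the driver half (pinning the pages, for example via the get_user_pages() family, and then vmap()) is not shown. The names LockedBuffer, AllocateLockedBuffer and ReleaseLockedBuffer are made up for illustration, and mlock() can legitimately be refused (RLIMIT_MEMLOCK, missing privileges), so the sketch records that instead of assuming success.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper type: a page-aligned buffer that we tried to lock in RAM.
struct LockedBuffer
{
    void*       base   = nullptr;
    std::size_t bytes  = 0;
    bool        locked = false;   // false if mlock() was refused (e.g. RLIMIT_MEMLOCK)
};

// Allocate pageCount pages, page-aligned, and try to lock them with mlock().
inline LockedBuffer AllocateLockedBuffer( std::size_t pageCount )
{
    LockedBuffer buf;
    const long pageSize = sysconf( _SC_PAGESIZE );
    if( pageSize <= 0 || pageCount == 0 )
    {
        return buf;
    }

    const std::size_t bytes = pageCount * static_cast<std::size_t>( pageSize );
    void* p = nullptr;
    if( posix_memalign( &p, static_cast<std::size_t>( pageSize ), bytes ) != 0 )
    {
        return buf;
    }

    buf.base   = p;
    buf.bytes  = bytes;
    buf.locked = ( mlock( p, bytes ) == 0 );   // the address would then be handed to the driver
    return buf;
}

// Undo AllocateLockedBuffer(): unlock (if locked) and free the memory.
inline void ReleaseLockedBuffer( LockedBuffer& buf )
{
    if( buf.base == nullptr )
    {
        return;
    }
    if( buf.locked )
    {
        munlock( buf.base, buf.bytes );
    }
    std::free( buf.base );
    buf = LockedBuffer{};
}
```

An app would call AllocateLockedBuffer(), check the locked flag, pass base and bytes to the driver through whatever request the driver defines, and call ReleaseLockedBuffer() only after the driver has dropped its references.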

In other words, your “argument” about “needing this mmap on Linux, because you don’t have Direct I/O” simply does not stand the slightest chance…

In terms of ad hominem attacks… pointing out your lack of actual recent experience on Windows is just something
a questioner here should consider… it’s just a fact and not ad hominem at all.

Actually, I don’t really find the very fact of “lacking actual recent experience on Windows” embarrassing in any possible way. After all, I have mentioned this fact quite a few times in this NG.

The only thing that I am saying here is that you somehow found a way to use this fact as an “ad hominem” attack. At the end of the day, sometimes this is not about WHAT you actually said but more about HOW you said it. Let’s look at your statement one more time:

[begin quote]

Bear in mind that Anton hasn’t written any code on Windows for about ten years. He just hangs out here for the abuse.

[end quote]

I hope you agree that the above statement in its actual form is a very obvious example of an “ad hominem” argument, and was meant to be used as one…

Anton Bassov

What changed in the last 10 years in the area which is discussed here? Seriously. Windows development isn’t my main focus
for about 6 years now so I wonder what I have missed

Although there are no dramatic changes anywhere in sight, there may be some relatively minor modifications/improvements that were unavailable in the earlier days of Windows. As a result, more options may be available in the more recent OS versions.

For example, with the advent of the kernel sockets and Windows filtering platform in Vista you can be 100% sure that all your TDI experience can go down the drain. Furthermore, the advent of NDIS 6+ LWFs made NDIS 5 IM filter drivers sort of obsolete as well.

Another example is KMDF. If you check the WDK samples you will see that practically all WDM samples have been removed. Therefore, if you advise someone to do things in a WDM-like fashion you will be looked upon as a dinosaur who got brought back to life.

You should not forget about the progress in the field of development tools/environment either. For example, you can be sure that you are not going to be either called a “STUPID IDIOT” or requested to “PUBLISH YOUR NAME…(etc)” if you say that you want to build drivers with VC. Even more, these days you are sort of encouraged to use C++ in your drivers, so you are quite unlikely to hear Linus-like anti-C++ diatribes from our hosts. In fact, I would not be too surprised if they actually start promoting the use of C++.

In the context of Peter’s “argument”, the very obvious example is process creation callbacks. For example, PsSetCreateProcessNotifyRoutine() (i.e. the only option that was available under XP) does not allow you to block process creation right on the spot in the context of a callback, but PsSetCreateProcessNotifyRoutineEx(), which turned up in more recent versions, offers this functionality. IIRC, I put my foot in my mouth once because of this particular improvement.

Anton Bassov

What changed in the last 10 years in the area which is discussed here? Seriously.

One very real and impactful change, aside from the bus drivers being totally rewritten to exhibit different practical behaviors, is the ever increasing presence of IOMMUs. There are other changes as well, including many, many, changes in power management, right? And that’s always impactful.

But, when it comes to recency of experience, it’s not a matter of “what’s changed” as much as it’s a matter of “how have I used these concepts lately… so I know/remember the true practicalities, the implementation ‘potholes’ as I like to call them, and not just some overall general concepts of how things might/should/did work.”

@anton_bassov You’re wrong, and your knowledge of Windows architecture is flawed. I’m not debating this with you.

Be aware that my patience with you is getting thin. You’re not providing helpful guidance for the questioners. And you’ve not amused me lately. You trolled this thread in your initial post and I, stupidly, rewarded you by taking your bait. If I had to guess, I would guess your time here grows short.

Peter

Peter,

You’re not providing helpful guidance for the questioners.

Well, judging from the OP’s post, he was well aware of the potential issues that may arise if you map kernel memory to user space, but was still going to proceed this way. Therefore, I just showed him the function that allowed him to reach his objective, and dropped an extra hint (although probably not in the best way, more on that below) that this practice may be frowned upon in the Windows world…

You trolled this thread in your initial post

This part was, indeed, totally unnecessary on my part. I’m sorry for that…

it’s not a matter of “what’s changed” as much as it’s a matter of “how have I used these concepts lately… so I know/remember
the true practicalities, the implementation ‘potholes’ as I like to call them, and not just some overall general concepts
of how things might/should/did work.”

True, but my NTDEV participation (apart from the “exciting” trolling side, of course) allows me not only to stay in shape but even to learn something new, or at least to “learn what I have to learn and re-learn” due to the OS changes…

You’re wrong, and your knowledge of Windows architecture is flawed. I’m not debating this with you.

Trolling issues aside, could you please explain to me what I have said wrong from the technical standpoint? I am neither trolling nor trying to prove anything to anyone. I just want to learn things for myself…

Anton Bassov

Hello Peter:

Thanks for bearing with me and for the good info:

I attempted the common buffer approach last night and ran into an issue with my User SW getting an access violation. I had done everything from creating the DmaEnabler to calling MmMapLockedPagesSpecifyCache in my EvtDeviceAdd, storing the addresses in the device context. Then I created an IOCTL for my User SW to call and retrieve the Virtual User Address.

Searching the forum, I found a post, by you actually, pointing out that the MmMapLockedPagesSpecifyCache call needs to be done in the correct Context and that a good location for this would be EvtIoInCallerContext and not EvtIoDeviceControl (which is where I was going to move it to next, as a part of the IOCTL).
https://community.osr.com/discussion/279797

@“Peter_Viscarola_(OSR)”
…EvtIoDeviceControl (in fact, EvtIoXxxx) is called in an arbitrary process and thread context. You need to use the EvtIoInCallerContext callback…

So, I added an EvtIoInCallerContext and moved the MmMapLockedPagesSpecifyCache to the method.
Unfortunately, I am still getting the access violation. Code Snippets below ( to give further context ).
I will play around with this some more; My first guess is it may be how I set up the enabler or the common buffer.

EvtDeviceAdd

    WDF_DMA_ENABLER_CONFIG dmaEnablerConfig;
    WDF_DMA_ENABLER_CONFIG_INIT( &dmaEnablerConfig, WdfDmaProfilePacket64, 128 );

    dmaEnablerConfig.EvtDmaEnablerFill = NULL;
    dmaEnablerConfig.EvtDmaEnablerFlush = NULL;
    dmaEnablerConfig.EvtDmaEnablerDisable = NULL;
    dmaEnablerConfig.EvtDmaEnablerEnable = NULL;
    dmaEnablerConfig.EvtDmaEnablerSelfManagedIoStart = NULL;
    dmaEnablerConfig.EvtDmaEnablerSelfManagedIoStop = NULL;
    dmaEnablerConfig.AddressWidthOverride = 0;
    dmaEnablerConfig.WdmDmaVersionOverride = 3;
    dmaEnablerConfig.Flags = WDF_DMA_ENABLER_CONFIG_REQUIRE_SINGLE_TRANSFER;

    NTSTATUS status{ WdfDmaEnablerCreate( wdfDevice, &dmaEnablerConfig, WDF_NO_OBJECT_ATTRIBUTES, &( deviceContextP->myDmaEnabler ) ) };
    if( NT_SUCCESS( status ) )
    {
        WDF_COMMON_BUFFER_CONFIG CommonBufferConfig;
        WDF_COMMON_BUFFER_CONFIG_INIT( &CommonBufferConfig, FILE_128_BYTE_ALIGNMENT );

        status = WdfCommonBufferCreateWithConfig( deviceContextP->myDmaEnabler,
                                    deviceContextP->myCommonBufferByteSize,
                                    &CommonBufferConfig,
                                    WDF_NO_OBJECT_ATTRIBUTES,
                                    &( deviceContextP->myCommonBuffer ) );
        if( NT_SUCCESS( status ) )
        {
            deviceContextP->virtualKernelAddr = WdfCommonBufferGetAlignedVirtualAddress( deviceContextP->myCommonBuffer );
            deviceContextP->LogicalAddr = WdfCommonBufferGetAlignedLogicalAddress( deviceContextP->myCommonBuffer );

            RtlZeroMemory( deviceContextP->virtualKernelAddr, deviceContextP->myCommonBufferByteSize );
            // Describe the common buffer with an MDL so it can later be mapped into user space.
            deviceContextP->pMdl = IoAllocateMdl( deviceContextP->virtualKernelAddr,
                                    ( ULONG ) deviceContextP->myCommonBufferByteSize,
                                    FALSE,
                                    FALSE,
                                    NULL );
            if( NULL == deviceContextP->pMdl )
            {
                status = STATUS_INSUFFICIENT_RESOURCES;
                TraceEvents( TRACE_LEVEL_ERROR, DBG_INIT, "IoAllocateMdl() failed with status=[%!STATUS!]", status );
            }
            else
            {
                // Fill in the page array of the MDL allocated above (same pMdl field).
                MmBuildMdlForNonPagedPool( deviceContextP->pMdl );
            }
        }
        else
        {
            TraceEvents( TRACE_LEVEL_ERROR, DBG_INIT, "WdfCommonBufferCreateWithConfig() failed with status=[%!STATUS!]", status );
        }
    }
    else
    {
        TraceEvents( TRACE_LEVEL_ERROR, DBG_INIT, "WdfDmaEnablerCreate() failed with status=[%!STATUS!]", status );
    }

EvtIoInCallerContext

if( ( NULL != deviceContextP->myCommonBuffer ) && 
    ( NULL == deviceContextP->virtualUserAddr ) )
{
    __try
    {
        deviceContextP->virtualUserAddr = MmMapLockedPagesSpecifyCache( deviceContextP->pMdl,
                                    UserMode,
                                    MmCached,
                                    NULL,
                                    FALSE,
                                    HighPagePriority );
        if( NULL == deviceContextP->virtualUserAddr )
        {
            TraceEvents( TRACE_LEVEL_ERROR, DBG_IOCTLS, "MmMapLockedPagesSpecifyCache() failed.");
        }
        else
        {
            TraceEvents( TRACE_LEVEL_INFORMATION, DBG_IOCTLS, "VirtualUserAddr=[0x%p]", deviceContextP->virtualUserAddr );
        }
    }
    __except( EXCEPTION_EXECUTE_HANDLER )
    {
        TraceEvents( TRACE_LEVEL_ERROR, DBG_IOCTLS, "MmMapLockedPagesSpecifyCache() threw an exception!" );
    }
}
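// Forward the request to the queue; if that fails, the driver must complete it itself.
NTSTATUS enqueueStatus{ WdfDeviceEnqueueRequest( device, request ) };
if( !NT_SUCCESS( enqueueStatus ) )
{
    WdfRequestComplete( request, enqueueStatus );
}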

EvtIoDeviceControl - IOCTL_GET_DMA_USER_ADDRESS

        if( NULL == deviceContextP->myCommonBuffer )
        {
            TraceEvents( TRACE_LEVEL_WARNING, DBG_IOCTLS, "NO COMMON BUFFER!!" );
        }
        else if( NULL == deviceContextP->virtualUserAddr )
        {
            TraceEvents( TRACE_LEVEL_WARNING, DBG_IOCTLS, "NO VIRTUAL ADDRESS" );
        }
        else
        {
            PDMA_USER_ADDRESS pDmaUserAddress{ nullptr };
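            // Require at least sizeof( DMA_USER_ADDRESS ) bytes so the write below cannot overrun the caller's buffer.
            status = WdfRequestRetrieveOutputBuffer( request, sizeof( DMA_USER_ADDRESS ), ( PVOID* ) &pDmaUserAddress, NULL );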
            if( NT_SUCCESS( status ) )
            {
                pDmaUserAddress->virtualAddr = reinterpret_cast<UINT64>( deviceContextP->virtualUserAddr ); 
                bytesTransferred = sizeof( DMA_USER_ADDRESS );
            }
            else
            {
                TraceEvents( TRACE_LEVEL_ERROR, DBG_IOCTLS, "WdfRequestRetrieveOutputBuffer() failed with status=[%!STATUS!]", status );
            }
        }
        ...
        ...
        WdfRequestCompleteWithInformation( request, status, bytesTransferred );

As always, thank you!

Juan

Hello Don:

I didn’t see your post before my reply to Peter.

@Don_Burn
How big is the buffer you need to allocate for the device? How big is the typical update to the data? And how frequent are the updates?
At the moment, I am looking at allocating 500MB, but may want to up that to 1GB or even 2GB later. The data size can vary, but at the moment I have the FPGA configured to push 64MB packets, 2 packets a second. But the size and rate can also change.
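As a rough sanity check on those numbers, the arithmetic can be written down as a small helper. This is only a sketch: it assumes "MB" means MiB, that packets land back to back in a ring-style buffer, and the names LandingBufferBudget and ComputeBudget are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: "MB" in the post is treated as MiB (1024 * 1024 bytes).
constexpr std::uint64_t kMiB = 1024ull * 1024ull;

// Hypothetical summary of how a landing buffer copes with a packet stream.
struct LandingBufferBudget
{
    std::uint64_t bytesPerSecond;    // sustained DMA rate into the buffer
    std::uint64_t wholePackets;      // packets that fit before the buffer wraps
    double        secondsUntilWrap;  // time before the oldest packet is overwritten
};

inline LandingBufferBudget ComputeBudget( std::uint64_t bufferBytes,
                                          std::uint64_t packetBytes,
                                          std::uint64_t packetsPerSecond )
{
    assert( packetBytes != 0 && packetsPerSecond != 0 );

    LandingBufferBudget budget{};
    budget.bytesPerSecond   = packetBytes * packetsPerSecond;
    budget.wholePackets     = bufferBytes / packetBytes;
    budget.secondsUntilWrap = static_cast<double>( budget.wholePackets ) /
                              static_cast<double>( packetsPerSecond );
    return budget;
}
```

With the figures quoted above, ComputeBudget( 500 * kMiB, 64 * kMiB, 2 ) gives 128 MiB per second, 7 whole packets in the buffer, and 3.5 seconds before the buffer wraps, which is the window User SW has to consume a packet.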

@Don_Burn
… while I am not in favor of mapping kernel memory to user space…
Yes, this is at the back of my mind as I am playing around with this. Even if I get this to work, what risks have I introduced, can I plug the holes, if not, how big are the risks and do they outweigh the rewards… etc.

It is clear from other threads and even here, that this approach is not ideal. Unfortunately, I need the guaranteed contiguous logical addresses provided by common buffer.

@Don_Burn
I have done it because a device needed it…
Unfortunately, I cannot change the way the FPGA is currently designed. The current design has the FPGA DMA data directly to another device. SW then uses other means of processing that data on that device.

I am changing things a little, so that SW can process the data on the system. In my mind I simply want to move that destination memory buffer from the other device to system memory (move the landing spot). To the FPGA the change is transparent as it only sees logical addresses.

Thanks for your interest and input Don.

Juan

One very real and impactful change, aside from the bus drivers being totally rewritten to exhibit different practical behaviors, is the ever increasing presence of IOMMUs. There are other changes as well, including many, many, changes in power management, right? And that’s always impactful.

Yes, but I didn’t mean general OS changes but changes related to things discussed in this thread (sharing memory between user and kernel mode and IOCTLs). To me it seems that things work the same way as before when I used them.

In the context of Peter’s “argument”, the very obvious example is process creation callbacks. For example, PsSetCreateProcessNotifyRoutine() (i.e. the only option that was available under XP)

If I count correctly, 10 years ago we already had Win7, so XP is not in question. Also, KMDF already existed, WDM drivers were outdated, and so on.

(Well, there was still the legacy usbser driver, which did not follow even the XP WDM rules, and a WDF version wasn’t available before Win10. That version fixed old bugs and introduced new ones. A real pain I had to handle recently…)

Michal

@“Peter_Viscarola_(OSR)”

As a test, I added trace code into the EvtIoInCallerContext to print the data at Virtual User Address. The print worked when EvtIoInCallerContext ran and performed MmMapLockedPagesSpecifyCache - the data being 0 as expected, due to RtlZeroMemory & FPGA not yet being configured.

However, subsequent calls to EvtIoInCallerContext show the print throwing an exception. I suspect an access violation, same as User SW.

EvtIoInCallerContext (added printing)

    // Check and if no Virtual User Address, call MmMapLockedPagesSpecifyCache (See previous post for details).
   ...
    if( NULL != deviceContextP->virtualUserAddr )
    {
        UINT64 *pBuffer = ( UINT64* ) deviceContextP->virtualUserAddr;
        for( auto x{ 0u }; x < 10; ++x )
        {
            __try
            {
                // Print the address of element x (not just the buffer base) next to its value.
                TraceEvents( TRACE_LEVEL_INFORMATION, DBG_IOCTLS
                    , "Addr[%u] @[0x%p] = [%llu]"
                    , x
                    , pBuffer + x
                    , *( pBuffer + x ) );
            }
            __except( EXCEPTION_EXECUTE_HANDLER )
            {
                TraceEvents( TRACE_LEVEL_ERROR, DBG_IOCTLS, "EXCEPTION in TraceEvent" );
            }
        }
    }

Dumb question: Is the second EvtIoInCallerContext from the exact same process as the first call? Remember that your virtualUserAddr is only valid for that one process. If you had a quick test app and then started another test app, that address is no longer valid.

And that points out a bug in your code. In your ioctl handler, you’re only doing the mapping if virtualUserAddr is null. That only works if the calling process never ends, unless you are zeroing out that field when the app exits. Personally, I’d just eliminate that check and ALWAYS do the mapping, even if the field already has a value. Windows won’t create a new mapping if one already exists.

@“Peter_Viscarola_(OSR)”
@Tim_Roberts
(Tim: I was in the middle of writing when you posted - Not a dumb question at all :smiley: )

I found my answer, after some lunchtime googling:

https://osr.com/blog/2014/04/15/evtioincallercontext-callback-called-even-io-operations-dont-queue/

… The other point of view was getting what are essentially unexpected (and, to your driver, unsupported) Requests in EvtIoInCallerContext was an annoyance…

As Tim pointed out above, the context in which EvtIoInCallerContext was called and MmMapLockedPagesSpecifyCache executed was not the same context as my User SW.

To ensure MmMapLockedPagesSpecifyCache was executed for my IOCTL_GET_DMA_USER_ADDRESS request I added the below at the beginning of EvtIoInCallerContext:

EvtIoInCallerContext

WDF_REQUEST_PARAMETERS requestParams;
WDF_REQUEST_PARAMETERS_INIT( &requestParams );
WdfRequestGetParameters( request, &requestParams );

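// EvtIoInCallerContext sees every request type, so check the type before reading the IOCTL code.
if( ( WdfRequestTypeDeviceControl != requestParams.Type ) ||
    ( IOCTL_GET_DMA_USER_ADDRESS != requestParams.Parameters.DeviceIoControl.IoControlCode ) )
{
    // Not the mapping request: pass it straight through to the queue.
    WdfDeviceEnqueueRequest( device, request );
    return;
}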
...
... (see previous posts for more details)
...
WdfDeviceEnqueueRequest( device, request );

I re-ran my driver and the trace prints are printing, with data after the FPGA was configured to run (not sure on quality of the data, but that’s a future topic).

Thank you Tim & Peter

Juan

On to the next “battle” and maybe one of the “hazard” areas of this approach - unmapping.
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-mmunmaplockedpages

Note that if the call to MmMapLockedPages or MmMapLockedPagesSpecifyCache specified user mode, the caller must be in the context of the original process before calling MmUnmapLockedPages. This is because the unmapping operation occurs in the context of the calling process, and, if the context is incorrect, the unmapping operation could delete the address range of a random process.

I now need to add an IOCTL to support unmapping. Without this, if I restart my User SW, the Virtual User Address is not valid for this new instance (as @Tim_Roberts pointed out previously).

And the hazard - What happens if my User SW crashes or closes without unmapping? Would I even be able to perform another mapping when a previous one already exists?

For now, I will march ahead and add an IOCTL to support unmapping. (Though, I will give VirtualAlloc with Large Pages another try later on.)

Juan

At the moment, I am looking at allocating 500MB, but may want to up that to 1GB or even 2GB later. The data size can vary,
but at the moment I have the FPGA configured to push 64MB packets, 2 packets a second. But the size and rate can also change.

I need the guaranteed contiguous logical addresses provided by common buffer.

Before you proceed with your “conquest” any further, I would recommend that you take the following points into consideration.

If the target system does not support an IOMMU, the ‘(logical_address==physical_address)’ statement is always going to evaluate to TRUE. Don’t you see any potential problem with finding a physically contiguous buffer in the GB range???

Certainly, as long as the target machine is equipped with an IOMMU, the system is in a position to present a physically non-contiguous buffer as a logically contiguous one to your device. However, you should bear in mind that not every machine in existence is going to support VT-d or AMD-Vi.

For example, if you check the link below you will see that assuming IOMMU presence on the target machine is way too bold:

https://en.wikipedia.org/wiki/List_of_IOMMU-supporting_hardware

As you can see for yourself, the list is not THAT long.

Certainly, it is up to you to make a decision, but I think you may find it frustrating to find out that the design that you have spent so much time and effort on is, for all practical purposes, simply infeasible…

Anton Bassov

I now need to add an IOCTL to support unmapping. Without this, if I restart my User SW, the Virtual User Address is not valid for this new instance (as @Tim_Roberts pointed out previously).

And the hazard - What happens if my User SW crashes or closes without unmapping?

Right. This is why you have to handle EvtFileCleanup / EvtFileClose events, so you can clean up your dirty work when the app crashes or closes without cleanly shutting down. You CANNOT rely upon the application to clean up for you. You need to assume that all application writers are bozos, and malicious bozos at that.

Would I even be able to perform another mapping when a previous one already exists?

Not as the driver is currently written, but that’s a driver problem, not an architectural problem. The operating system doesn’t care how many times you map a piece of memory, but your driver certainly does.

Juan, just to be clear: You’re getting further and further down the road into a design that is going to wind-up either having serious edge-condition issues with security implications… OR one that’s going to need to include some clever code to take into account and handle these edge conditions. It’s a lot to ask to try to get this right one forum post at a time, without a good background in Windows kernel mode software.

So… think about it. Do you really want to be doing this? You’re venturing into an area that I advise the students in my Advanced WDF seminar to avoid.

Having said that, Mr. Roberts is right on target: You need to do the Unmap operation in your EvtFileCleanup Event Processing Callback. This is called in the context of the process that called CloseHandle. You don’t need or want a separate IOCTL for this. If you do it right, it’ll handle the unmap “automatically” even during abnormal thread termination.
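To make the bookkeeping concrete, here is a small user-mode model of the idea. It is NOT driver code and makes no WDF or Mm calls: HandleId stands in for the file object a handle refers to, UserVa stands in for the address a real driver would get back from its mapping call, and MappingRegistry is a made-up name. The only point it illustrates is that the mapping is tracked per open handle rather than in one device-wide field, and that the cleanup path releases it whether or not the app asked nicely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

using HandleId = std::uint32_t;   // hypothetical stand-in for an open file handle
using UserVa   = std::uint64_t;   // hypothetical stand-in for a mapped user address

class MappingRegistry
{
public:
    // "Get address" request path: map once per handle, hand back the same address afterwards.
    UserVa MapFor( HandleId handle )
    {
        const auto it = m_mappings.find( handle );
        if( it != m_mappings.end() )
        {
            return it->second;
        }
        const UserVa va = m_nextFakeVa;   // a real driver would create the mapping here
        m_nextFakeVa += kFakeStride;
        m_mappings.emplace( handle, va );
        return va;
    }

    // Cleanup path: runs when the handle goes away, including after a crash.
    // Returns true if a mapping was released; a second call is a harmless no-op.
    bool CleanupFor( HandleId handle )
    {
        return m_mappings.erase( handle ) != 0;   // a real driver would unmap here
    }

    std::size_t LiveMappings() const
    {
        return m_mappings.size();
    }

private:
    static constexpr UserVa kFakeStride = 0x10000000ull;   // arbitrary spacing for the fake addresses

    std::unordered_map<HandleId, UserVa> m_mappings;
    UserVa m_nextFakeVa = 0x100000000ull;                   // arbitrary starting fake address
};
```

In the model, two handles get two different addresses, asking twice on the same handle returns the same address, and after CleanupFor() the entry is gone, so a restarted app starts from a clean slate.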

Peter

Hello @anton_bassov

Yes, being able to secure GB(s) of physically contiguous memory is a concern, and one reason for having the driver perform the allocation, as I assume (maybe wrongly) that the driver will have a better chance at getting that memory during start-up than User SW.

The alternative is using VirtualAlloc Large Pages in User SW. The Large Page size would be a multiple of the FPGA page size. This way, even if the logical pages are not contiguous, one+ FPGA pages can be DMAd into one System Large Page.

I know I am not completely in the clear with Large Pages as the system will still need to find enough contiguous memory for all the Large Pages needed to make up my desired buffer size. Since this is all happening during system start-up, and with 32GB of physical RAM, I would hope this is not a problem.
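To put rough numbers on that, the page counts can be worked out as below. This is a sketch under assumptions: a 2 MiB large page is assumed (real code should ask GetLargePageMinimum() rather than hard-code it), the 64 KiB FPGA page size in the usage note is purely hypothetical, and the helper names are made up.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kKiB = 1024ull;
constexpr std::uint64_t kMiB = 1024ull * kKiB;

// ASSUMPTION: 2 MiB large pages; query the real value at run time instead of relying on this.
constexpr std::uint64_t kAssumedLargePageBytes = 2 * kMiB;

// Number of large pages needed to cover bufferBytes, rounded up.
inline std::uint64_t LargePagesNeeded( std::uint64_t bufferBytes, std::uint64_t largePageBytes )
{
    assert( largePageBytes != 0 );
    return ( bufferBytes + largePageBytes - 1 ) / largePageBytes;
}

// Whole FPGA pages that fit in one large page, or 0 if the large page
// is not an exact multiple of the FPGA page size.
inline std::uint64_t FpgaPagesPerLargePage( std::uint64_t largePageBytes, std::uint64_t fpgaPageBytes )
{
    assert( fpgaPageBytes != 0 );
    return ( largePageBytes % fpgaPageBytes == 0 ) ? ( largePageBytes / fpgaPageBytes ) : 0;
}
```

Under those assumptions a 500 MiB buffer needs 250 large pages and a 1 GiB buffer needs 512, each of which only has to be contiguous within itself. With a hypothetical 64 KiB FPGA page, 32 FPGA pages would land in each large page.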

FYI: when I attempted to allocate 1GB using the CommonBuffer approach, the allocation failed. I was successful in allocating a 750MB CommonBuffer, however.

Thanks, Anton.
Juan

Hello @“Peter_Viscarola_(OSR)” and @Tim_Roberts

Thank you both again for your help and patience.

Regarding the unmapping, I went ahead with the IOCTL approach, but not because I disagreed with anything you two said, but because I decided to end my experiment with the CommonBuffer approach.

I had success allocating 500MB and having my FPGA DMA the data to both system memory and the second external device, ping-ponging back and forth between External and System Memory. I ended up having SW perform a copy to the external device, to allow that device to process the data.

The results were seamless and I could not tell the difference between processed data coming directly from the FPGA or via the System Memory.

  • A little change in plans, but it was enough to prove that using System Memory as a landing spot is doable.
  • Not to mention the warning you both gave about this approach.
  • BTW: performing a print-screen caused a BSOD - a corner case :wink:

My next step is to attempt allocating the buffer in User SW via VirtualAlloc and using Large Pages, then feed that address to the Driver. Still not the conventional approach that Peter pointed me to, but I suspect better than the CommonBuffer approach I am using now.

  • I suspect I would still need to use EvtFileCleanup/EvtFileClose if my SW were to end abruptly.

Thank you again! I am having a good time and learning a little bit as I go.
Juan