DMA - Exposing the commonbuffer to user sw

OneKneeToe · December 11, 2019, 11:21pm

Hello OSR:

Go find your dirtiest pair of shoes; you may want to throw them at me.

I have had some success with my other toy projects and learned quite a bit. A lot of the DMA-related details were hidden, so I haven’t had to deal with them.

My SW would allocate a buffer of data on an external device.
That external device (its driver) would provide my SW with a Logical-Address Page List (a list of addresses, one for each page of memory).
My SW would program a LUT in the FPGA with this list of addresses.
The FPGA would then use these addresses for DMA.

So, continuing on, I want to do the data processing on the Windows system instead of on this external device, still using the pre-existing FPGA to DMA the data.

In my KMDF driver, I create a common buffer.
Using the logical address I get from WdfCommonBufferGetAlignedLogicalAddress, I create a list of logical addresses, using the same page size as that of the external device.
– I figured this was safe since the CommonBuffer allocation is contiguous in the logical address space.
– Note: Physical page addresses don’t matter, since my User SW will be using the virtual address while the FPGA DMA will be using the logical addresses.
I get the Kernel Virtual Address by calling WdfCommonBufferGetAlignedVirtualAddress.
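A minimal sketch of the page-list step described above, assuming the FPGA LUT wants one 64-bit logical address per device page and that the common buffer's logical base is aligned to that page size. `BuildLogicalPageList` is a hypothetical helper, not a WDF API; in the driver, `logicalBase` would presumably come from the `QuadPart` of the `PHYSICAL_ADDRESS` returned by `WdfCommonBufferGetAlignedLogicalAddress`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a WDF API): split one logically contiguous DMA range
// into one entry per device page, the form the FPGA's page LUT expects.
// Assumes the device page size is a power of two and the base is aligned to it.
std::vector<std::uint64_t> BuildLogicalPageList(std::uint64_t logicalBase,
                                                std::uint64_t byteCount,
                                                std::uint64_t devicePageSize)
{
    assert(devicePageSize != 0 && (devicePageSize & (devicePageSize - 1)) == 0);
    assert(logicalBase % devicePageSize == 0);

    // Round up so a partial final page still gets a LUT entry.
    const std::uint64_t pageCount = (byteCount + devicePageSize - 1) / devicePageSize;

    std::vector<std::uint64_t> pages;
    pages.reserve(static_cast<std::size_t>(pageCount));
    for (std::uint64_t i = 0; i < pageCount; ++i) {
        // Valid only because a common buffer is contiguous in logical address
        // space; the FPGA only ever sees these logical addresses.
        pages.push_back(logicalBase + i * devicePageSize);
    }
    return pages;
}
```

For example, a 4 MiB buffer with 4 KiB device pages yields 1024 LUT entries; the arithmetic is the same whatever page size the external device actually used.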

My question (drum roll): How can I convert the Kernel virtual address to a User virtual address so that my User SW can access the data?

Should I be looking at allocating the buffer from user SW (VirtualAlloc or AllocateUserPhysicalPages)? How would I then get the logical address for DMA?
Should the driver be copying data from the common buffer to some user-space buffer? That seems like a waste.

Thanks again, everyone, for your patience and continuing help.

Regards,
Juan

anton_bassov · December 12, 2019, 3:59am

Go find your dirtiest pair of shoes; you may want to throw them at me.
…

How can I convert the Kernel virtual address to a User virtual address so that my User SW can gain access to the data?

Well, you seem to realise that mapping kernel/device memory to the userland (which happens to be an absolutely standard operation in the world of the OSes that support mmap() call) is considered a “dodgy practice” in the Windows one, right? However, for one reason or another, you still want to proceed in this direction.

Therefore, assuming that you are well aware of all the potential “caveats” (for example, what happens if your app’s address space gets modified by some other app by means of a WriteProcessMemory() call, or what happens if your app terminates abnormally) and are willing to deal with them, you may want to check the following link:

https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-mmmaplockedpagesspecifycache

[begin ironical mode]

PS. You seem to be anticipating a “funny” reaction from the usual suspects, aren’t you? Therefore, let’s see how it all goes; after all, we may be lucky enough to see some “exciting” inquiries concerning your product and company name at the end of the day. In fact, if we are really lucky, we may even see the above-mentioned “inquiries” in capitals. In other words, I hold my breath in anticipation…

[end ironical mode]

Anton Bassov

OneKneeToe · December 12, 2019, 6:15pm

Hello Anton:

Thanks for the response!

My Approach:

In my naive mind, what I am trying to accomplish is to allocate a memory location for my FPGA to DMA data into and for my User SW to be able to access directly. My driver doesn’t need to look at the data; it just needs to expose the buffer for DMA use. I would like to avoid any need to copy data from a “Kernel buffer” to a “User Buffer”.

For my first attempt, I looked at allocating large page memory using VirtualAlloc in my user sw. But then I wasn’t able to find a method of making that memory available for DMA.
For my second attempt, I looked at CommonBuffer, but now I am looking for a way to expose the buffer to user sw.

I suppose my question should have been, “What is the ‘proper’ way of doing the above?”

How would you approach this, @anton_bassov ?

I did look at your link and will see if that approach works for the short term; yeah, I presume exposing a pool of memory allocated by a Kernel Driver to user sw is not “best practice”, unless the API (memory pool allocation) was designed for that specific purpose.

Light-Heartedness:

In reading through the posts on the forum, I see the “usual suspects” have been contributing for quite some time. I am sure they have seen it all. Since I have no clue what I am doing in the driver world, I know I am asking pretty basic (dumb??) questions, with approaches that are probably sacrilege. I get that, so I sometimes try to lighten things up a little.

I am looking for a way to attend a future seminar. That should be a huge help.

For what it’s worth, I do try to give back by posting how I ended up resolving my question, with code snippets if applicable.

Thanks again, Anton.
Juan

Peter_Viscarola_OSR · December 12, 2019, 8:04pm

If you want to avoid copying the data, why not just send it using Direct I/O (or an IOCTL using METHOD_IN_DIRECT)? And return it to users using Direct I/O (or an IOCTL using METHOD_OUT_DIRECT).

The fact that Windows has support for mapping user data buffers built-in is the reason that we don’t use what Anton calls “an absolutely standard operation in the world of the OSes that support mmap() call” – We don’t typically need it.

About 90% of the time when people want to share buffers between user-mode and kernel-mode, they want to do this because they’re thinking Linux and coding on Windows.

How would you approach this, @anton_bassov ?

Bear in mind that Anton hasn’t written any code on Windows for about ten years. He just hangs out here for the abuse.

Peter

OneKneeToe · December 12, 2019, 10:15pm

Hello Peter:

Thanks again for making yourself available.

If I understood the online documentation correctly, I believe using Direct I/O or an IOCTL would be a significant change of implementation for me.

Using Direct I/O or an IOCTL would require my software to make repeated calls to the driver. Each time, the driver would need to perform the mapping and configure the FPGA, which means commanding it to flush its page LUT, re-programming the LUT with new logical addresses, and a few other things. Also, the logic within the FPGA may make adapting to this method a little complicated.

This is why I wanted to keep with the pre-existing approach where User SW would allocate a large buffer, this time of system memory, and program the FPGA once. Then the FPGA (via DMA) and User SW would both have direct access to that memory region; the driver itself doesn’t need to touch it. The FPGA would notify User SW via the driver by using interrupts and metadata. The metadata would tell User SW where to look in that “shared” large buffer and how much memory it can use (start address & byte size). Since the logical and virtual address ranges are each contiguous, it is OK if physical memory is not.
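A minimal sketch of how User SW could turn that metadata into a pointer, assuming (hypothetically) that the driver reports where the FPGA wrote as a logical start address plus a byte count, and that User SW knows both the buffer's logical base and its own user-mode base. `RegionMetadata` and `RegionToUserPointer` are illustrative names, not part of any API. Because both ranges are contiguous, the offset from the logical base is also the offset from the user base.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical metadata record reported after an interrupt: where the FPGA
// wrote (as a device logical address) and how many bytes it wrote.
struct RegionMetadata {
    std::uint64_t logicalStart;
    std::uint64_t byteCount;
};

// Map a reported logical region onto User SW's view of the same buffer.
// Returns nullptr if the region does not lie entirely inside the buffer.
const std::uint8_t* RegionToUserPointer(const std::uint8_t* userBase,
                                        std::uint64_t logicalBase,
                                        std::uint64_t bufferSize,
                                        const RegionMetadata& md)
{
    assert(userBase != nullptr);
    if (md.logicalStart < logicalBase) {
        return nullptr;
    }
    const std::uint64_t offset = md.logicalStart - logicalBase;
    // Compare against the remaining space rather than computing
    // offset + byteCount, so a bogus byteCount cannot overflow the check.
    if (offset > bufferSize || md.byteCount > bufferSize - offset) {
        return nullptr;
    }
    return userBase + offset;
}
```

Rejecting out-of-range metadata matters here: if the shared buffer is ever reused or the metadata is corrupted, the worst case becomes a refused read rather than a wild pointer.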

Repeated interrupts and file reads to get the metadata are relatively lightweight compared to the repeated Direct I/O approach, if I understand things correctly.

How far off am I?

Thank you Peter!

Juan

Peter_Viscarola_OSR · December 12, 2019, 10:33pm

This is why I wanted to keep with the pre-existing approach where User SW would allocate a large buffer, this time of system memory, and program the FPGA once

OK… that’s fine. We can do that.

How about the user allocates this buffer, and then sends it up to you in the OutBuffer of a METHOD_OUT_DIRECT IOCTL? Keep that IOCTL pending… don’t complete it… as long as the user has the device open. When the user closes the handle, complete the IOCTL only then. Every time you want to return data to the user, DMA into this same buffer (you’ll need to figure out how to let the user know new data has arrived, but that’s the problem in any of these shared memory schemes, right?)…

Easy, right? It is, if that’ll work for you.

Now… there IS a bit of an issue regarding the whole “contiguous memory” thing… the device bus logical addresses are only going to be guaranteed to be contiguous when you allocate a COMMON BUFFER specifically. So, if you can’t handle a scatter/gather list, you’re going to have to allocate the memory in kernel mode (again, a COMMON BUFFER as you’re already doing) and then map that memory back to user mode with MmMapLockedPagesSpecifyCache… however… this is not nearly as simple as it appears to be, and is in fact fraught with edge conditions and security implications, which is why we most ardently try to get people not to do this on Windows.
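To make the contiguity point concrete, here is a sketch of the check a device that needs a single contiguous logical range would have to pass, assuming a simple list of (logical address, length) pairs. `SgElement` is a hypothetical stand-in, not the WDM `SCATTER_GATHER_ELEMENT` type, and nothing guarantees that a Direct I/O buffer's list passes this test, which is why a COMMON BUFFER is needed when the hardware cannot walk a list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical (logical address, length) pair, loosely modeled on a
// scatter/gather element; not the WDM SCATTER_GATHER_ELEMENT type itself.
struct SgElement {
    std::uint64_t logicalAddress;
    std::uint64_t length;
};

// True when each element begins exactly where the previous one ended, i.e.
// the whole list collapses into one contiguous logical range.
bool IsSingleContiguousRun(const std::vector<SgElement>& sg)
{
    assert(!sg.empty());
    for (std::size_t i = 1; i < sg.size(); ++i) {
        const SgElement& prev = sg[i - 1];
        if (prev.logicalAddress + prev.length != sg[i].logicalAddress) {
            return false;
        }
    }
    return true;
}
```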

Direct I/O and METHOD_xxx_DIRECT are there for a good reason in Windows.

Peter

anton_bassov · December 12, 2019, 11:39pm

The fact that Windows has support for mapping user data buffers built-in is the reason that we don’t use what Anton calls “an absolutely standard operation in the world of the OSes that support mmap() call” – We don’t typically need it.

However, CreateFileMapping() (which happens to be the Windows equivalent of mmap() as far as “regular” disk files are concerned) still seems to be, for one reason or another, quite popular in the Windows userland. Surprise, surprise…

I don’t know about you, but I have a weird feeling that userland apps would not mind using this function with device files too if this option were available to them.

Let’s put it the following way. Let’s say some future Windows release provides a uniform memory-mapping framework that works exactly the same way with both disk and device files, and is available to anyone who wishes to make use of it. If your driver decides to make use of this framework, it automatically implies that a FILE_OBJECT corresponding to a DO created by your driver may be mapped to the userland by means of a CreateFileMapping() call. Are you going to tell your students in OSR classes to avoid this framework like the plague simply because, in your words, “we don’t typically need it”???

Therefore, things are probably not as simple as you are trying to present them.

Bear in mind that Anton hasn’t written any code on Windows for about ten years. He just hangs out here for the abuse.

I’ve been told on multiple occasions that validity happens to be one of the things that “ad hominem” arguments are NOT generally known for. Therefore, I was under the impression that “ad hominem”-style arguments add no extra “validity points” to your argumentation. Am I wrong here?

Anton Bassov

Don_Burn · December 13, 2019, 12:15am

Before we go too far down this rathole, can I ask a few questions? How big is the buffer you need to allocate for the device? How big is the typical update to the data? And how frequent are the updates?

I’ve been on both sides of this. While I am not in favor of mapping kernel memory to user space, I have done it because a device needed it. I was very careful to be sure I really needed it, because the device was being delivered to Microsoft Research. As I say, I have done it; then the next client insisted they needed the same thing, but once the above questions were answered there was no justification.

Peter_Viscarola_OSR · December 13, 2019, 12:53am

@anton_bassov No, I disagree. You need this mmap on Linux, because you don’t have Direct I/O. If you think about it, mmap() for “device files” is really a kinda dumb, hacky, idea. It means whatever the dev wants it to mean… unlike in a file system, where it gives you a view into the single, global, common, cache. For devices, it’s just a way to make up for the lack of Direct I/O… to allow you to do stuff without recopying.

In terms of ad hominem attacks… pointing out your lack of actual recent experience on Windows is just something a questioner here should consider… it’s just a fact and not ad hominem at all.

Peter

Michal_Vodicka · December 13, 2019, 3:23am

Bear in mind that Anton hasn’t written any code on Windows for about ten years. He just hangs out here for the abuse.

That’s a really funny remark, but I’m curious. What changed in the last 10 years in the area which is discussed here? Seriously. Windows development hasn’t been my main focus for about 6 years now, so I wonder what I have missed.

Marcel_Ruedinger · December 13, 2019, 5:03am

NUMA support? Maybe a little bit more than 10 years ago? Couldn’t NUMA compatibility also be an argument in favor of the standard Direct I/O way?

Marcel Ruedinger
datronicsoft

anton_bassov · December 13, 2019, 7:50am

You need this mmap on Linux, because you don’t have Direct I/O. If you think about it, mmap() for “device files” is really a kinda dumb, hacky, idea. It means whatever the dev wants it to mean… unlike in a file system, where it gives you a view into the single, global, common, cache. For devices, it’s just a way to make up for the lack of Direct I/O… to allow you to do stuff without recopying.

There are a few observations to be made here:

I don’t mean Linux in particular when speaking about “the world of the OSes that support mmap() call”. This call is closely related to the “everything is a file” concept. There is an entire family of OSes based upon this concept, and Linux happens to be just one of them. In fact, if you look at the whole thing from the architectural perspective, the particular implementation of mmap() that Linux provides does not really seem to be the most architecturally sound one in existence. If you have any doubts about this part, you can compare it to, say, the SunOS/Solaris memory model.
The “really a kinda dumb, hacky, idea” of mmap() (i.e. providing a uniform way of mapping memory that works exactly the same way for any kind of file in existence, be it a disk file or a device one) predates Linux 0.01 by more than two decades. Furthermore, it predates not only Linux but UNIX as well. This concept is rooted in MULTICS; this is where the very idea of user apps attaching/detaching memory segments that are backed by a store to/from their address spaces comes from.

In actuality, mmap() as we know it is just the result of combining this concept with the “everything is a file” one. As you may have guessed, this “merger” was pioneered by SunOS. In fact, the very term “segment drivers” seems to make a very unambiguous reference to the MULTICS origins (at least the conceptual ones) of memory mapping, don’t you think…

The very concept of METHOD_DIRECT has absolutely nothing to do with sharing a buffer between an app and a driver (at least, it was not originally meant to be used this way). You must be confusing it with METHOD_NEITHER, which was, indeed, intended specifically for this purpose from the very beginning of the “Windows NT world”. What actually happened here is that some clever bloke (I guess it was you, right?) realised that, by pending a METHOD_DIRECT-based IOCTL indefinitely, one effectively gets a shared buffer while, at the same time, avoiding all the extra pain in the arse that METHOD_NEITHER implies. As a result, the concept of a “Big Honking IRP” was born.
In case you just cannot imagine your life without the above-mentioned “Big Honking IRPs” (you seem to be particularly fond of them these days, aren’t you?), I can assure you that you can implement them under Linux with a snap of the fingers.

All you have to do is mlock() your target buffer before passing its address to a driver. In response to your app’s request, your driver will validate the address of every page in the target range, get its ‘struct page’ descriptor, reference it, and then map all the target pages to a virtually contiguous kernel range with vmap(). At this point your driver will be able to access the shared buffer in any context, and it is 100% safe. Certainly, taking into consideration the availability of mmap(), this is probably not the most efficient and reasonable way of doing things under Linux, but that is a different story.

In other words, your “argument” about “needing this mmap on Linux, because you don’t have Direct I/O” simply does not stand the slightest chance…

In terms of ad hominem attacks… pointing out your lack of actual recent experience on Windows is just something a questioner here should consider… it’s just a fact and not ad hominem at all.

Actually, I don’t really find the very fact of “lacking actual recent experience on Windows” embarrassing in any possible way. After all, I have mentioned this fact quite a few times in this NG.

The only thing that I am saying here is that you somehow found a way to use this fact as an “ad hominem” attack. At the end of the day, sometimes it is not about WHAT you actually said but more about HOW you said it. Let’s look at your statement one more time:

[begin quote]

Bear in mind that Anton hasn’t written any code on Windows for about ten years. He just hangs out here for the abuse.

[end quote]

I hope you agree that the above statement in its actual form is a very obvious example of an “ad hominem” argument, and was meant to be used as one…

Anton Bassov

anton_bassov · December 13, 2019, 9:48am

What changed in the last 10 years in the area which is discussed here? Seriously. Windows development hasn’t been my main focus for about 6 years now, so I wonder what I have missed

Although there are no dramatic changes anywhere in sight, there may be some relatively minor modifications/improvements that were unavailable in the earlier days of Windows. As a result, more options may be available in the more recent OS versions.

For example, with the advent of kernel sockets and the Windows Filtering Platform in Vista, you can be 100% sure that all your TDI experience can go down the drain. Furthermore, the advent of NDIS 6+ LWFs made NDIS 5 IM filter drivers sort of obsolete as well.

Another example is KMDF. If you check the WDK samples, you will see that practically all WDM samples have been removed. Therefore, if you advise someone to do things in a WDM-like fashion, you will be looked upon as a dinosaur brought back to life.

You should not forget about the progress in the field of development tools/environment either. For example, you can be sure that you are not going to be either called a “STUPID IDIOT” or requested to “PUBLISH YOUR NAME…(etc)” if you say that you want to build drivers with VC. Even more, these days you are sort of encouraged to use C++ in your drivers, so you are quite unlikely to hear Linus-like anti-C++ diatribes from our hosts. In fact, I would not be too surprised if they actually start promoting the use of C++.

In the context of Peter’s “argument”, the very obvious example is process creation callbacks. For example, NotificationRoutine() (i.e. the only option available under XP) does not allow you to block process creation right on the spot in the context of a callback, but NotificationRoutineEx(), which turned up in more recent versions, offers this functionality. IIRC, I put my foot in my mouth once because of this particular improvement.

Anton Bassov

Peter_Viscarola_OSR · December 13, 2019, 1:40pm

What changed in the last 10 years in the area which is discussed here? Seriously.

One very real and impactful change, aside from the bus drivers being totally rewritten to exhibit different practical behaviors, is the ever increasing presence of IOMMUs. There are other changes as well, including many, many, changes in power management, right? And that’s always impactful.

But, when it comes to recency of experience, it’s not a matter of “what’s changed” as much as it’s a matter of “how have I used these concepts lately… so I know/remember the true practicalities, the implementation ‘potholes’ as I like to call them, and not just some overall general concepts of how things might/should/did work.”

@anton_bassov You’re wrong, and your knowledge of Windows architecture is flawed. I’m not debating this with you.

Be aware that my patience with you is getting thin. You’re not providing helpful guidance for the questioners. And you’ve not amused me lately. You trolled this thread in your initial post and I, stupidly, rewarded you by taking your bait. If I had to guess, I would guess your time here grows short.

Peter

anton_bassov · December 13, 2019, 4:33pm

Peter,

You’re not providing helpful guidance for the questioners.

Well, judging from the OP’s post, he was well aware of the potential issues that may arise if you map kernel memory to user space, but was still going to proceed this way. Therefore, I just showed him the function that allowed him to reach his objective, and dropped an extra hint (although probably not in the best way; more on it below) that this practice may be frowned upon in the Windows world…

You trolled this thread in your initial post

This part was, indeed, totally unnecessary of me. I’m sorry for that…

it’s not a matter of “what’s changed” as much as it’s a matter of “how have I used these concepts lately… so I know/remember the true practicalities, the implementation ‘potholes’ as I like to call them, and not just some overall general concepts of how things might/should/did work.”

True, but my NTDEV participation (apart from the “exciting” trolling side, of course) allows me not only to stay in shape but even to learn something new, or at least to “learn what I have to learn and re-learn” due to OS changes…

You’re wrong, and your knowledge of Windows architecture is flawed. I’m not debating this with you.

Trolling issues aside, could you please explain to me what I said that was wrong from a technical standpoint? I am neither trolling nor trying to prove anything to anyone; I just want to learn things for myself…

Anton Bassov

OneKneeToe · December 13, 2019, 5:47pm

Hello Peter:

Thanks for bearing with me and for the good info.

I attempted the common buffer approach last night and ran into an issue: my User SW gets an access violation. I had done everything, from creating the DmaEnabler to calling MmMapLockedPagesSpecifyCache, in my EvtDeviceAdd, storing the addresses in the device context. Then I created an IOCTL for my User SW to call and retrieve the Virtual User Address.

Searching the forum, I found a post, by you actually, pointing out that the MmMapLockedPagesSpecifyCache call needs to be made in the correct context, and that a good location for this would be EvtIoInCallerContext and not EvtIoDeviceControl (which is where I was going to move it next, as part of the IOCTL).
https://community.osr.com/discussion/279797

@“Peter_Viscarola_(OSR)”
…EvtIoDeviceControl (in fact, EvtIoXxxx) is called in an arbitrary process and thread context. You need to use the EvtIoInCallerContext callback…

So, I added an EvtIoInCallerContext callback and moved the MmMapLockedPagesSpecifyCache call into it.
Unfortunately, I am still getting the access violation. Code snippets are below (to give further context).
I will play around with this some more; my first guess is that it may be how I set up the enabler or the common buffer.

EvtDeviceAdd

    WDF_DMA_ENABLER_CONFIG dmaEnablerConfig;
    WDF_DMA_ENABLER_CONFIG_INIT( &dmaEnablerConfig, WdfDmaProfilePacket64, 128 );

    dmaEnablerConfig.EvtDmaEnablerFill = NULL;
    dmaEnablerConfig.EvtDmaEnablerFlush = NULL;
    dmaEnablerConfig.EvtDmaEnablerDisable = NULL;
    dmaEnablerConfig.EvtDmaEnablerEnable = NULL;
    dmaEnablerConfig.EvtDmaEnablerSelfManagedIoStart = NULL;
    dmaEnablerConfig.EvtDmaEnablerSelfManagedIoStop = NULL;
    dmaEnablerConfig.AddressWidthOverride = 0;
    dmaEnablerConfig.WdmDmaVersionOverride = 3;
    dmaEnablerConfig.Flags = WDF_DMA_ENABLER_CONFIG_REQUIRE_SINGLE_TRANSFER;

    NTSTATUS status{ WdfDmaEnablerCreate( wdfDevice, &dmaEnablerConfig, WDF_NO_OBJECT_ATTRIBUTES, &( deviceContextP->myDmaEnabler ) ) };
    if( NT_SUCCESS( status ) )
    {
        WDF_COMMON_BUFFER_CONFIG CommonBufferConfig;
        WDF_COMMON_BUFFER_CONFIG_INIT( &CommonBufferConfig, FILE_128_BYTE_ALIGNMENT );

        status = WdfCommonBufferCreateWithConfig( deviceContextP->myDmaEnabler,
                                    deviceContextP->myCommonBufferByteSize,
                                    &CommonBufferConfig,
                                    WDF_NO_OBJECT_ATTRIBUTES,
                                    &( deviceContextP->myCommonBuffer ) );
        if( NT_SUCCESS( status ) )
        {
            deviceContextP->virtualKernelAddr = WdfCommonBufferGetAlignedVirtualAddress( deviceContextP->myCommonBuffer );
            deviceContextP->LogicalAddr = WdfCommonBufferGetAlignedLogicalAddress( deviceContextP->myCommonBuffer );

            RtlZeroMemory( deviceContextP->virtualKernelAddr, deviceContextP->myCommonBufferByteSize );
            deviceContextP->pMdl = IoAllocateMdl( deviceContextP->virtualKernelAddr,
                                    ( ULONG ) deviceContextP->myCommonBufferByteSize,
                                    FALSE,
                                    FALSE,
                                    NULL );
            if( NULL == deviceContextP->pMdl )
            {
                status = STATUS_INSUFFICIENT_RESOURCES;
                TraceEvents( TRACE_LEVEL_ERROR, DBG_INIT, "IoAllocateMdl() failed with status=[%!STATUS!]", status );
            }
            else
            {
                // Fill in the page array of the MDL allocated just above; the
                // common buffer is nonpaged, so its pages are always resident.
                MmBuildMdlForNonPagedPool( deviceContextP->pMdl );
            }
        }
        else
        {
            TraceEvents( TRACE_LEVEL_ERROR, DBG_INIT, "WdfCommonBufferCreateWithConfig() failed with status=[%!STATUS!]", status );
        }
    }
    else
    {
        TraceEvents( TRACE_LEVEL_ERROR, DBG_INIT, "WdfDmaEnablerCreate() failed with status=[%!STATUS!]", status );
    }

EvtIoInCallerContext

if( ( NULL != deviceContextP->myCommonBuffer ) && 
    ( NULL == deviceContextP->virtualUserAddr ) )
{
    __try
    {
        deviceContextP->virtualUserAddr = MmMapLockedPagesSpecifyCache( deviceContextP->pMdl,
                                    UserMode,
                                    MmCached,
                                    NULL,
                                    FALSE,
                                    HighPagePriority );
        if( NULL == deviceContextP->virtualUserAddr )
        {
            TraceEvents( TRACE_LEVEL_ERROR, DBG_IOCTLS, "MmMapLockedPagesSpecifyCache() failed.");
        }
        else
        {
            TraceEvents( TRACE_LEVEL_INFORMATION, DBG_IOCTLS, "VirtualUserAddr=[0x%p]", deviceContextP->virtualUserAddr );
        }
    }
    __except( EXCEPTION_EXECUTE_HANDLER )
    {
        TraceEvents( TRACE_LEVEL_ERROR, DBG_IOCTLS, "MmMapLockedPagesSpecifyCache() threw an exception!" );
    }
}
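// Hand the request back to the framework so it reaches EvtIoDeviceControl;
// if enqueueing fails, the driver must complete the request itself.
NTSTATUS enqueueStatus{ WdfDeviceEnqueueRequest( device, request ) };
if( !NT_SUCCESS( enqueueStatus ) )
{
    WdfRequestComplete( request, enqueueStatus );
}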

EvtIoDeviceControl - IOCTL_GET_DMA_USER_ADDRESS

        if( NULL == deviceContextP->myCommonBuffer )
        {
            TraceEvents( TRACE_LEVEL_WARNING, DBG_IOCTLS, "NO COMMON BUFFER!!" );
        }
        else if( NULL == deviceContextP->virtualUserAddr )
        {
            TraceEvents( TRACE_LEVEL_WARNING, DBG_IOCTLS, "NO VIRTUAL ADDRESS" );
        }
        else
        {
            PDMA_USER_ADDRESS pDmaUserAddress{ nullptr };
            status = WdfRequestRetrieveOutputBuffer( request, outputBufferLength, ( PVOID* ) &pDmaUserAddress, NULL );
            if( NT_SUCCESS( status ) )
            {
                pDmaUserAddress->virtualAddr = reinterpret_cast<UINT64>( deviceContextP->virtualUserAddr ); 
                bytesTransferred = sizeof( DMA_USER_ADDRESS );
            }
            else
            {
                TraceEvents( TRACE_LEVEL_ERROR, DBG_IOCTLS, "WdfRequestRetrieveOutputBuffer() failed with status=[%!STATUS!]", status );
            }
        }
        ...
        ...
        WdfRequestCompleteWithInformation( request, status, bytesTransferred );

As always, thank you!

Juan

OneKneeToe · December 13, 2019, 6:36pm

Hello Don:

I didn’t see your post before my reply to Peter.

@Don_Burn
How big is the buffer you need to allocate for the device? How big is the typical update to the data? And how frequent are the updates?
At the moment, I am looking at allocating 500MB, but may want to up that to 1GB or even 2GB later. The data size can vary, but at the moment I have the FPGA configured to push 64MB packets, 2 packets a second. But the size and rate can also change.
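A quick back-of-the-envelope check of those numbers, assuming “MB” means MiB and that the FPGA LUT uses 4 KiB pages (the thread does not say which page size the external device uses). `Sizing` and `ComputeSizing` are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t MiB = 1024ull * 1024ull;

// Hypothetical sizing summary for a shared DMA buffer.
struct Sizing {
    std::uint64_t bytesPerSecond;  // sustained DMA rate
    std::uint64_t wholePackets;    // complete packets the buffer holds at once
    double secondsBuffered;        // time to fill the whole buffer once at that rate
    std::uint64_t lutEntries;      // one logical address per device page
};

Sizing ComputeSizing(std::uint64_t bufferBytes, std::uint64_t packetBytes,
                     std::uint64_t packetsPerSecond, std::uint64_t pageBytes)
{
    assert(packetBytes != 0 && packetsPerSecond != 0 && pageBytes != 0);
    const std::uint64_t rate = packetBytes * packetsPerSecond;
    return Sizing{
        rate,
        bufferBytes / packetBytes,
        static_cast<double>(bufferBytes) / static_cast<double>(rate),
        (bufferBytes + pageBytes - 1) / pageBytes,
    };
}
```

With 500 MiB, 64 MiB packets, and 2 packets per second, this gives 128 MiB/s, 7 whole packets, about 3.9 seconds to fill the buffer, and 128,000 LUT entries; at 2 GiB the LUT grows to 524,288 entries, which is the kind of number Don's questions are probing.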

@Don_Burn
… while I am not in favor of mapping kernel memory to user space…
Yes, this is at the back of my mind as I am playing around with this. Even if I get this to work: what risks have I introduced? Can I plug the holes? If not, how big are the risks, and do they outweigh the rewards? Etc.

It is clear from other threads, and even here, that this approach is not ideal. Unfortunately, I need the guaranteed contiguous logical addresses provided by a common buffer.

@Don_Burn
I have done it because a device needed it…
Unfortunately, I cannot change the way the FPGA is currently designed. The current design has the FPGA DMA data directly to another device. SW then uses other means of processing that data on that device.

I am changing things a little, so that SW can process the data on system. In my mind I simply want to move that destination memory buffer from the other device to system memory (move the landing spot). To the FPGA the change is transparent as it only sees logical addresses.

Thanks for your interest and input Don.

Juan

Michal_Vodicka · December 13, 2019, 6:48pm

One very real and impactful change, aside from the bus drivers being totally rewritten to exhibit different practical behaviors, is the ever increasing presence of IOMMUs. There are other changes as well, including many, many, changes in power management, right? And that’s always impactful.

Yes, but I didn’t mean general OS changes, just changes related to the things discussed in this thread (sharing memory between user and kernel mode, and IOCTLs). To me it seems as if things work the same way as they did when I used them.

In the context of Peter’s “argument”, the very obvious example is process creation callbacks. For example, NotificationRoutine() (i.e. the only option available under XP)

If I count correctly, 10 years ago we already had Win7, so XP is not relevant. Also, KMDF already existed, WDM drivers were outdated, and so on.

(Well, there was still the legacy usbser driver, which didn’t follow even XP WDM rules, and its WDF version wasn’t available before Win10; that one fixed old bugs and introduced new ones. A real pain I had to handle recently…)

Michal

OneKneeToe · December 13, 2019, 8:56pm

@“Peter_Viscarola_(OSR)”

As a test, I added trace code to EvtIoInCallerContext to print the data at the Virtual User Address. The print worked when EvtIoInCallerContext ran and performed MmMapLockedPagesSpecifyCache, the data being 0 as expected, due to RtlZeroMemory and the FPGA not yet being configured.

However, on subsequent calls to EvtIoInCallerContext the print throws an exception; I suspect an access violation, same as in User SW.

EvtIoInCallerContext (added printing)

    // Check and if no Virtual User Address, call MmMapLockedPagesSpecifyCache (See previous post for details).
   ...
    if( NULL != deviceContextP->virtualUserAddr )
    {
        UINT64 *pBuffer = ( UINT64* ) deviceContextP->virtualUserAddr;
        for( auto x{ 0 }; x < 10; ++x )
        {
            __try
            {
                TraceEvents( TRACE_LEVEL_INFORMATION, DBG_IOCTLS
                    , "Addr[%u] @[0x%p] = [%llu]"
                    , x
                    , pBuffer + x
                    , *( pBuffer + x ) );
            }
            __except( EXCEPTION_EXECUTE_HANDLER )
            {
                TraceEvents( TRACE_LEVEL_ERROR, DBG_IOCTLS, "EXCEPTION in TraceEvent" );
            }
        }
    }

Tim_Roberts · December 13, 2019, 9:15pm

Dumb question: Is the second EvtIoInCallerContext from the exact same process as the first call? Remember that your virtualUserAddr is only valid for that one process. If you had a quick test app and then started another test app, that address is no longer valid.

And that points out a bug in your code. In your ioctl handler, you’re only doing the mapping if virtualUserAddr is null. That only works if the calling process never ends, unless you are zeroing out that field when the app exits. Personally, I’d just eliminate that check and ALWAYS do the mapping, even if the field already has a value. Windows won’t create a new mapping if one already exists.
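Tim's point is that a user-mode address belongs to exactly one process. One way to model the consequence (a portable sketch, not kernel code and not Tim's suggested fix) is to key each mapping by its owner instead of caching a single address in the device context. `OwnerId`, `UserMappingTable`, and the injected mapping function are hypothetical; in a KMDF driver the equivalent state would more plausibly live in a file-object context and be released from an `EvtFileCleanup` callback, but check the process-context requirements of `MmUnmapLockedPages` before relying on that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical identity of whoever owns a user-mode mapping (a process, or
// in a KMDF driver more likely the file object the process opened).
using OwnerId = std::uint64_t;

// Portable model: one user-mode address per owner instead of a single
// address cached in the device context.
class UserMappingTable {
public:
    // mapFn stands in for the mapping call made in the owner's context
    // (in the driver, MmMapLockedPagesSpecifyCache with UserMode).
    explicit UserMappingTable(std::function<std::uintptr_t(OwnerId)> mapFn)
        : mapFn_(std::move(mapFn))
    {
        assert(mapFn_);
    }

    // Return this owner's mapping, creating it on the owner's first request.
    // A mapping created for one owner is never handed to another.
    std::uintptr_t GetOrMap(OwnerId owner)
    {
        const auto it = map_.find(owner);
        if (it != map_.end()) {
            return it->second;
        }
        const std::uintptr_t addr = mapFn_(owner);
        if (addr != 0) {  // do not cache a failed mapping
            map_.emplace(owner, addr);
        }
        return addr;
    }

    // Called when the owner closes its handle; the driver would unmap here,
    // in that owner's context. Returns false if the owner had no mapping.
    bool Release(OwnerId owner) { return map_.erase(owner) != 0; }

    std::size_t Count() const { return map_.size(); }

private:
    std::function<std::uintptr_t(OwnerId)> mapFn_;
    std::unordered_map<OwnerId, std::uintptr_t> map_;
};
```

The design choice being illustrated is only the ownership rule: a second test app gets its own mapping rather than the first app's stale address, and closing a handle drops that owner's entry.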