DMA - Exposing the commonbuffer to user sw

OneKneeToe Member Posts: 30
edited December 2019 in NTDEV

Hello OSR:

Go find your dirtiest pair of shoes; you may want to throw them at me. :smile:

I have had some success with my other toy projects and learned quite a bit. A lot of the DMA related details were hidden, so I haven't had to deal with it.

  • My SW would allocate a buffer of data on an external device.
  • That external device (its driver) would provide my SW with a Logical-Address Page List (A list of address, one address for each page of memory).
  • My SW would program a LUT in the FPGA with this list of addresses.
  • The FPGA would then use these addresses for DMA.

So, continuing on, I want to do the data processing on the Windows system instead of this external device - Still using the pre-existing FPGA to DMA the data.
1. In my KMDF driver, I create a common buffer.
2. Using the logical address I get from WdfCommonBufferGetAlignedLogicalAddress, I create a list of logical addresses, using the same page-size as that of the external device.
-- I figured this was safe since the CommonBuffer allocation is contiguous in the logical addresses space.
-- Note: Physical page addresses don't matter, since my User SW will be using the virtual address while the FPGA DMA will be using the logical addresses.
3. I get the Kernel Virtual Address by calling WdfCommonBufferGetAlignedVirtualAddress.
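A minimal sketch of step 2, assuming the device page size divides evenly into the buffer's alignment. The function name and parameters are hypothetical; in the driver, `logicalBase` would be the `QuadPart` of the value returned by WdfCommonBufferGetAlignedLogicalAddress. It relies only on the common buffer being contiguous in logical address space.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Builds one logical (device-bus) address per device page for a buffer that is
// contiguous in logical address space, as a common buffer is.
// Hypothetical helper: devicePageSize is the page size the FPGA LUT expects.
std::vector<std::uint64_t> buildLogicalPageList(std::uint64_t logicalBase,
                                                std::uint64_t byteSize,
                                                std::uint64_t devicePageSize)
{
    if (devicePageSize == 0 || (logicalBase % devicePageSize) != 0) {
        throw std::invalid_argument("base must be aligned to a non-zero page size");
    }

    // Round up so a partial trailing page still gets a LUT entry.
    const std::uint64_t pageCount =
        byteSize / devicePageSize + ((byteSize % devicePageSize) != 0 ? 1 : 0);

    std::vector<std::uint64_t> pages;
    pages.reserve(static_cast<std::size_t>(pageCount));
    for (std::uint64_t i = 0; i < pageCount; ++i) {
        pages.push_back(logicalBase + i * devicePageSize);
    }
    return pages;
}
```

The resulting list is what would be written into the FPGA's LUT, one entry per device page.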
My question (drum roll): How can I convert the Kernel virtual address to a User virtual address so that my User SW can gain access to the data?

  • Should I be looking at allocating the buffer from user SW (VirtualAlloc or AllocateUserPhysicalPages)? How then to get the logical address for DMA?
  • Should the driver be copying data from the common buffer to some user-space buffer? This one seems like a waste?

Thanks again everyone for your patience and continuing help.

Regards,
Juan

Comments

  • anton_bassov Member Posts: 5,092

    Go find your dirtiest pair of shoes; you may want to throw them at me.

    ...........

    How can I convert the Kernel virtual address to a User virtual address so that my User SW can gain access to the data?

    Well, you seem to realise that mapping the kernel/device memory to the userland (which happens to be an absolutely standard operation in the world of the OSes that support mmap() call) is considered a "dodgy practice" in the Windows one, right? However, for this or that reason, you still want to proceed in this direction.

    Therefore, assuming that you are well aware of all the potential "caveats" (like, for example, what happens if your app's address space gets modified by some other app by means of a WriteProcessMemory() call, or what happens if your app just terminates abnormally) and are willing to deal with them, you may want to check the following link:

    https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-mmmaplockedpagesspecifycache

    [begin ironical mode]

    PS. You seem to be anticipating a "funny" reaction from the usual suspects, don't you? Therefore, let's see how it all goes - after all, we may be lucky enough to see some "exciting" inquiries concerning your product and company name in the end of the day. In fact, if we are really lucky we may, probably, even see the above mentioned "inquiries" in capitals. In other words, I hold my breath in anticipation....

    [end ironical mode]

    Anton Bassov

  • OneKneeToe Member Posts: 30
    edited December 2019

    Hello Anton:

    Thanks for the response!

    My Approach:

    In my naive mind, what I am trying to accomplish is to allocate a memory location for my FPGA to DMA data into and for my User SW to be able to access directly. My driver doesn't need to look at the data, it just needs to expose the buffer for DMA use. I would like to avoid any need to copy data from a "Kernel buffer" to a "User Buffer".

    • For my first attempt, I looked at allocating large page memory using VirtualAlloc in my user sw. But then I wasn't able to find a method of making that memory available for DMA.

    • For my second attempt, I looked at CommonBuffer, but now I am looking for a way to expose the buffer to user sw.

    I suppose my question should have been, "What is the 'proper' way of doing the above?"

    I did look at your link and will see if that approach works for the short term; yeah, I presume exposing a pool of memory allocated by a Kernel Driver to user sw is not "best practice" - well, unless the API (memory pool allocation) was for that specific purpose.

    Light Heartedness :# :

    In reading through the posts on the forum, the "usual suspects" have been contributing for quite some time. I am sure they have seen it all. Since I have no clue what I am doing in the driver world, I know I am asking pretty basic ( dumb?? ) questions, with approaches that are probably a sacrilege. I get that, so I sometimes try to lighten things up a little.

    I am looking for a way to attend a future seminar. That should be a huge help.

    For what it's worth, I do try to give back by posting how I ended up resolving my question; with code snippets, if applicable.

    Thanks again, Anton.
    Juan

  • Peter_Viscarola_(OSR) Administrator Posts: 7,583

    If you want to avoid copying the data, why not just send it using Direct I/O (or an IOCTL using METHOD_IN_DIRECT)? And return it to users using Direct I/O (or an IOCTL using METHOD_OUT_DIRECT).
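The transfer method is encoded in the I/O control code itself, in its low two bits. As a rough illustration, the sketch below re-implements the bit layout of the CTL_CODE macro in portable C++ so it can be compiled without the Windows headers. The control code name and the function number 0x800 are hypothetical; in a real driver you would use CTL_CODE with METHOD_OUT_DIRECT directly.

```cpp
#include <cstdint>

// Transfer-method values stored in the low two bits of an I/O control code.
constexpr std::uint32_t kMethodBuffered  = 0; // METHOD_BUFFERED
constexpr std::uint32_t kMethodInDirect  = 1; // METHOD_IN_DIRECT
constexpr std::uint32_t kMethodOutDirect = 2; // METHOD_OUT_DIRECT
constexpr std::uint32_t kMethodNeither   = 3; // METHOD_NEITHER

constexpr std::uint32_t kFileDeviceUnknown = 0x22; // FILE_DEVICE_UNKNOWN
constexpr std::uint32_t kFileAnyAccess     = 0;    // FILE_ANY_ACCESS

// Same bit layout as the CTL_CODE macro:
// device type in bits 16+, access in bits 14-15, function in bits 2-13, method in bits 0-1.
constexpr std::uint32_t makeCtlCode(std::uint32_t deviceType,
                                    std::uint32_t function,
                                    std::uint32_t method,
                                    std::uint32_t access)
{
    return (deviceType << 16) | (access << 14) | (function << 2) | method;
}

// Recovers the transfer method from a control code.
constexpr std::uint32_t methodFromCtlCode(std::uint32_t code)
{
    return code & 3u;
}

// Hypothetical control code for handing a user buffer to the driver.
// Function numbers below 0x800 are reserved for Microsoft.
constexpr std::uint32_t kIoctlShareDmaBuffer =
    makeCtlCode(kFileDeviceUnknown, 0x800, kMethodOutDirect, kFileAnyAccess);
```

With a direct method, the I/O manager probes and locks the user's buffer and hands the driver an MDL describing it, so no copy is involved.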

    The fact that Windows has support for mapping user data buffers built-in, is the reason that we don't use what Anton calls "an absolutely standard operation in the world of the OSes that support mmap() call" -- We don't typically need it.

    About 90% of the time when people want to share buffers between user-mode and kernel-mode, they want to do this because they're thinking Linux and coding on Windows.

    How would you approach this, @anton_bassov ?

    Bear in mind that Anton hasn't written any code on Windows for about ten years. He just hangs out here for the abuse.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • OneKneeToe Member Posts: 30

    Hello Peter:

    Thanks again for making yourself available.

    If I understood the online documentation, I believe using the Direct I/O or IOCTL would be a significant change of implementation for me.

    Using the Direct I/O or IOCTL would require my software to make repeated calls to the driver. Each time, the driver would need to perform the mapping and configure the FPGA, which would be a command for it to flush its Page LUT, re-program the LUT with new logical addresses, and a few other things. Also, the logic within the FPGA may make adapting to this method a little complicated.

    This is why I wanted to keep with the pre-existing approach where User SW would allocate a large buffer, this time of system memory, and program the FPGA once. Then the FPGA (via DMA) and User SW would both have direct access to that memory region - the driver itself doesn't need to touch it. The FPGA would notify User SW via the driver by using interrupts and metadata. The metadata would tell User SW where in that "shared" large buffer to look at and how much memory it can use ( start address & byte size). Since Logical and Virtual addresses are all contiguous, it is OK if physical memory is not.

    • Repeated interrupts and file-reads to get the metadata is relatively light weight compared to the repeated direct I/O approach - if I understand things correctly.
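One way to picture the metadata described above is as a small record that user software validates before touching the shared buffer. This is only a sketch: the struct, its field names, and the use of an offset rather than an absolute start address are assumptions, not something taken from the actual driver.

```cpp
#include <cstdint>

// Hypothetical metadata record the driver would hand to user software
// after the FPGA signals that new data has landed.
struct DmaRegion {
    std::uint64_t offset;   // start of the new data, relative to the buffer base
    std::uint64_t byteSize; // number of valid bytes at that offset
};

// Returns true when the region lies entirely inside a buffer of bufferSize bytes.
// Written so that offset + byteSize is never computed, which avoids overflow.
bool regionIsInsideBuffer(const DmaRegion& region, std::uint64_t bufferSize)
{
    if (region.byteSize == 0) {
        return false; // nothing to read
    }
    if (region.offset >= bufferSize) {
        return false; // starts past the end
    }
    return region.byteSize <= bufferSize - region.offset;
}
```

Checking the record before dereferencing anything keeps a bad or stale notification from turning into an out-of-bounds read in user software.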

    How far off am I?

    Thank you Peter!

    Juan

  • Peter_Viscarola_(OSR) Administrator Posts: 7,583

    This is why I wanted to keep with the pre-existing approach where User SW would allocate a large buffer, this time of system memory, and program the FPGA once

    OK... that’s fine. We can do that.

    How about the user allocates this buffer, and then sends it up to you in the OutBuffer of a METHOD_OUT_DIRECT IOCTL? Keep that IOCTL pending... don’t complete it... as long as the user has the device open. When the user closes the handle, complete the IOCTL only then. Every time you want to return data to the user, DMA into this same buffer (you’ll need to figure out how to let the user know new data has arrived, but that’s the problem in any of these shared memory schemes, right?)...

    Easy, right? It is... if that’ll work for you.

    Now... there IS a bit of an issue regarding the whole “contiguous memory” issue... the device bus logical addresses are only going to be guaranteed to be contiguous when you allocate COMMON BUFFER specifically. So, if you can’t handle a scatter/gather list, you’re going to have to allocate the memory in kernel mode (again, a COMMON BUFFER as you’re already doing) and then map that memory back to user mode with MmMapLockedPagesSpecifyCache... however... This is not nearly as simple as it appears to be, and is in fact fraught with edge conditions and security implications, which is why we most ardently try to get people to not do this on Windows.
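To make the contiguity point concrete, here is a rough sketch of what a scatter/gather list for a user buffer looks like once it is expanded into per-page entries. `SgElement` is a simplified stand-in for a scatter/gather element (an address plus a length), and both helper functions are hypothetical. Whether a given device can consume non-contiguous entries depends entirely on its design.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for one scatter/gather element: a run of bytes that is
// contiguous in device-bus logical address space.
struct SgElement {
    std::uint64_t logicalAddress;
    std::uint64_t length;
};

// Expands a scatter/gather list into one entry per device page.
// Assumption: every element starts on a page boundary; if one does not,
// an empty list is returned.
std::vector<std::uint64_t> expandToPageEntries(const std::vector<SgElement>& elements,
                                               std::uint64_t pageSize)
{
    std::vector<std::uint64_t> entries;
    if (pageSize == 0) {
        return entries;
    }
    for (const SgElement& e : elements) {
        if (e.logicalAddress % pageSize != 0) {
            return {};
        }
        for (std::uint64_t off = 0; off < e.length; off += pageSize) {
            entries.push_back(e.logicalAddress + off);
        }
    }
    return entries;
}

// True when consecutive entries are exactly one page apart, which is what a
// common buffer guarantees and a locked-down user buffer does not.
bool entriesAreContiguous(const std::vector<std::uint64_t>& entries, std::uint64_t pageSize)
{
    for (std::size_t i = 1; i < entries.size(); ++i) {
        if (entries[i] != entries[i - 1] + pageSize) {
            return false;
        }
    }
    return true;
}
```

A device that can only be given a base address and a length needs `entriesAreContiguous` to hold, which is the case that pushes you toward a common buffer.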

    Direct I/O and METHOD_xxx_DIRECT are there for a good reason in Windows.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • anton_bassov Member Posts: 5,092

    The fact that Windows has support for mapping user data buffers built-in, is the reason that we don't use what
    Anton calls "an absolutely standard operation in the world of the OSes that support mmap() call"
    -- We don't typically need it.

    However, CreateFileMapping() (which happens to be a Windows equivalent of mmap() as far as "regular"
    disk files are concerned) still seems to be, for this or that reason, quite popular in the Windows userland. Surprise, surprise.......

    I don't know about you, but I have a weird feeling that the userland apps would not mind using this function with the device files either if this option was available to them.

    Let's put it the following way. Let's say some future Windows release provides a uniform memory-mapping framework that works exactly the same way with both disk and device files, and is available to anyone who wishes to make use of it. If your driver decides to make use of
    this framework, it automatically implies that a FILE_OBJECT, corresponding to a DO created by your driver, may be mapped to the userland by means of a CreateFileMapping() call. Are you going to tell your students in OSR classes to avoid this framework like the plague simply because, in your words, "we don't typically need it"????

    Therefore, the things are, probably, not as simple as you are trying to present them.

    Bear in mind that Anton hasn't written any code on Windows for about ten years. He just hangs out here for the abuse.

    I've been told on multiple occasions that validity happens to be one of the things that "ad hominem" arguments are NOT generally known for. Therefore, I was under the impression that "ad hominem"-style arguments might be reputed for anything but adding any extra "validity points" to your argumentation. Am I wrong here?

    Anton Bassov

  • Don_Burn Member - All Emails Posts: 1,673

    Before we go too far down this rathole, can I ask a few questions? How big is the buffer you need to allocate for the device? How big is the typical update to the data? And how frequent are the updates?

    I've been on both sides of this: while I am not in favor of mapping kernel memory to user space, I have done it because a device needed it. I was very careful to be sure I really needed it because the device was being delivered to Microsoft Research. As I say, I have done it; then the next client insisted they needed the same thing, when, once the above questions were answered, there was no justification.

  • Peter_Viscarola_(OSR) Administrator Posts: 7,583
    edited December 2019

    @anton_bassov No, I disagree. You need this mmap on Linux, because you don’t have Direct I/O. If you think about it, mmap() for “device files” is really a kinda dumb, hacks, idea. It means whatever the dev wants it to mean... unlike in a file system, where it gives you a view into the single, global, common, cache. For devices, it’s just a way to make up for the lack of Direct I/O... to allow you to do stuff without recopying.

    In terms of ad hominem attacks... pointing out your lack of actual recent experience on Windows is just something a questioner here should consider... it’s just a fact and not ad hominem at all.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • Michal_Vodicka Member - All Emails Posts: 72
    edited December 2019

    Bear in mind that Anton hasn't written any code on Windows for about ten years. He just hangs out here for the abuse.

    That's a really funny remark, but I'm curious. What changed in the last 10 years in the area which is discussed here? Seriously. Windows development hasn't been my main focus for about 6 years now, so I wonder what I have missed.

  • Marcel_Ruedinger Member Posts: 135

    NUMA support? Maybe a little bit more than 10 years? Couldn't NUMA compatibility also be an argument in favor of the standard way Direct I/O?

    Marcel Ruedinger
    datronicsoft

  • anton_bassov Member Posts: 5,092

    You need this mmap on Linux, because you don’t have Direct I/O. If you think about it, mmap() for “device files”
    is really a kinda dumb, hacks, idea. It means whatever the dev wants it to mean... unlike in a file system,
    where it gives you a view into the single, global, common, cache. For devices, it’s just a way to make up
    for the lack of Direct I/O... to allow you to do stuff without recopying.

    There are a few observations to be made here:

    1. I don't mean Linux in particular when speaking about " the world of the OSes that support mmap() call". This call is closely related to
      "everything is a file" concept. There is the entire family of the OSes that are based upon this concept, and Linux happens to be just one of them. In fact, if you look at the whole thing from the architectural perspective, the particular implementation of mmap()
      that Linux provides does not really seem to be the most architecturally-sound one in existence. If you have any doubts about this part, you can compare it to, say, SunOS/Solaris memory model.

    2. The "really a kinda dumb, hacks, idea" of mmap() (i.e. of providing a uniform way of mapping memory that works exactly the same way for any kind of file in existence, be it a disk file or a device one) predates Linux 0.01 by more than two decades. Furthermore, it predates not only Linux but UNIX as well. This concept is rooted in MULTICS - this is where the very idea of user apps attaching/detaching memory segments that are backed up by a store to/from their address spaces comes from.

    In actuality, mmap() as we know it is just a result of combining this concept with "everything is a file" one. As you may have guessed, this
    "merger" had been pioneered by SunOS. In fact, the very term "segment drivers" seems to make a very unambiguous reference to MULTICS
    origins (at least the conceptual ones) of memory mapping, don't you think....

    3. The very concept of METHOD_DIRECT has absolutely nothing to do with sharing a buffer between an app and a driver
      (at least it was not originally meant to be used this way). You must be confusing it with METHOD_NEITHER that was, indeed,
      intended specifically for this purpose from the very beginning of "Windows NT world". What actually happened here is that some clever bloke (I guess it was you, right?) realised that, by pending a METHOD_DIRECT-based IOCTL indefinitely, one effectively gets a shared buffer
      while, at the same time, avoiding all the extra pain in the arse that METHOD_NEITHER implies. As a result, the concept of a "Big Honking IRP" was born.

    4. In case you just cannot imagine your life without the above-mentioned "Big Honking IRPs" (you seem to be particularly fond of them these days, aren't you?), I can assure you that you can implement them under Linux just in a click of fingers.

    All you have to do is to mlock() your target buffer before passing its address to a driver. In response to your app's request, your driver will validate the address of every page in the target range, get its 'struct page' descriptor, reference it, and then map all the target pages to a virtually contiguous kernel range with vmap(). At this point your driver will be able to access the shared buffer in any context, and it is 100% safe. Certainly, taking into consideration the availability of mmap() this is, probably, not the most efficient and reasonable way of doing things under Linux, but this is already a different story.

    In other words, your "argument" about "needing this mmap on Linux, because you don’t have Direct I/O" simply
    does not stand the slightest chance.......

    In terms of ad hominem attacks... pointing out your lack of actual recent experience on Windows is just something
    a questioner here should consider... it’s just a fact and not ad hominem at all.

    Actually, I don't really find the very fact of "lacking actual recent experience on Windows" embarrassing in any possible way.
    After all, I had mentioned this fact quite a few times in this NG.

    The only thing that I am saying here is that you somehow found a way to use this fact as an "ad hominem" attack. In the end of the day,
    sometimes this is not about WHAT you had actually said but more about HOW you had said it. Let's look at your statement one more time

    [begin quote]

    Bear in mind that Anton hasn't written any code on Windows for about ten years. He just hangs out here for the abuse.

    [end quote]

    I hope you agree that the above statement in its actual form is a very obvious example of "ad hominem" argument, and was meant to be used as the one....

    Anton Bassov

  • anton_bassov Member Posts: 5,092

    What changed in the last 10 years in the area which is discussed here? Seriously. Windows development isn't my main focus
    for about 6 years now so I wonder what I have missed

    Although there are no dramatic changes anywhere in sight, there may be some relatively minor modifications/improvements that were unavailable in the earlier days of Windows. As a result, more options may be available in the more recent OS versions.

    For example, with the advent of the kernel sockets and Windows filtering platform in Vista you can be 100% sure that all your TDI experience can go down the drain. Furthermore, the advent of NDIS 6+ LWFs made NDIS 5 IM filter drivers sort of obsolete as well.

    Another example is KMDF. For example, if you check WDK samples you will see that practically all WDM samples have been removed from it. Therefore, if you advise someone to do things in WDM-like fashion you will be looked upon as a dinosaur who got brought back to life.

    You should not forget about the progress in the field of the development tools/environment either. For example, you can be sure that you are not going to be either called a "STUPID IDIOT" or requested to "PUBLISH YOUR NAME...(etc) " if you say that you want to build drivers with VC. Even more, these days you are sort of encouraged to use C++ in your drivers, so that you are quite unlikely
    to hear Linus-like anti-C++ diatribes from our hosts. In fact, I would not be too surprised if they actually start promoting the use of C++.

    In context of Peter's "argument" the very obvious example is process creation callbacks. For example, the NotificationRoutine() (i.e. the only option that was available under XP) does not allow you to block process creation right on the spot in the context of a callback, but the NotificationRoutineEx() that turned up in the more recent versions offers this functionality. IIRC, I put my foot in my mouth once because of this particular improvement.

    Anton Bassov

  • Peter_Viscarola_(OSR) Administrator Posts: 7,583

    What changed in the last 10 years in the area which is discussed here? Seriously.

    One very real and impactful change, aside from the bus drivers being totally rewritten to exhibit different practical behaviors, is the ever increasing presence of IOMMUs. There are other changes as well, including many, many, changes in power management, right? And that’s always impactful.

    But, when it comes to recency of experience, it’s not a matter of “what’s changed” as much as it’s a matter of “how have I used these concepts lately... so I know/remember the true practicalities, the implementation ‘potholes’ as I like to call them, and not just some overall general concepts of how things might/should/did work.”

    @anton_bassov You’re wrong, and your knowledge of Windows architecture is flawed. I’m not debating this with you.

    Be aware that my patience with you is getting thin. You’re not providing helpful guidance for the questioners. And you’ve not amused me lately. You trolled this thread in your initial post and I, stupidly, rewarded you by taking your bait. If I had to guess, I would guess your time here grows short.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • anton_bassov Member Posts: 5,092

    Peter,

    You’re not providing helpful guidance for the questioners.

    Well, judging from the OP's post, he was well aware of the potential issues that may arise if you map the kernel memory to the user space, but was still going to proceed this way. Therefore, I just showed him the function that allowed him to reach his objective, and dropped
    an extra hint (although, probably, not in the best way - more on it below) that this practice may be frowned upon in the Windows world...

    You trolled this thread in your initial post

    This part was, indeed, totally unnecessary on my behalf. I'm sorry for that....

    it’s not a matter of “what’s changed” as much as it’s a matter of “how have I used these concepts lately... so I know/remember
    the true practicalities, the implementation ‘potholes’ as I like to call them, and not just some overall general concepts
    of how things might/should/did work.”

    True, but my NTDEV participation (apart from the "exciting" trolling side, of course) allows me not only to stay in shape but even to learn something new, or at least to "learn what I have to learn and re-learn" due to the OS changes.....

    You’re wrong, and your knowledge of Windows architecture is flawed. I’m not debating this with you.

    Trolling issues aside, could you please explain to me what I have said wrong from the technical standpoint? I am neither trolling nor trying to prove anything to anyone - I just want to learn things for myself....

    Anton Bassov

  • OneKneeToe Member Posts: 30
    edited December 2019

    Hello Peter:

    Thanks for bearing with me and for the good info:

    I attempted the common buffer approach last night and ran into an issue with my User SW getting an access violation. I had done everything from creating the DmaEnabler to calling MmMapLockedPagesSpecifyCache in my EvtDeviceAdd, storing the addresses in the device context. Then I created an IOCTL for my User SW to call and retrieve the Virtual User Address.

    Searching the forum, I found a post, by you actually, pointing out that the MmMapLockedPagesSpecifyCache call needs to be done in the correct Context and that a good location for this would be EvtIoInCallerContext and not EvtIoDeviceControl (which is where I was going to move it to next, as a part of the IOCTL).
    https://community.osr.com/discussion/279797

    @Peter_Viscarola_(OSR)
    ...EvtIoDeviceControl (in fact, EvtIoXxxx) is called in an arbitrary process and thread context. You need to use the EvtIoInCallerContext callback...

    So, I added an EvtIoInCallerContext and moved the MmMapLockedPagesSpecifyCache to the method.
    Unfortunately, I am still getting the access violation. Code Snippets below ( to give further context ).
    I will play around with this some more; My first guess is it may be how I set up the enabler or the common buffer.

    EvtDeviceAdd

        WDF_DMA_ENABLER_CONFIG dmaEnablerConfig;
        WDF_DMA_ENABLER_CONFIG_INIT( &dmaEnablerConfig, WdfDmaProfilePacket64, 128 );
    
        dmaEnablerConfig.EvtDmaEnablerFill = NULL;
        dmaEnablerConfig.EvtDmaEnablerFlush = NULL;
        dmaEnablerConfig.EvtDmaEnablerDisable = NULL;
        dmaEnablerConfig.EvtDmaEnablerEnable = NULL;
        dmaEnablerConfig.EvtDmaEnablerSelfManagedIoStart = NULL;
        dmaEnablerConfig.EvtDmaEnablerSelfManagedIoStop = NULL;
        dmaEnablerConfig.AddressWidthOverride = 0;
        dmaEnablerConfig.WdmDmaVersionOverride = 3;
        dmaEnablerConfig.Flags = WDF_DMA_ENABLER_CONFIG_REQUIRE_SINGLE_TRANSFER;
    
        NTSTATUS status{ WdfDmaEnablerCreate( wdfDevice, &dmaEnablerConfig, WDF_NO_OBJECT_ATTRIBUTES, &( deviceContextP->myDmaEnabler ) ) };
        if( NT_SUCCESS( status ) )
        {
            WDF_COMMON_BUFFER_CONFIG CommonBufferConfig;
            WDF_COMMON_BUFFER_CONFIG_INIT( &CommonBufferConfig, FILE_128_BYTE_ALIGNMENT );
    
            status = WdfCommonBufferCreateWithConfig( deviceContextP->myDmaEnabler,
                                        deviceContextP->myCommonBufferByteSize,
                                        &CommonBufferConfig,
                                        WDF_NO_OBJECT_ATTRIBUTES,
                                        &( deviceContextP->myCommonBuffer ) );
            if( NT_SUCCESS( status ) )
            {
                deviceContextP->virtualKernelAddr = WdfCommonBufferGetAlignedVirtualAddress( deviceContextP->myCommonBuffer );
                deviceContextP->LogicalAddr = WdfCommonBufferGetAlignedLogicalAddress( deviceContextP->myCommonBuffer );
    
                RtlZeroMemory( deviceContextP->virtualKernelAddr, deviceContextP->myCommonBufferByteSize );
                deviceContextP->pMdl = IoAllocateMdl( deviceContextP->virtualKernelAddr,
                                        ( ULONG ) deviceContextP->myCommonBufferByteSize,
                                        FALSE,
                                        FALSE,
                                        NULL );
                if( NULL == deviceContextP->pMdl )
                {
                    status = STATUS_INSUFFICIENT_RESOURCES;
                    TraceEvents( TRACE_LEVEL_ERROR, DBG_INIT, "IoAllocateMdl() failed with status=[%!STATUS!]", status );
                }
                else
                {
                    MmBuildMdlForNonPagedPool( deviceContextP->pMdl );
                }
            }
            else
            {
                TraceEvents( TRACE_LEVEL_ERROR, DBG_INIT, "WdfCommonBufferCreateWithConfig() failed with status=[%!STATUS!]", status );
            }
        }
        else
        {
            TraceEvents( TRACE_LEVEL_ERROR, DBG_INIT, "WdfDmaEnablerCreate() failed with status=[%!STATUS!]", status );
        }
    

    EvtIoInCallerContext

    if( ( NULL != deviceContextP->myCommonBuffer ) && 
        ( NULL == deviceContextP->virtualUserAddr ) )
    {
        __try
        {
            deviceContextP->virtualUserAddr = MmMapLockedPagesSpecifyCache( deviceContextP->pMdl,
                                        UserMode,
                                        MmCached,
                                        NULL,
                                        FALSE,
                                        HighPagePriority );
            if( NULL == deviceContextP->virtualUserAddr )
            {
                TraceEvents( TRACE_LEVEL_ERROR, DBG_IOCTLS, "MmMapLockedPagesSpecifyCache() failed.");
            }
            else
            {
                TraceEvents( TRACE_LEVEL_INFORMATION, DBG_IOCTLS, "VirtualUserAddr=[0x%p]", deviceContextP->virtualUserAddr );
            }
        }
        __except( EXCEPTION_EXECUTE_HANDLER )
        {
            TraceEvents( TRACE_LEVEL_ERROR, DBG_IOCTLS, "MmMapLockedPagesSpecifyCache() threw an exception!" );
        }
    }
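    // If the request cannot be handed back to the framework, the driver must complete it here.
    NTSTATUS enqueueStatus{ WdfDeviceEnqueueRequest( device, request ) };
    if( !NT_SUCCESS( enqueueStatus ) )
    {
        WdfRequestComplete( request, enqueueStatus );
    }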
    

    EvtIoDeviceControl - IOCTL_GET_DMA_USER_ADDRESS

            if( NULL == deviceContextP->myCommonBuffer )
            {
                TraceEvents( TRACE_LEVEL_WARNING, DBG_IOCTLS, "NO COMMON BUFFER!!" );
            }
            else if( NULL == deviceContextP->virtualUserAddr )
            {
                TraceEvents( TRACE_LEVEL_WARNING, DBG_IOCTLS, "NO VIRTUAL ADDRESS" );
            }
            else
            {
                PDMA_USER_ADDRESS pDmaUserAddress{ nullptr };
                status = WdfRequestRetrieveOutputBuffer( request, outputBufferLength, ( PVOID* ) &pDmaUserAddress, NULL );
                if( NT_SUCCESS( status ) )
                {
                    pDmaUserAddress->virtualAddr = reinterpret_cast<UINT64>( deviceContextP->virtualUserAddr ); 
                    bytesTransferred = sizeof( DMA_USER_ADDRESS );
                }
                else
                {
                    TraceEvents( TRACE_LEVEL_ERROR, DBG_IOCTLS, "WdfRequestRetrieveOutputBuffer() failed with status=[%!STATUS!]", status );
                }
            }
            ...
            ...
           WdfRequestCompleteWithInformation( request, status, bytesTransferred );
    

    As always, thank you!

    Juan

  • OneKneeToe Member Posts: 30

    Hello Don:

    I didn't see your post before my reply to Peter.

    @Don_Burn
    How big is the buffer you need to allocate for the device? How big is the typical update to the data? And how frequent are the updates?

    At the moment, I am looking at allocating 500MB, but may want to up that to 1GB or even 2GB later. The data size can vary, but at the moment I have the FPGA configured to push 64MB packets, 2 packets a second. But the size and rate can also change.
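For a sense of scale, the arithmetic behind those figures can be sketched as below, assuming "MB" means MiB and that packets are laid down back to back in the buffer (both are assumptions; the helper names are hypothetical). A 500 MB buffer holds 7 whole 64 MB packets, the sustained rate is 128 MB/s, and at 2 packets per second the FPGA would come back around to reuse the first slot after 3.5 seconds.

```cpp
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ull * 1024ull; // assumes "MB" in the post means MiB

// Number of complete packets that fit in the buffer.
constexpr std::uint64_t wholePacketsThatFit(std::uint64_t bufferBytes, std::uint64_t packetBytes)
{
    return packetBytes == 0 ? 0 : bufferBytes / packetBytes;
}

// Sustained DMA data rate in bytes per second.
constexpr std::uint64_t bytesPerSecond(std::uint64_t packetBytes, std::uint64_t packetsPerSecond)
{
    return packetBytes * packetsPerSecond;
}

// Time before the device wraps around and overwrites the oldest packet,
// which bounds how long user software has to consume each one.
constexpr double secondsUntilReuse(std::uint64_t bufferBytes,
                                   std::uint64_t packetBytes,
                                   std::uint64_t packetsPerSecond)
{
    return packetsPerSecond == 0
        ? 0.0
        : static_cast<double>(wholePacketsThatFit(bufferBytes, packetBytes)) /
              static_cast<double>(packetsPerSecond);
}
```

Doubling the buffer to 1 GB under the same assumptions would give 16 packet slots and 8 seconds before reuse.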

    @Don_Burn
    ... while I am not in favor of mapping kernel memory to user space...

    Yes, this is at the back of my mind as I am playing around with this. Even if I get this to work, what risks have I introduced, can I plug the holes, if not, how big are the risks and do they outweigh the rewards... etc.

    It is clear from other threads and even here, that this approach is not ideal. Unfortunately, I need the guaranteed contiguous logical addresses provided by common buffer.

    @Don_Burn
    I have done it because a device needed it...

    Unfortunately, I cannot change the way the FPGA is currently designed. The current design is having the FPGA DMA data directly to another device. SW then uses other means of processing that data on that device.

    I am changing things a little, so that SW can process the data on the system. In my mind I simply want to move that destination memory buffer from the other device to system memory (move the landing spot). To the FPGA the change is transparent as it only sees logical addresses.

    Thanks for your interest and input Don.

    Juan

  • Michal_Vodicka Member - All Emails Posts: 72

    One very real and impactful change, aside from the bus drivers being totally rewritten to exhibit different practical behaviors, is the ever increasing presence of IOMMUs. There are other changes as well, including many, many, changes in power management, right? And that’s always impactful.

    Yes, but I didn't mean general OS changes but changes related to things discussed in this thread (sharing memory between user and kernel mode and IOCTLs). To me it seems that things work the same way as before when I used them.

    In context of Peter's "argument" the very obvious example is process creation callbacks. For example, the NotificationRoutin() (i.e the only option was available under the XP)

    If I count correctly, 10 years ago we already had Win7, so XP is not in question. Also, KMDF already existed, WDM drivers were outdated, and so on.

    (Well, there still was legacy usbser driver not following even XP WDM rules and WDF version wasn't available before Win10. Which fixed old bugs and introduced new ones. Real pain I had to handle recently...)

    Michal

  • OneKneeToe Member Posts: 30
    edited December 2019

    @Peter_Viscarola_(OSR)

    As a test, I added trace code into the EvtIoInCallerContext to print the data at Virtual User Address. The print worked when EvtIoInCallerContext ran and performed MmMapLockedPagesSpecifyCache - the data being 0 as expected, due to RtlZeroMemory & FPGA not yet being configured.

    However, on subsequent calls to EvtIoInCallerContext the print throws an exception. I suspect an access violation, the same as my User SW sees.

    EvtIoInCallerContext (added printing)

        // Check and if no Virtual User Address, call MmMapLockedPagesSpecifyCache (See previous post for details).
       ...
        if( NULL != deviceContextP->virtualUserAddr )
        {
            UINT64 *pBuffer = ( UINT64* ) deviceContextP->virtualUserAddr;
            for( auto x{ 0 }; x < 10; ++x )
            {
                __try
                {
                    TraceEvents( TRACE_LEVEL_INFORMATION, DBG_IOCTLS
                        , "Addr[%u] @[0x%p] = [%llu]"
                        , x
                        , pBuffer
                        , *( pBuffer + x ) );
                }
                __except( EXCEPTION_EXECUTE_HANDLER )
                {
                    TraceEvents( TRACE_LEVEL_ERROR, DBG_IOCTLS, "EXCEPTION in TraceEvent" );
                }
            }
        }
    
  • Tim_Roberts Member - All Emails Posts: 13,204

    Dumb question: Is the second EvtIoInCallerContext from the exact same process as the first call? Remember that your virtualUserAddr is only valid for that one process. If you had a quick test app and then started another test app, that address is no longer valid.

    And that points out a bug in your code. In your ioctl handler, you're only doing the mapping if virtualUserAddr is null. That only works if the calling process never ends, unless you are zeroing out that field when the app exits. Personally, I'd just eliminate that check and ALWAYS do the mapping, even if the field already has a value. Windows won't create a new mapping if one already exists.

    Tim Roberts
    Providenza & Boekelheide, Inc.

  • OneKneeToe Member Posts: 30
    edited December 2019

    @Peter_Viscarola_(OSR)
    @Tim_Roberts
    (Tim: I was in the middle of writing when you posted - Not a dumb question at all :smiley: )

    I found my answer, after some lunchtime googling:

    https://osr.com/blog/2014/04/15/evtioincallercontext-callback-called-even-io-operations-dont-queue/

    ... The other point of view was getting what are essentially unexpected (and, to your driver, unsupported) Requests in EvtIoInCallerContext was an annoyance...

    As Tim pointed out above, the process context in which EvtIoInCallerContext was first called (and MmMapLockedPagesSpecifyCache executed) was not the context of my User SW.

    To ensure MmMapLockedPagesSpecifyCache is executed only for my IOCTL_GET_DMA_USER_ADDRESS request, I added the code below at the beginning of EvtIoInCallerContext:

    EvtIoInCallerContext

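    NTSTATUS enqueueStatus;
    WDF_REQUEST_PARAMETERS requestParams;
    WDF_REQUEST_PARAMETERS_INIT( &requestParams );
    WdfRequestGetParameters( request, &requestParams );
    
    // Only the mapping IOCTL needs the caller's process context; IoControlCode
    // is valid only for device-control requests. Everything else is requeued.
    if( WdfRequestTypeDeviceControl != requestParams.Type
        || IOCTL_GET_DMA_SYSTEM_MEMORY != requestParams.Parameters.DeviceIoControl.IoControlCode )
    {
        enqueueStatus = WdfDeviceEnqueueRequest( device, request );
        if( !NT_SUCCESS( enqueueStatus ) )
        {
            // The driver still owns the request if the enqueue fails.
            WdfRequestComplete( request, enqueueStatus );
        }
        return;
    }
    ...
    ... (see previous posts for more details)
    ...
    enqueueStatus = WdfDeviceEnqueueRequest( device, request );
    if( !NT_SUCCESS( enqueueStatus ) )
    {
        WdfRequestComplete( request, enqueueStatus );
    }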
    

    I re-ran my driver and the trace prints now work, showing data once the FPGA was configured to run (not sure about the quality of the data, but that's a future topic).

    Thank you Tim & Peter

    Juan

  • OneKneeToe Member Posts: 30

    On to the next "battle" and maybe one of the "hazard" areas of this approach - unmapping.
    https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-mmunmaplockedpages

    Note that if the call to MmMapLockedPages or MmMapLockedPagesSpecifyCache specified user mode, the caller must be in the context of the original process before calling MmUnmapLockedPages. This is because the unmapping operation occurs in the context of the calling process, and, if the context is incorrect, the unmapping operation could delete the address range of a random process.

    I now need to add an IOCTL to support unmapping. Without this, if I restart my User SW, the Virtual User Address is not valid for this new instance (as @Tim_Roberts pointed out previously).

    And the hazard - What happens if my User SW crashes or closes without unmapping? Would I even be able to perform another mapping when a previous one already exists?

    For now, I will march ahead and add an IOCTL to support unmapping. (Though, I will give VirtualAlloc with Large Pages another try later on.)

    Juan

  • anton_bassov Member Posts: 5,092

    At the moment, I am looking at allocating 500MB, but may want to up that to 1GB or even 2GB later. The data size can vary,
    but at the moment I have the FPGA configured to push 64MB packets, 2 packets a second. But the size and rate can also change.

    .........

    I need the guaranteed contiguous logical addresses provided by common buffer.

    Before you proceed with your "conquest" any further, I would recommend that you take the following points into consideration.

    If the target system does not support an IOMMU, the '(logical_address==physical_address)' statement always
    evaluates to TRUE. Don't you see any potential problem with finding a physically contiguous buffer in the GB range???

    Certainly, as long as the target machine is equipped with an IOMMU, the system is in a position to present a physically non-contiguous buffer as
    a logically contiguous one to your device. However, you should bear in mind that not every machine in existence is going to support VT-d or AMD-Vi.

    For example, if you check the link below, you will see that assuming an IOMMU is present on the target machine is way too bold:

    https://en.wikipedia.org/wiki/List_of_IOMMU-supporting_hardware

    As you can see for yourself, the list is not THAT long.

    Certainly, it is up to you to make a decision, but I think you may find it frustrating to find out that the design that you have
    spent so much time and effort on is, for all practical purposes, simply infeasible....

    Anton Bassov

  • Tim_Roberts Member - All Emails Posts: 13,204

    I now need to add an IOCTL to support unmapping. Without this, if I restart my User SW, the Virtual User Address is not valid for this new instance (as @Tim_Roberts pointed out previously).

    And the hazard - What happens if my User SW crashes or closes without unmapping?

    Right. This is why you have to handle EvtFileCleanup / EvtFileClose events, so you can clean up your dirty work when the app crashes or closes without cleanly shutting down. You CANNOT rely upon the application to clean up for you. You need to assume that all application writers are bozos, and malicious bozos at that.

    Would I even be able to perform another mapping when a previous one already exists?

    Not as the driver is currently written, but that's a driver problem, not an architectural problem. The operating system doesn't care how many times you map a piece of memory, but your driver certainly does.

    Tim Roberts
    Providenza & Boekelheide, Inc.

  • Peter_Viscarola_(OSR) Administrator Posts: 7,583

    Juan, just to be clear: You’re getting further and further down the road into a design that is going to wind up either having serious edge-condition issues with security implications... OR one that’s going to need to include some clever code to take into account and handle these edge conditions. It’s a lot to ask to try to get this right one forum post at a time, without a good background in Windows kernel mode software.

    So... think about it. Do you really want to be doing this? You’re venturing into an area that I advise the students in my Advanced WDF seminar to avoid.

    Having said that, Mr. Roberts is right on target: You need to do the Unmap operation in your EvtFileCleanup Event Processing Callback. This is called in the context of the process that called CloseHandle. You don’t need or want a separate IOCTL for this. If you do it right, it’ll handle the unmap “automatically” even during abnormal thread termination.
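
    The bookkeeping this implies can be modeled in plain user-mode C. What follows is only a sketch of the record-keeping, not driver code: the type and function names are hypothetical, process IDs stand in for "the current process context", and in a real KMDF driver the record would presumably live in a file-object context with MmMapLockedPagesSpecifyCache / MmUnmapLockedPages doing the actual work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-handle record of one user-mode mapping. */
typedef struct {
    uintptr_t user_va;    /* 0 means "no mapping recorded" */
    uint32_t  owner_pid;  /* process whose address space holds the mapping */
} mapping_record;

/* Model of the mapping IOCTL: remember which process the mapping belongs to.
 * Returns false if this handle already has a mapping or the address is 0. */
bool record_map(mapping_record *rec, uint32_t current_pid, uintptr_t new_user_va)
{
    assert(rec != NULL);
    if (rec->user_va != 0 || new_user_va == 0) {
        return false;
    }
    rec->user_va = new_user_va;
    rec->owner_pid = current_pid;
    return true;
}

/* Model of the cleanup callback: undo the mapping only when running in the
 * owning process. Returns true if the record was cleared. */
bool record_cleanup(mapping_record *rec, uint32_t current_pid)
{
    assert(rec != NULL);
    if (rec->user_va == 0) {
        return false;               /* nothing to undo */
    }
    if (rec->owner_pid != current_pid) {
        return false;               /* wrong context: unmapping here is unsafe */
    }
    rec->user_va = 0;
    rec->owner_pid = 0;
    return true;
}
```

    The point of the model is that the unmap decision is driven by the driver's own record and the context the cleanup runs in, never by the application remembering to ask for it.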

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • OneKneeToe Member Posts: 30

    Hello @anton_bassov

    Yes, being able to secure GB(s) of physically contiguous memory is a concern, and it is one reason for having the driver perform the allocation: I assume (maybe wrongly) that the driver will have a better chance of getting that memory during start-up than User SW.

    The alternative is using VirtualAlloc Large Pages in User SW. The Large Page size would be a multiple of the FPGA page size. This way, even if the logical pages are not contiguous, one+ FPGA pages can be DMAd into one System Large Page.

    I know I am not completely in the clear with Large Pages as the system will still need to find enough contiguous memory for all the Large Pages needed to make up my desired buffer size. Since this is all happening during system start-up, and with 32GB of physical RAM, I would hope this is not a problem.
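
    To make the Large Page idea concrete, here is a minimal sketch of how a per-FPGA-page address list could be derived from a set of non-contiguous large pages. It is an illustration only: the function name is hypothetical, and how the driver would obtain a device-visible address for each large page is outside the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fills lut[] with one device-visible address per FPGA page, walking the
 * large pages in the order given. The large pages need not be contiguous.
 * Returns the number of entries written, or 0 if the sizes are invalid or
 * the LUT is too small. */
size_t build_fpga_lut(const uint64_t *large_page_addrs, size_t large_page_count,
                      uint64_t large_page_size, uint64_t fpga_page_size,
                      uint64_t *lut, size_t lut_capacity)
{
    assert(large_page_addrs != NULL && lut != NULL);

    /* The large page size must be a non-zero multiple of the FPGA page size. */
    if (fpga_page_size == 0 || large_page_size == 0
        || large_page_size % fpga_page_size != 0) {
        return 0;
    }

    uint64_t per_large = large_page_size / fpga_page_size;

    /* Reject the request if the LUT cannot hold every entry. */
    if (large_page_count > lut_capacity / per_large) {
        return 0;
    }

    size_t n = 0;
    for (size_t i = 0; i < large_page_count; ++i) {
        for (uint64_t j = 0; j < per_large; ++j) {
            lut[n++] = large_page_addrs[i] + j * fpga_page_size;
        }
    }
    return n;
}
```

    For example, two 2MB large pages at 0x40000000 and 0x10000000 with a (made-up) 1MB FPGA page size give four LUT entries: 0x40000000, 0x40100000, 0x10000000, 0x10100000.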

    FYI: when I attempted to allocate 1GB using the CommonBuffer approach, the allocation failed. I was successful in allocating a 750MB CommonBuffer, however.

    Thanks, Anton.
    Juan

  • OneKneeToe Member Posts: 30

    Hello @Peter_Viscarola_(OSR) and @Tim_Roberts

    Thank you both again for your help and patience.

    Regarding the unmapping, I went ahead with the IOCTL approach, but not because I disagreed with anything you two said, but because I decided to end my experiment with the CommonBuffer approach.

    I had success allocating 500MB and having my FPGA DMA the data to both system memory and the second external device, ping-ponging back and forth between External and System Memory. I ended up having SW perform a copy to the external device, to allow that device to process the data.

    The results were seamless and I could not tell the difference between processed data coming directly from the FPGA or via System Memory.

    • A little change in plans, but it was enough to prove that using System Memory as a landing spot is doable.
    • Not to mention the warning you both gave about this approach.
    • BTW: performing a print-screen caused a BSOD - a corner case ;-)

    My next step is to attempt allocating the buffer in User SW via VirtualAlloc and using Large Pages, then feed that address to the Driver. Still not the conventional approach that Peter pointed me to, but I suspect better than the CommonBuffer approach I am using now.

    • I suspect I would still need to use EvtFileCleanup/EvtFileClose if my SW were to end abruptly.

    Thank you again! I am having a good time and learning a little bit as I go.
    Juan

  • OneKneeToe Member Posts: 30
    edited December 2019

    Hello All:

    Just to close off this thread - the code snippets below are what I used in my driver to allocate a common buffer for DMA use and expose that common buffer to User SW for direct access.


    [Mods: With his permission, and in no way intending to show any disrespect, we have removed the code example Mr. OneKneeToe provided here. We did this because, in our judgement, it was likely to create more issues than it solved for future devs who encounter this thread. We understand that Mr. OneKneeToe had a very specific need, and that the solution he posted here met that need for him.

    Indeed, we are grateful to Mr. OneKneeToe for taking the time to "give back" to the Community by posting the code that worked for him, in his specific situation. We just don't want people to copy it in the future without being aware of its limitations.]

    Post edited by Peter_Viscarola_(OSR) on
  • anton_bassov Member Posts: 5,092

    The alternative is using VirtualAlloc Large Pages in User SW. The Large Page size would be a multiple of the FPGA page size.
    This way, even if the logical pages are not contiguous, one+ FPGA pages can be DMAd into one System Large Page.

    If you don't mind, could you please expand on this a bit? It may simply be a case of mis-phrasing your idea, but, judging from the above statement as presented, you've got to learn quite a bit of system-level basics before even thinking about writing drivers....

    Anton Bassov

  • Peter_Viscarola_(OSR) Administrator Posts: 7,583
    edited December 2019

    Sigh! I’m really struggling with leaving your code examples inline in this thread... even given the time/effort you obviously put into posting them and formatting them properly.

    The code you’ve posted is all really just prototype code that shows how to call some APIs... but is in no way production quality or ready for use outside a lab/testing setting.

    In fact, you seem to have ignored just about every piece of advice I gave you in this thread.

    I’m not encouraged.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • OneKneeToe Member Posts: 30

    Good Morning @Peter_Viscarola_(OSR)

    Ignoring your advice:

    When I read "Ignored" I felt it carried a negative connotation. I like to think that I listened to your advice, looked to see if I could make use of it given the task at hand, and found that I could not take it. It definitely was not a case of, "what you say doesn't matter". Quite the opposite. The above approach was a means-to-an-end, if you will.

    In fact, since I met my intended goal with the above, I've stopped work on this approach. I will attempt the VirtualAlloc approach with a special Direct I/O call that stays uncompleted for the life of the program (something along those lines).

    Leave or Remove:

    I would not be hurt if you were to remove the post. My intention was to give back. I did add a caution section echoing the concerns and asking readers to read the thread. Nevertheless, you have been at this far longer than I and, after all, you are a moderator.

    Appreciation:

    Your help and advice are appreciated. There really is no other resource out there. Documentation exists, but it usually doesn't go into these details and nothing beats having an experienced person to talk things through with. Thanks to you and @Tim_Roberts.

    Best Regards,
    Juan
