Page size for x64

rob18767 Member Posts: 23

The page size for my x64 system is 4K, which I understand cannot be altered.

I have a PCIe based driver with an FPGA that allows me to change the page size from 4K to 8K, 16K, 32K etc. all the way up to 4GB for dynamic address translation.

I needed to change my common buffer size to 64KB so I added the device addresses for the 64K buffer and changed the page size to 64K on the FPGA. The code ran intermittently under certain conditions but was completely unreliable.

I used the same common buffer size of 64K but broke this into 16 x 4K chunks on the FPGA side and the code runs beautifully.

So, do I always need to use a page size of 4K for the FPGA on x64 systems?

Comments

  • rob18767 Member Posts: 23

    No because I was concerned about this:

    "If your driver specifies an alignment requirement that is greater that the computer's page size (PAGE_SIZE), the logical addresses that the WdfCommonBufferGetAlignedLogicalAddress method returns are always aligned to the specified alignment requirement, but the virtual addresses that the WdfCommonBufferGetAlignedVirtualAddress method returns might not be aligned to the alignment requirement."

    Am I right to be concerned?

    I suppose the logical address is the one I care about. I will branch off in subversion and try this.

  • Eric_Wittmayer Member Posts: 31

    I'm trying to think of how the virtual address alignment not matching the requirement would cause you an issue but I can't come up with one.
    Maybe @Peter_Viscarola_(OSR) or one of the other true experts on the forum can provide a more definite answer. :)

  • Tim_Roberts Member - All Emails Posts: 13,158

    I'm not at all sure what "an FPGA that allows me to change the page size" means. A PCIe device only cares about host addressing when it's doing DMA.
    In that case, you usually provide it a set of descriptors, where each one has a starting address and a length. As long as you're using a contiguous common buffer, it shouldn't matter whether you provide one 64kB descriptor or 16 4kB descriptors. They all describe the same space.

    The virtual address alignment is totally irrelevant to your device. All it cares about is the physical addressing.

    Tim Roberts
    Providenza & Boekelheide, Inc.

  • rob18767 Member Posts: 23

    @Tim_Roberts said:
    I'm not at all sure what "an FPGA that allows me to change the page size" means. A PCIe device only cares about host addressing when it's doing DMA.

    I am a hardware guy :)

    On the IP set up for the PCIe block I can change the page size of the address translation table and the number of pages.

    The address translation table is where I store the common buffers' logical addresses.

    I guess that extra gumph of logical and virtual addresses threw me a bit in the Microsoft document.

  • Jamey_Kirby Member - All Emails Posts: 439

    You can use large pages. I've used them in the past. It takes a little work to set up the privileges. Here is more information: https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support

    Then you can do something like this: VirtualAlloc(NULL, n_bytes, MEM_COMMIT|MEM_LARGE_PAGES, PAGE_READWRITE);

  • rob18767 Member Posts: 23

    I will look into large pages if and when we need them. Thank you and everyone else.

    Maybe this link will throw some light on what I am trying to describe.

    https://social.msdn.microsoft.com/Forums/Windows/en-US/03170c26-524f-4dfd-8053-91128953d6f0/changing-pagesize-value-in-driver?forum=wdk

  • rob18767 Member Posts: 23

    Oh I forgot to add that I changed the alignment (WdfDeviceSetAlignmentRequirement) to 65535 and I got the same wacky results with 64K page size. The code sometimes works, it sometimes does not.

    Splitting the 64K common buffer over 16 4K PCIe buffers works though.

  • Tim_Roberts Member - All Emails Posts: 13,158

    OK, but the key thing for you to remember is that the "page size" as used by your FPGA is totally unrelated to the page size used by the processor. You're talking about an address translation inside the FPGA. If you have problems when you change the FPGA's page size, then you aren't setting up the FPGA's address table correctly.

    Tim Roberts
    Providenza & Boekelheide, Inc.

  • msr Member Posts: 342

    Probably doesn't matter, but just in case, set WdmDmaVersionOverride to 3 and use the proper DMA profile.

  • Peter_Viscarola_(OSR) Administrator Posts: 7,505

    So, do I always need to use a page size of 4K for the FPGA on x64 systems?

    So.... that works. So... why not just use that?

    Since you're a "hardware guy" and to avoid any confusion: Whenever you read "Logical Address" in the WDK docs, substitute "Physical Address" in your mind. The use of the term "Logical Address" in the WDK is short for "Device Bus Logical Address" and not the same thing that the Intel processor docs refer to as "Logical Address." Technically speaking, from a Windows architecture perspective, "Device Bus Logical Address" is not really the same as a physical address... but for the purposes of our discussion we can say they're the same.

    The Page Size -- which is the granularity at which blocks of physical memory are allocated and on which virtual addresses are translated -- is defined by Windows, and is not changeable. This page size is 4K (modulo large pages, which don't concern you).

    I have no idea what the page size in your IP block relates to. The Common Buffer that you allocate in host memory (that IS what we're talking about, right?) will effectively be PHYSICALLY contiguous... but will by default be aligned on a 4K physical address boundary that is convenient to Windows.

    Does that help any?

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • rob18767 Member Posts: 23

    We are using the 4K method for now. I was just wondering if I was making a mistake by specifying in the FPGA 16 address pages x 64K page size of memory instead of 256 address pages x 4K page size (256 address pages uses up 256 out of 512 total locations in the address translation table of the PCIe IP core so that could be an issue in the future).

    So I have 64K common buffers for either of the solutions.

    For the 64K page size I simply specify the base logical address of each common buffer in the relevant location in the address translation table of the FPGA. I do this for 16 address pages, covering the 16 x 64K common buffers.

    For the 4K page size I use 16 contiguous locations in the address translation table with the common buffer base logical address and offsets of 4096. The 16 x 64 K common buffers are broken down into 256 x 4K buffers using this method.

    It seems to me that the 64K method is simpler to write but I get strange results: sometimes it works (including overnight testing), sometimes I see 1/4 to 3/4 of the data in the buffer and sometimes I get nothing. Restarting the machine, or shutting it down and starting it again, changes the behavior.

    The 4K method works, for now, however it does use up a lot of address table entries so my boss has asked me to continue investigating.

    I say I am a "hardware guy" however I did work with the guy who wrote the original drivers for Linux for this system while I did the FPGA work. He went to work for Microsoft and that basically left me as the only person with a clue what goes on in this product. I then was asked to transfer the drivers to Windows.
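The two table layouts described in this post (one 64K entry per common buffer versus sixteen 4K entries per common buffer) can be sketched in plain C as below. This is an illustration only: `fill_translation_table`, `BUFFER_SIZE` and `TABLE_ENTRIES` are hypothetical names, the table is modeled as a simple array of 64-bit logical addresses, and nothing here reflects the real register interface of the PCIe IP core. The 512-entry limit is the figure quoted in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUFFER_SIZE   (64u * 1024u)  /* each common buffer is 64K (from the post) */
#define TABLE_ENTRIES 512u           /* table size quoted in the post */

/*
 * Fill `table` so that it covers `buffer_count` buffers of BUFFER_SIZE bytes
 * using FPGA pages of `page_size` bytes. Each buffer gets consecutive entries:
 * base, base + page_size, base + 2 * page_size, ...
 * Returns the number of entries used, or 0 if the page size does not divide
 * the buffer size or the table is too small.
 */
static size_t fill_translation_table(uint64_t *table,
                                     const uint64_t *buffer_base,
                                     size_t buffer_count,
                                     uint32_t page_size)
{
    if (page_size == 0 || BUFFER_SIZE % page_size != 0) {
        return 0;
    }

    size_t pages_per_buffer = BUFFER_SIZE / page_size;
    if (buffer_count * pages_per_buffer > TABLE_ENTRIES) {
        return 0;
    }

    size_t n = 0;
    for (size_t b = 0; b < buffer_count; b++) {
        for (size_t p = 0; p < pages_per_buffer; p++) {
            table[n++] = buffer_base[b] + (uint64_t)p * page_size;
        }
    }
    return n;
}
```

With sixteen 64K buffers, a page size of 65536 uses 16 entries and a page size of 4096 uses 256 entries, which matches the counts given in the post.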

  • Peter_Viscarola_(OSR) Administrator Posts: 7,505

    It seems to me that the 64K method is simpler to write but I get strange results

    It seems to ME this is an issue that lives on the FPGA side, not on the host side. On the host side, your driver is giving your device the Device Bus Logical Address of your Common Buffer(s) and your FPGA is then... doing something. I still can't figure out what, or exactly what address translation is being done on the FPGA. In a Common Buffer method, the device side usually accesses the Common Buffer via DMA. So.... I'm confused.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • rob18767 Member Posts: 23

    With 16 address pages of 64K page size.

    The lower 16 bits of the address, bits [15:0], carry the byte offset within the 64K page. The next four more significant bits, bits [19:16], select one of the sixteen pages.

    I then pass this into the FPGA (hardware stuff) and the FPGA uses bits [19:16] to decode which selections from the address translation table to use.

    So if I have sixteen 64K byte common buffers to use (0 through 15)

    Address translation entry 0 has the base logical address of common buffer 0
    Address translation entry 1 has the base logical address of common buffer 1
    .
    .
    .
    .
    .
    Address translation entry 15 has the base logical address of common buffer 15

    So bits [19:16] of the address I provide to the hardware effectively selects the logical base address of one of the 16 common buffers.

    Bits [15:0] of the address I provide are the offset into the 64K buffer.
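The decoding described in this post can be sketched as a few lines of C. The helper names (`page_index`, `page_offset`, `translate`) are hypothetical. The sketch also makes one loud assumption that the thread does not confirm: that the hardware forms the bus address by concatenating the upper bits of the table entry with the 16-bit offset, rather than adding the two. If the hardware works that way, a table entry whose low 16 bits are not zero cannot be represented faithfully.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  16u                           /* 64K pages: offset is bits [15:0] */
#define PAGE_COUNT  16u                           /* bits [19:16] select 1 of 16 pages */
#define OFFSET_MASK ((1ull << PAGE_SHIFT) - 1ull)

/* Bits [19:16]: which address translation entry to use. */
static uint32_t page_index(uint32_t addr)
{
    return (addr >> PAGE_SHIFT) & (PAGE_COUNT - 1u);
}

/* Bits [15:0]: byte offset inside the selected 64K buffer. */
static uint32_t page_offset(uint32_t addr)
{
    return addr & (uint32_t)OFFSET_MASK;
}

/*
 * ASSUMPTION: the hardware keeps only the upper bits of the table entry and
 * substitutes the offset for the low 16 bits (concatenation, not addition).
 */
static uint64_t translate(const uint64_t table[PAGE_COUNT], uint32_t addr)
{
    uint64_t base = table[page_index(addr)];
    return (base & ~OFFSET_MASK) | page_offset(addr);
}
```

Under that assumption, an entry of 0x3F830000 with offset 0x1234 yields 0x3F831234, while an entry that is only 4K-aligned, such as 0x3F83D000, would be treated as 0x3F830000 and the access would land in the wrong place.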

  • Peter_Viscarola_(OSR) Administrator Posts: 7,505

    OK. You've just explained how address decoding works. I'm sorry, but that doesn't answer my question or help me in any way.

    I'm not sure it matters at this point anyhow. As long as you've got your device working with your driver, it's all good as far as I'm concerned.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • rob18767 Member Posts: 23
    My bad Peter. I confused Tim with you.

    I am off to the Intel FPGA forum to bug them for now. I do not think this is a Windows issue at this point in time.
  • rob18767 Member Posts: 23
    Thank you all though!!!!
  • rob18767 Member Posts: 23
    edited November 18

    Okay I found what looks like an issue.

    I tried to use 16KB buffers instead of 64K and tried to set the alignment accordingly.

    My code for setting up the DMA/common buffers is as follows:

    #define FULLBUFF 16384

    //
    // evio DMA_TRANSFER_ELEMENTS must be 16Kbyte aligned
    //
    WdfDeviceSetAlignmentRequirement(DevExt->Device,
    16383);

    //
    // Create a new DMA Enabler instance.
    // for Common Buffer
    //
    {
        WDF_DMA_ENABLER_CONFIG   dmaConfig;
    
        WDF_DMA_ENABLER_CONFIG_INIT(&dmaConfig,
            WdfDmaProfilePacket,
            DevExt->MaximumTransferLength);
    
        TraceEvents(TRACE_LEVEL_INFORMATION, DBG_PNP,
            " - The DMA Profile is WdfDmaProfilePacket");
    
        //
        // Opt-in to DMA version 3, which is required by
        // WdfDmaTransactionSetSingleTransferRequirement
        //
        dmaConfig.WdmDmaVersionOverride = 3;
    
        status = WdfDmaEnablerCreate(DevExt->Device,
            &dmaConfig,
            WDF_NO_OBJECT_ATTRIBUTES,
            &DevExt->DmaEnabler);
    
        if (!NT_SUCCESS(status)) {
    
            TraceEvents(TRACE_LEVEL_ERROR, DBG_PNP,
                "WdfDmaEnablerCreate failed: %!STATUS!", status);
            return status;
        }
    
    }
    
    
    
    //
    // Allocate common buffer for building reads
    //
    
    for (i = 0; i < 6; i++)
    {
        status = WdfCommonBufferCreate(DevExt->DmaEnabler,
            FULLBUFF,
            WDF_NO_OBJECT_ATTRIBUTES,
            &DevExt->ADCBuffers[i].ReadCommonBuffer);
    
        if (!NT_SUCCESS(status)) {
            TraceEvents(TRACE_LEVEL_ERROR, DBG_PNP,
                "WdfCommonBufferCreate (read) failed %!STATUS!", status);
            return status;
        }
    
        DevExt->ADCBuffers[i].ReadCommonBufferSize = FULLBUFF;
    
        DevExt->ADCBuffers[i].ReadCommonBufferBase =
            WdfCommonBufferGetAlignedVirtualAddress(DevExt->ADCBuffers[i].ReadCommonBuffer);
    
        DevExt->ADCBuffers[i].ReadCommonBufferBaseLA =
            WdfCommonBufferGetAlignedLogicalAddress(DevExt->ADCBuffers[i].ReadCommonBuffer);
    
        TraceEvents(TRACE_LEVEL_ERROR, DBG_PNP,
            "Alignment is %d", WdfDeviceGetAlignmentRequirement(DevExt->Device));
    
        TraceEvents(TRACE_LEVEL_ERROR, DBG_PNP,
            "Low part %d is %x", i, DevExt->ADCBuffers[i].ReadCommonBufferBaseLA.LowPart);
    
        TraceEvents(TRACE_LEVEL_ERROR, DBG_PNP,
            "High part %d is %x", i, DevExt->ADCBuffers[i].ReadCommonBufferBaseLA.HighPart);
    
        RtlZeroMemory(DevExt->ADCBuffers[i].ReadCommonBufferBase,
            DevExt->ADCBuffers[i].ReadCommonBufferSize);
    
        TraceEvents(TRACE_LEVEL_INFORMATION, DBG_PNP,
            "ReadCommonBuffer  0x%p  (#0x%I64X), length %I64d",
            DevExt->ADCBuffers[i].ReadCommonBufferBase,
            DevExt->ADCBuffers[i].ReadCommonBufferBaseLA.QuadPart,
            WdfCommonBufferGetLength(DevExt->ADCBuffers[i].ReadCommonBuffer));
    }
    

    When I run a trace on this I get.

    00000009 evio 4 96 1 9 11\18\2019-15:14:37:413 - The DMA Profile is WdfDmaProfilePacket
    00000010 evio 4 96 1 10 11\18\2019-15:14:37:413 Alignment is 16383
    00000011 evio 4 96 1 11 11\18\2019-15:14:37:413 Low part 0 is 3f83d000
    00000012 evio 4 96 1 12 11\18\2019-15:14:37:413 High part 0 is 0
    00000013 evio 4 96 1 13 11\18\2019-15:14:37:413 ReadCommonBuffer 0xFFFFBE829843D000 (#0x3F83D000), length 16384
    00000014 evio 4 96 1 14 11\18\2019-15:14:37:413 Alignment is 16383
    00000015 evio 4 96 1 15 11\18\2019-15:14:37:413 Low part 1 is 3f8f4000
    00000016 evio 4 96 1 16 11\18\2019-15:14:37:413 High part 1 is 0
    00000017 evio 4 96 1 17 11\18\2019-15:14:37:413 ReadCommonBuffer 0xFFFFBE82984F4000 (#0x3F8F4000), length 16384
    00000018 evio 4 96 1 18 11\18\2019-15:14:37:413 Alignment is 16383
    00000019 evio 4 96 1 19 11\18\2019-15:14:37:413 Low part 2 is 3f8a8000
    00000020 evio 4 96 1 20 11\18\2019-15:14:37:413 High part 2 is 0
    00000021 evio 4 96 1 21 11\18\2019-15:14:37:413 ReadCommonBuffer 0xFFFFBE82984A8000 (#0x3F8A8000), length 16384
    00000022 evio 4 96 1 22 11\18\2019-15:14:37:413 Alignment is 16383
    00000023 evio 4 96 1 23 11\18\2019-15:14:37:413 Low part 3 is 3f914000
    00000024 evio 4 96 1 24 11\18\2019-15:14:37:413 High part 3 is 0
    00000025 evio 4 96 1 25 11\18\2019-15:14:37:413 ReadCommonBuffer 0xFFFFBE8298514000 (#0x3F914000), length 16384
    00000026 evio 4 96 1 26 11\18\2019-15:14:37:413 Alignment is 16383
    00000027 evio 4 96 1 27 11\18\2019-15:14:37:413 Low part 4 is 3f82c000
    00000028 evio 4 96 1 28 11\18\2019-15:14:37:413 High part 4 is 0
    00000029 evio 4 96 1 29 11\18\2019-15:14:37:413 ReadCommonBuffer 0xFFFFBE829842C000 (#0x3F82C000), length 16384
    00000030 evio 4 96 1 30 11\18\2019-15:14:37:413 Alignment is 16383
    00000031 evio 4 96 1 31 11\18\2019-15:14:37:413 Low part 5 is 3f8ae000
    00000032 evio 4 96 1 32 11\18\2019-15:14:37:413 High part 5 is 0
    00000033 evio 4 96 1 33 11\18\2019-15:14:37:413 ReadCommonBuffer 0xFFFFBE82984AE000 (#0x3F8AE000), length 16384

    Looking at

    Low part 0 is 3f83d000

    For example.

    That looks to me to be aligned on a 4KB boundary, which if true would explain why I am having issues (not with my hardware but with my code).

    For a 16KB aligned boundary shouldn't the lowest 14 bits of the device address always be zero?

    Am I reading something incorrectly, doing something stupid or something else?

    Any help would be appreciated.
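The question in this post can be checked mechanically. The sketch below is ordinary user-mode C, not driver code, and the helper names `is_aligned` and `count_misaligned` are hypothetical. It applies the "lowest 14 bits must be zero" test to the six logical addresses copied from the trace above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if `address` is a multiple of `alignment` (which must be a power of two). */
static bool is_aligned(uint64_t address, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (address & (alignment - 1)) == 0;
}

/* Logical addresses of the six common buffers, copied from the trace. */
static const uint64_t traced_addresses[6] = {
    0x3F83D000ull, 0x3F8F4000ull, 0x3F8A8000ull,
    0x3F914000ull, 0x3F82C000ull, 0x3F8AE000ull
};

/* Count how many addresses fail the given alignment. */
static size_t count_misaligned(const uint64_t *addresses, size_t count,
                               uint64_t alignment)
{
    size_t bad = 0;
    for (size_t i = 0; i < count; i++) {
        if (!is_aligned(addresses[i], alignment)) {
            bad++;
        }
    }
    return bad;
}
```

For these values, all six addresses are 4K-aligned, but buffers 0 (0x3F83D000) and 5 (0x3F8AE000) are not 16K-aligned. The other four happen to fall on 16K boundaries, which fits the intermittent behavior reported earlier in the thread.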

  • rob18767 Member Posts: 23

    To add: ADCBuffers[] are of type READ_PAGES, which I define as follows:

    typedef struct READ_PAGES {
        // Read
        WDFCOMMONBUFFER         ReadCommonBuffer;
        size_t                  ReadCommonBufferSize;
        _Field_size_(ReadCommonBufferSize) PUCHAR ReadCommonBufferBase;
        PHYSICAL_ADDRESS        ReadCommonBufferBaseLA;   // Logical Address
    } READ_PAGES;

  • Peter_Viscarola_(OSR) Administrator Posts: 7,505

    There is a long, sometimes torturous, thread on the topic of common buffer alignment here. I strongly recommend you get a cup of coffee and read through the entire thing. Mr. Roberts does some excellent investigation.

    Among the findings:

    1) It appears you have to use a scatter/gather profile for your alignment to be respected.

    2) Use WdfCommonBufferCreateWithConfig instead of relying on the device alignment to be respected.

    I’ll be waiting to hear the result of your experiments!

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • rob18767 Member Posts: 23

    The result of using "WdfCommonBufferCreateWithConfig" is that all my problems appear to have vanished, for now.

    I will give that thread a read.

    Thank you.

  • Tim_Roberts Member - All Emails Posts: 13,158

    Even absent a "correct" fix, it's always trivially easy to generate whatever arbitrary alignment you need. If you need a 64k buffer with 16k alignment, just allocate an 80k buffer (64+16), and adjust your starting address to the first 16k aligned address.

    void * myptr = allocate_common_buffer( 65536 + 16384);
    void * aligned = (void *)((UINT_PTR)myptr + 16384 - ((UINT_PTR)myptr & 16383));
    

    Tim Roberts
    Providenza & Boekelheide, Inc.
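The over-allocate-and-adjust idea in this post can be tried out in user mode. The sketch below is a stand-in, not driver code: `malloc` takes the place of the common buffer allocation, and `align_up`, `aligned_block` and `alloc_aligned` are hypothetical names. It uses the round-up form of the arithmetic, which, unlike the expression in the post, leaves an already-aligned pointer where it is. Both forms stay inside the over-allocated region.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Round `ptr` up to the next multiple of `alignment` (a power of two). */
static void *align_up(void *ptr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uintptr_t p = (uintptr_t)ptr;
    return (void *)((p + alignment - 1) & ~(alignment - 1));
}

/* Keeps the raw pointer (needed to free) next to the aligned one (used for data). */
typedef struct {
    void *raw;
    void *aligned;
} aligned_block;

/* Allocate `size` usable bytes whose start is aligned to `alignment`. */
static aligned_block alloc_aligned(size_t size, uintptr_t alignment)
{
    aligned_block block = { NULL, NULL };
    block.raw = malloc(size + alignment);   /* e.g. 64K + 16K = 80K */
    if (block.raw != NULL) {
        block.aligned = align_up(block.raw, alignment);
    }
    return block;
}
```

In a real driver the device only sees the logical address, so the adjustment would presumably be computed from the logical address and the same byte offset then applied to the virtual address. That step is an assumption here and is not shown in the thread.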

  • Eric_Wittmayer Member Posts: 31

    @Peter_Viscarola_(OSR) said:
    There is a long, sometimes torturous, thread on the topic of common buffer alignment here. I strongly recommend you get a cup of coffee and read through the entire thing. Mr. Roberts does some excellent investigation.

    Wow, did anyone formally report this bug to MSFT? I agree with the comments in that thread that this is a BUG!
    Thanks @Peter_Viscarola_(OSR) and @Tim_Roberts for some very useful info.

    Eric

  • Peter_Viscarola_(OSR) Administrator Posts: 7,505

    Wow, did anyone formally report this bug to MSFT?

    I didn't report it either formally or informally. Mr. Roberts should get the glory :)

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • rob18767 Member Posts: 23
    Anyhow everything is working great. Thank you all.

    And if you think that is a bug I should not mention........

    I think driver writers should learn how to create actual devices.....and the other way around.

    I am sooooo tired. But thank you.