The page size for my x64 system is 4K, which I understand cannot be altered.
I have a PCIe based driver with an FPGA that allows me to change the page size from 4K to 8K, 16K, 32K etc. all the way up to 4GB for dynamic address translation.
I needed to change my common buffer size to 64KB so I added the device addresses for the 64K buffer and changed the page size to 64K on the FPGA. The code ran intermittently under certain conditions but was completely unreliable.
I used the same common buffer size of 64K but broke this into 16 x 4K chunks on the FPGA side, and the code runs beautifully.
So, do I always need to use a page size of 4K for the FPGA on x64 systems?
Comments
Did you set the alignment requirement for your driver to 64KB? https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdfdevice/nf-wdfdevice-wdfdevicesetalignmentrequirement
No because I was concerned about this:
"If your driver specifies an alignment requirement that is greater that the computer's page size (PAGE_SIZE), the logical addresses that the WdfCommonBufferGetAlignedLogicalAddress method returns are always aligned to the specified alignment requirement, but the virtual addresses that the WdfCommonBufferGetAlignedVirtualAddress method returns might not be aligned to the alignment requirement."
Am I right to be concerned?
I suppose the logical address is the one I care about. I will branch off in subversion and try this.
I'm trying to think of how the virtual address alignment not matching the requirement would cause you an issue but I can't come up with one.
Maybe @Peter_Viscarola_(OSR) or one of the other true experts on the forum can provide a more definite answer.
I'm not at all sure what "an FPGA that allows me to change the page size" means. A PCIe device only cares about host addressing when it's doing DMA.
In that case, you usually provide it a set of descriptors, where each one has a starting address and a length. In that case, as long as you're using a contiguous common buffer, it shouldn't matter whether you provide one 64kB descriptor or 16 4kB descriptors. They all describe the same space.
The virtual address alignment is totally irrelevant to your device. All it cares about is the physical addressing.
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
I am a hardware guy
On the IP set up for the PCIe block I can change the page size of the address translation table and the number of pages.
The address translation table is where I store the common buffers' logical addresses.
I guess that extra gumph of logical and virtual addresses threw me a bit in the Microsoft document.
You can use large pages. I've used them in the past. It takes a little work to set up privileges. Here is more information: https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support
Then you can do something like this (MEM_LARGE_PAGES has to be combined with both MEM_RESERVE and MEM_COMMIT): VirtualAlloc(NULL, n_bytes, MEM_RESERVE|MEM_COMMIT|MEM_LARGE_PAGES, PAGE_READWRITE);
I will look into large pages if and when we need them. Thank you and everyone else.
Maybe this link will throw some light on what I am trying to describe.
https://social.msdn.microsoft.com/Forums/Windows/en-US/03170c26-524f-4dfd-8053-91128953d6f0/changing-pagesize-value-in-driver?forum=wdk
Oh, I forgot to add that I changed the WdfDeviceSetAlignmentRequirement value to 65535 and I got the same wacky results with the 64K page size. The code sometimes works, sometimes it does not.
Splitting the 64K common buffer over 16 4K PCIe buffers works though.
OK, but the key thing for you to remember is that the "page size" as used by your FPGA is totally unrelated to the page size used by the processor. You're talking about an address translation inside the FPGA. If you have problems when you change the FPGA's page size, then you aren't setting up the FPGA's address table correctly.
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
Probably doesn't matter, but just in case, make WdmDmaVersionOverride 3 and proper profile.
So.... that works. So... why not just use that?
Since you're a "hardware guy" and to avoid any confusion: Whenever you read "Logical Address" in the WDK docs, substitute "Physical Address" in your mind. The use of the term "Logical Address" in the WDK is short for "Device Bus Logical Address" and not the same thing that the Intel processor docs refer to as "Logical Address." Technically speaking, from a Windows architecture perspective, "Device Bus Logical Address" is not really the same as a physical address... but for the purposes of our discussion we can say they're the same.
The Page Size -- which is the granularity at which blocks of physical memory are allocated and on which virtual addresses are translated -- is defined by Windows, and is not changeable. This page size is 4K (modulo large pages, which don't concern you).
I have no idea what the page size in your IP block relates to. The Common Buffer that you allocate in host memory (that IS what we're talking about, right?) will effectively be PHYSICALLY contiguous... but will by default be aligned on a 4K physical address boundary that is convenient to Windows.
Does that help any?
Peter
Peter Viscarola
OSR
@OSRDrivers
We are using the 4K method for now. I was just wondering if I was making a mistake by specifying in the FPGA 16 address pages x 64K page size of memory instead of 256 address pages x 4K page size (256 address pages uses up 256 out of 512 total locations in the address translation table of the PCIe IP core so that could be an issue in the future).
So I have 64K common buffers for either of the solutions.
For the 64K page size I simply specify the base logical address of each common buffer in the relevant location in the address translation table of the FPGA. I do this for 16 address pages, one for each of the 16 x 64K common buffers.
For the 4K page size I use 16 contiguous locations in the address translation table with the common buffer base logical address and offsets of 4096. The 16 x 64 K common buffers are broken down into 256 x 4K buffers using this method.
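To make the two table layouts concrete, here is a minimal sketch in C. It assumes a translation table that holds one 64-bit logical address per entry; the function name `att_fill` and that layout are hypothetical and not taken from the actual IP core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill a device address translation table (hypothetical layout: one 64-bit
 * logical address per entry). Each buffer is split into page_size pieces
 * and every piece gets its own entry. Returns the number of entries used.
 */
static size_t att_fill(uint64_t *att, size_t att_len,
                       const uint64_t *bases, size_t n_buffers,
                       uint32_t buffer_size, uint32_t page_size)
{
    size_t used = 0;

    /* The FPGA page size must divide the buffer size evenly. */
    assert(page_size != 0 && buffer_size % page_size == 0);

    for (size_t b = 0; b < n_buffers; b++) {
        for (uint64_t off = 0; off < buffer_size; off += page_size) {
            assert(used < att_len);          /* table must not overflow */
            att[used++] = bases[b] + off;    /* base plus offset in steps of page_size */
        }
    }
    return used;
}
```

With sixteen 64K buffers this uses 16 entries at a 64K page size and 256 entries at a 4K page size, which matches the counts above.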
It seems to me that the 64K method is simpler to write but I get strange results: sometimes it works (including overnight testing), sometimes I see 1/4 to 3/4 of the data in the buffer and sometimes I get nothing. Restarting and shutting down and restarting the machine causes changes in behavior.
The 4K method works, for now, however it does use up a lot of address table entries so my boss has asked me to continue investigating.
I say I am a "hardware guy" however I did work with the guy who wrote the original drivers for Linux for this system while I did the FPGA work. He went to work for Microsoft and that basically left me as the only person with a clue what goes on in this product. I then was asked to transfer the drivers to Windows.
It seems to ME this is an issue that lives on the FPGA side, not on the host side. On the host side, your driver is giving your device the Device Bus Logical Address of your Common Buffer(s) and your FPGA is then... doing something. I still can't figure out what, or exactly what address translation is being done on the FPGA. In a Common Buffer method, the device side usually accesses the Common Buffer via DMA. So.... I'm confused.
Peter
Peter Viscarola
OSR
@OSRDrivers
With 16 address pages of 64K page size.
The lower 16 bits of the address, bits [15:0], carry the byte offset within the 64K page. The next four most significant bits of the address, bits [19:16], select one of the sixteen pages.
I then pass this into the FPGA (hardware stuff) and the FPGA uses bits [19:16] to decode which selections from the address translation table to use.
So if I have sixteen 64K byte common buffers to use (0 through 15)
Address translation entry 0 has the base logical address of common buffer 0
Address translation entry 1 has the base logical address of common buffer 1
...
Address translation entry 15 has the base logical address of common buffer 15
So bits [19:16] of the address I provide to the hardware effectively selects the logical base address of one of the 16 common buffers.
Bits [15:0] of the address I provide are the offset into the 64K buffer.
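The decode described above can be sketched in C as follows. This is only a software model: the names `att_translate`, `ATT_PAGES` and `ATT_PAGE_SHIFT` are made up for illustration, and it assumes the hardware adds the offset to the table entry. A real core might instead substitute the low 16 bits of the entry, in which case a base address that is not 64K aligned would translate to the wrong location.

```c
#include <assert.h>
#include <stdint.h>

#define ATT_PAGES       16u     /* sixteen translation entries */
#define ATT_PAGE_SHIFT  16u     /* 64K pages: offset is bits [15:0] */
#define ATT_OFFSET_MASK ((1u << ATT_PAGE_SHIFT) - 1u)

/* Translate a 20-bit device-side address into a host logical address. */
static uint64_t att_translate(const uint64_t att[ATT_PAGES], uint32_t addr)
{
    /* Only bits [19:0] are meaningful with 16 pages of 64K. */
    assert(addr < (ATT_PAGES << ATT_PAGE_SHIFT));

    uint32_t page   = (addr >> ATT_PAGE_SHIFT) & (ATT_PAGES - 1u); /* bits [19:16] */
    uint32_t offset = addr & ATT_OFFSET_MASK;                      /* bits [15:0]  */

    /* Assumption: base + offset (see the note above about other cores). */
    return att[page] + offset;
}
```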
OK. You've just explained how address decoding works. I'm sorry, but that doesn't answer my question or help me in any way.
I'm not sure it matters at this point anyhow. As long as you've got your device working with your driver, it's all good as far as I'm concerned.
Peter
Peter Viscarola
OSR
@OSRDrivers
I am off to the Intel FPGA forum to bug them for now. I do not think this is a Windows' issue at this point in time.
Okay, I found what looks like an issue.
I tried to use 16KB buffers instead of 64K and tried to set the alignment accordingly.
My code for setting up the DMA/common buffers is as follows:
#define FULLBUFF 16384
//
// evio DMA_TRANSFER_ELEMENTS must be 16Kbyte aligned
//
WdfDeviceSetAlignmentRequirement(DevExt->Device,
                                 16383); // alignment mask, i.e. FULLBUFF - 1
When I run a trace on this I get.
00000009 evio 4 96 1 9 11\18\2019-15:14:37:413 - The DMA Profile is WdfDmaProfilePacket
00000010 evio 4 96 1 10 11\18\2019-15:14:37:413 Alignment is 16383
00000011 evio 4 96 1 11 11\18\2019-15:14:37:413 Low part 0 is 3f83d000
00000012 evio 4 96 1 12 11\18\2019-15:14:37:413 High part 0 is 0
00000013 evio 4 96 1 13 11\18\2019-15:14:37:413 ReadCommonBuffer 0xFFFFBE829843D000 (#0x3F83D000), length 16384
00000014 evio 4 96 1 14 11\18\2019-15:14:37:413 Alignment is 16383
00000015 evio 4 96 1 15 11\18\2019-15:14:37:413 Low part 1 is 3f8f4000
00000016 evio 4 96 1 16 11\18\2019-15:14:37:413 High part 1 is 0
00000017 evio 4 96 1 17 11\18\2019-15:14:37:413 ReadCommonBuffer 0xFFFFBE82984F4000 (#0x3F8F4000), length 16384
00000018 evio 4 96 1 18 11\18\2019-15:14:37:413 Alignment is 16383
00000019 evio 4 96 1 19 11\18\2019-15:14:37:413 Low part 2 is 3f8a8000
00000020 evio 4 96 1 20 11\18\2019-15:14:37:413 High part 2 is 0
00000021 evio 4 96 1 21 11\18\2019-15:14:37:413 ReadCommonBuffer 0xFFFFBE82984A8000 (#0x3F8A8000), length 16384
00000022 evio 4 96 1 22 11\18\2019-15:14:37:413 Alignment is 16383
00000023 evio 4 96 1 23 11\18\2019-15:14:37:413 Low part 3 is 3f914000
00000024 evio 4 96 1 24 11\18\2019-15:14:37:413 High part 3 is 0
00000025 evio 4 96 1 25 11\18\2019-15:14:37:413 ReadCommonBuffer 0xFFFFBE8298514000 (#0x3F914000), length 16384
00000026 evio 4 96 1 26 11\18\2019-15:14:37:413 Alignment is 16383
00000027 evio 4 96 1 27 11\18\2019-15:14:37:413 Low part 4 is 3f82c000
00000028 evio 4 96 1 28 11\18\2019-15:14:37:413 High part 4 is 0
00000029 evio 4 96 1 29 11\18\2019-15:14:37:413 ReadCommonBuffer 0xFFFFBE829842C000 (#0x3F82C000), length 16384
00000030 evio 4 96 1 30 11\18\2019-15:14:37:413 Alignment is 16383
00000031 evio 4 96 1 31 11\18\2019-15:14:37:413 Low part 5 is 3f8ae000
00000032 evio 4 96 1 32 11\18\2019-15:14:37:413 High part 5 is 0
00000033 evio 4 96 1 33 11\18\2019-15:14:37:413 ReadCommonBuffer 0xFFFFBE82984AE000 (#0x3F8AE000), length 16384
Looking at
Low part 0 is 3f83d000
For example.
That looks to me to be aligned on a 4KB boundary, which if true would explain why I am having issues (not with my hardware but with my code).
For a 16KB aligned boundary shouldn't the lowest 14 bits of the device address always be zero?
Am I reading something incorrectly, doing something stupid or something else?
Any help would be appreciated.
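For what it's worth, the check asked about above is a simple mask test. A minimal sketch in C, where `is_aligned` is a hypothetical helper and not a WDF call:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of alignment (alignment must be a power of two). */
static bool is_aligned(uint64_t addr, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}
```

Applied to the trace, `is_aligned(0x3F83D000, 16384)` is false because the low 14 bits are 0x1000, while the same address passes a 4096-byte check. Buffers 1 through 4 (0x3F8F4000, 0x3F8A8000, 0x3F914000, 0x3F82C000) happen to be 16K aligned, and buffer 5 (0x3F8AE000) is not.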
To add: ADCBuffers[] are of type READ_PAGES, which I define as follows:
typedef struct READ_PAGES {
// Read
} READ_PAGES;
There is a long, sometimes tortuous, thread on the topic of common buffer alignment here. I strongly recommend you get a cup of coffee and read through the entire thing. Mr. Roberts does some excellent investigation.
Among the findings:
1) It appears you have to use a scatter/gather profile for your alignment to be respected.
2) Use WdfCommonBufferCreateWithConfig instead of relying on the device alignment to be respected.
I’ll be waiting to hear the result of your experiments!
Peter
Peter Viscarola
OSR
@OSRDrivers
The result of using "WdfCommonBufferCreateWithConfig" is that all my problems appear to have vanished, for now.
I will give that thread a read.
Thank you.
Even absent a "correct" fix, it's always trivially easy to generate whatever arbitrary alignment you need. If you need a 64k buffer with 16k alignment, just allocate an 80k buffer (64+16), and adjust your starting address to the first 16k aligned address.
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
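The over-allocate-and-adjust approach described above comes down to rounding the start address up to the next aligned boundary. A minimal sketch in C, where `align_up` is a hypothetical helper name; the same offset would have to be applied to both the logical and the virtual address of the buffer so they keep referring to the same memory:

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

For example, an 80K allocation whose logical address is 0x3F83D000 would yield a 16K-aligned start of 0x3F840000. That skips 12K of slack and still leaves room for the full 64K inside the allocation.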
Wow, did anyone formally report this bug to MSFT? I agree with the comments in that thread that this is a BUG!
Thanks @Peter_Viscarola_(OSR) and @Tim_Roberts for some very useful info.
Eric
I didn't report it either formally or informally. Mr. Roberts should get the glory.
Peter
Peter Viscarola
OSR
@OSRDrivers
And if you think that is a bug I should not mention........
I think driver writers should learn how to create actual devices.....and the other way around.
I am sooooo tired. But thank you.