Page size for x64

You can use large pages. I’ve used them in the past. It takes a little work to set up the privileges. Here is more information: https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support

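Then you can do something like this: VirtualAlloc(NULL, n_bytes, MEM_RESERVE|MEM_COMMIT|MEM_LARGE_PAGES, PAGE_READWRITE); Note that MEM_LARGE_PAGES must be combined with both MEM_RESERVE and MEM_COMMIT, and n_bytes must be a multiple of the value returned by GetLargePageMinimum().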

I will look into large pages if and when we need them. Thank you and everyone else.

Maybe this link will throw some light on what I am trying to describe.

https://social.msdn.microsoft.com/Forums/Windows/en-US/03170c26-524f-4dfd-8053-91128953d6f0/changing-pagesize-value-in-driver?forum=wdk

Oh, I forgot to add that I changed the value passed to WdfDeviceSetAlignmentRequirement to 65535 and got the same wacky results with the 64K page size. The code sometimes works and sometimes does not.

Splitting the 64K common buffer over 16 4K PCIe buffers works though.

OK, but the key thing for you to remember is that the “page size” as used by your FPGA is totally unrelated to the page size used by the processor. You’re talking about an address translation inside the FPGA. If you have problems when you change the FPGA’s page size, then you aren’t setting up the FPGA’s address table correctly.

It probably doesn’t matter, but just in case, set WdmDmaVersionOverride to 3 and use the proper DMA profile.

So, do I always need to use a page size of 4K for the FPGA on x64 systems?

So… that works. So… why not just use that?

Since you’re a “hardware guy” and to avoid any confusion: Whenever you read “Logical Address” in the WDK docs, substitute “Physical Address” in your mind. The use of the term “Logical Address” in the WDK is short for “Device Bus Logical Address” and not the same thing that the Intel processor docs refer to as “Logical Address.” Technically speaking, from a Windows architecture perspective, “Device Bus Logical Address” is not really the same as a physical address… but for the purposes of our discussion we can say they’re the same.

The Page Size, which is the granularity at which blocks of physical memory are allocated and at which virtual addresses are translated, is defined by Windows and is not changeable. This page size is 4K (modulo large pages, which don’t concern you).

I have no idea what the page size in your IP block relates to. The Common Buffer that you allocate in host memory (that IS what we’re talking about, right?) will effectively be PHYSICALLY contiguous… but will by default be aligned on a 4K physical address boundary that is convenient to Windows.

Does that help any?

Peter

We are using the 4K method for now. I was just wondering if I was making a mistake by specifying in the FPGA 16 address pages x 64K page size of memory instead of 256 address pages x 4K page size (256 address pages uses up 256 out of 512 total locations in the address translation table of the PCIe IP core so that could be an issue in the future).

So I have 64K common buffers for either of the solutions.

For the 64K page size I simply write the base logical address of each common buffer into the corresponding location in the address translation table of the FPGA. I do this for 16 address pages, one for each of the 16 x 64K common buffers.

For the 4K page size I use 16 contiguous locations in the address translation table with the common buffer base logical address and offsets of 4096. The 16 x 64 K common buffers are broken down into 256 x 4K buffers using this method.
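The 4K layout described above can be sketched in plain C, outside any driver. This is only an illustration: the function name, the flat `uint64_t` table, and the idea that a table entry is simply a bus address are assumptions, not the real register layout of the PCIe IP core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FPGA_PAGE_SIZE   4096u   /* 4K FPGA address page */
#define BUFFER_SIZE      65536u  /* one 64K common buffer */
#define ENTRIES_PER_BUF  (BUFFER_SIZE / FPGA_PAGE_SIZE)  /* 16 entries per buffer */

/* Hypothetical helper: writes one table entry per 4K slice of each 64K buffer.
 * Returns the number of entries written, or 0 if the table is too small. */
size_t fill_table_4k(uint64_t *table, size_t table_len,
                     const uint64_t *buf_base, size_t buf_count)
{
    size_t needed = buf_count * ENTRIES_PER_BUF;
    if (needed > table_len)
        return 0;

    for (size_t b = 0; b < buf_count; b++) {
        /* The slices only line up if the base sits on a 4K boundary. */
        assert((buf_base[b] & (FPGA_PAGE_SIZE - 1)) == 0);
        for (size_t s = 0; s < ENTRIES_PER_BUF; s++)
            table[b * ENTRIES_PER_BUF + s] =
                buf_base[b] + (uint64_t)s * FPGA_PAGE_SIZE;
    }
    return needed;
}
```

With 16 buffers this consumes 16 x 16 = 256 entries, which matches the 256 out of 512 locations mentioned earlier. It only requires each base to be 4K aligned, which is the alignment Windows gives a common buffer by default.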

It seems to me that the 64K method is simpler to write but I get strange results: sometimes it works (including overnight testing), sometimes I see 1/4 to 3/4 of the data in the buffer and sometimes I get nothing. Restarting the machine, or shutting it down and starting it again, changes the behavior.

The 4K method works, for now, however it does use up a lot of address table entries so my boss has asked me to continue investigating.

I say I am a “hardware guy” however I did work with the guy who wrote the original drivers for Linux for this system while I did the FPGA work. He went to work for Microsoft and that basically left me as the only person with a clue what goes on in this product. I then was asked to transfer the drivers to Windows.

It seems to me that the 64K method is simpler to write but I get strange results

It seems to ME this is an issue that lives on the FPGA side, not on the host side. On the host side, your driver is giving your device the Device Bus Logical Address of your Common Buffer(s) and your FPGA is then… doing something. I still can’t figure out what, or exactly what address translation is being done on the FPGA. In a Common Buffer method, the device side usually accesses the Common Buffer via DMA. So… I’m confused.

Peter

With 16 address pages of 64K page size.

The lower 16 bits of the address, bits [15:0], carry the byte address within the 64K page. The next four more significant bits of the address, bits [19:16], select one of the sixteen pages.

I then pass this into the FPGA (hardware stuff) and the FPGA uses bits [19:16] to decode which selections from the address translation table to use.

So if I have sixteen 64K byte common buffers to use (0 through 15)

Address translation entry 0 has the base logical address of common buffer 0
Address translation entry 1 has the base logical address of common buffer 1
.
.
.
.
.
Address translation entry 15 has the base logical address of common buffer 15

So bits [19:16] of the address I provide to the hardware effectively selects the logical base address of one of the 16 common buffers.

Bits [15:0] of the address I provide are the offset into the 64K buffer.
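The decode just described can be written out in a few lines of C. The sketch below is a model only, and it assumes something the thread does not state: whether the IP core ADDS the offset to the table entry or SUBSTITUTES the low 16 bits of the entry with the offset. Both variants are shown with hypothetical function names, because they give the same answer for a 64K-aligned base and different answers otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT   16u                        /* 64K page: offset is bits [15:0] */
#define PAGE_COUNT   16u                        /* page selected by bits [19:16]   */
#define OFFSET_MASK  ((1u << PAGE_SHIFT) - 1u)  /* 0xFFFF */

/* Model 1 (assumption): the hardware adds the offset to the table entry. */
uint64_t translate_add(const uint64_t *table, uint32_t addr)
{
    assert(table != NULL);
    uint32_t page   = (addr >> PAGE_SHIFT) & (PAGE_COUNT - 1u);
    uint32_t offset = addr & OFFSET_MASK;
    return table[page] + offset;
}

/* Model 2 (assumption): the hardware keeps only the upper bits of the
 * table entry and substitutes the offset for the low 16 bits. */
uint64_t translate_concat(const uint64_t *table, uint32_t addr)
{
    assert(table != NULL);
    uint32_t page   = (addr >> PAGE_SHIFT) & (PAGE_COUNT - 1u);
    uint32_t offset = addr & OFFSET_MASK;
    return (table[page] & ~(uint64_t)OFFSET_MASK) | offset;
}
```

If the core behaves like the second model, a common buffer whose base is only 4K aligned would be reached at the wrong bus address, which would be one way to get results that change from boot to boot. Whether that is what actually happens here is for the FPGA documentation to confirm.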

OK. You’ve just explained how address decoding works. I’m sorry, but that doesn’t answer my question or help me in any way.

I’m not sure it matters at this point anyhow. As long as you’ve got your device working with your driver, it’s all good as far as I’m concerned.

Peter

My bad, Peter. I confused Tim with you. I am off to the Intel FPGA forum to bug them for now. I do not think this is a Windows issue at this point in time.

Thank you all though!!!

Okay, I found what looks like an issue.

I tried to use 16KB buffers instead of 64K and tried to set the alignment accordingly.

My code for setting up the DMA/common buffers is as follows:

#define FULLBUFF 16384

//
// evio DMA_TRANSFER_ELEMENTS must be 16Kbyte aligned
//
WdfDeviceSetAlignmentRequirement(DevExt->Device,
16383);

//
// Create a new DMA Enabler instance.
// for Common Buffer
//
{
	WDF_DMA_ENABLER_CONFIG   dmaConfig;

	WDF_DMA_ENABLER_CONFIG_INIT(&dmaConfig,
		WdfDmaProfilePacket,
		DevExt->MaximumTransferLength);

	TraceEvents(TRACE_LEVEL_INFORMATION, DBG_PNP,
		" - The DMA Profile is WdfDmaProfilePacket");

	//
	// Opt-in to DMA version 3, which is required by
	// WdfDmaTransactionSetSingleTransferRequirement
	//
	dmaConfig.WdmDmaVersionOverride = 3;

	status = WdfDmaEnablerCreate(DevExt->Device,
		&dmaConfig,
		WDF_NO_OBJECT_ATTRIBUTES,
		&DevExt->DmaEnabler);

	if (!NT_SUCCESS(status)) {

		TraceEvents(TRACE_LEVEL_ERROR, DBG_PNP,
			"WdfDmaEnablerCreate failed: %!STATUS!", status);
		return status;
	}

}



//
// Allocate common buffer for building reads
//

for (i = 0; i < 6; i++)
{
	status = WdfCommonBufferCreate(DevExt->DmaEnabler,
		FULLBUFF,
		WDF_NO_OBJECT_ATTRIBUTES,
		&DevExt->ADCBuffers[i].ReadCommonBuffer);

	if (!NT_SUCCESS(status)) {
		TraceEvents(TRACE_LEVEL_ERROR, DBG_PNP,
			"WdfCommonBufferCreate (read) failed %!STATUS!", status);
		return status;
	}

	DevExt->ADCBuffers[i].ReadCommonBufferSize = FULLBUFF;

	DevExt->ADCBuffers[i].ReadCommonBufferBase =
		WdfCommonBufferGetAlignedVirtualAddress(DevExt->ADCBuffers[i].ReadCommonBuffer);

	DevExt->ADCBuffers[i].ReadCommonBufferBaseLA =
		WdfCommonBufferGetAlignedLogicalAddress(DevExt->ADCBuffers[i].ReadCommonBuffer);

	TraceEvents(TRACE_LEVEL_ERROR, DBG_PNP,
		"Alignment is %d", WdfDeviceGetAlignmentRequirement(DevExt->Device));

	TraceEvents(TRACE_LEVEL_ERROR, DBG_PNP,
		"Low part %d is %x", i, DevExt->ADCBuffers[i].ReadCommonBufferBaseLA.LowPart);

	TraceEvents(TRACE_LEVEL_ERROR, DBG_PNP,
		"High part %d is %x", i, DevExt->ADCBuffers[i].ReadCommonBufferBaseLA.HighPart);

	RtlZeroMemory(DevExt->ADCBuffers[i].ReadCommonBufferBase,
		DevExt->ADCBuffers[i].ReadCommonBufferSize);

	TraceEvents(TRACE_LEVEL_INFORMATION, DBG_PNP,
		"ReadCommonBuffer  0x%p  (#0x%I64X), length %I64d",
		DevExt->ADCBuffers[i].ReadCommonBufferBase,
		DevExt->ADCBuffers[i].ReadCommonBufferBaseLA.QuadPart,
		WdfCommonBufferGetLength(DevExt->ADCBuffers[i].ReadCommonBuffer));
}

When I run a trace on this I get.

00000009 evio 4 96 1 9 11\18\2019-15:14:37:413 - The DMA Profile is WdfDmaProfilePacket
00000010 evio 4 96 1 10 11\18\2019-15:14:37:413 Alignment is 16383
00000011 evio 4 96 1 11 11\18\2019-15:14:37:413 Low part 0 is 3f83d000
00000012 evio 4 96 1 12 11\18\2019-15:14:37:413 High part 0 is 0
00000013 evio 4 96 1 13 11\18\2019-15:14:37:413 ReadCommonBuffer 0xFFFFBE829843D000 (#0x3F83D000), length 16384
00000014 evio 4 96 1 14 11\18\2019-15:14:37:413 Alignment is 16383
00000015 evio 4 96 1 15 11\18\2019-15:14:37:413 Low part 1 is 3f8f4000
00000016 evio 4 96 1 16 11\18\2019-15:14:37:413 High part 1 is 0
00000017 evio 4 96 1 17 11\18\2019-15:14:37:413 ReadCommonBuffer 0xFFFFBE82984F4000 (#0x3F8F4000), length 16384
00000018 evio 4 96 1 18 11\18\2019-15:14:37:413 Alignment is 16383
00000019 evio 4 96 1 19 11\18\2019-15:14:37:413 Low part 2 is 3f8a8000
00000020 evio 4 96 1 20 11\18\2019-15:14:37:413 High part 2 is 0
00000021 evio 4 96 1 21 11\18\2019-15:14:37:413 ReadCommonBuffer 0xFFFFBE82984A8000 (#0x3F8A8000), length 16384
00000022 evio 4 96 1 22 11\18\2019-15:14:37:413 Alignment is 16383
00000023 evio 4 96 1 23 11\18\2019-15:14:37:413 Low part 3 is 3f914000
00000024 evio 4 96 1 24 11\18\2019-15:14:37:413 High part 3 is 0
00000025 evio 4 96 1 25 11\18\2019-15:14:37:413 ReadCommonBuffer 0xFFFFBE8298514000 (#0x3F914000), length 16384
00000026 evio 4 96 1 26 11\18\2019-15:14:37:413 Alignment is 16383
00000027 evio 4 96 1 27 11\18\2019-15:14:37:413 Low part 4 is 3f82c000
00000028 evio 4 96 1 28 11\18\2019-15:14:37:413 High part 4 is 0
00000029 evio 4 96 1 29 11\18\2019-15:14:37:413 ReadCommonBuffer 0xFFFFBE829842C000 (#0x3F82C000), length 16384
00000030 evio 4 96 1 30 11\18\2019-15:14:37:413 Alignment is 16383
00000031 evio 4 96 1 31 11\18\2019-15:14:37:413 Low part 5 is 3f8ae000
00000032 evio 4 96 1 32 11\18\2019-15:14:37:413 High part 5 is 0
00000033 evio 4 96 1 33 11\18\2019-15:14:37:413 ReadCommonBuffer 0xFFFFBE82984AE000 (#0x3F8AE000), length 16384

Looking at

Low part 0 is 3f83d000

For example.

That looks to me to be aligned on a 4KB boundary, which if true would explain why I am having issues (not with my hardware but with my code).

For a 16KB aligned boundary shouldn’t the lowest 14 bits of the device address always be zero?
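That reading is right: an address is 16KB aligned exactly when its low 14 bits are zero. A small stand-alone check (plain C, not driver code; the helper name is made up for illustration) makes it easy to test the values from the trace:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if addr is a multiple of alignment.
 * alignment must be a non-zero power of two (4096, 16384, 65536, ...). */
bool is_aligned(uint64_t addr, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}
```

Applied to the six logical addresses logged above, 0x3f8f4000, 0x3f8a8000, 0x3f914000 and 0x3f82c000 pass the 16384 check, while 0x3f83d000 (buffer 0) and 0x3f8ae000 (buffer 5) do not. All six pass the 4096 check, so the buffers are only guaranteed 4K alignment and the 16K-aligned ones are aligned by luck.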

Am I reading something incorrectly, doing something stupid or something else?

Any help would be appreciated.

To add: ADCBuffers are of type READ_PAGES, which I define as follows:

typedef struct READ_PAGES {
// Read

WDFCOMMONBUFFER         ReadCommonBuffer;
size_t                  ReadCommonBufferSize;
_Field_size_(ReadCommonBufferSize) PUCHAR ReadCommonBufferBase;
PHYSICAL_ADDRESS        ReadCommonBufferBaseLA;   // Logical Address

} READ_PAGES;

There is a long, sometime torturous, thread on the topic of common buffer alignment here. I strongly recommend you get a cup of coffee and read through the entire thing. Mr. Roberts does some excellent investigation.

Among the findings:

  1. It appears you have to use a scatter/gather profile for your alignment to be respected.

  2. Use WdfCommonBufferCreateWithConfig instead of relying on the device alignment to be respected.

I’ll be waiting to hear the result of your experiments!

Peter

The result of using “WdfCommonBufferCreateWithConfig” is that all my problems have apparently vanished, for now.

I will give that thread a read.

Thank you.

Even absent a “correct” fix, it’s always trivially easy to generate whatever arbitrary alignment you need. If you need a 64k buffer with 16k alignment, just allocate an 80k buffer (64+16), and adjust your starting address to the first 16k aligned address.

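// allocate_common_buffer stands in for whatever allocator you actually use
void * myptr = allocate_common_buffer( 65536 + 16384);
// step forward to the next 16k boundary inside the oversized buffer
void * aligned = (void *)((UINT_PTR)myptr + 16384 - ((UINT_PTR)myptr & 16383));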
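For anyone who wants to try the over-allocate-and-adjust idea outside a driver, here is a minimal user-mode sketch. It uses malloc purely as a stand-in for the common buffer allocation, and the helper name is made up. It also rounds up with a mask, which differs slightly from the snippet above: an address that is already aligned is returned unchanged instead of being moved forward a full 16k.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Round p up to the next multiple of alignment (a non-zero power of two).
 * A pointer that is already aligned comes back unchanged. */
void *align_up(void *p, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (void *)(((uintptr_t)p + alignment - 1) & ~(alignment - 1));
}
```

Typical use would be `void *raw = malloc(65536 + 16384); void *buf = align_up(raw, 16384);`, keeping `raw` around for the eventual free. In a driver the same byte offset (`buf - raw`) would also have to be added to the buffer’s logical address before handing it to the device, since the device needs the aligned bus address and not the original one.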

@“Peter_Viscarola_(OSR)” said:
There is a long, sometime torturous, thread on the topic of common buffer alignment here. I strongly recommend you get a cup of coffee and read through the entire thing. Mr. Roberts does some excellent investigation.

Wow, did anyone formally report this bug to MSFT? I agree with the comments in that thread that this is a BUG!
Thanks @“Peter_Viscarola_(OSR)” and @Tim_Roberts for some very useful info.

Eric

Wow, did anyone formally report this bug to MSFT?

I didn’t report it either formally or informally. Mr. Roberts should get the glory :slight_smile:

Peter

Anyhow everything is working great. Thank you all. And if you think that is a bug I should not mention… I think driver writers should learn how to create actual devices…and the other way around. I am sooooo tired. But thank you.