Win10 Alignment Size Limit of Common Buffers?

Answers Needed

  • Is there a limit on the boundaries one can align a common buffer?
  • Is there a way to translate a physical address to a virtual address?

Details?

We wrote a driver for XPe driver to talk to our PCI-based PLX switch, which has been working well. The driver allocates 3 buffers:

  1. Command flags (24 bytes) that reside in the scratchpad on the device,
  2. Mailbox for commands and replies (16MB), and
  3. Data buffer (32MB) for transferring large amounts of data.

The device requires the data buffers be aligned to 16MB and 32MB respectively.

We ported the driver to WES7 and had to reverse the allocation of the mailbox and data buffers to achieve the proper alignment. Now we are trying to port this driver to Win10 and are having problems so I believe we were just lucky in WES7?

The command flags work fine. We use WdfCommonBufferCreate to allocate the mailbox and data buffers. See the code at the bottom of this post.

Fundamental Problem…

We call WdfDeviceSetAlignmentRequirement to request the alignment before calling WdfCommonBufferCreate. We use WdfDeviceGetAlignmentRequirement and NTSTATUS to ensure the proper size buffers are allocated and the alignment has been properly set and both seem to be in order; however, the physical/logical addresses returned via WdfCommonBufferGetAlignedLogicalAddress are never properly aligned.

We also see unexpectedly close virtual addresses for the two requested buffers. We understand that the virtual addresses will NOT necessarily also have the same alignment as the physical addresses since our buffer size > our page size.

Our Approach to Address Alignment Problem…

The code post below is for function AllocatePCIBuffer2 which requests 2x the necessary buffer space and then determine the base address that has the desired alignment. I then computed the delta from the original base address and added that to the base virtual address returned by WdfCommonBufferGetAlignedVirtualAddress. The driver loads fine at boot but this updated driver code crashes the system the first time an app calls in to request the virtual addresses for its user address space. I believe this is because virtual address space isn?t contiguous but actually references PDEs/PTEs. I cannot find any examples/description of how to translate physical addresses to virtual ? probably because there is no good way to do this and there is probably a better way to go about this; is there?

Questions

  1. What is the limit on common buffer alignment? Our page size is 4K; I know we can configure it to be 2M but that is still shy of the buffer sizes we need.
  2. Is there a way to translate a physical address to a virtual address? I suppose one may be able to run thru all the PTEs!
  3. Most importantly, Should we be using a different approach?

A sample WinDbg output our driver generates is at the bottom of this post.

// ==============================================================================
// ==============================================================================
NTSTATUS AllocatePCIBuffer2(WDFDEVICE device, size_t size, WDFCOMMONBUFFER* buffer,
PVOID* virtualAddress, PHYSICAL_ADDRESS* physicalAddress)
{
NTSTATUS status = STATUS_SUCCESS;

// Entry housekeeping
KdPrintEx((DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL,“\n->AllocatePCIBuffer2\n”));
PAGED_CODE();

// No scatter/gather, 32-bit addressing, non-duplex, 4096-byte alignment.
WDFDMAENABLER enabler;
WDF_DMA_ENABLER_CONFIG dmaConfig;
WDF_DMA_ENABLER_CONFIG_INIT(&dmaConfig, WdfDmaProfilePacket, size);

WdfDeviceSetAlignmentRequirement(device, size-1);
if (!NT_SUCCESS(status))
{
KdPrintEx((DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL," *** WdfDeviceSetAlignmentRequirement failed with status 0x%08X\n", status));
return status;
}
size_t alignment = WdfDeviceGetAlignmentRequirement(device);
KdPrintEx((DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL,“AllocatePCIBuffer2 size = 0x%08X (%lu); alignment = 0x%08X (%lu); status 0x%08X\n”, size, size, alignment, alignment, status));

WdfDmaEnablerCreate(device, &dmaConfig, WDF_NO_OBJECT_ATTRIBUTES, &enabler);
if (!NT_SUCCESS(status))
{
KdPrintEx((DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL," *** WdfDmaEnablerCreate failed with status 0x%08X\n", status));
return status;
}

// Allocate the buffer (2X what is needed)
WdfCommonBufferCreate(enabler, size*2, WDF_NO_OBJECT_ATTRIBUTES, buffer);
if (!NT_SUCCESS(status))
{
KdPrintEx((DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL," *** WdfCommonBufferCreate failed with status 0x%08X\n", status));
return status;
}
size_t bufferLength = WdfCommonBufferGetLength(*buffer);
KdPrintEx((DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL,“bufferLength = 0x%08X (%lu)\n”, bufferLength, bufferLength));

*virtualAddress = WdfCommonBufferGetAlignedVirtualAddress(*buffer);
*physicalAddress = WdfCommonBufferGetAlignedLogicalAddress(*buffer);

// Align physical address
size_t mask = size - 1;
UINT32 qp = (UINT32)physicalAddress->u.LowPart;
void* base = (void*)qp;
void* alignedPhysicalAddress = (void*)(((uintptr_t)base+mask) & ~(uintptr_t)mask);
physicalAddress->QuadPart = (UINT32) alignedPhysicalAddress;
KdPrintEx((DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL,“Base physical address = 0x%08X; aligned physical address = 0x%08X\n”, base, alignedPhysicalAddress));

// Align virtual address
size_t baseOffset = (uintptr_t)alignedPhysicalAddress - (uintptr_t)base;
char* alignedVirtualAddress = (char*)virtualAddress + baseOffset;
KdPrintEx((DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL,“Base virtual address = 0x%08X; aligned virtual address = 0x%08X\n”, virtualAddress, alignedVirtualAddress));
*virtualAddress = (PVOID)alignedVirtualAddress;

KdPrintEx((DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL," Leaving AllocatePCIBuffer2…\n"));
return status;
}

// ======================================================================
// WinDbg outout
// ======================================================================
->EventDeviceAdd WdfDeviceCreate successful AddDevice PDO (0x89F21030) FDO (0xB316B9E8), pDeviceExt (0x89E1E638)->RegisterCallbacks Leaving RegisterCallbacks…

->AllocatePCIBuffer2

AllocatePCIBuffer2 size = 0x02000000 (33554432); alignment = 0x01FFFFFF (33554431); status 0x00000000

bufferLength = 0x04000000 (67108864)

Base physical address = 0xCD60C000; aligned physical address = 0xCE000000

Base virtual address = 0xB264408C; aligned virtual address = 0xB303808C

Leaving AllocatePCIBuffer2…

->AllocatePCIBuffer2

AllocatePCIBuffer2 size = 0x01000000 (16777216); alignment = 0x00FFFFFF (16777215); status 0x00000000

bufferLength = 0x02000000 (33554432)

Base physical address = 0xD2846000; aligned physical address = 0xD3000000

Base virtual address = 0xB2644074; aligned virtual address = 0xB2DFE074

Leaving AllocatePCIBuffer2…

Hmmmm…

Looking at the source code for AllocateCommonBuffer in WDF (https://github.com/Microsoft/Windows-Driver-Frameworks/blob/master/src/framework/kmdf/src/dma/base/fxcommonbuffer.cpp#L109), you can see that it adds your alignment to the length you requested before it requests the allocation. Soooo… it should “just work.”

What are you getting back from your call to WdfCommonBufferGetAlignedLogicalAddress immediately after the Common Buffer is created?

*physicalAddress = WdfCommonBufferGetAlignedLogicalAddress(*buffer);

(Sorry if this is in your post… I’m having trouble reading your code in the forum interface, due to not fault of yours).

Peter
OSR
@OSRDrivers

Peter,

Thanks for the quick reply…

first buffer request is:
size = 0x04000000, alignment requested = 0x01FFFFFF
physical address = 0xCD60C000; virtual address = 0xB264408C

second buffer request is:
size = 0x02000000, alignment requested = 0x00FFFFFF
physical address = 0xD2846000; virtual address = 0xB2644074

So physical addresses are roughly 84MB apart – which is fair since we requested 64MB buffer first – but virtual addresses are only 24 bytes apart! I feel I’m missing something fundamental here.

Any reason this would work in WES7 but not Win10?

Jim

Apart from the unexpected virtual addresses, the driver loads fine. The system bug checks with a 0x50 (page fault in a non-paged area) on the MmMapLockedPagesSpecifyCache call below during the EventIoDeviceControl handler. The kernelAddress passed is the address we calculated during our alignment logic so what we are passing is not the base address returned by WdfCommonBufferGetAlignedLogicalAddress but an address somewhere within the block returned by WdfCommonBufferCreate. MmMapLockedPagesSpecifyCache does not fail when the base address is passed; however, the buffer is not properly aligned!!!

U32* userAddress = NULL;
*mdl = IoAllocateMdl(kernelAddress, size, FALSE, FALSE, NULL);
if (*mdl)
{
MmBuildMdlForNonPagedPool(*mdl);
userAddress = (U32*)MmMapLockedPagesSpecifyCache(*mdl, UserMode, MmNonCached, NULL, FALSE, HighPagePriority);

Yet another follow-up. I want to make sure what I’m trying to do is clear and I see my last post confuses things a bit. Here is the sequence of calls and what’s going wrong…

  1. WdfDeviceSetAlignmentRequirement
  2. WdfCommonBufferCreate(buffer)
  3. *virtualAddress = WdfCommonBufferGetAlignedVirtualAddress(*buffer) // alignment not honored
  4. *physicalAddress = WdfCommonBufferGetAlignedLogicalAddress(*buffer)
  5. physicalAddress->QuadPart = (UINT32)alignedPhysicalAddress // alignment logic omitted
  6. *virtualAddress = (PVOID)alignedVirtualAddress // computed by adding physical alignment byte offset to virtual address
  7. EventIoDeviceControl handler called:
    a. *mdl = IoAllocateMdl(virtualAddress, size, FALSE, FALSE, NULL)
    b. MmBuildMdlForNonPagedPool(*mdl)
    c. U32* userAddress = (U32*)MmMapLockedPagesSpecifyCache(*mdl, UserMode, MmNonCached, NULL, FALSE, HighPagePriority) // This bug checks with 0x50

xxxxx@wideband-sys.com wrote:

Yet another follow-up. I want to make sure what I’m trying to do is clear and I see my last post confuses things a bit. Here is the sequence of calls and what’s going wrong…

My guess is that you actually have a debug print failure, rather than a bug.

  1. WdfDeviceSetAlignmentRequirement
  2. WdfCommonBufferCreate(buffer)
  3. *virtualAddress = WdfCommonBufferGetAlignedVirtualAddress(*buffer) // alignment not honored
  4. *physicalAddress = WdfCommonBufferGetAlignedLogicalAddress(*buffer)

I don’t understand the asterisks in those two lines, and their presence
might point to a confusion. Can you cut and paste the EXACT code you
are using here?


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Tim,

FYI, I emailed the source and output files to you last night. Thanks in advance!

Exact code for in steps 1 thru 6 is in AllocatePCIBuffer2 that I posted at the top of this thread. In the last post above the “alignment not honored” comment should have been associated with WdfCommonBufferGetAlignedLogicalAddress.

The virtualAddress set in AllocatePCIBuffer2 is passed to the code below…

// Create user address space mapping for the data buffer
U32* userAddress = NULL;
*mdl = IoAllocateMdl(virtualAddress, size, FALSE, FALSE, NULL);
if (*mdl)
{
MmBuildMdlForNonPagedPool(*mdl);
userAddress = (U32*)MmMapLockedPagesSpecifyCache(*mdl, UserMode, MmNonCached, NULL, FALSE, HighPagePriority);
if (userAddress)
KdPrintEx((DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL," User address: 0x%p is base address for %lu bytes\n", userAddress, size));
else
{
IoFreeMdl(*mdl);
KdPrintEx((DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL," *** MmMapLockedPagesSpecifyCache failed!\n"));
}
}
else
KdPrintEx((DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL," *** IoAllocateMdl failed!\n"));

You’re printing wrong virtual addresses. I guess you’re printing virtual addresses of pointers, not of actual buffers.

Alex,

I think you’ve found the problem! Not only did I fail to dereference virtualAddress for the debug print but I also failed to do it when calculating the aligned virtual address! I believe that explains why MmMapLockedPagesSpecifyCache bug checks.

I will try this in the morning when I’m back in the office.

Of course, this doesn’t explain why WdfCommonBufferCreate isn’t honoring my alignment request but we’ve got a hack to get around it. Ultimately, I’d prefer to allocate only what’s needed and have it aligned right out of the gate.

Thanks,
Jim

xxxxx@wideband-sys.com wrote:

*virtualAddress = WdfCommonBufferGetAlignedVirtualAddress(*buffer);
*physicalAddress = WdfCommonBufferGetAlignedLogicalAddress(*buffer);

// Align physical address
size_t mask = size - 1;
UINT32 qp = (UINT32)physicalAddress->u.LowPart;
void* base = (void*)qp;
void* alignedPhysicalAddress = (void*)(((uintptr_t)base+mask) & ~(uintptr_t)mask);
physicalAddress->QuadPart = (UINT32) alignedPhysicalAddress;
KdPrintEx((DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL,“Base physical address = 0x%08X; aligned physical address = 0x%08X\n”, base, alignedPhysicalAddress));

None of that is necessary, of course.

// Align virtual address
size_t baseOffset = (uintptr_t)alignedPhysicalAddress - (uintptr_t)base;
char* alignedVirtualAddress = (char*)virtualAddress + baseOffset;
KdPrintEx((DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL,“Base virtual address = 0x%08X; aligned virtual address = 0x%08X\n”, virtualAddress, alignedVirtualAddress));
*virtualAddress = (PVOID)alignedVirtualAddress;

As I suspected. What you are printing is the address of the variable
where the virtual address is stored. You need another level of indirection:

char* alignedVirtualAddress = (char*)*virtualAddress + baseOffset;
KdPrintEx((DPFLTR_IHVDRIVER_ID, DPFLTR_INFO_LEVEL,“Base virtual address = 0x%08X; aligned virtual address = 0x%08X\n”, *virtualAddress, alignedVirtualAddress));

You might consider putting the parameters into a structure and passing
the address of the structure. It might be less confusing.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Yep… you and Alex picked that up; I couldn’t see the forest thru the trees.

As for the unnecessary alignment logic… I’d certainly prefer to allocate only what’s needed and have it aligned if only WdfCommonBufferCreate would honor my alignment request made in WdfDeviceSetAlignmentRequirement, which brings me back to my original question…

Is there some limitation in Win10 for alignment of common buffers?

Jim

xxxxx@wideband-sys.com wrote:

Answers Needed

  • Is there a limit on the boundaries one can align a common buffer?
  • Is there a way to translate a physical address to a virtual address?

I don’t see how this can fail. I checked the source code.
WdfCommonBufferCreate handles the alignment itself, using the same
mechanism you use. It doesn’t rely on WDM. It gets the alignment from
the WDFDMAENABLER. The WDFDMAENABLER gets its default value from the
DEVICE_OBJECT during WdfDmaEnablerCreate, and
WdfDeviceSetAlignmentRequirement clearly sets the value in the
DEVICE_OBJECT.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

(Nice pickup Messers Greg and Roberts! The addresses being 24 bytes apart, coupled with the complicated parameter handling, should have been a red flag to anyone. Very nice job!)

Right. This is precisely what I pointed out in my initial reply.

I also thought we had established that the alignment WAS working.

Use the source, Mr. Brown. I even gave you a pointer to the specific line where this is done.

PeterOSR
@OSRDrivers

Peter,

Those addresses WERE a red flag thus the post; however, I failed to see the missing asterisk. I appreciate the help in finding that.

Regardless of “the source”, the alignment is NOT honored (see physical addresses returned by WdfCommonBufferGetAlignedLogicalAddress at the bottom of the original post) and so I was forced to allocate more memory than needed and explicitly align it in the driver code.

I welcome anyone to try aligning common buffers of 16MB and 32MB in Win10. That worked in WES7 but not in Win10 – not sure if the OS difference is the reason but that was all that changed.

Jim

Mr. Brown… I’m glad we could help SOME at least, but this:

kinda makes me wonder if we’re talking past each other.

If you read the code I pointed you to, I *think* it would be clear that this is… unlikely. Let me put it that way. The alignment doesn’t come from Windows. It comes from KMDF.

Your code is kinda hard to read… Not only is it not formatted here on the forum, but with the various de-references and casts my head hurts.

Hmmm… perhaps it’s time to ask which version of WDF your driver is using. Are you running the Win7 binary on Win10?

Peter
OSR
@OSRDrivers

You call MmMapLockedPagesSpecifyCache with AccessMode set to UserMode and you do not wrap the call in a try/except block.

Beware that an exception is raised when such a call fails.

Bugcheck 50 means that your driver touched a non-paged system address that is not valid (first bit in PTE set to 0). This means that the address you touched does not belong to the DMA buffer. The crash dump should display the faulting instruction.

As Mr. Noone pointed out in another thread, on another topic, don’t forget that MmMapLockedPagesSpecifyCache with a UserMode specification has to be called in the context of the requesting process.

I don’t know where in your driver you’re calling this function, but if it’s in an EvtIo Event Processing Callback, those callbacks *always* run in an arbitrary process/thread context by definition… so you can’t do it there.

Peter
OSR
@OSRDrivers

Peter/Tim/Alex/D.T.,

Thanks to all of you for your inputs. We got the driver working as coded once the dereference bug was eliminated.

Re: memory alignment… We are currently allocating twice the buffer sizes needed and explicitly aligning the base address the driver will pass the to calling applications. I see Peter’s comment above that the KMDF provides the alignment and so the Windows version is moot. We are building our driver on Win10 using KMDF 1.11; we had previously tried using KMDF 1.9 – which is what we used for our WES7 driver – but neither KMDF version seems to provide the proper alignment.

Re: MmMapLockedPagesSpecifyCache… This is called from our EventIoDeviceControl handler when an application requests the base addresses of the buffers. We will wrap that in a try-catch, thanks D.T. From what Peter wrote, it sounds like we need to find another location for making that call. Looks like it’s back to the manuals for me as I’m an app developer, not a driver writer – as you can probably tell. Know that your inputs are highly appreciated!

Jim

>MmMapLockedPagesSpecifyCache… This is called from our EventIoDeviceControl
handler when an application requests the base addresses of the buffers.

This is the wrong place to call that. MmMapLockedPagesSpecifyCache needs to be in the caller context, and EventIoDeviceControl is not guaranteed to be in the caller context.

The proper place is EvtDeviceWdmIrpPreprocess or EvtDeviceWdmIrpDispatch.

Yup. EvtIoDeviceControl (in fact, EvtIoXxxx) is called in an arbitrary process and thread context.

You need to use the EvtIoInCallerContext callback.

But if you’re mapping a buffer back into a user application’s address space, there’s a FAR bigger issue about mapping buffers back to user-mode that you’re probably not addressing… including at least one very ugly security issue. We babble about this here at lot, I think the last time was here http:



And, of course, you don’t know what you don’t know.

I was on the other side of this about a year ago, when I got stuck having to design a series of services that passed data around a Windows systems dynamically. Knowing exactly nothing about COM, you know what I did? I paid a contractor to do it. I’ve had no trouble MODIFYING his code once it was all designed and written… but I never would have been able to WRITE it in the first place. I had no clue how to start. Heck, I still only barely understand what’s going on. OK… I admit it: I mostly DON’T understand what’s going on. But I can bend what he’s done to my will.

There’s a moral to that story.

Anyhow… we’re happy to help you here as we can. But, you know, helping somebody write a driver one forum post at a time isn’t necessarily efficient.

Peter
OSR
@OSRDrivers</http:>