AllocateCommonBuffer hangs

Hello,
Has anyone ever experienced a hang when calling the AllocateCommonBuffer routine? I’m trying to allocate 128 MBytes of memory. The first allocation works, but after I free it, the second allocation takes an extremely long time. I was wondering if there are any limitations under XP that I am not aware of. Or perhaps I’m not freeing the memory properly with FreeCommonBuffer.

Thanks

It is rather likely that there simply isn’t 128 MB of physically
contiguous memory available. That is a HUGE common buffer request. It
doesn’t take very long for physical memory to get fragmented.

Do you really need all 128 MB to be contiguous? Can you allocate it in
smaller chunks?


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Hello,
Thanks for the response. Yes, I was thinking of allocating it in smaller chunks, but I need it to be contiguous. I’m not sure how to go about stringing the smaller allocated blocks together to make them contiguous. I’m sure there is some information on the internet outlining how to code this mechanism, but I have yet to find it.

Just an opinion… but any hardware that does 128 MB DMAs that doesn’t
support scatter-gather is massively broken.

Why else in the world could you *possibly* care that your memory is
physically contiguous?


Ray
(If you want to reply to me off list, please remove “spamblock.” from my
email address)

Well, this leads me to think that you may have a misunderstanding. You
can’t take discontiguous physical memory blocks and make them
contiguous. It just doesn’t work that way. Physical memory is what it
is: it’s physical. You can play games with the VIRTUAL addresses; if
all you need is contiguous VIRTUAL memory, then you don’t need a common
buffer at all; just use ExAllocatePool.

What does your hardware really require? Does it actually expect to be
able to do bus-mastered DMA into a 128MB block, without using
scatter/gather? As Ray said, such a design is massively broken. It
will never work in the general case.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Yes, sorry, I should have clarified that it is the virtual memory that needs to be contiguous for my purposes.

Calling AllocateCommonBuffer multiple times to get the virtual addresses is easy to do. But I need a way to make these addresses contiguous. That aside, I’m still confused as to why AllocateCommonBuffer was able to allocate 128 MBytes of memory the first time, but after I free that memory it takes a long time to allocate it again.

xxxxx@yahoo.com wrote:

> Yes, sorry I should have clarified that it is the Virtual memory which needs to be contiguous for my purposes.

Then forget about AllocateCommonBuffer entirely, and just use
ExAllocatePool. AllocateCommonBuffer is needed only in very specific
circumstances.

Even then, however, 128MB is a honking big allocation. Is there a
reason you can’t do it once when your driver loads and then keep it
around forever? Or is that what you are doing?

> Calling the AllocateCommonBuffer multiple times to get the virtual addresses is easy to do. But I need a way to make these addresses contiguous. That aside, I’m still confused as to why the AllocateCommonBuffer routine was able to allocate 128 MBytes of memory the first time, but when I free up that memory it takes a while to re-allocate it again.

Consider how dynamic your virtual memory space is. If you watch
performance monitor, you’ll often see hundreds or thousands of page
faults per second. Each one of those page faults has to readjust the
virtual memory state – swap something out, allocate new empty pages,
etc. Also remember that unused physical memory is wasted memory.
Windows naturally tries to use as much of your physical memory as
possible. If you are running 10 active processes at once, there are
parts of all 10 of those processes in physical memory at once. Only one
process is active in virtual memory, but the physical pages are still
there until they are needed for something else.

Because of that, physical memory gets wildly fragmented very early on.
If you look at physical memory, page 48901 might be assigned to a DLL in
process 16. Page 48902 might be free. Page 48903 might be part of a
kernel driver. Page 48904 might be free. Page 48905 might be assigned
to the disk cache. Page 48906 might be free. In that scenario,
although there are 12288 bytes in free pages, there are only 4096 bytes of
contiguous space. Because of the magic of virtual memory, physical
fragmentation is completely unimportant, except for drivers that need
common buffers for DMA.
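
To make that arithmetic concrete, here is a minimal user-mode sketch in C. It is a model only, not anything the Windows memory manager exposes: physical memory is treated as an array of flags, and the code compares the total free space with the longest run of adjacent free pages. The function names, the 4096-byte MODEL_PAGE_SIZE constant, and the array representation are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size for this model */

/* Total bytes held in free pages; page_free[i] is nonzero when page i is free. */
size_t total_free_bytes(const int *page_free, size_t page_count)
{
    size_t free_pages = 0;
    assert(page_free != NULL || page_count == 0);
    for (size_t i = 0; i < page_count; i++) {
        if (page_free[i]) {
            free_pages++;
        }
    }
    return free_pages * MODEL_PAGE_SIZE;
}

/* Size in bytes of the longest run of adjacent free pages. */
size_t largest_contiguous_free_bytes(const int *page_free, size_t page_count)
{
    size_t best = 0;
    size_t run = 0;
    assert(page_free != NULL || page_count == 0);
    for (size_t i = 0; i < page_count; i++) {
        run = page_free[i] ? run + 1 : 0;   /* a used page ends the run */
        if (run > best) {
            best = run;
        }
    }
    return best * MODEL_PAGE_SIZE;
}
```

For the six pages described above, with pages 48902, 48904, and 48906 free, the flag array is {0, 1, 0, 1, 0, 1}: three free pages in total, but the largest contiguous run is a single 4096-byte page.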

As long as things haven’t been “stirred up” very much, your driver can
find 128MB free. When you release that memory, what if the algorithm
puts those freed pages right to the top of the free list? The next
allocation would take a chunk of that, and now there isn’t 128MB free
any more. In order to get that space, the system has to wait until that
much space is available. It can’t do that by tweaking the page tables
– it actually has to copy memory from one place to another, and the
owning process has to be idle.
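
The sequence in the previous paragraph can be played out on a scaled-down model. The sketch below is a user-mode C illustration under stated assumptions: 12 pages stand in for physical memory, an 8-page request stands in for the 128MB buffer, and a first-fit search stands in for whatever the real allocator does. None of these names or policies come from Windows.

```c
#include <assert.h>
#include <stddef.h>

/* First-fit search for `want` adjacent free pages. Marks them in use and
 * returns the index of the first page, or -1 if no such run exists. */
long alloc_contiguous(int *in_use, size_t page_count, size_t want)
{
    size_t run = 0;
    assert(in_use != NULL);
    if (want == 0) {
        return -1;
    }
    for (size_t i = 0; i < page_count; i++) {
        run = in_use[i] ? 0 : run + 1;
        if (run == want) {
            size_t start = i + 1 - want;
            for (size_t j = start; j <= i; j++) {
                in_use[j] = 1;
            }
            return (long)start;
        }
    }
    return -1;
}

/* Marks `count` pages starting at `start` as free again. */
void free_pages(int *in_use, size_t start, size_t count)
{
    assert(in_use != NULL);
    for (size_t j = start; j < start + count; j++) {
        in_use[j] = 0;
    }
}

/* Allocate 8 pages, free them, let an unrelated one-page allocation reuse
 * one of the freed pages, then ask for 8 pages again.
 * Returns the result of the second request (-1 means it failed). */
long second_request_after_small_allocation(void)
{
    int in_use[12] = {1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1};
    long first = alloc_contiguous(in_use, 12, 8);   /* finds pages 2..9 */
    if (first < 0) {
        return -2;                                  /* not expected here */
    }
    free_pages(in_use, (size_t)first, 8);
    in_use[5] = 1;   /* assumed: some other allocation grabs a freed page */
    return alloc_contiguous(in_use, 12, 8);         /* 7 pages free, no run of 8 */
}
```

In this model the second request fails even though seven of the eight pages are still free, because a single page in the middle of the old run is now owned by someone else.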

If the truth be told, I’m surprised at your description. I thought
AllocateCommonBuffer would simply fail in that case. I didn’t think the
system actually tried to rebalance resources to satisfy the allocation.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> I’m still confused as to why the AllocateCommonBuffer routine was able to
> allocate 128 MBytes of memory the first time, but when I free up that memory
> it takes a while to re-allocate it again.

AllocateCommonBuffer allocates physically contiguous pages. The first time,
it could easily find a 128MB range of physically contiguous free memory. On
later attempts, memory is too fragmented, and AllocateCommonBuffer simply
waits for free contiguous pages to appear.

Requiring physically contiguous buffers of this size is a bad approach.
The better approach is to allocate ordinary memory of this size, lock it into
an MDL, pass the MDL through MapTransfer/GetScatterGatherList, and get the SGL
for this MDL. Then convert the SGL to a form the hardware understands and feed
it to the hardware.

For this to work, the hardware must understand discontiguous buffers and SGLs.
If it does not, and it requires physically contiguous buffers of this size,
then it is misdesigned.
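
As a rough picture of what a scatter/gather list describes, here is a user-mode C model. It is not the DMA API: in a real driver the list comes from GetScatterGatherList (or is built with MapTransfer), and the system decides the addresses. The sketch only shows the idea of turning the page frame numbers behind a virtually contiguous buffer into (address, length) elements, merging pages that happen to be physically adjacent. The struct, the function name, and the 4096-byte page size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size for this model */

/* One scatter/gather element: a physically contiguous run of bytes. */
struct sg_element {
    uint64_t physical_address;
    uint32_t length;
};

/* Builds a scatter/gather list from the page frame numbers backing a buffer,
 * in buffer order. Physically adjacent pages are merged into one element.
 * Returns the number of elements written, or 0 if max_elements is too small
 * (or if page_count is 0). */
size_t build_sg_list(const uint64_t *pfns, size_t page_count,
                     struct sg_element *out, size_t max_elements)
{
    size_t used = 0;
    assert(pfns != NULL && out != NULL);
    for (size_t i = 0; i < page_count; i++) {
        uint64_t addr = pfns[i] * MODEL_PAGE_SIZE;
        if (used > 0 &&
            out[used - 1].physical_address + out[used - 1].length == addr) {
            out[used - 1].length += MODEL_PAGE_SIZE;   /* extend previous run */
            continue;
        }
        if (used == max_elements) {
            return 0;                                  /* list would overflow */
        }
        out[used].physical_address = addr;
        out[used].length = MODEL_PAGE_SIZE;
        used++;
    }
    return used;
}
```

A buffer backed by frames 100, 101, 102, 500, 501, and 7 collapses to three elements in this model. The hardware would then be programmed with those three (address, length) pairs instead of one large contiguous range.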


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

You may also want to rethink your program if you require a 128MB virtually contiguous buffer. There are plenty of ways to adapt code so that it can deal with a buffer in smaller chunks and you may want to look at one of them.
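
One possible shape for such an adaptation, sketched in user-mode C: keep the data in equally sized, separately allocated chunks and translate each logical offset into a chunk index plus an offset within that chunk. The struct and function names are hypothetical, and a driver would allocate the chunks with its own pool routines rather than whatever the caller uses here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A large logical buffer stored as equally sized, separately allocated chunks. */
struct chunked_buffer {
    unsigned char **chunks;   /* chunk_count pointers, each chunk_size bytes */
    size_t chunk_size;
    size_t chunk_count;
};

/* Copies len bytes from src into the logical buffer at offset, splitting the
 * copy at chunk boundaries. Returns 0 on success, -1 if the range does not fit. */
int chunked_write(struct chunked_buffer *b, size_t offset,
                  const void *src, size_t len)
{
    const unsigned char *p = src;
    size_t total;
    assert(b != NULL && src != NULL);
    total = b->chunk_size * b->chunk_count;
    if (offset > total || len > total - offset) {
        return -1;
    }
    while (len > 0) {
        size_t index = offset / b->chunk_size;    /* which chunk */
        size_t within = offset % b->chunk_size;   /* where inside it */
        size_t n = b->chunk_size - within;        /* room left in this chunk */
        if (n > len) {
            n = len;
        }
        memcpy(b->chunks[index] + within, p, n);
        offset += n;
        p += n;
        len -= n;
    }
    return 0;
}
```

With two 4-byte chunks, writing 4 bytes at logical offset 2 puts the first two bytes at the end of chunk 0 and the last two at the start of chunk 1, so callers can keep using flat offsets without needing one 128MB allocation.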

In terms of being able to get the memory you want, your best bet may be to use MmAllocatePagesForMdl to allocate 128MB worth of physical pages and then use the memory mapping APIs to map small windows of that into system address space as needed.

-p

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@yahoo.com
Sent: Tuesday, January 30, 2007 9:07 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] AllocateCommonBuffer hangs

Yes, sorry I should have clarified that it is the Virtual memory which needs to be contiguous for my purposes.

Calling the AllocateCommonBuffer multiple times to get the virtual addresses is easy to do. But I need a way to make these addresses contiguous. That aside, I’m still confused as to why the AllocateCommonBuffer routine was able to allocate 128 MBytes of memory the first time, but when I free up that memory it takes a while to re-allocate it again.


Thanks for such detailed explanations, guys. I have to digest what was posted and try to come up with another way of doing the allocations. I originally thought I could do the smaller allocations and make them virtually contiguous this way.

  1. Call AllocateCommonBuffer multiple times

  2. Build up an MDL with the underlying CPU physical pages using MmGetPhysicalAddress on the returned virtual address.

  3. Remap it to a contiguous virtual memory region using MmMapLockedPagesSpecifyCache.

I don’t think this will work, so now I have to figure out a different scheme.

*EVERY* memory allocation API gives you virtually contiguous memory,
including malloc and new in user mode… If that is the only thing you
require, and it sounds like it is, then you can use ExAllocatePool,
although Peter’s suggestion of allocating a hunk of pages and mapping
them a bit at a time is worth considering.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

“Tim Roberts” wrote:

> Because of the magic of virtual memory, physical
> fragmentation is completely unimportant, except for drivers that need
> common buffers for DMA.

And applications using large pages.

> In order to get that space, the system has to wait until that
> much space is available. It can’t do that by tweaking the page tables
> – it actually has to copy memory from one place to another, and the
> owning process has to be idle.

The memory manager doesn’t move pages in physical memory
when trying to satisfy a contiguous memory request. It simply
trims all working sets and flushes modified pages to disk,
hoping that eventually there will be enough available
pages (standby+free+zeroed) that the allocation will succeed.
This trimming and flushing can be *very* expensive,
often making the system unresponsive for up to 15 seconds.

> If the truth be told, I’m surprised at your description. I thought
> AllocateCommonBuffer would simply fail in that case. I didn’t think the
> system actually tried to rebalance resources to satisfy the allocation.

Starting with Vista, user-mode requests for contiguous memory
(such as VirtualAlloc(MEM_LARGE_PAGES)) will either succeed
or fail immediately, without causing expensive trimming and flushing.

A similar change was considered for kernel mode requests
but unfortunately it turned out that this would break too many
existing drivers.


This posting is provided “AS IS” with no warranties, and confers no
rights.