RE: Shared Memory Problem

xxxxx@osr.com wrote:

Well… You’re not giving us much to go on here. How and when are
the buffers allocated? How and when are the buffers DEallocated?
How many buffers do you have allocated simultaneously? How many
buffers do you need to allocate on a given system before you get the
error? Is the number always the same?

What steps have you taken to debug the problem so far, other than
looking at Performance Monitor and posting here?

We’ll be happy to help, I’m sure, but you DO have to give us some
data…

All buffers are allocated in user mode and their addresses are passed
via an IoControl request (METHOD_BUFFERED) to my top-level driver. In
each call I pass two fixed-size buffers of about 4 KB each and one
variable-size buffer which is usually much bigger.

In my PyNwAgntIoCtrlEvtDeviceIoInCallerContext handler the addresses are
mapped into system space using the following scheme (sketched in code
after the list):

  1. MmInitializeMdl()
  2. MmProbeAndLockPages()
  3. MmGetSystemAddressForMdlSafe()
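
For clarity, a minimal sketch of that sequence (not the poster's actual
code; it uses IoAllocateMdl rather than a caller-supplied MDL with
MmInitializeMdl, and the function name is made up):

  #include <ntddk.h>

  // Lock a user buffer and map it into system space. This must run in
  // the context of the requesting process, hence the
  // EvtDeviceIoInCallerContext callback.
  PVOID LockAndMapUserBuffer(PVOID UserVa, SIZE_T Length, PMDL *MdlOut)
  {
      PVOID systemVa;
      PMDL mdl = IoAllocateMdl(UserVa, (ULONG)Length, FALSE, FALSE, NULL);
      if (mdl == NULL) {
          return NULL;
      }

      __try {
          // Pin the pages; raises an exception on invalid addresses.
          MmProbeAndLockPages(mdl, UserMode, IoWriteAccess);
      } __except (EXCEPTION_EXECUTE_HANDLER) {
          IoFreeMdl(mdl);
          return NULL;
      }

      // This is the step that consumes system PTEs and that can fail
      // on a loaded system, returning NULL.
      systemVa = MmGetSystemAddressForMdlSafe(mdl, NormalPagePriority);
      if (systemVa == NULL) {
          MmUnlockPages(mdl);
          IoFreeMdl(mdl);
          return NULL;
      }

      *MdlOut = mdl;
      return systemVa;
  }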

When successful, the buffers are fed into an input queue by another
IoControl request, filled by the hardware, and then retrieved and
processed by the user-mode app (for visualization, for example).

When processing is done, or the device needs to be reconfigured, the
buffers are detached; that is, the handler calls:

  1. MmUnlockPages()

Then the buffers are returned to and deallocated by the user-mode app.

I tried tracing my calls to verify that allocation and deallocation were
done in pairs, without finding an obvious leak.

I tried to determine a limit and was able to allocate 40 buffers of 10
MB each. I repeated this many times without a problem on my test system.

When I do this on a system under workload, it may start once and then
fail the next time. The error occurs when
MmGetSystemAddressForMdlSafe() is called. This leads me to the
assumption that there might be a problem with Page Table Entries. Are
there further resources allocated in that call?

My reasoning is:

  • Buffers are allocated in user mode => can’t fail anymore
  • Mdl is created => can’t fail anymore
  • Pages are locked in memory => kernel address space was not exhausted
    -> something else must be missing, but what, if not PTEs?

It is not a disaster that I’m not able to allocate resources for huge
amounts of memory (although I don’t think 400 MB is huge nowadays). But
I would love to know what the limits are on a specific system. It would
be ideal if I were able to report the amount of available resources
prior to the call to MmGetSystemAddressForMdlSafe().

Any help would be appreciated.


Hartmut

If your description is accurate, you ARE exhausting the kernel’s virtual address space, because you don’t unmap the pages (with MmUnmapLockedPages) before unlocking them.

> My reasoning is:
>
>   • Buffers are allocated in user mode => can’t fail anymore
>   • Mdl is created => can’t fail anymore
>   • Pages are locked in memory => kernel address space was not exhausted

[rbk] Wrong - it just means the pages are marked so that they can no
longer be paged out; they are not mapped into kernel space UNTIL you
call MmGetSystemAddressForMdlSafe. After all, if you were doing DMA,
there would be no need to ever map the pages into the kernel.

>     -> something else must be missing, but what, if not PTEs?

[rbk] A sufficiently large range of virtual addresses for the kernel to
map your pages into.

[rbk] And before you ask - the address spaces are separate; freeing the
buffer in user mode does NOTHING to the fact that you’ve got a bunch of
physical pages mapped (and therefore referenced) in kernel space.

Now, Driver Verifier probably catches this incorrect pattern (and so might a checked build); I don’t know, because I’ve not made this kind of error in years.

MmUnlockPages automatically unmaps the MDL if it has been mapped to system space.
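
In other words, the teardown for a buffer locked and mapped as in the
earlier sketch can be this simple (a hedged sketch; the function name is
made up):

  // MmUnlockPages tears down any system-space mapping created by
  // MmGetSystemAddressForMdlSafe before unpinning the pages, so no
  // explicit MmUnmapLockedPages call is needed here.
  VOID UnlockAndFreeUserBuffer(PMDL Mdl)
  {
      MmUnlockPages(Mdl);  // unmaps (if mapped) and unpins the pages
      IoFreeMdl(Mdl);      // release the MDL allocated by IoAllocateMdl
  }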

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Bob Kjelgaard
Sent: Tuesday, December 07, 2010 11:35 AM
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] Shared Memory Problem

If your description is accurate, you ARE exhausting the kernel’s virtual address space, because you don’t unmap the pages (with MmUnmapLockedPages) before unlocking them.

> My reasoning is:
>
>   • Buffers are allocated in user mode => can’t fail anymore
>   • Mdl is created => can’t fail anymore
>   • Pages are locked in memory => kernel address space was not exhausted
>     -> something else must be missing, but what, if not PTEs?

Pages may be locked in memory, but that doesn’t necessarily mean there is available address space (aka “free system PTEs”) to map them.

> It is not a disaster that I’m not able to allocate resources
> for huge amounts of memory (although I don’t think 400 MB
> is huge nowadays).

Today’s machines may have a lot of RAM, but in your case the limiting factor is the size of the kernel address space, which is still 2 GB on 32-bit Windows. (In fact, the more RAM you have in the system, the *less* address space will be available, because various kernel structures such as the PFN database grow with the size of RAM.)

One thing you could try is reserving system space up front using MmAllocateMappingAddress (see the sketch below). This makes sure your mappings always succeed, but if you consume 400 MB of VA space that way, something else might eventually stop working (you could start getting pool allocation failures etc. under load), so you should test this carefully.
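
To illustrate, a minimal sketch of that approach (the buffer size, pool
tag, and function names are made up; the Mm* routines are the real ones):

  #include <ntddk.h>

  #define MAPPING_TAG  'paMR'                // hypothetical pool tag
  #define MAX_BUF_SIZE (10 * 1024 * 1024)    // assumed worst-case buffer

  static PVOID g_Reserved;  // reserved (not yet backed) system VA range

  NTSTATUS ReserveMapping(VOID)
  {
      // Reserve the system VA (and the PTEs behind it) up front, at a
      // predictable time, instead of at every
      // MmGetSystemAddressForMdlSafe call.
      g_Reserved = MmAllocateMappingAddress(MAX_BUF_SIZE, MAPPING_TAG);
      return (g_Reserved != NULL) ? STATUS_SUCCESS
                                  : STATUS_INSUFFICIENT_RESOURCES;
  }

  PVOID MapLockedBuffer(PMDL Mdl)
  {
      // Map already-locked pages into the reserved range. This cannot
      // run out of system PTEs; it returns NULL only if the MDL is
      // larger than the reserved range.
      return MmMapLockedPagesWithReservedMapping(g_Reserved, MAPPING_TAG,
                                                 Mdl, MmCached);
  }

  VOID UnmapLockedBuffer(PMDL Mdl)
  {
      // Release the mapping but keep the VA range reserved for reuse.
      MmUnmapReservedMapping(g_Reserved, MAPPING_TAG, Mdl);
  }

  VOID FreeMapping(VOID)
  {
      MmFreeMappingAddress(g_Reserved, MAPPING_TAG);
  }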

> But I would love to know what the limits are on a specific
> system. It would be ideal if I were able to report the amount
> of available resources prior to the call to MmGetSystemAddressForMdlSafe().

The only way to find out whether mapping the pages will succeed is to actually try it.