Shared mem on 64 bit windows

Hi,

I’m a total newbie on this forum, so please have mercy.
This is not a driver question, but it is a Windows INTERNALS question, so I’m hoping you guys can help me.

I have a 64-bit Windows Server 2003 system with 8 AMD CPUs and 16 gig of RAM. I want to use about 5 gig of RAM for my own evil purposes. To that end I have created 3 shared regions (memory-mapped files backed by the system page file). 2 of them are 1 gig and one of them is 3 gig in size. This may seem extreme, but if I’ve got 16 gig of RAM, I’m going to make use of it.

I have one application which creates the 3 shared regions and “holds them open”. It is a simple service that does nothing but create the regions and wait to be shut down. It seems to start up and create the regions with no problem. (It starts very quickly, calls SetProcessWorkingSetSize to increase its WS to about 6 gig, and calls VirtualLock to lock the region pages in memory.) This all happens in a couple of seconds.
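For readers following along, the daemon setup described above might look roughly like the sketch below. The function names (working_set_floor, raise_working_set, map_and_lock_region) are hypothetical and not from the original poster's code; only the Win32 calls themselves (CreateFileMapping, MapViewOfFile, SetProcessWorkingSetSize, VirtualLock) are the ones named in the post. The working set has to be raised first because VirtualLock can only lock about as many pages as the minimum working set allows. The Windows-specific part is guarded so the arithmetic helper also builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Sum of the region sizes plus headroom: a rough floor for the minimum
 * working set if every region is to be locked with VirtualLock. */
uint64_t working_set_floor(const uint64_t *region_bytes, size_t count,
                           uint64_t headroom_bytes)
{
    uint64_t total = headroom_bytes;
    size_t i;

    assert(region_bytes != NULL || count == 0);
    for (i = 0; i < count; i++)
        total += region_bytes[i];
    return total;
}

#ifdef _WIN32
/* Raise the working-set limits of the current process before locking.
 * Returns 1 on success, 0 on failure. */
int raise_working_set(uint64_t min_bytes, uint64_t max_bytes)
{
    return SetProcessWorkingSetSize(GetCurrentProcess(),
                                    (SIZE_T)min_bytes,
                                    (SIZE_T)max_bytes) != 0;
}

/* Create (or open) a page-file-backed named section, map all of it, and
 * lock the view. Returns the view address, or NULL on failure. The section
 * handle comes back through *section so the caller can hold it open. */
void *map_and_lock_region(const char *name, uint64_t bytes, HANDLE *section)
{
    void *view;

    *section = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                                  (DWORD)(bytes >> 32),
                                  (DWORD)(bytes & 0xFFFFFFFFu), name);
    if (*section == NULL)
        return NULL;

    view = MapViewOfFile(*section, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    if (view == NULL) {
        CloseHandle(*section);
        *section = NULL;
        return NULL;
    }

    /* The call reported in this thread as taking minutes in a second process. */
    if (!VirtualLock(view, (SIZE_T)bytes)) {
        UnmapViewOfFile(view);
        CloseHandle(*section);
        *section = NULL;
        return NULL;
    }
    return view;
}
#endif
```

With two 1 gig regions, one 3 gig region and 1 gig of headroom, working_set_floor gives 6 gig, which matches the "about 6 gig" figure in the post.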

Next, I start up my primary application. It has total success in opening the shared regions. I set its working set size to about 6 gig to encompass all of the regions, its own memory usage, etc. When I attempt to lock the first region (one gig in size) the application freezes. It takes about five minutes (yes, five MINUTES) to lock these pages in memory. When I try to lock the next one gig region, it takes even longer… ten to fifteen minutes. I don’t know how long it would take to lock the 3 gig region, because I don’t have that kind of patience. This is obviously not acceptable behaviour.

What in the world is going on here? I come from 15 years in a VMS background, so go ahead and tell me all about page table entries, page faults, working set sizes and the like…I understand the concepts.

Why is the second process suffering so much when attempting to lock down the pages? Task Manager does not show that the process is page faulting. Indeed, its physical memory usage grows almost instantaneously. But it just sits there, burning a respectable amount of CPU within the VirtualLock routine. AMD’s CodeAnalyst shows me inside of ntoskrnl.exe at miLocateWsle very often…what is this? Locate working set locked entry? Just a guess.

Since these are shared memory sections and I’m locking them into physical memory, I assumed that, no matter how many processes mapped these regions, no additional memory would be used up (except for overhead like PTEs and such). Is this an incorrect assumption?

What is the best solution to this problem? Should my first application NOT lock the pages, or perhaps should it be the only one to do so? I want to run a third application to map these regions and work with them as well…what is the best configuration to set up for something like this? Is there a way to “steal” memory from Windows and keep it non-paged? Can it then be shared? That would be ideal.

My “third” application zeros the 3 gig region with memset. This takes an eternity as well. Here is the stack for that thread from Process Explorer:

ntoskrnl.exe!ExAcquireSharedWaitForExclusive+0x115
ntoskrnl.exe!ExAcquireSharedWaitForExclusive+0x545
ntoskrnl.exe!MmProbeAndLockPages+0x5ce
ntoskrnl.exe!KeSynchronizeExecution+0x41c
ntoskrnl.exe!ExfReleasePushLock+0x1f
ntoskrnl.exe!ExfReleasePushLock+0x112
ntoskrnl.exe!KeStackAttachProcess+0x23a
ntoskrnl.exe!NtClose+0x61b
ntoskrnl.exe!PsReturnProcessNonPagedPoolQuota+0x4fd
ntdll.dll+0x315aa
kernel32.dll+0x30596

The system is set up for best performance for background services, memory usage is set up for best performance of programs (not system cache) and the page file is being managed by Windows itself (system managed size). It is sitting at about 16 gig in size at the moment. Should I configure the system with no page file? I haven’t tried that yet.

Any help whatsoever would be massively appreciated. I’m obviously missing something crucial.

Regards,
Greg

I’m wondering if there is something funny going on with the page file. Are the
results any different if you use a private file for backing the global
section instead of the system page file?

I’m also wondering if the system is having trouble allocating non-paged
memory for creating the process page table entries to map the section’s
pages into the process? Under the task manager, do you see the system cache
falling off sharply or other system memory usage going up sharply?

Do you see any performance hits from not locking the pages in every process?
Since the global section is already locked into memory, the only thing I can
figure is that tracking overhead is causing the problem.

Just some guesses. I’d be curious what you eventually discover.

Greg

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@hotmail.com
Sent: Friday, August 10, 2007 9:00 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] Shared mem on 64 bit windows


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

I haven’t tried using a private file for the sections, but it’s worth looking into. I will try that, thanks!
I also suspect that there is something funny going on with page table entries, but I have no visibility into the “guts” of the system to see what might be going on. The system cache did seem to be “down” but it wasn’t alarmingly low. The system cache is for file access, isn’t it? Is it used for anything besides program code and data?

Like you, I came from the VMS internals world doing real-time systems. I
got into Windows NT back in '94. Now I’m being dragged kicking and
screaming into Linux for some real-time seismic systems.

I’m not completely sure about the system cache under NT. I was thinking it
worked in a way similar to the non-paged pool under VMS. You may check out
www.sysinternals.com (Sysinternals) for some tools to peek under the covers
of NT. They used to be an independent group that had access to private
information about NT and its internals. They were bought by Microsoft a
while back, but much of the good stuff is still available.

Hope that helps.

Greg


This is a serious system you’ve got. Just out of curiosity, what are
you planning on doing with it?

I can’t say I know exactly what the cause is, as I’ve never even
considered allocating even a 10th of this amount, and I’ve never seen a
system with 8 CPUs. To really understand what is going on, a
larger stack trace from WinDbg would be needed, at least for me. There
is probably someone else on this list, particularly with source code
access, who could tell you why this is happening. WinDbg would also
give you visibility into the architectural components involved. That
being said, here are some reasons why I think this would be a slow
operation.

miLocateWsle, I believe, locates an entry in the working set list of a
process. I don’t know how working set lists are represented internally,
and I kind of doubt that they are a simple list, but I also seriously doubt
that they are tuned for the sizes you’re asking for. This is going to
take a lot of lookups. In any case, while the specific amount of time
does surprise me, that it takes a long time does not. I believe the
basic issue is that when you open the sections and issue the lock, a
huge number of prototype page table entries have to be created and these
changes reflected in the hardware page tables, all of which has to be
synchronized across 8 CPUs, the VAD, and two process contexts. This is a
very expensive operation.
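To put rough numbers on “a huge number”, here is a small back-of-the-envelope sketch. It assumes the usual x64 page sizes of 4 KB for small pages and 2 MB for large pages; the thread does not state them, so treat both constants as assumptions. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Number of pages needed to cover `bytes`, rounding up. */
uint64_t pages_needed(uint64_t bytes, uint64_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}

/* Print how many pages one region needs under both page sizes. */
void report_region(const char *label, uint64_t bytes)
{
    const uint64_t small_page = 4ULL * 1024;         /* assumed x64 small page */
    const uint64_t large_page = 2ULL * 1024 * 1024;  /* assumed x64 large page */

    printf("%s: %llu small pages, %llu large pages\n", label,
           (unsigned long long)pages_needed(bytes, small_page),
           (unsigned long long)pages_needed(bytes, large_page));
}
```

Under those assumptions each 1 gig region is 262,144 small pages, and the full 5 gig is 1,310,720 working set entries per process that locks it, versus 2,560 large pages. That does not prove where the time goes, but it shows the scale of per-page bookkeeping involved.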

SetProcessWorkingSetSize doesn’t actually have to be honored, although
the VirtualLock should enforce that, I believe. One question would be
which flags, if any, were in effect (flags are a parameter of
SetProcessWorkingSetSizeEx, not of SetProcessWorkingSetSize), and also
whether there are any quotas in place. Another question is what impact
setting the working set to 6GB (I assume that is your minimum) in two
processes will have on the rest of the system; even with all that
physical memory, I have to believe that the system is going to start
trimming other processes if it can’t trim yours. If nothing else, I
would think it would trim the system cache, and doing so also requires
more VMM operations that have to be synchronized across 8 CPUs. Working
sets also don’t mean anything if you’re idle. I believe that
VirtualLock() should again enforce the pages staying in memory, but I’m
not certain. Also, all of this must run with APCs disabled. In short,
a major goal of the Windows VMM is to prevent people from
doing this sort of thing, and the tuning assumptions surely must be for
smaller systems.

You’re not dealing with non-paged memory here; you’re dealing with
pageable memory that has to be locked down. It may be worth trying a
mapping using large pages; I’m not sure if that will help or not, but I
think it might. In the past, applications that required this much
memory and, like you, wished it to be truly non-paged used AWE. I don’t know
much about it, but I kind of doubt that it makes much sense on a 64-bit
system.

These are just some guesses; I really don’t know the answer. Hopefully
there is something simple that is causing this problem that someone else
can point out and tell you how to fix.

mm


Thanks - Sysinternals is a God-send. I’ll sniff around there some more and see if I can find anything that could help.

I’ve had an additional observation: I modified the first program (the one that creates the sections - call it the daemon) to NOT lock any pages in memory. Once again, this all happens very quickly. I then run a utility to map and zero the 3 gig region. This takes a long time. I also made the first program zero the 3 gig region as soon as it mapped it. It was virtually instantaneous.
It would seem that the “second process” to map and use this memory suffers. I wonder if I have an affinity issue related to processor cache coherency. I’m going to play with affinity and see what happens.

Affinitizing both processes that use the region to the same CPU does not help. Rats.

Sounds like this issue:

http://groups.google.com/group/microsoft.public.win32.programmer.kernel/browse_frm/thread/f89105d8acaf0e2f/491bf81bfa65c44e?&hl=en#491bf81bfa65c44e


This posting is provided “AS IS” with no warranties, and confers no
rights.


That definitely looks like a similar situation.

Several posters have suggested using large pages. I am attempting to do so; however, I keep getting the following error when calling CreateFileMapping with SEC_LARGE_PAGES:
“Not all privileges referenced are assigned to the caller”

I am, of course, enabling the SeLockMemoryPrivilege in the code prior to the attempt and this seems to succeed…and yet the above error happens as though I didn’t enable the privilege. I even tried enabling the privilege for user system and user administrator (that’s me) using the command-line util NTRights.exe. Seemed to succeed, yet the error persists.

Anybody got any ideas?

It means that you tried to make a change to a privilege that is not
assigned to the account in question; that is, you’re trying to enable
something that is not assigned. From MSDN:

“The token does not have one or more of the privileges specified in the
NewState parameter. The function may succeed with this error value even
if no privileges were adjusted. The PreviousState parameter indicates
the privileges that were adjusted.”

The easiest way to change this is to use GPEDIT.MSC and add the account under:

Local Computer Policy
Computer Configuration
Windows Settings
Security Settings
Local Policies
User Rights Assignment
Lock pages in memory
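
The MSDN text quoted above is why enabling the privilege can appear to succeed when it did not: AdjustTokenPrivileges may return success while GetLastError reports ERROR_NOT_ALL_ASSIGNED. A minimal sketch of a check that catches this follows. The function names and the SKETCH_ constant are hypothetical; the Windows part is guarded so the decision helper also builds on other platforms.

```c
#ifdef _WIN32
#include <windows.h>   /* token functions live in advapi32.lib */
#endif

/* Value of ERROR_NOT_ALL_ASSIGNED from winerror.h, repeated under a local
 * (hypothetical) name so the helper below also builds off Windows. */
#define SKETCH_ERROR_NOT_ALL_ASSIGNED 1300UL

/* AdjustTokenPrivileges can return nonzero even when nothing was enabled,
 * so treat it as success only if the call succeeded AND the last error
 * is 0 (ERROR_SUCCESS). */
int privilege_really_enabled(int adjust_ok, unsigned long last_error)
{
    if (!adjust_ok)
        return 0;
    if (last_error == SKETCH_ERROR_NOT_ALL_ASSIGNED)
        return 0;
    return last_error == 0UL;
}

#ifdef _WIN32
/* Try to enable SeLockMemoryPrivilege in the current process token.
 * Returns 1 only if the privilege is actually enabled afterwards. */
int enable_lock_memory_privilege(void)
{
    HANDLE token;
    TOKEN_PRIVILEGES tp;
    BOOL ok;
    DWORD err;

    if (!OpenProcessToken(GetCurrentProcess(),
                          TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &token))
        return 0;

    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    if (!LookupPrivilegeValue(NULL, SE_LOCK_MEMORY_NAME,
                              &tp.Privileges[0].Luid)) {
        CloseHandle(token);
        return 0;
    }

    ok = AdjustTokenPrivileges(token, FALSE, &tp, 0, NULL, NULL);
    err = GetLastError();   /* capture before any other API call */
    CloseHandle(token);

    return privilege_really_enabled(ok != 0, err);
}
#endif
```

If enable_lock_memory_privilege returns 0 with the last error at 1300, the account does not hold the right at all and it has to be granted through policy first.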

mm


Sorry - My bad - After granting the privilege with NTRights, one has to LOG OFF AND LOG BACK IN!!!
DOH!

Preliminary results with large pages are very promising. I will update after I have tested over an extended period of time.

Thank you all VERY MUCH for your most expert help.

I’d just like to wrap up this thread with a great big thank you, once again. The large pages flag is like magic!
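
For anyone landing on this thread later, a minimal sketch of creating a large-page section is below. It is not the poster's actual code. It assumes the caller already holds and has enabled SeLockMemoryPrivilege, and the function names are hypothetical. As I understand the documentation, SEC_LARGE_PAGES has to be combined with SEC_COMMIT, the size has to be a multiple of GetLargePageMinimum(), and large-page memory stays resident rather than being paged. The Windows part is guarded so the rounding helper also builds elsewhere.

```c
#include <assert.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Round `bytes` up to the next multiple of `granule`. */
uint64_t round_up_to_multiple(uint64_t bytes, uint64_t granule)
{
    assert(granule > 0);
    return (bytes + granule - 1) / granule * granule;
}

#ifdef _WIN32
/* Create a page-file-backed named section that uses large pages.
 * Returns the section handle, or NULL on failure (including when the
 * system reports no large-page support). */
HANDLE create_large_page_section(const char *name, uint64_t bytes)
{
    SIZE_T granule = GetLargePageMinimum();  /* 0 means unsupported */
    uint64_t size;

    if (granule == 0)
        return NULL;

    size = round_up_to_multiple(bytes, (uint64_t)granule);

    return CreateFileMappingA(INVALID_HANDLE_VALUE, NULL,
                              PAGE_READWRITE | SEC_COMMIT | SEC_LARGE_PAGES,
                              (DWORD)(size >> 32),
                              (DWORD)(size & 0xFFFFFFFFu),
                              name);
}
#endif
```

A 3 gig request is already a multiple of 2 MB and passes through unchanged; an odd size is rounded up to the next large-page boundary before the section is created.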