Mapping userland memory from one process to another.

Hello everyone,

I need help implementing a fast ReadProcessMemory. The solution I’m
hoping for is something like shared memory.
I would start by saying that ReadProcessMemory is not fast enough for my
product, and smart caching just doesn’t do it.
I already have a sort of a driver in the product, so that’s why
I’m turning to a kernel solution.
My plan is to double-map the target memory into my reading
process.

From the little research I’ve done, it looks like what needs to be done is
as follows:

  1. Attach to the address space of the target process using
    KeStackAttachProcess
  2. Create an appropriate MDL of the target memory using MmInitializeMdl
  3. Make sure the pages are available using MmProbeAndLockPages
  4. Detach from the target process.
  5. Attach back to the reader process
  6. Use MmMapLockedPagesSpecifyCache to map the memory referenced by the MDL
    to the reader process.
  7. Unlock the memory using MmUnlockPages
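
For concreteness, the sequence above might look roughly like this in WDK-style C. This is an untested sketch, not working driver code: it uses IoAllocateMdl rather than MmInitializeMdl for the allocation, assumes it runs in the reader's context with a referenced PEPROCESS for the target, and moves step 7's MmUnlockPages to teardown, since unlocking while the mapping is still in use would defeat the purpose:

```c
/* Untested sketch, not a complete driver: maps Length bytes at
 * TargetAddress in TargetProcess into the current (reader) process. */
NTSTATUS
MapTargetMemory(
    PEPROCESS TargetProcess,
    PVOID TargetAddress,
    SIZE_T Length,
    PMDL *MdlOut,
    PVOID *ReaderAddress)
{
    KAPC_STATE apcState;
    PMDL mdl;
    PVOID mapped;

    mdl = IoAllocateMdl(TargetAddress, (ULONG)Length, FALSE, FALSE, NULL);
    if (mdl == NULL) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    /* Steps 1-4: attach to the target, probe and lock its pages, detach. */
    KeStackAttachProcess((PRKPROCESS)TargetProcess, &apcState);
    __try {
        MmProbeAndLockPages(mdl, UserMode, IoReadAccess);
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        KeUnstackDetachProcess(&apcState);
        IoFreeMdl(mdl);
        return GetExceptionCode();
    }
    KeUnstackDetachProcess(&apcState);

    /* Steps 5-6: back in the reader's context, map the locked pages into
     * its user address space. With UserMode this raises an exception on
     * failure rather than returning NULL. */
    __try {
        mapped = MmMapLockedPagesSpecifyCache(
            mdl, UserMode, MmCached, NULL, FALSE, NormalPagePriority);
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        MmUnlockPages(mdl);
        IoFreeMdl(mdl);
        return GetExceptionCode();
    }

    *MdlOut = mdl;
    *ReaderAddress = mapped;
    return STATUS_SUCCESS;
}

/* Teardown -- note the order: unmap first, and only then unlock (step 7). */
VOID
UnmapTargetMemory(PMDL Mdl, PVOID ReaderAddress)
{
    MmUnmapLockedPages(ReaderAddress, Mdl);
    MmUnlockPages(Mdl);
    IoFreeMdl(Mdl);
}
```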

My questions are:

  1. Is my plan right? Are there any obvious pitfalls in it?
  2. Am I going to damage the paging mechanism? What would happen to memory
    which is paged out?
  3. Am I really going to get better performance than ReadProcessMemory or is
    it just about the same thing?
  4. Aren’t there any APIs more appropriate for this task? I’ve been told
    many times before that modules are loaded only once in memory using the
    Copy-On-Write mechanism, so I thought there would be some APIs just for
    that.

Thanks,
Assaf

What is the goal of this fast ReadProcessMemory? Before you jump in
with this, what problem are you trying to solve?

Don Burn
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr


From the little research I’ve done, it looks like what needs to be done is as follows:

  1. Attach to the address space of the target process using KeStackAttachProcess
  2. Create an appropriate MDL of the target memory using MmInitializeMdl
  3. Make sure the pages are available using MmProbeAndLockPages
  4. Detach from the target process.
  5. Attach back to the reader process
  6. Use MmMapLockedPagesSpecifyCache to map the memory referenced by the MDL to the reader process.
  7. Unlock the memory using MmUnlockPages

Now compare it to CreateFileMapping() / OpenFileMapping() in the source and target processes respectively,
followed by MapViewOfFile(Ex)() in them both, plus some simplistic synchronization scheme.
Which of these is the more reasonable way to go?
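
A minimal (untested, Windows-only) sketch of that user-mode scheme — the section name and size here are placeholders, not anything from the thread:

```c
#include <windows.h>

#define SHARED_NAME  L"Local\\MySharedSection"   /* placeholder name */
#define SHARED_SIZE  (64 * 1024)                 /* placeholder size */

/* Source process: create a pagefile-backed named section and map a view. */
void *CreateSharedRegion(HANDLE *sectionOut)
{
    HANDLE section = CreateFileMappingW(
        INVALID_HANDLE_VALUE,       /* backed by the pagefile, not a file */
        NULL, PAGE_READWRITE,
        0, SHARED_SIZE, SHARED_NAME);
    if (section == NULL) return NULL;
    *sectionOut = section;
    return MapViewOfFile(section, FILE_MAP_ALL_ACCESS, 0, 0, SHARED_SIZE);
}

/* Target process: open the same section by name and map its own view. */
void *OpenSharedRegion(HANDLE *sectionOut)
{
    HANDLE section = OpenFileMappingW(FILE_MAP_ALL_ACCESS, FALSE, SHARED_NAME);
    if (section == NULL) return NULL;
    *sectionOut = section;
    return MapViewOfFile(section, FILE_MAP_ALL_ACCESS, 0, 0, SHARED_SIZE);
}
```

For the "simplistic synchronization scheme", a named mutex (CreateMutexW with the same name in both processes) would serve; a CRITICAL_SECTION cannot be shared across processes.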

Anton Bassov

Sharing memory between processes requires NO kernel component at all! It
has been possible to do this entirely in user mode since Windows NT 3.1.

The methods are
shared data segment executable
shared data segment DLL
memory-mapped file

These are listed in most-restrictive to most-general order.

Shared data segment executable uses compile-time allocation of memory, and
both processes must be running the same executable.

Shared data segment DLL has the same restrictions about allocation, but the
executables can be completely different; only the DLL must be the same.

Memory-mapped file is the most general, because its size can be
established at runtime. Note that it is very complex to change the size
once it has been set.

If you need to dynamically allocate objects in the shared segment, you
have to write your own allocator. Fortunately, for most cases this is
trivial.
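
The "write your own allocator" point can be illustrated with a trivial bump allocator over a fixed region (a portable sketch; a real shared-segment allocator would also want a free list and cross-process locking):

```c
#include <stddef.h>

/* A trivial bump allocator over a fixed block of memory, such as a view
 * of a shared section. The allocation cursor lives inside the region
 * itself, so every process mapping the region sees the same state. */
typedef struct {
    size_t used;      /* bytes handed out so far, including this header */
    size_t size;      /* total size of the region */
} region_header;

void region_init(void *base, size_t size)
{
    region_header *h = (region_header *)base;
    h->used = sizeof(region_header);
    h->size = size;
}

void *region_alloc(void *base, size_t n)
{
    region_header *h = (region_header *)base;
    size_t aligned = (n + 7) & ~(size_t)7;    /* keep 8-byte alignment */
    if (h->size - h->used < aligned)
        return NULL;                          /* region exhausted */
    void *p = (char *)base + h->used;
    h->used += aligned;
    return p;
}
```

Note that nothing here is a pointer: the allocator's state is pure offsets, so it works no matter where each process maps the region.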

You cannot use C++ collections in the shared segment; there are so many
problems with this it is not worth trying.

You cannot use pointers in the shared space; you must use __based pointers.
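
The __based keyword is Microsoft-specific; the portable equivalent, sketched below, is to store offsets from the region's base instead of raw pointers, so the data stays valid no matter what address each process maps the region at:

```c
#include <stddef.h>

/* A raw pointer is meaningless in another process, which may map the
 * shared region at a different address. Store the offset from the base
 * of the region instead. MSVC's __based(ptr) pointers do this
 * automatically; this is the explicit, portable version. */
typedef size_t region_offset;   /* 0 is reserved to mean "null" */

static region_offset to_offset(void *base, void *p)
{
    return p ? (size_t)((char *)p - (char *)base) : 0;
}

static void *from_offset(void *base, region_offset off)
{
    return off ? (void *)((char *)base + off) : NULL;
}
```

A linked list in the shared region would then store a `region_offset next;` rather than a `struct node *next;`. (Offset 0 can safely mean "null" because the region's header occupies the base.)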

I used to give a two-hour lecture on all of this in my course. Since this
is a user-level problem that in no way involves the kernel driver
facilities, send me private email (xxxxx@flounder.com) and I’ll send
you the shared-memory project and one answer.

Note that you cannot use CRITICAL_SECTION for inter-process synchronization.

I suspect that you are seeing this problem as a nail, and it is actually a
lag bolt.
joe

On Feb 27, 2013 8:38 PM, wrote:
>
> Sharing memory between processes requires NO kernel component at all! It
> has been possible to do this entirely in user mode since Windows NT 3.1.

I highly suspect the OP knows that. You could probably tell from his original
question that he’s a seasoned developer. While a short mention of user-mode
alternatives is in place, I don’t think you need to elaborate. :-)

With that said, OP talked about ReadProcessMemory which is a
master-and-slave kind of thing, while all proposed alternatives are
cooperative in nature.

I don’t know shit about anything in user mode, but I *suspect* that under the covers ReadProcessMemory and WriteProcessMemory do almost exactly what you’ve described: call KeStackAttachProcess. Which, by the way, is definitely *not* a low-overhead function.

Peter
OSR

On Wed, Feb 27, 2013 at 8:34 AM, Assaf Nativ wrote:

> 2. Am I going to damage the paging mechanism? What would happen to memory
> which is paged out?
>

Unless you code in some bugs you shouldn’t damage anything. Paged out
memory gets paged back in as part of MmProbeAndLockPages.

> 3. Am I really going to get better performance than ReadProcessMemory or
> is it just about the same thing?
>

Probably not. I mean it is possible that the implementation sucks and yours
will be orders of magnitude better, but it is not likely that is the case.

Mark Roddy

It’s a Really Bad Idea to attach a thread of one user mode process to another. What if the newly attached process gets terminated?

On 2/27/2013 5:43 PM, xxxxx@broadcom.com wrote:

It’s a Really Bad Idea to attach a thread of one user mode process to another. What if the newly attached process gets terminated?

Unless you have acquired rundown protection the attaching thread will
hang indefinitely. However, I see that nt!MiReadWriteVirtualMemory does
not acquire explicit rundown protection. However, I think having an
open handle to a process establishes implicit rundown protection. At
least some place I remember reading that the process address space isn’t
torn down as long as there is still an open handle to the process. Or
am I dreaming?

When the process you attached to terminates, most of the resources associated with that process will get freed, but because you still have a reference to the process object you can safely do whatever cleanup is necessary (e.g. release any MDLs you built for the process). Rundown protection is an internal OS mechanism and is not necessary for this.


> When the process you attached to terminates, most of the resources
> associated with that process will get freed, but because you still have a
> reference to the process object you can safely do whatever cleanup is
> necessary (e.g. release any MDLs you built for the process).

So this attach won’t cause the requesting thread to be terminated when the attached process gets killed?

AFAIK, the OP has not explained the purpose of this sharing, and it looks
like an overly-complex solution is being designed for what is a truly
trivial problem.

What is the purpose of this sharing, and why are the existing,
documented, supported user-level mechanisms insufficient?
joe



NTDEV is sponsored by OSR

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Why is everyone insisting on debating the merits of types of feathers and
glue when we don’t even know what the problem is? I think what is desired
is already supported at user level, and all this discussion is irrelevant.
joe


> On 2/27/2013 5:43 PM, xxxxx@broadcom.com wrote:

> It’s a Really Bad Idea to attach a thread of one user mode process to
> another. What if the newly attached process gets terminated?
>

Unless you have acquired rundown protection the attaching thread will
hang indefinitely. However, I see that nt!MiReadWriteVirtualMemory does
not acquire explicit rundown protection. However, I think having an
open handle to a process establishes implicit rundown protection. At
least some place I remember reading that the process address space isn’t
torn down as long as there is still an open handle to the process. Or
am I dreaming?

Probably dreaming. We have no specification of the problem to be solved,
and I could have implemented interprocess shared memory in less time than
it took to write the original post, and I suspect, in the absence of any
actual specification, that it would be a complete solution to the problem.
But since this was a “How do I implement this solution?” question instead
of the sensible “Here is my problem, how do I solve it?” question, it is
kind of hard to tell.
joe



First, let me thank everyone for the kind help.

The purpose of the mapping is to implement a monitoring process that provides security measures over another process. In my scenario I have a very important process running in the system that should have no downtime whatsoever. My product scans the target’s memory in order to extract some internal information from it, such as the user currently logged in to the application. This is a security product, not a virus.

From what I was told in a personal response to my post, when memory is mapped using an MDL, Windows doesn’t manage the memory in the same way.

I’ll try to refine my question:
Let’s say I want to map part of the heap of another process (target process) to my process (reader process).
Let’s also say that this part of the heap is found at virtual address 0x10000 in the target process’s address space.
I want the same memory to be mapped into the reader process at any virtual address. I want this memory to keep behaving the same way any other piece of userland memory behaves under Windows, which means that if it is not used for some time by either the target process or the reader process, Windows might or might not choose to page it out as it usually does. And if it is paged out, Windows would page it back in at any attempt to access it from either the reader or the target.
I think a memory-mapped file has about this kind of behavior; I just want memory that is not a mapped file to act the same way.
Is there a way to do such a thing?
Am I missing something?

Basically, what you are describing above is just exactly how it all actually works. IIRC, MapViewOfFileEx() allows you to specify the virtual address at which pages should be mapped, and if a page is swapped out it will be brought into RAM at the very first attempt to access it, for understandable reasons. In any case, if you don’t want memory to be swapped out, you can lock it with VirtualLock()…

In other words, the question stays open - what is the point of driver involvement here if use of the UM API allows all your objectives to be met in just a few lines of code? As Joe pointed out already, it may take longer to type the post that describes your requirements than to actually implement a scheme that meets them…

Anton Bassov

Thanks Anton.
It sure is how MapViewOfFileEx works, only I don’t want to map a file between two processes. I want to map the heap of another process to my process’s address space. The other process is already running, and I’m not allowed to patch it.
I understand that mapping using an MDL might not give me the desired result - is that so?
The point of the driver is just to make the mapping possible. The point of the entire thing is to make a monitoring process for security purposes.

Assaf

You can get the currently-logged-in user without sharing memory; this can
be obtained from the process handle.

You will have serious problems mapping the heap into your address space,
because you will never know when it is stable. There is no way to
synchronize your monitoring program’s access to the heap with the target
process’s access to the heap, and, since the heap is expanding in multiple
discontiguous sets of pages, one mapping won’t do it, and there’s no good
way to notice that the heap has been extended. The heap is NOT implemented
as a single, contiguous block of pages.

You could also implement interprocess messaging to query known values;
trying to decrypt critical information from a heap, even if it were a
single block of contiguous pages, requires far more magic than could be
sensible. I’ve had the misfortune to have to debug heap problems using a
debugger to explore the heap, and I cannot imagine having to write code
that did what I did.

There are major problems in dealing with pointers, which often do not
point into the heap (for example, string literals are pooled in the code
segment). And if you have a structure you want to read, you have no idea
if you are getting consistent data. Consider something as simple as an
array or vector. Your app calls “qsort” on it. Your monitoring app has
no way to know this, so what it sees at any given instant may be a
partially-correct structure which is in the middle of being sorted, and may
have an element missing entirely, or have two pointers to the same
element. And those are just the simplest of the failure modes. Without
an inter-process synchronization mechanism to protect the heap (nearly
impossible to implement correctly), your monitoring process cannot be
expected to work correctly.

Note also that the block of memory will map into your process at a start
address quite different from the address in the target. There is no way
to force it to map at a specific address, and you would still have to map
the multiple discontiguous heap segments. If you had a pointer, you
would, in effect, have to map it to a block-relative address, then, in
your monitoring app, convert it to a true 32/64-bit address in your
monitoring process’s address space. You would have to know which of the
many discontiguous heap blocks it referred to, convert it relative to that
block’s offset, and then, knowing the block, convert it back to an address
in your process’s address space.
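
The pointer-translation bookkeeping described above looks roughly like this (a portable sketch; how the block table gets filled in depends on how the discontiguous heap segments are discovered and mapped):

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per mapped block: where the block lives in the target process
 * and where the same pages are mapped in the monitoring process. */
typedef struct {
    uintptr_t remote_base;   /* address of the block in the target */
    void     *local_base;    /* address of our mapping of that block */
    size_t    size;
} block_mapping;

/* Translate a pointer value read out of the target's memory into a
 * pointer we can dereference locally: find the block it falls in, take
 * its block-relative offset, and rebase it onto our own mapping.
 * Returns NULL if it lands in no mapped block. */
void *translate_remote(const block_mapping *table, size_t count,
                       uintptr_t remote_ptr)
{
    for (size_t i = 0; i < count; i++) {
        const block_mapping *m = &table[i];
        if (remote_ptr >= m->remote_base &&
            remote_ptr - m->remote_base < m->size) {
            return (char *)m->local_base + (remote_ptr - m->remote_base);
        }
    }
    return NULL;   /* pointer into an unmapped (or non-heap) region */
}
```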

While all this is doable, you end up in the Turing Tarpit, in which all
things are possible if you know every single problem, but ultimately
nothing is easy.

So explain how examining the heap can be done in a thread-safe
fashion, and how you anticipate that reading (but not modifying) it can
convey useful information in a way that, as I said above, can improve its
uptime.

I’ve engineered several “non-stop” systems in my career, and I find it
hard to understand how looking at the heap can help. In fact, in all my
recovery code, I had to treat the heap as “damaged goods” and simply reset
it to have all elements free and re-create the structures “from scratch”,
and retrieve necessary state from the transactions on the disk (in a set
of files) or state in its co-process, or, in one case, in metadata I stored
in the kernel (since I “owned” both the kernel component and the
application (neither of which I had created) it was easy for me to add the
necessary IOCTLs to the kernel to set and get the metadata). In the
co-process model, a graceful shutdown of either component sent an “I’m
going away” message to its partner. Each partner could restart the other.
Each also waited on the process handle of its partner. When this broke
loose, the process would signal all of the components: “Daddy gone bye-bye”
for the secondary process, and “Child ran away” was broadcast to various
components of the primary process. A photo of the process was put on a
milk carton (“Have you seen this process?”) and the parent process
restarted its child. If the parent process failed abruptly, the child
process went to the nice men in blue coats, who took it to their station,
bought it ice cream, and waited for the parent to restart. Again, each
could do this without knowing anything about the other’s heap. So I am very
suspicious about why this mapping is necessary, and why you think it is
even doable, and what you hope to get from it. I think a much simpler
mechanism might solve the same problem, which we still don’t know.
joe
How does examining the heap improve the reliability of the target program
and thus increase its uptime? I have managed to create service code
that simply couldn’t fail, without once needing to consult the heap, even
from within the process. The code had a “hard failure” (unrecoverable
error, typically a memory access error) at least once a day - we could only
explain it when examining the dumps by postulating transient memory
failure - but when I finished the rewrite, no one ever knew that such
failures happened; it was “self-healing”.

The techniques I describe will not give you access to the heap, but I
believe that such access is probably unnecessary, and probably impossible
to do “right”. In one case, I had two mutually-dependent processes, and
if either one failed, the other knew how to restart it and re-create all
necessary state in its companion. Again, if it failed (turns out there
was a bug in a third-party library that they said was too hard to fix),
the users never really saw the failure; the GUI component might disappear
for 30 seconds, but when it came back it had all the same windows up, and
they contained all the data they had previously contained; if the crash
happened in the middle of a transaction, it treated it as a true
“transaction” that had either been committed or rolled back. All done by
interprocess message protocols.


Wow, quite an answer.

The data the monitoring process is querying in the target process is found on the heap. It is not just the user name (which is not the OS user name, but the application user name), but a great deal of information that the monitoring process is interested in.
I have a very good understanding of the internal structures used by the target process, and that includes a way to validate the data in real time, and a way of knowing whether the memory is valid or just… garbage.
As for the address translation, I can handle that using a translation map/table. In general that’s not what worries me.
I don’t have any control over the code of the target process; I just have information about the internal structures. If I had a way to change the code of the target, I could have done many other things to speed up the memory scan. This is not the case, though.

Hello,

Besides the fact that I don’t really understand what “validation” you will perform, and what actions you will take on “invalid” data, I’m also wondering how you will synchronize access to the other heap, which may not be read-only.

Have you considered injecting a thread/DLL into the target process? Then you can access the heap directly, so you only issue requests from the monitoring process and receive results.

GP