> beside the fact that I don’t really understand what “validation” you will perform, and what actions
you will do in “invalid” data
Yes, it sounds more of a classical malware, rather than the “security product”. It looks like the OP tries to modify the behavior of the target process (which is generally known as a subversion) while remaining totally transparent to it. This is why he does not want to modify its address space in the userland, just in case the target implements some scheme that tries to fight the subversion (for example, terminates right on the spot if it detects any external involvement with itself), and, instead, prefers to involve a driver. Furthermore, it is obvious that the OP gained a lot of knowledge of this application via disassembly - it even knows the binary layout of its internal structures.
The very first possible objective of this “security product” that comes to my mind is capturing sensitive data (for example, in some banking or trading application)…
> When the process you attached to terminates, most of the resources
> associated with that process will get freed, but because you still
> have a reference to the process object you can safely do whatever
> cleanup is necessary (e.g. release any MDLs you built for the
> process). Rundown protection is an internal OS mechanism and is not
> necessary for this. <
I believe it was actually the KeStackAttachProcess call that hung when
it was made while the process object was still valid (thanks to the extra
reference) but the process had begun to exit. My previous statements were
based on empirical observation. I am sure that what you say about releasing
MDLs is true; however, in the cases where things went bad we never
actually made it that far because of the hang mentioned above. Out of
curiosity, are you saying that Windows will not generate a
PROCESS_HAS_LOCKED_PAGES or DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS bugcheck if the
pages are locked by an *attached* thread? http://support.microsoft.com/kb/256010.
> From what I was told in a personal response to my post, when memory is mapped using an MDL, Windows doesn’t manage the memory in the same way.
Locking the memory with an MDL prevents the virtual addresses from being
paged out. You need to lock the pages down if you are going to access the
memory from elevated IRQL. If you will only be accessing the user mode
pages at passive level you may be able to just wrap the access in a
__try…__except block and handle the exception if the memory is bad. A
friend of mine tried valiantly to get the MDL approach to work without
crashing or hanging and eventually abandoned the approach in favor of a
simple __try…__except block. Of course he was testing the design
under some extreme, albeit possible, conditions, which is more than a
lot of people do.
Directly accessing the memory of another arbitrary process from a driver
entails some considerable risk. I do not fully understand why you
cannot do this from user mode using one of the published APIs (e.g.
ReadProcessMemory). Perhaps a security application shouldn’t trust user
mode. However, from what I gather you intend to perform the
verification from user mode anyway. So at that point I don’t see how it
really matters whether you initiate the access to the remote address
space in an IOCTL or in a system service call. Either approach is just
as easily subverted.
I’m going to access the memory from userland and userland only. The kernel
part is just to make the memory available to my process.
Or in other words, what I need is exactly ReadProcessMemory, but with
better performance. What happens in ReadProcessMemory is:
1. A system call
2. Attaching to the address space of another process
3. Memcopy
4. Attaching back to the address space of the calling process
5. Another memcopy
6. Return to user space.
My product has to scan a specific structure that is found on the heap of
the target process, and it has to do it fast, and it has to do it again and
again.
By mapping the memory of the target process to mine I hope to eliminate
points 1 to 6 for every read, and replace them with just about one IOCTL
that I’ll implement, and a standard memory read (or memory read + address
translation) per read.
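To make the trade-off being argued here concrete, below is a rough back-of-the-envelope model. It is only a sketch: the unit costs are invented placeholders, not measurements, and the function names are hypothetical. It also deliberately ignores the cost of locking pages and of the mapping call itself, both of which come up later in the thread.

```python
# Toy cost model. Every number below is a made-up unit cost, not a measurement.
SYSCALL = 10      # user -> kernel -> user round trip (hypothetical)
ATTACH = 20       # switching to another process's address space (hypothetical)
MEMCOPY = 1       # one buffer copy (hypothetical)
DIRECT_READ = 1   # plain load from memory that is already mapped (hypothetical)


def copy_per_read_cost(reads):
    """ReadProcessMemory-style path: points 1 to 6 are paid on every read."""
    per_read = SYSCALL + ATTACH + MEMCOPY + ATTACH + MEMCOPY
    return reads * per_read


def map_once_cost(reads):
    """One IOCTL that attaches and detaches once, then ordinary memory reads."""
    setup = SYSCALL + ATTACH + ATTACH
    return setup + reads * DIRECT_READ


def break_even():
    """Smallest number of reads for which mapping once is the cheaper option."""
    n = 1
    while map_once_cost(n) >= copy_per_read_cost(n):
        n += 1
    return n
```

With these placeholder costs the mapping wins immediately, but the conclusion depends entirely on the assumption that a single IOCTL is enough. If the driver has to be called again for every read, the per-read system call and attach costs come straight back.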
If I understand what you were saying, I don’t have to lock the memory at
all, I just need to:
1. Attach to the address space of the target process using
   KeStackAttachProcess
2. Create an appropriate MDL for the target memory using MmInitializeMdl
3. Detach from the target process.
4. Attach back to the reader process
5. Use MmMapLockedPagesSpecifyCache to map the memory referenced by the MDL
   into the reader process.
It looks like MmMapLockedPagesSpecifyCache handles only locked pages, is
that so?
I thought an MDL can only describe physical memory, and the heap of the
target process might be paged out. Is that OK?
Does the fact that I’m going to access this memory only from user space mean
that Windows is going to take care of the rest?
A userland address is valid only in the context of a given process. Therefore, if you want to access the target memory in any other context you have to map it into the address space you want to access it in (for drivers it is normally the kernel address space), and this is what locking down pages in an MDL is for: you need to provide a pointer to a locked-down MDL as a parameter to MmMapLockedPagesSpecifyCache(). In this particular situation it has absolutely nothing to do with IRQL-related constraints.
Locking pages in MDL has to be done in __try…__except block anyway. Just consider what happens if the userland code frees the memory while you are trying to lock it down in a driver.
If this is, indeed, the case, you can simply inject a DLL into your target process, and this DLL will allow you to monitor it right from within. Simple, huh? So why don’t you want to do that and instead want to use a driver, unless total transparency to the target process is one of your main objectives???
George Garner, I am not sure that your friend is correct.
AFAIR MSDN doesn’t say it explicitly, but mapping MDL allocates virtual memory and fills PTEs that map this memory region with PFNs from MDL. The problem here is that the connection between the original PTEs and ‘your’ PTEs is ‘weak’: the original PTEs may be invalidated and filled with different PFNs later on, but ‘your’ PTEs will not be updated with these new PFNs, so they will keep pointing to the previous physical memory location, which by then may belong to an arbitrary process.
I would like to be corrected if I am wrong.
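The concern above can be illustrated with a small toy model. This is not how the Windows memory manager is implemented; the `ToyMemory` class and all of its methods are hypothetical and exist only to show what a "weak" connection between two sets of page table entries would mean if the pages were mapped without being locked first.

```python
class ToyMemory:
    """Toy model: physical frames plus one page table per process."""

    def __init__(self):
        self.frames = {}        # PFN -> contents of that physical frame
        self.page_tables = {}   # process name -> {virtual page: PFN}
        self.next_pfn = 0

    def alloc_frame(self, data):
        pfn = self.next_pfn
        self.next_pfn += 1
        self.frames[pfn] = data
        return pfn

    def map_page(self, proc, vpage, pfn):
        self.page_tables.setdefault(proc, {})[vpage] = pfn

    def read(self, proc, vpage):
        return self.frames[self.page_tables[proc][vpage]]

    def page_out_and_back_in(self, proc, vpage):
        """An unlocked page is trimmed and later faulted into a new frame."""
        old_pfn = self.page_tables[proc][vpage]
        data = self.frames[old_pfn]
        new_pfn = self.alloc_frame(data)
        self.page_tables[proc][vpage] = new_pfn
        # The old frame gets reused for something unrelated.
        self.frames[old_pfn] = "unrelated data"
        return new_pfn


def demo_stale_mapping():
    mem = ToyMemory()
    pfn = mem.alloc_frame("target heap v1")
    mem.map_page("target", 0x10, pfn)
    # The monitor's mapping copies the PFN once and is never updated.
    mem.map_page("monitor", 0x80, mem.page_tables["target"][0x10])
    mem.page_out_and_back_in("target", 0x10)
    return mem.read("target", 0x10), mem.read("monitor", 0x80)
```

In this model `demo_stale_mapping()` shows the target still reading its own data while the monitor reads whatever now occupies the old frame. Locking the pages before mapping them is what rules out the `page_out_and_back_in` step for as long as the mapping exists.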
Assaf, I don’t think that you will eliminate points 1-6. You have to call NtDeviceIoControlFile, which is a system call. NtDeviceIoControlFile may be slower than ReadProcessMemory since the latter doesn’t create an IRP, while NtDeviceIoControlFile does. You have to attach to the target process in any case, and detach too. And for sure you will return from NtDeviceIoControlFile to user mode since it is a system call. Do not forget that you have to call IoCompleteRequest, so it is not only about IRP allocation, but also about IRP completion.
> Therefore, if you want to access the target memory in any
> other context you have to map it into the address space you
> want to access it… <
Or you can attach to the other process and access the memory within the
context of the other process, which is the paradigm upon which my
comments were based. You are right in saying that
MmMapLockedPagesSpecifyCache requires an MDL and you have to build an MDL
for the user memory if you want to use that API. I don’t remember that
particular API being mentioned previously in this thread and am a bit
perplexed as to where it came from or why you would want to use it in
this context. ReadProcessMemory doesn’t map the remote address into the
calling process. It attaches to the remote process and copies the
target memory into a buffer supplied by the caller. As Peter pointed
out, the KeStackAttachProcess is what consumes CPU cycles. And it is
going to do so whether or not you call ReadProcessMemory from user mode
or KeStackAttachProcess from a driver. So I am a bit mystified as to
why the OP believes that using an IOCTL will improve performance over a
system service call. Nor am I aware of any practical way for the OP to
map the remote user addresses into the address space of the calling
process without, directly or indirectly, passing through
KeStackAttachProcess.
What the OP is proposing entails a considerable level of risk. I don’t
see any way around that.
> Locking pages in MDL has to be done in
> __try…__except block anyway … <
True. But that is only if you use an MDL in the first place.
Indeed DLL injection could have been a good solution to this kind of
problem, in most cases.
But in my case it is not possible, for the following reasons:
1. I’m not allowed to impose any risk on the target process, and DLL
   injection is probably more dangerous (I know we can argue about that).
2. The monitoring process is doing many other things, such as evaluating
   patterns, reporting problems to another server, updating a database
   of statistics about the target, and so on. Therefore my monitoring process
   is quite huge and hard to inject into another process as a whole.
3. I have to be able to shut down the monitoring process without risking the
   target.
4. I’m not allowed to take CPU time from the target process. I get a single
   CPU to run on the machine, and I’m not allowed to exceed that.
5. I might need to attach to more than one process in the future, and I
   think that memory mapping would be a better design for that.
Anyhow, it doesn’t really matter why; I just don’t want to inject a DLL.
This memory mapping technique might be useful for plenty of other
things, if not for that.
Mika,
Why would I have to call NtDeviceIoControlFile? As far as I understand it,
and according to what other people wrote here, I’ll have normal userland
memory that is just mapped into two processes. Of course, I’ll have to make
one IOCTL call to set up the mapping, but after that I’ll access the memory as if
it were any other part of userland memory. Please explain why you think
it’s not going to work like that.
On Thu, Feb 28, 2013 at 5:45 PM, wrote:
> George Garner, I am not sure that your friend is correct.
>
> AFAIR MSDN doesn’t say it explicitly, but mapping MDL allocates virtual
> memory and fills PTEs that map this memory region with PFNs from MDL. The
> problem here is that connection between original PTEs and ‘your’ PTEs is
> ‘weak’. It means that original PTEs may be invalidated and filled with
> different PFNs later on. But ‘your’ PTEs will not be updated with these new
> PFNs, so they will point to the previous physical memory location. Which
> may affect arbitrary process.
>
> I would like to be corrected if I am wrong.
>
> Assaf, I don’t think that you will eliminate points 1-6. You have to call
> NtDeviceIoControlFile, which is a system call. NtDeviceIoControlFile may be
> slower than ReadProcessMemory since the latter doesn’t create IRP, while
> NtDeviceIoControlFile does. You have to attach to target process in any case.
> And detach too. And for sure you will return from NtDeviceIoControlFile to user
> mode since it is a system call. Do not forget that you have to call
> IoCompleteRequest, so it is not only about IRP allocation, but also about
> IRP termination.
>
> ---
> NTDEV is sponsored by OSR
>
> OSR is HIRING!! See http://www.osr.com/careers
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer
> …but mapping MDL allocates virtual memory and fills PTEs that map
> this memory region with PFNs from MDL. <
I presume that you are referring to a page table which may get allocated
if the calling process is out of address space. The scenario to which I
was referring does not actually map the remote addresses into the
calling process. It attaches to the remote process and copies the
memory into a buffer supplied by the caller.
You could, of course, with some effort, map the remote address into the
calling process as everyone keeps suggesting. However, to do so you
are going to have to attach to the remote process somewhere along the
line and thereby incur the same performance hit as ReadProcessMemory.
Moreover, if you do probe and lock the remote pages and map them into
the calling process you need to make damn sure the remote process
doesn’t exit while the pages are locked (unless you like seeing the
PROCESS_HAS_LOCKED_PAGES bugcheck).
If you attach to the remote process and access the remote user address
at passive level the page fault handler will bring the page back into
memory if the VA is paged. No doubt the data may be brought back in to
a different physical address. But that is only a problem if you
actually map the remote address into the OP’s process.
You call NtDeviceIoControlFile to send the IOCTL (sorry, I should have said that this is the native function behind DeviceIoControl in your case), and there is some overhead in that function, though I didn’t measure it. So you have to make the same number of DeviceIoControl calls as you would ReadProcessMemory calls, and every call builds and completes an IRP.
AFAIR ReadProcessMemory is adaptive: depending on the number of bytes, it decides either to copy the bytes to a kernel buffer and then to the user buffer, or to build an MDL to transfer the bytes more directly (there is no sense in building MDLs for small buffers, since building an MDL also has a price).
George Garner,
> The scenario to which I
> was referring does not actually map the remote addresses into the
> calling process.
Ah, then this is not what Assaf is asking about.
> Moreover, if you do probe and lock the remote pages and map them into
> the calling process you need to make damn sure the remote process
> doesn’t exit while the pages are locked (unless you like seeing the
> PROCESS_HAS_LOCKED_PAGES bugcheck)
This is exactly what we are talking about. How big is the overhead? Is it possible to avoid it? Assaf was also warned about process termination and the bugcheck.
> But that is only a problem if you
> actually map the remote address into the OP’s process.
Yes, that was the idea: to map the remote process’s memory into the monitoring process to make memory reads quick. But then you have to lock the pages. And unlock them.
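The lock/unlock bookkeeping mentioned above can be sketched as a toy model. It assumes only what has been said in this thread, namely that a process exiting while some of its pages are still locked ends in a PROCESS_HAS_LOCKED_PAGES bugcheck. The `ToyProcess` and `BugCheck` names are hypothetical, and none of this is real kernel code.

```python
class BugCheck(Exception):
    """Stands in for a system crash in this toy model."""


class ToyProcess:
    def __init__(self, name):
        self.name = name
        self.locked_pages = 0
        self.exited = False

    def lock_pages(self, count):
        if self.exited:
            raise RuntimeError("process already exited")
        self.locked_pages += count

    def unlock_pages(self, count):
        if count > self.locked_pages:
            raise ValueError("unlocking more pages than were locked")
        self.locked_pages -= count

    def exit(self):
        # Pages still locked at exit are what the bugcheck complains about.
        if self.locked_pages:
            raise BugCheck("PROCESS_HAS_LOCKED_PAGES")
        self.exited = True


def run(unlock_before_exit):
    target = ToyProcess("target")
    target.lock_pages(4)            # monitor maps four pages of the target heap
    if unlock_before_exit:
        target.unlock_pages(4)      # cleanup happened in time
    try:
        target.exit()
        return "clean exit"
    except BugCheck as crash:
        return str(crash)
```

Here `run(True)` ends cleanly while `run(False)` ends in the bugcheck. The practical consequence for the design under discussion is that the driver would need a reliable way to learn that the target is exiting and to unmap and unlock before that completes.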
> I’m not allowed to impose any risk to the target process, …
Like unsynchronized random access to heap structures … mapped by the kernel?
> The monitoring process, is doing many other things, such as patterns
> evaluating, reporting problems to another server, …
> Therefore my monitoring process is quite huge and hard to inject
> into another process as a whole.
It’s not about injecting the whole thing. Your monitoring process would control the injected logic, and just the scanning would happen “in-place”. You would communicate with it over a well-defined
interface (pipes, sockets, …).
> I have to be able to shutdown the monitoring process without risking the
> target.
I don’t see why this should not be possible.
> I’m not allowed to take CPU time from the target process. I get a single
> CPU to run on the machine, and I’m not allowed to exceed that.
Sorry, but we live in a world of limited resources. So your “whatever-processing” needs CPU, regardless of where this thread runs. And if you are in a completely separate (but not separate) address space, the whole translation layer is likely to consume a whole lot (if not most) of it.
> I might need to attach to more than one process in the future, and I
> think that memory mapping would be a better design for that.
So you think…
> Anyhow, It doesn’t really matter, why, I just don’t want to inject a DLL.
I think Anton already told us:
“Yes, it sounds more of a classical malware, rather than the “security product”. It looks like the OP tries to modify the behavior of the target process (which is generally known as a subversion) while remaining totally transparent to it.”
> This memory mapping thing might come as useful for plenty of other
> things, if not for that.
I would say the whole idea is doomed from the start.
> The purpose of the mapping is for implementing a monitoring process for security measures over another process. In my scenario I have a very important process running in the system that should have no downtime whatsoever. My product scans the target’s memory in order to get some internal information out of it, such as the user currently logged in to the application. This is a security product, not a virus.
Hmm. This may be what your boss told you, but…
all this looks strikingly similar to something else.
> Furthermore, it is obvious that the OP gained a lot of knowledge of this application via disassembly - it even knows the binary layout of its internal structures.
Or this app uses a well known opensource library, like openssl
– pa
I don’t like these allegations. I do not write malware. Besides, I’m
having a hard time thinking of a reason for someone to care
about performance when writing malware. This is a legitimate security
monitoring service, which I can’t say more about because of an NDA.
Nevertheless, this is a technical question; why does everyone keep digging
into the purpose of the code? Some people sometimes ask questions
just because they want to understand things better.
On Fri, Mar 1, 2013 at 2:43 AM, Pavel A. wrote:
> On 28-Feb-2013 15:38, xxxxx@hotmail.com wrote:
>
>> Furthermore, it is obvious that the OP gained a lot of knowledge of this
>> application via disassembly - it even knows the binary layout of its
>> internal structures.
>
> Or this app uses a well known opensource library, like openssl
>
> -- pa