About PTEs, VADs and PFN database

Aditya_Shrivastava · October 12, 2009, 2:14pm

Over the past few days I have read many articles on memory management. Below is what I understood from them, and what I didn’t.

Virtual-to-physical address translation is done using the page directory, page tables and page frames. The Intel manual describes register CR3, which points to the start of the per-process page directory. A PTE can point to a page in RAM, in the page file, or in a mapped file; in the last case a 28-bit value is used to find the mapped file that holds the data. No confusion there.
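As an illustration of the translation described above, here is a minimal sketch of how a 32-bit virtual address is split into its lookup indices. It assumes classic non-PAE x86 paging (10-bit page directory index, 10-bit page table index, 12-bit offset); the function name `split_va` is made up for this example, and other paging modes use different field widths.

```python
# Assumption: 32-bit x86, non-PAE paging, 4 KiB pages.
PAGE_SHIFT = 12          # low 12 bits: byte offset inside the page
INDEX_BITS = 10          # 10 bits each for directory and table index
INDEX_MASK = (1 << INDEX_BITS) - 1


def split_va(va):
    """Return (page directory index, page table index, byte offset)."""
    pd_index = (va >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    pt_index = (va >> PAGE_SHIFT) & INDEX_MASK
    offset = va & ((1 << PAGE_SHIFT) - 1)
    return pd_index, pt_index, offset


# CR3 locates the page directory; pd_index selects a PDE, which locates a
# page table; pt_index selects the PTE; offset selects the byte in the page.
print(split_va(0x00401234))   # (1, 1, 564), i.e. offset 0x234
```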

Next come VADs, which are data structures that the memory manager keeps to represent the address space of a process; it maintains a tree of them for every process.

What I did not understand here is why a separate data structure is required at all, if the complete address space can be accessed using PTEs.

MSDN talks about keeping track of the used and unused addresses of a process, and mentions VirtualAlloc for reserving memory in the context of VADs. The OSR book (regarding VADs) talks about demand paging and also says: “In addition, it must also be able to obtain the contents of that page from its current storage location”. Isn’t this possible using PTEs, especially since a PTE can point to the page file or a mapped file? Windows Internals talks about both.

What am I missing here?

Last is the page-frame database, again managed by the memory manager, which maintains the state of every physical page in the system. An article on MSDN says that “each page-frame entry in the database reverse-references its corresponding PTE”.

What happens on a process switch? For instance, suppose one particular physical page is referenced by two processes. While process A is active, the page-frame entry points to the PTE of process A. When the context later switches to process B, will MM modify all page-frame entries to point to the PTEs of process B?

Thanks for your time and patience,
Aditya

Prokash_Sinha-1 · October 12, 2009, 3:02pm

> What happens on a process switch? For instance, suppose one particular physical page is referenced by two processes. While process A is active, the page-frame entry points to the PTE of process A. When the context later switches to process B, will MM modify all page-frame entries to point to the PTEs of process B?

That would be inefficient. Offhand, without reading up or refreshing my
memory, I would guess the page-frame bookkeeping is created and populated, at
least partially if not totally, when process B is first
instantiated. From then on, a PTE of process B pointing to the same page
should be fine. If you are wondering whether the page would still be resident,
given that it was resident when process A was switched out, that
depends on a lot of things: the page may not
even be in the working set, and even if it is, it may not be resident…
only the access pattern is our friend here, I guess!

-pro

Ken_Johnson · October 12, 2009, 5:09pm

Among other things, the concept of VADs allows user mode programs to allocate or reserve potentially very large address spans without MM needing to create all of the intervening P*E’s up front. If the program never touches part of a reservation then there was no need to create those page table entries.

A single VAD can span an arbitrarily large region whereas many PTEs/PDEs/PPEs/PXEs might be required to fully realize the hardware view of that address range.
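To put rough numbers on this, the following sketch counts how many PTEs and leaf page-table pages would be needed to fully map a region, compared with the single VAD that can describe it. It assumes 4 KiB pages and x64-style tables with 512 entries per page-table page; the function names are hypothetical and the region is assumed to be table-aligned.

```python
PAGE_SIZE = 4096            # assumption: 4 KiB pages
ENTRIES_PER_TABLE = 512     # assumption: x64, 512 eight-byte entries per table page


def ptes_needed(region_bytes):
    """Number of PTEs needed to map every page of the region."""
    return -(-region_bytes // PAGE_SIZE)          # ceiling division


def page_table_pages_needed(region_bytes):
    """Number of leaf page-table pages, assuming a table-aligned region."""
    return -(-ptes_needed(region_bytes) // ENTRIES_PER_TABLE)


one_gib = 1 << 30
print(ptes_needed(one_gib))               # 262144 PTEs
print(page_table_pages_needed(one_gib))   # 512 page-table pages (2 MiB of tables)
# The same 1 GiB reservation is described by one VAD, whatever its size.
```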

VADs also perform other bookkeeping functions such that MM knows the size of an allocation region, etc.

(This information is subject to change with future releases and so forth.)

S

—
NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

anton_bassov · October 12, 2009, 5:40pm

> What I did not understand here is that why at all a separate data structure is required if complete address space can be accessed using PTEs.

A PTE is a CPU-level data structure that the system sets up in a page table so that the CPU can access the physical page it describes in protected mode. However, the system does not have to make the whole address space available: it can set up valid PTEs only for specific memory regions, and VADs describe these regions. Furthermore, the system does not have to back valid regions with physical pages straight away. Instead, it can implement demand paging. When the target process accesses a region and the page is not present, a page fault exception is raised. The system then examines the VADs, sees that the target address is valid, allocates a physical page for it and sets the Present bit in the PTE. At this point the CPU can access the address without raising exceptions.
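The fault path described above can be modelled with a toy address space. This is only a sketch of the idea, not how the Windows memory manager is implemented: the class `ToyAddressSpace`, its methods and the flat list of ranges standing in for the VAD tree are all invented for illustration.

```python
PAGE_SIZE = 4096


class AccessViolation(Exception):
    """Raised when a faulting address is not covered by any VAD."""


class ToyAddressSpace:
    def __init__(self):
        self.vads = []            # (start_va, end_va) inclusive ranges
        self.ptes = {}            # virtual page number -> physical frame number
        self.next_free_frame = 0  # trivial frame allocator

    def allocate(self, start_va, size):
        # Only records the range; no PTE is created up front.
        self.vads.append((start_va, start_va + size - 1))

    def _page_fault(self, va):
        # Consult the VADs to decide whether the address is valid at all.
        if not any(start <= va <= end for start, end in self.vads):
            raise AccessViolation(hex(va))
        # Valid address: allocate a frame and make the PTE "present".
        self.ptes[va // PAGE_SIZE] = self.next_free_frame
        self.next_free_frame += 1

    def access(self, va):
        """Translate va to a physical address, faulting the page in if needed."""
        vpn = va // PAGE_SIZE
        if vpn not in self.ptes:      # PTE not present: page fault
            self._page_fault(va)
        return self.ptes[vpn] * PAGE_SIZE + va % PAGE_SIZE


space = ToyAddressSpace()
space.allocate(0x100000, 1 << 30)     # 1 GiB region, zero PTEs so far
space.access(0x100010)                # first touch creates exactly one PTE
```

After the single access the toy page table holds one entry, while an address outside every VAD raises `AccessViolation` instead of being mapped.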

> What will happen here in case of process switch. For instance if one particular physical page
> is referred by two process; and at a given time process A is active, so page-frame entry will
> point to the PTE of process A. After some time when context change to process B, will MM
> modifies all page frame entries to point to PTEs of process B?

There is no need to play around with page tables in order to do the above. All the system has to do is load another page directory into CR3…

Anton Bassov

Prokash_Sinha-1 · October 12, 2009, 8:22pm

Sorry, I hazarded a guess there! For dynamically allocated memory, and for
creating large, sparse address spaces, the VAD is essential, and
the full page table definitely does not need to be created, as both Ken and
Anton already alluded to.

I think if you look at the Windows Internals book and the NT file system
books, and work through the sections a couple of times, things should fall
into place.

-pro

Pavel_Lebedinsky · October 13, 2009, 1:50am

In addition to what Anton and Skywing said…

> An article on msdn says that each page-frame entry in the database
> reverse-reference its corresponding PTE.
>
> What will happen here in case of process switch. For instance if one
> particular physical page is referred by two process; and at a given time
> process A is active, so page-frame entry will point to the PTE of
> process A. After some time when context change to process B,
> will MM modifies all page frame entries to point to PTEs of process B?

No. For shared pages, the PFN’s PteAddress field refers to a
prototype PTE allocated in paged pool, not to any of the actual
PTEs mapping it. You can see this in the debugger:

lkd> !process 0 0 services.exe
PROCESS fffffa8003545060
lkd> .process /P /r fffffa8003545060
lkd> !pte ntdll
VA 0000000076d40000

<…>PTE at FFFFF680003B6A00 <…> pfn 2abea

lkd> !pfn 2abea
PFN 0002ABEA at address FFFFFA8000803BE0
flink 000003F2 blink / share count 0000002A pteaddress FFFFF8A00028B058
reference count 0001 used entry count 0000 Cached color 0 Priority 6
restore pte FA80028E47A00420 containing page 02AD61 Active P
Shared

lkd> !pool FFFFF8A00028B058 2
Pool page fffff8a00028b058 region is Paged pool
*fffff8a00028b000 size: e00 previous size: 0 (Allocated) *MmSt
Pooltag MmSt : Mm section object prototype ptes, Binary : nt!mm

–
Pavel Lebedinsky/Windows Kernel Test
This posting is provided “AS IS” with no warranties, and confers no rights.

Aditya_Shrivastava · October 13, 2009, 2:38am

@Pavel

Thanks for the input, but this is not what I meant. As I understand it, it is possible for two virtual addresses in two different processes to point to the same physical page, even though the page is *not shared*.
For example, process A’s virtual address 0x00100000 and process B’s virtual address 0x00200000 can point to the same physical page.

I asked the question based on this understanding. Kindly correct me if I am wrong. If the scenario is valid, then my original question was: “since page-frame entries have a reverse reference to PTEs, will MM revalidate these entries on every process switch?”

Thanks
Aditya

Aditya_Shrivastava · October 13, 2009, 2:39am

@Pro

True, even I thought that it would be inefficient and hence asked the question.

@Skywing

Thanks for input, it helps.

Aditya

Aditya_Shrivastava · October 13, 2009, 2:46am

@Anton

>"Therefore, the system will examine VADs, see that the target address is valid, allocate a physical page for it and set up Present bit in PTE, and at this point, CPU will be able to acess this address without raising exceptions. "

Thanks a lot, I think this is what I was looking for.

>There is no need to play around with page tables in order to do the above- all that the system has to do is simply to load another page directory into CR3…

I understand that, but my question was not “How are different process address spaces managed?”. I read the following on MSDN:

“The virtual-memory manager uses a private data structure for maintaining the status of every physical page of memory in the system. The structure is called the page-frame database. The database contains an entry for every page in the system, as well as a status for each page”

and

“Each page-frame entry in the database also reverse-references its corresponding PTE. This is necessary so that the VM manager can quickly return to the PTE to update its status bits when the status of a page changes. The VM manager is also able to reverse-reference prototype PTEs to update their status changes, but note that the prototype PTE does not reverse-reference any of its corresponding PTEs.”

Based on this information, I asked what happens to this “reverse-reference” on a process switch, given that there exists a PTE in process 2 that points to the same page. Will MM revalidate it to point to the new process’s PTE, or what?

I hope it makes better sense now.

Thanks
Aditya

Sercan_ercan · October 13, 2009, 2:54am

No, it is not possible.
Making that impossible is the whole point of protected-mode virtual memory management.

Aditya_Shrivastava · October 13, 2009, 3:29am

>>No, it is not possible. It is the point of protected mode virtual memory management. Make it impossible.

I think it should be possible. I am not claiming that two processes’ virtual addresses can point to the same page at the same time even though it is not shared. I know that is not possible.

A simple scenario: a single-CPU machine, so only one process’s thread can run at a time, with a total of X physical pages available. Two processes, A and B, are running and each uses Y pages, with Y > X/2; let’s say Y is 70% of X. That means that while process A is running and using its Y pages, their page-frame entries reverse-reference process A’s PTEs.

Later, when a process context switch happens, CR3 is loaded with the page directory of process B. *Now* 70% of the page-frame entries should reverse-reference PTEs of process B, as the old references are useless with the new page directory. Does MM do this on every process switch? Isn’t this an overhead?

Thanks
Aditya

anton_bassov · October 13, 2009, 4:01am

> based on my understanding it is possible for two virtual addresses in two different process to point to same physical PAGE, even though this is *not shared*.

Well, you seem to have a quite peculiar understanding of things. If two virtual addresses in two different processes translate to the same physical page, it automatically implies that the target physical page is shared between these two processes, don’t you think…

<MSDN quotation>

“Each page-frame entry in the database also reverse-references its corresponding PTE. This is necessary so that the VM manager can quickly return to the PTE to update its status bits when the status of a page changes. The VM manager is also able to reverse-reference prototype PTEs to update their status changes, but note that the prototype PTE does not reverse-reference any of its corresponding PTEs.”

</MSDN quotation>

> I asked the question that what will happen to this “reverse-reference” in case of process
> switch, provided that there exist a PTE in process 2 which points to same page. Will it validate
> it to point to the new process PTE or what?

Again, the same story: if two PTEs in two different processes point to the same page, it means that the page is shared and, as Pavel already told you, MM handles shared pages differently from non-shared ones. The excerpts that you have quoted refer to the handling of non-shared pages…

Anton Bassov

Pavel_Lebedinsky · October 13, 2009, 5:03am

> Thanks for inputs but I did not meant this. based on my understanding
> it is possible for two virtual addresses in two different process to point
> to same physical PAGE, even though this is *not shared*.
> For example process A’s virtual address 0x00100000 and process
> B’s virtual address 0x00200000 can point to same physical page.

The only way this scenario can occur without using “normal” shared
memory (which is based on prototype PTEs ) is if a driver maps the
same locked MDL into both processes. Since these pages are locked
and cannot be trimmed, the link from the PFN to the user PTE is not
necessary (it is the driver’s responsibility to keep track of the
mappings it creates and destroy them when appropriate).

–
Pavel Lebedinsky/Windows Kernel Test
This posting is provided “AS IS” with no warranties, and confers no rights.

Pavel_Lebedinsky · October 13, 2009, 5:35am

> A simple scenario to explain could be on a single CPU machine,
> so at a time only one process’s thread can run , with total X numbers
> of physical pages available. And two process A and B running and
> each uses Y no of pages so that Y > X/2, lets say Y is 70% of X.

Let’s say process A has Y valid private pages in its working set.
Each of the corresponding PFNs points back to process A’s PTEs.

If process B has also allocated Y private pages, some of them will
have to be paged out (because there are only X physical pages in
total). When process B runs again and references these pages,
the memory manager might trim some pages from process A,
write their contents to the pagefile, change process A’s PTEs to
point to the corresponding pagefile locations, and finally give the
physical frames to process B, at which point it will also update
PteAddress to point to process B’s PTEs. There will never be a
situation in which both processes have PTEs pointing to the
same PFN.
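The sequence above can be walked through with a toy model. This is a simplified sketch, not the real PFN database: `ToyMemoryManager`, its round-robin victim choice and the tuple-based PTEs are assumptions made for illustration, and only private pages are modelled.

```python
class ToyMemoryManager:
    def __init__(self, frame_count):
        # Toy PFN database: for each frame, the (process, vpn) of the one
        # PTE that maps it (a stand-in for PteAddress), or None if free.
        self.pfn_db = [None] * frame_count
        # process -> {vpn: ("valid", frame) or ("pagefile", slot)}
        self.page_tables = {}
        self.next_slot = 0   # next free pagefile slot
        self.hand = 0        # round-robin trim position (an assumption)

    def _get_frame(self):
        for frame, owner in enumerate(self.pfn_db):
            if owner is None:
                return frame
        # No free frame: trim a victim and repoint its owner's PTE at the
        # pagefile, so the old owner no longer references this frame.
        victim = self.hand
        self.hand = (self.hand + 1) % len(self.pfn_db)
        owner_process, owner_vpn = self.pfn_db[victim]
        self.page_tables[owner_process][owner_vpn] = ("pagefile", self.next_slot)
        self.next_slot += 1
        self.pfn_db[victim] = None
        return victim

    def touch(self, process, vpn):
        """Reference a private page, faulting it in if it is not valid."""
        table = self.page_tables.setdefault(process, {})
        pte = table.get(vpn)
        if pte is not None and pte[0] == "valid":
            return pte[1]
        frame = self._get_frame()
        table[vpn] = ("valid", frame)
        self.pfn_db[frame] = (process, vpn)   # back-pointer updated here only
        return frame


mm = ToyMemoryManager(frame_count=4)
for vpn in range(3):
    mm.touch("A", vpn)       # A uses 3 of 4 frames
for vpn in range(3):
    mm.touch("B", vpn)       # B needs 3 too, so some of A's pages are trimmed
```

In this model the back-pointer changes only when a frame changes owner on a page fault, never on a context switch, and at no point do two valid PTEs refer to the same frame.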

–
Pavel Lebedinsky/Windows Kernel Test
This posting is provided “AS IS” with no warranties, and confers no rights.

Aditya_Shrivastava · October 13, 2009, 5:55am

@Pavel

>Let’s say process A has Y valid private pages in its working set. Each of the corresponding PFNs points back to process A’s PTEs. …

Thanks, I got it. In my scenario I was not changing Process A’s PTE to point to pagefile and hence wrongly stated that they both point to same page.

Aditya

Prokash_Sinha-1 · October 13, 2009, 9:58am

@Adi,

Based on your investigation, it seems you are actually
observing this paging machinery at work. It would be nice if you could list
which documents (if any) you would suggest for someone to read.
(Aside: yes, perhaps weird, but I have been hacking Android on Openmoko and
Beagle board for quite some time, and it seems I will need a
huge context switch of my own. I will be looking at IOmeter too, once again.
And I might have to come to a decision as to what would be the
best way, perhaps a minifilter, to trap every modification to files:
mostly data files, and not related to viruses or any reverse engineering.)
Performance is critical here.

@Pavel: You gave some insight about pointing a PTE to the page file (when
context switched) instead of a PFN. I don’t remember seeing anything about it in
Windows Internals or the file system book. Are there any documents that I can
follow?

@Everybody: What is the current status of the registry callback
infrastructure? I mean the legitimate way of doing it. I can still use
hooks, but I don’t want to go down that path if I can avoid it.

-pro


Maxim_S_Shatskih · October 13, 2009, 11:08am

> What I did not understand here is that why at all a separate data structure is required if complete address space can be accessed using PTEs.

a) The reserve/commit feature
b) PTE arrays are not trees, and it is hard to find a hole in them. VADs are a tree.
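Point (b) can be illustrated with a small sketch of searching for a free hole among region descriptors. A sorted list kept with `bisect` is used here as a stand-in for the VAD tree, which is an assumption for brevity; `ToyVadSet` and the address limits are hypothetical, and overlap checking is omitted.

```python
import bisect


class ToyVadSet:
    def __init__(self):
        # Sorted, non-overlapping (start, end) ranges, end exclusive.
        self.ranges = []

    def insert(self, start, size):
        # Keep the list ordered by start address (no overlap check here).
        bisect.insort(self.ranges, (start, start + size))

    def find_hole(self, size, lowest=0x10000, highest=0x7FFF0000):
        """Return the lowest free address that fits `size` bytes, or None."""
        candidate = lowest
        for start, end in self.ranges:
            if start - candidate >= size:
                return candidate          # gap before this region is big enough
            candidate = max(candidate, end)
        if highest - candidate >= size:
            return candidate              # room after the last region
        return None


vads = ToyVadSet()
vads.insert(0x10000, 0x10000)             # occupies 0x10000..0x1FFFF
vads.insert(0x30000, 0x10000)             # occupies 0x30000..0x3FFFF
print(hex(vads.find_hole(0x10000)))       # 0x20000
print(hex(vads.find_hole(0x20000)))       # 0x40000
```

The search visits one descriptor per region, however many pages each region spans, whereas scanning PTEs would cost one step per page and could not tell a reserved but untouched range from a free one.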

> PTE of process A. After some time when context change to process B, will MM modifies all page
> frame entries to point to PTEs of process B?

MM will remap all PTE tables on process switch.

–
Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

Prokash_Sinha-1 · October 13, 2009, 11:24am

Maxim S. Shatskih wrote:

>> What I did not understand here is that why at all a separate data structure is required if complete
>> address space can be accessed using PTEs.
>
> a) Reserve-commit feature
> b) PTE arrays are not trees and a hard to find the hole in. VADs are a tree.
>
>> PTE of process A. After some time when context change to process B, will MM modifies all page
>> frame entries to point to PTEs of process B?
>
> MM will remap all PTE tables on process switch.

This is what I read too. For a large address space (a lot of which might
never be used or accessed), creating all the PTEs a priori does not make sense,
so the VAD is essential! Also, IIRC, a reserve-only allocation will
create a VAD entry, but a commit will actually create the VAD and the
corresponding PTE(s)?

-pro

Ken_Johnson · October 13, 2009, 11:33am

Not necessarily, they may be materialized on the first access.

S


Maxim_S_Shatskih · October 13, 2009, 11:40am

> create a VAD entry, but a commit will actually create VAD and
> corresponding PTE(s) ???

IIRC, PTEs are created in the page fault path, but the PTEs that are described by a VAD are zeroed to force MmAccessFault to consult the VAD tree.

–
Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com