Physical Page to Virtual Address mapping Q

Gang,
Recently the following question came up during a performance / “is this the right way to do this” discussion …

Here are the initial conditions: given a series of physical addresses and variable data lengths inside of those pages, and the desire to quickly coalesce and move data out of those pages, is it more efficient to a) assemble an MDL from those PAs and lengths, let the OS turn that into a VA and range, and then memcpy from there, or b) memcpy from each PA in turn into the ultimate destination?

So, imagine a series of 4K pages of nonpaged memory in which data resides (0 offset, variable length) that needs to be coalesced and copied into another area (say a usermode buffer).

One opinion is that when the OS is given the MDL made up of those pages and lengths, it will simply walk the list copying (n) bytes here, (n) bytes there, etc. until the total length to copy has been reached… so, one “copy” operation.

Another opinion is that for pages holding less than a full page of data, where the length does not fall on a 4-byte alignment, the OS will first need to coalesce these pages onto full pages to achieve the proper alignment and THEN do the copy operation above. So, if the first page had 17 bytes and the second page 21, the OS would allocate a third page, copy the 17 and 21 bytes to that page, and THEN do the copy … so, two “copy” operations …

Thoughts? Or, perhaps better: given the initial conditions (a passel of kernel mode 4K pages with variable data), the end result (a usermode buffer with all of the KM pages coalesced), and the goal of getting it done as quickly/efficiently as possible, is there a better way than building an MDL from the KM pages and doing an RtlCopyMemory from the VA from that MDL?

Cheers!

Only the first and last pages can be partial: the first page may start
at a nonzero offset, and only the last page may stop short of the end
of its page.
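
To make that rule concrete, here is a minimal user-mode sketch (not kernel code) that checks whether a list of per-page fragments could be described by one MDL at all. The page_fragment struct and fits_single_mdl() are hypothetical names made up for illustration, and a 4K page size is assumed; nothing here is a WDK API.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE_4K 4096u   /* assumed page size */

/* Hypothetical descriptor: where the valid data sits inside one 4K page. */
typedef struct {
    size_t offset;   /* byte offset of the data within the page */
    size_t length;   /* number of valid bytes starting at offset */
} page_fragment;

/* Returns true when the fragments, taken in order, form one gap-free
 * byte range: only the first page may start at a nonzero offset and
 * only the last page may end before the page boundary. */
bool fits_single_mdl(const page_fragment *frags, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        size_t end = frags[i].offset + frags[i].length;

        if (frags[i].length == 0 || end > PAGE_SIZE_4K)
            return false;               /* empty, or runs past the page */
        if (i > 0 && frags[i].offset != 0)
            return false;               /* gap at the start of a later page */
        if (i + 1 < count && end != PAGE_SIZE_4K)
            return false;               /* gap at the end of an earlier page */
    }
    return count > 0;
}
```

With the example from the question (17 bytes in the first page, 21 in the second, both at offset 0), the check fails because the first page stops short of its end, so those two pages cannot be expressed as a single MDL.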

On 1/16/09, choward@ix.netcom.com wrote:
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer
>


Mark Roddy

> Another opinion is that for pages less than a full page that the length does not fall on a 4byte alignment the OS

Only the first page in the MDL can have space skipped at start.

Only the last page in the MDL can have space skipped at end.

So, if you have some intermediate pages with skips at start/end, forget about MDLs and just use lots of memcpy.
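
A rough sketch of the "lots of memcpy" approach, written as plain user-mode C so it can be compiled and tried anywhere. The source_chunk struct and coalesce_chunks() are hypothetical names for illustration. In a driver the copy call would be RtlCopyMemory on kernel virtual addresses, and access to the user buffer would additionally need the usual protection (probing under try/except, or a locked and mapped MDL), which this sketch leaves out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: one source page and how much of it is valid. */
typedef struct {
    const void *va;   /* virtual address where the page's data starts */
    size_t length;    /* number of valid bytes at va */
} source_chunk;

/* Packs every chunk back to back into dst, one memcpy per page.
 * Returns the number of bytes written, or 0 if dst_size is too small
 * (in which case dst is left untouched). */
size_t coalesce_chunks(void *dst, size_t dst_size,
                       const source_chunk *chunks, size_t count)
{
    size_t total = 0;

    /* First pass: make sure everything fits before touching dst. */
    for (size_t i = 0; i < count; i++) {
        if (chunks[i].length > dst_size - total)
            return 0;
        total += chunks[i].length;
    }

    /* Second pass: copy each page's valid bytes in turn. */
    unsigned char *out = dst;
    for (size_t i = 0; i < count; i++) {
        if (chunks[i].length == 0)
            continue;                   /* nothing valid in this page */
        memcpy(out, chunks[i].va, chunks[i].length);
        out += chunks[i].length;
    }
    return total;
}
```

Each source byte is copied exactly once, straight to its final position, so there is no intermediate staging page involved.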


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

Why physical addresses? Typically, in a driver, you will be dealing with
virtual addresses. You don’t assemble a MDL from physical addresses anyway.
So I don’t understand the question.

A MDL represents a contiguous block of storage, so you would have to have a
list-of-MDLs; you can’t do it with a single MDL.

Are the pages truly in the non-paged pool, or are they just user pages that
are locked down and temporarily non-pageable?

If this were DMA, you could create a scatter-gather list of physical
addresses to program the DMA chip, but you would have to be using a DMA chip
that could handle scatter-gather.

The 4-byte alignment? Did you mean, perhaps, 4K-byte alignment? The OS is
not “given the MDL”; at this point, you *are* the OS. The “other opinion”
wouldn’t make sense, because a MDL is implicitly a single, contiguous
address range.

Note that RtlCopyMemory wants a contiguous virtual address range for both
source and destination, and cannot use a MDL as either argument, so you
would walk the list of MDLs and copy the data described by each to user
space.

But why do you need a MDL at all? If you have addresses, you copy from
addresses to addresses; if the addresses to the source are kernel
addresses, you don’t need a MDL at all. The purpose of a MDL is to allow
you access to user-mode addresses in a process-independent fashion.
joe

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
choward@ix.netcom.com
Sent: Friday, January 16, 2009 9:35 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] Physical Page to Virtual Address mapping Q
