Concatenate a large number of physical memory buffers to one large virtual buffer

Sometimes, it is important to call a spade a spade, or a f***ing shovel.
Too many p-baked ideas (p < 0.1) get handed down from “on high” by people
are remarkably unqualified to do any kind of design. They expect the
programmer at the bottom of the hierarchy to somehow magically make their
dreams (or, in some cases, what appear to be LSD-induced visions) into
deliverable reality. In such cases, it can help that poor person at the
bottom convince his or her manager that the desired goal is impossible, or
the design is just completely wrong for Windows, or that the design cannot
work in Windows < N or Windows >= N (for example, unsigned drivers in
Windows 8.1, or a design that requires DDI calls not found on XP). The
goal is that the programmer can go back up the chain with evidence (“the
experts say this is insane, and here’s the proof”) so that some manager
who programmed in MS-DOS or some RTOS of the 1970s understands that the
design simply cannot be made to work (or work reliably, or robustly, or in
Windows XP, or in Windows 8.1). If the goals are insanely impossible,
that’s important, too. It saves getting a bad performance review for
failing to accomplish the impossible. Or establishes a basis for an NLRB
appeal over unfair firing (I’ve had two friends go through this. One won,
one lost. The one who lost had no proof that he had been asked to build
something that is perhaps three PhD dissertations from having the problem
space defined, never mind the solution implemented).

Calling a design insane is not necessarily an attack on the OP, who has
been constrained to implement a specific design, but an attack on the
designer. And yes, if the OP is the designer, it is an attack on the OP.
Some programmers think they can treat Windows just like their favorite
RTOS, perhaps (or almost certainly) one that did not use virtual memory.
The number of times I had programmers in my classes who had NO IDEA what
“virtual memory” was or how it worked was frighteningly high. Some had
come from a background of RTOS real-mode programming, but a greater number
were just people who knew that the name existed and that it allowed their
programs to run, but had no idea about the underlying mechanisms. And
were trying to do things that required this understanding. They also
don’t understand the difference between “it works” and “it works on my
machine, with no user load, massive amounts of physical memory, and a
single-threaded app, on Version N of Windows” and confuse this with “I
have created a deliverable product that will perform correctly and to spec
on all customer machines, including heavily-loaded servers”. I used to
say things like, “If you haven’t tested your {driver, app} on a multicore
machine, you haven’t tested it”. Now that multicore machines are no
longer exotic (my mother, at age 90, had a multicore machine), I would say
things like “If you haven’t tested your 32-bit driver with a 3GB user
partition, you haven’t tested your driver” and “if you haven’t tested your
driver by opening it for async I/O, you haven’t tested it”, and for shared
devices, “If you have not had at least three concurrent multithreaded apps
sending requests to your driver, you haven’t tested it”. The number of
failures I’ve seen for these reasons is far too high, and might have been
caught if the designs had been posted here and the OP had been told the
design was completely wrong.
joe

Peter,

Actually, it has absolutely nothing to do with “his marketing
department’s choice of words”…

My point is that the OP may simply be trying to comply with a request
that is unreasonable in itself, and, as a result, tries to do things that
may be simply ridiculous from a technical standpoint. In your
terminology, he may have been asked to stick wings onto a pig and has no
chance to explain to his boss that pigs don’t fly (you wrote a good
article about it a few years ago in NT Insider). Some years ago I worked
with a client like that, so I know what it is like…

Anton Bassov



>> You may be reading more into “under all conditions” than they intended

> :-)

Please note that the OP says that he “has to be able to capture 40 Gb/s
without any packet loss”, and the docs on their website emphasize that
their adapter is able to work without packet loss under any circumstances.
Needless to say, this requirement is simply unrealistic. For example,
consider what happens if the system is low on RAM, say, because of
heavy network traffic. It then has no option other than to start dropping
packets, right? However, the OP maintains that he cannot afford to lose
packets…

Having finally found the posting that generated all the flak, I don’t see
anything wrong here. If the device /does/ start to lose packets, whose
neck is on the block? Not the marketing department; it is the poor
programmer who failed to write a driver that lived up to its hype. So it
is important to that programmer to know that there /are/ conditions under
which packet loss is inevitable. So when the tech support people get a
complaint about packet loss, they know what to ask for in terms of system
load and resources, and (internally) the marketing people get dinged for
having promised the impossible, and not the programmer, for having failed
to implement the impossible.
joe

Anton Bassov



xxxxx@napatech.com wrote:

There is a reason for my comment in the beginning. Looking around the forum, a lot of people ask questions. A lot of people answer the questions with comments like “it is nuts”, “ridiculous”, etc. Please, if you don’t know how to help, then don’t write an answer. Fortunately, a lot of people want to help. I appreciate that.

Well, in fairness to us, allow me to share with you the process that
often leads to [ntdev] questions.

Someone has a problem. They don’t know quite what the problem is, or
exactly what caused it, but they know a problem has to be solved. Very
often, they will find some important-seeming factoid and latch onto it
for dear life, chasing it until they have no other clue. Then, they
come to ask us about some arcane aspect of this very narrow issue.
We spend days offering suggestions and solutions for this very narrow
issue. And then, usually quite accidentally, we will learn that the
actual problem was something quite different, and often something that
can be solved in a very straightforward way, once you know the whole story.

Peter calls this the “gluing wings on a pig” problem. People spend days
debating the best type of glue to use for gluing wings on a pig, when in
the end what they really wanted to do was transport the pig to another
farm, something that has a much less exotic solution.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

as bacon.

Cheers,
Dave Cattley

We should make this required reading. Please read the following 2.5-page essay, and correctly answer five questions about its content, before you are allowed to join NTDEV.

Anyhow:

http://www.osronline.com/downloads/pp_asking.pdf

ACTUALLY, in the story the OP doesn’t want to transport the pig at all. It’s worse. He wants to go to the neighboring farm for what I judge to be a very good and valuable reason.

Peter
OSR
@OSRDrivers

The problem with large pages is that they are not always available. So for example you might be able to allocate 64 GB of large pages shortly after boot (e.g. from a service) but then if you stop the service, then do something which can fragment memory, and then try to start the service again, you might only be able to get a fraction of the original number of large pages.

You could try to work around this by creating a very simple service that will never terminate, and whose only purpose is to allocate a SEC_LARGE_PAGES shared memory section. Then the actual worker process can map this section into its address space whenever it needs to.

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@broadcom.com
Sent: Sunday, February 23, 2014 6:07 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] Concatenate a large number of physical memory buffers to one large virtual buffer

> Then subdivide it in a linear way to 4MB chunks, and call overlapped
> ReadFile once per each 4MB chunk

The OP’s adapter doesn’t support SG.

Anyway, if he can live with 2MB chunks, he could use large page allocation for the region:

VirtualAlloc(NULL, 0x1000000000ULL, MEM_COMMIT | MEM_RESERVE |MEM_LARGE_PAGES, PAGE_READWRITE);
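
One caveat worth noting: MEM_LARGE_PAGES (like SEC_LARGE_PAGES) only
works if the caller holds SeLockMemoryPrivilege (“Lock pages in
memory”). A minimal sketch of enabling it, assuming the privilege has
already been granted to the account (the function name is mine):

#include <windows.h>

BOOL EnableLockMemoryPrivilege(void)
{
    HANDLE token;
    TOKEN_PRIVILEGES tp;

    if (!OpenProcessToken(GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES,
                          &token))
        return FALSE;

    tp.PrivilegeCount = 1;
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    LookupPrivilegeValueW(NULL, L"SeLockMemoryPrivilege",
                          &tp.Privileges[0].Luid);

    AdjustTokenPrivileges(token, FALSE, &tp, 0, NULL, NULL);
    CloseHandle(token);

    /* AdjustTokenPrivileges can "succeed" without enabling anything;
     * ERROR_NOT_ALL_ASSIGNED here means the account lacks the right. */
    return GetLastError() == ERROR_SUCCESS;
}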

I’m not convinced the overhead of managing non-contiguous buffers would be significant compared to the amount of work required to transfer 4 MB of data… But if you really need a single contiguous buffer, I would suggest using VirtualQuery to find the largest free region in your process address space, then mapping your 4 MB chunks somewhere in the middle of that region.

If you simply reserve and then free some memory, it is quite possible that other allocations that happen in your process (some of which you can’t prevent; for example, various monitoring tools can inject threads or allocate memory into your process, and system DLLs can have worker threads that might decide to allocate memory in parallel with your code, etc.) will use a portion of the region you just freed.

If you pick a free address using VirtualQuery the probability of a conflict is much lower (especially on a 64 bit system), but still technically not zero. You might want to add a retry loop so that if there is a conflict you unmap everything and start over.
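
A minimal user-mode sketch of that search (the function name and the
surrounding retry policy are mine):

#include <windows.h>

/* Walk the address space and return the base of the largest MEM_FREE
 * region; the caller maps its chunks somewhere in the middle of it and
 * starts over on a conflict, as described above. */
PVOID FindLargestFreeRegion(SIZE_T *regionSize)
{
    MEMORY_BASIC_INFORMATION mbi;
    PBYTE p = NULL;
    PVOID best = NULL;
    SIZE_T bestSize = 0;

    while (VirtualQuery(p, &mbi, sizeof(mbi)) == sizeof(mbi)) {
        if (mbi.State == MEM_FREE && mbi.RegionSize > bestSize) {
            best = mbi.BaseAddress;
            bestSize = mbi.RegionSize;
        }
        p = (PBYTE)mbi.BaseAddress + mbi.RegionSize;
    }
    *regionSize = bestSize;
    return best;
}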

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@napatech.com
Sent: Monday, February 24, 2014 12:45 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] Concatenate a large number of physical memory buffers to one large virtual buffer

Alex: Yes, I am using ZwAllocateVirtualMemory without commit, but I have to free the allocated memory again before using MmMapLockedPagesSpecifyCache/UserMode; that way I have a free area that I can use. Otherwise it will not work. The problem is that I need to be sure that nothing else allocates in the free area I have found. When using ZwAllocateVirtualMemory I am not able to raise the IRQL, since it requires PASSIVE_LEVEL, so I am not sure how to prevent other APC_LEVEL/PASSIVE_LEVEL code from running while I am mapping.

Peter: As Alex says, with MmAllocateMappingAddress it is only possible to map at the exact address returned; you cannot map at an offset within the reserved region. Have been there.
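
For what it’s worth, a hedged kernel-mode sketch of that reserve/free/map
dance, with the retry loop suggested above (names and retry count are
mine; assumes the MDL describes already-locked pages and that we are
running in the context of the target process):

#include <ntddk.h>

PVOID MapMdlAtFreshUserVa(PMDL mdl, SIZE_T regionSize)
{
    PVOID mapped = NULL;
    int attempt;

    for (attempt = 0; attempt < 8 && mapped == NULL; attempt++) {
        PVOID base = NULL;
        SIZE_T size = regionSize;
        NTSTATUS status;

        /* Reserve to discover a free user-mode range, then release it. */
        status = ZwAllocateVirtualMemory(ZwCurrentProcess(), &base, 0,
                                         &size, MEM_RESERVE, PAGE_READWRITE);
        if (!NT_SUCCESS(status)) break;
        size = 0;
        ZwFreeVirtualMemory(ZwCurrentProcess(), &base, &size, MEM_RELEASE);

        /* Race window: someone else may grab the range before we map.
         * For UserMode, MmMapLockedPagesSpecifyCache raises an exception
         * on failure, so catch it and retry with a fresh range. */
        __try {
            mapped = MmMapLockedPagesSpecifyCache(mdl, UserMode, MmCached,
                                                  base, FALSE,
                                                  NormalPagePriority);
        } __except (EXCEPTION_EXECUTE_HANDLER) {
            mapped = NULL;
        }
    }
    return mapped;
}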

The OP is dealing with an embedded system where presumably they can control
what starts when and does what.

Mark Roddy

On Tue, Feb 25, 2014 at 7:52 PM, Pavel Lebedynskiy wrote:

> The problem with large pages is that they are not always available. So for
> example you might be able to allocate 64 GB of large pages shortly after
> boot (e.g. from a service) but then if you stop the service, then do
> something which can fragment memory, and then try to start the service
> again, you might only be able to get a fraction of the original number of
> large pages.
>
> You could try to work around this by creating a very simple service that
> will never terminate, and whose only purpose is to allocate a
> SEC_LARGE_PAGES shared memory section. Then the actual worker process can
> map this section into its address space whenever it needs to.
>
> -----Original Message-----
> From: xxxxx@lists.osr.com [mailto:
> xxxxx@lists.osr.com] On Behalf Of xxxxx@broadcom.com
> Sent: Sunday, February 23, 2014 6:07 PM
> To: Windows System Software Devs Interest List
> Subject: RE:[ntdev] Concatenate a large number of physical memory buffers
> to one large virtual buffer
>
> > Then subdivide it in a linear way to 4MB chunks, and call overlapped
> > ReadFile once per each 4MB chunk
>
> The OP’s adapter doesn’t support SG.
>
> Anyway, if he can live with 2MB chunks, he could use large page allocation
> for the region:
>
> VirtualAlloc(NULL, 0x1000000000ULL, MEM_COMMIT | MEM_RESERVE
> |MEM_LARGE_PAGES, PAGE_READWRITE);
>

> The problem with large pages is that they are not always available. So for
> example you might be able to allocate 64 GB of large pages shortly after
> boot (e.g. from a service) but then if you stop the service, then do
> something which can fragment memory, and then try to start the service
> again, you might only be able to get a fraction of the original number of
> large pages.
>
> You could try to work around this by creating a very simple service that
> will never terminate, and whose only purpose is to allocate a
> SEC_LARGE_PAGES shared memory section. Then the actual worker process can
> map this section into its address space whenever it needs to.

I think you have to create the name of the section in the \Global
namespace; by default, an unqualified name will go into \Local. Note also
that \Global and \Local are, unlike nearly every other name in Windows,
case-sensitive.
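
For instance (the section name is illustrative, matching nothing in the
thread), the worker process would open the service’s section with the
case-exact prefix:

/* Worker-process side: open and map the section the service created.
 * "Global" must match case exactly; "GLOBAL\\..." would not work. */
HANDLE h = OpenFileMappingW(FILE_MAP_ALL_ACCESS, FALSE,
                            L"Global\\CaptureSection");
PVOID view = MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, 0);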
joe

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@broadcom.com
Sent: Sunday, February 23, 2014 6:07 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] Concatenate a large number of physical memory buffers
to one large virtual buffer

> Then subdivide it in a linear way to 4MB chunks, and call overlapped
> ReadFile once per each 4MB chunk

The OP’s adapter doesn’t support SG.

Anyway, if he can live with 2MB chunks, he could use large page allocation
for the region:

VirtualAlloc(NULL, 0x1000000000ULL, MEM_COMMIT | MEM_RESERVE
|MEM_LARGE_PAGES, PAGE_READWRITE);



At risk of returning to the matter in hand…

A while ago I had to represent some discontiguous regions as one bigger region (i.e. map the same physical memory to a contiguous virtual address). Comment 6 on http://www.osronline.com/showthread.cfm?link=237506 led me to a workable solution.

PVOID addr = MmAllocateMappingAddress(length, TAG);
PMDL mdl = IoAllocateMdl(addr, length, FALSE, FALSE, NULL);
PPFN_NUMBER pfns = MmGetMdlPfnArray(mdl);
for (/* each of the many smaller regions */) {
    /* copy that region's PFNs from its own MDL into pfns[] */
}
PVOID mappedAddr = MmMapLockedPagesWithReservedMapping(addr, TAG, mdl, MmCached);

This requires the original memory to be locked down, and there are probably some other caveats that I can’t remember.

NB take note of the documentation on MmMapLockedPagesWithReservedMapping:-
“Note that the virtual address returned by this function does include the byte offset that the MDL specifies. However, the MappedSystemVa field of the MDL that is set by this function does not include the byte offset”
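
If I read that note correctly, the practical consequence is (a hedged
illustration, reusing the names from the fragment above):

/* Use the returned pointer, not mdl->MappedSystemVa, because only the
 * former includes the MDL's byte offset. */
PVOID va = MmMapLockedPagesWithReservedMapping(addr, TAG, mdl, MmCached);
/* on success, va == (PUCHAR)addr + MmGetMdlByteOffset(mdl) */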