Mapped files/Physical Memory

We want to be able to search the contents of a file very fast. The file(s)
is(are) on the order of 100MB. We tried file mapping, mapping the entire
file, and this works. But…

In our testing we had dramatically different performance on otherwise
comparable machines: machine A has 128MB of RAM and machine B has 256MB.
Machine A took ~10s to touch each page, whereas machine B took ~2s. There are
two camps: a) camp one (me) believes this performance is due to NT swapping;
b) camp two believes that file mapping actually reserves physical memory for
the file and that once mapped, the file resides in memory until unmapped.

Questions:

  1. Does file mapping reserve all the physical memory required to hold the
    file or does NT only pre-load some portion of the mapped file?

  2. If file mapping does not reserve all the physical memory required to hold
    the file, is there any Win32 API that will allow allocation of physical
    memory (outside of writing a kernel mode driver)?

  3. What choices do we have to allocate physical memory so that we can load
    this file into ram and not have to worry about swapping?

Regards,
Pete

Peter Ellis / KLA-Tencor / (925)245-8649

On August 3, 2000, Peter Ellis wrote:


Peter,

I’ll try to answer some of these. I recommend you look at Inside Windows NT
(2nd edition) to get more info.

  1. Does file mapping reserve all the physical memory required to
    hold the file or does NT only pre-load some portion of the mapped
    file?

No. Basically, mapped files work just like the rest of virtual memory with
data in it. As you touch pages, they are paged in (plus extra pages, as
determined by NT).

  2. If file mapping does not reserve all the physical memory required to
    hold the file, is there any Win32 API that will allow allocation of
    physical memory (outside of writing a kernel mode driver)?

IIRC, you can’t force NT to reserve physical memory in user mode.

  3. What choices do we have to allocate physical memory so that we
    can load this file into ram and not have to worry about swapping?

I think you’d have to write a driver.

- Danilo

At 04:32 PM 08/03/2000 -0700, you wrote:

> We want to be able to search the contents of a file very fast. The file(s)
> is(are) on the order of 100MB. We tried file mapping, mapping the entire
> file, and this works. But…
>
> In our testing we had dramatically different performance on comparable
> machines except machine A has 128 MB and machine B has 256MB. Machine A took
> ~ 10s to touch each page where machine B took ~ 2s. There are two camps: a)
> camp one (me) believes this performance is due to NT swapping;

That’s the correct answer.

> b) camp two believes that file mapping actually reserves physical memory
> for the file and that once mapped the file resides in memory until unmapped.

Nope. There are only a very few specific instances where physical memory is
permanently allocated like that, and this isn’t one of them. You should
operate on the presumption that you cannot disable NT’s paging subsystem.

Two ideas which might yield incremental performance improvements: open the
file for read-only access, and open it for sequential access. This will give
hints to NT which may permit it to optimize things a bit more.

RLH

First, the behavior you’re talking about here is properly called PAGING, not
swapping. NT doesn’t really swap in the traditional sense.

That said, Danilo is 99% correct. Essentially, when you map a file, the
file becomes the backing store for the mapped region of virtual address
space. It is paged in and out like any other portion of virtual address
space, but instead of being backed up by the paging file it is backed to the
file you mapped.

This is really not a special case. It’s the same way that the various
sections (code, resources, etc.) of .exe’s and .dll’s get mapped. e.g.
LoadLibrary doesn’t really “load” the library, instead it sets up mappings
between the sections of the DLL and the virtual address space. When you
then call a routine from the DLL, the routine entry point (and a few more
pages as determined by the readahead mechanism) gets faulted in from the
DLL.

The 1%: There IS a VirtualLock API. “When I have to lose a page from my
working set due to my incurring a page fault when I’m already at my limit, I
don’t want it to be one of *these* pages.” You can’t lock 100 MB at a time
by default, though. Read the doc on this – you will have to first increase
your min and max working set with the SetProcessWorkingSetSize API, and that
in itself might be enough. You still are unlikely to be able to lock 100 MB
on a 128 MB system. Locking might not help anything anyway, especially if
your pass through the file is one long sequential pass.

It’s still amazing that on machine A it took “10 seconds to touch each page”
where on machine B it only took 2 seconds. No way do page faults take eight
seconds per page! Perhaps your machine is really tight on memory, such that
faulting in the chunk of file you’re looking at pushes the code that’s
looking at it out of the working set, and faulting the code in pushes the
data out, and… Try to reduce the number of different pages you’re using
over time by ensuring the program exhibits good locality of reference, both
to code and to data.

— Jamie Hanrahan, Azius Developer Training
http://www.azius.com/
Windows 2000/98/NT Drivers and Developer Training


> We want to be able to search the contents of a file very fast. The file(s)
> is(are) on the order of 100MB. We tried file mapping, mapping the entire
> file, and this works. But…

Have you considered a more appropriate data structure, like an index of
some kind? Mapping a database and sequentially scanning it is not exactly
an appropriate algorithm unless your searches and data have no structure,
which is rare. You should also consider the ratio of searches to updates.
Does the 100MB change every minute, or is the data pretty stable and you just
want to find things? Mapping a database and updating it in place is not
exactly appropriate from an integrity standpoint.

> In our testing we had dramatically different performance on comparable
> machines except machine A has 128 MB and machine B has 256MB. Machine A took
> ~ 10s to touch each page where machine B took ~ 2s. There are two camps: a)
> camp one (me) believes this performance is due to NT swapping; b) camp two
> believes that file mapping actually reserves physical memory for the file
> and that once mapped the file resides in memory until unmapped.

Like everybody here says, mapped files are paged. Offhand I think
sequentially scanning 100MB of data that gets paged each time is very good
performance on NT’s part. The numbers you give are in the range of disk
transfer rates, which means NT is doing a great job of sequentially
clustering page-ins. Look at it this way: 100 MB is about 24k+ pages
(assuming a 4K page size). A really fast disk can do perhaps 200 accesses/sec
(5 ms average access time). If NT just paged in 4K at a time, scanning your
100 MB would take about 120 seconds. Since your actual performance is 12x
better, it suggests NT is getting the transfer size up to around 48K per
disk transfer. By default, I believe NT doesn’t do transfers bigger than
64KB (I don’t know if W2K is different).

So let’s consider how you might speed it up. It sounds like you have no
control over the hardware configuration, so some users will run it on 128MB
systems. Offhand, your only choice seems like a better data
structure/index. If you do have control over the hardware configuration, you
might just consider always having 256MB of memory. Personally, if every
time some program needed to access some 100MB database it took 2 seconds at
100% CPU (and a 100MB working set), I’d find that pretty rude.
Assuming you have something like a b-tree based index, it should take a
couple (like 2-4) disk accesses to retrieve a piece of data (even into
gigabytes). If your disk is slow, say 10 ms/access, that still should be 20
database lookups/sec, or about 40X faster than the memory-resident case you
describe. You might even be able to hash into the data, and get that down
to something like 100+ lookups/sec. There has been quite a lot of work done
on searching free-form text databases, and the typical performance is rather
better than a sequential scan of all the data.

My suggestion would be to forget about the details of how NT memory mapped
files work, and start focusing on the structure in your data.

- Jan

Thank you Jamie for pointing out swapping versus paging. I hear so many
people (especially recovering Unix addicts) call everything swapping. ;-) I
too was wondering about the 2 seconds and 10 seconds per “page”. When the
original author says “page”, does he mean a memory page (usually 4KB) or the
entire file?

I would also look at these factors:

  1. The relative speed of the disks on the two machines. Since the file
    still resides on disk and serves as the paging file for this mapped
    section, disk speed still plays a part.

  2. The relative level of fragmentation of the files on the disks. This can
    play a large part, although it cannot account for the entire difference.

  3. The relative amount of free memory. Use the Task Manager to monitor this
    while your program is running.

Greg


> > Thank you Jamie for pointing out Swapping versus Paging. I hear so many
> > (especially recovering Unix addicts) call everything swapping. ;-)
>
> Must be a linux thing as most unixes have been paging OS’s for just about
> forever :-)
>
> > I too was wondering about the 2 seconds and 10 seconds per “page”. When
> > the original author says “page” are you referring to a memory page
> > (usually 4KB) or an entire file?
>
> I think that you and Jamie misinterpreted what he was saying. I interpreted
> his numbers as 2s to read 100MB vs. 10s to read 100MB. That interpretation
> would fit with the 128MB vs. 256MB ram, as there would be far less need to
> page-out anything on the larger system, while a 128MB system is clearly not
> going to contain the entire 100MB file in ram.

I slightly mis-spoke. Most recovering Unix-oids (and OS/2 people) “refer”
to paging as swapping, not that Unix does all swapping. My apologies for
the confusion.

Greg


Thanks for all the comments/suggestions.

The goal is to search the file and find the information as fast as possible
by whatever means. Don’t fret, we don’t make consumer software; we make a
tool for a single purpose: to run our application.

The times I mentioned are the total time to open, map, touch each 4K page,
unmap, and close the file.

Regards,
Pete

Peter Ellis / KLA-Tencor / (925)245-8649

The fastest way to do this would be to use VirtualAlloc (and VirtualLock)
and use two 64KB buffers. Create a handle to the file using
FILE_FLAG_NO_BUFFERING if the data is unlikely to be read frequently, or
FILE_FLAG_SEQUENTIAL_SCAN if it is. It is probably better to do the former
so that the cache of the system as a whole is not impacted, otherwise you’re
gonna cause every other app on the system to page. Create an IOCP and
pre-load the two buffers. While one buffer is being searched, the other
buffer will be used to read the next block of data. This design will let
you read data around 15-20MB/sec on a moderate system. Make sure the search
algorithm is efficient so that it is not a bottleneck. Also, file
fragmentation will seriously impact performance.

Regards,

Paul Bunn, UltraBac.com, 425-644-6000
Microsoft MVP - WindowsNT/2000
http://www.ultrabac.com


> 1) Does file mapping reserve all the physical memory required to hold the
> file or does NT only pre-load some portion of the mapped file?

The file mapping creates a structure called a “segment” (MmSt) in the paged
pool.
The segment has a header plus a 4-byte word (a prototype PTE) for each page
in the file. For a 100MB file, the segment will be about 100KB in size
(25,600 pages × 4 bytes) - nothing to worry about.
The physical memory is not reserved, surely. Physical pages are allocated
during page faults on the mapping and then put on reuse lists by the usual
MM algorithm described by Helen Custer and David Solomon.
Page faults on mapped files are clustered - the cluster size is about 64KB.
So up to 16 pages will be brought into physical memory on each page fault
on the file. This is the only preloading done.

  2. If file mapping does not reserve all the physical memory required to
    hold the file, is there any Win32 API that will allow allocation of
    physical memory (outside of writing a kernel mode driver)?

No. VirtualLock is possibly the nearest thing to this.

  3. What choices do we have to allocate physical memory so that we can load
    this file into ram and not have to worry about swapping?

Try calling VirtualLock on the whole mapping. This will take a VERY long
time - reading 100MB from the disk into memory.

Max

> open the file for read-only access, and open it for sequential access.
> This will give hints to NT which may permit it to optimize things a bit
> more.

The latter one will not influence mapped-file performance - only the
ReadFile path (which goes through the cache manager).

Working with a 100MB mapping on 128MB of physical memory will surely cause
paging (and maybe even thrashing). I have serious doubts that this can be
solved.

Max

> Must be a linux thing as most unixes have been paging OS’s for just about
> forever :-)

Linux has paging too. The main differences from NT’s paging:

  • What is a VAD in NT is an object with a virtual method table (called a
    VMA) in Linux. Page faults are resolved by calling the VMA’s methods.
  • Linux has a single zero page in the system. All newly allocated pages
    have PTEs pointing to this zero page and are copy-on-write. The real page
    is allocated and zeroed only during the COW fault on this PTE.
    NT uses a list of zeroed pages and a special thread which zeroes them.
  • NT uses prototype PTE tables (segments) to implement shared pages.
    Linux uses a degenerate form of this only for System V shared memory -
    NOT for mapped files. For mapped files and pages shared by fork(), Linux
    uses hash lists of the pages indexed by (file descriptor, offset) pair to
    resolve the shared page fault. The side effect is that Linux has to
    perform reference counting on the swap device locations - not so on NT.
  • Linux has no working set concept. Trimming is performed by the kswapd
    thread, which is awakened when the OS is low on physical memory and scans
    the page tables of all processes in the OS to make pages swappable - then
    the kflushd thread will write them to the disk. The only replacement
    policy in Linux is the “young” bit in the PTE. The PTE is marked young
    when it is made valid. Then the kswapd thread swaps out the pages which
    are NOT young and marks the young pages as not young.
    NT’s working set IIRC maintains a FIFO list for the pages of a given
    address space. The first page faulted in will be the first to be swapped
    out (to go to the standby/modified list).

Max

Using the IOCP form of overlapped reads will not be as good as MMF unless
you disable caching. If you are reading from a cached file, ReadFile()
checks to see if the data is in the cache and, if it is, copies it to the
user buffer. If it is not, it queues a work queue item and returns pending.
Later a system thread will synchronously fault in the desired pages and
complete the IRP. The additional memory-to-memory data copy adds unneeded
overhead to the process. If you disable caching, you will eliminate that
copy, but at the same time you eliminate the benefits of caching. I think
the best solution is to use 64k or so mapped views, and VirtualLock() them
into memory. This will, of course, block the calling thread until the IO is
completed, so you will need a few threads, but it eliminates the
memory-to-memory copy without eliminating caching. It would have been really
nice of the kernel to support an async version of VirtualLock() for this
kind of thing.


--> Phillip Susi
xxxxx@iag.net

Reading a 100MB file and having NT’s cache manager attempt to cache it is
nonsensical (unless we’re talking about a system with gobs of RAM),
principally because of:
* the additional overhead of searching the cache/making new cache entries
* cache pollution: truly commonly-accessed data gets paged out

IOCP will deliver better performance, and will not have the horrendous
negative impact of having NT attempt to cache the data. On NT4 (I haven’t
investigated on Win2K) the situation is really dire – NT’s cache manager
will simply attempt to grow the cache by taking more VM, resulting in
horrendous thrashing when performing large read (or write) operations with
large amounts of data.

Regards,

Paul Bunn, UltraBac.com, 425-644-6000
Microsoft MVP - WindowsNT/2000
http://www.ultrabac.com


Nota bene: IOCP alone will not; you must disable caching explicitly when you
open the file. You will need to decide whether the files you are accessing
are too large to allow the system to cache. Then you may want to come up
with your own caching scheme that would do a better job than the generic
filesystem cache, unless you truly are scanning the data once and only once.
