How big is “Big”? The definition changes over time and has to be
interpreted in the context of its era and its environment.
In the MFC newsgroup, I regularly got questions like this one:
OP: I have this HUGE file, and I have to do a lot of forward and backward
scanning of the file to do searches. How can I handle this when the
string I’m searching for crosses a buffer boundary? I find it very hard
to get my head around how to do this optimally when doing backward scans.
Me: “HUGE” is not a number. File sizes are expressed in number of bytes.
Please explain what “HUGE” is.
OP: 10MB max, and most files will be somewhat smaller, but no smaller than
about 5MB
Me: Oh, you said HUGE when you meant TINY. The algorithm is:
Ask how big the file is
Get a buffer of that size
Read the entire file into it
Write your code, knowing everything is in memory
OR
Memory-map the file into your address space
Write your code, knowing everything is, or will be as needed, in memory
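(A minimal sketch of both options, assuming the Win32 API; error handling
is omitted and the function names are just illustrative:)

#include <windows.h>
#include <vector>

// Option 1: ask how big the file is, get a buffer of that size,
// and read the entire file into it.
std::vector<char> ReadWholeFile(const wchar_t* path)
{
    HANDLE h = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    LARGE_INTEGER size = {};
    GetFileSizeEx(h, &size);                 // 10MB fits easily in one buffer
    std::vector<char> data((size_t)size.QuadPart);
    DWORD read = 0;
    ReadFile(h, data.data(), (DWORD)size.QuadPart, &read, nullptr);
    CloseHandle(h);
    return data;                             // scan forward or backward freely
}

// Option 2: memory-map the file; pages are faulted in as they are touched.
const char* MapWholeFile(const wchar_t* path, HANDLE& hFile, HANDLE& hMapping)
{
    hFile    = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    hMapping = CreateFileMappingW(hFile, nullptr, PAGE_READONLY, 0, 0, nullptr);
    return (const char*)MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, 0);
}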
OP: But that means I would be using up 10MB of memory!
Me: How much memory do you have on your machine?
OP: 4GB
Me: 10MB/4GB = ? Why are you concerned with using a fraction of a percent
of your memory?
OP: But the file is still huge
Me: And how big is your virtual address space? 2GB? 3GB? Do the same
arithmetic. You’re still using a fraction of a percent of your address
space.
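(Worked out: 10MB / 4GB ≈ 0.25% of physical memory; 10MB / 2GB ≈ 0.5% of a
2GB address space. Either way, a fraction of a percent.)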
OP: So why isn’t a 10MB file considered huge?
Me: HUGE files are expressed in GB to TB. BIG files are expressed in
multiples of 100MB. SMALL files are under 100 MB. 10MB is TINY.
Anything under 1MB is INFINITESIMAL. This is not a PDP-11 in 1975; this is
a Windows machine in (year > 2000). The tradeoffs are different. Why
write and debug complex code that does not need to exist?
I did not track how many times I had conversations like this on the
newsgroup, but it happened with alarming frequency.
joe
PRECISELY!
The data being written is ALREADY in memory. Unless you’re advocating
writing the data serially (in which case you just opted for a *much*
slower set of operations), you’ll have precisely the same amount of memory
pinned and in use simultaneously regardless of the number of discrete
transfers involved, right?

So, by requiring smaller transfers, all you get is MORE transfers that are
smaller. MORE calls to IoCallDriver (for each device object in the
stack). MORE IRPs. More… everything.

People are all hung-up on 2GB because it sounds so big… but as Dr.
Newcomer said “if there is 2GB free in the 64GB memory, why not use it” –
If the machine has 64GB of memory, pinning just over 3% of that memory is
actually pretty reasonable, right?

I *hear* Mr. Grig’s argument that cache thrashing is not good… and I
agree that it would be *highly* advantageous to have a CopyFile API that
did something other than read followed by write. But condemning large
I/Os because they can be used for copy file operations seems to me to be
“throwing the baby out with the bath water” (does that English-language
saying make sense to everyone?). S’pose I’m capturing data from a
collection of high-speed satellite links. I want to be able to collect
and write that data as quickly as possible. If I can get 2GB writes, that
can ONLY be a good thing in my book.

Peter
OSR
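
(For concreteness, a user-mode sketch of the contrast Peter describes: the
same in-memory buffer written either as one large transfer or as many small
ones. Win32 is assumed, error handling is omitted, and the function names
are illustrative; splitting the transfer only multiplies the number of
requests sent down the storage stack.)

#include <windows.h>

// One transfer: the whole buffer goes down the stack in a single request.
void WriteWhole(HANDLE h, const char* buf, DWORD total)
{
    DWORD written = 0;
    WriteFile(h, buf, total, &written, nullptr);
}

// Many transfers: same buffer, same total bytes, but every iteration is
// another WriteFile call -- another request for the driver stack to process.
void WriteChunked(HANDLE h, const char* buf, DWORD total, DWORD chunk)
{
    for (DWORD off = 0; off < total; off += chunk)
    {
        DWORD len = (total - off < chunk) ? (total - off) : chunk;
        DWORD written = 0;
        WriteFile(h, buf + off, len, &written, nullptr);
    }
}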