Do you mean 150MB? I do not believe it is possible to map 150GB into the
address space; even in Win64, I believe the mapping is limited. But if it
is truly 150GB, and you are seeing this behavior, it sounds like extreme
memory pressure is being exercised. If you have less than 150 GB of
physical memory, somebody has to be paged out. The list you describe
sounds like the LRU list; recent pages are added to the end (an O(1)
operation), and the candidate least-used page is removed from the head.
Here’s the problem: when a page is added to the list, it must first be
removed from the existing list. A linear search averages n/2 probes, which
is still O(n), and pathological paging behavior can force it to the full n
every time. In addition, a lot of
performance tradeoffs are based on “typical” behavior, where n is
typically “small”, for suitable definition of “small”. When you push n up
to the number of pages required to handle 150GB, you have probably
stressed the algorithm far outside its design limits.
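To make the removal cost concrete, here is a minimal sketch (my
illustration, not the actual kernel code) of an LRU list where each page
remembers its own node, so the unlink on re-reference is O(1); with only a
singly linked list, that unlink is the O(n) search described above:

#include <cstdint>
#include <iterator>
#include <list>
#include <unordered_map>

// Minimal sketch, not the real Memory Manager: an LRU list keyed by page
// frame number. Touch() is O(1) because the hash map locates the node to
// unlink; a bare singly-linked list would have to search for it.
class LruList {
    std::list<uint64_t> order_;  // head = least recently used, tail = most recent
    std::unordered_map<uint64_t, std::list<uint64_t>::iterator> where_;
public:
    void Touch(uint64_t pfn) {                // page was just referenced
        auto it = where_.find(pfn);
        if (it != where_.end())
            order_.erase(it->second);         // O(1) unlink, no search
        order_.push_back(pfn);                // most recent goes on the tail
        where_[pfn] = std::prev(order_.end());
    }
    bool EvictOldest(uint64_t &pfn) {         // candidate least-used page
        if (order_.empty()) return false;
        pfn = order_.front();
        order_.pop_front();
        where_.erase(pfn);
        return true;
    }
};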
I have encountered problems like this many times in my career, and
although the sizes were much smaller, so were the machines.
Now, there are many ways to approach this problem. But the one most under
your control is the file mapping. While there are lots of advantages to
mapping the entire file as contiguous bytes, the performance problems you
are seeing might be mitigated by throwing this assumption out. I have
often commented that optimizing lines of code at the line level, barring
complex inner loops of DSP processing, generally buys you single-digit
percentage improvements; architectural and high-level algorithmic changes
will buy you orders of magnitude performance improvement.
For example, by transforming a matrix multiply of two large matrices to
access the data in a cache-aware fashion, you get a significant
performance improvement; it is not unusual to see factors of 10 to 20.
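A sketch of the kind of transformation I mean (the tile size of 64 is
illustrative; you would tune it to the cache on your machine):

#include <cstddef>

// Illustrative only: classic loop tiling. The naive i/j/k triple loop walks
// B column-wise and misses cache on nearly every access; blocking keeps a
// tile of B resident in cache while it is reused. C must start zeroed.
constexpr std::size_t BLOCK = 64;  // tune per machine

void MatMulTiled(const double *A, const double *B, double *C, std::size_t n)
{
    for (std::size_t ii = 0; ii < n; ii += BLOCK)
        for (std::size_t kk = 0; kk < n; kk += BLOCK)
            for (std::size_t jj = 0; jj < n; jj += BLOCK)
                for (std::size_t i = ii; i < ii + BLOCK && i < n; ++i)
                    for (std::size_t k = kk; k < kk + BLOCK && k < n; ++k) {
                        const double a = A[i * n + k];
                        for (std::size_t j = jj; j < jj + BLOCK && j < n; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}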
So, if the fix involves massive work on the part of Microsoft to support
one customer who needs 150GB of mapping, it will probably go into the
Someday, Maybe pile. Don’t expect an improvement; you are many sigmas
from the mean value. So what you are left with is changing the algorithms
you use. If the cost of making the change gives you a high payoff, the
effort may be justified.
The other thing I used to tell my students was “ignore code size. Code
size is irrelevant in modern computers. Instead, worry about the data.
Data will kill you.”
For problems of the massive size you are dealing with, you have to realize
that there may be NO solution possible given existing machine
architectures. I was part of an OS performance team in the late 1970s,
and while others did the measurements, I was part of the “evaluation team”
that had to figure out what to do to change the performance. Like
Windows, we were a general-purpose OS whose users pushed the envelope. We
couldn’t solve the four-sigmas-out problems, and you’re far beyond four
sigmas. So you may have to reassess the decision about a single
contiguous mapping, no matter how unpleasant it sounds. I’ve written tens
of thousands of lines of code in the past 50 years solely to get
acceptable performance on machines that were too small. You have just
reinvented that problem, just with scaled-up numbers. As the technology has
gotten bigger and faster, the problems have kept pace and grown larger.
Your problem just got larger than the technology can support.
The other day, I bought an 8GB SD card for a development machine: a 16MHz
machine with 256K of memory, in a form factor smaller than
one card from my first mainframe. When I started in this profession, 50
years ago, I doubt there was 8GB if you summed up the memory of all the
computers in the world. Every day, we faced the problem you face now.
And we didn’t have easy answers, either. “Big” files were 100K. “Huge”
files might be 2MB (the limit of physical disk drives). And we might have
2K of buffer space. Modulo scaling, you’ve got the same problem.
Re-examine your design assumptions. It is clear the current ones don’t work.
joe
Odd … my last post seemed to disappear somewhere into the ether.
Anyway, after being dragged off this for a couple of days, I have some
xperf traces:
https://s3.amazonaws.com/random-bitbucket/base.etl
https://s3.amazonaws.com/random-bitbucket/base2.etl
And I’ve also attached a kernel debugger.
The culprit appears to be:
Child-SP          RetAddr           Call Site
fffff880`05f86a88 fffff800`0172b804 nt!MiGetProtoPteAddressExtended
fffff880`05f86a90 fffff800`016e5069 nt!MiCheckUserVirtualAddress+0x10c
fffff880`05f86ac0 fffff800`016d6cae nt!MmAccessFault+0x249
fffff880`05f86c20 00000001`40083750 nt!KiPageFault+0x16e
00000000`0012bb40 00000000`00000000 0x1`40083750
Stepping through, there is some loop:
fffff800`0171ff49 33c0            xor     eax,eax
fffff800`0171ff4b 488b5c2430      mov     rbx,qword ptr [rsp+30h]
fffff800`0171ff50 4883c420        add     rsp,20h
fffff800`0171ff54 5f              pop     rdi
fffff800`0171ff55 c3              ret
fffff800`0171ff56 498b5a50        mov     rbx,qword ptr [r10+50h]
fffff800`0171ff5a 492bd9          sub     rbx,r9
fffff800`0171ff5d 48c1fb03        sar     rbx,3
fffff800`0171ff61 412b5a18        sub     ebx,dword ptr [r10+18h]
fffff800`0171ff65 03da            add     ebx,edx
fffff800`0171ff67 eb0b            jmp     nt!MiGetProtoPteAddressExtended+0x68 (fffff800`0171ff74)
fffff800`0171ff69 4d8b4010        mov     r8,qword ptr [r8+10h]
fffff800`0171ff6d 2bd8            sub     ebx,eax
fffff800`0171ff6f 4d85c0          test    r8,r8
fffff800`0171ff72 74d5            je      nt!MiGetProtoPteAddressExtended+0x3d (fffff800`0171ff49)
fffff800`0171ff74 418b4018        mov     eax,dword ptr [r8+18h]
fffff800`0171ff78 3bd8            cmp     ebx,eax
fffff800`0171ff7a 73ed            jae     nt!MiGetProtoPteAddressExtended+0x5d (fffff800`0171ff69)
At the top of this loop I have @rbx = 0x2487460, and it doesn’t exit until
@rbx = 158, dropping by 0x200 each time.
It seems to be following some linked list, and what it wants is always at
the end of a growing list. The size in @rbx is ~ the number of pages that
have been touched.
Not sure what the significance of the 0x200 is?
I also get stack traces that look like:
fffff880`05f867e8 fffff800`016b4963 nt!DbgBreakPointWithStatus
fffff880`05f867f0 fffff800`016e3f41 nt! ?? ::FNODOBFM::`string'+0x5d94
fffff880`05f86820 fffff800`016f5617 nt!KiSecondaryClockInterrupt+0x131 (TrapFrame @ fffff880`05f86820)
fffff880`05f869b0 fffff800`016e5179 nt!MiDispatchFault+0x2e7
fffff880`05f86ac0 fffff800`016d6cae nt!MmAccessFault+0x359
fffff880`05f86c20 00000001`40083750 nt!KiPageFault+0x16e (TrapFrame @ fffff880`05f86c20)
00000000`0012bb40 00000000`00000000 RapidResponse!DFS::Store::WarmUp+0x250
No idea what the FNODOBFM bit is?
Stack traces for when the process is ‘healthy’ do not show
MiGetProtoPteAddressExtended on the stack.
Unfortunately, since the file is 150GB in size and has offsets within it
used for addressing, I need the whole 150GB mapped in one contiguous chunk
of address space. As far as I can tell there is no way to guarantee this
using smaller-sized views.
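The closest thing I have found is best-effort only: reserve a range big
enough for the whole file just to discover a free base address, release the
reservation, then map fixed-address views with MapViewOfFileEx. A sketch
under those assumptions (function name and view-size choice are mine; view
size must be a multiple of the 64K allocation granularity). Note the window
between VirtualFree and the mapping calls where another allocation can steal
part of the range, which is presumably why contiguity cannot be guaranteed:

#include <windows.h>
#include <cstdint>

// Best-effort sketch: make smaller views land back-to-back so the file
// still looks like one contiguous range. Racy by design: another thread
// can allocate into the range after the VirtualFree, making a later
// MapViewOfFileEx fail, so the caller must be prepared to retry.
static void *MapContiguousViews(HANDLE hMapping, uint64_t fileSize,
                                uint64_t viewSize /* multiple of 64K */)
{
    // Reserve a free range big enough for the whole file, just to find a base.
    void *base = VirtualAlloc(nullptr, fileSize, MEM_RESERVE, PAGE_NOACCESS);
    if (!base) return nullptr;
    VirtualFree(base, 0, MEM_RELEASE);   // release so the views can go there

    for (uint64_t offset = 0; offset < fileSize; offset += viewSize) {
        uint64_t remaining = fileSize - offset;
        SIZE_T thisView = (SIZE_T)(remaining < viewSize ? remaining : viewSize);
        void *want = (char *)base + offset;
        void *got = MapViewOfFileEx(hMapping, FILE_MAP_READ,
                                    (DWORD)(offset >> 32), (DWORD)offset,
                                    thisView, want);
        if (got != want) {               // someone stole part of the range
            if (got) UnmapViewOfFile(got);
            for (uint64_t back = 0; back < offset; back += viewSize)
                UnmapViewOfFile((char *)base + back);
            return nullptr;              // caller must retry or give up
        }
    }
    return base;   // file offsets are now valid relative to this base
}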