Can a Ramdisk be used as a pagefile disk?

First, the purpose is just a test, following a suggestion from our hardware designer.
The WDK 7600 includes a ramdisk sample. When I set its size to 31 MB, I can configure a pagefile on it (16 MB~20 MB), but after rebooting the PC (running Windows 7) as required, there is no pagefile.sys on the ramdisk drive. I then found that ZwCreatePagingFile is called much earlier than the ramdisk driver’s DriverEntry routine, even though I have changed the driver’s start type to 0 (boot start).
My questions are:
1. Can a ramdisk be used as the pagefile drive if the ramdisk driver is loaded early enough? (I suspect it is hard to load it before the pagefile on the drive is created.)
2. How can the ramdisk driver be made to load before the pagefile on the drive is created? (See the sketch below for what I have been trying.)
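
For reference, this is roughly what I have been doing to the service configuration (a user-mode sketch; it assumes the sample installs under a service key named "Ramdisk", and the load group is only an illustrative guess, since boot-start drivers load in the group order listed under HKLM\SYSTEM\CurrentControlSet\Control\ServiceGroupOrder):

/*
 * Sketch: make the sample ramdisk driver a boot-start driver.
 * Assumptions: the sample installs under a service key named "Ramdisk";
 * the load group below is only an illustrative choice.
 * Run elevated; a reboot is needed for the change to take effect.
 */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HKEY hKey;
    LONG rc;
    DWORD start = 0;                                /* 0 == SERVICE_BOOT_START */
    const wchar_t group[] = L"System Bus Extender"; /* illustrative group name */

    rc = RegOpenKeyExW(HKEY_LOCAL_MACHINE,
                       L"SYSTEM\\CurrentControlSet\\Services\\Ramdisk",
                       0, KEY_SET_VALUE, &hKey);
    if (rc != ERROR_SUCCESS) {
        printf("RegOpenKeyExW failed: %ld\n", rc);
        return 1;
    }

    rc = RegSetValueExW(hKey, L"Start", 0, REG_DWORD,
                        (const BYTE *)&start, sizeof(start));
    if (rc == ERROR_SUCCESS)
        rc = RegSetValueExW(hKey, L"Group", 0, REG_SZ,
                            (const BYTE *)group, sizeof(group));

    RegCloseKey(hKey);
    printf(rc == ERROR_SUCCESS ? "Service updated; reboot to test.\n"
                               : "RegSetValueExW failed.\n");
    return rc == ERROR_SUCCESS ? 0 : 1;
}

Even as a boot-start driver, though, the sample still seems to come up after ZwCreatePagingFile has run, which is what question 2 is really about.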

Thanks in advance.
Leo

Why would you waste RAM on a RAM disk page file? The user’s RAM is best used by actual applications, and Windows will automatically, and quite efficiently, use any “spare” RAM for disk caching (a popular feature Microsoft introduced about 20 years ago, which is one reason RAM disks are very strongly deprecated).

Tim.


I guess I’m missing the point here. If you use a RAMdisk for a paging
file, then the total address space used by all processes must be less than
or equal to the total size of physical memory. This means that you don’t
need a paging file at all! So why in the world would you want to slow
performance simply by requiring memory-to-memory copies? Do the
arithmetic! What would RAMdisk paging buy you? It would merely reduce
performance without contributing anything else!
joe


Thanks for your opinion. As I said, they may be able to use another storage medium to replace part of the DDR3 memory (it may be cheaper), so they want to know the effect of having less DDR3 memory. Even if performance degrades a little, the market may accept it for the price.

Would it be an option for you to put the %temp% folder on the ramdisk (assuming you have enough RAM)? This should speed up your system. How much depends on how the system is used, but it will reduce disk activity.
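
If you want to try that, here is a minimal sketch (it assumes the ramdisk shows up as R: and that R:\Temp exists at logon; the values live under HKCU\Environment and apply to new logon sessions):

/* Sketch: point the per-user TEMP/TMP variables at a ramdisk folder.
 * Assumes the ramdisk is mounted as R: and that R:\Temp exists at logon. */
#include <windows.h>
#include <stdio.h>

static LONG SetUserEnv(HKEY key, const wchar_t *name, const wchar_t *value)
{
    return RegSetValueExW(key, name, 0, REG_EXPAND_SZ,
                          (const BYTE *)value,
                          (DWORD)((lstrlenW(value) + 1) * sizeof(wchar_t)));
}

int main(void)
{
    HKEY hKey;
    LONG rc;
    DWORD_PTR res = 0;

    rc = RegOpenKeyExW(HKEY_CURRENT_USER, L"Environment",
                       0, KEY_SET_VALUE, &hKey);
    if (rc != ERROR_SUCCESS)
        return 1;

    rc = SetUserEnv(hKey, L"TEMP", L"R:\\Temp");
    if (rc == ERROR_SUCCESS)
        rc = SetUserEnv(hKey, L"TMP", L"R:\\Temp");
    RegCloseKey(hKey);

    /* Nudge already-running shells to re-read the environment (optional). */
    SendMessageTimeoutW(HWND_BROADCAST, WM_SETTINGCHANGE, 0,
                        (LPARAM)L"Environment", SMTO_ABORTIFHUNG, 5000, &res);

    printf(rc == ERROR_SUCCESS ? "TEMP/TMP now point at R:\\Temp\n"
                               : "Failed to update HKCU\\Environment\n");
    return rc == ERROR_SUCCESS ? 0 : 1;
}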


Yes, it is an option. We could even put the Windows directory on the cache buffer, and so on, but I would still like to start by using RAM to stand in for the new storage medium and putting the pagefile on it.

Given Joseph’s arithmetic, I’d say setting the pagefile to “0” seems to be your best choice. That solves your loading-order problem too. For the rest of the disk activity, “redirect” everything to RAM, much like a live system booted from a CD/DVD or a (locked) USB stick, where no hard disk is available.
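
For what it’s worth, the paging-file list Windows reads at boot is just a REG_MULTI_SZ value, so a sketch like the following could point it at the ramdisk or clear it entirely (the R: drive letter and the 16/20 MB sizes are only placeholders):

/* Sketch: set (or clear) the PagingFiles value that Windows reads at boot.
 * The entry format is "<path> <min MB> <max MB>"; R: and 16/20 are just
 * placeholders for the ramdisk case. Writing a single empty string is one
 * way to run with no pagefile. Run elevated; takes effect at the next boot. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* One entry plus the extra terminator required by REG_MULTI_SZ.
       Use L"\0" here instead to express "no pagefile". */
    const wchar_t pagingFiles[] = L"R:\\pagefile.sys 16 20\0";
    HKEY hKey;
    LONG rc;

    rc = RegOpenKeyExW(HKEY_LOCAL_MACHINE,
                       L"SYSTEM\\CurrentControlSet\\Control\\"
                       L"Session Manager\\Memory Management",
                       0, KEY_SET_VALUE, &hKey);
    if (rc != ERROR_SUCCESS) {
        printf("RegOpenKeyExW failed: %ld\n", rc);
        return 1;
    }

    rc = RegSetValueExW(hKey, L"PagingFiles", 0, REG_MULTI_SZ,
                        (const BYTE *)pagingFiles, sizeof(pagingFiles));
    RegCloseKey(hKey);

    printf(rc == ERROR_SUCCESS ? "PagingFiles updated; reboot to apply.\n"
                               : "RegSetValueExW failed.\n");
    return rc == ERROR_SUCCESS ? 0 : 1;
}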

Thanks again! Let me think more about how to do the test.

Back in 1968, our IBM 360/67 had 768K of sub-microsecond (I think 750ns)
RAM, and 2MB (!) of slow, 8-microsecond RAM (it was all magnetic cores
back then). We didn’t have an expensive (2MB!) paging drum, and we paged
from fast-ram to slow-ram. It turned out that we got better performance
just by executing directly out of the slow ram, because of the overhead of
the paging operations.

There’s nothing wrong with using slower, cheaper memory (although the last
DDR3 I bought cost me just over $4.50/GB: $18.50 for a 4GB module). But
use it appropriately. Given L1 and L2 caches, you may find that simply
using slower, cheaper memory instead of a RAMdisk gives you better
performance because you never take page faults. Note that our “large,
slow” ram was more than 10 times slower than main memory, and we STILL won
big just by executing out of the slow memory. The only “cache” the 360/67
had was an 8-slot TLB (which, by the way, was a module about five feet
high and two feet wide, with well over 100 little cards in it. We usually
ran with it swung out from its cabinet, because if the system hung, the
TLB status lights would stop blinking, which was the most reliable
indicator of system malfunction!)

I used to present this data in my course. Some of it may be obsolete
these days, because the chips are faster; in particular, the last number
may be a serious underestimate.

Data in the write pipe or L1 cache costs 0 CPU clock cycles to access
Data in the L2 cache, caused by an L1 cache miss, costs 1 CPU clock cycle to access
Data in the main memory, caused by an L2 cache miss, costs 20-200 CPU clock cycles
A page fault costs 30,000,000-40,000,000 clock cycles

Note that a page fault is about SEVEN ORDERS OF MAGNITUDE SLOWER than an
L2 cache access. So you’ve got to have a LOT cheaper memory to justify this
performance hit. Given that you are using a RAMdisk instead of a real disk,
you might be only five or six orders of magnitude slower; compare this with
using cheap memory everywhere, with no paging at all, and you see why I
think it is a waste of effort.
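
To make the arithmetic concrete, here are those figures converted to time at an assumed 3 GHz clock (the inputs are the rough numbers above, not measurements):

/* The cost figures above, converted to time at an assumed 3 GHz clock.
 * The inputs are the rough cycle counts from the list, not measurements. */
#include <stdio.h>

int main(void)
{
    const double hz = 3.0e9;            /* assumed clock rate */

    printf("L2 access (1 cycle):         %.2f ns\n", 1.0 / hz * 1e9);
    printf("Memory, L2 miss (20-200):    %.1f - %.1f ns\n",
           20.0 / hz * 1e9, 200.0 / hz * 1e9);
    printf("Page fault (30M-40M cycles): %.0f - %.0f ms\n",
           30.0e6 / hz * 1e3, 40.0e6 / hz * 1e3);
    printf("Page fault / L2 access:      %.0e\n", 30.0e6 / 1.0);
    return 0;
}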

So you may be trying to envision a complex solution to a simple problem.
Just use the cheaper memory everywhere and see what the performance is
like, given L1 and L2 caching and a large TLB. Only then, if the
performance is not acceptable, should you start to worry about complex solutions.
joe


Just as with the relationship between the L1 cache, the L2 cache, and main memory, this is simply a cost balance: if everything could run at L1 cache speed, we wouldn’t want L2 or main memory, and of course we wouldn’t want the slow disk at all. So I modified the RAMDISK sample just to simulate another class of media, faster than disk but slower than RAM, and put the pagefile on it; as I said, we might put the Windows or user directories on it too, if possible. On the machine I tested, there is 4 GB of RAM and it uses less than 2 GB. If we could use only 1 GB of RAM, plus another medium that is faster than disk but cheaper than RAM, would the performance become unacceptable? That is what I really care about.

Thanks


You don’t have much choice about the existence of the L1 cache, since it
is built onto the processor chip. My memory is that the L2 cache is now
on-chip, but I don’t have time to do the research to check it out.

So it comes back to whether there is any advantage to using memory as a
paging disk when you can just use it to hold programs and data. Even if it
is slower by a factor of 5, the evidence suggests that using it as a
pagefile is less effective than simply executing out of it.

Before you commit to a complex solution, see if the simple solution works.
joe


> A page fault costs 30,000,000-40,000,000 clock cycles

That number seems a little fishy; back-of-envelope calculations suggest a rather different number than 30 million cycles.

A page fault might take 30-40 million clocks of time to resolve, but unless the I/O subsystem is just polling, it doesn’t seem likely that it will consume 30-40 million clocks of processor power. On a 3 GHz processor, 30 million clocks is 1/100 of a second, about a disk seek time, which would also mean no more than 100 page faults/sec. I believe I’ve seen soft fault rates (essentially a page fault without the physical I/O) a couple of orders of magnitude higher, like 10K+/sec, which would be more like 300K clock cycles each.

I know there are storage controllers that can execute 500K+ IOPS, and if page fault handling costs a similar number of cycles to an I/O operation, that would be only 6 thousand cycles. Factoring in multiple cores is a little fuzzy, as I don’t know whether that 500K IOPS is consuming 16 cores at 100% CPU load or something a little kinder.

Another data point: some Googling finds http://blogs.technet.com/b/askperf/archive/2008/01/29/an-overview-of-troubleshooting-memory-issues-part-two.aspx, where someone reports 200K soft faults/sec, which would be 15K clocks each on a 3 GHz processor.

I might believe a page fault takes 30K cycles, but not 30M. And since there are flash storage devices that can do 500K-1000K random read IOPS, hard page faults may not really take 1/100 of a second of elapsed time anymore either.
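
Spelling out that back-of-envelope arithmetic, with the same assumed round numbers:

/* Jan's back-of-envelope, spelled out with the assumed round numbers. */
#include <stdio.h>

int main(void)
{
    const double clock_hz = 3.0e9;      /* assumed 3 GHz processor */

    /* Hard fault bounded by a ~10 ms disk seek */
    printf("Cycles of elapsed time per hard fault: %.0f\n", clock_hz * 0.010);

    /* Soft faults: observed rates imply a CPU cost per fault */
    printf("CPU cycles/fault at 10K faults/s:      %.0f\n", clock_hz / 10e3);
    printf("CPU cycles/fault at 200K faults/s:     %.0f\n", clock_hz / 200e3);

    /* 500K IOPS handled by one core */
    printf("CPU cycles/IO at 500K IOPS:            %.0f\n", clock_hz / 500e3);
    return 0;
}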

Jan

The typical price of a hard page fault is a few hundred microseconds if backed by an SSD and a few hundred milliseconds if backed by an HDD, and it can take many seconds if the drive is spun down.

//Daniel

I believe the number applies to the total stall time of the thread, which
is why I pointed out that the numbers could be a couple of orders of
magnitude lower for a RAMdisk.

Note that “a few hundred microseconds” is a lot of cycles on a 2.8 GHz
processor, and if it is superscalar those CPU cycles turn into a large
number of instructions per cycle (2-6, if I’ve read the Core docs
correctly). So at, say, 3 instructions per cycle, that’s 9 instructions
per nanosecond, 9,000 instructions per microsecond, and 900,000
instructions in a hundred microseconds. So it sounds like, under optimal
conditions, that’s six orders of magnitude of degradation. Executing the
code out of slow RAM will lose less, because of instruction prefetch,
instruction pipelining, L1 and L2 caching, and speculative execution, none
of which we had on the 360/67; and even with memory that was a factor of 10
slower, the overhead of the page faults was higher than the cost of direct
execution from slow RAM.
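
Spelled out, with 3 instructions per cycle and an assumed round 3 GHz clock:

/* The instruction-count arithmetic above, at 3 instructions/cycle and an
 * assumed 3 GHz clock (a round figure close to the 2.8 GHz mentioned). */
#include <stdio.h>

int main(void)
{
    const double clock_hz = 3.0e9;
    const double ipc      = 3.0;     /* instructions per cycle, assumed */

    double instr_per_ns = ipc * clock_hz / 1e9;
    printf("Instructions/ns:        %.0f\n", instr_per_ns);
    printf("Instructions/us:        %.0f\n", instr_per_ns * 1e3);
    printf("Instructions in 100 us: %.0f\n", instr_per_ns * 1e5);
    return 0;
}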

So there is a simpler solution than using a RAMdisk for paging: don’t do
paging. You can turn paging off, execute from slow RAM, and still be
ahead of the game. You get your savings in system cost, and quite likely
better performance than the paging solution, with zero effort expended.

By the way, there is no way to accurately predict the performance. A lot
of it depends upon execution patterns and data access patterns. You
actually have to build and measure. But you may well have a simpler
solution if you build only one kind of memory onto the board: it saves
fabrication cost, reduces system complexity, and spends no effort on
figuring out how to reroute paging to the RAMdisk. I’d go that way first.

Otherwise, you are “pre-optimizing” without any substantiating data. This
never works out well.
joe
