Do you have any evidence that would suggest that this will improve your
performance? The RAMdisk approach made sense in an era when a
single-cylinder seek took 25ms, drives did not have caches, the
underlying MS-DOS file system did not do page management, and the
processor clock was 4.7MHz. Today, gigabytes of RAM are available for
the file system cache and are automatically tuned for dynamic
performance, processors are much faster than even their clock speeds
suggest, and disk drives have onboard caches. Is it really going to
help to hand-build something when what already exists is, on the whole,
better than the classic RAMdisk approach?
Have you measured the performance? Have you analyzed the application to
see if it is using the best algorithms? (Seriously. I’ve seen
better-than-order-of-magnitude performance changes accomplished by trivial
changes in the app. I once got a 30% performance improvement with a
one-line change.) Before heading off to write a very expensive and complex
driver, do you have the performance numbers to justify this effort? And
how robust is this solution?
I once had an MS-DOS app that took five hours to run. I copied all the
input files to a RAMdisk and it ran in 40 minutes. Some years later I had
a much faster Windows machine. The script involved a lot of complexity,
but it started off with a copy to a RAMdisk. I mapped the
logical RAMdisk letter to a directory of the local disk and started it
running. I went to fix something to eat, planning to come back and watch
it run. By the time I came back, it had finished. It ran in 5 minutes,
without the RAMdisk! So before you rush off to reinvent a very complex
wheel, can you present any evidence that suggests that drive delays are
your performance bottleneck? And, when is the last time you defragmented
your disk? Don’t rush off to spend lots of money, time, and people until
you know where the problem is.
And, if you really need a RAMdisk, I’m sure there are tested and debugged
RAMdisk packages already in existence. It would be really inexpensive to
buy one, install it, and see if you got measurable improvement. If you
don’t, the project is not worth doing. If you do, but you need the
mirroring capability, write a service that can handle the updating of the
hard-drive files using lazy writeback. This would give you what I think
you are asking for, but at ***vastly*** lower cost.
I’ve heard Tony Mason give talks on the issues of building file systems.
He knows all the weird corner cases you haven’t thought of (yet). How
will your file system handle memory-mapped files? How will it work when
other drivers exist in the stack? Will it be FAT or NTFS? Hmmm.
There’s a FAT driver example, and NTFS is still proprietary. And what
about ReFS? How will you deal with protection issues if it's FAT? (You
can't put ACLs on FAT.)
Or, do you want to build a virtual volume whose implementation is RAM?
Then you can support NTFS trivially because that is managed far above your
pay grade…uh…I mean, far above you in the stack. Well, it's the same
thing, actually. Or is this idea of RAMdisk a design thought up by
someone in management or marketing who once used one under MS-DOS and
“knows” this is the “right solution”?
One thing I learned in 15 years of performance measurement: never, ever
undertake a problem in “optimization” until you know that one exists,
where it exists, and have analyzed the application to determine who is at
fault. In 1982-1983 I implemented what was probably the fastest storage
allocator in existence. I spent at least a month doing nothing but
performance analysis and performance enhancement. A few weeks after I
released it, one of the product groups came to me and said, “We’ve
identified the performance bottleneck in our product. It’s your
allocator.” Proof of this was the program-counter-sampling report that
showed a huge spike at the storage allocator module. Well, having just
spent a month getting the “typical” allocation path down to under 50
instructions, and the “best” allocation down to about 20 instructions, I
had a little trouble believing this. So, since they were running the
“debug” version, I reached in with the debugger and turned on the
performance counters. Turns out they had a tight loop that called a
function to do something. Each time this function was called, it
allocated a small buffer to work in, and before it left, it deallocated
this buffer. The result was over 4,000,000 unnecessary calls on the
allocator. I changed it to put the buffer as a local variable on the
stack, and got a noticeable performance improvement, at least a factor of
4 or 5. The fastest known (at that time) allocator was not fast enough to
handle several million unnecessary calls. So you have to not only
determine what is the per-disk-transaction cost, but then take the next
step: if this cost was 0, what would happen to overall performance?

One researcher came to me because he’d been told I was the performance guru.
He lamented, “I’ve spent a week optimizing this subroutine. It’s at least
twice as fast as it used to be. And my program still takes forever!” So
I ran my performance analysis tool. His subroutine, on which he’d labored
long and hard, was not called very often, and accounted for 0.25% of the
total execution time. Before, it had accounted for 0.5% of the time. The
analysis identified several “hotspots” he had not even suspected.
Local optimization in the absence of real performance data is usually a
waste of time. Buy a RAMdisk package and measure your performance using
it. Suppose DiskPerf tells you it is 10 times faster. Does the
application run ten times faster? If it runs at the same speed, or
perhaps 2x faster, disk I/O is not your problem. If your app also runs
10x faster, then you know disk I/O /is/ the bottleneck. Then say, “Well,
half the solution cost $N. Is there a way to get the rest of the solution
at a reasonable price?”
I did a Google search, and the first hit gave a RAMdisk that runs up
through Win8. It costs $18.99, or you could spring for the deluxe version
at $22.99 that comes with a T-shirt. I didn’t look further, although the
entire first page of the Google search appeared to be products. I would
not pursue your project any further until I had purchased this package and
done measurements with it. And done a thorough analysis of the app.
joe
Dear List Members,
Thanks for your input up to this point. Below is the main idea on which we
are working.
Basically we are developing a caching system for Windows desktop and
server platforms. To give the user an access point for the cache, we
are creating a virtual device, e.g. X: in My Computer. The geometry of
X: will be the same as the geometry of the disk partition for which X:
is providing caching services. For example, if X: is providing caching
for D: and the size of D: is 40GB, then the size of X: will also be
40GB. In other words, all IOCTL requests for X: will be directed to D:.
All write requests for X: will be written to Z:, which is basically a
RAMDISK. When a certain amount of the RAMDISK is full, or when the
system is about to shut down, the dirty data of X: should be synced to
D:. I think this syncing is what is causing a file system check on the
next reboot.
So for X:, three types of requests should be sent to D::
Read Requests [Cache Miss Case]
Write Requests [Dirty Data Synch Case]
IOCTL Requests [Device I/O Control Requests except I think
IOCTL_MOUNTDEV_QUERY_DEVICE_NAME]
Hopefully this clarifies the problem I am facing.
Thanks,
Uzair Lakhani
NTFSD is sponsored by OSR
OSR is hiring!! Info at http://www.osr.com/careers
For our schedule of debugging and file system seminars visit:
http://www.osr.com/seminars
To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer