Also, how do you expect to know that a file request is coming from SQL
Server? And how are you going to deal with requests from
industrial-strength backup utilities that are backing up live SQL
databases? And what about disk defragmenters?
A p-baked idea is a generalization; half-baked ideas have p==0.5. A lot
of the ideas I hear in this newsgroup seem to have p < 0.1. When you
think about mucking in file systems, you have to understand fourth-order
effects if you want any hope of success. I’ve been writing device drivers
on a variety of platforms for over 40 years. I’ve heard Tony Mason’s
talks at conferences, and they are one reason I would never attempt to
write a file system. His knowledge of nth-order effects is amazing. I am
not sure how I could reliably tell that requests were coming from SQL
Server, and I don’t know all the things it does, but my past experience
suggests that it might have background threads doing file optimizations,
record compaction, transaction management, etc. Which of these can be
safely “virtualized”? The “Hey, fellas, let’s just redirect the IRPs to
a virtual file” approach is a bit scary. And even if you slog through
millions of IRP trace lines, you cannot be sure you have seen all the
patterns that *might* exist, and therefore might have no idea if your
solution is complete.
If I had a client that wanted something like this, the first thing I’d
suggest is a solid requirements document. Then I’d suggest they pay
someone at OSR to do a sanity check on it, and perhaps even give a quote
on doing it. File systems are not for beginners, or even people like me
who have only done simple PCI devices. And there is essentially no room
for errors.
Note that a requirements document states the problem that needs to be
solved, fairly precisely. “I need to intercept the IO for SQL server for
certain operations” is not a requirement; it is a suggestion for an
implementation. There may be much easier ways to achieve this, perhaps
involving redirection using existing mechanisms already implemented and
tested for years. Or, maybe you really do have to track every IRP. But
I’m not sure from your questions that you have the in-depth understanding
of the complexities of the file system to avoid making some grievous
error. And while my success rate is high, my failures were nearly always
from some subtlety I was not aware of. And I saw one project nearly crash
and burn because the disk did not handle power failure well, and this
hardware failure compromised the transaction management. It was a
fourth-order effect.
For example, the statement about a “fake” file that doesn’t need to exist
at all is confusing. If I write to it, is the data simply discarded
(because the file doesn’t exist at all) and if so, what happens when I try
to read that data back? What do I get? If I have a terabyte of database,
what does the fake file do if I write to byte offset 760,238,124? Does
the fake file have to handle sparse record updates? And where does the
“fake” file store all the information? Obviously, for realistic-sized
databases it can’t store it in main memory, so in fact the “fake” file has
to have a place to put this, which would be on the disk, making it a
“real” file. How is this going to live in the SQL ecosystem, which as I
indicated might have lots of pathways in SQL Server itself and more
components than SQL Server in the ecosystem (backup and defragmentation
being the two most immediate ones that came to mind)? And don’t forget
that SQL is transacted, so how does the fake file live in harmony with the
notion of transactions? And what about record locking?
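To make those questions concrete, here is a minimal sketch (in Python, not
driver code) of about the smallest thing a “fake” file could be: a sparse
store that keeps only the pages that were actually written. Everything in
it is an assumption made for illustration: the name SparseFakeFile, the
4096-byte page size, and the rule that unwritten ranges read back as
zeros. None of it comes from the original proposal or from any Windows
API.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class SparseFakeFile:
    """Hypothetical in-memory stand-in for a 'fake' file."""

    def __init__(self):
        self._pages = {}  # page index -> bytearray(PAGE_SIZE)
        self._size = 0    # highest byte offset written so far, plus one

    def write(self, offset, data):
        """Store data at offset, allocating only the pages it touches."""
        data = bytes(data)
        written = 0
        while written < len(data):
            index, start = divmod(offset + written, PAGE_SIZE)
            chunk = data[written:written + PAGE_SIZE - start]
            page = self._pages.setdefault(index, bytearray(PAGE_SIZE))
            page[start:start + len(chunk)] = chunk
            written += len(chunk)
        self._size = max(self._size, offset + len(data))

    def read(self, offset, length):
        """Return stored bytes; never-written ranges read as zeros."""
        end = min(offset + length, self._size)
        out = bytearray()
        pos = offset
        while pos < end:
            index, start = divmod(pos, PAGE_SIZE)
            take = min(PAGE_SIZE - start, end - pos)
            page = self._pages.get(index)
            out += bytes(take) if page is None else page[start:start + take]
            pos += take
        return bytes(out)

    def pages_stored(self):
        return len(self._pages)


fake = SparseFakeFile()
fake.write(760_238_124, b"row")      # the offset from the question above
print(fake.read(760_238_124, 3))     # the bytes come back only because we kept them
print(fake.pages_stored())           # pages held in memory for that one write
```

Even this toy has to keep every written byte somewhere, and it says
nothing at all about transactions, record locking, flushes, backup, or
defragmentation.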
A long time ago, in a different lifetime, I worried about these things,
and although I’ve not used SQL server or dealt with the Windows file
system, I do know that the actions of both are very complex, and the two
together add composite complexity to the problem.
joe
> I need to intercept the IO for SQL server for certain operation …
You are aware that SQL Server (like nearly every transactional DBMS) has
very strict I/O requirements? File integrity, ordering of I/O operations,
flushes, non-cached access, …
[SQL Server I/O Basics, Chapter 2]
http://technet.microsoft.com/en-us/library/cc917726.aspx
You should at least run the MS provided tests for IO Systems:
[How to use the SQLIOSim utility to simulate SQL Server activity on a disk
subsystem]
http://support.microsoft.com/kb/231619/en-us
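The ordering and flush requirements listed above can be illustrated with
a small sketch of write-ahead ordering: the log record must be on stable
storage before the data page is touched. This is a generic illustration
under stated assumptions, not SQL Server's implementation. The function
names, the record format, and the use of os.fsync as a stand-in for
whatever flush or write-through mechanism the real system uses are all
hypothetical.

```python
import os
import tempfile


def durable_append(fd, payload):
    """Append payload to fd and force it to stable storage before returning."""
    os.lseek(fd, 0, os.SEEK_END)
    view = memoryview(payload)
    while len(view) > 0:
        n = os.write(fd, view)  # os.write may write fewer bytes than asked
        view = view[n:]
    os.fsync(fd)                # stand-in for the real flush mechanism


def logged_update(log_fd, data_fd, offset, new_bytes):
    """Apply one update with log-before-data ordering (hypothetical format)."""
    record = b"%d:%d:" % (offset, len(new_bytes)) + new_bytes + b"\n"
    durable_append(log_fd, record)           # 1. log record, flushed first
    os.lseek(data_fd, offset, os.SEEK_SET)
    os.write(data_fd, new_bytes)             # 2. only then the data page
    os.fsync(data_fd)                        # 3. flush the data as well


def demo():
    """Run one logged update in a temp directory; return (log, data) bytes."""
    flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_BINARY", 0)
    with tempfile.TemporaryDirectory() as d:
        log_fd = os.open(os.path.join(d, "demo.log"), flags)
        data_fd = os.open(os.path.join(d, "demo.dat"), flags)
        try:
            logged_update(log_fd, data_fd, 16, b"page")
            os.lseek(log_fd, 0, os.SEEK_SET)
            os.lseek(data_fd, 0, os.SEEK_SET)
            return os.read(log_fd, 64), os.read(data_fd, 64)
        finally:
            os.close(log_fd)
            os.close(data_fd)
```

A filter that reorders, delays, caches, or silently drops any of these
steps breaks the guarantee the database is relying on, which is why the
tests above are worth running against anything that sits in the I/O path.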
GP
NTFSD is sponsored by OSR