I meant to add that k in my O(k) is not a constant, but a formula, e.g.,
log n, n^2, etc.
However, big-O notation is a shorthand for a much more complex equation
O(e) = t[setup] + C * e + t[teardown]
Usually, t[setup] and t[teardown] are extremely close to, or exactly,
zero, and can be ignored. But that big-C is a potential killer. C is the
“constant of proportionality” that converts the value determined by the
expression e to actual time. It may be that e is a simple “n log n”, as
for a sort, but C is the cost of doing the comparison, and can be very
large. I once had a case where the cost of strcmp dominated everything.
So I sorted the global symbol table once, associated each string with an
integer, e.g. s[i] ~ v[i], where s is a string and v is an integer value,
or, for all i, j if v[i] < v[j], then s[i] < s[j] (and there are no
duplicates, so although the same relationship held for ==, it didn’t
matter). Reduced sorting large string sets from minutes to seconds (yes,
there was a reason to sort them multiple times, using multiple keys, but
only one key was a string).
So what you have observed is that if you have to repeat a complex test on
every write, C becomes extremely large, even if e is log n (e.g., binary
search) or even if e is a constant (e.g., under a perfect hash, e == 1,
and under less-than-perfect hash, you can usually get e == 1.2 without
killer effort), so C does not have to dominate. And it puts the delay,
uniformly, into every file, precisely once. So always be cautious of that
C, it can kill you. O(1) is not very good if that one test involves a disk
read, for example, to perform the test. An O(n^2) done entirely in memory
needs a pretty large n before it loses to the O(1) requiring a disk
access.
joe
> Hi,
>
> I need a little help with FS Minifilter and file contexts.
>
> PROBLEM STATEMENT:
> => How can I identify a particular file (whose IRP_MJ_CREATE have been
> missed) and associate a file context with it in a time complexity of
> O(1)…?
>
> INTRODUCTION:
> I’m writing a FS minifilter driver that would track changes in a list of
> files and maintain them in a bitmap.
> And to save the trouble to have to string compare the file path during
> every IRP_MJ_WRITE (so as to check whether the file has been set for
> tracking), I’m setting a File Context in the IRP_MJ_CREATE itself, to
> all
> files that are found in my tracking list.
> That is the only place where I have to do a string comparison of the
> file
> path.
> Down the line, during any of the IRP_MJ_WRITEs all I have to do is check
> for the context and if it exists, it means that this file has been set
> for
> tracking.
> Then I can straight away move on to updating the bitmap.
What, exactly, does the bitmap do? Or are you dealing with image files of
some kind?
As far as I know, the FsContext and FsContext2 are private to the file
system and cannot be changed or modified in any way. What, exactly, are
you setting?
>
> NOW HERE’S THE PROBLEM:
> There might be files in the system that were open before my driver
> started, or in my case, before my driver was installed.
> It means that I have missed the IRP_MJ_CREATE for such files, and with
> that I have missed my chance to set the file context which will
> supposedly
> help me identify whether changes to this file are to be tracked or not.
>
> => Is there a way that I can set a context to such already opened files
> so
> that I could fetch the context in the consequent IRP_MJ_WRITEs…?
>
> PROPOSED SOLUTIONS:
> The first solution that comes to mind is to string compare the file path
> in the IRP_MJ_WRITEs and if the file is found to be one from the
> tracking
> list, then you set the context.
> But it would mean that if there are 100 open files in the system and not
> a
> single one of them falls in our tracking list, we’d DELAY the WRITE
> operations of all of these files.
It sounds to me like you only need to delay the FIRST write you see for an
“unmarked” file. That is, your decision is three-state: care, don’t care,
don’t know. Once you determine that the state is “don’t know”, you only
need to delay that one write long enough to decide whether it should be
marked “care” or “don’t care”, and therefore on subsequent write
operations, no string compares are required.
>
> Another thing one could do is demand a System reboot after installation
> of
> my software and have the minifilter driver load at System boot so that
> we
> won’t miss any of the IRP_MJ_CREATEs, as the driver will be loaded and
> ready before a file is accessed in the system.
> But we don’t have the liberty to ask for a System reboot either.
This would be silly. You have to work on the premise that you cannot
demand a reboot and that there may already be open files. But see my
above description of the three-state issue.
>
> There are various approaches I have devised that would reduce the
> expected
> delay in processing IRP_MJ_WRITEs, but it still leaves the non-relevant
> IRP_MJ_WRITEs ending up with a small hit of unnecessary extra
> processing.
No, it is not “unnecessary”; from the viewpoint of your filter driver,
that processing is essential. But you only need to do it once per file,
on the first write operation you intercept for a “don’t know” file. Once
you have determined whether or not you care, the lengthy computation never
again needs to be done.
>
> The best thing would be to be able to set the file context (to files
> whose
> CREATEs we have missed) in a time complexity of O(1).
You can already do that for files you create. But it isn’t O(1) even in
that case; it is a function of the number of files in the “catch table”
(or the number of extensions, depending on what you are trying to catch).
Therefore, the first write operation will be delayed by O(k), whereas for
files being newly created, the create will be delayed by O(k), where k is
the cost of determining if the file is of interest. There is no reason to
ever perform this lookup more than once on an unmarked file; once it is
performed, you mark the file, and everything beyond that point works in
O(1) time (where 1 represents the cost of asking “care or don’t care?”)
I note that you have not said what the purpose of this detection is; if it
is one of those “we want to make sure nobody is doing X to files of type
Y”, note that there is absolutely, positively, no conceivable way you can
tell anything about the type of data in a file. Extensions are a hint of
what the content might be, but I can save a JPEG file as .BMP, .EXE, or
.TXT. Note that if I try to read that file with Paint, or run it, or read
it in NotePad, I will get errors or gibberish, but file content is not
compelled to match the extension. So it is not clear what purpose this
minifilter serves.
joe
>
> Any help would be greatly appreciated.
> Thanks
>
> —
> NTFSD is sponsored by OSR
>
> OSR is hiring!! Info at http://www.osr.com/careers
>
> For our schedule of debugging and file system seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer
>
NTFSD is sponsored by OSR
OSR is hiring!! Info at http://www.osr.com/careers
For our schedule of debugging and file system seminars visit:
http://www.osr.com/seminars
To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer