Which context shall I use?

This is what happens when I copy a file:
I get 6 postCreate events.

This is what I want:
I want to scan the file only once for every file copy and then remember at the next postCreate event that I have already scanned this file.
And when I do a file copy again on the same file, I want to rescan the file only once.

This is what I have tested:
When I use StreamHandleContext it is lost after each event, so in that case I scan the file 6 times.
And when I use StreamContext the context is there forever, and therefore I will never be able to rescan the file.

This is the question:
Are there other ways to remember that I have scanned the file, so that I rescan the file only once for every file copy?

Kind regards
Mattias Bergkvist

Maybe you could put something in the StreamContext that tells you
whether you need to re-scan it or not? Like the mtime for the file, and
if it has changed when you see a postcreate, you rescan it. Or if
computing an md5 (or some other) checksum for the file is cheap compared
to scanning it, use that if you’re concerned about an adversary
manipulating the timestamps.

~Eric
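In user-mode C, the timestamp check Eric describes might be sketched like this (a sketch only: `SCAN_STATE`, `NeedsRescan`, and `MarkScanned` are hypothetical names; in a real filter the state would live in the StreamContext and the time would come from the file's last-write time):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-stream state a filter might keep in its StreamContext.
 * LastScanTime mirrors the file's last-write time at the moment of the
 * previous scan; 0 means "never scanned". */
typedef struct SCAN_STATE {
    uint64_t LastScanTime;
} SCAN_STATE;

/* Post-create decision: scan if we have never scanned the file, or if its
 * last-write time has moved since the previous scan. Note that an
 * adversary can forge timestamps; a cheap checksum could stand in for
 * LastScanTime if that matters, as Eric notes. */
static bool NeedsRescan(const SCAN_STATE *state, uint64_t currentWriteTime)
{
    return state->LastScanTime == 0 || state->LastScanTime != currentWriteTime;
}

/* Record a completed scan against the write time we scanned at. */
static void MarkScanned(SCAN_STATE *state, uint64_t currentWriteTime)
{
    state->LastScanTime = currentWriteTime;
}
```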

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of
xxxxx@netcleantech.com
Sent: Friday, January 11, 2008 10:21 AM
To: Windows File Systems Devs Interest List
Subject: [ntfsd] Which context shall I use?

This is what happens when I copy a file:
I get 6 postCreate events.

This is what I want:
I will scan the file only once for every file copy and then remember at
the next postCreate event that I have already scanned this file.
And when I do a file copy again on the same file I shall be able to
rescan the file only once.

This is what I have tested:
When I use StreamHandleContext it is lost after each event so in that
case I scan the file 6 times.
And when I use StreamContext the context is there forever and therefore
I will never be able to rescan the file.

This is the question:
Are there other ways to remember that I have scanned the file, so that
I rescan the file only once for every file copy?

Kind regards
Mattias Bergkvist


NTFSD is sponsored by OSR

For our schedule of debugging and file system seminars (including our new
fs mini-filter seminar) visit:
http://www.osr.com/seminars

You are currently subscribed to ntfsd.
To unsubscribe send a blank email to xxxxx@lists.osr.com

> And when I use StreamContext the context is there forever and therefore I
> will never be able to rescan the file.

Well, what time do you think is appropriate? 10 seconds? What if the file is
modified within those 10 seconds? In that case you wouldn’t rescan the file?

Why don’t you use the StreamContext to maintain a “state” for the file? The
right moment to rescan the file is when it has been modified, not after a
particular time range. On file modification (e.g. IRP_MJ_WRITE) you would
set the state to “unknown”, which is the same as having no StreamContext for
the file.

Regards
Frank
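Frank's state-machine idea can be sketched in user-mode C (the enum and function names are hypothetical; in a real filter these transitions would run in the pre/post operation callbacks):

```c
#include <stdbool.h>

/* Hypothetical scan state kept in the StreamContext. A write invalidates
 * the state instead of relying on a timer. */
typedef enum SCAN_STATUS {
    ScanStatusUnknown = 0,   /* never scanned, or modified since last scan */
    ScanStatusClean          /* scanned and unmodified since */
} SCAN_STATUS;

/* Post-create: scan only when the state is unknown. */
static bool ShouldScanOnCreate(SCAN_STATUS status)
{
    return status == ScanStatusUnknown;
}

/* IRP_MJ_WRITE (or any other modifying operation): drop back to unknown,
 * which is equivalent to having no StreamContext at all. */
static SCAN_STATUS OnModify(SCAN_STATUS status)
{
    (void)status;
    return ScanStatusUnknown;
}

/* Scan completed successfully: remember the stream as clean. */
static SCAN_STATUS OnScanComplete(void)
{
    return ScanStatusClean;
}
```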

Hi Mattias,

> This is what happens when I copy a file:
> I get 6 postCreate events.

Do all these opens succeed? This is not the kind of behavior I would expect from an application, i.e., opening the file multiple times for a single copy operation. However, this can happen if there are multiple streams within the file.

> This is what I want:
> I will scan the file only once for every file copy and then remember at the next postCreate event that I have already scanned this file.
> And when I do a file copy again on the same file I shall be able to rescan the file only once.

For one thing, the kernel does not understand a copy operation.

This is what I have tested:
> When I use StreamHandleContext it is lost after each event so in that case I scan the file 6 times.
> And when I use StreamContext the context is there forever and therefore I will never be able to rescan the file.

This is what you should expect. A StreamHandleContext is per FILE_OBJECT, whereas a StreamContext is common to all open instances of a particular stream.

> This is the question:
> Are there other ways to remember that I have scanned the file, so that I rescan the file only once for every file copy?

Hence, all you can do is make your filter intelligent enough to recognize that the source file has not been modified since your last scan. For this, you can maintain a hash table. Look up the file name (or preferably the file ID) in the hash table. If it is not found, the file has not been scanned. If it is found and a flag associated with it indicates that the file has been modified, you need to scan the file again; do so and update the hash table accordingly. If the flag indicates that it has not been modified, you don’t need to scan it. Also track all modifications to the file, so that you can update the hash accordingly, i.e., mark that the file has been modified and needs to be rescanned. You also need to track file close/cleanup, because you will generally rescan the file there to check that nothing malicious has been written to it, again updating the hash table.
I may have gone wrong somewhere while explaining this, but I hope you get the general idea.

Regards,
Ayush Gupta
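Ayush's hash-table scheme might be sketched in user-mode C like this (hypothetical names; a real filter would also need per-volume keying, locking, non-paged allocations, and eviction):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SCAN_BUCKETS 64

/* One entry per file, keyed by file ID (per-volume handling omitted). */
typedef struct SCAN_ENTRY {
    uint64_t FileId;
    bool Modified;              /* set on write/cleanup, cleared by a scan */
    struct SCAN_ENTRY *Next;
} SCAN_ENTRY;

typedef struct SCAN_TABLE {
    SCAN_ENTRY *Buckets[SCAN_BUCKETS];
} SCAN_TABLE;

static SCAN_ENTRY *LookupEntry(SCAN_TABLE *t, uint64_t fileId)
{
    SCAN_ENTRY *e = t->Buckets[fileId % SCAN_BUCKETS];
    while (e != NULL && e->FileId != fileId)
        e = e->Next;
    return e;
}

/* Post-create decision: scan if the file was never seen or has been
 * modified since the last scan. */
static bool NeedsScan(SCAN_TABLE *t, uint64_t fileId)
{
    SCAN_ENTRY *e = LookupEntry(t, fileId);
    return e == NULL || e->Modified;
}

/* Record a completed scan, inserting the entry if necessary. */
static void RecordScan(SCAN_TABLE *t, uint64_t fileId)
{
    SCAN_ENTRY *e = LookupEntry(t, fileId);
    if (e == NULL) {
        e = calloc(1, sizeof(*e));
        if (e == NULL)
            return;             /* allocation failed: simply rescan next time */
        e->FileId = fileId;
        e->Next = t->Buckets[fileId % SCAN_BUCKETS];
        t->Buckets[fileId % SCAN_BUCKETS] = e;
    }
    e->Modified = false;
}

/* Called from the write path: mark the file dirty so the next
 * post-create (or cleanup) rescans it. */
static void RecordModification(SCAN_TABLE *t, uint64_t fileId)
{
    SCAN_ENTRY *e = LookupEntry(t, fileId);
    if (e != NULL)
        e->Modified = true;
}
```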

> Hence, all that you can do is that make your filter intelligent enough so
> that it recognizes that the source file has not been modified since the
> last scan you did. For this, you can maintain hash table. Look up in the
> hash table for the file name (or preferably ID). If it is not found it
> means that it was not scanned. If it is found and a flag associated with
> it indicates that it has been modified, it means that you need to scan
> the file. Scan the file then. Update the hash table accordingly. If the
> flag indicates that it has not been modified, you don’t need to scan it.
> Also track all file modifications to that file, so that you can update
> this hash accordingly, i.e., marking that the file has been modified and
> needs to be scanned again. You also need to track the file close/cleanup,
> because you will generally rescan the file there to check if nothing
> malicious has been written to it. Update the hash table accordingly.

Actually, this is exactly the place where I just realized I’m reinventing
the wheel. ;-) At least it seems that way. Is there a reason I should be
using a hash table over a StreamContext? It seems like using a stream
context frees you from having to worry about which volume a file ID is
on, or from maintaining separate hash tables on a per-volume basis. Of
course, you need to maintain state yourself if the FS doesn’t support
StreamContexts, which may necessitate a hash table anyway. If the FS
supports it, though, is one preferable over the other?

Thanks,

~Eric

Hi Eric,

> Actually, this is exactly the place I just realized I’m reinventing the
> wheel ;-) At least it seems that way. Is there a reason I should be
> using a hash table over a streamcontext? It seems like using a stream
> context frees you from having to worry about which volume a file id is
> on, or maintaining separate hash tables on a per-volume basis. Of
> course you need to be able to maintain state yourself if the FS doesn’t
> support StreamContexts which may necessitate a hash table anyway. If
> the FS supports it though, is one preferable over the other?

It is not reinventing the wheel. :-)

If you look at it more carefully, this approach lets you keep the
information even if the file (stream) has been closed.
This is not the case with StreamContext. As soon as no FILE_OBJECT
representing the stream is left, the context is also deleted.
Hence, if you need to maintain any information related to a stream
(file), you need to implement something that is managed by you and not
by the file system. Only then can you make that information persist
across multiple open-close cycles on a stream.

Regards,
Ayush Gupta

Ayush,

Thanks for pointing that out. I hadn’t thought of the case of having to
maintain state between opens because it doesn’t apply in my situation.
Just out of curiosity, how do you decide when to drop a file’s state
from your hash table? Obviously if the answer is never, you’ll
eventually eat up all the available memory. Do you limit the total size
of the table and purge using a least-recently-used strategy, possibly
weighted by the size of the file (if longer files take more time to
scan, the penalty for making a redundant scan is higher)?

Anyhow, with that in mind, I’ll definitely hang on to the hash table
code I wrote in case I need it down the road.

~Eric



I would never buy into contexts or this sort of thing unless strictly
necessary. This way you keep things completely under your control and
you eliminate the chance of getting bitten by some snake down one of the
paths nobody ever went before. The chances of running into minifilter or
file system bugs are high. If you do your own bookkeeping you can choose
to save exactly what is important to you, keyed on file name, file ID,
file object, stream, handle or whatever, and you decide yourself what to
hash, how to speed things up, and when to allocate and free it.

/Daniel


Would anybody else care to weigh in one way or the other before I start
tearing into the codebase? I need to make changes either way, but I’d
like to do it once. I finished the design for doing it with my own hash
table, and I don’t think I’m doing anything particularly extravagant
that would cause concern about ceding control of the hashing and storage
allocation to the OS.

On the other hand, if there really are large risks to using the MS way,
I’d like to know. I’m not saying I’ll write flawless code the first
time, but if I’m going to have bugs either way, I’d rather have ones I
can fix without involving MS. I haven’t heard nearly the level of
complaints about this here as I’ve heard about
FltGetFileNameInformation(), so I’m assuming either contexts are robust
and well-supported, or everybody wrote them off as crap years ago.
Thoughts?

Thanks,

~Eric


Yes, contexts are safe, reliable, and good design. Worse yet, getting
decent performance without them is a pain: you basically have to create
tables where you look up the file object yourself and manage them. As I
pointed out in the other question you asked, there may be a few third
party file systems for which this will not work, but they are few and
likely to have a lot of other problems for filtering anyway.


Don Burn (MVP, Windows DDK)
Windows 2k/XP/2k3 Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr
Remove StopSpam to reply


Hi Eric,

> Maybe you could put something in the StreamContext that tells you
> whether you need to re-scan it or not? Like the mtime for the file, and
> if it has changed when you see a postcreate, you rescan it.

This would provide some efficiency, but not much.

Let’s see the following sequence of operations:

  1. You open file A. Let’s say file object F1 is created. Let’s say this is
    the first open. You will scan it & associate a stream context.
  2. You get another open on file A. Let’s say file object F2 is created. You
    already have a stream context on it stating that the file has been scanned.
  3. You receive close for F1.
  4. You receive close for F2.
  5. You receive open for file A. Let’s say file object created is F3. In the
    post create you will not find the stream context that you had set on F1. You
    will end up scanning the file again.

… And the list may or may not continue… :-)

I hope you understood the point.

Regards,
Ayush Gupta

Hi Eric,

> Do you limit the total size of the table and purge using a least recently
> used strategy, possibly weighted with the size of the file (if longer files
> take more time to scan, the penalty for making a redundant scan is higher)?

You can have various strategies to handle this. For example, you can:

  1. Keep purging the contents every t minutes. As an enhancement
    you can also check how much memory you have actually consumed. If it is not
    much, you don’t have to purge.

  2. Another option is to maintain it in a memory mapped file of your
    own. But try to implement a hash that does not make the file grow
    tremendously.

  3. You can also make a memory mapped file without using hashing. But I
    don’t know how good its performance will be.

  4. For NTFS volumes, you can insert the information in a separate
    stream in that file.

There may be many more, each one with its own advantages and
disadvantages.

Regards,

Ayush Gupta
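Eric's cap-and-purge idea from earlier in the thread could be modeled like this in user-mode C (a toy sketch: fixed-size table instead of a hash table, a logical clock for recency, hypothetical names, and no size weighting or locking):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CACHE_CAP 4

/* One slot per remembered (scanned, clean) file. */
typedef struct CACHE_SLOT {
    uint64_t FileId;
    uint64_t LastUsed;       /* logical clock of the most recent touch */
    bool InUse;
} CACHE_SLOT;

typedef struct SCAN_CACHE {
    CACHE_SLOT Slots[CACHE_CAP];
    uint64_t Clock;
} SCAN_CACHE;

/* Returns true if fileId is still remembered as scanned, refreshing its
 * recency so frequently seen files are not purged. */
static bool CacheContains(SCAN_CACHE *c, uint64_t fileId)
{
    for (int i = 0; i < CACHE_CAP; i++) {
        if (c->Slots[i].InUse && c->Slots[i].FileId == fileId) {
            c->Slots[i].LastUsed = ++c->Clock;
            return true;
        }
    }
    return false;
}

/* Remember a scanned file, evicting the least recently used victim when
 * the cache is full. */
static void CacheInsert(SCAN_CACHE *c, uint64_t fileId)
{
    int victim = 0;
    for (int i = 0; i < CACHE_CAP; i++) {
        if (!c->Slots[i].InUse) { victim = i; break; }
        if (c->Slots[i].LastUsed < c->Slots[victim].LastUsed)
            victim = i;
    }
    c->Slots[victim].FileId = fileId;
    c->Slots[victim].LastUsed = ++c->Clock;
    c->Slots[victim].InUse = true;
}
```

Weighting eviction by file size, as Eric suggests, would only change the victim-selection comparison.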

Hi Eric,

One thing I forgot to mention: also associate this info with the file as a
stream context. This will let you skip searching the hash table. You will
need to search the hash table only when you don’t find a stream context
associated with the file object in post-create.

Basically the aim is this: if a file object representing a stream is already
open and a stream context exists for that stream, then a second open (one
that comes before the close of the first file object) will see the same
stream context, and you can use that information and skip the hash search.
However, if the second open comes after the close of the first file object,
you will not get the stream context in post-create. This time you will have
to search the hash table and associate the information with the stream.

You may have different approaches for optimization, but the bottom line is
to make it correct and optimized. :-)

Regards,

Ayush Gupta
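This two-level lookup might be sketched as follows (user-mode C, hypothetical names; the array lookup stands in for whatever per-volume table the filter maintains):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stream context: just a "scanned and clean" flag, copied
 * from the filter's own table when the context is attached. */
typedef struct STREAM_CTX {
    bool Scanned;
} STREAM_CTX;

/* Fallback table modeled as an array of file IDs already known to be
 * scanned and clean; a real filter would use a hash table. */
static bool FindInTable(const uint64_t *table, size_t count, uint64_t fileId)
{
    for (size_t i = 0; i < count; i++)
        if (table[i] == fileId)
            return true;
    return false;
}

/* Post-create decision. streamCtx is NULL when no context survived
 * (first open after the last close); only then is the table consulted. */
static bool NeedsScanOnCreate(const STREAM_CTX *streamCtx,
                              const uint64_t *table, size_t count,
                              uint64_t fileId)
{
    if (streamCtx != NULL)
        return !streamCtx->Scanned;    /* fast path: skip the table search */
    return !FindInTable(table, count, fileId);
}
```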


It feels like the StreamContext is the way to go. The only thing is that the context is there forever.

How can I know if I get the PostClose event or not? Since I could not use FltGetFileNameInformation there, I can never know whether it is the right Close event or not. If I don’t get the PostClose event, the StreamContext is never freed, I suppose.
Does anybody have a clue what the cause can be, or what I should do?

Regards
Mattias Bergkvist

Hi Mattias,

> It feels like the StreamContext is the way to go. The only thing is that the context is there forever.

AFAIK a stream context is associated with the FCB of a stream, and the FCB is common to all FILE_OBJECTs representing that stream.
If all FILE_OBJECTs close, the FCB will also be torn down (if I am correct) and so will the stream context.
So, if I am correct, once all FILE_OBJECTs for a particular stream close (which may include the FILE_OBJECTs used by the cache manager), the stream context won’t be there, and on the next open of that stream you will have to associate a new context.

In short, the lifetime of a stream context is the lifetime of the FCB (if I am correct).

Regards,
Ayush Gupta

The “ctx” sample answers all your questions. The “scanner” sample is a
good example of StreamContext usage, too.

6000\src\filesys\miniFilter\ctx
6000\src\filesys\miniFilter\scanner


Hi Mattias,

> How can I know if I get the PostClose event or not? Since I could not use
> FltGetFileNameInformation I can never know if there is the right Close event
> or not. If I don’t get the PostClose event the StreamContext is never freed
> I suppose.
> Is there anybody that has a clue what it can be or what I shall do?

The stream context is valid until PreClose. You can copy the name stored there (if you are storing it in the StreamContext), or query the name with FltGetFileNameInformation in PreClose and put it in the completion context. Then you can use it in PostClose.

Regards,
Ayush Gupta

> If all FILE_OBJECTS close, the FCB will also be torn down (if I am correct) and so will the stream context.
> So, if I am correct, once all FILE_OBJECTs close (which may include the FILE_OBJECTs used by the cache manager) for a particular stream, the stream context won’t be there. And on the next open on that stream, you will have to associate a new context.

AFAIK, the FCB might be held by the FSD for longer than the last
FILE_OBJECT. I am NOT 100% sure about this, but I would like to say it
here; maybe somebody who definitively knows the answer will tell us. In
practice I have observed that a stream context (and, I guess, implicitly
the associated FCB) stays valid quite a long time after the last close is
performed. I think the FSDs MIGHT have some sort of internal cache of the
last accessed / most accessed files, and that they keep the corresponding
stream contexts alive. Does anyone know an exact answer?

(However, Ayush, you are perfectly right that one shall NOT rely on the
stream context being kept alive after the last close. This is, AFAIK, at
the sole discretion of the FSD.)

> In short, the lifetime of stream context is the life time of FCB (if I am correct).

This should be correct. :-)

have a nice day, thank you very much,

Sandor LUKACS

> AFAIK, the FCB might be held for longer time by the FSD, than the last
> FILE_OBJECT.

No. The FCB dies on MJ_CLOSE for the last file object, but Cc and Mm can
hold a reference on that last file object for a long time.


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

Yes, a file object can be held open for an indefinite amount of time
after the last user handle is closed on all file objects for a file. If
you use stream contexts, you are going to leverage that behavior,
understanding that you will have to reconstruct your data for a
particular file when the system decides to finally close the last file
object. In general this works just fine; your mileage may vary. Don’t
assume that the behavior of a system under little stress is typical of
all systems under all loads.



Mark Roddy