Writing to file with shadow FO

Hi,

I am trying to design my minifilter driver to bypass share access for files. I have files which are simple stubs. When an application opens one of these files and requests to read data, I want to get the data from some other source (intercepting and completing pre-read operations), and in the meantime fill the stub with the actual data. When all the data has been copied to the stub, the application can use the file regularly.

I have a problem when the application opens the file without share write: in that case I cannot write to the file while the application has it open.

In order to overcome this issue, I thought of using a shadow FO. The application will still open the original file, I will write the data to some other temporary file, and when the file is ready I will forward all read/write requests to the new FO of the temporary file (by replacing the FO in pre-read operations).
When the file is closed by the application, I can replace the original file with the temporary file.
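
Roughly, what I have in mind for the redirection in the pre-read callback is something like the sketch below (GetShadowFileObject and IsTemporaryFileReady are hypothetical names for my own bookkeeping, not existing APIs):

#include <fltKernel.h>

// Hypothetical helpers backed by a stream context (not shown here).
PFILE_OBJECT GetShadowFileObject(_In_ PFILE_OBJECT OriginalFileObject);
BOOLEAN IsTemporaryFileReady(_In_ PFILE_OBJECT ShadowFileObject);

FLT_PREOP_CALLBACK_STATUS
PreReadCallback(
    _Inout_ PFLT_CALLBACK_DATA Data,
    _In_ PCFLT_RELATED_OBJECTS FltObjects,
    _Outptr_result_maybenull_ PVOID *CompletionContext
    )
{
    PFILE_OBJECT shadow;

    UNREFERENCED_PARAMETER(CompletionContext);

    shadow = GetShadowFileObject(FltObjects->FileObject);

    if (shadow != NULL && IsTemporaryFileReady(shadow)) {

        //
        // Redirect the read to the temporary file's file object.
        // Any change to the Iopb has to be flagged with FltSetCallbackDataDirty.
        //
        Data->Iopb->TargetFileObject = shadow;
        FltSetCallbackDataDirty(Data);
    }

    return FLT_PREOP_SUCCESS_NO_CALLBACK;
}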

Do you see any issues with this approach?

Thank you for your help,
Mickey

A few more details on the above description.
I also have a userspace service which is responsible for retrieving the data. When an application tries to read from a stub file, the minifilter driver requests the data from the service and completes the read operation.
The service is responsible for preparing the temporary file and letting the driver know when it is ready. Once the temporary file is ready, all future read/write requests are forwarded to the temporary file by replacing the FO.
When the application closes the file, the service replaces the stub with the temporary file.
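
For completeness, the driver-to-service request uses the standard filter communication port mechanism; a minimal sketch of the kind of call I mean (the message layout and names here are mine, not anything that already exists):

#include <fltKernel.h>

// Hypothetical request layout shared with the user-mode service.
typedef struct _STUB_DATA_REQUEST {
    LARGE_INTEGER Offset;
    ULONG         Length;
} STUB_DATA_REQUEST, *PSTUB_DATA_REQUEST;

NTSTATUS
RequestDataFromService(
    _In_ PFLT_FILTER Filter,
    _In_ PFLT_PORT *ClientPort,           // saved in the port connect callback
    _In_ PSTUB_DATA_REQUEST Request,
    _Out_writes_bytes_(*ReplyLength) PVOID ReplyBuffer,
    _Inout_ PULONG ReplyLength
    )
{
    LARGE_INTEGER timeout;

    timeout.QuadPart = -30LL * 10 * 1000 * 1000;   // 30 seconds, relative

    //
    // Send the request to the user-mode service and wait for the data
    // (or a failure) in the reply buffer.
    //
    return FltSendMessage(Filter,
                          ClientPort,
                          Request,
                          sizeof(*Request),
                          ReplyBuffer,
                          ReplyLength,
                          &timeout);
}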

You will have to create this file as a page (swap) file or this will never work correctly with paging IO. But marking a file as a page file defeats the purpose of replacing the original file with it. Also, not all FSDs support page files, e.g. network redirectors.

The Memory Manager or the Cache Manager invokes some FSD callbacks for the original file object to allow the FSD to acquire its resources in the correct order. Then the paging IRP_MJ_READ / IRP_MJ_WRITE is issued. So resources are already acquired before your pre-operation callback runs, which makes it impossible to correctly synchronize, from that pre-operation, IO to another general file that also requires resource acquisition.

That is why, when processing paging IO for a general file, only recursive requests to a page file are allowed, i.e. you can touch paged pool memory. Page file processing differs in that the FSD does not try to synchronize access to it, leaving this to the caller (the Memory Manager, in general).

You can carry on without a page file, but you have been warned about the synchronization hazards of paging IO.

Thank you for your assistance.
My driver and service are restricted only to NTFS.

Can you please refer me to some documentation regarding these FSD callbacks for acquiring resources?

// Minifilter "pseudo-IRP" major function codes for the resource
// acquisition/release callbacks (declared in fltKernel.h):
#define IRP_MJ_ACQUIRE_FOR_SECTION_SYNCHRONIZATION ((UCHAR)-1)
#define IRP_MJ_RELEASE_FOR_SECTION_SYNCHRONIZATION ((UCHAR)-2)
#define IRP_MJ_ACQUIRE_FOR_MOD_WRITE ((UCHAR)-3)
#define IRP_MJ_RELEASE_FOR_MOD_WRITE ((UCHAR)-4)
#define IRP_MJ_ACQUIRE_FOR_CC_FLUSH ((UCHAR)-5)
#define IRP_MJ_RELEASE_FOR_CC_FLUSH ((UCHAR)-6)

// Callbacks an FSD registers with the Cache Manager via CcInitializeCacheMap
// (declared in ntifs.h); the lazy writer and read-ahead use them to acquire
// the FSD's resources before issuing paging IO:
typedef struct _CACHE_MANAGER_CALLBACKS {
    PACQUIRE_FOR_LAZY_WRITE  AcquireForLazyWrite;
    PRELEASE_FROM_LAZY_WRITE ReleaseFromLazyWrite;
    PACQUIRE_FOR_READ_AHEAD  AcquireForReadAhead;
    PRELEASE_FROM_READ_AHEAD ReleaseFromReadAhead;
} CACHE_MANAGER_CALLBACKS, *PCACHE_MANAGER_CALLBACKS;

BTW. Consider using intermediate buffers and a worker thread to perform writes to a temporary file.

Write can be asynchronous in this case: data is put into an intermediate buffer and the request is completed with STATUS_SUCCESS. A worker thread then writes the data to a file with ZwWriteFile, or redirects it to a user-mode service. This removes most of the synchronization issues for writes and allows regular file usage. You will need to synchronize reads with the asynchronous writes to preserve data coherence. You can put reads in the same worker-thread queue, which synchronizes reads and writes. You can even scan the queue to pick up data that is still in the queue. But you have to wait for the data to be ready, as you cannot complete a read request on the original file without data. This wait for read completion is the only path that has some synchronization issues.

This design reduces the probability of a deadlock and allows general file usage.
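
To make this concrete, a very stripped-down sketch of the buffered write path described above (simple non-MDL case only, error handling omitted; the structure, pool tag and handle names are only illustrative):

#include <fltKernel.h>

#define DW_TAG 'wDfD'

// Hypothetical deferred-write item: a copy of the caller's data plus where it goes.
typedef struct _DEFERRED_WRITE {
    HANDLE        TempFileHandle;   // temporary file handle, opened elsewhere
    LARGE_INTEGER Offset;
    ULONG         Length;
    PVOID         Buffer;           // non-paged copy of the caller's data
} DEFERRED_WRITE, *PDEFERRED_WRITE;

// Worker routine: performs the actual write outside of the original IO path.
VOID
DeferredWriteWorker(
    _In_ PFLT_GENERIC_WORKITEM WorkItem,
    _In_ PVOID FltObject,
    _In_opt_ PVOID Context
    )
{
    PDEFERRED_WRITE item = (PDEFERRED_WRITE)Context;
    IO_STATUS_BLOCK iosb;

    UNREFERENCED_PARAMETER(FltObject);

    ZwWriteFile(item->TempFileHandle, NULL, NULL, NULL, &iosb,
                item->Buffer, item->Length, &item->Offset, NULL);

    ExFreePoolWithTag(item->Buffer, DW_TAG);
    ExFreePoolWithTag(item, DW_TAG);
    FltFreeGenericWorkItem(WorkItem);
}

// Fragment of a pre-write callback: copy the data, queue the worker,
// and complete the request immediately with STATUS_SUCCESS.
FLT_PREOP_CALLBACK_STATUS
CompleteWriteAsynchronously(
    _Inout_ PFLT_CALLBACK_DATA Data,
    _In_ PCFLT_RELATED_OBJECTS FltObjects,
    _In_ HANDLE TempFileHandle
    )
{
    ULONG length = Data->Iopb->Parameters.Write.Length;
    PDEFERRED_WRITE item;
    PFLT_GENERIC_WORKITEM workItem;

    item = ExAllocatePoolWithTag(NonPagedPoolNx, sizeof(*item), DW_TAG);
    item->Buffer = ExAllocatePoolWithTag(NonPagedPoolNx, length, DW_TAG);
    item->TempFileHandle = TempFileHandle;
    item->Offset = Data->Iopb->Parameters.Write.ByteOffset;
    item->Length = length;
    RtlCopyMemory(item->Buffer, Data->Iopb->Parameters.Write.WriteBuffer, length);

    workItem = FltAllocateGenericWorkItem();
    FltQueueGenericWorkItem(workItem, FltObjects->Filter,
                            DeferredWriteWorker, DelayedWorkQueue, item);

    Data->IoStatus.Status = STATUS_SUCCESS;
    Data->IoStatus.Information = length;
    return FLT_PREOP_COMPLETE;
}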

> You will have to create this file as a page (swap) file or this will never work correctly with paging IO.

Can you expand? You seem to be saying that paging IO only ever goes to
paging files, which we both know to be wrong?

Do you see any issues with this approach?

Extraordinarily complicated for what is in effect yet another HSM. As soon as
you are talking shadow file objects you are talking 6 months' work (4 if
you've done two or three before); as soon as you are talking about cache manager
callbacks you are talking multiple years. Add in transactions and WLK
conformance can double this.

Can you not rename the file in post-last-cleanup? At that stage the handle
count has gone to zero. Of course you still need to do all the juggling.

An alternative is to open the file carefully in the kernel with IO_IGNORE_SHARE_ACCESS_CHECK.
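
A hedged sketch of what such an open could look like from a minifilter, using FltCreateFileEx2 with its Flags parameter (the access, share, and create options here are only placeholders):

#include <fltKernel.h>

NTSTATUS
OpenIgnoringShareAccess(
    _In_ PFLT_FILTER Filter,
    _In_ PFLT_INSTANCE Instance,
    _In_ PUNICODE_STRING FileName,
    _Out_ PHANDLE Handle,
    _Outptr_ PFILE_OBJECT *FileObject
    )
{
    OBJECT_ATTRIBUTES oa;
    IO_STATUS_BLOCK iosb;

    InitializeObjectAttributes(&oa, FileName,
                               OBJ_KERNEL_HANDLE | OBJ_CASE_INSENSITIVE,
                               NULL, NULL);

    //
    // IO_IGNORE_SHARE_ACCESS_CHECK asks the IO manager to skip the
    // share-access check for this open (the file system may still enforce
    // its own checks). Opening through our own instance sends the create
    // below us in the filter stack.
    //
    return FltCreateFileEx2(Filter,
                            Instance,
                            Handle,
                            FileObject,
                            FILE_WRITE_DATA | SYNCHRONIZE,
                            &oa,
                            &iosb,
                            NULL,                      // AllocationSize
                            FILE_ATTRIBUTE_NORMAL,
                            FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                            FILE_OPEN,
                            FILE_NON_DIRECTORY_FILE | FILE_SYNCHRONOUS_IO_NONALERT,
                            NULL,                      // EaBuffer
                            0,                         // EaLength
                            IO_IGNORE_SHARE_ACCESS_CHECK,
                            NULL);                     // DriverContext
}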

I can't see how you came to such a conclusion, as the explanation was already given. I will reiterate it.

If you need to do some IO to a file B on the paging IO path for a file A, then file B can only be a page file; this allows paged pool usage by FSDs and filters. Otherwise there is a danger of a deadlock, especially when both files are on the same volume.

Thanks Slava.

Regarding the IRPs you mentioned, I thought I could simply forward these operations to the temporary file (by replacing the FO). These locks will be applied to the temporary file, just as I intend to do with the read operation.

BTW. Consider using intermediate buffers and a worker thread to perform writes to a temporary file.

Not sure I understand what you mean here. In my design I intended that my userspace service will retrieve the data and write it to the temporary file. The driver can forward read operations (or lock operations) to that file object if the relevant range was previously written by the service.

If this is a read-only implementation then you do not need what I outlined to support write requests from applications.

So assuming the following:

  1. I will forward the above IRPs (read and resource acquiring/releasing) to the temporary file by replacing the file object.
  2. The temporary file is read only.
  3. The temporary file is not a page file.

Do you see any potential issues with this design?

In the future, in order to support write operations, I can simply replace the FO of the original file with the temporary one for those as well (in addition to all the other IRPs mentioned above). I will pend any write operation to a range that has not yet been written by my service. Is there something I am missing here?
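
What I have in mind for the pend is roughly the fragment below (RangeIsMaterialized and QueuePendedWrite are hypothetical names for my own bookkeeping; only IRP-based operations can be pended like this):

// In the pre-write callback: if the target range is not yet backed by data
// from the service, pend the operation; complete it later when notified.
if (FLT_IS_IRP_OPERATION(Data) &&
    !RangeIsMaterialized(streamContext,
                         Data->Iopb->Parameters.Write.ByteOffset,
                         Data->Iopb->Parameters.Write.Length)) {

    QueuePendedWrite(streamContext, Data);   // hypothetical cancel-safe queue
    return FLT_PREOP_PENDING;
}

// ... later, when the service reports that the range has been written:
FltCompletePendedPreOperation(pendedData,
                              FLT_PREOP_SUCCESS_NO_CALLBACK,
                              NULL);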

Thanks Rod.

An alternative is to open the file carefully in the kernel as
IO_IGNORE_SHARE_ACCESS_CHECK.

So you suggest that although the application opened the original file without share write, I will later open it in the kernel for write access with IO_IGNORE_SHARE_ACCESS_CHECK. Won't it cause any issues if I perform write operations on a file that was opened without share write?

I have about two months of experience playing with minifilter drivers. Can you please point me to some of the difficulties with shadow file objects? I currently don't see any reason to interfere with the cache manager's activity.

> If you need to do some IO to a file B on the paging IO path for a file A, then file B can only be a page file

Thanks, that makes sense (it was the file A / file B thing I was missing).

> So you suggest that although the application opened the original file without share write, I will later open it in the kernel for write access with IO_IGNORE_SHARE_ACCESS_CHECK. Won't it cause any issues if I perform write operations on a file that was opened without share write?

Only if you let it. That’s the point. You are involved in the read and
write path for this file and it is only you who will be subverting things.
For instance someone writes to a block. You could take notice of this and
make sure that you don’t restore it.

Can you please point me to some of the difficulties with shadow file objects?

For a straight shadow file object implementation, your biggest issue is going
to be finding out how to stop NTFS from crashing because of yet another
mechanism used to pass the file object down (rename, set hard link, MoveFile).

I currently don't see any reason to interfere with the cache manager's activity.

In that case you are lucky. You can achieve a great deal with what I think
of as "shadow" operation (own the file object; do not own
FileObject->SectionObjectPointer, which I think of as "Isolation"). Remember,
as per Slava's excellent point, that you are not allowed to divert paging
writes to another file object (or the lazy writer will acquire for the NTFS
file object in the SOP, and when the write ends up on another file object
NTFS will try to acquire locks it hasn't pre-acquired).

Your main challenge is going to be working out what has gone wrong when the
machine suddenly deadlocks during final test, or after 6 months of deployment
at your angriest customer's site. I have lost count of the number of drivers
I have encountered in this space which are "99% done" but "occasionally hang
for no reason". I've been doing this stuff for over 10 years and Slava had
to remind me about the mixed-file-object issue (which I tend to design out).

There is something that I still don’t understand.

If my minifilter forwards all relevant IRPs (read, write & resource acquiring/releasing) to the temporary file (by replacing the FO), why would I still have a synchronization issue?

I also looked at the "fastfat" sample to see when it calls the CcInitializeCacheMap function. I don't see it being called on the create path; it is called on the read/write paths, which should never be reached for my original file. Is it different for NTFS?
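
For reference, the pattern I am referring to in fastfat is the lazy cache-map setup in the cached read/write path, roughly like this (simplified from the sample, not an exact quote):

// Caching is set up lazily on the first cached read/write,
// not at create time (roughly what fastfat does in FatCommonRead):
if (FileObject->PrivateCacheMap == NULL) {

    CcInitializeCacheMap(FileObject,
                         (PCC_FILE_SIZES)&Fcb->Header.AllocationSize,
                         FALSE,                              // no pin access
                         &FatData.CacheManagerCallbacks,
                         Fcb);
}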

What am I missing?

> If my minifilter forwards all relevant IRPs

It's "the all relevant" bit that you need to worry about. If you are
splitting requests between two FSD file objects then you are making
assumptions about the FSD's approach to re-entrancy, so you need to be
ready for it.

(read, write & resource acquiring/releasing)

Have you considered:

IRP_MJ_SET_INFORMATION (set length can provoke cache activity; might the filesystem have done some locking first?)
IRP_MJ_QUERY_INFORMATION (if you are handling the read, what about the write?)
IRP_MJ_SET_EA (is that going to provoke a paging write? Any lock assumptions?)
IRP_MJ_FLUSH_BUFFERS (might acquire a lock and then call CcFlush, expecting that lock to be held)
IRP_MJ_FILE_SYSTEM_CONTROL (completely open ended, but I'd worry about the secret sauce that EFS files involve)

And so on. You are only going to find them all by playing a long game of
"whack-a-mole" [1]. It's all doable, but it's never something you can dismiss
out of hand.

[1]https://en.wikipedia.org/wiki/Whac-A-Mole

Thanks Rod.