Minifilter: When to flush effectively

My minifilter decrypts the file contents of one specific EXE file. The file is only allowed to be opened for read access, so no need to worry about anything write related in this thread.

First of all, here's my "flush" code, which works very well in itself:

BOOLEAN FlushFileObject(_In_ PFILE_OBJECT FileObject)
// flush the read and image caches of a specific FileObject
{
  PAGED_CODE();

  BOOLEAN result = TRUE;

  // we only have to do anything if there's an FsContext and an FsContext->Resource
  if ((FileObject) && (FileObject->FsContext) && (((PFSRTL_COMMON_FCB_HEADER) (FileObject->FsContext))->Resource))
  {
    result = FALSE;
    PERESOURCE fcbResource         = ((PFSRTL_COMMON_FCB_HEADER) (FileObject->FsContext))->Resource;
    PERESOURCE fcbPagingIoResource = ((PFSRTL_COMMON_FCB_HEADER) (FileObject->FsContext))->PagingIoResource;
    for (int i1 = 0; (i1 < 20) && (!result); i1++)
    {
      if (i1)
      {
        // this is the 2nd (or 3rd or ...) iteration of the loop
        // we couldn't manage to lock both resources, so let's wait 50ms, then try again
        LARGE_INTEGER interval = {0};
        interval.QuadPart = -50 * 10000LL;  // 50 ms
        KeDelayExecutionThread(KernelMode, FALSE, &interval);
      }
      KeEnterCriticalRegion();
      if (ExAcquireResourceExclusiveLite(fcbResource, TRUE))
      {
        if ((!fcbPagingIoResource) || (ExAcquireResourceExclusiveLite(fcbPagingIoResource, FALSE)))
        {
          // we managed to lock both resources, so we can flush now
          result = TRUE;
          if (FileObject->SectionObjectPointer)
          {
            // there's actually something we can flush
            DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL, "before flush ImageSectionObject: %p, DataSectionObject: %p\n", FileObject->SectionObjectPointer->ImageSectionObject, FileObject->SectionObjectPointer->DataSectionObject);
            if (FileObject->SectionObjectPointer->ImageSectionObject)
              MmFlushImageSection(FileObject->SectionObjectPointer, MmFlushForWrite);
            if (FileObject->SectionObjectPointer->DataSectionObject)
              CcPurgeCacheSection(FileObject->SectionObjectPointer, NULL, 0, FALSE);
            DbgPrintEx(DPFLTR_IHVDRIVER_ID, DPFLTR_ERROR_LEVEL, "after flush ImageSectionObject: %p, DataSectionObject: %p\n", FileObject->SectionObjectPointer->ImageSectionObject, FileObject->SectionObjectPointer->DataSectionObject);
          }
          if (fcbPagingIoResource)
            ExReleaseResourceLite(fcbPagingIoResource);
        }
        ExReleaseResourceLite(fcbResource);
      }
      KeLeaveCriticalRegion();
    }
  }
  return result;
}
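
A side note on the retry delay in the loop above: KeDelayExecutionThread interprets a negative interval as a relative time in 100 ns units, which is why 50 ms is written as -50 * 10000. A minimal sketch of that conversion, assuming a helper named relative_interval_from_ms (hypothetical, not a WDK routine), written as plain C so it can be checked outside the kernel:

```c
#include <stdint.h>

/* Hypothetical helper, not part of the WDK: converts milliseconds to the
   relative-interval encoding used in the retry loop, i.e. a negative count
   of 100 ns units. */
int64_t relative_interval_from_ms(int64_t milliseconds)
{
    const int64_t hundred_ns_units_per_ms = 10000; /* 1 ms = 10,000 * 100 ns */
    return -(milliseconds * hundred_ns_units_per_ms);
}
```

In the driver the result would be assigned to interval.QuadPart, so relative_interval_from_ms(50) stands in for the literal -50 * 10000LL.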

Now whenever my minifilter is started or stopped, I need to flush the cache for the encrypted file, because (obviously) I don't want the OS to return the file half encrypted and half decrypted, due to outdated data in the cache.

As a first try, I opened the file both in "DriverEntry" and "Unload", and then flushed it. The flushing succeeds and is effective (the "after flush" debug prints confirm that). And it works perfectly in "Unload".

HOWEVER, and this is my key problem: Sometimes, even though flushing in DriverEntry succeeded (and I do that after filtering has already started), the EXE still gets partially encrypted data, and thus doesn't run properly. I don't really understand why this happens. It seems that the file cache is sometimes refilled, after the successful flush, from the original (still encrypted) file behind my minifilter's back. And yes, I also handle paging I/O requests, so it's not that.

So as a workaround, I added an extra flush in the "PostCreate" callback. That fully solves the above problem. I never get encrypted data now.

But now I have a different problem: The flushing in PostCreate has the effect that read accesses sometimes contain a few zeroed-out bytes. I believe this happens when PostCreate flushes the file at the same time as another thread is accessing the file through the cache.

So now I'm lost. Any ideas how to solve this mess properly?

To be more specific, the main situation where the "PostCreate" flushes create problems is if I do a "memcpy" from the EXE's resource section within the EXE file's code.

I think (but am not 100% sure) this is due to some kind of collision between the PostCreate flush and the prefetch which memcpy likely uses internally. I can work around this issue by copying DWORDs in a loop manually, instead of using memcpy, but of course that's just a hack around the underlying problem and not a real solution.

The only right advice here is to use a file system isolation filter, but I have a feeling this would be out of budget for this project.

I would suggest creating a new file on the same or another file system and redirecting open requests to it. Report all metadata for this file as required (size, etc.). Then substitute data for paging IO requests. As this is an executable file, I do not expect non-cached IO for it, which is what creates most problems for the type of architecture you are trying to implement. Anyway, as executable and data sections are backed by different pages, you can ignore, or handle only loosely, requests that are not related to running an executable image.

If you flush image section pages for a running process and replace them with pages containing modified data, you will most likely crash the user process, so redirecting open requests to another file preserves system stability. A process that was started before your driver loaded will continue to use the unmodified image. New processes will use the new image.

Thanks for your helpful comments, as always!

Yeah, I've already started looking into isolation filters today, but I've not really found any good resources about it. I don't assume there's a "simple" demo isolation filter project available anywhere for me to look at?

I've also already contacted OSR earlier today about their "IMSF" (Isolation Minifilter Solution Framework).

Do you have any thoughts on why it could happen that even after a successful and complete flush in DriverEntry, outdated (= still encrypted) data can creep back in, with my current minifilter? I don't understand how that can happen. I've logged all PreCreate calls and all Pre/PostRead failures, to double check I'm not missing or skipping anything related to the file I'm targeting, and nothing.

The only explanation I can think of is that my minifilter is somehow skipped for some read or file open requests.

Ok, I've made good progress. ChatGPT DeepResearch led me to this project:

Interestingly, this project creates a shadow SOP (SectionObjectPointer), but does not create a shadow FileObject. I've now tried to use the same approach in my driver, and on a quick check, it seems to work very well, solving all my issues. More tests needed, though.

A couple questions:

  1. Is it a valid approach to allocate my own SOP structure in the PostCreate callback and assign it to TargetFileObject, but to not create a shadow version of the FileObject?

  2. Which benefit (if any) am I missing by not doing a shadow FileObject?

  3. It seems since my SOP is all NULL, caching is at least partially disabled? Though, I can see that "ImageSectionObject" becomes filled at some point, but "DataSectionObject" always seems to stay NULL. Is this a huge problem for an EXE file? If so, should I try to initialize my SOP to allow full caching? Any hints on how to do that?

  4. How to cleanup my shadow SOP properly? It's easy enough to do in my context cleanup callback. But what happens if my driver is unloaded while some FileObjects still remain open, surviving the unload of my driver and still linking to my private custom SOP?

FWIW, I've tried to solve 4) now by using a tricky approach: I'm storing the FileObject pointer in my stream context, but I'm NULLing the pointer in IRP_MJ_CLEANUP. Now if my context cleanup is called, with the FileObject pointer being non-NULL, I'm restoring the FileObject's original SOP. I'm hoping this is a valid approach? Or is there a better solution?

This is not a valid approach. The SectionObjectPointer is managed by the file system driver. Changing this pointer is undefined behaviour.

Ok, I see, thanks.

Would it be possible to sum up the key technical concept of an isolation minifilter in a few short sentences?

For example: What to do in PreCreate, PostCreate, and how to clean things up? I know how to handle Pre/PostRead, Pre/PostQueryInfo, Pre/PostFolderEnum etc, so no info needed there.

There is an OSR article on this: "An Introduction to Standard and Isolation Minifilters".

Generally, what an isolation file system filter does is as follows:

  • In PreCreate it initialises the FILE_OBJECT and completes the request w/o calling the underlying filters and file system. See the IRP_MJ_CREATE handlers in any file system driver example to get an idea of how a FILE_OBJECT is initialised in IRP_MJ_CREATE.
  • It manages requests to this FILE_OBJECT by registering for all callbacks.
  • If required, a kernel API is used to open handles and FILE_OBJECTs to access the underlying file system.
  • A FILE_OBJECT initialised by the isolation filter must never be passed to lower filters or the file system; this will crash them.
  • The IRP_MJ_CLOSE preoperation callback is the last call in the FILE_OBJECT life cycle, and it is where all resources must be released.
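
The life cycle in the list above can be modelled in plain user-mode C. This is only a sketch of the routing decision (claim on create, handle owned objects locally, release on close); every type and function name here is hypothetical, none of it is a WDK API, and a real filter would do this inside its registered callbacks with proper locking:

```c
#include <stdbool.h>
#include <stddef.h>

/* User-mode model only: all names are hypothetical stand-ins for the
   callbacks described in the list above. */
#define MAX_OWNED 8

typedef enum { ROUTE_PASS_DOWN, ROUTE_HANDLE_LOCALLY, ROUTE_REJECT } route_t;

typedef struct {
    const void *owned[MAX_OWNED]; /* file objects initialised by the filter */
    size_t      count;
} isolation_state_t;

/* Returns the table index of file_object, or MAX_OWNED if it is not owned. */
static size_t find_owned(const isolation_state_t *s, const void *file_object)
{
    for (size_t i = 0; i < s->count; i++)
        if (s->owned[i] == file_object)
            return i;
    return MAX_OWNED;
}

/* Models PreCreate: claim the file object if the open targets the protected
   file, otherwise let the request continue down the stack. */
route_t on_create(isolation_state_t *s, const void *file_object, bool is_target)
{
    if (!is_target)
        return ROUTE_PASS_DOWN;
    if (s->count == MAX_OWNED)
        return ROUTE_REJECT;          /* no room to track it: fail the open */
    s->owned[s->count++] = file_object;
    return ROUTE_HANDLE_LOCALLY;      /* completed here, never sent down */
}

/* Models every other callback: owned file objects must never reach the
   lower filters or the file system. */
route_t on_request(const isolation_state_t *s, const void *file_object)
{
    return find_owned(s, file_object) != MAX_OWNED ? ROUTE_HANDLE_LOCALLY
                                                   : ROUTE_PASS_DOWN;
}

/* Models the close preoperation: the last chance to release what was
   claimed in on_create. */
route_t on_close(isolation_state_t *s, const void *file_object)
{
    size_t i = find_owned(s, file_object);
    if (i == MAX_OWNED)
        return ROUTE_PASS_DOWN;
    s->owned[i] = s->owned[--s->count]; /* drop the entry, keep table dense */
    return ROUTE_HANDLE_LOCALLY;
}
```

The point of the model is the invariant: once on_create has claimed a file object, every later request for it, including the close, is answered locally.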

Thanks a bunch, that's very helpful! Very good summary, short and precise!

One question, if I may: How does such an isolation filter deal with the situation that the driver gets unloaded, but some FILE_OBJECTs haven't been closed yet?

The only "solution" I can think of is for the isolation filter to keep count of open FILE_OBJECTs and simply refuse to unload if any are still open? But what do I do if the Unload() function is called with the "FLTFL_FILTER_UNLOAD_MANDATORY" flag, but there are still open FILE_OBJECTs?
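
The counting idea can be sketched as a user-mode model. As far as I know, a real FilterUnloadCallback refuses a non-mandatory unload by returning STATUS_FLT_DO_NOT_DETACH, and a mandatory unload cannot be refused; the model below only captures that decision. All names are hypothetical, and what to do with the still-open FILE_OBJECTs in the mandatory case is deliberately left open:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* User-mode model only: hypothetical names, no WDK types. */
typedef enum { UNLOAD_PROCEED, UNLOAD_REFUSE } unload_decision_t;

/* Number of FILE_OBJECTs the filter has initialised and not yet seen closed.
   Static storage, so it starts at zero. */
static atomic_long g_open_file_objects;

/* Would be called where the filter completes a create itself. */
void note_file_object_created(void)
{
    atomic_fetch_add(&g_open_file_objects, 1);
}

/* Would be called from the close preoperation callback. */
void note_file_object_closed(void)
{
    atomic_fetch_sub(&g_open_file_objects, 1);
}

/* Models the unload callback's decision. */
unload_decision_t decide_unload(bool mandatory)
{
    if (mandatory)
        return UNLOAD_PROCEED; /* cannot be refused, open objects or not */
    return atomic_load(&g_open_file_objects) > 0 ? UNLOAD_REFUSE
                                                 : UNLOAD_PROCEED;
}
```

The counter only answers "may I unload right now?"; it does nothing to make the outstanding FILE_OBJECTs go away.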

I think FLTFL_REGISTRATION_DO_NOT_SUPPORT_SERVICE_STOP can help with this.

Thanks once more. :grinning:

I've modified my driver to be full isolation now. It seems to work fine, but I have run into an issue. Or maybe it's normal, I don't know.

Basically my problem is that if I reject "IRP_MJ_ACQUIRE_FOR_SECTION_SYNCHRONIZATION" requests, then copying the EXE file works fine, but I can't execute it. Which makes sense, I guess. The upside to this is, though, that I can always unload my driver and it always works fine.

Now if I confirm "IRP_MJ_ACQUIRE_FOR_SECTION_SYNCHRONIZATION", then executing the EXE file works fine. However, now my driver rarely (if ever) is willing to unload again, because the OS seemingly keeps some FILE_OBJECT open forever, or at least for a very long time. So even if I stop the EXE, IRP_MJ_CLOSE isn't coming. IRP_MJ_CLEANUP is, but not CLOSE.

Is this normal? Is there any way I can force the OS to release all resources, so I can actually unload the driver, if the EXE isn't running?

Actually they do implement, but in a different way because they are not minifilters, but actually file system drivers. That is a fake major function created by filter manager. Legacy file system drivers deal with this using FsRtlRegisterFileSystemFilterCallbacks or the obsolete FastIo dispatch table.

Regards,

--
Fernando Roberto da Silva
DriverEntry Kernel Development
https://www.driverentry.com.br

Yes, it is normal. The Memory Manager and Cache Manager will keep file objects open as long as they want. Before unloading your file system driver, you must close all handles for open files and directories, and also ask these components to purge any cache mappings they are keeping. I think there are a couple of routines for that. Look at how FASTFAT handles a request to delete a file: the file system must ask the OS to purge everything before deleting the file.

Thanks for your reply!

I'm now flushing my SOPs in "Unload", and that seems to do the trick. My driver is now always able to unload properly.

Very happy right now... :grinning: