Windows System Software -- Consulting, Training, Development -- Unique Expertise, Guaranteed Results


Possible solution for generating a unique file id for files in a minifilter for caching?

brad_H Member Posts: 189
edited June 2023 in NTFSD

I have a minifilter driver that, in its pre-create callback, scans every new file it hasn't seen before and collects some information about it.

Currently I cache the results by hashing the file path: I scan a file, collect its info, and insert it into an AVL tree keyed by the hash of its path, with the collected info as the entry's content. I also rescan the file after a successful write, since the info might change, and remove it from the cache when the file is deleted.

Now my question: is there a better way to generate a unique ID for a file, other than hashing its path?

Comments

  • rod_widdowson Member - All Emails Posts: 1,266

    ReFS/NTFS only? If so, why not use the FileId or the ObjectId?

  • brad_H Member Posts: 189
    edited June 2023

    Yes, I'm fine with supporting only ReFS/NTFS. But I assume that if the file system is not ReFS or NTFS, FltQueryInformationFile would just fail, right? I just want to make sure it won't return a junk ID, or the same ID for every file.

    And what is the difference between FileId and ObjectId? Which information class should I pass to FltQueryInformationFile to get the ID?

  • rod_widdowson Member - All Emails Posts: 1,266

    It's probably best to read the docs:

    FileId

    ObjectId

    The doc says NTFS only, but I have a query in with the doc team about that.

  • brad_H Member Posts: 189
    edited June 2023

    One thing in the FileId doc raises a concern:

    "File reference numbers, also called file IDs, are guaranteed to be unique only within a static file system. They are not guaranteed to be unique over time, because file systems are free to reuse them. Nor are they guaranteed to remain constant."

    If by "static file system" they mean a file system that never gets written to (which describes no real-world system), doesn't this mean the FileId isn't unique at all?

    When will the FileId change? If it changes when the file is written to, or when the machine reboots, I'm fine with that. But if the FileId can change at arbitrary times on a live machine (without the file's content changing), it's useless for this purpose. Am I understanding this right?

    And given that the ObjectId doc has no such sentence, does that mean the ObjectId is more reliably unique and a better choice in my scenario (if it is in fact supported on ReFS)? This has nothing to do with the FILE_OBJECT, though, right? As I understand it, that is per open file handle.

  • rod_widdowson Member - All Emails Posts: 1,266

    By observation only: the FileId in NTFS consists of a number (possibly an index into the file table) and a sequence number. This is what ODS2 did 20+ years earlier, and there is good reason to assume that the NTFS design was influenced by its antecedent. Obviously, sequence numbers can and do wrap, but...

    I wouldn't like to make any definitive statement about the ObjectId, but there are others who watch this space who could.

    And you are correct: this has nothing to do with a FILE_OBJECT. Kernel objects (of which FILE_OBJECT is just one example) are ephemeral, reference-counted, in-memory-only structures used to manipulate things: files in the case of FILE_OBJECTs, but there are also objects for threads, processes, and security structures.
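Rod's description can be made concrete. A minimal sketch, assuming the commonly observed NTFS layout (the same split as MFT_SEGMENT_REFERENCE in the Windows headers): the low 48 bits index the MFT record and the high 16 bits are a sequence number that changes when the record is freed and reused. The helper names are hypothetical, and, as Rod says, this is an observation rather than a documented contract.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed NTFS layout (observed, not documented as a contract). */
typedef struct {
    uint64_t record;    /* low 48 bits: index into the MFT */
    uint16_t sequence;  /* high 16 bits: reuse counter for that record */
} ntfs_file_ref;

/* Hypothetical helper: split a 64-bit FileId into its two parts. */
static ntfs_file_ref ntfs_split_file_id(uint64_t file_id)
{
    ntfs_file_ref r;
    r.record = file_id & 0x0000FFFFFFFFFFFFULL;
    r.sequence = (uint16_t)(file_id >> 48);
    return r;
}

/* Hypothetical helper: rebuild a 64-bit FileId from its parts. */
static uint64_t ntfs_make_file_id(uint64_t record, uint16_t sequence)
{
    assert(record <= 0x0000FFFFFFFFFFFFULL);   /* must fit in 48 bits */
    return ((uint64_t)sequence << 48) | record;
}

/* Hypothetical: the ID the same MFT record would carry after one more
 * reuse. The 16-bit counter wraps, so after 65536 reuses an old ID comes
 * back; this is the "sequence numbers can and do wrap" case. */
static uint64_t ntfs_reused_file_id(uint64_t file_id)
{
    ntfs_file_ref r = ntfs_split_file_id(file_id);
    return ntfs_make_file_id(r.record, (uint16_t)(r.sequence + 1));
}
```

Because of the wrap, a stale cache entry could in principle match a different file that later lands in the same MFT record. Removing entries on delete, as the original post already does, avoids relying on the sequence number alone.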

  • brad_H Member Posts: 189
    edited June 2023

    Okay, so assume we have the following code:

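        // Must run after the open has completed (e.g. in post-create);
        // in pre-create the FILE_OBJECT cannot be queried yet.
        FILE_INTERNAL_INFORMATION fileInternalInfo;
        LONGLONG fileId = 0;

        NTSTATUS status = FltQueryInformationFile(
            FltObjects->Instance,
            FltObjects->FileObject,
            &fileInternalInfo,
            sizeof(fileInternalInfo),
            FileInternalInformation,
            NULL);

        if (NT_SUCCESS(status)) {
            fileId = fileInternalInfo.IndexNumber.QuadPart;
        }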

    I then use this FileId as the key in the AVL tree, and I remove the file from the cache when it is written to or gets deleted/renamed/moved.

    Will I face any issues? I assume a FileId is only reused after the file is deleted or modified, which means I'm fine. Right?
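A user-mode sketch of the invalidate-on-change policy described above, keyed on the 64-bit FileId. A fixed-capacity sorted array stands in for the RTL AVL table, and every name here (`file_cache`, `cache_put`, `cache_find`, `cache_invalidate`, the `verdict` field) is hypothetical. One assumption worth checking: on NTFS a rename within a volume keeps the same FileId, so invalidating on rename is only needed if the cached info depends on the name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_CAPACITY 64   /* hypothetical fixed size for the sketch */

typedef struct {
    uint64_t file_id;   /* FileInternalInformation.IndexNumber */
    int      verdict;   /* stand-in for the collected file info */
} cache_entry;

typedef struct {
    cache_entry entries[CACHE_CAPACITY];  /* kept sorted by file_id */
    size_t count;
} file_cache;

/* Index of the first entry whose file_id is >= file_id. */
static size_t lower_bound(const file_cache *c, uint64_t file_id)
{
    assert(c->count <= CACHE_CAPACITY);
    size_t lo = 0, hi = c->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c->entries[mid].file_id < file_id)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

static const cache_entry *cache_find(const file_cache *c, uint64_t file_id)
{
    size_t i = lower_bound(c, file_id);
    return (i < c->count && c->entries[i].file_id == file_id) ? &c->entries[i] : NULL;
}

/* Insert or overwrite the entry after a scan; returns 0 if the cache is
 * full, in which case the caller simply rescans next time. */
static int cache_put(file_cache *c, uint64_t file_id, int verdict)
{
    size_t i = lower_bound(c, file_id);
    if (i < c->count && c->entries[i].file_id == file_id) {
        c->entries[i].verdict = verdict;
        return 1;
    }
    if (c->count == CACHE_CAPACITY)
        return 0;
    memmove(&c->entries[i + 1], &c->entries[i], (c->count - i) * sizeof(cache_entry));
    c->entries[i].file_id = file_id;
    c->entries[i].verdict = verdict;
    c->count++;
    return 1;
}

/* Drop the entry on a successful write, delete, rename or move, so the
 * next open triggers a fresh scan. */
static void cache_invalidate(file_cache *c, uint64_t file_id)
{
    size_t i = lower_bound(c, file_id);
    if (i < c->count && c->entries[i].file_id == file_id) {
        memmove(&c->entries[i], &c->entries[i + 1], (c->count - i - 1) * sizeof(cache_entry));
        c->count--;
    }
}
```

In the driver itself, the same compare-by-FileId logic would go into the AVL table's compare routine, with the cache protected by a lock since callbacks run concurrently.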

  • Dejan_Maksimovic Member - All Emails Posts: 636
    via Email
    FileId is static as long as the file is not deleted, yes. That call won't work for ReFS, because ReFS uses 16-byte IDs in the FOLDER_ID SUBFILE_ID format. So if you query only the internal ID, it will match so often it would look like the same file has more copies than there are files on the C drive :) You need to query the 128-bit object ID on ReFS instead.

    You can use the USN to track whether a file has changed (note that it won't change in the case of memory mapping until the data is flushed).

    Regards, Dejan

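Dejan's point is that a 64-bit key cannot hold a ReFS ID. One way to keep a single tree for both file systems is to always key on a 128-bit ID plus the volume, widening NTFS's 64-bit ID with zero upper bytes. A minimal sketch, assuming the 128-bit value comes from the FileIdInformation class (FILE_ID_INFORMATION: a 64-bit VolumeSerialNumber plus a 16-byte FILE_ID_128); `file_key` is a hypothetical portable mirror of that layout, and the little-endian widening is an assumption to verify against real query results.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical portable mirror of FILE_ID_INFORMATION. */
typedef struct {
    uint64_t volume_serial;
    uint8_t  id[16];        /* FILE_ID_128 identifier bytes */
} file_key;

/* Widen a 64-bit NTFS ID: value in the low 8 bytes (little-endian),
 * upper 8 bytes zero. */
static file_key key_from_ntfs_id(uint64_t volume_serial, uint64_t file_id64)
{
    file_key k;
    memset(&k, 0, sizeof k);
    k.volume_serial = volume_serial;
    for (int i = 0; i < 8; ++i)
        k.id[i] = (uint8_t)(file_id64 >> (8 * i));
    return k;
}

/* Build a key from a full 128-bit ID (e.g. as returned for ReFS). */
static file_key key_from_id128(uint64_t volume_serial, const uint8_t id[16])
{
    file_key k;
    k.volume_serial = volume_serial;
    memcpy(k.id, id, sizeof k.id);
    return k;
}

/* Three-way compare for an ordered tree: volume first, so equal IDs on
 * different volumes never collide, then the raw ID bytes. The resulting
 * order is arbitrary but consistent, which is all a tree needs. */
static int file_key_compare(const file_key *a, const file_key *b)
{
    assert(a != NULL && b != NULL);
    if (a->volume_serial != b->volume_serial)
        return a->volume_serial < b->volume_serial ? -1 : 1;
    int c = memcmp(a->id, b->id, sizeof a->id);
    return (c > 0) - (c < 0);
}
```

Including the volume matters regardless of file system: file IDs are only unique within one volume, so two files on C: and D: can share an ID.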
  • brad_H Member Posts: 189

    So should I use the object ID (I assume you meant FILE_OBJECTID_INFORMATION) instead of the FileId in order to support both NTFS and ReFS? Would using the object ID as the AVL tree key have any drawback compared to the FileId?

    Also, why would I need the USN (update sequence number, right?) to track file changes when I have a minifilter? Can't I just detect whether a cached file has changed by monitoring the write and set-information callbacks?

  • Dejan_Maksimovic Member - All Emails Posts: 636
    via Email
    Yes, that object ID. But I don't recall whether it works on NTFS, so please check, use FileInternalInformation on NTFS if needed, and let us know.

    I just mentioned the USN as an option. Also, minifilters below you can change the file without you knowing it.

  • brad_H Member Posts: 189

    Based on my research, the best way to achieve what I'm asking is to do it the same way as Microsoft's avscan sample:
    https://github.com/microsoft/Windows-driver-samples/tree/main/filesys/miniFilter/avscan

    It uses the file ID, and it works on both ReFS and NTFS.
