Best approach to make a big cache

Pavel_S Member Posts: 85

Hi folks.

I have a design question. I need to create a driver that will, over time, process many files up to a certain size. Each file should be processed just once (until some expiry date).
E.g. if process A.exe opens a file of interest, we process it in the driver. If A.exe or X.exe later tries to do the same, we should skip it. After 12 days the cache entry can be invalidated. For now assume the cache is volatile (it does not survive a reboot), but I need to support server machines that are rarely rebooted.

Question: Since I need to process each file just once, I think I need some sort of cache in the kernel. I am afraid that if there are tens of millions of files on disk (10 million, etc.), such a cache will take a lot of memory, and I wonder whether this approach is correct. Even if each cache entry is small, with that many files over 12 days it may still take a lot of space.
Can you give me some hints on how to address this more smartly (shared memory?).

Thank you


  • Dejan_Maksimovic Member - All Emails Posts: 636
    via Email
    Memory mapping a cache file?
    You have to test what will work best for you :(
    Tuning how much will be in memory vs. in a file is every product/env's main
  • Pavel_S Member Posts: 85
    Thank you for the feedback.

    Hmm, I thought about it, but I am not sure how it would work and what the performance penalty would be.

    For instance, suppose the cache needs the file's FID as the key and two DWORDs as the value. Normally I would do it with an RTL generic table, which implements it with an AVL or splay tree (or it could be done in many other ways). The point is that all the data lives in an in-memory structure that is easy to alter (add/delete) and query.
    If it lives in a file, that is hard to achieve, because every time the cache changes I need to rewrite (?) a block of memory, right? Even if I write a binary representation of some tree to this file, once one thread alters the file, a second one needs to reread the whole file to rebuild the tree from scratch... or maybe you had something different in mind?

    Maybe you have a sample?

    Thank you
  • Dejan_Maksimovic Member - All Emails Posts: 636
    via Email
    An AVL tree may not be a good idea here.
    I would implement a sorted array that is updated on each change. Then you
    just do a simple binary search.
    An AVL tree may be faster for some uses, and definitely for general
    searching, but recreating it is slower than a file write (allocations are
    slow: for 1M IDs even lookaside allocation will show slowness, and a
    lookaside list definitely won't have 1M slots ready on each recreate).

    Note that if you have a million IDs and an ID is just a file ID (128-bit,
    because otherwise it is not unique on ReFS at all), having it all in memory
    is not a big deal in terms of memory usage (16 MB per million, plus 8 MB for
    a USN/file timestamp; if it's tens of millions, then I see the problem).
    Your much bigger issue is avoiding memory fragmentation, so a dedicated
    suballocator for the AVL tree is crucial here.

    What does the scan do, anyway?
    I am skeptical that so many of the files need to be scanned.

    Did you consider kernel EAs? They are meant for exactly this purpose,
    e.g. an AV scanned-files cache.
  • Pavel_S Member Posts: 85
    Thanks for this great answer.

    Actually, I cannot estimate the number of files to scan yet, but I need to be pessimistic ;) and consider such a scenario too.

    Your point about the sorted array is valid, and I think it is correct too.

    However, I am unsure of one thing. If I map a file in the kernel, won't the view be accessible only in the System process? Sorry if this is a noob question (I will check it soon), but I feel it will not be accessible in an arbitrary thread context, will it? If so, would that mean I need to map it into every process in whose context my code runs?

    My project calculates a special hash of files, along with a special token in it. Unfortunately, all of this has to happen when files are altered, so I don't always know who will alter a file. And I need to do all the hashing once per file content, and this info should survive a reboot.

    EAs look good, but here I have another question: if my driver is uninstalled, how do I remove such EAs from the files?

    Thanks again!!
  • Dejan_Maksimovic Member - All Emails Posts: 636
    via Email
    Hmm, good point about processes :(
    I frankly don't know what happens if the view is mapped in the System
    process; I never tried... please let me know :)

    EAs need to be deleted manually. They survive reboots and are very secure
    (special kernel EAs, not general EAs).
  • Pavel_S Member Posts: 85

    Hello, I made a test.
    From within a driver I did this:

    • in DriverEntry I created a section
    • then I started a thread routine, mapped this section into the SYSTEM process, and wrote a string to that memory
    • finally, in the debugger I saw the following: in the context of the System process, da ADDRESS showed the string just fine; however, switching to e.g. explorer.exe clearly showed garbage at that address. Switching back to System made the string visible again.

    To sum it up: it is visible to the SYSTEM process only.

  • Dejan_Maksimovic Member - All Emails Posts: 636
    via Email
    Pity :(
    Interlocked system queue with memory mapping then?
