Additional level of read cache in RAM with Volume class filter

Hi,

I’m glad to start my first discussion on this forum; I hope it will be interesting for everyone.

I’m working on a WDF filter driver that caches read data in NonPagedPool. For that, I keep an LRU data structure that stores data in chunks of size 0x8000. When the read callback receives a request whose offset or length is not a multiple of 0x8000, I create a new request aligned to 0x8000 and send it to the lower-level driver with a completion routine. Once that request completes, I complete the original request with the requested data and save the 0x8000-byte chunks into the LRU structure. If a write request overlaps any chunk in the LRU, the overlapping chunk is evicted.

The solution worked as expected until I enabled the page file. After that, I started getting a BSOD after some time of operation; the bugcheck is related to a failed CRC verification of page file data.

From my investigation, this happens because a WRITE request can arrive and modify data on disk while an aligned READ request is still in flight (the read is asynchronous, via WdfIoTargetFormatRequestForRead and WdfRequestSend). By the time the completion routine returns, the actual data on disk differs from the data the read returned, and I cannot invalidate it in the WRITE callback because, when the WRITE arrived, the read’s completion routine was still in progress or had not run yet.

So I wanted to use WdfIoTargetSendReadSynchronously, but from what I can see it requires PASSIVE_LEVEL, and I do not want to funnel all disk requests through a queue processed by a thread at PASSIVE_LEVEL.

Maybe other options exist?

The purpose of this caching is to reduce network load on systems without an HDD or SSD, which boot from an image on a central computer over the network.

Thanks.

So I wanted to use WdfIoTargetSendReadSynchronously but from what I see it is only for PASSIVE_LEVEL…

Yes, because you cannot block above PASSIVE_LEVEL. However, it’s very easy just to send the read asynchronously, and do the stuff you WOULD have done after your call into the completion routine.
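In KMDF terms, that pattern looks roughly like this (a sketch only, not compilable outside a driver project; the routine and callback names are placeholders):

```c
/* Send an aligned read asynchronously; the work that would have
 * followed a synchronous call goes into FilterEvtReadComplete. */
VOID FilterSendAlignedReadAsync(WDFIOTARGET IoTarget, WDFREQUEST Request,
                                WDFMEMORY OutputMemory,
                                PWDF_MEMORY_OFFSET Offset)
{
    NTSTATUS status = WdfIoTargetFormatRequestForRead(
        IoTarget, Request, OutputMemory, Offset, NULL);
    if (!NT_SUCCESS(status)) {
        WdfRequestComplete(Request, status);
        return;
    }

    WdfRequestSetCompletionRoutine(Request, FilterEvtReadComplete, NULL);

    if (!WdfRequestSend(Request, IoTarget, WDF_NO_SEND_OPTIONS)) {
        WdfRequestComplete(Request, WdfRequestGetStatus(Request));
    }
}
```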

HOWEVER, you need to know that Windows already has very extensive LRU file system caching that does exactly what you’re describing. Stuff written to disk is held in the cache in case someone comes along to read it later. If the system is not memory-constrained, I believe it will actually use up to half of your physical RAM as a file system cache. It seems unlikely to me that you’ll be able to improve things with your scheme.

@Tim_Roberts said:

HOWEVER, you need to know that Windows already has very extensive LRU file system caching that does exactly what you’re describing.

Can you suggest where I can check how to configure the built-in Windows LRU cache?

https://learn.microsoft.com/en-us/windows-server/administration/performance-tuning/subsystem/cache-memory-management/
But really there is nothing much to configure.


At the end of the day, I configured the I/O queue to run at PASSIVE_LEVEL and send requests to the underlying device with WdfIoTargetSendReadSynchronously. The solution works fine and gives the expected performance increase. The one thing a developer needs to take care of: if the I/O queue uses synchronous (sequential) dispatch and you send a synchronous request to the underlying device from inside the read handler, then if some driver tries to allocate paged-out memory in the request completion routine, the system can fall into a deadlock that is hard to find. The solution I used is a parallel queue with locks on the LRU.
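For reference, the passive-level parallel queue described above can be set up roughly like this (a KMDF configuration sketch, not the poster's actual code; the callback names are placeholders):

```c
/* Sketch: force the queue's callbacks to PASSIVE_LEVEL so the read
 * handler may call WdfIoTargetSendReadSynchronously, and use parallel
 * dispatch so requests do not serialize behind the synchronous sends.
 * The LRU itself is then protected by an explicit lock. */
WDF_IO_QUEUE_CONFIG queueConfig;
WDF_OBJECT_ATTRIBUTES queueAttributes;
WDFQUEUE queue;
NTSTATUS status;

WDF_IO_QUEUE_CONFIG_INIT_DEFAULT_QUEUE(&queueConfig,
                                       WdfIoQueueDispatchParallel);
queueConfig.EvtIoRead = FilterEvtIoRead;    /* placeholder name */
queueConfig.EvtIoWrite = FilterEvtIoWrite;  /* placeholder name */

WDF_OBJECT_ATTRIBUTES_INIT(&queueAttributes);
queueAttributes.ExecutionLevel = WdfExecutionLevelPassive;

status = WdfIoQueueCreate(device, &queueConfig, &queueAttributes, &queue);
```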