Hello,
I am calling FltReadFile from my file system filter driver as follows.
ns = FltReadFile(fltObjects->Instance,
                 fltObjects->FileObject,
                 &byteOffset,
                 byteCount,
                 buff,
                 FLTFL_IO_OPERATION_DO_NOT_UPDATE_BYTE_OFFSET,
                 &bytesRead,
                 NULL,
                 NULL);
I notice that, when my driver is loaded, all the processes in the system report high memory usage. For some processes, the memory usage is 5 to 6 times higher than without my driver loaded. On some systems, the page file size is automatically increased.
I have confirmed, and am fairly sure, that the driver doesn’t have any memory leaks. The issue goes away if I do either of the following:
- Don’t call FltReadFile at all.
- Call FltReadFile with the additional FLTFL_IO_OPERATION_NON_CACHED flag (a sketch of that variant follows below).
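For reference, the non-cached variant I tried looks roughly like this (just a sketch; byteOffset, byteCount and buff are the same locals as in the call above):

ns = FltReadFile(fltObjects->Instance,
                 fltObjects->FileObject,
                 &byteOffset,     // non-cached reads typically need a sector-aligned offset
                 byteCount,       // and a sector-multiple length
                 buff,
                 FLTFL_IO_OPERATION_NON_CACHED |
                     FLTFL_IO_OPERATION_DO_NOT_UPDATE_BYTE_OFFSET,
                 &bytesRead,
                 NULL,            // no completion callback, so the call is synchronous
                 NULL);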
Any idea why this happens, and whether it is expected behavior? If it is expected, can somebody explain how I could demonstrate that using any tools, Performance Monitor, etc.?
Thanks.
-Prasad
Hi Prasad,
I do not think this is a generic bug in the API (I have used it many times in multiple drivers). The interesting thing is that FLTFL_IO_OPERATION_NON_CACHED solves it. This is a FltMgr-specific flag that tells the file system to do a non-cached read, so it is the cached read path, with your driver loaded, that makes system memory go high (with FltReadFile).
I would start investigating by checking:
- Is this the only driver loaded in the system? In other words, do you see the same bug if you simply load your driver on a clean VM?
- Is your filter trying to interact with the cache manager somehow, e.g. by touching field(s) inside the file object that, as a filter, it is not supposed to touch?
Hello,
Actually, using FLTFL_IO_OPERATION_PAGING | FLTFL_IO_OPERATION_SYNCHRONOUS_PAGING also resolves the problem. I looked at some other filter drivers found via Google and most of them seem to use FLTFL_IO_OPERATION_NON_CACHED. Is that what a filter driver is supposed to do?
Based on further investigation, it seems like prefetching is playing some role here. If I do the following steps, the issue goes away even if I use FltReadFile the way I am using it.
1. Install the filter driver on a clean system.
2. Delete the c:\windows\prefetch\NTOS* file.
3. Set the filter driver to manual start so that it won’t start at boot time.
4. Reboot the machine.
5. Wait till c:\windows\prefetch\NTOS* gets re-created.
6. Reset the filter driver to start at boot time.
7. Reboot the machine.
I also wrote a small hello-world type program that only links to KERNEL32.DLL and looked at the working set of the process. With my filter driver loaded, KERNEL32.DLL contributes about 258 pages to the working set; without my filter driver loaded, KERNEL32.DLL contributes only 41 pages. The difference of 217 pages (258 - 41) is 868K, and this matches the memory usage difference shown by Task Manager.
CSRSS.EXE is probably the first process that uses KERNEL32.DLL, and if I look at the working set of CSRSS.EXE I get the same numbers. Hence, later on, when I start my test application, those KERNEL32 pages are charged to the working set of my test application and of every other application using KERNEL32, giving each a boost of 868K.
So it seems that once a boot is captured in the NTOS* prefetch file without my driver loaded, subsequent boots are served from the prefetch file and therefore FltReadFile is not triggered for KERNEL32 (I confirmed this in the debugger), which is why the workaround above (steps 1-7) works.
Thanks.
-Prasad
>FLTFL_IO_OPERATION_NON_CACHED
I would definitely use this flag if lots of reads are needed.
Windows’ Cc (the cache manager) is not so smart, and there are real scenarios where you can pollute the cache and make the operation extremely slow.
–
Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com
Thanks Maxim.
There is one more observation, made using the VMMap tool. When the memory shoots up, VMMap shows most of the extra memory attributed to the “Page table” memory type. Here are the results for the “Page table” memory type for some of the processes, with and without the memory shoot-up:
Process Name    Normal Usage    Abnormal Usage
explorer.exe    260K            32712K
winlogon.exe    236K            29524K
services.exe    136K            7040K
smss.exe        48K             896K
hello.exe       56K             1940K
I am wondering why a filter driver calling FltReadFile should shoot up memory usage (of the “Page table” memory type) for all the processes in the system. If my filter doesn’t call FltReadFile, or uses non-cached/paging I/O, the issue goes away.
Do you have any thoughts on this?
Thanks.
-Prasad
Are you opening all exe and dll files using your FltCreateFile?
Then, for each file, the cache map and segment/prototype PTE table are created, and the cache may also do read-ahead and thus consume some data pages.
I would just plainly use noncached mode and forget the issue. The cache is good only if you have lots of small reads, many of them hitting the same 4K page. If you read the file in chunks larger than PAGE_SIZE, and do this sequentially only, so that no page is read several times, then I would suggest going noncached only.
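Something like this (purely illustrative; Instance, FileObject, chunkBuffer, ProcessChunk and SCAN_CHUNK_SIZE are placeholder names, not anything from this thread):

#define SCAN_CHUNK_SIZE (64 * 1024)          // > PAGE_SIZE; assumed to be a multiple of the sector size

LARGE_INTEGER offset = {0};
ULONG bytesRead = 0;
NTSTATUS status;

for (;;) {
    status = FltReadFile(Instance,
                         FileObject,
                         &offset,
                         SCAN_CHUNK_SIZE,
                         chunkBuffer,
                         FLTFL_IO_OPERATION_NON_CACHED |
                             FLTFL_IO_OPERATION_DO_NOT_UPDATE_BYTE_OFFSET,
                         &bytesRead,
                         NULL,               // synchronous: no completion callback
                         NULL);
    if (!NT_SUCCESS(status) || bytesRead == 0) {
        break;                               // STATUS_END_OF_FILE (or an error) ends the loop
    }
    ProcessChunk(chunkBuffer, bytesRead);    // hypothetical consumer of the data
    offset.QuadPart += bytesRead;
}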
–
Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com
Thanks Maxim for your response. Yes, I am reading 128K bytes or the file size, whichever is lower, for every file. Reading fewer bytes, e.g. 4K, doesn’t make much difference to the memory usage.
Last few questions, in case you can answer them:
- Why should this result in private memory (of the page table type) being consumed in all processes? Are you saying that the data read is automatically mapped into the address space of all processes, with unique page tables pointing to the same data blocks? The reason I ask is that even if I unload my filter driver, subsequently started processes also consume the extra memory.
- Why does the workaround of recreating the prefetch file without the driver’s traces solve the problem?
- The workaround works when the driver is configured as system start; however, if I switch the driver back to boot start, the issue resurfaces. Why is that?
Thanks.
-Prasad
Sorry, one more question: if I use non-cached reads, can there be any cache consistency issues? For example, if somebody is doing a paged write, might I not get up-to-date data from a non-cached FltReadFile? Or is this taken care of by flushing the cache first?
Thanks.
-Prasad
The file system has a cache coherency protocol. Nevertheless, what if data is written to the file immediately after FltReadFile returns a buffer to you? What do you do then?
Thanks,
Alex.
Hi Alex,
My filter driver would know about the subsequent write, and in that case I will read up-to-date data during the last cleanup.
Basically, I am triggering FltReadFile from PostOpCreate when I haven’t seen the file before, and in the last PreOpCleanup when I notice that the file has been modified since I last looked at it.
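Roughly, the post-create side looks like this (a simplified sketch; StreamSeenBefore and ScanFileHead are placeholder names standing in for my stream-context lookup and the FltReadFile call shown earlier):

FLT_POSTOP_CALLBACK_STATUS
PostOpCreate(
    PFLT_CALLBACK_DATA Data,
    PCFLT_RELATED_OBJECTS FltObjects,
    PVOID CompletionContext,
    FLT_POST_OPERATION_FLAGS Flags
    )
{
    UNREFERENCED_PARAMETER(CompletionContext);

    // Skip failed creates and instance-teardown draining.
    if (!NT_SUCCESS(Data->IoStatus.Status) ||
        FlagOn(Flags, FLTFL_POST_OPERATION_DRAINING)) {
        return FLT_POSTOP_FINISHED_PROCESSING;
    }

    // Only read the file the first time this stream is seen.
    if (!StreamSeenBefore(FltObjects)) {
        ScanFileHead(FltObjects);    // wraps the FltReadFile call shown earlier
    }

    return FLT_POSTOP_FINISHED_PROCESSING;
}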
Thanks.
-Prasad
I guess I’m not clear on why it would only know about the subsequent paging write and not the original paging write from your previous question?
I’ve not read the whole thread so I might be wrong, but I’m thinking that by reading the data for executables using cached reads you will initialize the SharedCacheMap member of the SECTION_OBJECT_POINTERS associated with the stream, and that seems to persist for a long time after you’re done with the file. Moreover, that memory will be counted against every process into which that binary is mapped, which would exhibit symptoms similar to what you’re describing.
Thanks,
Alex.
Hi Alex,
Yes, it would know about the original paging write as well as any subsequent paging write. The dirty bit is maintained in the stream context, which is set in PostOpWrite.
I may be wrong here, but I am worried about the case where the dirty bit is set and I therefore attempt to read the file; since I am doing a non-cached read, I won’t get up-to-date data, yet I will clear the dirty bit, which means I may miss the modified data. Hope that makes things clear?
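For illustration, the per-stream state I keep boils down to something like this (field names are placeholders, not my actual structure):

typedef struct _MY_STREAM_CONTEXT {
    BOOLEAN Dirty;          // set in the post-write callback, cleared after a rescan
    BOOLEAN ScannedOnce;    // set after the first read triggered from post-create
} MY_STREAM_CONTEXT, *PMY_STREAM_CONTEXT;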
The hello.exe mentioned in my earlier VMMap results is a simple Win32 console application that just prints “Hello world” on screen and only links to KERNEL32.DLL. The “Page table” memory type usage for hello.exe is 1940K (abnormal) vs. 56K (normal). That’s a huge difference. I only read the initial 128K bytes of the file. Even if there is read-ahead, the memory usage we are talking about is not for the file data but for the page tables that map the file data that was read.
Thanks.
-Prasad
IIRC NTFS will do a coherency flush in the case you’ve described (page is
dirty and you’re trying to read non-cached).
I’m sorry but I don’t know about your memory usage.
Thanks,
Alex.
In addition, memory mapped files and non-cached I/O are not coherent on all versions of Windows. This is a known, documented restriction.
Tony
OSR
Hope to see everyone September 19 for the next Developing File Systems for Windows seminar (http://www.osr.com/fsd.html)
Coherency flushing is only done starting with Windows 7 and it is possible (under narrow circumstances) that it will fail, in which case you might lose coherency.
Tony
OSR
Hope to see everyone September 19 for the next Developing File Systems for Windows seminar (http://www.osr.com/fsd.html)
Thanks Tony for your inputs.
Do you have any thoughts on the memory usage? I am completely stumped here. Consuming 1940K vs. 56K worth of page-table memory is insane for a hello-world application linking only to KERNEL32.DLL, unless VMMap is misinterpreting the memory type.
I ran the “!wsle 7” command on hello.exe while it was hogging the insane amount of memory, and the working set had lots of NTDLL.DLL pages. I put a breakpoint on KiThreadStartup, and when it was first hit in the context of hello.exe, the working set also had a lot of NTDLL.DLL pages at that point. But the !wsle output didn’t show any virtual addresses belonging to page tables (>C0000000), so I am not sure VMMap is showing correct information.
Here is the trimmed !wsle 7 output.
kd> !wsle 7
Working Set @ c0881000
FirstFree 298 FirstDynamic a
LastEntry 2c1 NextSlot 7 LastInitialized 4c1
NonDirect 7 HashTable c0a82000 HashTableSize 200
Reading the WSLE data …
Virtual Address Age Locked ReferenceCount
c0600203 0 1 1
c0601203 0 1 1
c0602203 0 1 1
c0603203 0 1 1
c0604203 0 1 1
c0880203 0 1 1
c0881203 0 1 1
c0882203 0 1 1
c0605203 0 1 1
c0a82203 0 1 1
30021 0 0 1
14221 0 0 1
13221 0 0 1
12221 0 0 1
11221 0 0 1
1d221 0 0 1
…
…
…
33221 0 0 1
32021 0 0 1
21021 0 0 1
10021 0 0 1
c03ff201 0 0 1
7ffe0209 0 0 1
c03e4201 0 0 1
7c800201 0 0 1
7c801201 0 0 1
7c802201 0 0 1
7c803201 0 0 1
…
…
…
Thanks.
-Prasad
I mean, there are a few page table pages in the working set; however, nothing to the extent of 1940K, and those are present even when the issue doesn’t reproduce.
Thanks.
-Prasad
Hello,
This issue is still a mystery to me. Just to give a sense of the scale of the memory shoot-up: I was working on a 32-bit Windows XP SP3 VM with 512MB of RAM and no virtual memory (page file) configured. When the filter driver is not loaded, the usage is around 100MB. With the driver loaded and doing cached reads from PostOpCreate, the usage goes to around 400MB. Starting a cmd.exe instance then takes another 20MB, against the 2MB it takes otherwise, and I wasn’t able to start more than 5 instances of cmd.exe (20 x 5 = 100MB). So it does have a real impact and is not just an accounting issue. There are definitely no explicit memory leaks in the filter driver.
As I mentioned earlier, doing non-cached reads resolved the issue, but I believe that would have had a performance impact. I finally used a mixed approach and it seems to work fine.
One of the observations was that, for a given file, when PostOpCreate is called for the first time, a SECTION_OBJECT_POINTERS structure has been initialized and the FILE_OBJECT has a pointer to it, but all of its members, viz. DataSectionObject, SharedCacheMap and ImageSectionObject, are NULL. As soon as I do a cached read, all the members are filled in.
I first tried a condition that does a non-cached read only if both the DataSectionObject and ImageSectionObject pointers are NULL, and cached reads otherwise. This did not help.
Then I changed the condition to do a non-cached read only if SharedCacheMap is NULL, and a cached read otherwise. This seems to solve the problem.
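The check looks roughly like this (a sketch, reusing the illustrative locals from my first post; sop and ioFlags are just names I made up here):

PSECTION_OBJECT_POINTERS sop = fltObjects->FileObject->SectionObjectPointer;
FLT_IO_OPERATION_FLAGS ioFlags = FLTFL_IO_OPERATION_DO_NOT_UPDATE_BYTE_OFFSET;

if (sop == NULL || sop->SharedCacheMap == NULL) {
    // No shared cache map yet: read non-cached so this read does not make
    // the cache manager start mapping the file.
    ioFlags |= FLTFL_IO_OPERATION_NON_CACHED;
}

ns = FltReadFile(fltObjects->Instance,
                 fltObjects->FileObject,
                 &byteOffset,
                 byteCount,
                 buff,
                 ioFlags,
                 &bytesRead,
                 NULL,
                 NULL);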
Although this seems to work, I am not convinced about why it solves the issue. Does anybody have any thoughts?
Thanks.
-Prasad
I’d be quite surprised if you saw the ImageSectionObject used after you performed I/O to or from a file, since that’s only used for a file being mapped as an executable image (hence the name).
The DataSectionObject is used to support memory mapping of the file. This would include applications that do memory mapping (e.g., Notepad) as well as the cache manager.
The SharedCacheMap is used (along with the PrivateCacheMap in the file object) by the cache manager to manage its mapped views of the given file.
So, by not doing cached I/O on a file without a shared cache map, you discouraged the cache manager from mapping the file. This, in turn, would limit the memory usage of the system; when you do cached I/O the memory will be used to back the cache. By doing non-cached I/O you only use the memory for your own buffer.
As for “solving a problem” it would seem that your problem statement is itself fundamentally flawed - the OS uses memory for caching. You seem unhappy when it does so because you are doing cached I/O to the file. While not doing cached I/O “fixes” this problem, in many scenarios it will do so at the cost of added I/O, which will have a different performance impact on the system.
Tony
OSR
That would mean the cache has as much importance as a regular memory allocation, which, from my experience, is not the case (it can grow when more memory is available, but if memory is required elsewhere, the cache manager will release its own memory to satisfy those allocations).
So he might see increased total memory usage, but not increased usage of same-importance memory, by using the cache.
Also, the XP Task Manager does not show cache usage in the memory-used graph, so if that graph shows 500MB used, that figure excludes the cache.
Enabling pool tagging and seeing which tag uses the most memory should help.
–
Kind regards, Dejan (MSN support: xxxxx@alfasp.com)
http://www.alfasp.com
File system audit, security and encryption kits.