Word's paging I/O and cached I/O

OSR_Community_User · November 21, 2007, 2:30am

Hello all,
I am trying to write a “COPY” file filter for Microsoft Word, and I have run
into some problems I hope you can advise me on. Thanks in
advance!

My filter processes every IRP, and if the IRP’s file object refers to a
file in a specific directory (e.g. C:\sdir), I perform the same operation
(rename, delete, write) on the file with the same name in another
directory (e.g. C:\tdir). For write IRPs, I only process cached I/O.
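A minimal sketch of the name check described above, assuming plain ASCII paths and the two example directories from the post. The helper name `map_shadow_path` is hypothetical, and a real filter would work on the `UNICODE_STRING` name taken from the file object (for example with `RtlPrefixUnicodeString`) rather than on C strings.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Example directories from the post; assumptions for this sketch only. */
static const char SOURCE_DIR[] = "C:\\sdir\\";
static const char SHADOW_DIR[] = "C:\\tdir\\";

/* Case-insensitive prefix test (NTFS names are case-insensitive by default).
   Stops at the first mismatch, so it never reads past the end of s. */
static bool has_prefix_ci(const char *s, const char *prefix)
{
    for (; *prefix; ++s, ++prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return false;
    }
    return true;
}

/* Hypothetical helper: if path lies under SOURCE_DIR, write the same
   relative name under SHADOW_DIR into out. Returns false when the path
   is not under SOURCE_DIR or when out is too small. */
static bool map_shadow_path(const char *path, char *out, size_t out_len)
{
    if (!has_prefix_ci(path, SOURCE_DIR))
        return false;
    int n = snprintf(out, out_len, "%s%s", SHADOW_DIR, path + strlen(SOURCE_DIR));
    return n >= 0 && (size_t)n < out_len;
}
```

The check is purely lexical, so it would not recognize short (8.3) names or other aliases that refer to the same directory.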

This driver works fine with applications like wordpad.exe and notepad.exe,
but with Microsoft Word the .doc file in C:\tdir differs from the file in
C:\sdir: the header is slightly different and about 1 KB is missing from
the end of the file.

I logged all write operations on the files in C:\tdir and C:\sdir (once
with Filemon and once with debug prints in my driver), and I am sure I
haven’t missed any cached I/O operation. But Filemon showed some paging I/O
writes to the file in C:\sdir\ that I did not replicate for the .doc file
in C:\tdir\.

After that, I tried processing all cached I/O and paging I/O, and the file
in C:\tdir became identical to the file in C:\sdir.

If I understand correctly, I should only need to process cached I/O in this
case, but that doesn’t hold for Microsoft Word. Can anybody tell me how to
build an identical .doc file from the IRPs sent by Microsoft Word?

B.R
YANG Xiao

OSR_Community_User · November 26, 2007, 2:15pm

Well, you don’t mention which version of MS Word you’ve been using, so I’ll assume you are using Word 2007 (the current version).

Honestly, I’m not sure why this would work with notepad, either, but if you only handle cached I/O, you will miss all non-cached I/O.

Try writing a test program that opens the source and destination files with FILE_NO_INTERMEDIATE_BUFFERING and then copies one to the other. I bet your filter skips those I/O operations as well.

Modify your test program so that it creates sections and maps them in some reasonable increment (like 64MB) and just “copies” them from one to the other by doing an RtlCopyMemory (or your favorite equivalent.) Again, you’ll likely see that you miss something.
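The loop structure of such a mapped-copy test can be sketched portably. Here the two buffers stand in for mapped views; in the real Windows test program each iteration would presumably map a view of the source and destination sections (for example with `MapViewOfFile`) at `offset`, copy, and unmap. The function name is hypothetical, and the 64MB window is the increment suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Window size suggested in the reply (64MB). */
#define VIEW_SIZE ((size_t)64 * 1024 * 1024)

/* Copies file_size bytes from src to dst one view-sized window at a time
   and returns how many windows were used. In the Windows test each window
   would be a freshly mapped view of the source and destination sections. */
static size_t copy_by_views(unsigned char *dst, const unsigned char *src,
                            size_t file_size, size_t view_size)
{
    assert(view_size > 0);
    size_t views = 0;
    for (size_t offset = 0; offset < file_size; offset += view_size) {
        size_t remaining = file_size - offset;
        size_t chunk = remaining < view_size ? remaining : view_size;
        /* The reply suggests RtlCopyMemory or an equivalent; memcpy plays that role here. */
        memcpy(dst + offset, src + offset, chunk);
        ++views;
    }
    return views;
}
```

View offsets passed to `MapViewOfFile` must be multiples of the system allocation granularity (typically 64KB), which a 64MB increment satisfies, and the destination section has to be created at least as large as the source. Data written through mapped views reaches the file system only as paging I/O, which is what a cached-only filter misses.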

Having test programs that isolate the problem will make your life much easier because you won’t have to deal with all the other “junk” that an application like Word is doing, but these should manifest the issue. Once you have those working, you can go back and retry Word.

Bottom line: trying to rely upon the cached I/O is not generally a good design. It’s better to work with non-cached I/O (including paging I/O), but you can go hybrid (where you detect when the file has been memory mapped, and you deal with user-level non-cached I/O).
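One way to express the recommended non-cached design in a filter’s write path, as a sketch: classify each write by its IRP flags and copy data only for writes that are headed to disk. The `SKETCH_` constants are local stand-ins for the `IRP_NOCACHE` and `IRP_PAGING_IO` bits of `Irp->Flags` (a driver would use the real definitions from wdm.h), and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the IRP_NOCACHE and IRP_PAGING_IO flag bits. */
#define SKETCH_IRP_NOCACHE   0x1u
#define SKETCH_IRP_PAGING_IO 0x2u

enum write_kind {
    WRITE_CACHED,          /* lands in the cache; reaches disk later as paging I/O */
    WRITE_NONCACHED_USER,  /* e.g. a handle opened with FILE_NO_INTERMEDIATE_BUFFERING */
    WRITE_PAGING           /* cache flush or write-back of a memory-mapped view */
};

static enum write_kind classify_write(uint32_t irp_flags)
{
    /* Paging writes are non-cached too, so test the paging bit first. */
    if (irp_flags & SKETCH_IRP_PAGING_IO)
        return WRITE_PAGING;
    if (irp_flags & SKETCH_IRP_NOCACHE)
        return WRITE_NONCACHED_USER;
    return WRITE_CACHED;
}

/* Non-cached design: copy data only for writes on their way to disk.
   Cached writes are skipped for data, though a filter may still note
   the new file size from them. */
static bool should_copy_data(enum write_kind kind)
{
    return kind != WRITE_CACHED;
}
```

Mirroring only these writes avoids copying the same data twice (once as the cached write, again as its later paging flush) while still catching memory-mapped writes that never appear as cached I/O.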

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

OSR_Community_User · December 10, 2007, 10:12pm

Hello Mason and everyone,
thanks for your help and suggestions. I have finished my work based on
your proposal and it seems to work well so far.
I also want to share some information (and a few questions along the way) for
people doing this kind of job.

1. In the IRP_MJ_WRITE dispatch routine, I only copy data for non-cached
(NO_CACHE) I/O, and I record the file length when cached I/O arrives. Because
non-cached I/O is always 4K-aligned, you need to resize the target file after
writing a 4K-aligned block of data.

2. In the IRP_MJ_WRITE dispatch routine, if you call ZwWriteFile to copy data
to another file (on the same volume as the original one) when non-cached I/O
arrives, you will get a FILE CONFLICT error. I guess this is because the file
system holds some lock while it processes non-cached I/O (the lock seems to be
per volume). You need to use a kernel thread to do the copy by calling
ZwWriteFile.
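Point 1 above can be modeled in a few lines, as a sketch under simplifying assumptions: the shadow file is an in-memory buffer rather than a file written with `ZwWriteFile`, the 4K alignment is taken from the post (generally the requirement is the volume’s sector size for user non-cached I/O and the page size for paging I/O), and truncation through set-information requests is not tracked. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE      4096u                /* alignment reported in the post */
#define SHADOW_CAPACITY (16u * BLOCK_SIZE)   /* hypothetical fixed size for the sketch */

struct shadow_file {
    uint8_t  data[SHADOW_CAPACITY];
    uint64_t size;          /* current length of the shadow file */
    uint64_t logical_eof;   /* end of file remembered from cached writes */
    bool     eof_known;
};

/* Cached write: no data is copied (it comes back later as paging I/O);
   only remember how far the file now extends. */
static void on_cached_write(struct shadow_file *f, uint64_t offset, uint64_t length)
{
    if (offset + length > f->logical_eof)
        f->logical_eof = offset + length;
    f->eof_known = true;
}

/* Non-cached write: copy the aligned block, then trim the padding that
   the last block may carry past the real end of file. */
static bool on_noncached_write(struct shadow_file *f, uint64_t offset,
                               const uint8_t *buf, uint64_t length)
{
    if (offset % BLOCK_SIZE != 0 || length % BLOCK_SIZE != 0 ||
        offset > SHADOW_CAPACITY || length > SHADOW_CAPACITY - offset)
        return false;
    memcpy(f->data + offset, buf, (size_t)length);
    if (offset + length > f->size)
        f->size = offset + length;
    /* The "resize" step from the post. */
    if (f->eof_known && f->size > f->logical_eof)
        f->size = f->logical_eof;
    return true;
}
```

With a cached write extending the file to 5000 bytes, the first paging write leaves the shadow at 4096 bytes; the second (covering bytes 4096 to 8191) would leave 8192 bytes, of which the last 3192 are padding, and the trim brings it back to 5000.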
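The workaround in point 2 is a producer/consumer queue: the write path captures the data and returns, and a separate thread performs the copy. This sketch models it with POSIX threads so it can run anywhere; in the driver the worker would be a system thread (for example one created with `PsCreateSystemThread`) or a work item, and the `memcpy` into `target` stands in for `ZwWriteFile` on the shadow file. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct copy_job {
    uint64_t offset;
    size_t length;
    uint8_t *data;              /* private copy of the write buffer */
    struct copy_job *next;
};

struct copy_queue {
    pthread_mutex_t lock;
    pthread_cond_t ready;
    struct copy_job *head, *tail;
    bool shutting_down;
    uint8_t *target;            /* stands in for the shadow file */
    size_t target_len;
};

static void copy_queue_init(struct copy_queue *q, uint8_t *target, size_t target_len)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->ready, NULL);
    q->head = q->tail = NULL;
    q->shutting_down = false;
    q->target = target;
    q->target_len = target_len;
}

/* Called from the write path: capture the data and return immediately. */
static bool copy_queue_push(struct copy_queue *q, uint64_t offset, const void *buf, size_t length)
{
    struct copy_job *job = malloc(sizeof *job);
    if (!job)
        return false;
    job->data = malloc(length);
    if (!job->data) {
        free(job);
        return false;
    }
    memcpy(job->data, buf, length);
    job->offset = offset;
    job->length = length;
    job->next = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->tail)
        q->tail->next = job;
    else
        q->head = job;
    q->tail = job;
    pthread_cond_signal(&q->ready);
    pthread_mutex_unlock(&q->lock);
    return true;
}

/* Worker thread: the only place the shadow file is written. */
static void *copy_worker(void *arg)
{
    struct copy_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (!q->head && !q->shutting_down)
            pthread_cond_wait(&q->ready, &q->lock);
        struct copy_job *job = q->head;
        if (!job) {             /* shutting down and the queue is drained */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        q->head = job->next;
        if (!q->head)
            q->tail = NULL;
        pthread_mutex_unlock(&q->lock);

        /* Stand-in for ZwWriteFile, performed outside the queue lock. */
        if (job->offset <= q->target_len && job->length <= q->target_len - job->offset)
            memcpy(q->target + job->offset, job->data, job->length);
        free(job->data);
        free(job);
    }
}

/* Ask the worker to exit once every queued job has been written. */
static void copy_queue_shutdown(struct copy_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->shutting_down = true;
    pthread_cond_broadcast(&q->ready);
    pthread_mutex_unlock(&q->lock);
}
```

Copying the buffer in `copy_queue_push` matters because the original request’s buffer is no longer valid once it completes, and shutdown lets the worker drain queued jobs before it exits.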

Thanks again for Mason’s help, and I hope my experience can help others.

B.R
YANG Xiao
