The company I work for is developing an HSM product that uses reparse
points and a filter driver to transparently retrieve files that have been
released locally and now reside in long-term storage. We’re running into
problems when we use it on a file server. We hold each create request in a
pending state while we retrieve the file to local storage, and with
large files, we’re holding it too long.
As a result, the Windows Server service becomes unresponsive when a
client requests a file that is large enough to take too long to restore
to the server. The share becomes inaccessible from
the client that requested the large file, and other clients are unable
to mount the share. Once the file retrieval completes and
FltCompletePendedPostOperation() is called, everything resumes
working. It’s not acceptable for the share to stop working in the
meantime, however, so I’m looking for a way to keep things alive or to
reset whatever timeout mechanism may be at work. The limit appears
to be around 10 minutes.
First, is there a hard and fast limit on how long you should hold a
request pending? If so, how does the system enforce it, and is there a
direct way to reset it? Second, if we can’t reset a timeout directly,
does anybody think that reissuing the request would keep the Server
service from going unresponsive? (I’m guessing that if the problem is in
the Server service, re-requests in the kernel won’t affect it.) Third,
does anybody know of any other reason we might be seeing this problem?
We’re assuming it’s related to the Server service, since we don’t have a
problem restoring large files locally.
~Eric
A few bits of further information:
The file server is Win2K3; the client is WinXP.
Changing the network timeouts referenced in
http://support.microsoft.com/kb/297684 has no effect. Given that the
entire service becomes unavailable, we don’t currently think those
timeouts are related to this issue. If you can think of a reason they
might be, any light you can shed on it would be appreciated.