FO_SYNCHRONOUS_IO, IRP_MJ_READ, SFU 3.0 NFS and read delays

My turn to ask for advice:

We have a file system filter that can introduce high latency into i/o
requests. Delays of 1 to 30 seconds are normal for some i/o, depending
on file size, file location, etc. In general, this latency occurs while
processing IRP_MJ_READ, since getting a file’s data is the most
expensive operation. When a delay is expected, we also disallow fast
i/o reads on the file, including the MDL read (which is used by SFU).

The IRP_MJ_READ request is delayed in one of two ways. In the normal
case the IRP is deferred and serviced from a work queue, and the
caller’s IRP is returned with STATUS_PENDING. In the case of a
synchronous file object (FO_SYNCHRONOUS_IO), the request stalls in the
caller’s thread and is dispatched to the FSD once the delay has been
incurred. I believe that deferring an IRP on a synchronous file object
is not the best behavior, so I added this synchronous handling in hopes
of fixing a problem I was seeing with SFU 3.0 and its NFS server.
However, the problem still exists.
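
To make the two paths concrete, here is a rough sketch of the decision
described above. It is a user-mode model for illustration only, not the
driver source: the read_request type, the delay_expected field and both
function names are made up for this post. Only the FO_SYNCHRONOUS_IO
value is taken from the DDK headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Value of FO_SYNCHRONOUS_IO as defined in the DDK headers. */
#define FO_SYNCHRONOUS_IO 0x00000002u

/* Hypothetical outcomes of the filter's IRP_MJ_READ dispatch. */
typedef enum {
    READ_PASS_THROUGH,   /* no delay expected: send straight to the FSD    */
    READ_DEFER_PENDING,  /* queue to a worker, return STATUS_PENDING       */
    READ_STALL_INLINE    /* wait in the caller's thread, then call the FSD */
} read_action;

/* Hypothetical stand-in for the bits of the request the filter looks at. */
typedef struct {
    uint32_t file_object_flags; /* models FILE_OBJECT.Flags               */
    bool     delay_expected;    /* true when getting the data will be slow */
} read_request;

/* Pick how a read is handled, following the two cases described above. */
static read_action choose_read_action(const read_request *req)
{
    assert(req != NULL);
    if (!req->delay_expected)
        return READ_PASS_THROUGH;
    if (req->file_object_flags & FO_SYNCHRONOUS_IO)
        return READ_STALL_INLINE;   /* synchronous file object */
    return READ_DEFER_PENDING;      /* normal case */
}

/* Fast i/o reads (including the MDL read) are refused when a delay
 * is expected, so the request comes back as an IRP_MJ_READ. */
static bool allow_fast_io_read(const read_request *req)
{
    assert(req != NULL);
    return !req->delay_expected;
}
```

The point of the model is that FO_SYNCHRONOUS_IO only changes where the
wait happens (caller’s thread versus work queue), not whether it happens.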

The problem is that the NFS server seems to temporarily deadlock local
UDP traffic on the Windows 2000 server when it has a number of pending
requests. The number of pending requests grows because of the delays in
the i/o requests. The problem is characterized by local “ping” timeouts
while running “ping localhost -n 9999”. Since local IP is used by other
services, this deadlock of local UDP causes a cascading meltdown of
services on the Windows 2000 server, including increased latency for
the read requests themselves, which require local IP traffic to be
serviced.

As a test case, I imposed a 20 second delay on all synchronous reads in
FILESPY and reproduced the same problem. CIFS and other file i/o
continue to work (at a very slow pace), but SFU 3.0’s NFS server causes
this UDP timeout.

Any experience in this area would be appreciated.