What is happening with reads to the same file from different processes?

I have a driver that does file checking - think anti-virus scans for example. When a process reads a file, the driver intercepts IRP_MJ_READ. If the read comes from one of a few special processes (my scanner, other anti-virus scanners, Windows Defender, etc), it passes through immediately; otherwise a request is sent to my service first. Those requests to the service (via FltSendMessage with a timeout) might happen right there on the IRP_MJ_READ thread, or the request might get deferred and then handled in a worker thread. There are lots of protections and checks for all sorts of things (ignoring the paging path, etc, etc).

That call to my service (via FltSendMessage) causes my application to open the file and scan it (thus sending in another IRP_MJ_READ, which should pass through my driver immediately).
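To make the flow above concrete, here is a small user-mode model of the decision as described. It is only a sketch, not driver code: the names `classify_read`, `ReadRequest`, `Action` and the contents of `TRUSTED_PROCESSES` are hypothetical, and the real driver's checks are certainly more involved.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    PASS_THROUGH = "pass through immediately"
    ASK_SERVICE = "send request to service and wait (with timeout)"


# Hypothetical allow-list for illustration only; how the real driver
# identifies its special processes is not stated in the post.
TRUSTED_PROCESSES = {"myscanner.exe", "msmpeng.exe"}


@dataclass(frozen=True)
class ReadRequest:
    process_name: str   # image name of the process issuing the read
    is_paging_io: bool  # True when the read is on the paging path


def classify_read(req: ReadRequest) -> Action:
    """Model of what happens to one intercepted read, per the description."""
    if req.is_paging_io:
        # Paging-path reads are ignored by the checks.
        return Action.PASS_THROUGH
    if req.process_name.lower() in TRUSTED_PROCESSES:
        # Reads from the scanner itself (or other scanners) are not held.
        return Action.PASS_THROUGH
    # Everything else waits for the service, inline or deferred.
    return Action.ASK_SERVICE
```

In this model the scanner's own read always classifies as a pass-through, so any delay it sees has to come from somewhere other than this decision.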

Everything works perfectly 99.9% of the time, except that sometimes my process’s read seems to get stuck behind the first read (which may or may not have been deferred). I can tell because my process’s Win32 ReadFile call takes exactly as long as the FltSendMessage timeout: once the timeout fires, the paused/deferred IRP_MJ_READ proceeds unhindered, and then the ReadFile finishes immediately. The other 99.9% of the time the ReadFile completes much, much faster than the timeout that is blocking the first IRP_MJ_READ.

I’m trying to understand what might cause my service’s read to get stuck behind the paused/deferred read. I’m guessing the I/O Manager is involved in this somehow. Also note that this can happen on small text files, so I wouldn’t think it involves paging.

Can anyone point me to some documentation or give me some hints what to look for?

Thanks everyone.

Oplocks?
Non-cached I/O?

That apart, this sounds pretty much like textbook convoying; you just need to work out what is causing it… For myself I’d grab a dump at the hang and take a poke around.
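For anyone who hasn't met the term, the pattern can be modelled in user mode. The sketch below is purely an illustration with hypothetical names (`scanner_read_latency`, `file_lock`); it assumes the two reads are serialized by some shared resource, which is exactly the thing still to be identified, and it says nothing about which kernel resource that might be.

```python
import threading
import time


def scanner_read_latency(timeout_s: float, share_lock: bool) -> float:
    """Return how long the scanner's read takes while the first read is parked."""
    file_lock = threading.Lock()           # stands in for the unknown shared resource
    reply = threading.Event()              # set once the "service" has answered
    first_read_parked = threading.Event()  # set once the first read is waiting

    def first_read() -> None:
        # The intercepted read: waits for the service reply, giving up after
        # timeout_s. With share_lock=True it holds the resource while waiting.
        if share_lock:
            with file_lock:
                first_read_parked.set()
                reply.wait(timeout_s)
        else:
            first_read_parked.set()
            reply.wait(timeout_s)

    worker = threading.Thread(target=first_read)
    worker.start()
    first_read_parked.wait()

    # The scanner's own read, issued in response to the request.
    start = time.monotonic()
    with file_lock:
        pass  # the actual read of the file would happen here
    elapsed = time.monotonic() - start

    reply.set()  # the service can only reply after its own read has finished
    worker.join()
    return elapsed
```

With `share_lock=True` the scanner's read should take roughly `timeout_s`, because the only thing that can release the resource early is the reply, and the reply depends on the read that is stuck. With `share_lock=False` it should return almost immediately. That matches the reported symptom of ReadFile taking as long as the FltSendMessage timeout.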

Also note that paging I/O and ‘small files’ (embedded) are orthogonal. I/O to a small file can provoke a paging read - it’s just to the MFT (or at least it was last time I looked - admittedly probably 10 years ago…). I seem to recall that NTFS has changed in that time and it is less obvious that that is what is going on, but others will have better memory.

Thanks for the tip Rod. “Convoying” isn’t a term I’ve heard before but it definitely gives me more to research.

Hi @rod_widdowson , I’ve been investigating the idea of convoying, and also oplocks. I haven’t yet grabbed a dump (still TODO). Can you explain why non-cached I/O might cause one request to get stuck behind another request?

I could construct all sorts of contrived examples but I’d guess that filter interop might be involved.

For instance, the non-cached I/O might have been pended because of an AV section (STATUS_PURGE_FAILED), and then for some reason the release of the AV section is waiting on the cached I/O…

I must admit I’m interested to find what the answer is…