Detecting difference between file copy and file search

Doug_N · February 19, 2025, 4:37pm

I have a customer who says a competing product can tell the difference, on a file server, between a client computer reading a file for a copy and reading it for a search (searching the contents of the file). This is surely impossible, because I could write a trivial program that reads a file from a server into memory and then, at any point, either searches the memory for a string or writes the data to disk (so now it's a copy).

But, in trying to meet his needs for maybe 80% of cases (users copying or searching with Windows Explorer), does Explorer, or the file copy API, do anything unusual that could be detected in a minifilter, compared with an application that just reads the file? I'm coming up blank.

Bo_Branten · February 19, 2025, 10:28pm

Perhaps the trick is to study the output file instead! It should eventually reach the same size as the input file (and also the same hash). Perhaps you can try to detect this phenomenon.
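
A rough sketch of that comparison, assuming the destination file ends up somewhere you can actually read it (which may not hold when only the server is visible). The function names `file_digest` and `looks_like_copy` are made up for illustration, and SHA-256 is just one reasonable choice of hash:

```python
import hashlib
import os

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def looks_like_copy(src_path, dst_path):
    """True if dst has the same size and the same content hash as src."""
    if os.path.getsize(src_path) != os.path.getsize(dst_path):
        return False  # cheap size check first, skip hashing on mismatch
    return file_digest(src_path) == file_digest(dst_path)
```

A match only says the two files are identical now; it does not say which read produced the copy, so it is evidence rather than proof.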

MBond2 · February 19, 2025, 10:33pm

For a single file there really isn't any way to know, but there may be a pattern of calls across multiple files that indicates a search is in progress.

aursulis · February 19, 2025, 10:34pm

I'm not sure how this feature interacts with file servers, but recent versions of Windows actually have in-kernel APIs for file copying; see "Kernel-mode File Copy and Detecting Copy File Scenarios" in the Windows drivers documentation. I suspect this might not actually work across network boundaries.

Paul_Y · February 20, 2025, 1:00pm

When you say "surely impossible", I need to ask what assumptions you might be making. What is the probability of success? It is very difficult to do this with 100% accuracy, which no doubt makes you think it impossible.

The real question, I think, is whether there are specific scenarios that occur often enough, and can be detected reliably enough, to make the effort worth it.

Your counter-example of arbitrary code is one case. However, the more common cases would be using either APIs or command-line tools. Walking the stack might allow some degree of certainty for the API case, and looking at process information could resolve the command-line case. These might detect a high enough percentage of cases to make it worth the effort to implement.

Accessing the user-mode stack from kernel mode (in case the code in question is, in fact, kernel mode) adds a bit of complexity, but I suspect even that is doable.

rod_widdowson · February 20, 2025, 3:02pm

I guess it depends whether you need a marketing or an engineering solution....

But I'd spend some time looking at IO_COPY_CHUNK. I have absolutely no experience of it, and it might be on the wrong machine (client, not server), but it also might be interesting.

Doug_N · February 20, 2025, 9:43pm

They only have access to the server, not the client. I do already keep track of copied ranges so I can tell if the entire file was accessed, but a search could (should?) access the whole file too, I would think.
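
For anyone following along, the range tracking amounts to something like the sketch below. It is a user-mode illustration only, not my driver code; `fully_covered` and the `(offset, length)` tuple format are just names picked for the example:

```python
def fully_covered(read_ranges, file_size):
    """Return True if the (offset, length) reads cover bytes [0, file_size)."""
    covered_to = 0  # every byte below this offset has been read
    for offset, length in sorted(read_ranges):
        if offset > covered_to:
            return False  # gap: bytes [covered_to, offset) were never read
        covered_to = max(covered_to, offset + length)
        if covered_to >= file_size:
            return True
    return covered_to >= file_size
```

Reads may arrive out of order or overlap, which is why the ranges are sorted first. Full coverage is consistent with both a copy and a search, so on its own it does not separate the two.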

Doug_N · February 20, 2025, 9:44pm

That's an interesting idea I hadn't considered. Thanks!

Doug_N · February 20, 2025, 9:45pm

I wasn't aware of that, so time to dig in. Thanks for the tip.

Doug_N · February 20, 2025, 9:47pm

I guess I think it's impossible to be 100% accurate, so I'm taking the approach you recommend by looking for how to do it in probable scenarios.

The copy is initiated on a client computer, and I can only see the server side, so APIs and stack walking won't help in this case, but maybe spying on the client process (Explorer.exe) during my investigations will give me some hints.

Doug_N · February 20, 2025, 9:47pm

I'm looking for an engineering solution. I'm reading about IO_COPY_CHUNK. Thanks for the hint.

MBond2 · February 20, 2025, 11:04pm

As you already know, there is no 100% proof-positive way to determine the intent of an application when it opens a file handle. All you know for sure is that read access was requested and is allowed.

The engineering solution to a situation like this is to use a heuristic. For a single handle open, and a set of reads on that file, nothing can really be inferred. Both copy and search probably read the data sequentially, so if the IO pattern is not like that you can probably rule out both. Possibly the search application takes longer between reads, but that is hardly certain and depends on the algorithm. Copying a single file will have a long delay between opening one handle and opening a handle for another file in the same directory, but copying a whole directory won't. Searching from Windows Explorer can't be done within a single file, so presumably it must open a bunch of handles.

Obviously the same considerations apply to recursion into sub-directories.

And these behaviours are subject to change between versions of Windows Explorer and of other applications that do copying or searching.

So without giving any answer that you can use directly, my suggestion is to monitor and test, and then see what conclusions you can draw from what you see.
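
As a starting point for that kind of testing, here is a minimal sketch of two of the signals mentioned above, meant to be run offline against logged traces rather than inside a driver. The trace format (a list of `(offset, length)` reads per handle, and a list of open timestamps in seconds) and both function names are assumptions for illustration; any thresholds would have to come from your own measurements:

```python
def is_sequential(reads):
    """True if each (offset, length) read starts where the previous one ended."""
    expected = 0
    for offset, length in reads:
        if offset != expected:
            return False
        expected = offset + length
    return True

def max_opens_in_window(open_times, window_seconds):
    """Largest number of file opens that fall within any window of this length."""
    times = sorted(open_times)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Slide the left edge forward until the window fits again.
        while t - times[start] > window_seconds:
            start += 1
        best = max(best, end - start + 1)
    return best
```

A non-sequential pattern suggests neither a plain copy nor a plain search, and a burst of opens suggests a directory-wide operation, but neither signal separates copy from search by itself. A strict sequential check may also be too tight if the client issues overlapped reads that arrive out of order.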
