Are minifilter communications really thread-safe?

Hi, I’m experiencing a weird issue in communication with my driver.

Background: I open a port with my driver and have 4 threads listening for messages, so that several FltSendMessage calls can be processed in parallel. I’m not using completion ports. Each thread has its own buffer, overlapped struct and handle.

I call FilterGetMessage, wait until the overlapped struct’s handle is signalled, call GetOverlappedResult and, if all is OK, process the incoming message and optionally call FilterReplyMessage if the driver is expecting a reply.

Now the weird part. I have this portion of code:

lpDriverReplyHeader = NULL;
if (lpFltMsgHeader->ReplyLength > 0) {
    lpDriverReplyHeader = (PDRIVER_REPLY_HEADER)(cReplyBuf.Get() + sizeof(FILTER_REPLY_HEADER));
}
hRes = cMessageCallback((PDRIVER_MESSAGE_HEADER)(cMsgBuf.Get() + sizeof(FILTER_MESSAGE_HEADER)), (SIZE_T)(lpFltMsgHeader->ReplyLength), lpDriverReplyHeader);
if (SUCCEEDED(hRes)) {
    //send reply if any
    if (lpFltMsgHeader->ReplyLength > 0) {
        lpFltReplyHeader->MessageId = lpFltMsgHeader->MessageId;
        lpFltReplyHeader->Status = STATUS_SUCCESS;
        hRes = FilterReplyMessage(hPort, lpFltReplyHeader, (ULONG)sizeof(FILTER_REPLY_HEADER) + lpDriverReplyHeader->MessageSize);
    }
}
....

The Visual Studio debugger stops in FilterReplyMessage because lpDriverReplyHeader is NULL, but it shouldn’t be if lpFltMsgHeader->ReplyLength is greater than 0. Also, when I hover the mouse over lpFltMsgHeader->ReplyLength, the debugger sometimes shows the value as 0 and other times as 0x60.

Because only the user-mode code is stopped by the debugger, my theory is that the driver keeps sending other messages (this does happen) and the user-mode part of Filter Manager is overwriting the buffers.

I saw an old thread here where someone else had a similar issue, as if more than one pending FilterGetMessage gets signalled when the driver sends a single message.

Are there any extra measures I should take when I call FilterGetMessage?

Regards,
Mauro.

What is cReplyBuf and what does cReplyBuf.Get() do?

In here:

lpDriverReplyHeader = NULL;
if (lpFltMsgHeader->ReplyLength > 0) {
    lpDriverReplyHeader = (PDRIVER_REPLY_HEADER)(cReplyBuf.Get() + sizeof(FILTER_REPLY_HEADER));
}

shouldn’t there be:

else
    continue; // or return, or something of the sort

i.e., so that cMessageCallback is not called with a NULL lpDriverReplyHeader?

cReplyBuf is allocated locally as a byte buffer of size sizeof(FILTER_REPLY_HEADER) + (SIZE_T)(lpFltMsgHeader->ReplyLength).

Each message I send from the driver has a header containing the message type, so the user-mode code knows how to deal with it and whether it must send a reply. So it is not a problem to call my cMessageCallback with lpDriverReplyHeader == NULL if no reply is expected.

Also, I ensure that any message (incoming, outgoing, replies) has a maximum size of 128k.

Even so, let’s say my callback or some other part of the service is writing bytes to a wrong location (although I would expect a crash elsewhere). Why do the contents of lpFltMsgHeader change from time to time while the whole app is paused by the debugger?

Overwriting DATA will likely never crash. E.g. you get the wrong file
data for a copy - a crash would not occur.
Or you could be NULLing a location of a Length field - causing a path
to simply not execute.

What does cReplyBuf.Get() do exactly? Is this C or C++ code, and what
is the type of cReplyBuf?

You don’t need to worry about FltMgr synchronizing its own data (that
would have caused a lot of crashes before it even got to MS QA, much less
to us).
So, I believe you have an issue in your code that is overwriting data.

What is the value you get from cReplyBuf.Get() for the lpDriverReplyHeader line?

I cast cReplyBuf to PFILTER_REPLY_HEADER and saved it in lpFltReplyHeader. cReplyBuf is a C++ object that frees the buffer on object destruction. I pass the pointer to the byte offset just after the FILTER_REPLY_HEADER struct so my callback can fill in the reply. In the reply, the first DWORD is the actual size of the reply (which includes this DWORD).
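To make that layout concrete, here is a minimal sketch of how the length passed to FilterReplyMessage could be derived and sanity-checked before sending. ReplyHeaderStandIn is a hypothetical stand-in for FILTER_REPLY_HEADER so the snippet compiles without fltUser.h, and TotalReplySize is an illustrative helper, not part of the code discussed in this thread. It assumes the payload starts right after the header and that its first DWORD holds the payload size, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for FILTER_REPLY_HEADER (Status + MessageId);
// real code would include fltUser.h and use the real struct.
struct ReplyHeaderStandIn {
    std::int32_t  Status;
    std::uint64_t MessageId;
};

// Returns the total number of bytes to send (header + payload), or 0 when
// the payload size written by the callback is not plausible.
// replyBuf points at the start of the whole reply buffer (header first);
// payloadCapacity is the room after the header (ReplyLength in the thread).
inline std::size_t TotalReplySize(const unsigned char* replyBuf,
                                  std::size_t payloadCapacity) {
    if (replyBuf == nullptr || payloadCapacity < sizeof(std::uint32_t)) {
        return 0;
    }
    // Assumption: the first DWORD of the payload is its own total size.
    std::uint32_t messageSize = 0;
    std::memcpy(&messageSize, replyBuf + sizeof(ReplyHeaderStandIn),
                sizeof(messageSize));
    if (messageSize < sizeof(std::uint32_t) || messageSize > payloadCapacity) {
        return 0;
    }
    return sizeof(ReplyHeaderStandIn) + messageSize;
}
```

A return value of 0 would mean the callback left the reply in an unexpected state, in which case skipping FilterReplyMessage (or replying with an error status) is safer than dereferencing the payload pointer.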

When the crash happened, lpDriverReplyHeader was NULL and lpFltMsgHeader->ReplyLength was 0x60. The call to cMessageCall

lpFltReplyHeader->MessageId and lpFltMsgHeader->MessageId were both 1.

Here are some captures I took:

https://pasteboard.co/JaWxH59.jpg
https://pasteboard.co/JaWxSD9.jpg
https://pasteboard.co/JaWxZd7.jpg
https://pasteboard.co/JaWy6JR.jpg

K, got it now.

I do not have a clue then… your fourth capture shows the Watch window
with ReplyLength 0x60 in the struct, and 0x0 in the field view… which
makes no sense at all.

I do not recall the message buffer delivered to the user-mode app
(lpFltMsgHeader) becoming invalid after FilterReplyMessage.
But it is possible that the buffer is not valid after FilterReplyMessage,
and thus lpFltMsgHeader->ReplyLength is not the same as it was before
the call.

Try saving the value to a local variable instead, and using that
variable after FilterReplyMessage?
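A minimal sketch of that suggestion, assuming the two header fields used in this thread are copied once, right after the read completes, and every later decision uses the copy. MsgHeaderStandIn is a hypothetical stand-in for FILTER_MESSAGE_HEADER so the snippet compiles without fltUser.h; MsgSnapshot and SnapshotHeader are illustrative names only. This does not explain why the buffer changes, it only keeps the code consistent if it does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the FILTER_MESSAGE_HEADER fields used here;
// real code would include fltUser.h and use the real struct.
struct MsgHeaderStandIn {
    std::uint32_t ReplyLength;
    std::uint64_t MessageId;
};

// Local copy of the header fields, taken once per received message.
struct MsgSnapshot {
    std::uint32_t replyLength;
    std::uint64_t messageId;

    // True when the driver expects a reply for this message.
    bool replyExpected() const { return replyLength > 0; }
};

// Copy the fields out of the receive buffer so later code never re-reads it.
inline MsgSnapshot SnapshotHeader(const MsgHeaderStandIn* hdr) {
    return MsgSnapshot{ hdr->ReplyLength, hdr->MessageId };
}
```

With this, the decision to set up the reply pointer, the decision to send the reply, and the MessageId copied into the reply header all come from the same snapshot, so a later change in the receive buffer cannot make the two ReplyLength checks disagree.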

Less important, and not directly related to this issue:
Just checking - when you say “locally allocated” for cReplyBuf, I hope
you don’t mean stack-allocated, but are simply using the C++ object as
a simpler way to handle memory allocations?
IMO, not using pure C for the direct communication with drivers is a
bad idea, simply because a lot of C++ presumptions can spill into the
thought process.

Hi Dejan, both the receive and reply buffers are allocated from the heap. These, plus the overlapped handle, are created in the thread proc and not shared. Only the overlapped struct itself lives on the stack. What you see in the locals window is the weird part, because the app is also paused. The only explanation I have found is that the driver, which is still active, keeps sending messages and the buffers are being overwritten after the call is supposed to have completed. Or more than one of the events for the pending get-message requests is being signaled when it shouldn’t be.

> What you see in the locals window, is the weird because also the app is
> paused. The only explanation I found is the driver, which is still active,
> is sending messages and buffers being overwritten after the call is
> supposed to be completed.

The weird part is that the SAME field has different values at the same time.

Try saving it to a local variable.

I’m writing new code that uses completion ports, like the scanner sample. I hope I don’t run into the same issue.