Minifilter communication performance based on scanner sample

Hello,

First of all, thank you for the invaluable information on this site. I have a very basic question.

We have a very simple minifilter for internal use that sometimes sends a large number of messages to user mode. The communication is based on the scanner sample and goes through a completion port. In the sample there is a requestCount parameter: the user-mode component allocates that many message buffers per worker thread and issues that many FilterGetMessage requests to the filter.

I thought the reason for this was that when the filter sends, say, a thousand messages, and enough FilterGetMessage requests are pending, the messages are delivered almost instantly, and we can then use the completion port to process them one by one without blocking the filter. However, what I see is that changing this parameter to match the expected message count has no effect at all on the speed of operation, as if a message from the minifilter were not delivered until it is popped from the completion port.

Am I missing something?

Thank you,
Gabor

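When you call FltSendMessage() from the driver, it blocks until the message has been read by the user-mode side, and if a reply is requested, until that reply is received (subject to the timeout you pass). If your driver asks for a reply, as the scanner sample does, having more FilterGetMessage requests pending does not release the sending thread any sooner, because it is still waiting for the reply. You can get good performance by having multiple user-mode worker threads waiting on the completion port, so that a given kernel thread generally only has to wait as long as it takes its own message to be processed, as opposed to waiting for the entire queue to be processed.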
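To put rough numbers on that, here is a small back-of-the-envelope model in Python. It is only a sketch, not a measurement of the filter manager: it assumes every send requests a reply, that all messages are sent at the same instant, that each message takes the same fixed time to process, and that each worker thread handles one message at a time. The function names (`sender_wait_times`, `summarize`) and the figures used are made up for illustration.

```python
def sender_wait_times(num_messages, num_workers, process_ms):
    """Return how long each blocked sender waits, in ms, under a toy model.

    Assumptions (illustrative, not taken from the filter manager itself):
    all messages are sent at once, each takes exactly process_ms to handle,
    every send waits for a reply, and a worker handles one message at a time.
    """
    if num_messages < 0:
        raise ValueError("num_messages must be non-negative")
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    # Message i (0-based) is handled in "round" i // num_workers; its sender
    # is released when that round finishes and the reply is sent.
    return [(i // num_workers + 1) * process_ms for i in range(num_messages)]


def summarize(num_messages, num_workers, process_ms):
    """Worst-case and mean sender wait for one configuration."""
    waits = sender_wait_times(num_messages, num_workers, process_ms)
    mean = sum(waits) / len(waits) if waits else 0.0
    return {"worst_ms": max(waits, default=0), "mean_ms": mean}


if __name__ == "__main__":
    # Hypothetical burst: 1000 messages, 2 ms of user-mode work per message.
    for workers in (1, 4, 16):
        print(workers, "worker(s):", summarize(1000, workers, 2))
```

Note that the number of pre-posted FilterGetMessage buffers does not appear in this model at all; only the number of threads actually processing messages and replying changes how long a sender stays blocked. That matches what you observed when changing requestCount, at least under the assumptions stated above.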