REFERENCE_BY_POINTER BSOD Windows 8

Hi all,

We have a mini-filter that sends certain filesystem events to a user-mode component using FltSendMessage. We have been seeing a hard-to-reproduce REFERENCE_BY_POINTER BSOD on Windows 8. The crash dump points to FltSendMessage, specifically to the code that references the object using the ClientPort* handle.

a) The crash** occurs when stopping the user-mode component, which is a service (net stop service_name).
b) In the port disconnect callback (registered via FltCreateCommunicationPort), our driver closes the ClientPort using FltCloseClientPort.
c) The driver code that calls FltSendMessage does NOT synchronize access to ClientPort (i.e. no exclusive access).
d) When I look at disassembly of FltSendMessage (of FltMgr.sys), I see this:
KeEnterCriticalRegion

ObReferenceObjectByHandle(ClientPort)

KeEnterCriticalRegion should raise the IRQL of the executing thread to APC_LEVEL. So my question is: since the port connect/disconnect callbacks are invoked at PASSIVE_LEVEL, is it possible for the FltCloseClientPort call in the disconnect callback to race with FltSendMessage?

* ClientPort -> is the PFLT_PORT parameter received in the port connect callback
** This issue was never seen in Win 7/Vista but happened once in Windows 8

> KeEnterCriticalRegion should raise the IRQL of the executing thread to APC_LEVEL

No, IIRC it does not; it only updates the critical region count.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

Maxim, thanks for your reply, and apologies for asking the question without researching properly first. You were correct: it just decrements the KernelApcDisable count.

Just to clarify, for any future reference:
KeEnterCriticalRegion just decrements the value of _KTHREAD.KernelApcDisable. KeLeaveCriticalRegion increments the value of _KTHREAD.KernelApcDisable. If there are APCs waiting to be delivered and KernelApcDisable == 0 and SpecialApcDisable == 0, the APCs are in fact delivered. The IRQL is not changed at any point.

So in effect this works as mentioned in the MS documentation,
“The KeEnterCriticalRegion routine temporarily disables the delivery of normal kernel APCs; special kernel-mode APCs are still delivered.”
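
For anyone who finds the counter description easier to follow as code, here is a minimal user-mode model in C. ModelThread and the model_* functions are made-up names, not kernel APIs, and the delivery logic only mirrors what I described above; it is a sketch, not the actual kernel implementation.

```c
#include <assert.h>

/* Hypothetical user-mode model of the per-thread state discussed above. */
typedef struct ModelThread {
    int kernel_apc_disable;  /* models _KTHREAD.KernelApcDisable (0 or negative) */
    int special_apc_disable; /* models _KTHREAD.SpecialApcDisable */
    int irql;                /* 0 = PASSIVE_LEVEL, 1 = APC_LEVEL; never touched below */
    int pending_apcs;        /* normal kernel APCs queued but not yet delivered */
    int delivered_apcs;      /* count of APCs the model has "delivered" */
} ModelThread;

/* Models KeEnterCriticalRegion: only the counter changes, not the IRQL. */
void model_enter_critical_region(ModelThread *t)
{
    t->kernel_apc_disable -= 1;
}

/* Models KeLeaveCriticalRegion: bump the counter back and deliver any
 * pending APCs once both disable counts are zero. */
void model_leave_critical_region(ModelThread *t)
{
    assert(t->kernel_apc_disable < 0); /* must be paired with an enter */
    t->kernel_apc_disable += 1;
    if (t->pending_apcs > 0 &&
        t->kernel_apc_disable == 0 &&
        t->special_apc_disable == 0) {
        t->delivered_apcs += t->pending_apcs;
        t->pending_apcs = 0;
    }
}
```

With nested enter calls, the model delivers nothing until the outermost leave brings the counter back to zero, and the irql field stays at 0 throughout.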

So, going back to the original question about the BSOD: this looks like a bug in our implementation, as we had stored the ClientPort in two places and closed one copy while sending through the other:

Thread I:
FltCloseClientPort(global.Filter, &global.ClientPort);

Thread II:
FltSendMessage(global.Filter, &sendMessage.ClientPort, …);

FltSendMessage and FltCloseClientPort synchronize internally using _FLT_FILTER.PortLock. FltCloseClientPort sets the variable pointed to by its second parameter to NULL after acquiring the lock and then invokes ZwClose(ClientPort). This ensures that even if FltSendMessage is invoked simultaneously with a pointer to that same variable, it sees NULL and does not invoke ObReferenceObjectByHandle on the closed handle. I guess this is what is meant in the MS documentation:
“To ensure that any messages sent by FltSendMessage are synchronized properly when the communication client port is being closed, FltCloseClientPort sets this variable to NULL.”

The fix should be simple: invoke FltSendMessage(global.Filter, &global.ClientPort, …) so that both calls go through the same variable. Please let me know if I have misunderstood anything, and many thanks for the help.
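
To make the difference between the two call patterns concrete, here is a small user-mode model in C. Everything in it (ModelFilter, ModelPort, the model_* functions, the SendResult values) is hypothetical and exists only for illustration; it follows my reading of the behaviour described above and is not the real FltMgr code. In particular, whether the close happens before or after the lock is released is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real FltMgr types. */
typedef struct ModelPort { bool open; } ModelPort;
typedef struct ModelFilter { int port_lock_held; } ModelFilter; /* models PortLock */

typedef enum {
    SEND_OK,
    SEND_PORT_DISCONNECTED, /* variable was NULL: safe failure */
    SEND_USED_CLOSED_PORT   /* stale copy used: this is the crash case */
} SendResult;

static void lock_port(ModelFilter *f)
{
    assert(f->port_lock_held == 0);
    f->port_lock_held = 1;
}

static void unlock_port(ModelFilter *f)
{
    assert(f->port_lock_held == 1);
    f->port_lock_held = 0;
}

/* Models FltCloseClientPort: NULL the caller's variable under the lock,
 * then close the port (the close stands in for ZwClose). */
void model_close_client_port(ModelFilter *f, ModelPort **client_port)
{
    lock_port(f);
    ModelPort *port = *client_port;
    *client_port = NULL;
    if (port != NULL) {
        port->open = false; /* assumption: closed while the lock is held */
    }
    unlock_port(f);
}

/* Models FltSendMessage: read the caller's variable under the same lock. */
SendResult model_send_message(ModelFilter *f, ModelPort **client_port)
{
    SendResult result;
    lock_port(f);
    if (*client_port == NULL) {
        result = SEND_PORT_DISCONNECTED;
    } else if (!(*client_port)->open) {
        result = SEND_USED_CLOSED_PORT;
    } else {
        result = SEND_OK;
    }
    unlock_port(f);
    return result;
}
```

In this model, sending through the same variable that was passed to the close call fails cleanly after the close, while sending through a second copy of the pointer hits the closed port, which corresponds to our sendMessage.ClientPort case.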