@“Scott_Noone_(OSR)” said:
If you can’t share the dump and can’t run DebugView or ProcMon then you need to figure out a new plan of attack…Have you tried reproducing it locally? Also, are you using WPP tracing? If so you can extract your logs from the dump file. Otherwise you can try running !dbgprint on the crash dump, sometimes the last bit of the DbgPrint buffer is available. Maybe something in there useful for you.
Unfortunately it is not reproducible, it only happens on few particular machines of the customer, and even then it takes hours before it happens again, and there are no useful logs found in WPP tracing or !dbgprint.
But I am really sure that the root cause of this is related to that Resource @ 0xffffb408e35b8310 and those four threads that are using it. because:
Resource @ 0xffffb408e35b8310 Exclusively owned
Contention Count = 11
NumberOfSharedWaiters = 3
NumberOfExclusiveWaiters = 1
Threads: ffffb408e3bbc080-02<*> ffffb408de8d3080-01 ffffb408de8e4080-01 ffffb408e262b040-01<*>
Threads Waiting On Exclusive Access:
ffffb408e36d7800
- In the crash dump, the stack of one of the cores indeed ends up in thread ffffb408e3bbc080, which is a
chrome.exe
thread, and this is the one that is taking the most kernel time out of all the processes (by alot, KernelTime 00:03:28.390). its callstack ends with. :
nt!KxReleaseQueuedSpinLock+0x23
nt!ExpReleaseResourceForThreadLite+0x1f6
NTFS!NtfsReleaseFcb+0x4b
NTFS!NtfsCommonQueryInformation$fin$0+0x6c
nt!_C_specific_handler+0x18e
nt!RtlpExecuteHandlerForUnwind+0xd
nt!RtlUnwindEx+0x432
nt!_C_specific_handler+0xe2
nt!RtlpExecuteHandlerForException+0xd
nt!RtlDispatchException+0x421
nt!RtlRaiseStatus+0x4e
NTFS!NtfsRaiseStatusInternal+0x6c
NTFS!NtfsCommonQueryInformation+0xbfc
NTFS!NtfsFsdDispatchSwitch+0xcc
NTFS!NtfsFsdDispatchWait+0x40
FLTMGR!FltpQueryInformationFile+0x112
FLTMGR!FltpGetFileName+0x30c
FLTMGR!FltpGetOpenedFileName+0x19
FLTMGR!FltpCallOpenedFileNameHandler+0x2b
FLTMGR!FltpExpandFilePathWorker+0x445
FLTMGR!FltpExpandFilePath+0x1a
FLTMGR!FltpGetNormalizedFileNameWorker+0x112
FLTMGR!FltpGetNormalizedFileName+0x1a
FLTMGR!FltpCreateFileNameInformation+0x32c
FLTMGR!HandleStreamListNotSupported+0x115
FLTMGR!FltpGetFileNameInformation+0x5e7
FLTMGR!FltGetFileNameInformation+0x1b0
luafv!LuafvGenerateFileName+0x4c
FLTMGR!FltpCallOpenedFileNameHandler+0x89
FLTMGR!FltpGetNormalizedFileNameWorker+0x2f
FLTMGR!FltpGetNormalizedFileName+0x1a
FLTMGR!FltpCreateFileNameInformation+0x32c
FLTMGR!HandleStreamListNotSupported+0x113
FLTMGR!FltpGetFileNameInformation+0x5e7
FLTMGR!FltGetFileNameInformation+0x1b0
wcifs!WcGenerateFileName+0x44
FLTMGR!FltpCallOpenedFileNameHandler+0x89
FLTMGR!FltpGetNormalizedFileNameWorker+0x2f
FLTMGR!FltpGetNormalizedFileName+0x1a
FLTMGR!FltpCreateFileNameInformation+0x32c
FLTMGR!HandleStreamListNotSupported+0x115
FLTMGR!FltpGetFileNameInformation+0x5e1
FLTMGR!**FltGetFileNameInformation**+0x1b0
klfle+0x610f0 ----> kaspersky
klfle+0x57950 ----> kaspersky
FLTMGR!FltpPerformPreCallbacks+0x2ea
FLTMGR!FltpPassThroughInternal+0x88
FLTMGR!FltpCreate+0x2e1
nt!IopParseDevice+0x168f
nt!ObpLookupObjectName+0x8af
nt!ObOpenObjectByNameEx+0x1dd
nt!IopCreateFile+0x860
nt!IoCreateFileEx+0x115
FLTMGR!FltpCreateFile+0x1cd
FLTMGR!FltCreateFile+0x8d
ourdriver+0x220a ----> ourdriver
And this is for a preclose happening on a backup file that we have created in our minifilter (so we have created this file previously, not in this callstack, using a FltCreateFile. Here however we are trying to get some info about the file on disk so we have to first use FltCreateFile to open the file, and we are using share delete/read/write). Our product is a backup system that backs up some files on some particular folders.
(This is the only meaningful core call stack in the crash dump, and also note that in this time, the system was frozen, so it has to be related to this)
- Thread ffffb408e36d7800 is also a chrome.exe thread :
nt!KiSwapContext+0x76
nt!KiSwapThread+0x17d
nt!KiCommitThreadWait+0x14f
nt!ExpWaitForResource+0x2a8
nt!ExAcquireResourceExclusiveLite+0x27e
NTFS!NtfsAcquireExclusiveFcb+0x78
NTFS!NtfsAcquireFcbWithPaging+0xbd
NTFS!NtfsFindPrefixHashEntry+0xa6a
NTFS!NtfsFindStartingNode+0xa88
NTFS!NtfsCommonCreate+0x43e
NTFS!NtfsCommonCreateCallout+0x1d
nt!KxSwitchKernelStackCallout+0x27
nt!KiSwitchKernelStackContinue
nt!KiExpandKernelStackAndCalloutOnStackSegment+0x241
nt!KiExpandKernelStackAndCalloutSwitchStack+0xa6
nt!KeExpandKernelStackAndCalloutInternal+0x2f
NTFS!NtfsFsdCreate+0x23a
FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x1a6
FLTMGR!FltpCreate+0x2f5
nt!IopParseDevice+0x168f
nt!ObpLookupObjectName+0x8af
nt!ObOpenObjectByNameEx+0x1dd
nt!IopCreateFile+0x860
nt!IoCreateFileEx+0x115
FLTMGR!FltpNormalizeNameFromCache+0x1a6
FLTMGR!FltpExpandShortNames+0xc082
FLTMGR!FltpGetNormalizedDestinationFileName+0x68
FLTMGR!FltGetDestinationFileNameInformation+0x191
klfle+0x207e9 ---> kaspersky
And inspecting this one, shows that this is also related to another file in the CacheStorage of chrome.
- Both the ffffb408de8d3080 and ffffb408de8e4080 threads are related to kaspersky (avf.exe), and both of them have the following call stack:
nt!KiSwapContext+0x76
nt!KiSwapThread+0x17d
nt!KiCommitThreadWait+0x14f
nt!ExpWaitForResource+0x2a8
nt!ExAcquireResourceSharedLite+0x365
NTFS!NtfsAcquireFcbWithPaging+0x11e
NTFS!NtfsFindPrefixHashEntry+0xa6a
NTFS!NtfsFindStartingNode+0xa88
NTFS!NtfsCommonCreate+0x43e
NTFS!NtfsCommonCreateCallout+0x1d
nt!KxSwitchKernelStackCallout+0x27
nt!KiSwitchKernelStackContinue
nt!KiExpandKernelStackAndCalloutOnStackSegment+0x241
nt!KiExpandKernelStackAndCalloutSwitchStack+0xa6
nt!KeExpandKernelStackAndCalloutInternal+0x2f
NTFS!NtfsFsdCreate+0x23a
FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x1a6
FLTMGR!FltpCreate+0x2f5
Sadly, I couldn’t retrieve any file name from these threads :(, but I bet they are related to some chrome caching file too.
- Finally, the thread ffffb408e262b040’s call stack is the following, which seems not that important, and it belongs to System process:
nt!KiSwapContext+0x76
nt!KiSwapThread+0x17d
nt!KiCommitThreadWait+0x14f
nt!ExpWaitForResource+0x2a8
nt!ExAcquireResourceSharedLite+0x365
NTFS!NtfsAcquireScbForLazyWrite+0x1ed
nt!CcWriteBehindInternal+0xa5
nt!ExpWorkerThread+0xe9
nt!PspSystemThreadStartup+0x41
nt!KiStartSystemThread+0x16
And the following is the content of that resource :
dt _ERESOURCE 0xffffb408e35b8310 -r1
nt!_ERESOURCE
+0x000 SystemResourcesList : _LIST_ENTRY [ 0xffffb408`e33caa20 - 0xffffb408`e3ac1a70 ]
+0x000 Flink : 0xffffb408`e33caa20 _LIST_ENTRY [ 0xffffb408`e2fff580 - 0xffffb408`e35b8310 ]
+0x008 Blink : 0xffffb408`e3ac1a70 _LIST_ENTRY [ 0xffffb408`e35b8310 - 0xffffb408`e3b29330 ]
+0x010 OwnerTable : 0xffffb408`e3b2a1a0 _OWNER_ENTRY
+0x000 OwnerThread : 0
+0x008 IoPriorityBoosted : 0y1
+0x008 OwnerReferenced : 0y1
+0x008 IoQoSPriorityBoosted : 0y1
+0x008 OwnerCount : 0y00000000000000000000000000000 (0)
+0x008 TableSize : 7
+0x018 ActiveCount : 0n1
+0x01a Flag : 0xf86
+0x01a ReservedLowFlags : 0x86 ''
+0x01b WaiterPriority : 0xf ''
+0x020 SharedWaiters : _KWAIT_CHAIN
+0x000 Head : _SINGLE_LIST_ENTRY
+0x028 ExclusiveWaiters : 0xffffb408`dce876d0 _KEVENT
+0x000 Header : _DISPATCHER_HEADER
+0x030 OwnerEntry : _OWNER_ENTRY
+0x000 OwnerThread : 0xffffb408`e3bbc080
+0x008 IoPriorityBoosted : 0y0
+0x008 OwnerReferenced : 0y0
+0x008 IoQoSPriorityBoosted : 0y1
+0x008 OwnerCount : 0y00000000000000000000000000010 (0x2)
+0x008 TableSize : 0x14
+0x040 ActiveEntries : 1
+0x044 ContentionCount : 0xb
+0x048 NumberOfSharedWaiters : 3
+0x04c NumberOfExclusiveWaiters : 1
+0x050 Reserved2 : (null)
+0x058 Address : (null)
+0x058 CreatorBackTraceIndex : 0
+0x060 SpinLock : 0xffffdc00`7b721c20
But as I am no an expert in the NTFS and file system in kernel, I can’t put all these things together to understand what is causing this system freeze… !locks output doesn’t seem to indicate a deadlock, but it sure as hell seems like it…
How can I find more info about that resource @ 0xffffb408e35b8310? Anyone has any idea what else can search for in my crash dump to find out the root cause of this?
As a side note, lowering our minifilter altitude and making ourself go below kaspersky’s minifilter solved the issue, but this is not a optimal long term solution, I need to find the reason that this is happening…