Windows System Software -- Consulting, Training, Development -- Unique Expertise, Guaranteed Results
The free OSR Learning Library has more than 50 articles on a wide variety of topics about writing and debugging device drivers and Minifilters. From introductory level to advanced. All the articles have been recently reviewed and updated, and are written using the clear and definitive style you've come to expect from OSR over the years.
Check out The OSR Learning Library at: https://www.osr.com/osr-learning-library/
Hi,
We have a NDIS LWF driver, and on a single machine we got a DPC_WATCHDOG_VIOLATION 133/1 bugcheck. This could be related to our NdisFIndicateReceiveNetBufferLists, as the IRQL is raised to DISPATCH before calling it (and obviously lowered to whatever it was afterward). This is done due to a workaround for some bug in a Microsoft driver in older windows 10 versions.
Now this is the bugcheck:
******************************************************************************* * * * Bugcheck Analysis * * * ******************************************************************************* DPC_WATCHDOG_VIOLATION (133) The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL or above. Arguments: Arg1: 0000000000000001, The system cumulatively spent an extended period of time at DISPATCH_LEVEL or above. The offending component can usually be identified with a stack trace. Arg2: 0000000000001e00, The watchdog period. Arg3: fffff807422fb320, cast to nt!DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK, which contains additional information regarding the cumulative timeout Arg4: 0000000000000000
STACK_TEXT: nt!KeBugCheckEx nt!KeAccumulateTicks+0x1846b2 nt!KiUpdateRunTime+0x5d nt!KiUpdateTime+0x4a1 nt!KeClockInterruptNotify+0x2e3 nt!HalpTimerClockInterrupt+0xe2 nt!KiCallInterruptServiceRoutine+0xa5 nt!KiInterruptSubDispatchNoLockNoEtw+0xfa nt!KiInterruptDispatchNoLockNoEtw+0x37 nt!KxWaitForSpinLockAndAcquire+0x2c nt!KeAcquireSpinLockAtDpcLevel+0x5c wanarp!WanNdisReceivePackets+0x4bb ndis!ndisMIndicateNetBufferListsToOpen+0x141 ndis!ndisMTopReceiveNetBufferLists+0x3f0e4 ndis!ndisCallReceiveHandler+0x61 ndis!ndisInvokeNextReceiveHandler+0x1df ndis!NdisMIndicateReceiveNetBufferLists+0x104 ndiswan!IndicateRecvPacket+0x596 ndiswan!ApplyQoSAndIndicateRecvPacket+0x20b ndiswan!ProcessPPPFrame+0x16f ndiswan!ReceivePPP+0xb3 ndiswan!ProtoCoReceiveNetBufferListChain+0x442 ndis!ndisMCoIndicateReceiveNetBufferListsToNetBufferLists+0xf6 ndis!NdisMCoIndicateReceiveNetBufferLists+0x11 raspptp!CallIndicateReceived+0x210 raspptp!CallProcessRxNBLs+0x199 ndis!ndisDispatchIoWorkItem+0x12 nt!IopProcessWorkItem+0x135 nt!ExpWorkerThread+0x105 nt!PspSystemThreadStartup+0x55 nt!KiStartSystemThread+0x28 SYMBOL_NAME: wanarp!WanNdisReceivePackets+4bb FOLLOWUP_NAME: MachineOwner MODULE_NAME: wanarp IMAGE_NAME: wanarp.sys
And this following is the output of !dpcwatchdog, but as I can't find what is causing this BSOD by looking at its output as I don't have much experience with a DPC_WATCHDOG_VIOLATION bugcheck. Altough I think this could be related to some spin locking done by wanarp? Could this be a bug with wanarp? Note that we don't use any spinlocking in our driver, and us raising the IRQL should not cause any issue, as it is actually very common for indication in Ndis to be done at IRQL DISPATCH.
So How can I find the root cause of this bugcheck? There are no other third party LWF in the ndis stack.
3: kd> !dpcwatchdog All durations are in seconds (1 System tick = 15.625000 milliseconds) Circular Kernel Context Logger history: !logdump 0x2 DPC and ISR stats: !intstats /d -------------------------------------------------- CPU#0 -------------------------------------------------- Current DPC: No Active DPC Pending DPCs: ---------------------------------------- CPU Type KDPC Function dpcs: no pending DPCs found -------------------------------------------------- CPU#1 -------------------------------------------------- Current DPC: No Active DPC Pending DPCs: ---------------------------------------- CPU Type KDPC Function 1: Normal : 0xfffff80542220e00 0xfffff805418dbf10 nt!PpmCheckPeriodicStart 1: Normal : 0xfffff80542231d40 0xfffff8054192c730 nt!KiBalanceSetManagerDeferredRoutine 1: Normal : 0xffffbd0146590868 0xfffff80541953200 nt!KiEntropyDpcRoutine DPC Watchdog Captures Analysis for CPU #1. DPC Watchdog capture size: 641 stacks. Number of unique stacks: 1. No common functions detected! The captured stacks seem to indicate that only a single DPC or generic function is the culprit. Try to analyse what other processors were doing at the time of the following reference capture: CPU #1 DPC Watchdog Reference Stack (#0 of 641) - Time: 16 Min 17 Sec 984.38 mSec # RetAddr Call Site 00 fffff805418d8991 nt!KiUpdateRunTime+0x5D 01 fffff805418d2803 nt!KiUpdateTime+0x4A1 02 fffff805418db1c2 nt!KeClockInterruptNotify+0x2E3 03 fffff80541808a45 nt!HalpTimerClockInterrupt+0xE2 04 fffff805419fab9a nt!KiCallInterruptServiceRoutine+0xA5 05 fffff805419fb107 nt!KiInterruptSubDispatchNoLockNoEtw+0xFA 06 fffff805418a9a9c nt!KiInterruptDispatchNoLockNoEtw+0x37 07 fffff805418da3cc nt!KxWaitForSpinLockAndAcquire+0x2C 08 fffff8054fa614cb nt!KeAcquireSpinLockAtDpcLevel+0x5C 09 fffff80546ba1eb1 wanarp!WanNdisReceivePackets+0x4BB 0a fffff80546be0b84 ndis!ndisMIndicateNetBufferListsToOpen+0x141 0b fffff80546ba7ef1 ndis!ndisMTopReceiveNetBufferLists+0x3F0E4 0c fffff80546bddfef ndis!ndisCallReceiveHandler+0x61 0d fffff80546ba4a94 ndis!ndisInvokeNextReceiveHandler+0x1DF 0e fffff8057c32d17e ndis!NdisMIndicateReceiveNetBufferLists+0x104 0f fffff8057c30d6c7 ndiswan!IndicateRecvPacket+0x596 10 fffff8057c32d56b ndiswan!ApplyQoSAndIndicateRecvPacket+0x20B 11 fffff8057c32d823 ndiswan!ProcessPPPFrame+0x16F 12 fffff8057c308e62 ndiswan!ReceivePPP+0xB3 13 fffff80546c5c006 ndiswan!ProtoCoReceiveNetBufferListChain+0x442 14 fffff80546c5c2d1 ndis!ndisMCoIndicateReceiveNetBufferListsToNetBufferLists+0xF6 15 fffff8057c2b0064 ndis!NdisMCoIndicateReceiveNetBufferLists+0x11 16 fffff8057c2b06a9 raspptp!CallIndicateReceived+0x210 17 fffff80546bd9dc2 raspptp!CallProcessRxNBLs+0x199 18 fffff80541899645 ndis!ndisDispatchIoWorkItem+0x12 19 fffff80541852b65 nt!IopProcessWorkItem+0x135 1a fffff80541871d25 nt!ExpWorkerThread+0x105 1b fffff80541a00778 nt!PspSystemThreadStartup+0x55 1c ---------------- nt!KiStartSystemThread+0x28 -------------------------------------------------- CPU#2 -------------------------------------------------- Current DPC: No Active DPC Pending DPCs: ---------------------------------------- CPU Type KDPC Function 2: Normal : 0xffffbd01467f0868 0xfffff80541953200 nt!KiEntropyDpcRoutine DPC Watchdog Captures Analysis for CPU #2. DPC Watchdog capture size: 641 stacks. Number of unique stacks: 1. No common functions detected! The captured stacks seem to indicate that only a single DPC or generic function is the culprit. Try to analyse what other processors were doing at the time of the following reference capture: CPU #2 DPC Watchdog Reference Stack (#0 of 641) - Time: 16 Min 17 Sec 984.38 mSec # RetAddr Call Site 00 fffff805418d245a nt!KeClockInterruptNotify+0x453 01 fffff80541808a45 nt!HalpTimerClockIpiRoutine+0x1A 02 fffff805419fab9a nt!KiCallInterruptServiceRoutine+0xA5 03 fffff805419fb107 nt!KiInterruptSubDispatchNoLockNoEtw+0xFA 04 fffff805418a9a9c nt!KiInterruptDispatchNoLockNoEtw+0x37 05 fffff805418a9a68 nt!KxWaitForSpinLockAndAcquire+0x2C 06 fffff8054fa611cb nt!KeAcquireSpinLockRaiseToDpc+0x88 07 fffff80546ba1eb1 wanarp!WanNdisReceivePackets+0x1BB 08 fffff80546be0b84 ndis!ndisMIndicateNetBufferListsToOpen+0x141 09 fffff80546ba7ef1 ndis!ndisMTopReceiveNetBufferLists+0x3F0E4 0a fffff80546bddfef ndis!ndisCallReceiveHandler+0x61 nra 0b fffff80546be3a81 ndis!ndisInvokeNextReceiveHandler+0x1DF 0c fffff80546ba804e ndis!ndisFilterIndicateReceiveNetBufferLists+0x3C611 0d fffff8054e384d77 ndis!NdisFIndicateReceiveNetBufferLists+0x6E 0e fffff8054e3811a9 ourdriver+0x4D70 0f fffff80546ba7d40 ourdriver+0x11A0 10 fffff8054182a6b5 ndis!ndisDummyIrpHandler+0x100 11 fffff80541c164c8 nt!IofCallDriver+0x55 12 fffff80541c162c7 nt!IopSynchronousServiceTail+0x1A8 13 fffff80541c15646 nt!IopXxxControlFile+0xC67 14 fffff80541a0aab5 nt!NtDeviceIoControlFile+0x56 15 ---------------- nt!KiSystemServiceCopyEnd+0x25 -------------------------------------------------- CPU#3 -------------------------------------------------- Current DPC: No Active DPC Pending DPCs: ---------------------------------------- CPU Type KDPC Function dpcs: no pending DPCs found
Upcoming OSR Seminars | ||
---|---|---|
OSR has suspended in-person seminars due to the Covid-19 outbreak. But, don't miss your training! Attend via the internet instead! | ||
Kernel Debugging | 30 January 2023 | Live, Online |
Developing Minifilters | 20 March 2023 | Live, Online |
Internals & Software Drivers | 17 April 2023 | Live, Online |
Writing WDF Drivers | 22 May 2023 | Live, Online |
Comments
768us is a very long time. What are you doing at dispatch for that long?
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
Where was the 768us? I couldn't find any 768 in the output?
But overall, the only thing we are doing at dispatch is calling NdisFIndicateReceiveNetBufferLists with a list of NET_BUFFER_LIST, which is one of the most common APIs used by any NDIS LWF, and very very common for NdisFIndicateReceiveNetBufferLists to be called at dispatch.
We are raising the IRQL before NdisFIndicateReceiveNetBufferLists, because if we didn't, there was a bug in wanarp driver of Microsoft (which was recently patched in the latest updates, but the old ones are still out there) and under rare situations it would BSOD if it was called at PASSIVE. So Basically we raise IRQL to dispatch, call NdisFIndicateReceiveNetBufferLists, and then lower it to whatever it was. And we also pass the NDIS_RECEIVE_FLAGS_DISPATCH_LEVEL to the API by checking if the current IRQL is dispatch.
So how can I find what is causing this BSOD and what is taking so long? And how is "Time: 16 Min 17 Sec 984.38 mSec" calculated?
This is also the output of !locks and !qlocks and !ready, I don't see any deadlock, but the contention count on one of the resources is 49135, is this normal or too high? Could this be related to our issue? The threads that are waiting on it or own it are for normal processes such as chrome, csrss, etc.
From here:
0x1e00 100ns units is 768us.
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
So how can I find which function is consuming the most time in DISPATCH level and find the root of this bugcheck? And considering that there are two potentially faulty thread stacks in the output of !dpcwatchdog, which of them is consuming the most time?
And where does the "16 Min 17 Sec 984.38 mSec" come from?