CLOCK_WATCHDOG_TIMEOUT processing

Hi,

I encountered a BSOD with the error message “CLOCK_WATCHDOG_TIMEOUT 0x00000101”. The MSDN documentation says that "an expected clock interrupt on a secondary processor, in a multi-processor system, was not received within the allocated interval". The system is installed in a virtual machine, so I would like to disable the watchdog by adding some parameter at boot.

My questions:

  1. Is there any way to modify the “Clock interrupt time-out interval” mentioned in the MSDN document? https://msdn.microsoft.com/en-us/library/ff557211(v=vs.85).aspx
  2. Where can I disable the watchdog inside Windows? I suspect there might be a registry setting for tuning the CPU.

At a minimum you need to share some bugcheck analysis (at least !analyze -v with the proper symbols in place) and a brief description of the environment (e.g. additional hardware or software) in which the issue occurs.

0: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

CLOCK_WATCHDOG_TIMEOUT (101)
An expected clock interrupt was not received on a secondary processor in an
MP system within the allocated interval. This indicates that the specified
processor is hung and not processing interrupts.
Arguments:
Arg1: 0000000000000009, Clock interrupt time out interval in nominal clock ticks.
Arg2: 0000000000000000, 0.
Arg3: fffff880009b9180, The PRCB address of the hung processor.
Arg4: 0000000000000001, 0.

Debugging Details:

BUGCHECK_STR: CLOCK_WATCHDOG_TIMEOUT_16_PROC

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: DRIVER_FAULT_SERVER_MINIDUMP

PROCESS_NAME: System

CURRENT_IRQL: d

STACK_TEXT:
fffff880009a9628 fffff8000172c7fa : 0000000000000101 0000000000000009 0000000000000000 fffff880009b9180 : nt!KeBugCheckEx
fffff880009a9630 fffff800016df077 : 0000000000000000 fffff80000000001 000000000001312d 000000000000000c : nt! ?? ::FNODOBFM::`string'+0x4e1e
fffff880009a96c0 fffff8000162b1c0 : 0000000000000000 fffff880009a9870 fffff800016473c0 fffff80000000000 : nt!KeUpdateSystemTime+0x377
fffff880009a97c0 fffff800016d0e13 : 000000000655e3a8 fffff800016473c0 0000000000000000 fffff800016f4474 : hal!HalpRtcClockInterrupt+0x130
fffff880009a97f0 fffff8000170ae63 : fffff80001853e80 0000000000000001 0000000000000000 0000000000000000 : nt!KiInterruptDispatchNoLock+0x163
fffff880009a9980 fffff800016da39c : 0000000000000000 fffff880009a9ab8 0000000000000000 0000000000000000 : nt!KxFlushEntireTb+0x93
fffff880009a99c0 fffff800016975d9 : 000000000000003f 000000000000003f fffffa8022eac7d0 0000000000000040 : nt!KeFlushMultipleRangeTb+0x28c
fffff880009a9a90 fffff80001697e27 : 0000000000ba3900 000000000000003f 0000000000000000 0000000000000000 : nt!MiZeroPageChain+0x14e
fffff880009a9ad0 fffff8000196e456 : fffffa803b2dfb50 0000000000000080 fffffa803b2df040 cf8b4838558d48f8 : nt!MmZeroPageThread+0x83a
fffff880009a9c00 fffff800016c62c6 : fffff80001853e80 fffffa803b2dfb50 fffff80001861cc0 9090909090909090 : nt!PspSystemThreadStartup+0x5a
fffff880009a9c40 0000000000000000 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x16

STACK_COMMAND: kb

SYMBOL_NAME: ANALYSIS_INCONCLUSIVE

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: Unknown_Module

IMAGE_NAME: Unknown_Image

DEBUG_FLR_IMAGE_TIMESTAMP: 0

FAILURE_BUCKET_ID: X64_CLOCK_WATCHDOG_TIMEOUT_16_PROC_ANALYSIS_INCONCLUSIVE

BUCKET_ID: X64_CLOCK_WATCHDOG_TIMEOUT_16_PROC_ANALYSIS_INCONCLUSIVE

Followup: MachineOwner

0: kd> .lastevent
Last event: Break instruction exception - code 80000003 (first/second chance not available)
debugger time: Wed Feb 22 19:13:39.017 2017 (UTC + 8:00)
0: kd> !error
Error code: (NTSTATUS) 0 (0) - STATUS_WAIT_0

xxxxx@hotmail.com wrote:

0: kd> !analyze -v

CLOCK_WATCHDOG_TIMEOUT (101)
An expected clock interrupt was not received on a secondary processor in an
MP system within the allocated interval. This indicates that the specified
processor is hung and not processing interrupts.
Arguments:
Arg1: 0000000000000009, Clock interrupt time out interval in nominal clock ticks.
Arg2: 0000000000000000, 0.
Arg3: fffff880009b9180, The PRCB address of the hung processor.
Arg4: 0000000000000001, 0.

Do you have a driver and custom hardware in this system? This error
often means that your ISR is using too much time. Time in an ISR is
strictly limited, in order to keep system performance acceptable. Any
non-trivial processing must be deferred to your DPC.

  1. Is there any way to modify the “Clock interrupt time-out interval” mentioned in the MSDN document?
  2. Where can I disable the watchdog inside Windows? I suspect there might be a registry setting for tuning the CPU.

You are thinking about this in the wrong way. This is not an operating
system annoyance that you need to suppress. This is a BUG in your
driver that you need to fix.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

@Tim Roberts
Thanks Tim,

Actually, I don’t have any custom drivers installed in this system. This is a virtual machine: a clean ISO from MSFT was installed in the VM, and then some applications (no drivers) were installed. After several hours, the BSOD happened. However, in the same environment (same host machine), other VMs did not crash. That is why I am trying to suppress the watchdog for debugging.

Is there any passthrough device connected to this particular VM?
What is the OS version on the guest, and what is the host running?
I vaguely remember having a similar issue a few years ago with an ESXi VM running Windows Server 2008 R2 SP1. There was an MSFT hotfix, but I am not able to locate the exact KB article.

Check if these help:
https://blogs.msdn.microsoft.com/virtual_pc_guy/2009/10/16/hyper-v-hotfix-for-0x00000101-clock_watchdog_timeout-on-nehalem-systems/

https://communities.vmware.com/thread/527478?start=0&tstart=0

Arg3 is the PRCB of the processor that is stuck. Try doing a !running -ti
and see if you get a call stack for that processor.

-scott
OSR
@OSRDrivers
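
For readers following the suggestion above, here is a minimal sketch of pulling the bugcheck arguments out of pasted `!analyze -v` text, so that the timeout interval (Arg1) and the PRCB address (Arg3) can be read off without counting hex digits. It only parses text; it does not talk to the debugger. The helper name `parse_bugcheck_args` and the regular expression are illustrative assumptions, not part of any debugger tooling, and the sample is copied from the output earlier in this thread.

```python
import re

# Matches lines such as:
#   Arg3: fffff880009b9180, The PRCB address of the hung processor.
ARG_LINE = re.compile(r"^Arg([1-4]):\s*([0-9a-fA-F`]+),\s*(.*)$", re.MULTILINE)


def parse_bugcheck_args(analyze_text):
    """Return {arg_number: (value, description)} from !analyze -v text.

    Hypothetical helper for illustration only.
    """
    args = {}
    for number, hex_value, description in ARG_LINE.findall(analyze_text):
        # WinDbg can print 64-bit values with a backtick separator; drop it.
        value = int(hex_value.replace("`", ""), 16)
        args[int(number)] = (value, description.strip())
    return args


sample = """\
CLOCK_WATCHDOG_TIMEOUT (101)
Arguments:
Arg1: 0000000000000009, Clock interrupt time out interval in nominal clock ticks.
Arg2: 0000000000000000, 0.
Arg3: fffff880009b9180, The PRCB address of the hung processor.
Arg4: 0000000000000001, 0.
"""

args = parse_bugcheck_args(sample)
timeout_ticks = args[1][0]
prcb_address = args[3][0]
print(f"timeout interval: {timeout_ticks} nominal clock ticks")
print(f"PRCB of hung processor: {prcb_address:#x}")
```

The PRCB value printed here is the same address that appears as Arg3 in the dump, which is the processor whose call stack `!running -ti` is expected to show.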

Hi,

I am also seeing a similar crash with the following stack trace. The crash ends up freezing the host most of the time, and a crash dump is rarely generated.

FILE_IN_CAB: MEMORY.DMP

BUGCHECK_CODE: 101

BUGCHECK_P1: 18

BUGCHECK_P2: 0

BUGCHECK_P3: ffff8901edd4c180

BUGCHECK_P4: 4

FAULTING_PROCESSOR: 4

PROCESS_NAME: System

FAULTING_THREAD: ffffa18ab6cec780

LOCK_ADDRESS: fffff803739b68a0 -- (!locks fffff803739b68a0)

Resource @ nt!PiEngineLock (0xfffff803739b68a0) Exclusively owned
Contention Count = 11
Threads: ffffa18ab6cec780-01<*>
1 total locks

PNP_TRIAGE_DATA:
Lock address : 0xfffff803739b68a0
Thread Count : 1
Thread address: 0xffffa18ab6cec780
Thread wait : 0xf9a1

STACK_TEXT:
ffff8901f088e7c8 fffff8037362187f : 0000000000000040 fffff8037388a050 ffffe4929f5331ee 0000000000000016 : hal!HalpPciReadMmConfigUlong+0x7
ffff8901f088e7d0 fffff8037362173a : 0000000000000000 ffff8901f088e940 0000000000000040 0000000000000000 : hal!HalpPCIPerformConfigAccess+0x5b
ffff8901f088e800 fffff80373621649 : 0000000000000000 0000000000000000 4010054600000000 0000000000000000 : hal!HalpPciAccessMmConfigSpace+0x82
ffff8901f088e840 fffff8037362bc9e : 0000000000000040 0000000000000000 ffffa18a00000000 0000000000000017 : hal!HalpPCIConfig+0xc1
ffff8901f088e8c0 fffff8037362bb9d : 0000000000000000 000000000000000f ffffa18a00000000 0000000000000040 : hal!HalpReadPCIConfig+0x56
ffff8901f088e910 fffff80d4422d0b0 : ffffa18a00000200 0000000000000048 ffffa18abaa87270 fffff80d00000000 : hal!HalpGetPCIData+0x6d
ffff8901f088e9e0 fffff80d4421da53 : 0000200000000200 0000000000000060 ffffa18ab4fb31c0 0000000000000000 : ACPI!PciConfigSpaceHandlerWorker+0x1a0
ffff8901f088eae0 fffff80d4421e00c : ffffa18ab4fb31c0 ffffa18ab4368a80 0000000000000000 ffffa18abb189920 : ACPI!PciConfigInternal+0x8f
ffff8901f088eb10 fffff80d4421dcdd : ffff8901f088ec28 ffff8901f088ebf0 ffffa18ab445efc8 ffff890100000000 : ACPI!IsPciBusAsyncWorker+0x30c
ffff8901f088eb90 fffff80d44296fe8 : ffffa18ab445efc8 fffff80d4423c300 0000000000000000 0000000000000000 : ACPI!IsPciBusAsync+0xb5
ffff8901f088ebc0 fffff80d44296f51 : ffffa18ab445efc8 ffffa18ab445ee00 fffff80d4423c300 0000000000000000 : ACPI!IsNsobjPciBus+0x78
ffff8901f088ec20 fffff80d4421d37f : ffffa18ab4292ab0 ffffa18ab4292a01 fffff80d4423c340 ffffa18abb189920 : ACPI!EnableDisableRegions+0xe5
ffff8901f088eca0 fffff80d442908dd : ffffa18ab3a92180 ffffa18ab5459410 0000000000000000 000000000000000f : ACPI!ACPIDetectFilterDevices+0x25f
ffff8901f088ed40 fffff80d44214363 : ffffa18ab42927c0 ffffa18ab5459410 ffffa18abaf1d628 0000000000000000 : ACPI!ACPIFilterIrpQueryDeviceRelations+0x20d
ffff8901f088edf0 fffff80d446bfd7d : 0000000000000007 ffffa18abaf1d510 0000000000000000 ffffa18ab5459418 : ACPI!ACPIDispatchIrp+0x223
ffff8901f088ee70 fffff80d446c7555 : 0000000000000000 0000000000000000 ffffa18ab5459418 ffffa18a00000000 : pci!PciCallDownIrpStack+0x7d
ffff8901f088eed0 fffff80d4469633d : ffffa18abaf1d510 ffff890100000000 0000000000000000 fffff80373765373 : pci!PciBus_QueryDeviceRelations+0x235
ffff8901f088ef50 fffff80373b2b6ed : ffffa18abaf1d510 ffff8901f088f090 ffffa18ab4572b10 0000000000000000 : pci!PciDispatchPnpPower+0xcd
ffff8901f088efb0 fffff803737618a6 : ffffa18ab4550060 0000000000000001 ffffa18ab92a0e50 ffff8901edeaa000 : nt!PnpAsynchronousCall+0xe5
ffff8901f088eff0 fffff80373b2b5ed : ffffa18ab454dd30 0000000000000000 ffffa18ab4550060 0000000000000000 : nt!PnpSendIrp+0x92
ffff8901f088f060 fffff80373b2b1e4 : ffffa18ab92a0e50 ffffa18ab454dd58 ffffa18ab454dd30 0000000000000000 : nt!PnpQueryDeviceRelations+0x51
ffff8901f088f0f0 fffff80373b4bde7 : ffffa18ab454dd30 ffff8901f088f220 0000000000000002 fffff80300000000 : nt!PipEnumerateDevice+0xc8
ffff8901f088f120 fffff80373b28d1e : ffffa18ab97bf1a0 ffff8901ef7a1078 0000000000000000 0000000000000000 : nt!PipProcessDevNodeTree+0x19f
ffff8901f088f3a0 fffff8037379a93a : ffffa10100000003 0000000000000000 0000000000000000 0000000000000006 : nt!PiProcessReenumeration+0xa6
ffff8901f088f3f0 fffff803736d8909 : ffffa18ab6cec780 fffff803739b5340 fffff80373a572c0 fffff803739c06f0 : nt!PnpDeviceActionWorker+0x166
ffff8901f088f4c0 fffff803737a47fd : ffffa18ab6cec780 0000000000000080 ffffa18ab3aaa040 ffffa18ab6cec780 : nt!ExpWorkerThread+0xe9
ffff8901f088f550 fffff803737fee96 : ffff8901edf00180 ffffa18ab6cec780 fffff803737a47bc 0000000000000246 : nt!PspSystemThreadStartup+0x41
ffff8901f088f5a0 0000000000000000 : ffff8901f0890000 ffff8901f0889000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x16

SYMBOL_NAME: pci!PciCallDownIrpStack+7d

MODULE_NAME: pci

IMAGE_NAME: pci.sys

IMAGE_VERSION: 10.0.14393.4530

STACK_COMMAND: .process /r /p 0xffffa18ab3aaa040; .thread 0xffffa18ab6cec780 ; kb

BUCKET_ID_FUNC_OFFSET: 7d

FAILURE_BUCKET_ID: CLOCK_WATCHDOG_TIMEOUT_pci!PciCallDownIrpStack
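
A long STACK_TEXT like the one above is easier to read once it is reduced to the sequence of modules it passes through. The following is a rough sketch that does this for pasted text; the helper names `frames` and `module_path` are made up for illustration, and the sample lines are a subset of the frames in the trace above. Frames without a plain `module!Function` symbol are simply skipped.

```python
import re
from itertools import groupby

# A frame line looks like:
#   <child-sp> <return-addr> : <a1> <a2> <a3> <a4> : module!Function+0xNN
FRAME = re.compile(r":\s*([A-Za-z0-9_]+)!(\S+?)(?:\+0x[0-9a-fA-F]+)?\s*$")


def frames(stack_text):
    """Yield (module, function) for each recognizable STACK_TEXT line."""
    for line in stack_text.splitlines():
        match = FRAME.search(line)
        if match:
            yield match.group(1), match.group(2)


def module_path(stack_text):
    """Collapse consecutive frames from the same module, innermost first."""
    modules = (module for module, _ in frames(stack_text))
    return [module for module, _ in groupby(modules)]


sample_stack = """\
ffff8901f088e7c8 fffff8037362187f : 0000000000000040 fffff8037388a050 ffffe4929f5331ee 0000000000000016 : hal!HalpPciReadMmConfigUlong+0x7
ffff8901f088e910 fffff80d4422d0b0 : ffffa18a00000200 0000000000000048 ffffa18abaa87270 fffff80d00000000 : hal!HalpGetPCIData+0x6d
ffff8901f088e9e0 fffff80d4421da53 : 0000200000000200 0000000000000060 ffffa18ab4fb31c0 0000000000000000 : ACPI!PciConfigSpaceHandlerWorker+0x1a0
ffff8901f088edf0 fffff80d446bfd7d : 0000000000000007 ffffa18abaf1d510 0000000000000000 ffffa18ab5459418 : ACPI!ACPIDispatchIrp+0x223
ffff8901f088ee70 fffff80d446c7555 : 0000000000000000 0000000000000000 ffffa18ab5459418 ffffa18a00000000 : pci!PciCallDownIrpStack+0x7d
ffff8901f088ef50 fffff80373b2b6ed : ffffa18abaf1d510 ffff8901f088f090 ffffa18ab4572b10 0000000000000000 : pci!PciDispatchPnpPower+0xcd
ffff8901f088efb0 fffff803737618a6 : ffffa18ab4550060 0000000000000001 ffffa18ab92a0e50 ffff8901edeaa000 : nt!PnpAsynchronousCall+0xe5
ffff8901f088f5a0 0000000000000000 : ffff8901f0890000 ffff8901f0889000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x16
"""

print(" <- ".join(module_path(sample_stack)))  # prints "hal <- ACPI <- pci <- nt"
```

Read innermost first, the summary shows the thread sitting in a HAL PCI configuration-space read that was reached from ACPI, which was in turn called by pci.sys during PnP enumeration in the kernel.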

Is it possible that your debugger is preventing normal CPU execution?

Mr. Bond’s question is a good one, especially if you are using a VM. If that’s not the case, then it looks to me like you have a defective PCI device. Do you have a custom PCI device in this system?

In my case, it’s a custom PCI device with a Xilinx FPGA. It works normally on Linux and on most Windows-based hosts, though. The crash happens only on Dell workstations.

In your case, I assume there is no VM involved. If that’s the case, then follow Tim’s advice and check the code in any ISRs that you have.

The host crashes while installing the driver. At this point the device isn’t generating any interrupts to trigger the ISR. After hard rebooting the host, the driver is not saved by Windows. According to the logs, the low-level pci.sys driver is causing the crash during enumeration. From the stack trace, it’s hard to tell which function of the driver is causing the crash.

I debugged this from a second host with KDNET and WinDbg. Driver loading fails because of ancient firmware on the PCIe card, and a hardware refresh causes the host to crash.
Updating the firmware fixed the issue. Thanks for all the help.