CLOCK_WATCHDOG_TIMEOUT processing

Podun_Yu · February 21, 2017, 6:07am

Hi

I encountered a BSOD with the error message “CLOCK_WATCHDOG_TIMEOUT 0x00000101”. The MSDN documentation says that “an expected clock interrupt on a secondary processor, in a multi-processor system, was not received within the allocated interval”. Since the system is installed in a virtual machine, I would like to disable the watchdog by adding some parameter when rebooting.
My questions:
1) Is there any way to modify the “Clock interrupt time-out interval” mentioned in the MSDN document? https://msdn.microsoft.com/en-us/library/ff557211(v=vs.85).aspx

2) Where could I disable the watchdog inside Windows? I suspect there might be some registry setting that could tune this for the processor.

Anand_A · February 21, 2017, 8:04am

At a minimum, you need to share some bugcheck analysis (at least !analyze -v with the proper symbols in place) and a brief explanation of the environment (e.g., additional hardware/software) in which the issue occurs.

Podun_Yu · February 22, 2017, 6:17am

0: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

CLOCK_WATCHDOG_TIMEOUT (101)
An expected clock interrupt was not received on a secondary processor in an
MP system within the allocated interval. This indicates that the specified
processor is hung and not processing interrupts.
Arguments:
Arg1: 0000000000000009, Clock interrupt time out interval in nominal clock ticks.
Arg2: 0000000000000000, 0.
Arg3: fffff880009b9180, The PRCB address of the hung processor.
Arg4: 0000000000000001, 0.

Debugging Details:

BUGCHECK_STR: CLOCK_WATCHDOG_TIMEOUT_16_PROC

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: DRIVER_FAULT_SERVER_MINIDUMP

PROCESS_NAME: System

CURRENT_IRQL: d

STACK_TEXT:
fffff880009a9628 fffff8000172c7fa : 0000000000000101 0000000000000009 0000000000000000 fffff880009b9180 : nt!KeBugCheckEx
fffff880009a9630 fffff800016df077 : 0000000000000000 fffff80000000001 000000000001312d 000000000000000c : nt! ?? ::FNODOBFM::`string'+0x4e1e
fffff880009a96c0 fffff8000162b1c0 : 0000000000000000 fffff880009a9870 fffff800016473c0 fffff80000000000 : nt!KeUpdateSystemTime+0x377
fffff880009a97c0 fffff800016d0e13 : 000000000655e3a8 fffff800016473c0 0000000000000000 fffff800016f4474 : hal!HalpRtcClockInterrupt+0x130
fffff880009a97f0 fffff8000170ae63 : fffff80001853e80 0000000000000001 0000000000000000 0000000000000000 : nt!KiInterruptDispatchNoLock+0x163
fffff880009a9980 fffff800016da39c : 0000000000000000 fffff880009a9ab8 0000000000000000 0000000000000000 : nt!KxFlushEntireTb+0x93
fffff880009a99c0 fffff800016975d9 : 000000000000003f 000000000000003f fffffa8022eac7d0 0000000000000040 : nt!KeFlushMultipleRangeTb+0x28c
fffff880009a9a90 fffff80001697e27 : 0000000000ba3900 000000000000003f 0000000000000000 0000000000000000 : nt!MiZeroPageChain+0x14e
fffff880009a9ad0 fffff8000196e456 : fffffa803b2dfb50 0000000000000080 fffffa803b2df040 cf8b4838558d48f8 : nt!MmZeroPageThread+0x83a
fffff880009a9c00 fffff800016c62c6 : fffff80001853e80 fffffa803b2dfb50 fffff80001861cc0 9090909090909090 : nt!PspSystemThreadStartup+0x5a
fffff880009a9c40 0000000000000000 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x16

STACK_COMMAND: kb

SYMBOL_NAME: ANALYSIS_INCONCLUSIVE

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: Unknown_Module

IMAGE_NAME: Unknown_Image

DEBUG_FLR_IMAGE_TIMESTAMP: 0

FAILURE_BUCKET_ID: X64_CLOCK_WATCHDOG_TIMEOUT_16_PROC_ANALYSIS_INCONCLUSIVE

BUCKET_ID: X64_CLOCK_WATCHDOG_TIMEOUT_16_PROC_ANALYSIS_INCONCLUSIVE

Followup: MachineOwner

0: kd> .lastevent
Last event: Break instruction exception - code 80000003 (first/second chance not available)
debugger time: Wed Feb 22 19:13:39.017 2017 (UTC + 8:00)
0: kd> !error
Error code: (NTSTATUS) 0 (0) - STATUS_WAIT_0

Tim_Roberts · February 22, 2017, 12:50pm

Do you have a driver and custom hardware in this system? This error
often means that your ISR is using too much time. Time in an ISR is
strictly limited, in order to keep system performance acceptable. Any
non-trivial processing must be deferred to your DPC.
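To illustrate the split Tim describes, here is a user-mode model in C. It is a sketch under stated assumptions, not kernel code: struct fake_device, model_isr and model_dpc are hypothetical names, and the "status register" is just a variable. In a real KMDF driver the two functions would correspond to the EvtInterruptIsr and EvtInterruptDpc callbacks, with the DPC queued via WdfInterruptQueueDpcForIsr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-mode model of the ISR/DPC split. Nothing here is kernel code:
 * struct fake_device, model_isr and model_dpc are hypothetical names. */
struct fake_device {
    volatile uint32_t status_reg; /* stands in for the device's interrupt-status register */
    uint32_t latched;             /* status bits saved by the ISR for the DPC */
    bool dpc_queued;              /* set when the ISR has requested a DPC */
    unsigned events_handled;      /* count of status bits processed by the DPC */
};

/* "ISR": in a real driver this runs at DIRQL, so it only checks whether the
 * interrupt is ours, acknowledges it, saves the status and queues the DPC. */
static bool model_isr(struct fake_device *dev)
{
    uint32_t status = dev->status_reg;
    if (status == 0)
        return false;          /* not our interrupt: return immediately */
    dev->status_reg = 0;       /* acknowledge (real hardware is often write-1-to-clear) */
    dev->latched |= status;    /* remember what happened for the DPC */
    dev->dpc_queued = true;    /* KMDF equivalent: WdfInterruptQueueDpcForIsr() */
    return true;
}

/* "DPC": runs later at DISPATCH_LEVEL; all non-trivial work belongs here.
 * A real DPC would snapshot 'latched' while holding WdfInterruptAcquireLock(). */
static void model_dpc(struct fake_device *dev)
{
    assert(dev->dpc_queued);   /* only runs after the ISR queued it */
    uint32_t status = dev->latched;
    dev->latched = 0;
    dev->dpc_queued = false;
    for (unsigned bit = 0; bit < 32; bit++) {
        if (status & (1u << bit))
            dev->events_handled++;  /* placeholder for per-event processing */
    }
}
```

In this model the ISR does only a register read, an acknowledgement and two stores; every per-event loop lives in the DPC.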

> Is there any way to modify the “Clock interrupt time-out interval” mentioned in the MSDN document?

> Where could I disable the watchdog inside Windows? I suspect there might be some registry setting that could tune this for the processor.

You are thinking about this in the wrong way. This is not an operating
system annoyance that you need to suppress. This is a BUG in your
driver that you need to fix.

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Podun_Yu · February 23, 2017, 1:29am

@Tim Roberts
Thanks Tim,

Actually, I don’t have any custom drivers installed on this system. This is a virtual machine: a clean ISO from MSFT was installed in the VM, followed by some applications (no drivers). After several hours, the BSOD happened. However, in the same environment (same host machine), other VMs didn’t crash. That is why I am trying to suppress the watchdog for debugging.

Anand_A · February 23, 2017, 1:51am

Is there any passthrough device connected to this particular VM?
What is the OS version on the guest, and what is the host running?
I vaguely remember having a similar issue a few years ago with an ESXi VM running Windows Server 2008 R2 SP1. There was an MSFT hotfix, but I am not able to locate the exact KB article.

Check if these help:
https://blogs.msdn.microsoft.com/virtual_pc_guy/2009/10/16/hyper-v-hotfix-for-0x00000101-clock_watchdog_timeout-on-nehalem-systems/

https://communities.vmware.com/thread/527478?start=0&tstart=0

Scott_Noone_OSR · February 23, 2017, 8:31am

Arg3 is the PRCB of the processor that is stuck. Try doing a !running -ti and see if you get a call stack for that processor.

-scott
OSR
@OSRDrivers

wizdroid · April 24, 2023, 1:29pm

Hi,

I am also seeing a similar crash with the following stack trace. The crash ends up freezing the host most of the time, and a crash dump is rarely generated.

FILE_IN_CAB: MEMORY.DMP

BUGCHECK_CODE: 101

BUGCHECK_P1: 18

BUGCHECK_P2: 0

BUGCHECK_P3: ffff8901edd4c180

BUGCHECK_P4: 4

FAULTING_PROCESSOR: 4

PROCESS_NAME: System

FAULTING_THREAD: ffffa18ab6cec780

LOCK_ADDRESS: fffff803739b68a0 -- (!locks fffff803739b68a0)

Resource @ nt!PiEngineLock (0xfffff803739b68a0) Exclusively owned
Contention Count = 11
Threads: ffffa18ab6cec780-01<*>
1 total locks

PNP_TRIAGE_DATA:
Lock address : 0xfffff803739b68a0
Thread Count : 1
Thread address: 0xffffa18ab6cec780
Thread wait : 0xf9a1

STACK_TEXT:
ffff8901f088e7c8 fffff8037362187f : 0000000000000040 fffff8037388a050 ffffe4929f5331ee 0000000000000016 : hal!HalpPciReadMmConfigUlong+0x7
ffff8901f088e7d0 fffff8037362173a : 0000000000000000 ffff8901f088e940 0000000000000040 0000000000000000 : hal!HalpPCIPerformConfigAccess+0x5b
ffff8901f088e800 fffff80373621649 : 0000000000000000 0000000000000000 4010054600000000 0000000000000000 : hal!HalpPciAccessMmConfigSpace+0x82
ffff8901f088e840 fffff8037362bc9e : 0000000000000040 0000000000000000 ffffa18a00000000 0000000000000017 : hal!HalpPCIConfig+0xc1
ffff8901f088e8c0 fffff8037362bb9d : 0000000000000000 000000000000000f ffffa18a00000000 0000000000000040 : hal!HalpReadPCIConfig+0x56
ffff8901f088e910 fffff80d4422d0b0 : ffffa18a00000200 0000000000000048 ffffa18abaa87270 fffff80d00000000 : hal!HalpGetPCIData+0x6d
ffff8901f088e9e0 fffff80d4421da53 : 0000200000000200 0000000000000060 ffffa18ab4fb31c0 0000000000000000 : ACPI!PciConfigSpaceHandlerWorker+0x1a0
ffff8901f088eae0 fffff80d4421e00c : ffffa18ab4fb31c0 ffffa18ab4368a80 0000000000000000 ffffa18abb189920 : ACPI!PciConfigInternal+0x8f
ffff8901f088eb10 fffff80d4421dcdd : ffff8901f088ec28 ffff8901f088ebf0 ffffa18ab445efc8 ffff890100000000 : ACPI!IsPciBusAsyncWorker+0x30c
ffff8901f088eb90 fffff80d44296fe8 : ffffa18ab445efc8 fffff80d4423c300 0000000000000000 0000000000000000 : ACPI!IsPciBusAsync+0xb5
ffff8901f088ebc0 fffff80d44296f51 : ffffa18ab445efc8 ffffa18ab445ee00 fffff80d4423c300 0000000000000000 : ACPI!IsNsobjPciBus+0x78
ffff8901f088ec20 fffff80d4421d37f : ffffa18ab4292ab0 ffffa18ab4292a01 fffff80d4423c340 ffffa18abb189920 : ACPI!EnableDisableRegions+0xe5
ffff8901f088eca0 fffff80d442908dd : ffffa18ab3a92180 ffffa18ab5459410 0000000000000000 000000000000000f : ACPI!ACPIDetectFilterDevices+0x25f
ffff8901f088ed40 fffff80d44214363 : ffffa18ab42927c0 ffffa18ab5459410 ffffa18abaf1d628 0000000000000000 : ACPI!ACPIFilterIrpQueryDeviceRelations+0x20d
ffff8901f088edf0 fffff80d446bfd7d : 0000000000000007 ffffa18abaf1d510 0000000000000000 ffffa18ab5459418 : ACPI!ACPIDispatchIrp+0x223
ffff8901f088ee70 fffff80d446c7555 : 0000000000000000 0000000000000000 ffffa18ab5459418 ffffa18a00000000 : pci!PciCallDownIrpStack+0x7d
ffff8901f088eed0 fffff80d4469633d : ffffa18abaf1d510 ffff890100000000 0000000000000000 fffff80373765373 : pci!PciBus_QueryDeviceRelations+0x235
ffff8901f088ef50 fffff80373b2b6ed : ffffa18abaf1d510 ffff8901f088f090 ffffa18ab4572b10 0000000000000000 : pci!PciDispatchPnpPower+0xcd
ffff8901f088efb0 fffff803737618a6 : ffffa18ab4550060 0000000000000001 ffffa18ab92a0e50 ffff8901edeaa000 : nt!PnpAsynchronousCall+0xe5
ffff8901f088eff0 fffff80373b2b5ed : ffffa18ab454dd30 0000000000000000 ffffa18ab4550060 0000000000000000 : nt!PnpSendIrp+0x92
ffff8901f088f060 fffff80373b2b1e4 : ffffa18ab92a0e50 ffffa18ab454dd58 ffffa18ab454dd30 0000000000000000 : nt!PnpQueryDeviceRelations+0x51
ffff8901f088f0f0 fffff80373b4bde7 : ffffa18ab454dd30 ffff8901f088f220 0000000000000002 fffff80300000000 : nt!PipEnumerateDevice+0xc8
ffff8901f088f120 fffff80373b28d1e : ffffa18ab97bf1a0 ffff8901ef7a1078 0000000000000000 0000000000000000 : nt!PipProcessDevNodeTree+0x19f
ffff8901f088f3a0 fffff8037379a93a : ffffa10100000003 0000000000000000 0000000000000000 0000000000000006 : nt!PiProcessReenumeration+0xa6
ffff8901f088f3f0 fffff803736d8909 : ffffa18ab6cec780 fffff803739b5340 fffff80373a572c0 fffff803739c06f0 : nt!PnpDeviceActionWorker+0x166
ffff8901f088f4c0 fffff803737a47fd : ffffa18ab6cec780 0000000000000080 ffffa18ab3aaa040 ffffa18ab6cec780 : nt!ExpWorkerThread+0xe9
ffff8901f088f550 fffff803737fee96 : ffff8901edf00180 ffffa18ab6cec780 fffff803737a47bc 0000000000000246 : nt!PspSystemThreadStartup+0x41
ffff8901f088f5a0 0000000000000000 : ffff8901f0890000 ffff8901f0889000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x16

SYMBOL_NAME: pci!PciCallDownIrpStack+7d

MODULE_NAME: pci

IMAGE_NAME: pci.sys

IMAGE_VERSION: 10.0.14393.4530

STACK_COMMAND: .process /r /p 0xffffa18ab3aaa040; .thread 0xffffa18ab6cec780 ; kb

BUCKET_ID_FUNC_OFFSET: 7d

FAILURE_BUCKET_ID: CLOCK_WATCHDOG_TIMEOUT_pci!PciCallDownIrpStack

MBond2 · April 24, 2023, 9:48pm

Is it possible that your debugger is preventing normal CPU execution?

Tim_Roberts · April 24, 2023, 11:16pm

Mr. Bond’s question is a good one, especially if you are using a VM. If that’s not the case, then it looks to me like you have a defective PCI device. Do you have a custom PCI device in this system?
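The top frames of the stack wizdroid posted (hal!HalpPciReadMmConfigUlong, hal!HalpPciAccessMmConfigSpace) suggest that the faulting thread was reading PCI configuration space through the memory-mapped configuration window (ECAM, often called MMCONFIG) when the watchdog fired, which fits Tim's suspicion of the device. As an illustration only, here is a minimal C sketch of how such an address is formed under the standard PCI Express ECAM layout. The function names are hypothetical and do not reflect the HAL's internals, and the ECAM base is platform-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch assuming the standard PCI Express ECAM layout: every function owns a
 * 4 KiB configuration window at
 *   base + (bus << 20) + (device << 15) + (function << 12) + offset.
 * ecam_base is platform-specific (the firmware reports it in the ACPI MCFG
 * table); any concrete value used with this function is an assumption. */
static uint64_t ecam_address(uint64_t ecam_base, unsigned bus, unsigned device,
                             unsigned function, unsigned offset)
{
    assert(bus < 256 && device < 32 && function < 8 && offset < 4096);
    return ecam_base
         + ((uint64_t)bus << 20)
         + ((uint64_t)device << 15)
         + ((uint64_t)function << 12)
         + offset;
}

/* A config read is an ordinary 32-bit load from that address. If the request
 * completes with an error (for example, no device responds), the read
 * typically returns all ones; if it is never completed at all, the CPU issuing
 * the load can stall, which is what a hung config access looks like. */
static bool config_read_failed(uint32_t value)
{
    return value == 0xFFFFFFFFu;
}
```

For example, with an assumed ECAM base of 0xE0000000, bus 1, device 2, function 3 and offset 0x40 map to 0xE0113040.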

wizdroid · April 25, 2023, 12:02pm

In my case, it’s a custom PCI device with a Xilinx FPGA. It works normally on Linux and on most Windows-based hosts, though. The crash happens only on Dell workstations.

MBond2 · April 26, 2023, 12:12am

In your case, I assume there is no VM involved. If that’s the case, then follow Tim’s advice and check the code in any ISRs that you have.

wizdroid · April 26, 2023, 7:37am

The host crashes while the driver is being installed. At this point the device isn’t generating any interrupts that could trigger the ISR. After a hard reboot of the host, Windows has not kept the driver installed. According to the logs, the low-level pci.sys driver is causing the crash during enumeration. From the stack trace, it’s hard to tell which function of the driver is causing the crash.

wizdroid · April 26, 2023, 2:14pm

I debugged this from a second host with KDNET and WinDbg. Driver loading fails because of very old firmware on the PCIe card, and a hardware refresh then causes the host to crash.
Updating the firmware fixed the issue. Thanks for all the help.