Good morning experts,
I'm looking for suggestions on how to find which process is hanging the system.
The setup: a laptop with an 11th-gen Intel CPU (Tiger Lake) -> USB4 dock -> Thunderbolt 3 SSD (Intel Titan Ridge + Phison NVMe controller). A multi-GB write to the disk runs for 10+ seconds and then hangs the system.
If Host Memory Buffer (HMB) support is disabled using the HMBAllocationPolicy registry setting, the hang does not occur.
USB4 protocol traces between the CPU and the USB4 dock show PCIe memory read completions carrying data downstream, CPU to SSD (the disk write data), and PCIe memory writes going upstream, SSD to CPU (the Host Memory Buffer writes).
In the failing case, the CPU stops returning posted PCIe credits. Specifically, it continues to send flow control updates for posted credits but no longer increments the header or data credit values.
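To make the credit stall concrete, here is a simplified model of posted flow-control accounting on the transmitter side of the link (the side forwarding the SSD's HMB writes toward the CPU). It is a sketch under stated assumptions: no credit scaling (8-bit header and 12-bit data counters), one data credit per 16 bytes of payload, and the gating rule `(limit - (consumed + needed)) mod 2^bits <= 2^bits / 2`. The class name and the credit numbers are illustrative only, not taken from the traces, and ordering rules and USB4 tunneling are not modeled.

```python
# Simplified model of PCIe posted-credit gating on the transmitter side.
# Assumptions: no flow-control scaling, so header counters are 8 bits and
# data counters are 12 bits; 1 data credit = 16 bytes of payload.
HDR_BITS = 8
DATA_BITS = 12


def data_credits(payload_bytes: int) -> int:
    """Data credits needed for a payload (rounded up to 16-byte units)."""
    return (payload_bytes + 15) // 16


class PostedCreditGate:
    """Tracks the credit limit advertised by the receiver (via InitFC/UpdateFC)
    against the credits this transmitter has consumed."""

    def __init__(self, hdr_limit: int, data_limit: int) -> None:
        self.hdr_limit = hdr_limit
        self.data_limit = data_limit
        self.hdr_consumed = 0
        self.data_consumed = 0

    @staticmethod
    def _ok(limit: int, consumed: int, needed: int, bits: int) -> bool:
        # Modular gating check, so the counters may wrap around.
        mod = 1 << bits
        return (limit - (consumed + needed)) % mod <= mod // 2

    def can_send(self, payload_bytes: int) -> bool:
        return self._ok(self.hdr_limit, self.hdr_consumed, 1, HDR_BITS) and self._ok(
            self.data_limit, self.data_consumed, data_credits(payload_bytes), DATA_BITS
        )

    def send(self, payload_bytes: int) -> bool:
        """Send one posted write (MWr) if credits allow; return whether it went out."""
        if not self.can_send(payload_bytes):
            return False
        self.hdr_consumed = (self.hdr_consumed + 1) % (1 << HDR_BITS)
        self.data_consumed = (self.data_consumed + data_credits(payload_bytes)) % (1 << DATA_BITS)
        return True

    def update_fc(self, hdr_limit: int, data_limit: int) -> None:
        """Apply an UpdateFC DLLP. Repeating the same values frees nothing."""
        self.hdr_limit = hdr_limit
        self.data_limit = data_limit


if __name__ == "__main__":
    gate = PostedCreditGate(hdr_limit=32, data_limit=256)
    sent = sum(gate.send(256) for _ in range(20))
    print("writes sent before stall:", sent)
    gate.update_fc(32, 256)  # failing case: UpdateFC with unchanged values
    print("after stale UpdateFC:", gate.can_send(256))
    gate.update_fc(48, 512)  # healthy case: receiver returned credits
    print("after advancing UpdateFC:", gate.can_send(256))
```

In this model the 256-byte writes exhaust the 256 data credits after 16 TLPs. A stream of UpdateFC DLLPs that repeat the old limits keeps the link "alive" but never lets another posted write through, which matches the symptom seen in the traces.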
WinDbg is connected and working over USB3. Driver Verifier is enabled for all drivers with standard settings, except Special Pool, which was causing a crash on boot unrelated to this issue.
I have tried the following to break into the debugger or force a crash when the system hangs, without success:
- Breaking in from the debugger
- Crash/break from the keyboard using the CrashOnCtrlScroll and Dump*Key registry settings
- Crash/break from the power button using the PowerButtonBugcheck registry setting
The system either hangs forever (I've waited 20 minutes) or reboots after a couple of minutes without a crash.
Any ideas, suggestions, or incantations for breaking in when it hangs, or for getting more information to find the deadlock, would be much appreciated.
Eric W