Help debugging HARD lockup during NVMe file transfer – PCIe credits not returned

Good morning experts,
Looking for suggestions to find what process is hanging the system.

The Setup: Laptop with 11th gen Intel CPU (Tiger Lake) -> USB4 Dock -> Thunderbolt 3 SSD (Intel Titan Ridge + Phison NVMe controller). A multi-GB write to the disk runs for 10+ seconds and then hangs the system.
If Host Memory Buffer support is disabled using the HMBAllocationPolicy registry setting, the hang does not occur.

USB4 protocol traces between the CPU and the USB4 dock show PCIe memory read completion data going downstream, CPU to SSD (the disk write data), and PCIe memory writes going upstream, SSD to CPU (Host Memory Buffer writes).

In the failing case, the CPU stops returning posted PCIe credits. Specifically, it continues to send flow control updates for posted credits but does not increment the header or data values.
WinDbg is connected and working over USB3. Verifier is enabled for all drivers with standard settings except Special Pool, which was causing a crash on boot unrelated to this issue.
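
The stalled-credit condition described above can be located in an exported trace with a small script. This is only a minimal sketch: the record layout (timestamp, header credits, data credits) and the function name `find_credit_stall` are hypothetical and would need adapting to whatever the protocol analyzer actually exports. It reports the time of the last posted flow control update whose header or data value changed, provided every later update repeats the same values.

```python
from typing import List, Optional, Tuple

# One decoded posted flow control update: (timestamp in seconds, header credits, data credits).
# This tuple layout is an assumption, not the format of any particular analyzer.
UpdateFC = Tuple[float, int, int]


def find_credit_stall(updates: List[UpdateFC], min_repeats: int = 3) -> Optional[float]:
    """Return the timestamp of the last update that changed either counter,
    if at least `min_repeats` later updates repeat the same values through
    the end of the trace. Return None when no such stall is seen."""
    if not updates:
        return None

    last_change_index = 0
    for i in range(1, len(updates)):
        _, prev_hdr, prev_data = updates[i - 1]
        _, hdr, data = updates[i]
        if hdr != prev_hdr or data != prev_data:
            last_change_index = i

    # Number of trailing updates that carry unchanged header and data values.
    repeats = len(updates) - 1 - last_change_index
    if repeats >= min_repeats:
        return updates[last_change_index][0]
    return None


# Made-up sample values, for illustration only.
trace = [
    (0.0, 10, 100),
    (0.1, 12, 140),
    (0.2, 15, 180),
    (0.3, 15, 180),
    (0.4, 15, 180),
    (0.5, 15, 180),
]
stall_time = find_credit_stall(trace)
```

Unchanged values also appear on an idle link, so a hit only narrows down where to look; it is meaningful when the SSD still has posted writes waiting to go upstream at that point in the trace.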

I have tried the following, without success, to break into the debugger or force a crash when the system hangs:

  • Break from debugger
  • Crash/break from keyboard using CrashOnCtrlScroll and Dump*Key registry settings
  • Crash/break from power button using PowerButtonBugcheck registry setting

The system will either hang forever (I’ve waited 20 minutes) or after a couple of minutes will reboot without a crash.

Any ideas, suggestions or incantations on how to break in when it hangs or to get more info to find the deadlock would be much appreciated.
Eric W

If you can't break in from the debugger, it means that the clock ISR has stopped being serviced. At that point there's not much you can do unless you can generate an NMI (and even that might not work).

Presumably this isn't any of your software running? Usually if you're developing a device driver and you get this it's a matter of trying to diagnose with trace messages and seeing the last cries for help before the crash.

You might spelunk around StorNVMe a bit and see if there's any tracing you can turn on. If there is any, it's probably all WPP, so it's probably not going to be helpful.

Hi Scott,
Thanks for your reply. That's about what I thought.

You are right, all standard Microsoft and Intel drivers.

I'll poke around in StorNVMe and StorPort to see if I can find anything.

I'm still hoping to get a crash to debug from.
Eric