You should probably be using ETW tracing, which in its (really) easiest form uses the TraceLogging APIs. If the system crashes and writes a crash dump (i.e., you're past the boot-critical OS startup), you can extract any unwritten part of an ETW log from the memory image. ETW is usually good up to a million events/sec or so, and the OS uses per-core log buffers, so you're not thrashing a shared lock. Having a debugger connected to see kernel DbgPrints is usually the easiest way to get small amounts of data right up to a crash.
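To give a flavor of how little ceremony TraceLogging needs, here's a minimal kernel-mode sketch. The provider name, GUID, and event fields are all made up for illustration; generate your own GUID:

#include <wdm.h>
#include <TraceLoggingProvider.h>

// Placeholder provider name and GUID -- substitute your own.
TRACELOGGING_DEFINE_PROVIDER(
    g_hMyProvider,
    "MyCompany.MyDriver",
    (0x7f54ca45, 0x93ac, 0x4a7e, 0x9b, 0x1c, 0x20, 0x3a, 0x4d, 0x5e, 0x6f, 0x70));

// In DriverEntry:  TraceLoggingRegister(g_hMyProvider);
// In DriverUnload: TraceLoggingUnregister(g_hMyProvider);

// Anywhere in the driver, emit a structured event -- no manifest,
// no message compiler; field names and types travel with the event.
VOID LogIoStart(PIRP Irp, ULONG Length)
{
    TraceLoggingWrite(
        g_hMyProvider,
        "IoStart",                          // event name
        TraceLoggingPointer(Irp, "Irp"),
        TraceLoggingUInt32(Length, "Length"));
}

Adding a new event is just another TraceLoggingWrite call, which is why I find it so much lower friction than the manifest flavor.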
If your system fully boots but will not write a crash dump, there's a fair chance you have a hung processor or something similar. Crash dumps run with all interrupts disabled, with a single processor polling the disk controller for completions (one reason the storage miniport model is strange is so it still works in this extremely limited execution environment).
A plus of an ETW trace is you can have a bunch of structured data, and if you only want to see the data from just before a crash, you can specify circular in-memory buffers: new events keep overwriting old events, and the circular buffers land in the crash dump. You can run a system for days or weeks logging to the circular buffer. You can also flip a switch and log many gigabytes of events to disk, saving every event. You will also need some of the tools to decode ETW traces. Even though it's a bit slow, Microsoft's Message Analyzer has among the best filtering (expressions). MA is kind of memory inefficient too: million-event traces are OK, but 100-million-event traces aren't. The Windows Performance Toolkit tools also analyze ETW traces; sometimes they have a really useful view of the data, and other times the views seem almost useless.
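Starting a circular in-memory (buffering-mode) session from a user-mode controller looks roughly like this; the session name and buffer sizes here are arbitrary choices, not anything canonical:

#include <windows.h>
#include <wmistr.h>
#include <evntrace.h>
#include <stdlib.h>

void StartBufferingSession(const GUID *providerId)
{
    // Room for the properties struct plus the session name ETW appends.
    ULONG size = sizeof(EVENT_TRACE_PROPERTIES) + 1024;
    EVENT_TRACE_PROPERTIES *props = calloc(1, size);
    props->Wnode.BufferSize = size;
    props->Wnode.Flags = WNODE_FLAG_TRACED_GUID;
    props->Wnode.ClientContext = 1;                   // 1 = QPC timestamps
    props->LogFileMode = EVENT_TRACE_BUFFERING_MODE;  // circular, memory-only
    props->BufferSize = 64;                           // KB per buffer
    props->MinimumBuffers = 64;                       // 64 x 64KB resident
    props->MaximumBuffers = 64;
    props->LoggerNameOffset = sizeof(EVENT_TRACE_PROPERTIES);

    TRACEHANDLE session = 0;
    StartTraceW(&session, L"MyDriverSession", props); // name is made up
    EnableTraceEx2(session, providerId, EVENT_CONTROL_CODE_ENABLE_PROVIDER,
                   TRACE_LEVEL_VERBOSE, 0, 0, 0, NULL);
    free(props);
}

After a crash, in the debugger, !wmitrace.strdump lists the sessions present in the dump, and !wmitrace.logsave <LoggerId> <file.etl> writes the in-memory buffers out as a normal .etl you can open in your viewer of choice.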
A HUGE plus of ETW traces is also that the OS has hundreds of event providers built in, so you can get unified traces of OS events along with your driver events. I'm a huge fan of ETW tracing, and even though the TraceLogging flavor has more overhead than the manifest flavor, I generally use TraceLogging because it's just a lot easier to add new events. ETW also includes the activity (correlation) ID feature, which is not well explained but incredibly useful for making logical sense of the raw events (like associating a specific interrupt with its originating I/O operation).
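As a sketch of that correlation (reusing the hypothetical g_hMyProvider from earlier): create an activity ID per request, stash it in your per-request context, and stamp every related event with it:

// At request submission: mint an ID and tag the first event with it.
VOID LogRequestStart(PIRP Irp, LPGUID ActivityId)
{
    EtwActivityIdControl(EVENT_ACTIVITY_CTRL_CREATE_ID, ActivityId);
    TraceLoggingWriteActivity(
        g_hMyProvider,
        "IoSubmitted",
        ActivityId,
        NULL,                               // no parent activity
        TraceLoggingPointer(Irp, "Irp"));
}

// Later, in the DPC for the completing interrupt, with the same GUID
// fetched back out of the request context:
VOID LogRequestDone(NTSTATUS Status, LPGUID ActivityId)
{
    TraceLoggingWriteActivity(
        g_hMyProvider,
        "IoCompleted",
        ActivityId,
        NULL,
        TraceLoggingNTStatus(Status, "Status"));
}

A viewer that understands activity IDs can then group the submission and the completion even with thousands of unrelated events in between.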
Forcing a crash dump on command is also important, and you know about the keyboard sequence (enabled via the CrashOnCtrlScroll registry value under the keyboard driver's Parameters key). Many server motherboards can also generate an NMI (non-maskable interrupt), which sometimes can be initiated from a server management interface (like on many Dell servers). Some servers also have the hardware to generate an NMI, but you have to add a little switch to the motherboard header to trigger it (many SuperMicro motherboards). Some motherboards know how to generate an NMI on a watchdog timeout too. An NMI has the advantage that it will punch through an interrupt storm or a looping IPI, although it will not interrupt a processor hung on a bus access (like an infinite PCIe config retry). It would take a PCIe analyzer to verify an infinite config retry, and only if the PCIe device is in a slot you can put the PCIe interposer on. At one point, there were JTAG-like debug interfaces for special x86 motherboards. It's always seemed odd that a $2 ARM microcontroller has a better low-level debugger interface (using a $15 interface tool) than a $2000 Intel server processor.
If you're looking for CHEAP ways to get really low-level debugging info out: 1) if your motherboard supports an old legacy parallel or serial interface, these were accessed with port I/O instructions; you can figure out the port address, which often would not change between boots, and make your driver toggle pins on the port (like DTR for a serial port). I suppose you could even do bit-banged serial and use an oscilloscope ($350, or a $15 Arduino) to read the serial stream. 2) PCs also had a debug port at 0x80, and for PCI (and maybe PCIe too) you could get cheap little boards that would read the debug port value and display it on a seven-segment LED. 3) I've also connected logic analyzers to parallel ports. These methods all require that your code somehow gets a CPU, which can be via an interrupt or a crash dump callback. Sometimes you can also monopolize a processor at high IRQL for a while: make a worker thread that raises IRQL to some higher level for a millisecond, and periodically lowers it back down to PASSIVE_LEVEL to avoid hanging any other threads scheduled on that core. If you monopolize a CPU at elevated IRQL forever, you will get a DPC watchdog crash. Monopolizing a CPU is not good in a production driver (proper power management is toast), but for debugging it does give you control of a processor a large percentage of the time. Sketches of both tricks follow.
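Both of these are debugging hacks, not production code. A rough kernel-mode sketch, assuming your platform actually decodes port 0x80 and picking an arbitrary 1 ms slice length:

#include <wdm.h>

// 1) One-byte breadcrumbs out the legacy 0x80 POST/debug port, visible
//    on a POST card or logic analyzer even if no log ever hits disk.
VOID EmitBreadcrumb(UCHAR code)
{
    WRITE_PORT_UCHAR((PUCHAR)(ULONG_PTR)0x80, code);
}

// 2) Worker-thread body that owns its CPU at DISPATCH_LEVEL in ~1 ms
//    slices, dropping back to the thread's PASSIVE_LEVEL between
//    slices so the scheduler and the DPC watchdog stay happy.
VOID MonopolizeSliceLoop(PVOID Context)
{
    UNREFERENCED_PARAMETER(Context);
    for (;;) {
        KIRQL oldIrql;
        KeRaiseIrql(DISPATCH_LEVEL, &oldIrql);
        // ... poll your hardware / emit breadcrumbs here ...
        KeStallExecutionProcessor(1000);   // busy-wait ~1000 microseconds
        KeLowerIrql(oldIrql);              // brief passive-level window
    }
}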
ETW tracing has an option to use the TSC timestamp instead of the QPC timestamp, although I don't remember if that actually works on current OS versions and CPUs. My fuzzy memory is the TSC timestamp works but is not so useful, because writing the event takes a few hundred cycles, and even if it used a synchronizing RDTSCP you'd end up with only a little better resolution, not 1024x better.
If you need really fast timing measurements, Intel has a great white paper on how to get the very best resolution from the TSC. It talks about the pros and cons of using RDTSC, which does not serialize the pipeline, vs RDTSCP, which waits for prior instructions to execute. I've read that even with RDTSCP the processor will start executing new instructions before the timer read finishes, so you can't fully escape the timing fuzziness of out-of-order execution.
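The fencing pattern that paper recommends looks roughly like this with the MSVC intrinsics (my paraphrase; check the paper for the authoritative version):

#include <intrin.h>

unsigned __int64 MeasureCycles(void (*fn)(void))
{
    int regs[4];
    unsigned int aux;

    __cpuid(regs, 0);                      // serialize: drain earlier instructions
    unsigned __int64 start = __rdtsc();

    fn();                                  // code under test

    unsigned __int64 end = __rdtscp(&aux); // waits for prior instructions to execute
    __cpuid(regs, 0);                      // keep later instructions from moving up
    return end - start;
}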
I think one of the reasons for QPC being TSC/1024 is that on modern processors, with out-of-order and speculative execution and potentially close to 100 instructions in some state of execution, TSC/1024 is a number that is above the jitter. RDTSC is NOT a synchronizing instruction, so the value is read at some arbitrary point in the out-of-order queue. The OS will also choose an "optimal" high-resolution time source on a particular system. On some processor models, the TSC tracks power-management clock-speed changes; on those systems, the OS may choose a global hardware timer for QPC, like the high-resolution event timer (HPET) in the chipset. I've often found QPC is sufficient resolution for performance analysis of most C code. For tight inner loops, even the TSC is insufficient, and you need to look at things more statistically (like timing artificially large test cases). For inner-loop optimization, it's still useful to look at the instruction stream and manually add up instruction throughput cycles, result latencies, and dependency stalls. You generally don't write assembler directly, but you certainly can carefully tune your C code to persuade the compiler to generate the code you want.
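For completeness, the plain QPC version, which is what I usually reach for first:

#include <windows.h>

double ElapsedMicroseconds(void (*fn)(void))
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);      // counts per second
    QueryPerformanceCounter(&t0);
    fn();
    QueryPerformanceCounter(&t1);
    return (t1.QuadPart - t0.QuadPart) * 1e6 / (double)freq.QuadPart;
}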
For fine-grain performance analysis, Intel's VTune is usually my tool of choice; it answers questions like: how long does that interlocked memory operation really take? (An enormous amount of time if it's a cache miss to a different CPU package.)
Jan
-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@rolle.name
Sent: Saturday, March 3, 2018 10:07 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] Tips on diagnosing system hangs?
Correction – I meant R-CTL+ScrLock+ScrLock to create a bug check.
Another question…
7. If I put DbgPrint calls in my driver, can you show me an easy tutorial on how to use the output. In particular, I’d want to save the data QUICKLY to disk in case of a crash or hangup.