Debugging drivers on ARM machines

For x64, I've been testing & debugging drivers in VMware VMs and attaching WinDbg to them via network debugging. For physical clients, I'm also using network debugging (provided they're using a supported NIC). For ARM, per Debugging Arm64 - Windows drivers | Microsoft Learn:

In general, developers debugging user mode apps should use the version of the debugger that matches the architecture of the target app. Use the Arm64 version of WinDbg to debug user mode Arm64 applications and use the Arm version of WinDbg to debug user mode ARM32 applications. Use the x86 version of WinDbg to debug user mode x86 applications running on Arm64 processors.

In rare cases where you need to debug system code – such as WOW64 or CHPE – you can use the Arm64 version of WinDbg. If you're debugging the Arm64 kernel from another machine, use the version of WinDbg that matches the architecture of that other machine.

So it seems like I can connect an x64 debugger to an ARM debuggee; is my understanding correct?

For VM debugging, it seems that VMware does not provide any ARM builds of their software yet. Running an ARM VM on an x64 host is not possible since we're using a hypervisor, not an emulator (and an emulator's performance would probably be too poor for testing). Does that mean I can only use Hyper-V for VM debugging for now? Can anyone verify that it works?

For physical clients, can anyone verify that USB or network debugging works? If so, which model(s) did you use for the debugger/debuggee? I'm looking into buying some machines to start testing. Quite a few new ARM notebooks are also being released with the new Snapdragon X chips; not sure if there's any way to tell upfront whether they support USB/network debugging.

Correct.

Presumably, but I haven't tried it yet. I've only done physical debugging thus far (see next).

I can confirm that the Windows Dev Kit 2023 supports kernel debugging:

Buy Windows Dev Kit 2023 Desktop PC for Arm App Developers - Microsoft Store

The kernel debugger works using USB Ethernet Emulated Mode (USB-EEM) and KDNET. You connect the host and target via a standard USB cable, and a virtual Ethernet adapter pops up on each end. The setup is documented and works quite well:

Setting Up Kernel-Mode Debugging Over USB EEM on an Arm Device Using KDNET - Windows drivers | Microsoft Learn
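
For anyone setting this up for the first time, here's a minimal sketch of the usual KDNET flow once the USB-EEM adapter shows up; the linked doc has the USB-EEM specifics, and the IP address, port, and key below are placeholders (kdnet.exe generates the real key for you).

```
:: On the target (elevated prompt): list debug-capable adapters, then
:: point KDNET at the host's IP address and a port of your choosing.
kdnet.exe
kdnet.exe 192.168.50.10 50010

:: kdnet.exe prints a windbg command line with a generated key.
:: On the host, attach with that key:
windbg.exe -k net:port=50010,key=1.2.3.4
```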

We're on the waiting list to get some of the newer development kits based on Snapdragon X. Hopefully we'll get some soon and can update on how/if that works.


So it seems like I can connect an x64 debugger to an ARM debuggee; is my understanding correct?

Correct.

For VM debugging, it seems that VMware does not provide any ARM builds of their software yet.

ESXi support for Arm64 Windows is not there yet. Note that even when it arrives, you will need ARM64 server hardware.

A Qualcomm-based Surface is one option that works.

Technically, Fusion on a Mac should also be able to create an ARM64 Windows VM, but the issue is finding a Windows ARM64 ISO. You cannot simply download one with an MSDN subscription.

See Windows ARM64 Download | MAS

I've spent the last 6+ years doing Windows kernel debugging on ARM64 server systems: the Marvell ThunderX2/ThunderX3 cpus for almost 3 years, and since 2021 the Ampere Computing ARM64 cpus. I haven't done any significant work on the Snapdragon chips. As was already mentioned, the Snapdragon has a magic USB virtual Ethernet way to do kernel debugging. Let me do a brain dump about Windows ARM64 debugging on UEFI systems.

The rumors are we may see ARM processors from a number of companies in the next year. Ampere has been selling ARM processors for years now, and NVidia Grace may be shipping too. You can run Windows ARM in a VM on an ARM Mac too. You can buy ARM server motherboards and systems now, although I haven't heard if Microsoft is productizing Windows ARM server. For some workstation users, running desktop Windows on a many core ARM cpu would be very attractive. A big limitation currently is very limited driver support on ARM Windows, like no GPU drivers.

For the non-Snapdragon ARM64 chips there are at least 4 kernel debugger options. These have been UEFI systems with PCIe buses. I've extensively used the Ethernet windbg transport on certain Mellanox PCIe cards (ConnectX-5). It's also possible to use some flavors of Intel NICs: even though the OS has no Intel NIC driver support, the windbg debug stub does. Last I knew (3 years ago) the PCIe Realtek NICs partially work for windbg, but when I tried them they didn't correctly reset the NIC hardware unless you did a hard power cycle (the standby power pins keep the Realtek NIC in a non-powered-on state, which windbg chokes on after a reboot unless the standby power rail goes off). A bonus of enabling an Intel NIC for windbg is that it will tunnel non-windbg traffic over the Intel NIC, so you get things like remote desktop even if your system has no NDIS drivers, although network performance is not appropriate for production workloads. Windows ARM also supports USB NICs, but those don't work as a kernel debug target (except for the Snapdragon special case).
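
If you're configuring the NIC transport by hand rather than letting kdnet.exe set it up, the target-side settings are roughly the sketch below; the host IP, port, and key are placeholders to replace with your own values.

```
:: Target-side network debug settings (all values are placeholders):
bcdedit /debug on
bcdedit /dbgsettings net hostip:192.168.50.10 port:50011 key:1.2.3.4
```

The host then attaches with the same windbg -k net:... style invocation shown earlier in the thread for the USB-EEM case.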

The systems I've worked on also had serial ports, which work for windbg like you would expect. The serial ports I've seen have all been 3.3V logic levels, not RS-232 levels, so you need appropriate serial-to-USB interfaces. If the serial port is connected to a network serial server, there is also a magic windbg command line to get windbg to connect to the serial server's TCP port while using the serial protocol instead of the network windbg protocol. This is also useful for connecting to QEMU serial ports and things like serial ports in assorted hardware emulators. I submitted a documentation bug report a couple of years ago, but Microsoft declined to fix the docs as it's not an officially supported feature. For things like Verilog emulators, windbg over serial is problematic, as time runs at something like 1:3000 speed in the emulated machine but runs at normal speed in the windbg machine, so the protocol timeouts get really screwed up. MSFT should add a windbg option to allow adjusting the windbg serial and dbgnet protocol timeouts. Serial windbg also works on things like the Raspberry Pi (4/5).
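
For a plain serial connection the standard setup applies; a minimal sketch follows, with placeholder COM port numbers and baud rate. The serial-server-over-TCP variant mentioned above is the undocumented part, so its exact syntax isn't shown here.

```
:: Target: send the kernel debugger out COM1 at 115200 baud (placeholders).
bcdedit /debug on
bcdedit /dbgsettings serial debugport:1 baudrate:115200

:: Host: attach via its own serial port (often a USB-to-serial adapter).
windbg.exe -k com:port=COM3,baud=115200
```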

You can also boot ARM Windows under QEMU, configure serial ports or Intel NICs, and do kernel debugging. This works both when emulating ARM on an x64 system and when you are running QEMU on an ARM processor (like running ARM Linux). If you are doing pure software emulation in QEMU on an x64 box, it helps to change the remote desktop connection timeout, as making the connection while emulated often exceeds the default timeout, so remote desktop does not seem to work. The x64 emulated version has the interesting property that you can go hack the QEMU emulation mechanism if needed, so you can alter the behavior of the "machine". For example, you can write a virtual device driver for your device so you can work on driver development long before you have actual silicon or even FPGA prototypes. You often don't have to precisely emulate your device; you may just need it to work well enough to get partial functionality, allowing time overlap between the driver and hardware development. When doing CPU development, being able to add debugging behaviors and instrumentation has been pretty useful to debug hardware errata and such. It's way more fun to test your error handling code when you can just flip a software switch and make some rare error condition reproduce on demand. QEMU can also generate a bunch of useful traces, and you can run gdb on the VM for things that windbg can't debug, like say stepping through the OS interrupt handling.
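
As a rough sketch of the kind of QEMU invocation meant here, assuming an aarch64 UEFI firmware image and an already-prepared Windows ARM64 disk image; all paths, sizes, ports, and the exact device set are placeholders and will vary by setup.

```
# Firmware, disk image, memory size, and ports below are placeholders.
# e1000 gives the guest an Intel NIC model; -serial exposes the guest's
# serial port on a TCP socket; -gdb starts QEMU's built-in gdb server.
qemu-system-aarch64 -machine virt -cpu max -smp 4 -m 8G \
  -bios QEMU_EFI.fd \
  -device ramfb -device qemu-xhci -device usb-kbd -device usb-tablet \
  -drive if=none,id=disk0,file=win-arm64.qcow2 \
  -device nvme,drive=disk0,serial=nvme0 \
  -netdev user,id=net0 -device e1000,netdev=net0 \
  -serial tcp::5555,server,nowait \
  -gdb tcp::1234
```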

You can also debug ARM Windows over JTAG. You enable the EXDI-to-windbg protocol interface (it comes with windbg now), which connects to a gdb server over TCP. This works with the Lauterbach JTAG debugger, but unfortunately, last I knew, it does not work correctly using OpenOCD. Lauterbach is pretty expensive (thousands of dollars); OpenOCD is free. The EXDI library has specific support for Lauterbach, and the OpenOCD gdb server does not have some commands needed by windbg over EXDI (like writing breakpoint instructions to read-only pages of code). You can connect the windbg EXDI interface to the gdb server built into QEMU, which works very well. As I remember, this windbg->EXDI->QEMU (gdb server) path also works fine on x64 QEMU.
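
For the QEMU gdb server case, the windbg side looks roughly like the sketch below. The CLSID, the parameter string, and the exdiConfigData.xml settings all come from Microsoft's EXDI documentation, so check the exact values there; the CLSID is left as an obvious placeholder, and the gdb server address is whatever was passed to QEMU's -gdb option.

```
:: Edit exdiConfigData.xml (per the EXDI docs) so it points at the gdb
:: server, e.g. 127.0.0.1:1234 from the QEMU example above, then launch:
windbg.exe -v -kx exdi:CLSID={EXDI-GDBSRV-CLSID-FROM-DOCS},Kd=Guess,Inproc=ExdiGdbSrv.dll,DataBreaks=Exdi
```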

These windbg transports also work for hypervisor debugging if you set the right options, but since there are no symbols, hypervisor debugging is painful.
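
"The right options" here means the hypervisor debug settings, which are configured separately from the kernel ones; a minimal sketch over serial, with placeholder port and baud values (the net transport is configured the same way via bcdedit /hypervisorsettings net):

```
:: Target: enable hypervisor debugging over serial (values are placeholders).
bcdedit /hypervisorsettings serial debugport:1 baudrate:115200
bcdedit /set hypervisordebug on
```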

The Lauterbach JTAG debugger also knows how to load Windows symbol files (not very automatically). Note there used to be an intermittent bug in Lauterbach when loading Windows symbols, which crashed the Lauterbach debugger with a memory fault. For really deep debugging, you can use Lauterbach together with EXDI+windbg: you can do things like step into firmware calls and debug the firmware with the Lauterbach debugger, which continues to run alongside the EXDI connection from windbg. You can also run Lauterbach JTAG + EXDI/windbg + windbg (over an Intel NIC). This allows the most flexibility, as you can symbolically debug user-mode and EL1 kernel code with windbg (NIC), use the windbg (over EXDI) for tricky kernel debugging, and use the Lauterbach debugger for EL3 firmware or really deep hardware examination (like cache state). Running three debuggers at once is painful, but can be useful on occasion.

Note that ARM64 processors have control over permissions for many kinds of debugging, so the systems normally available may not allow JTAG EL3 debugging for example. These debug permissions are often controlled by the cpu vendor. A plus of debugging under QEMU is there are no permission limits.

  • Jan

Jan... Haven't seen you here in ages!

THANK YOU for such a very detailed, helpful, and informative post. I really, really, appreciate it.

Where can an ordinary mortal BUY an Ampere Computing ARM64 system that you can install Windows on? I mean... I know they gave one to Linus, but I haven't been able to find anyplace that actually SELLS these machines.

Again... many, many, thanks.

ETA: OK... so I found some Altra motherboards for sale (for example, here). But I'd still like to know if these are "known to work" with Windows ARM64.

And, do you know of ANY graphics drivers... or will we be stuck with VGA for the time being?

ETA, AGAIN: OK... there's at least SOME info on this available on the web. If you start Googling "ampere computing ARM64 Windows" there's some pretty interesting info, including some blog posts and even Ampere's instructions on GitHub for installing Windows.

Hi Peter,

Some info can be found at Windows Arm Ampere Altra guide - General Discussion - Ampere AArch64/ARM64 Developer Community

You should send a message to Joe Speed who is the developer relations guy at Ampere.

I don't think any of the current systems are WHQL certified, although many/all are Arm ServerReady certified (there are multiple ServerReady levels). These are all UEFI/ACPI firmware based, unlike some ARM SoC systems that use a device-specific device tree in Linux (and I'm not sure what you do for Windows).

The OEM ADLINK seems to be pushing hard to get companies to ship Windows ARM drivers; see Ampere Altra Dev Kit | COM-HPC Server Carrier and Starter Kit | ADLINK

I see that the ASRock microATX motherboard has 8 DIMM slots, and I wonder if they are each wired to a memory controller or if they only use 4 memory controllers. Optimal memory bandwidth comes from using all the memory channels. Half a TB of RAM only costs like $1500 now (64 GB DIMMs).

A downside of the ADLINK cpu module is that it only has 6 memory slots, so it can never achieve the highest memory bandwidth. For SW dev this may not matter.

SuperMicro also has Ampere Altra systems https://www.supermicro.com/datasheet/datasheet_Arm-Ampere.pdf

HPE is shipping ARM servers for Linux too, see https://www.hpe.com/us/en/servers/proliant-rl-300.html

I don't know if Microsoft has announced plans to ship and support Windows ARM Server as a SKU. Windows ARM desktop is a real product, although you can't buy retail copies. MSDN downloads do have an official ARM desktop .ISO under Windows 11 IoT (the download dropdown shows both x64 and arm64).

All these systems are using the Ampere Altra series of processors, which are based on the ARM Neoverse N1 core. These come in 80 core (or fused for less) and 128 core flavors, in a couple of speed ratings up to 3 GHz. Ampere also has a newer generation of processors called the AmpereOne series; these use a core Ampere developed in-house and go up to 192 cores. As far as I know the AmpereOne processor has only been picked up by cloud companies and there are no retail motherboards. The CPU sockets are different, as the AmpereOne uses PCIe Gen 5. They have also announced the following generation, which starts at 256 cores and 12 memory controller channels and is targeting 2025.

Keep in mind the per core performance of these Ampere Altra systems can't match a high end Intel/AMD processor, but there are a lot of cores and their power consumption is pretty low, so compute cycles per watt of power is what's attractive. For some workloads this is a good tradeoff (like cloud microservices) and for some workloads this will give disappointing performance (like many single threaded desktop applications, or server applications that have per core license fees). These Ampere Altra systems also have a lot of PCIe lanes, like 128 Gen 4, and for OS folks, there are 6 PCIe segments, so you have to specify an SBDF in windbg commands. From what I have read, the latest Qualcomm Snapdragon X Elite cpus have quite good single core performance. There are performance/power/thermal tradeoffs in the cpu configuration, so it seems questionable we will ever see 256 core cpus that have top end single core performance.

For Windows kernel devs, a critical little gotcha is that you can only set a BDF address for a windbg NIC, not an SBDF, so net windbg only works if you plug the NIC into a slot on segment 0. Last I knew, only the Mellanox NIC is officially supported for net windbg, and those are x16 cards. I know you can get adapters to plug the Mellanox NIC into a 10 Gb SFP switch port; I'm not sure about adapters to 1 Gb switch ports. Many/most Intel/x64 systems only have a single PCIe segment, so this lack of SBDF support in windbg has not been a problem in the past. On the other hand, since we may now use a bunch of NVMe devices, sometimes behind a PCIe bridge, a single PCIe segment becomes a problem for big systems.
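
To make the gotcha concrete: busparams only takes the three bus.device.function numbers, with nowhere to put a segment, which is why the NIC has to sit on segment 0 (the numbers below are placeholders for your NIC's location):

```
:: busparams is bus.device.function only; there is no segment field.
bcdedit /set "{dbgsettings}" busparams 2.0.0
```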

  • Jan