I've been single-stepping through a kernel memory allocation function with WinDbg (the new debugger from the Store). I believe the function is called from multiple places, not just from my code.
I started single-stepping it (F10 to step over, F11 to step into) and noticed that the current core number kept changing. Say, I started from:
6: kd> p
then it switched to:
8: kd> p
then came back to core 6.
I understand that it was probably catching other cores executing at the same place. But that was super confusing. I wonder if this is by design, or some sort of bug in the kernel debugger?
Why do you think it's a bug? That CPU doesn't stay frozen in suspended animation while you're paused in the debugger. The thread is frozen, but the CPU is still working. It's quite possible that your thread changed CPUs while you were waiting.
I'm guessing you set a breakpoint and then stepped from that breakpoint?
The debugger behavior you describe is exactly what happens when the first core hits a breakpoint, you step, and then a different core hits the same breakpoint. The fix is to disable that breakpoint after the first core hits it, so only a single core has hit the breakpoint and is being stepped. Unfortunately, you can't directly set a breakpoint for a specific core, even though the CPU is usually capable of doing that with a hardware breakpoint.
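For example (the module and function names here are hypothetical), after the first core hits the breakpoint, disable it before stepping:

```
0: kd> bp mydriver!MyAllocRoutine
0: kd> g
Breakpoint 0 hit
6: kd> bd 0
6: kd> p
```

`bl` lists the breakpoints with their numbers, `bd 0` disables breakpoint 0 so no other core stops there while you step, and `be 0` re-enables it when you're done.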
If I want to break on a specific core, I often insert a little chunk of debug code that checks the core number and, if it matches, executes a line that does nothing useful, like incrementing a global integer variable. You can then set a breakpoint on the code that does the increment, or set a memory-write breakpoint on that global, since only the desired core will ever write to it. These are all strategies to avoid multiple cores stopping at a breakpoint.

When you are stepping, you can also just say go whenever the core number is not the one that first hit the breakpoint; this works if the breakpoint is hit infrequently.

You can also set a conditional breakpoint, with the condition being that the core number matches the one you want. This works, but it degrades performance if lots of cores hit the breakpoint and then have to run debugger script code to decide whether to continue or halt.
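A minimal sketch of that debug-code trick. `current_core()` here is a hardcoded stand-in so the example is self-contained; in a real driver you'd use `KeGetCurrentProcessorNumberEx(NULL)`, and `TARGET_CORE` and the names are made up for illustration:

```c
#include <stdio.h>
#include <assert.h>

/* Stand-in for KeGetCurrentProcessorNumberEx(NULL); hardcoded so this
   sketch compiles and runs anywhere. */
static unsigned long current_core(void) { return 6; }

#define TARGET_CORE 6UL  /* the core you want to catch */

/* Only the target core ever writes this, so a memory-write breakpoint on
   g_core_hits (ba w4 in WinDbg) halts only for that core. */
volatile unsigned long g_core_hits = 0;

void instrumented_path(void)
{
    /* ... the code being debugged ... */
    if (current_core() == TARGET_CORE) {
        g_core_hits++;   /* or set a code breakpoint on this line */
    }
}
```

Either breakpoint flavor works: a code breakpoint on the increment line, or a hardware write breakpoint on `g_core_hits`.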
WinDbg is not able to stop one core and let all the others keep running: when you break into the debugger, all cores are halted or resumed together. JTAG debuggers can sometimes halt/continue individual cores, so you can step through your driver code while other drivers keep executing and responding to interrupts. Some devices get unhappy if their driver stops responding to heartbeat interrupts (as happens when you stop in WinDbg); they may decide their internal firmware has crashed and restart it. For cases like this it may be better to log events, for example with TraceLogging, instead of trying to halt and step in the debugger.
Or sometimes it's better to write code that detects when a failure has happened and triggers a bugcheck; you can then do post-mortem debugging live or from a crashdump.
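A sketch of that deliberate-bugcheck pattern. The `bug_check` stand-in here records the code instead of halting, so the sketch runs anywhere; a real driver would call `KeBugCheckEx` from `<ntddk.h>`, and the bugcheck code and function names are hypothetical:

```c
#include <stdio.h>
#include <assert.h>

/* User-mode stand-in for KeBugCheckEx; it records the code rather than
   halting the machine (a real driver never returns from KeBugCheckEx). */
static unsigned long g_last_bugcheck = 0;
static void bug_check(unsigned long code, unsigned long p1, unsigned long p2)
{
    fprintf(stderr, "BUGCHECK 0x%lX (%lx, %lx)\n", code, p1, p2);
    g_last_bugcheck = code;
}

/* Hypothetical driver-defined bugcheck code. */
#define MY_DRIVER_INVARIANT_FAILED 0xE0000001UL

/* Validate an object signature; bugcheck deliberately on corruption so the
   full machine state is preserved in the crashdump for post-mortem work. */
static void check_signature(unsigned long actual, unsigned long expected)
{
    if (actual != expected)
        bug_check(MY_DRIVER_INVARIANT_FAILED, expected, actual);
}
```

Crashing at the moment the invariant breaks captures the state close to the root cause, instead of much later when the corruption finally causes a visible failure.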
When I write a bunch of new kernel code I often like to single-step through it when I first bring it up, checking that variables have plausible values as I step. Restricting the system to a single core with bcdedit sometimes makes this initial live walkthrough easier.
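If I recall correctly, that's the `numproc` boot entry element, set from an elevated prompt (takes effect after a reboot):

```
bcdedit /set numproc 1
rem ... reboot, do the single-core debugging session ...
bcdedit /deletevalue numproc
```

`/deletevalue` removes the limit so the machine boots with all cores again.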