how to locate a crashed processor?

We have a 16-processor server where one CPU is causing a crash. We can tell
which one it is by using ~#. But how do we map this to a physical processor?

If we are in the right context, i.e. ~2, is there a way we can find out the
APIC Destination Register for this processor? This is the only way I can
think of to locate the failing physical processor.

Ed in Calif

Well, you could try using a spin loop and a thermometer, i.e. write a
program that just spins in a loop, use processor affinity to control which
processor it runs on, and use a thermometer to measure the heat output of
each processor. Sure, it’s O(N), but it might work.

– arlie
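
A minimal sketch of the spin-loop idea, not anything posted in the thread. The function names (affinity_mask, pin_current_thread, spin) are made up for illustration. It assumes SetThreadAffinityMask on Windows and os.sched_setaffinity on Linux, and it assumes the box has at most 64 logical processors, so a single affinity mask covers them all.

```python
import os
import sys
import time


def affinity_mask(cpu_index: int) -> int:
    """Return a bitmask with only the bit for cpu_index set."""
    if cpu_index < 0:
        raise ValueError("cpu_index must be non-negative")
    return 1 << cpu_index


def pin_current_thread(cpu_index: int) -> None:
    """Restrict the calling thread to one logical processor."""
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetCurrentThread.restype = wintypes.HANDLE
        # DWORD_PTR is pointer-sized, so c_size_t is used for the mask.
        kernel32.SetThreadAffinityMask.argtypes = [wintypes.HANDLE, ctypes.c_size_t]
        kernel32.SetThreadAffinityMask.restype = ctypes.c_size_t
        previous = kernel32.SetThreadAffinityMask(
            kernel32.GetCurrentThread(), affinity_mask(cpu_index)
        )
        if previous == 0:  # zero means the call failed
            raise ctypes.WinError(ctypes.get_last_error())
    else:
        # Available on Linux; pid 0 means the calling thread.
        os.sched_setaffinity(0, {cpu_index})


def spin(seconds: float) -> int:
    """Busy-loop for roughly `seconds` and return the iteration count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations
```

To heat up logical processor 2 for a minute, one would call pin_current_thread(2) followed by spin(60.0), then go looking for the warm heat sink.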

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Ed In Calif
Sent: Friday, October 28, 2005 10:14 PM
To: Kernel Debugging Interest List
Subject: [windbg] how to locate a crashed processor?

WinDbg has the “!APIC” extension command, which prints the local APIC ID of
the current (crashed) CPU. You can also try using the “!dd” extension to read
the physical memory addresses that contain the local APIC registers.

Dmitriy
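
For the “!dd” route, a small sketch of how a raw register value could be turned into an APIC ID. It assumes the architectural default local APIC base of 0xFEE00000 (firmware or the OS may have relocated it), the ID register at offset 0x20, and the xAPIC layout with the ID in bits 31:24. The function names are made up for illustration, and whether a physical read issued from the debugger really reaches the local APIC of the processor in question is something to verify on the target.

```python
APIC_DEFAULT_BASE = 0xFEE00000  # architectural default; may be relocated
APIC_ID_OFFSET = 0x20           # local APIC ID register


def apic_id_register_address(base: int = APIC_DEFAULT_BASE) -> int:
    """Physical address of the local APIC ID register for a given base."""
    return base + APIC_ID_OFFSET


def decode_apic_id(register_value: int) -> int:
    """Extract the APIC ID (bits 31:24, xAPIC layout) from the register."""
    return (register_value >> 24) & 0xFF
```

With the default base, the address to hand to “!dd” would be fee00020, and the first dword it shows is the value to pass to decode_apic_id.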

Isn’t the processor number pretty much determined by the slot? The
hardware documentation should have this covered.

Good one :wink:

----- Original Message -----
From: “Arlie Davis”
To: “Kernel Debugging Interest List”
Sent: Monday, October 31, 2005 7:44 AM
Subject: RE: [windbg] how to locate a crashed processor?

I don’t know of any ‘slot numbers’ for processors.

----- Original Message -----
From: “Satya Das”
To: “Kernel Debugging Interest List”
Sent: Monday, October 31, 2005 3:55 PM
Subject: RE: [windbg] how to locate a crashed processor?

I don’t see !APIC in the WinDbg help, but I’ll try it. If it works,
this is the best answer.

Ed

----- Original Message -----
From: “Dmitriy Budko”
To: “Kernel Debugging Interest List”
Sent: Monday, October 31, 2005 12:38 PM
Subject: RE: [windbg] how to locate a crashed processor?

Oh, come on! He could do it with a binary search and do it in O(log N).

I think I’ve been doing too much performance tuning lately (this
observation DID pop into my head as soon as I read the note, but I
refrained from suggesting it until someone remarked on Arlie’s
brilliance - we don’t want it to go to his head!)

:wink:

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com
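
One way the binary search could look, as a sketch only. It turns the question around: park the thermometer on one socket and find out which logical processor lives there. It assumes exactly one of the candidate processors sits in that socket, and that the caller supplies a socket_heats_up callback (hypothetical) that spins on a group of processors and reports whether the watched socket got warmer.

```python
from typing import Callable, List, Sequence


def find_processor(
    cpus: Sequence[int],
    socket_heats_up: Callable[[List[int]], bool],
) -> int:
    """Return the logical CPU in the watched socket using O(log N) probes."""
    candidates = list(cpus)
    if not candidates:
        raise ValueError("need at least one candidate processor")
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        # Spin on the first half only; a warm socket means the target is in it.
        if socket_heats_up(first_half):
            candidates = first_half
        else:
            candidates = candidates[middle:]
    return candidates[0]
```

On a 16-processor box that is four thermometer readings per socket instead of sixteen.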

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Ed In Calif
Sent: Monday, October 31, 2005 11:06 PM
To: Kernel Debugging Interest List
Subject: Re: [windbg] how to locate a crashed processor?

Unfortunately, the number of the processor is entirely controlled by
hardware. I once had a discussion about a related issue (“which
processor is the boot processor”) on a NUMA architecture x86 box that
supported up to 32 processors. The choice was, in fact, somewhat random
and you weren’t guaranteed that the association would remain the same.
That would make even the APIC technique a bit unstable.

The advantage the temperature probe idea has is that you could actually
figure it out. However, your hardware (motherboard) manual should
include information about how the CPUs are numbered. If not, this is a
reasonable question for the vendor’s tech support department. What you
really want to do is execute a CPUID query on the processor (if the
assignment is fixed, that works pretty well) because you can then figure
out the processor’s serial number. Not that reading the serial number
from the PHYSICAL processor is easy: generally, it involves removing the
heat sink and cleaning off the heat transfer compound. Also, many CPUs
now have internal temperature monitoring hardware. My system certainly
does, and I monitor that temperature (in fact, the machine is configured
to shut down should the temperature rise too much).

Unfortunately, because this is a characteristic of the actual hardware,
there’s not much the debugger can really *do* to make this easy.

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

Looking forward to seeing you at the next OSR File Systems class in Los
Angeles, CA October 24-27, 2005.
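
A sketch of what could be pulled out of a CPUID query, assuming the EBX and EDX values from CPUID leaf 1 have already been captured on the processor in question (Python cannot issue CPUID itself, so the capture step is left out). On Intel processors, EBX bits 31:24 hold the initial APIC ID and EDX bit 18 reports whether the processor serial number feature is present; many processors do not implement it. The function names are made up for illustration.

```python
PSN_FEATURE_BIT = 18  # CPUID leaf 1, EDX: processor serial number present


def initial_apic_id(leaf1_ebx: int) -> int:
    """Initial APIC ID from CPUID leaf 1, EBX bits 31:24."""
    return (leaf1_ebx >> 24) & 0xFF


def has_serial_number(leaf1_edx: int) -> bool:
    """True if CPUID leaf 1 EDX reports the serial number feature."""
    return bool(leaf1_edx & (1 << PSN_FEATURE_BIT))
```

If has_serial_number comes back False, the serial number route is closed and the APIC ID is the better handle.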

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Ed In Calif
Sent: Monday, October 31, 2005 11:05 PM
To: Kernel Debugging Interest List
Subject: Re: [windbg] how to locate a crashed processor?
