mailing list playing up? RE: hooking an interrupt by any means necessary

I seem to be getting some message repeats… is the mailing list playing
up, or is it just me?

I suspect the former, as someone else has just chastised someone for
asking the same question again when it had already been answered :-)

James

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:bounce-354715-
xxxxx@lists.osr.com] On Behalf Of James Harper
Sent: Wednesday, 11 February 2009 20:47
To: Windows System Software Devs Interest List
Subject: RE: [ntdev] hooking an interrupt by any means necessary

> Nothing is using vector 0x83 (as determined from windbg), so if it is
> indeed being fired over and over again then I can be more sure about the
> problem. I want to hook interrupt vector 0x83 (by any means necessary,
> as the subject says) to be certain that it is indeed that vector that is
> being fired repeatedly.
>
> Subsequent testing with xentrace shows that on a different occasion,
> vector 0x93 is being fired repeatedly instead of 0x83, so it may be that
> a random pin is getting stuck, in which case I will want to hook all
> unused vectors.
>

I decided that instead of hooking the interrupt to see which one(s) were
stuck, I could just analyze the IRR registers in the APIC, and sure
enough vector 0x93 is being triggered repeatedly. So I have the
information I need to pursue the problem on the Xen side of things…

Thanks!

James


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

I use NNTP for these newsgroups and I see the multiple posts with different
sent times. If there is a bug in the software OSR is running, I am sure
they will get it fixed.


> James:
>
> I haven’t really been reading ntdev much over the past two days, so with
> that caveat in mind, here are my late night thoughts, submitted for your
> consideration:
>
>   • if you’re going to proceed down the route of tracking int 0x83, get
>     an Arium
>
> You’ve probably already paid for it over the past couple of weeks, and if
> you want visibility under windbg, it’s the easiest, most reliable way
> (not that an ECM-XDP would ever be mistaken for rock solid in the way of
> reliability). Yes, you could go about it the IDT way, but the situation
> is already so complicated and, more troublingly, somewhat opaque. You’re
> also saying that you’re not sure that you can trust Xen either, so I don’t
> see how you could trust a (real) IDT entry either, as Xen could usurp
> that as well, and tracking all of that down will take time without some
> hardware assistance. An Arium would also make it easier to screw around
> with the various platform interrupts and other facilities provided by your
> chipset, should you get really desperate.

Xen presents a virtual LAPIC to Windows… if the Arium is a hardware
device (I’ve never heard of it before) then it would not be of any use
here, as the physical interrupts are fine; it’s only the virtual
interrupts that are misbehaving.

> That being said, if you want to hook all unused vectors for a test, just
> overwrite KiUnhandledException (or whatever it’s called); as long as you
> have a kd connection, PatchGuard shouldn’t be an issue, but again, if
> you’re looking to stamp out Heisenbugs, this is not a very good idea, in
> my opinion.

Well, the problem is actually fixed now. It turned out that qemu-dm (the
part of Xen that emulates a physical machine for HVM domains) wasn’t
re-activating USB on startup, which meant that the interrupt was getting
stuck if it was asserted before suspend.

To prove to myself that the problem was an interrupt vector getting
called repeatedly, I stayed at DIRQL and parsed the IRR in the LAPIC and
could see the interrupt getting turned on and staying on.

> Q:
>
> When you say ‘suspend’ and ‘resume,’ do you mean like ACPI or are you
> rolling your own? If the former, you might want to check out the acpikd
> kd extension (!acpikd & !amli), and also in that case 0x83 & 0x93 are
> curious values.
>
> FINALLY:
>
> It seems to have been suggested a few times already, but here it is one
> more time: we need to see your code. We just can’t reasonably be
> expected to treat a hypercall involving suspend/resume as a ‘black box,’
> in my opinion.

To satisfy your curiosity :-) and remind myself of the code paths
involved (it’s been a while since I looked at them), the hypercall
sequence is:

"
cancelled = hvm_shutdown(xpdd, SHUTDOWN_suspend);
"

hvm_shutdown is:

"
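static __inline int
hvm_shutdown(PXENPCI_DEVICE_DATA xpdd, unsigned int reason)
{
  /* argument block for SCHEDOP_shutdown; reason is a SHUTDOWN_* code */
  struct sched_shutdown ss;
  int retval;

  ss.reason = reason;
  /* for SHUTDOWN_suspend this only returns once the domain runs again;
     the caller treats a nonzero result as "suspend cancelled" */
  retval = HYPERVISOR_sched_op(xpdd, SCHEDOP_shutdown, &ss);
  return retval;
}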
"

And HYPERVISOR_sched_op (under x86 where I’m testing - obviously it’s
different for amd64) is:

"
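static __inline int
HYPERVISOR_sched_op(PXENPCI_DEVICE_DATA xpdd, int cmd, void *arg)
{
  /* base of the hypercall page Xen populated during setup */
  char *hypercall_stubs = xpdd->hypercall_stubs;
  long __res;
  __asm {
    mov ebx, cmd       /* hypercall argument 1 */
    mov ecx, arg       /* hypercall argument 2 */
    mov eax, hypercall_stubs
    add eax, (__HYPERVISOR_sched_op * 32)  /* stubs are 32 bytes apart */
    call eax
    mov [__res], eax   /* result comes back in eax */
  }
  return __res;
}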
"

xpdd is my device context, and hypercall_stubs points to the hypercall
page, a memory area set up during initialization using information
returned by a cpuid instruction. The asm stuff is necessary to thunk
between calling conventions. Xen receives the hypercall successfully and
does the shutdown, but things were going wrong on restore because of the
aforementioned stuck USB interrupt.

Now I did mention earlier that there was nothing attached to 0x83, but
it turns out that I was incorrect. I can’t explain why windbg didn’t
show me though.

Thanks for the response!

James

My bad; I thought you were talking about hooking physical interrupts,
but I guess that’s what I get for not reading the whole thread carefully.

As far as qemu goes, that’s what I thought might be involved, from
taking a look at ‘dsdt.asl’ (under ‘hvmloader’).

Good luck,

mm
