hooking an interrupt by any means necessary

James_Harper · February 10, 2009, 4:43am

My Xen drivers are hanging when ‘coming back’ from a save/restore
operation, and I have identified that the hang is happening when I go
from DIRQL (or HIGH_LEVEL) back to DISPATCH_LEVEL. This hang never
occurs under the debugger so I’m getting a bit frustrated trying to find
it.

Using a tool called xentrace, I am seeing what I think is Xen sending
interrupts to vector 0x83 repeatedly, so what I want to do is hook
vector 0x83 (and maybe all ‘empty’ vectors?) and log a debug message via
ioport writes. This hooking will never ever make it into production code
so I pretty much don’t care what method I use to do it - I just want to
find the cause of this problem.
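
A minimal sketch of the kind of logging described above, assuming the handler only needs to emit the vector number as a few bytes on a debug I/O port. The names `port_write_fn`, `log_vector`, and `capture_write` are hypothetical and not part of any Windows or Xen API. In a real driver the callback would presumably wrap something like `WRITE_PORT_UCHAR`; here a user-mode stand-in captures the bytes so the formatting can be checked outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback type: writes one byte to an I/O port. */
typedef void (*port_write_fn)(uint16_t port, uint8_t value);

/* Emit "vXX\n" (XX = vector in lowercase hex) one byte at a time.
 * No allocation and no locking, so it is cheap enough for a throwaway
 * interrupt stub. */
static void log_vector(port_write_fn write_byte, uint16_t port, uint8_t vector)
{
    static const char hex[] = "0123456789abcdef";

    assert(write_byte != NULL);
    write_byte(port, (uint8_t)'v');
    write_byte(port, (uint8_t)hex[vector >> 4]);
    write_byte(port, (uint8_t)hex[vector & 0x0F]);
    write_byte(port, (uint8_t)'\n');
}

/* User-mode stand-in for the port writer (assumption: test use only). */
static uint8_t captured[16];
static size_t captured_len;

static void capture_write(uint16_t port, uint8_t value)
{
    (void)port; /* the stand-in ignores the port number */
    if (captured_len < sizeof captured)
        captured[captured_len++] = value;
}
```

Calling `log_vector(capture_write, port, 0x83)` leaves the four bytes `v83\n` in `captured`; the port number itself is whatever debug port the hypervisor setup exposes and is left as a parameter.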

If I remember correctly, according to someone on this list who would
definitely know what he was talking about (doron?), IoConnectInterrupt
doesn’t really use the information passed to it when connecting an
interrupt, it uses ‘behind the scenes’ info from the pnp driver, and it
fails when I try to hook vector 0x83 (0x183) that way anyway.

So am I going to have to get the address of the IDT and hook via that,
or is there another way to do it?

Thanks

James

OSR_Community_User · February 10, 2009, 10:04am

James, could you break in with Windbg and use !idt to find out what normally connects to vector 0x83?

James_Harper · February 10, 2009, 6:50pm

> James, could you break in with Windbg and use !idt to find out what
> normally connects to vector 0x83?

Normally, nothing. It is the stub code that Windows puts there. The
problem I’m trying to solve is that I think that xen is triggering
interrupt vector 0x83 for some incorrect reason, and the best way I can
think of to prove it is to put something in vector 0x83 that dumps some
debug messages (which I can capture without the use of the debugger).

Remember, I cannot reproduce this problem when running windbg.

James

Don_Burn_1 · February 10, 2009, 8:04pm

James,

I have not tried this on a recent system, but you might look at Dekker
and Newcomer’s device simulator:
http://www.eclectic-eng.com/downloadableFiles/SimulatorAndDriver.zip
They had a way of hooking into a specific unused interrupt so that they
could register an ISR; it might give you an approach.

–
Don Burn (MVP, Windows DDK)
Windows Filesystem and Driver Consulting
Website: http://www.windrvr.com
Blog: http://msmvps.com/blogs/WinDrvr

anton_bassov · February 10, 2009, 8:17pm

James,

> The problem I’m trying to solve is that I think that xen is triggering interrupt vector
> 0x83 for some incorrect reason

At the risk of sounding boring I will repeat myself - even if XEN starts raising interrupts for this or that reason,
it does not do it until you patch the kernel. Therefore, the problem is, apparently, rooted in your code.

If you don’t mind, could you please post all your code - let’s see if we are able to detect any bugs in it…

Anton Bassov

James_Harper · February 10, 2009, 9:06pm

> James,
>
> > The problem I’m trying to solve is that I think that xen is
> > triggering interrupt vector 0x83 for some incorrect reason
>
> At the risk of sounding boring I will repeat myself - even if XEN
> starts raising interrupts for this or that reason, it does not do it
> until you patch the kernel. Therefore, the problem is, apparently,
> rooted in your code.

You must be glossing over what I’m saying. In the cases I am referring
to, I am NOT patching the kernel. I am testing under Windows 2003
SP2, which never ever touches the TPR register and does not need
patching.

I mentioned the patching because it was a good example of the need to
set all CPUs at HIGH_LEVEL. The thing I am actually trying to do is put
windows into a state where xen can suspend it, make the suspend
hypercall, and then put things right again. After I make the suspend
hypercall though, the moment I enable interrupts again everything
hangs.

The code in question is:
. Raise to DIRQL (used to be HIGH_LEVEL, but I’m testing with DIRQL and
it still crashes).
. Make hypercall
. Lower back to DISPATCH_LEVEL

My theory now is that something goes wrong inside xen and it gets mixed
up about which interrupts go where and is calling the wrong vector. I
want to prove that to myself by having some of my code execute when that
vector is hit.

Please put patching the kernel out of your mind. In the case I am having
trouble with, it is not done.

James

anton_bassov · February 10, 2009, 10:44pm

> I mentioned the patching because it was a good example of the need to set all CPUs at HIGH_LEVEL.
>
> My theory now is that something goes wrong inside xen and it gets mixed up about which
> interrupts go where and is calling the wrong vector.

Apparently, I just got it wrong then. Never mind. Either way, the story is still the same - no matter what your code does, XEN is not confused about interrupts until you execute it. Therefore, the problem is still rooted in your code that makes the hypercall, and this is the only thing I want you to pay attention to. It has nothing to do with IRQL and TPR - I am almost sure that if you use a CLI-STI sequence instead of raising and lowering IRQL and leave the rest of your code intact, you will still face exactly the same problem…

Anton Bassov

James_Harper · February 11, 2009, 12:38am

> > I mentioned the patching because it was a good example of the need
> > to set all CPUs at HIGH_LEVEL.
> >
> > My theory now is that something goes wrong inside xen and it gets
> > mixed up about which interrupts go where and is calling the wrong
> > vector.
>
> Apparently, I just got it wrong then. Never mind. Either way, the
> story is still the same - no matter what your code does, XEN is not
> confused about interrupts until you execute it. Therefore, the problem
> is still rooted in your code that makes the hypercall, and this is the
> only thing I want you to pay attention to. It has nothing to do with
> IRQL and TPR - I am almost sure that if you use a CLI-STI sequence
> instead of raising and lowering IRQL and leave the rest of your code
> intact, you will still face exactly the same problem…

Yes, and in fact I have tested with CLI-STI, and yes, it crashes after
STI. This is the reason why I started a new thread with a different
subject. I know the problem has nothing to do with IRQL, TPR, CLI, or
STI. I suspect that the problem is that when Xen starts my domain up
again after the suspend operation, it leaves an interrupt pin ‘stuck’. I
run a tool called ‘xentrace’ and (I think) it shows that interrupt
vector 0x83 is being triggered over and over again, but I want to be
sure before I investigate the xen restore code any further.

Nothing is using vector 0x83 (as determined from windbg), so if it is
indeed being fired over and over again then I can be more sure about the
problem. I want to hook interrupt vector 0x83 (by any means necessary,
as the subject says) to be certain that it is indeed that vector that is
being fired repeatedly.

Subsequent testing of xentrace shows that on a different occasion,
vector 0x93 is being fired repeatedly instead of 0x83, so it may be that
a random pin is getting stuck, in which case I will want to hook all
unused vectors.

James

Scott_Noone_OSR · February 11, 2009, 11:53am

> I want to hook interrupt vector 0x83 (by any means necessary,
> as the subject says) to be certain that it is indeed that vector that is
> being fired repeatedly.

Hooking the IDT for this purpose is more of a platform architecture question
than a Windows driver one. If all you need to do is hook the entry so you
can output “I’m here!” to an I/O port, chapter three of the Intel reference
manuals has pretty much everything you need. (Of course, once you have
conducted this experiment you should promptly toss the code away)
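
As a rough illustration of what "hooking the entry" involves at the descriptor level, here is a sketch of packing and unpacking a 32-bit protected-mode interrupt gate in the layout the Intel manuals describe. It assumes a 32-bit x86 guest; the helper names (`idt_gate32`, `make_interrupt_gate`, `gate_handler`) are hypothetical, and the selector and handler address passed in are the caller's responsibility. It only builds the 8-byte entry in memory and does not locate or modify a live IDT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit interrupt gate: the handler offset is split across the first
 * and last 16-bit fields, with the code segment selector and the
 * type/attribute byte in between. */
typedef struct {
    uint16_t offset_low;   /* handler address bits 0..15  */
    uint16_t selector;     /* code segment selector       */
    uint8_t  reserved;     /* must be zero                */
    uint8_t  type_attr;    /* P, DPL and gate type        */
    uint16_t offset_high;  /* handler address bits 16..31 */
} idt_gate32;

_Static_assert(sizeof(idt_gate32) == 8, "IDT gate must be 8 bytes");

/* Present, DPL 0, 32-bit interrupt gate (type 0xE). */
#define GATE_PRESENT_DPL0_INT32 0x8Eu

static idt_gate32 make_interrupt_gate(uint32_t handler, uint16_t selector)
{
    idt_gate32 g;

    g.offset_low  = (uint16_t)(handler & 0xFFFFu);
    g.selector    = selector;
    g.reserved    = 0;
    g.type_attr   = GATE_PRESENT_DPL0_INT32;
    g.offset_high = (uint16_t)(handler >> 16);
    return g;
}

/* Recover the handler address from an existing gate, e.g. to save the
 * original entry before replacing it. */
static uint32_t gate_handler(const idt_gate32 *g)
{
    assert(g != NULL);
    return ((uint32_t)g->offset_high << 16) | g->offset_low;
}
```

Saving the old entry with `gate_handler` before overwriting it makes it possible to restore the original stub once the experiment is over.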

-scott

–
Scott Noone
Software Engineer
OSR Open Systems Resources, Inc.
http://www.osronline.com

James_Harper · February 11, 2009, 5:34pm

> > I want to hook interrupt vector 0x83 (by any means necessary,
> > as the subject says) to be certain that it is indeed that vector
> > that is being fired repeatedly.
>
> Hooking the IDT for this purpose is more of a platform architecture
> question than a Windows driver one. If all you need to do is hook the
> entry so you can output “I’m here!” to an I/O port, chapter three of
> the Intel reference manuals has pretty much everything you need. (Of
> course, once you have conducted this experiment you should promptly
> toss the code away)

Thanks. That’s the sort of answer I was looking for. I didn’t know if
there was an API to do what I wanted, or if I needed to go down to the
IDT level.

Thanks again.

James

anton_bassov · February 11, 2009, 9:03pm

Since this interrupt is not used by the OS anyway, what I would advise you to do here is not to hook it but to provide your own handler that does not transfer control to the OS-provided stub and instead does an IRETD without acknowledging the interrupt. Don’t issue the EOI from the handler; instead, put that part into the code that follows KeLowerIrql(), and set breakpoints on your stub and on the line that issues the EOI. If your theory about an interrupt storm is correct, you will see the sequence bp1-bp2-bp1-bp2-etc. in a debugger…

Anton Bassov

OSR_Community_User · February 14, 2009, 10:23pm

James:

I haven’t really been reading ntdev much over the past two days, so with that caveat in mind, here are my late night thoughts, submitted for your consideration:

If you’re going to proceed down the route of tracking int 0x83, get an Arium.

You’ve probably already paid for it over the past couple of weeks, and if you want visibility under windbg, it’s the easiest, most reliable way (not that an ECM-XDP would ever be mistaken for rock solid in the way of reliability). Yes, you could go about it the IDT way, but the situation is already so complicated and, more troublingly, somewhat opaque. You’re also saying that you’re not sure you can trust Xen, so I don’t see how you could trust a (real) IDT entry either, as Xen could usurp that as well, and tracking all of that down will take time without some hardware assistance. An Arium would also allow you to more easily screw around with the various platform interrupts and other facilities provided by your chipset, should you get really desperate.

That being said, if you want to hook all unused vectors for a test, just overwrite KiUnhandledException (or whatever it’s called); as long as you have a kd connection, PatchGuard shouldn’t be an issue. But again, if you’re looking to stomp out Heisenbugs, this is not a very good idea, in my opinion.

Q:

When you say ‘suspend’ and ‘resume,’ do you mean like ACPI or are you rolling your own? If the former, you might want to check out the acpikd kd extension (!acpikd & !amli), and also in that case 0x83 & 0x93 are curious values.

FINALLY:

It seems to have been suggested a few times already, but here it is one more time - we need to see your code. We just can’t reasonably be expected to treat a hypercall involving suspend/resume as a ‘black box,’ in my opinion.

Good luck,

mm

James_Harper · February 14, 2009, 10:33pm

> Nothing is using vector 0x83 (as determined from windbg), so if it is
> indeed being fired over and over again then I can be more sure about
> the problem. I want to hook interrupt vector 0x83 (by any means
> necessary, as the subject says) to be certain that it is indeed that
> vector that is being fired repeatedly.
>
> Subsequent testing of xentrace shows that on a different occasion,
> vector 0x93 is being fired repeatedly instead of 0x83, so it may be
> that a random pin is getting stuck, in which case I will want to hook
> all unused vectors.

I decided that instead of hooking the interrupt to see which one(s) were
stuck, I could just analyze the IRR registers in the APIC, and sure
enough vector 0x93 is being triggered repeatedly. So I have the
information I need to pursue the problem on the Xen side of things…
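
For anyone wanting to do the same check, here is a minimal sketch of decoding the IRR once its eight 32-bit words have been read. It assumes the standard local APIC layout (IRR at register offsets 0x200 through 0x270, one 32-bit word per 16-byte slot, bit n of word k representing vector 32*k + n). The helper names are hypothetical, and how the words are actually read from the APIC (mapped MMIO page, hypervisor interface, etc.) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define APIC_IRR_BASE_OFFSET 0x200u /* first IRR word in the APIC page */
#define APIC_REG_STRIDE      0x10u  /* APIC registers are 16 bytes apart */
#define APIC_IRR_WORDS       8u     /* 8 x 32 bits = 256 vectors */

/* Offset (within the APIC register page) of the IRR word holding 'vector'. */
static uint32_t irr_register_offset(unsigned vector)
{
    assert(vector < 256);
    return APIC_IRR_BASE_OFFSET + (vector / 32u) * APIC_REG_STRIDE;
}

/* Nonzero if 'vector' is marked pending in a snapshot of the IRR words. */
static int irr_vector_pending(const uint32_t irr[APIC_IRR_WORDS],
                              unsigned vector)
{
    assert(irr != NULL && vector < 256);
    return (int)((irr[vector / 32u] >> (vector % 32u)) & 1u);
}

/* Collect pending vectors in ascending order; returns how many were
 * stored (at most out_cap). */
static size_t irr_pending_vectors(const uint32_t irr[APIC_IRR_WORDS],
                                  uint8_t *out, size_t out_cap)
{
    size_t n = 0;
    unsigned v;

    assert(out != NULL || out_cap == 0);
    for (v = 0; v < 256; v++) {
        if (irr_vector_pending(irr, v) && n < out_cap)
            out[n++] = (uint8_t)v;
    }
    return n;
}
```

Taking several snapshots a short time apart and comparing the results is one way to tell a vector that is genuinely stuck from one that merely happened to be pending at the moment of a single read.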

Thanks!

James