Debugging a System Hang

OSR_Community_User · June 24, 2002, 3:35pm

Hi all,

I have a problem with a system hanging, and I am trying to diagnose the
cause. I would appreciate any suggestions you could offer in my debug
analysis procedure. I am debugging an NT4.0 SP5 server. The system is hung.

Since the system is hung, I’ve connected WinDbg 4.0.0018.0.

The first step I did was !teb. It returned among other things: ClientId:
125.142

I also did a “!process -1” which showed me the current process and all the
threads. The base information it displayed was:
PROCESS 80e74300 Cid: 0125 Peb: 7ffdf000 ParentCid: 002d
DirBase: 15ef0000 ObjectTable: 81f24f08 TableSize: 606.
Image: AppNumberOne.ex
VadRoot 80e83408 Clone 0 Private 16753. Modified 1088783. Locked 0.
80E744BC MutantState Signalled OwningThread 0
Token e13fa030
ElapsedTime 22:16:34.0218
UserTime 0:05:18.0453
KernelTime 0:18:34.0203
QuotaPoolUsage[PagedPool] 67982
QuotaPoolUsage[NonPagedPool] 5399032
Working Set Sizes (now,min,max) (17389, 50, 345) (69556KB, 200KB, 1380KB)
PeakWorkingSetSize 18063
VirtualSize 157 Mb
PeakVirtualSize 164 Mb
PageFaultCount 26984367
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 17422

There was only one thread listed in a “running” state: 125.142, whose thread
object (ETHREAD) address was 81e06020.

Next I did a .thread 81e06020

Then I typed a KB command which yielded some interesting results.
ChildEBP RetAddr
f52e16a0 80119594 hal!KeAcquireSpinLockRaiseToSynch+0x34
f52e16b0 80112b35 nt!KeInsertQueueApc+0x12
f52e16dc eb0e2130 nt!IofCompleteRequest+0x201
WARNING: Stack unwind information not available. Following frames may be wrong.
80b12034 0e1fb000 MyDriver+0x2130
0690b000 00000000 0xe1fb000

Note that “MyDriver” is NOT called by this application, but is called by
another application running in the system.

I also typed “!pcr” which yielded “Irql: 00000000”

So now that I see this thread spinning in “KeAcquireSpinLockRaiseToSynch”,
I typed “!locks” which displayed (among other things) the following:
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks…

Resource @ 0x81f2d4e8 Shared 2 owning threads
Threads: 81e06020-02 80f937c0-01

Now I see that thread 81e06020 (hey that was the same as above) is using a
resource that thread 80f937c0 is using. So I did a “.thread 80f937c0” and
then a “kb” and this is what I got:

ChildEBP RetAddr
eb41f7ec 801185c2 nt!KiSwapThread+0x1b1
eb41f810 8027e5ac nt!KeWaitForSingleObject+0x1b8
eb41f82c 8027d9e7 Ntfs!NtfsWaitSync+0x18
eb41fa10 8027fcae Ntfs!NtfsNonCachedIo+0x1a5
eb41fc4c 8027eb8d Ntfs!NtfsCommonWrite_6695+0x36
eb41fcc0 801128af Ntfs!NtfsFsdWrite+0xcc
eb41fcd4 80113816 nt!IofCallDriver+0x37
eb41fcec 80125439 nt!IoSynchronousPageWrite+0xb2
eb41fdc8 8012507c nt!MiFlushSectionInternal+0x36f
eb41fe04 80104222 nt!MmFlushSection+0x128
eb41feb4 80103bca nt!CcFlushCache+0x3b6
eb41fef8 80108f87 nt!CcWriteBehind+0xf0
eb41ff34 8010bcdd nt!CcWorkerThread+0xc7
eb41ff4c 80139bde nt!ExpWorkerThread+0x73
eb41ff7c 8014563e nt!PspSystemThreadStartup+0x54
00000000 00000000 nt!KiThreadStartup+0x16

So… now what? Did I misinterpret or miss something along the way? What
did it mean when it said “MutantState Signalled OwningThread 0” in the
“!process -1” command’s output? Also, the “Working set sizes” seemed very
odd. What should I be asking myself that I am missing?
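
On the working set numbers, here is a quick arithmetic check. It assumes the first parenthesised group is page counts and that pages are 4 KB (the usual x86 size; the output itself does not say so). The names are only for illustration.

```python
PAGE_SIZE_KB = 4  # assumption: 4 KB pages, as on x86 NT


def pages_to_kb(pages):
    """Convert a page count to kilobytes."""
    return pages * PAGE_SIZE_KB


# (now, min, max) page counts copied from the !process output above
working_set_pages = (17389, 50, 345)
working_set_kb = tuple(pages_to_kb(p) for p in working_set_pages)
print(working_set_kb)  # (69556, 200, 1380)
```

Under that assumption the two groups agree with each other, so what stands out is only that the current size is far above the listed maximum.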

I know this seems like a lot of information (to me), but any suggestions
would be greatly appreciated.

Thanks,
Joe D.

Peter_Viscarola_OSR · June 25, 2002, 12:53pm

“Joe D” wrote in message news:xxxxx@ntdev…
>
>
> Then I typed a KB command which yielded some interesting results.
> ChildEBP RetAddr
> f52e16a0 80119594 hal!KeAcquireSpinLockRaiseToSynch+0x34
> f52e16b0 80112b35 nt!KeInsertQueueApc+0x12
> f52e16dc eb0e2130 nt!IofCompleteRequest+0x201
>

I don’t know the ultimate cause of your hang – But I can tell you from
looking at the stack of the spinning thread, that this shows a driver (your
driver?) completing an I/O request in an arbitrary thread context (You’ve
called IoCompleteRequest(…), and that function is in the process of
queuing the “special kernel mode APC for I/O completion”), and this thread
is spinning on the dispatcher database lock. You’re also (obviously)
running on an MP system.

If the problem is easy to repro, your first order of business is to repro
the problem with the checked kernel and HAL (in the case that you’re not
doing that already).

Also, in case you’re not aware of it, !locks only shows ERESOURCES, not spin
locks. So that won’t be too much help. Are you using ERESOURCES? If so,
are you disabling normal kernel APC delivery during the time you hold the
resource?

What are the threads running on the other CPUs doing?

Peter
OSR

OSR_Community_User · June 25, 2002, 1:30pm

Hi Peter,

> Also, in case you’re not aware of it, !locks only shows ERESOURCES, not
> spin locks. So that won’t be too much help.

So, what if you were interested in spinlocks? Are you hopelessly stuck?

Rgds,
Rob

OSR_Community_User · June 25, 2002, 1:42pm

you just look at all the processors. If a thread is holding a spinlock
it will be running on that processor until the lock is released.

-p

OSR_Community_User · June 25, 2002, 1:58pm

Peter,

Could you suggest a strategy? I.e., assuming you were
using WinDbg, which commands would you type to get
this information?

Thank you in advance,
Rob

OSR_Community_User · June 25, 2002, 2:08pm

Peter,

Thanks for the info. We are not using ERESOURCES in our driver. We are
running the MP version of NT, but we only have a single processor installed.

The problem occurs sporadically (it took a couple of days to reproduce
this one), but we can hopefully do it again. What would be the benefit of
running the checked kernel and HAL? I vaguely remember seeing a windbg
command that applied to checked version only, but I forget which.

How do you know that the IoCompleteRequest is running in an arbitrary
thread? Is it because the Stack Unwind Information was not available, or is
it just due to the general nature of drivers (completing IRPs in DPCs)?

Also, given the stack trace of the following:
ChildEBP RetAddr Args to Child
f52e16a0 80119594 81df7128 820707c0 f52e16dc hal!KeAcquireSpinLockRaiseToSynch+0x34
f52e16b0 80112b35 81df7128 ff443688 00000000 nt!KeInsertQueueApc+0x12
f52e16dc eb0e2130 80b1200c 00000002 80b12000 nt!IofCompleteRequest+0x201

I tried to do a “!irp 80b1200c” command and windbg returned something about
the tag not being correct, and said it was probably not an IRP. Does this
indicate anything significant, or did I just not interpret that stack
information correctly?

Thanks,
Joe

OSR_Community_User · June 25, 2002, 2:54pm

look at “multiprocessor syntax” in the windbg help. The same commands
will work in kd too.

Nk (for example, 0k), where N is the processor number, will get you a
stack trace from that processor.

~Ns (for example, ~0s) will switch to the specified processor.
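
To make the syntax concrete, here is a small sketch that just builds the command strings you would type for each processor. It assumes processors are numbered from 0; the helper names are made up for illustration and are not part of any debugger API.

```python
def per_processor_stack_commands(cpu_count):
    """Return one 'Nk' stack-trace command per processor, numbered from 0."""
    return [f"{n}k" for n in range(cpu_count)]


def switch_command(cpu):
    """Return the '~Ns' command that switches the debugger to processor N."""
    return f"~{cpu}s"


# Hypothetical two-processor machine
print(per_processor_stack_commands(2))  # ['0k', '1k']
print(switch_command(1))                # ~1s
```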

-p

Peter_Viscarola_OSR · June 25, 2002, 3:47pm

“Joe D” wrote in message news:xxxxx@ntdev…
>
> Peter,
>
> Thanks for the info. We are not using ERESOURCES in our driver. We are
> running the MP version of NT, but we only have a single processor
> installed.
>

Hmmm… That makes the problem even stranger, then. It’s not like there’s
another processor that could be holding the dispatcher database lock, right?
Weird…

> What would be the benefit of
> running the checked kernel and HAL?
>

Oh, one of my FAVORITE questions:

If you’re not testing your driver with the checked kernel and HAL, you’re
not testing your driver properly.

The checked kernel and HAL have lots of “cross-checking” built into them.
This ranges from parameter validation for various DDK functions, to
verification of internal state and structures. This differs from the free
build of the system, which forgoes much of this checking, given that the
O/S architecture is basically that kernel mode components implicitly “trust”
each other. Testing on the checked build is extremely valuable to driver
writers because of the checks it performs.

This is all described in the (XP and later) DDK, in the DDK docs, see the
section “Driver Development Tools”… “The Checked Build Of Windows” (just
type The Checked Build of Windows (no quotes) into the box at the index
tab).

You don’t have to install the full checked build to get these benefits. You
can install JUST the checked kernel and HAL. See the DDK or
http://www.osr.com/ntinsider/2001/checking/checked.htm for instructions.

> How do you know that the IoCompleteRequest is running in an arbitrary
> thread? Is it because the Stack Unwind Information was not available, or is
> it just due to the general nature of drivers (completing IRPs in DPCs).
>

Nah. It’s just a gift I was given. Ooops, sorry. No, actually, I can tell
that somebody’s completing an I/O request asynchronously (calling
IoCompleteRequest), because of the call to KeInsertQueueApc – This wouldn’t
be done if the request was being completed synchronously (in thread context).
See
http://www.osr.com/ntinsider/1997/iocomp/iocomp.htm (not an article for
beginners or the faint of heart, and sort of aimed at FS and FS Filter
Driver writers).

Can you look at the stack frame that’s in your driver (in WinDbg’s stack
window, select that stack location) and see what your driver is doing? The
IRP will definitely still be around at this point (it’s not returned until
after the APC has run)…

Peter
OSR

Mark_Roddy · June 25, 2002, 9:09pm

Not hopelessly - the lock holders and waiters all ought to be threads
visible through !process 0 7, it just takes a lot of grovelling through
the entrails to figure out who is having an incestuous relationship with
whom.

In fact if it is a spinlock deadlock generally the lock holders/waiters
are each occupying one of the cpus, or one cpu has a thread that is
attempting to acquire a lock it already owns. You may not be able to
determine exactly which lock is the problem, but it should be pretty
obvious that it is a spinlock deadlock. Note also that for these
deadlocks generally each cpu will be at either DISPATCH_LEVEL or some
DIRQL.
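
As a rough user-mode illustration of the "thread attempting to acquire a lock it already owns" case, here is a toy model. It is not kernel code: `ToySpinLock` is a made-up class wrapping Python's non-reentrant `threading.Lock`, and the spin is bounded so the demonstration terminates, whereas a real spin lock would spin forever.

```python
import threading


class ToySpinLock:
    """User-mode stand-in for a non-recursive spin lock (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()  # non-reentrant, like a spin lock

    def try_acquire(self, max_spins=1000):
        # Spin a bounded number of times instead of forever so the
        # example finishes; a real spin lock has no such limit.
        for _ in range(max_spins):
            if self._lock.acquire(blocking=False):
                return True
        return False

    def release(self):
        self._lock.release()


lock = ToySpinLock()
first = lock.try_acquire()   # succeeds: nobody holds the lock yet
second = lock.try_acquire()  # same thread again: can never succeed
lock.release()
print(first, second)  # True False
```

The second attempt fails no matter how long it spins, because the only party that could release the lock is the one spinning on it.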

(I’m hedging a bit here as I believe that there are deadlocks possible
where combinations of different lock types (e.g. spinlocks, events,
resources, semaphores, etc.) are acquired in differing orders, so it may
be possible to have a spinlock related deadlock where not all cpus are
directly waiting on one or more spinlocks.)
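
The "who is waiting on whom" grovelling amounts to looking for a cycle in a wait-for graph. Here is a minimal sketch, assuming you have already worked out by hand which thread each blocked thread is waiting on; the thread names and the `find_wait_cycle` helper are hypothetical.

```python
def find_wait_cycle(waits_for):
    """Return the list of threads forming a wait cycle, or None.

    waits_for maps each blocked thread to the thread that holds what
    it is waiting on (at most one outgoing edge per thread).
    """
    for start in waits_for:
        path = []
        seen = set()
        current = start
        # Follow the chain of waits until it ends or revisits a thread.
        while current in waits_for and current not in seen:
            seen.add(current)
            path.append(current)
            current = waits_for[current]
        if current in seen:
            return path[path.index(current):]
    return None


# Hypothetical example: A holds a spin lock and waits on an event that
# B must set, while B is spinning on A's lock; C merely waits on A.
example = {"A": "B", "B": "A", "C": "A"}
print(find_wait_cycle(example))  # ['A', 'B']
```

Threads outside the cycle (C here) are stuck too, but only as victims; the cycle itself is where to look.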

OSR_Community_User · June 26, 2002, 1:40pm

Peter,
Thanks for the info and links. I don’t mean to be asking you
specifically to answer my questions, so any input from anyone else is
certainly welcome.

I am somewhat able to reproduce the problem. It appears to be more
likely with a heavier load on the OS with regard to File IO, but it is still
somewhat random. I will try to install the checked build of the OS to see
what happens.

I clicked on the frame which appears to point to my driver, and all I
got were "???"s. Does that mean that the code is swapped out? That indeed
would be a problem.

What exactly is the dispatcher database? It sounds like something
used by the IO manager to process IRPs and dispatch them to device drivers.
It sounds like it would also be used in calling StartIo routines? When does
this spinlock get acquired and released?

I also did a !PCR on this machine, and it says that the IRQL is zero.
Since I am spinning attempting to acquire a spin lock, would that
necessarily indicate that this thread/process already has the spin lock
acquired, or could I be in a deadly embrace? Is there any way to determine
what other process has the dispatcher database spinlock acquired?

Also, given the stack trace
ChildEBP RetAddr Args to Child
f52e16a0 80119594 81df7128 820707c0 f52e16dc hal!KeAcquireSpinLockRaiseToSynch+0x34
f52e16b0 80112b35 81df7128 ff443688 00000000 nt!KeInsertQueueApc+0x12
f52e16dc eb0e2130 80b1200c 00000002 80b12000 nt!IofCompleteRequest+0x201
WARNING: Stack unwind information not available. Following frames may be wrong.
80b12034 0e1fb000 fdba7000 00000000 00000000 MyDriver+0x2130
0690b000 00000000 00000000 00000000 00000000 +0xe1fb000

can I tell if the address of the IRP that is being completed is at
80b1200c, or do I have to do more digging? When I do a !irp 80b1200c, I get
the message that says the IRP signature does not match.

I am still in the process of reading the IOCompletion article. My device
driver has no completion routines involved. It is a relatively simple
driver that talks to a piece of custom hardware. Applications talk to the
device via custom IOCTLs. It basically just sends messages back and
forth, or waits for a specific message to come back from the device.

As always… Thanks,
Joe D

OSR_Community_User · June 26, 2002, 2:24pm

> > Thanks for the info. We are not using ERESOURCES in our driver.
> > We are running the MP version of NT, but we only have a single
> > processor installed.
>
> Hmmm… That makes the problem even stranger, then. It’s not
> like there’s another processor that could be holding the
> dispatcher database lock, right?
> Weird…

As I understand it, it’s not weird, it’s wrong. Peter, I seem to
recall that in your seminar you said that spinlocks are implemented
as “no-ops” in the single processor kernel because they don’t
make sense on a single processor. If a single processor grabs a
real live spinlock, doesn’t it lock out ALL other work, which might
explain the hang…?

So, using an MP kernel on a single processor is not just a bad idea,
it breaks in exactly the way Joe is saying it’s breaking.

Of course, that seminar was many years ago, and a lot of code has
been written since then. I could be wrong.

Stu Bell

OSR_Community_User · June 26, 2002, 3:29pm

you’re not calling IoCompleteRequest inside your ISR are you?

-p

-----Original Message-----
From: Joe D [mailto:xxxxx@voicenet.com]
Sent: Wednesday, June 26, 2002 10:38 AM
To: NT Developers Interest List
Subject: [ntdev] Re: Debugging a System Hang

Peter,
Thanks for the info and links. I don’t mean to be asking you
specifically to answer my questions, so any input from anyone else is
certainly welcome.

I am somewhat able to reproduce the problem. It apears to be more
likely with a heavier load on the OS with regard to File IO, but it is
still somewhat random. I will try and install to checked build of the
OS to see what happens.

I clicked on the frame which appears to point to my driver, and all
I got were "???"s. Does that mean that the code is swapped out? That
indeed would be a problem.

What exactly is the dispatch manager database? It sound like
something used by the IO manager to process IRPs and dispatch them to
device drivers. It sounds like it would also be used in calling StartIo
routines? When does this spinlock get acquired and released?

I also did a !PCR on this machine, and it says that the IRQL is
zero. Since I am spinning attempting to acquire a spin lock, would that
neccesarily indicate that this thread/process already has the spin lock
acquired, or could I be in a deadly embrace? Is there any way to
determine what other process has the dispatch manager database spinlock
acquired?

Also, given the stack trace
ChildEBP RetAddr Args to Child
f52e16a0 80119594 81df7128 820707c0 f52e16dc
hal!KeAcquireSpinLockRaiseToSynch+0x34
f52e16b0 80112b35 81df7128 ff443688 00000000
nt!KeInsertQueueApc+0x12
f52e16dc eb0e2130 80b1200c 00000002 80b12000
nt!IofCompleteRequest+0x201
WARNING: Stack unwind information not available. Following
frames may be wrong.
80b12034 0e1fb000 fdba7000 00000000 00000000 MyDriver+0x2130
0690b000 00000000 00000000 00000000 00000000 +0xe1fb000

can I tell if the address of the IRP that is beeing completed is at
80b1200c, or do I have to do more digging? When I do a !irp 80b1200c, I
get the message that says the IRP signature does not match.

I am still in the process of reading the IOCompletion article. My
device driver has no completion routines involved. It is a relatively
simple driver that talks to a piece of custom hardware. Applications
talk to the device via custom IOCTLs. It basically simply sends
messages back and forth, or waits for a specific message to come back
from the device.

As always… Thanks,
Joe D

Peter Viscarola wrote in message news:xxxxx@ntdev…
>
> “Joe D” wrote in message news:xxxxx@ntdev…
> >
> > Peter,
> >
> > Thanks for the info. We are not using ERESOURCES in our driver. We are
> > running the MP version of NT, but we only have a single processor
> > installed.
> >
>
> Hmmm… That makes the problem even stranger, then. It’s not like
> there’s another processor that could be holding the dispatcher
> database lock, right? Weird…
>
> > What would be the benefit of
> > running the checked kernel and HAL?
> >
>
> Oh, one of my FAVORITE questions:
>
> If you’re not testing your driver with the checked kernel and HAL,
> you’re not testing your driver properly.
>
> The checked kernel and HAL have lots of “cross-checking” built into
> them. This ranges from parameter validation for various DDK functions
> to verification of internal state and structures. This differs from
> the free build of the system, which forgoes much of this checking,
> given that the O/S architecture is basically that kernel mode
> components implicitly “trust” each other. Testing on the checked build
> is extremely valuable to driver writers because of the checks it
> performs.
>
> This is all described in the (XP and later) DDK. In the DDK docs, see
> the section “Driver Development Tools”… “The Checked Build Of
> Windows” (just type The Checked Build of Windows (no quotes) into the
> box at the index tab).
>
> You don’t have to install the full checked build to get these
> benefits. You can install JUST the checked kernel and HAL. See the DDK
> or http://www.osr.com/ntinsider/2001/checking/checked.htm for
> instructions.
>
> > How do you know that the IoCompleteRequest is running in an
> > arbitrary thread? Is it because the Stack Unwind Information was not
> > available, or is it just due to the general nature of drivers
> > (completing IRPs in DPCs)?
> >
>
> Nah. It’s just a gift I was given. Ooops, sorry. No, actually, I
> can tell that somebody’s completing an I/O request asynchronously
> (calling IoCompleteRequest), because of the call to KeInsertQueueApc.
> This wouldn’t be done if the request was being completed synchronously
> (in thread context). See
> http://www.osr.com/ntinsider/1997/iocomp/iocomp.htm (not an article
> for beginners or the faint of heart, and sort of aimed at FS and FS
> Filter Driver writers).
>
> Can you look at the stack frame that’s in your driver (in WinDbg’s
> stack window, select that stack location) and see what your driver is
> doing? The IRP will definitely still be around at this point (it’s not
> returned until after the APC has run)…
>
> Peter
> OSR

OSR_Community_User · June 26, 2002, 3:32pm

> As I understand it, it’s not weird, it’s wrong. Peter, I seem to
> recall that in your seminar you said that spinlocks are implemented
> as “no-ops” in the single processor kernel because they don’t
> make sense on a single processor. If a single processor grabs a
> real live spinlock, doesn’t it lock out ALL other work, which might
> explain the hang…?
>
> So, using an MP kernel on a single processor is not just a bad idea,
> it breaks in exactly the way Joe is saying it’s breaking.
>
> Of course, that seminar was many years ago, and a lot of code has
> been written since then. I could be wrong.

In a UP kernel, the actual grabbing of the spin lock itself is, in fact,
a no-op, but the system still raises the IRQL to DISPATCH_LEVEL (or
DIRQL if it’s an interrupt spin lock) while the spin lock is held,
regardless of whether it’s the UP or MP kernel.

If I remember correctly, the checked build is always an MP kernel, even
when installed on UP systems (right Peter?). So your thought of the MP
kernel being a bad idea on UP systems is incorrect. However, because
certain unnecessary things, such as grabbing the actual spin locks, can
be eliminated on a UP system, the UP kernel removes them as an
optimization. But there isn’t anything inherently bad about grabbing
the spin lock on a UP system.

Jay

Jay Talbott
Principal Consulting Engineer
SysPro Consulting, LLC
3519 E. South Fork Drive
Suite 201
Phoenix, AZ 85044
(480) 704-8045
xxxxx@sysproconsulting.com
http://www.sysproconsulting.com

Peter_Viscarola_OSR · June 26, 2002, 3:50pm

“Stu Bell” wrote in message news:xxxxx@ntdev…
>
> As I understand it, it’s not weird, it’s wrong. Peter, I seem to
> recall that in your seminar you said that spinlocks are implemented
> as “no-ops” in the single processor kernel because they don’t
> make sense on a single processor. If a single processor grabs a
> real live spinlock, doesn’t it lock out ALL other work, which might
> explain the hang…?
>
> So, using an MP kernel on a single processor is not just a bad idea,
> it breaks in exactly the way Joe is saying it’s breaking.
>

Close, but…

In the UP version of the kernel, KeAcquireSpinLock(…) is reduced to
KeRaiseIrql(…). There’s sort of no point in acquiring the spin lock, cuz
nobody else could ever HAVE it (if they had it, we’d be at elevated IRQL,
and hence the new attempt to acquire it wouldn’t be running). So, it’s
either gotta be available, or if it’s not available it’s been improperly
released.
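
As a rough user-mode sketch of the difference described above (not the actual kernel or HAL code), the following C11 model contrasts a UP-style acquire, which only raises IRQL, with an MP-style acquire, which raises IRQL and then spins on a lock bit. All of the model_* names and the numeric IRQL values are hypothetical and chosen only for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy IRQL values; the names mirror NT's, the numbers are illustrative. */
enum { PASSIVE_LEVEL = 0, DISPATCH_LEVEL = 2 };

typedef struct {
    int irql;               /* current IRQL of the one modeled CPU */
} model_cpu;

typedef struct {
    atomic_flag held;       /* the lock bit; only the MP variant touches it */
} model_spin_lock;

/* Raise the CPU's IRQL and return the previous value. */
static int model_raise_irql(model_cpu *cpu, int new_irql)
{
    int old = cpu->irql;
    cpu->irql = new_irql;
    return old;
}

/* UP variant: raising IRQL is the whole acquire; the lock bit is untouched. */
static int model_acquire_up(model_cpu *cpu, model_spin_lock *lock)
{
    (void)lock;
    return model_raise_irql(cpu, DISPATCH_LEVEL);
}

/* MP variant: raise IRQL, then spin until the lock bit is ours. */
static int model_acquire_mp(model_cpu *cpu, model_spin_lock *lock)
{
    int old = model_raise_irql(cpu, DISPATCH_LEVEL);
    while (atomic_flag_test_and_set(&lock->held)) {
        /* busy-wait: in this model, someone else holds the lock */
    }
    return old;
}

/* Release: the MP variant clears the bit; both restore the old IRQL. */
static void model_release(model_cpu *cpu, model_spin_lock *lock,
                          int old_irql, bool mp)
{
    if (mp) {
        atomic_flag_clear(&lock->held);
    }
    cpu->irql = old_irql;
}
```

In this model the MP variant works fine on a single CPU as long as the lock bit is always cleared before it is next requested, which matches the point that the MP kernel should run on one processor without any problem.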

The MP kernel runs on systems with only 1 processor all the time. The
checked version of the kernel is always MP. You might have 1 CPU installed
(or supported) on a system that is otherwise an MP system. Etc. This should
work without any problem.

What’s weird is the Dispatcher Database Lock is an internal lock used by the
O/S for (lots of things relating to) scheduling. It’s not very likely that
NT V4 has a problem with acquiring/releasing this lock, right? People would
have found such a problem in, oh, 10 minutes or so, but certainly within the
last five years.

MY BEST GUESS would be that there’s an invalid IRQL change going on
somewhere… JoeD: Are you calling KeLowerIrql anywhere in your driver?
Any similar IRQL gymnastics/cleverness? This really does look like your
driver screwing something up to me.

Sorry I can’t be more help… time to load the checked kernel and repro…

P

OSR_Community_User · June 26, 2002, 3:57pm

Peter W,

Why yes I am. I haven’t quite figured out why that is bad yet. But I am
sure you are about to tell me? I will see if I can figure it out on my own,
but I certainly won’t refuse your advice.

Thanks,
Joe

Peter Wieland wrote in message
news:xxxxx@ntdev…

you’re not calling IoCompleteRequest inside your ISR are you?

-p


OSR_Community_User · June 26, 2002, 4:06pm

Peter Viscarola wrote in message news:xxxxx@ntdev…
>
>
> MY BEST GUESS would be that there’s an invalid IRQL change going on
> somewhere… JoeD: Are you calling KeLowerIrql anywhere in your driver?
> Any similar IRQL gymnastics/cleverness? This really does look like your
> driver screwing something up to me.

I was beginning to suspect the same thing. My recollection is that I am not
calling KeRaise/LowerIrql anywhere. The closest thing to that might be a
call to KeSynchronizeExecution, so I will investigate that.

>
> Sorry I can’t be more help… time to load the checked kernel and repro…
>
Thanks for all your help. I will let everyone know what I find if/when I
find it.

I used to have a list of the TOP Things an NT Device Driver Writer Should
Never Do. I had gotten it from the net somewhere. Does anybody recall
seeing anything like this, and where it might be?

Thanks,
Joe D

Peter_Viscarola_OSR · June 26, 2002, 4:08pm

> Peter Wieland wrote in message
> news:xxxxx@ntdev…
>
> you’re not calling IoCompleteRequest inside your ISR are you?
>

“Joe D” wrote in message news:xxxxx@ntdev…
>
> Why yes I am. I haven’t quite figured out why that is bad yet. But I am
> sure you are about to tell me? I will see if I can figure it out on my
> own, but I certainly won’t refuse your advice.
>

Oh, PeterWie, that’s VERY good. Bravo! Nice catch. I NEVER would have
thought of that.

JoeD: You can’t call IoCompleteRequest from your ISR, cuz it’s not allowed.
There are very few things you can do at such a high IRQL (interrupt level).
Note the DDK docs explicitly state:

“Callers of IoCompleteRequest must be running at IRQL <= DISPATCH_LEVEL.”

Every function in Windows has an IRQL restriction associated with it. Note
that if you had been running the CHECKED BUILD OF THE KERNEL AND HAL it
would have caught this problem for you…

Peter
OSR

OSR_Community_User · June 26, 2002, 4:22pm

> Peter W,
>
> Why yes I am. I haven’t quite figured out why that is bad
> yet. But I am sure you are about to tell me? I will see if I can
> figure it out on my own, but I certainly won’t refuse your advice.
>
> Thanks,
> Joe

From the DDK documentation:

“Callers of IoCompleteRequest must be running at IRQL <=
DISPATCH_LEVEL.”

While in your ISR, you are at DIRQL which is > DISPATCH_LEVEL.

IoCompleteRequest() needs to acquire a spin lock behind the scenes
somewhere, and that lock is currently held by the thread that was
interrupted by your ISR. Your ISR spins indefinitely waiting to get the
spin lock, and your system hangs.
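
The scenario above can be sketched with a small user-mode C model (an illustration only, not kernel code). A single modeled CPU holds a lock when an interrupt arrives, and the ISR then tries to take the same lock. The model_* names, the IRQL numbers, and the max_spins bound are all hypothetical; the bound exists only so the model returns instead of hanging the way the real system does.

```c
#include <stdbool.h>

/* Illustrative IRQL numbers; only their ordering matters here. */
enum { PASSIVE_LEVEL = 0, DISPATCH_LEVEL = 2, DEVICE_IRQL = 5 };

typedef struct {
    int  irql;        /* current IRQL of the single modeled CPU */
    bool lock_held;   /* stand-in for the lock the completion path needs */
} model_cpu;

/* Try to take the lock for at most max_spins iterations.
 * Returns true on success, false if the lock never became free. */
static bool model_spin_acquire(model_cpu *cpu, long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        if (!cpu->lock_held) {
            cpu->lock_held = true;
            return true;
        }
        /* On one CPU nothing else runs while we loop at this IRQL,
         * so lock_held cannot change underneath us. */
    }
    return false;
}

/* Hypothetical ISR that (incorrectly) runs the completion path at DIRQL. */
static bool model_isr_completes_request(model_cpu *cpu, long max_spins)
{
    int interrupted_irql = cpu->irql;
    cpu->irql = DEVICE_IRQL;              /* interrupt delivery raises IRQL */
    bool ok = model_spin_acquire(cpu, max_spins);
    if (ok) {
        cpu->lock_held = false;           /* release after the "completion" */
    }
    cpu->irql = interrupted_irql;         /* return from interrupt */
    return ok;
}
```

If the interrupted code did not hold the lock, the modeled ISR succeeds, which is consistent with the hang being intermittent and load-dependent. If the interrupted code did hold it, the ISR can never succeed, no matter how large max_spins is, because the holder cannot run again until the ISR returns.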

Jay


OSR_Community_User · June 26, 2002, 4:35pm

OK, well, as I read the docs on IoCompleteRequest, they say that it must be
called at IRQL <= DISPATCH_LEVEL. This is code that I didn’t (recently) add,
but I didn’t catch it either.

I do see how this could cause significant problems.

Was my guess on what the Dispatch Manager Database does correct?

I will correct it immediately.

Thanks everybody,
Joe


David_J_Craig · June 26, 2002, 4:38pm

IoCompleteRequest() must be called at IRQL <= DISPATCH_LEVEL. It won’t work
at DIRQL. You must schedule a DPC to handle the rest of the hardware
interrupt service logic. Do the minimum you have to do in the ISR.
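
The ISR-to-DPC hand-off described above can be sketched as a user-mode C model (an illustration under stated assumptions, not driver code). The ISR only queues deferred work, and the completion runs later at DISPATCH_LEVEL. The model_* names, the one-slot queue, and the IRQL numbers are hypothetical. In a real WDM driver the corresponding calls would be IoRequestDpc in the ISR and IoCompleteRequest in the DpcForIsr routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative IRQL numbers; only their ordering matters here. */
enum { PASSIVE_LEVEL = 0, DISPATCH_LEVEL = 2, DEVICE_IRQL = 5 };

typedef struct {
    bool completed;       /* set once the request has been completed */
    int  completed_at;    /* IRQL at which completion ran */
} model_request;

typedef struct {
    int            irql;          /* current IRQL of the modeled CPU */
    model_request *pending_dpc;   /* one-slot DPC queue; NULL when empty */
} model_cpu;

/* Stand-in for the completion call, with the IRQL rule as an assertion. */
static void model_complete_request(model_cpu *cpu, model_request *req)
{
    assert(cpu->irql <= DISPATCH_LEVEL);
    req->completed = true;
    req->completed_at = cpu->irql;
}

/* ISR: acknowledge the device (omitted), queue the DPC, and return. */
static void model_isr(model_cpu *cpu, model_request *req)
{
    int interrupted_irql = cpu->irql;
    cpu->irql = DEVICE_IRQL;
    cpu->pending_dpc = req;       /* defer; do not complete here */
    cpu->irql = interrupted_irql;
}

/* DPC drain: runs queued work only when IRQL is below DISPATCH_LEVEL. */
static void model_run_dpcs(model_cpu *cpu)
{
    if (cpu->pending_dpc == NULL || cpu->irql >= DISPATCH_LEVEL) {
        return;
    }
    int old = cpu->irql;
    cpu->irql = DISPATCH_LEVEL;
    model_complete_request(cpu, cpu->pending_dpc);
    cpu->pending_dpc = NULL;
    cpu->irql = old;
}
```

The assertion in model_complete_request plays the role that the checked kernel and HAL play in the discussion above: calling it from model_isr would trip immediately instead of hanging at some later, hard-to-diagnose point.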
