KERNEL_APC_PENDING_DURING_EXIT (20) Bugcheck

Hi All

I get this bug check from time to time and have no clue how it happens. I checked all my mutexes and can’t find a suspicious one.
Can anyone shed light on this?

Thanks

Ben Tsang

KERNEL_APC_PENDING_DURING_EXIT (20)
The key data item is the thread’s APC disable count.
If this is non-zero, then this is the source of the problem.
The APC disable count is decremented each time a driver calls
KeEnterCriticalRegion, FsRtlEnterFileSystem, or acquires a mutex. The APC
disable count is incremented each time a driver calls KeLeaveCriticalRegion,
FsRtlExitFileSystem, or KeReleaseMutex. Since these calls should always be in
pairs, this value should be zero when a thread exits. A negative value
indicates that a driver has disabled APC calls without re-enabling them. A
positive value indicates that the reverse is true.
If you ever see this error, be very suspicious of all drivers installed on the
machine – especially unusual or non-standard drivers. Third party file
system redirectors are especially suspicious since they do not generally
receive the heavy duty testing that NTFS, FAT, RDR, etc receive.
This current IRQL should also be 0. If it is not, that a driver’s
cancelation routine can cause this bugcheck by returning at an elevated
IRQL. Always attempt to note what you were doing/closing at the
time of the crash, and note all of the installed drivers at the time of
the crash. This symptom is usually a severe bug in a third party
driver.
Arguments:
Arg1: 0000000000000000, The address of the APC found pending during exit.
Arg2: 000000000000fff2, The thread’s APC disable count
Arg3: 0000000000000000, The current IRQL
Arg4: 0000000000000001

Debugging Details:

DBGHELP: C:\Program Files\Debugging Tools for Windows (x64)\sym\ntkrnlmp.exe\4A5BC6005dd000\ntkrnlmp.exe - OK
DBGHELP: C:\Program Files\Debugging Tools for Windows (x64)\sym\srv.sys\4A5BC25798000\srv.sys - OK

BUGCHECK_STR: 0x20_NULLAPC_KAPC_NEGATIVE

DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT

PROCESS_NAME: System

CURRENT_IRQL: 2

LAST_CONTROL_TRANSFER: from fffff800017c16d2 to fffff800016c3f60

STACK_TEXT:
fffff880056403f8 fffff800017c16d2 : 0000000000000000 fffffa80064d9980 0000000000000065 fffff8000170a314 : nt!DbgBreakPointWithStatus
fffff88005640400 fffff800017c24be : 0000000000000003 0000000000000000 fffff80001706ee0 0000000000000020 : nt!KiBugCheckDebugBreak+0x12
fffff88005640460 fffff800016cc004 : 0000000000000000 0000000000000000 0000000000000010 fffff80001988d14 : nt!KeBugCheck2+0x71e
fffff88005640b30 fffff80001988d29 : 0000000000000020 0000000000000000 000000000000fff2 0000000000000000 : nt!KeBugCheckEx+0x104
fffff88005640b70 fffff800019b22a1 : fffffa8000000000 fffff880c146feff fffff78000000320 fffffa8006452020 : nt! ?? ::NNGAKEGL::string'+0x29559
fffff88005640c40 fffff8000195c884 : 0000000000000000 0000000000000003 0000000000000008 fffff78000000320 : nt!PspTerminateThreadByPointer+0x4d
fffff88005640c90 fffff880050eeeee : 0000000000000008 0000000000000000 fffff88005640d08 0000000000000008 : nt!PsTerminateSystemThread+0x24
fffff88005640cc0 fffff8800513173f : fffffa8000000010 0000000000000001 fffffa800723c010 0000000000000102 : srv!SrvTerminateWorkerThread+0x3e
fffff88005640cf0 fffff8000196f166 : fffffa8007242010 fffffa80064d9980 0000000000000080 fffffa80045f4b30 : srv!WorkerThread+0x189
fffff88005640d40 fffff800016aa486 : fffff80001844e80 fffffa80064d9980 fffffa8006cbd360 fffff8800123ea90 : nt!PspSystemThreadStartup+0x5a
fffff88005640d80 0000000000000000 : fffff88005641000 fffff8800563b000 fffff880056409f0 00000000`00000000 : nt!KxStartSystemThread+0x16

STACK_COMMAND: kb

FOLLOWUP_IP:
srv!SrvTerminateWorkerThread+3e
fffff880`050eeeee 4883c420 add rsp,20h

SYMBOL_STACK_INDEX: 7

SYMBOL_NAME: srv!SrvTerminateWorkerThread+3e

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: srv

IMAGE_NAME: srv.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4a5bc257

FAILURE_BUCKET_ID: X64_0x20_NULLAPC_KAPC_NEGATIVE_VRF_srv!SrvTerminateWorkerThread+3e

BUCKET_ID: X64_0x20_NULLAPC_KAPC_NEGATIVE_VRF_srv!SrvTerminateWorkerThread+3e

Followup: MachineOwner

The analyze output has the clues for you:

The APC disable count is decremented each time a driver calls
KeEnterCriticalRegion, FsRtlEnterFileSystem, or acquires a mutex. The APC
disable count is incremented each time a driver calls KeLeaveCriticalRegion,
FsRtlExitFileSystem, or KeReleaseMutex. Since these calls should always be in
pairs, this value should be zero when a thread exits. A negative value
indicates that a driver has disabled APC calls without re-enabling them.

And note that the APC disable count is negative:

Arg2: 000000000000fff2, The thread’s APC disable count
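
To make the "negative" reading concrete, here is a small user-mode sketch of the arithmetic. It assumes that Arg2 is the thread's combined disable value, with a signed 16-bit kernel APC disable count in the low word and a signed 16-bit special APC disable count in the high word; that layout is an assumption based on the KAPC_NEGATIVE bucket name and public symbols of that era, and the helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the two signed 16-bit halves of the combined value.
struct ApcDisableCounts {
    int kernel;   // low 16 bits, sign-extended
    int special;  // high 16 bits, sign-extended
};

// Sign-extend a 16-bit field without relying on implementation-defined casts.
inline int SignExtend16(std::uint32_t field) {
    field &= 0xFFFFu;
    return field >= 0x8000u ? static_cast<int>(field) - 0x10000
                            : static_cast<int>(field);
}

// Split the combined disable value (assumed to be bugcheck Arg2) into halves.
inline ApcDisableCounts DecodeApcDisable(std::uint32_t combined) {
    return ApcDisableCounts{SignExtend16(combined), SignExtend16(combined >> 16)};
}

// Usage check with the value from this crash.
inline void CheckExample() {
    const ApcDisableCounts c = DecodeApcDisable(0xfff2u);
    assert(c.kernel == -14);   // 0xfff2 - 0x10000
    assert(c.special == 0);
}
```

Under that assumption, 0xfff2 reads as -14: fourteen more disables than re-enables on this thread by the time it exited.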

Do you call KeEnterCriticalRegion or FsRtlEnterFileSystem at all in your
driver? Make sure that they all have matching KeLeaveCriticalRegion or
FsRtlExitFileSystem calls (including down all error paths and don’t forget
about handling exceptions!). If you wait on mutexes then you should also
make sure that all paths out call KeReleaseMutex.
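
A rough illustration of the error-path problem, as a user-mode sketch rather than driver code: EnterRegion and LeaveRegion are hypothetical stand-ins for KeEnterCriticalRegion and KeLeaveCriticalRegion, and the global counter only mimics the thread's disable count.

```cpp
#include <cassert>

// Hypothetical stand-ins for KeEnterCriticalRegion / KeLeaveCriticalRegion;
// the counter mimics the thread's APC disable count.
static int g_disableCount = 0;
static void EnterRegion() { --g_disableCount; }
static void LeaveRegion() { ++g_disableCount; }

// Buggy shape: the early return skips LeaveRegion, so the count stays negative.
static bool ProcessLeaky(bool inputValid) {
    EnterRegion();
    if (!inputValid) {
        return false;           // BUG: region never left on this path
    }
    LeaveRegion();
    return true;
}

// Fixed shape: every path funnels through the single LeaveRegion call.
static bool ProcessBalanced(bool inputValid) {
    bool ok = false;
    EnterRegion();
    if (inputValid) {
        ok = true;              // real work would go here
    }
    LeaveRegion();              // runs on success and on failure
    return ok;
}
```

Each failed call to ProcessLeaky leaves the counter one lower, which is the same drift the bugcheck reports at thread exit.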

What makes these so challenging is that the damage is already done by the
time of the crash. We have lots of instrumentation in our drivers to catch
these in the hopes of getting them earlier. Driver Verifier also has support
for catching this issue and that might benefit you, though typically this is
a bug that can only really be found via code inspection.

Good luck!

-scott


Scott Noone
Consulting Associate
OSR Open Systems Resources, Inc.
http://www.osronline.com


If you use ERESOURCES run the !locks command. Also if you use registry
callbacks and the OS is Vista/W2K8 an exception in your callback routine can
be caught and ignored by the OS. I had this problem and the exception left
an ERESOURCE locked.

Bill Wandel

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@hotmail.com
Sent: Wednesday, January 06, 2010 2:10 PM
To: Windows File Systems Devs Interest List
Subject: [ntfsd] KERNEL_APC_PENDING_DURING_EXIT (20) Bugcheck


NTFSD is sponsored by OSR

For our schedule of debugging and file system seminars (including our new fs
mini-filter seminar) visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

> If you use ERESOURCES run the !locks command. Also if you use registry
> callbacks and the OS is Vista/W2K8 an exception in your callback routine can
> be caught and ignored by the OS. I had this problem and the exception left
> an ERESOURCE locked.

This is exactly why exceptions are evil.

They are only good in managed languages, or in “strong C++” stuff where all lock acquisitions and all “enter the special state” calls are wrapped by the objects which will release these states in destructors.

Also, any resource allocation/non-C++ object creation should be wrapped this way - to automatically release the resource in a destructor.
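
A minimal sketch of that wrapping idea, in portable user-mode C++: EnterState and LeaveState are hypothetical stand-ins for any "enter the special state" pair, and the counter exists only so the effect can be observed.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for an "enter the special state" / "leave" pair.
static int g_stateDepth = 0;
static void EnterState() { ++g_stateDepth; }
static void LeaveState() { --g_stateDepth; }

// Guard object: enters in the constructor, leaves in the destructor, so
// stack unwinding restores the state even when an exception passes through.
class StateGuard {
public:
    StateGuard() { EnterState(); }
    ~StateGuard() { LeaveState(); }
    StateGuard(const StateGuard&) = delete;
    StateGuard& operator=(const StateGuard&) = delete;
};

// Throws part-way through; the guard still leaves the state.
static void WorkThatThrows() {
    StateGuard guard;
    throw std::runtime_error("simulated failure");
}

// Returns true if the state is balanced once the exception has been handled.
static bool BalancedAfterThrow() {
    try {
        WorkThatThrows();
    } catch (const std::runtime_error&) {
        // error handled here; nothing to release by hand
    }
    return g_stateDepth == 0;
}
```

The release is tied to scope exit rather than to any particular return statement, so a throw cannot skip it.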

Not many C++ coders really do these things, and, without them, exceptions are evil since they introduce the illusion of proper error handling, while in reality the error is not properly handled.

Dirty use of exception frames in C++ occurs in things like ATL-based OA server code, which usually runs in apartment threading and thus has no issues with locks. But even in such code, if the exception occurs after CreateFile, it will leak the file handle.

Wrapping the HANDLE in an object that calls CloseHandle in its destructor is one solution; so is catching all exceptions and converting them to good old “do the cleanup and return an error” code.
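
Both remedies can be sketched together in portable C++. OpenFakeHandle and CloseFakeHandle are hypothetical stand-ins for CreateFile and CloseHandle so the example runs without the Windows headers; the Status enum is likewise made up for illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for CreateFile / CloseHandle: just count open handles.
static int g_openHandles = 0;
static int OpenFakeHandle() { ++g_openHandles; return 42; }
static void CloseFakeHandle(int) { --g_openHandles; }

// Owns one fake handle and closes it in the destructor.
class ScopedHandle {
public:
    ScopedHandle() : handle_(OpenFakeHandle()) {}
    ~ScopedHandle() { CloseFakeHandle(handle_); }
    ScopedHandle(const ScopedHandle&) = delete;
    ScopedHandle& operator=(const ScopedHandle&) = delete;
    int get() const { return handle_; }
private:
    int handle_;
};

enum class Status { Ok, Failed };

// Boundary function: no exception escapes; failures become an error code.
static Status DoWork(bool fail) {
    try {
        ScopedHandle h;
        if (fail) {
            throw std::runtime_error("simulated failure after open");
        }
        return Status::Ok;
    } catch (...) {
        return Status::Failed;   // handle already closed by unwinding
    }
}
```

The handle is closed on both paths, and the caller only ever sees a status value.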


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

Thanks, all, for your replies.

This happens about once a week, so I am now logging every mutex and ERESOURCE acquisition. When it happens again, maybe I can get something from the log.

I already enabled Driver Verifier in my previous test, but it didn’t catch anything.

“Bill Wandel” wrote in message news:xxxxx@ntfsd…
> If you use ERESOURCES run the !locks command. Also if you use registry
> callbacks and the OS is Vista/W2K8 an exception in your callback routine
> can be caught and ignored by the OS. I had this problem and the exception
> left an ERESOURCE locked.
>

Could you tell more about this? Do you mean some OS bug in exception
handling or stack unwind?
– pa

The problem was that my code caused an exception accessing a user mode
address within my registry callback. I did not have a try/except block
around this access. The OS calls the callbacks within a try/except block; it
caught this exception and ignored it. I consider this an OS bug. There was a
post to this list by someone from Microsoft describing this problem and
other CM issues. The note said that it is fixed in W7/R2. That note was
posted after I found the problem so it didn’t do me any good.
If you need it, I probably have the Microsoft post saved somewhere. A quick
look by me didn’t find it.

Bill Wandel

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Pavel A.
Sent: Thursday, January 07, 2010 3:46 PM
To: Windows File Systems Devs Interest List
Subject: Re:[ntfsd] KERNEL_APC_PENDING_DURING_EXIT (20) Bugcheck


Search ntfsd for “Registry Filtering Issues Identified at Filter Plugfest 21
and Possible”

Bill Wandel


Thank you.
– pa


Hi, Folks

After I logged all mutex and ERESOURCE acquires and releases, I got this bug check again. I checked my log: every mutex and ERESOURCE acquire has a matching release.

I am running the test on Windows 2008 R2 with Driver Verifier enabled. The Infiniband driver from Mellanox and my filter driver are installed on this computer.

Any suggestions for my next test step?

Thanks
Tsang

This bugcheck has to be the first of the two that may occur in
PspExitThread:

if (Thread->Tcb.CombinedApcDisable != 0)
{
    KeBugCheckEx(KERNEL_APC_PENDING_DURING_EXIT,
                 0,
                 Thread->Tcb.CombinedApcDisable,
                 0,
                 1);
}

Unless someone posts better advice, try this:

Get the offset of Tcb.CombinedApcDisable for your OS from your symbols,
then write a macro that calls PsGetCurrentThread and reads the
WORD at that offset. If it is below zero (as a signed value), assert.

Put that macro in your driver’s functions.

Not sure that it helps. As I wrote: Unless anyone suggests something
better …
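
For what it is worth, the shape of such a check can be tried out in user mode first. The sketch below is a variation on the idea: it compares the count at function entry and exit instead of testing for a negative value, since a caller may legitimately have APCs disabled already. Everything in it is hypothetical: the thread-local stands in for the value a driver would read at the symbol-derived KTHREAD offset, and the Fake* functions stand in for KeEnterCriticalRegion and KeLeaveCriticalRegion.

```cpp
#include <cassert>

// Stand-in for the thread's disable count. A real driver would read a signed
// 16-bit value at a build-specific, undocumented KTHREAD offset taken from
// the symbols; a thread-local keeps this sketch runnable in user mode.
static thread_local short t_apcDisable = 0;

// Hypothetical stand-ins for KeEnterCriticalRegion / KeLeaveCriticalRegion.
static void FakeEnterCriticalRegion() { --t_apcDisable; }
static void FakeLeaveCriticalRegion() { ++t_apcDisable; }

// Snapshot the count on entry; Balanced() reports whether it is unchanged.
class ApcBalanceCheck {
public:
    ApcBalanceCheck() : entry_(t_apcDisable) {}
    bool Balanced() const { return t_apcDisable == entry_; }
private:
    short entry_;
};

// Macros to place at the top and at each exit of a driver function.
#define APC_BALANCE_ENTER() ApcBalanceCheck apcBalanceCheck_
#define APC_BALANCE_EXIT()  assert(apcBalanceCheck_.Balanced())

static void WellBehavedRoutine() {
    APC_BALANCE_ENTER();
    FakeEnterCriticalRegion();
    // ... work ...
    FakeLeaveCriticalRegion();
    APC_BALANCE_EXIT();   // passes: enter and leave matched
}
```

A routine that returns between the enter and the leave would trip the exit assertion at the point of the leak, rather than at thread exit.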

L.