APC_INDEX_MISMATCH

I have a client who has asked me to chase down some APC_INDEX_MISMATCH crashes. The crash says some driver has disabled kernel APCs without re-enabling them. The suspect driver is a registry callback filter, but it does nothing that would affect the APC state. The BSOD happens in their Windows service, but not inside their driver. Their driver IS running on another CPU, but it has just started a callback and is still fetching handles and contexts.

So, what seemingly innocent operations would disable kernel APCs? Are these mismatches detected immediately, or can this be left over from earlier operations? I’m suspicious because the thread being blamed is not the one running the suspect driver, although it is doing an inverted call completion for every filtered operation, so the service interacts with the driver a lot.

Just some background:

Threads start off with an APC disable count of zero. Entering a Critical or Guarded region subtracts from the thread’s count and leaving the region adds to it. If a thread tries to exit or return to user mode with a non-zero APC disable count you get the bugcheck. You can see the Critical Region count here:

1: kd> dt nt!_kthread @$thread KernelApcDisable
   +0x1e4 KernelApcDisable : 0n0

The Guarded Region count here:

1: kd> dt nt!_kthread @$thread SpecialApcDisable
   +0x1e6 SpecialApcDisable : 0n0

And the combined value:

1: kd> dt nt!_kthread @$thread CombinedApcDisable
   +0x1e4 CombinedApcDisable : 0
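
To make the layout above concrete, here is a rough user-mode C model of the two counts and the combined value. It is only a sketch: the type and the model_* function names are made up for illustration and are not kernel APIs, and the only things taken from the dumps are that the two 16-bit counts sit back to back and that CombinedApcDisable starts at the same offset as KernelApcDisable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-mode model only. The real fields live in nt!_KTHREAD. */
typedef union {
    struct {
        int16_t KernelApcDisable;   /* Critical Region count (+0x1e4 in the dump) */
        int16_t SpecialApcDisable;  /* Guarded Region count  (+0x1e6 in the dump) */
    };
    uint32_t CombinedApcDisable;    /* both counts viewed as one value */
} ApcDisableModel;

/* Entering a region subtracts from the count. */
static void model_enter_critical(ApcDisableModel *t) { t->KernelApcDisable -= 1; }
static void model_enter_guarded(ApcDisableModel *t)  { t->SpecialApcDisable -= 1; }

/* Leaving a region adds to the count. The asserts are model-only sanity
   checks that flag a leave without a matching enter. */
static void model_leave_critical(ApcDisableModel *t)
{
    assert(t->KernelApcDisable < 0);
    t->KernelApcDisable += 1;
}

static void model_leave_guarded(ApcDisableModel *t)
{
    assert(t->SpecialApcDisable < 0);
    t->SpecialApcDisable += 1;
}

/* The check described above: any non-zero count when the thread exits
   or returns to user mode is a mismatch. */
static bool model_would_bugcheck(const ApcDisableModel *t)
{
    return t->CombinedApcDisable != 0;
}
```

In this model nothing is checked at the moment a region is entered or left unevenly. The imbalance only shows up when model_would_bugcheck is evaluated, which mirrors the point that the check happens on thread exit or return to user mode.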

Things to look for:

  1. Explicit calls to KeEnterCriticalRegion or KeEnterGuardedRegion
  2. PASSIVE_LEVEL locks generally enter a Critical or Guarded Region (or require the caller to enter one) to avoid thread suspension. Usually when I see these crashes it’s actually because of a lock still being held
  3. Verifier logs calls to KeEnter/KeLeave and you can dump that log with !verifier 200. This is really only going to help if the driver is directly calling these APIs though
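
For item 2, the shape I would look for first is an error path that returns while the region (and usually the lock taken inside it) is still held. The following is a user-mode C sketch of that pattern, not driver code: g_kernel_apc_disable, the stub_* functions and the handle_request_* functions are all hypothetical stand-ins, and the stubs only adjust a counter rather than calling KeEnterCriticalRegion or KeLeaveCriticalRegion.

```c
#include <stdbool.h>

/* Stand-in for the thread's Critical Region count (hypothetical). */
static int g_kernel_apc_disable = 0;

/* Stubs in place of the real enter/leave calls: enter subtracts, leave adds. */
static void stub_enter_critical_region(void) { g_kernel_apc_disable--; }
static void stub_leave_critical_region(void) { g_kernel_apc_disable++; }

/* Leaky shape: the failure path returns while still inside the region. */
static bool handle_request_leaky(bool lookup_ok)
{
    stub_enter_critical_region();   /* real code would also acquire a lock here */
    if (!lookup_ok) {
        return false;               /* BUG: region (and lock) never released */
    }
    stub_leave_critical_region();
    return true;
}

/* Balanced shape: every path goes through the single exit label. */
static bool handle_request_balanced(bool lookup_ok)
{
    bool ok = false;

    stub_enter_critical_region();
    if (!lookup_ok) {
        goto done;
    }
    ok = true;

done:
    stub_leave_critical_region();
    return ok;
}
```

The leaky version behaves correctly on the success path, so it can pass a lot of testing. The count only drifts when the failure branch is taken, and the thread that took that branch is the one that gets blamed later.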