IRQL_GT_ZERO_AT_SYSTEM_SERVICE

Any thoughts on what might cause this?

27: kd> !analyze -v


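*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************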

IRQL_GT_ZERO_AT_SYSTEM_SERVICE (4a)
Returning to usermode from a system call at an IRQL > PASSIVE_LEVEL.
Arguments:
Arg1: 00007ffe73c2fad4, Address of system function (system call routine)
Arg2: 0000000000000002, Current IRQL
Arg3: 0000000000000000, 0
Arg4: ffff828fe1bc6b80, 0

Debugging Details:

KEY_VALUES_STRING: 1

PROCESSES_ANALYSIS: 1

SERVICE_ANALYSIS: 1

STACKHASH_ANALYSIS: 1

TIMELINE_ANALYSIS: 1

DUMP_CLASS: 1

DUMP_QUALIFIER: 401

BUILD_VERSION_STRING: 17763.1.amd64fre.rs5_release.180914-1434

SYSTEM_MANUFACTURER: Supermicro

SYSTEM_PRODUCT_NAME: X10DRi

SYSTEM_SKU: 072815D9

SYSTEM_VERSION: 123456789

BIOS_VENDOR: American Megatrends Inc.

BIOS_VERSION: 3.2

BIOS_DATE: 11/22/2019

BASEBOARD_MANUFACTURER: Supermicro

BASEBOARD_PRODUCT: X10DRi

BASEBOARD_VERSION: 1.10

DUMP_TYPE: 1

BUGCHECK_P1: 7ffe73c2fad4

BUGCHECK_P2: 2

BUGCHECK_P3: 0

BUGCHECK_P4: ffff828fe1bc6b80

PROCESS_NAME: ViTLService.exe

BUGCHECK_STR: RAISED_IRQL_FAULT

FAULTING_IP:
+0
00007ffe`73c2fad4 ?? ???

CPU_COUNT: 20

CPU_MHZ: 960

CPU_VENDOR: GenuineIntel

CPU_FAMILY: 6

CPU_MODEL: 3f

CPU_STEPPING: 2

CPU_MICROCODE: 6,3f,2,0 (F,M,S,R) SIG: 44'00000000 (cache) 44'00000000 (init)

BLACKBOXBSD: 1 (!blackboxbsd)

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT

CURRENT_IRQL: 2

ANALYSIS_SESSION_HOST: DEVELOPMENT10

ANALYSIS_SESSION_TIME: 07-09-2021 14:47:49.0594

ANALYSIS_VERSION: 10.0.18362.1 amd64fre

LAST_CONTROL_TRANSFER: from fffff8064047d1e9 to fffff8064046b970

STACK_TEXT:
ffff828fe1bc6948 fffff8064047d1e9 : 000000000000004a 00007ffe73c2fad4 0000000000000002 0000000000000000 : nt!KeBugCheckEx
ffff828fe1bc6950 fffff8064047d083 : 001d81b3e6a76060 000000001bd16d04 ffffa687f62ab600 ffffb80055a80180 : nt!KiBugCheckDispatch+0x69
ffff828fe1bc6a90 00007ffe73c2fad4 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiSystemServiceExitPico+0x1fe
000000001e0ff828 0000000000000000 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : 0x00007ffe`73c2fad4

THREAD_SHA1_HASH_MOD_FUNC: 1b1fd012b2a510c586295e696f84a9476c8f91e5

THREAD_SHA1_HASH_MOD_FUNC_OFFSET: 6a054393ae1713fef08345a54701ed3a92fa10c6

THREAD_SHA1_HASH_MOD: 2a7ca9d3ab5386d53fea7498e1d81b9c4a4c036b

FOLLOWUP_IP:
nt!KiSystemServiceExitPico+1fe
fffff806`4047d083 4883ec50 sub rsp,50h

FAULT_INSTR_CODE: 50ec8348

SYMBOL_STACK_INDEX: 2

SYMBOL_NAME: nt!KiSystemServiceExitPico+1fe

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: nt

IMAGE_NAME: ntkrnlmp.exe

DEBUG_FLR_IMAGE_TIMESTAMP: 0

IMAGE_VERSION: 10.0.17763.2029

STACK_COMMAND: .thread ; .cxr ; kb

BUCKET_ID_FUNC_OFFSET: 1fe

FAILURE_BUCKET_ID: RAISED_IRQL_FAULT_ViTLService.exe_nt!KiSystemServiceExitPico

BUCKET_ID: RAISED_IRQL_FAULT_ViTLService.exe_nt!KiSystemServiceExitPico

PRIMARY_PROBLEM_CLASS: RAISED_IRQL_FAULT_ViTLService.exe_nt!KiSystemServiceExitPico

TARGET_TIME: 2021-07-09T19:01:22.000Z

OSBUILD: 17763

OSSERVICEPACK: 2029

SERVICEPACK_NUMBER: 0

OS_REVISION: 0

SUITE_MASK: 272

PRODUCT_TYPE: 3

OSPLATFORM_TYPE: x64

OSNAME: Windows 10

OSEDITION: Windows 10 Server TerminalServer SingleUserTS

OS_LOCALE:

USER_LCID: 0

OSBUILD_TIMESTAMP: unknown_date

BUILDDATESTAMP_STR: 180914-1434

BUILDLAB_STR: rs5_release

BUILDOSVER_STR: 10.0.17763.1.amd64fre.rs5_release.180914-1434

ANALYSIS_SESSION_ELAPSED_TIME: c40

ANALYSIS_SOURCE: KM

FAILURE_ID_HASH_STRING: km:raised_irql_fault_vitlservice.exe_nt!kisystemserviceexitpico

FAILURE_ID_HASH: {cdd5bbd8-cd2e-d628-3ec7-7946549935de}

Followup: MachineOwner

Do you have a driver in this game? Is ViTLService one of your services? The bugcheck seems pretty clear: a driver returned without lowering the IRQL. Either a dangling spin lock, or some foolish manual IRQL manipulation.

Yes, there are two drivers involved and one is ours. Ours is a legacy WDM software only driver that interacts with another vendor’s driver via function pointers exchanged during initialization (driver entry). The vendor’s driver interacts with their hardware and ours uses their API.

So far I haven’t been able to reproduce it in our environment, but I’m able to reproduce it consistently in a customer’s environment.
(Don’t you love those cases?)

At this point I’m not sure if our driver is causing the crash or theirs. I was hoping the crash dump would shed some light on that. In the past when I found a bug in our code it was pretty obvious by the stack text in the crash dump.

I’ve checked that KeReleaseSpinLock() is called before IoCompleteRequest() in every case where the spinlock is acquired.

Would this happen if the vendor’s driver calls one of our functions at an IRQL higher than DISPATCH_LEVEL and we then call IoCompleteRequest(), or would that produce a different bug check code? I can’t imagine they would do that, because it would mean they are calling our function from their ISR.

Thanks for your input on this.
Erik

Do you have enough logging to tell which request triggers this? That might help. IoCompleteRequest can be called at DISPATCH_LEVEL just fine. This is not about completion; the bugcheck fires when the system call returns to user mode with the IRQL still raised.
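
To make that concrete, here is a minimal user-mode model in C of the check behind bugcheck 0x4A. It is a sketch, not driver code: every Model* name is a hypothetical stand-in, and LeakyHandler is an invented example of an unbalanced acquire, not the code from this thread.

```c
#include <assert.h>

/* User-mode model only. Every Model* name is a hypothetical stand-in. */
enum { MODEL_PASSIVE_LEVEL = 0, MODEL_DISPATCH_LEVEL = 2 };
enum { MODEL_IRQL_GT_ZERO_AT_SYSTEM_SERVICE = 0x4A };

static int g_modelIrql = MODEL_PASSIVE_LEVEL;

/* Models a spin lock acquire: raise to DISPATCH_LEVEL, return the old level. */
static int ModelAcquireSpinLock(void)
{
    int oldIrql = g_modelIrql;
    g_modelIrql = MODEL_DISPATCH_LEVEL;
    return oldIrql;
}

/* Models a spin lock release: restore the level saved by the acquire. */
static void ModelReleaseSpinLock(int oldIrql)
{
    assert(oldIrql <= MODEL_DISPATCH_LEVEL);
    g_modelIrql = oldIrql;
}

/* A handler whose acquire and release are paired. */
static void BalancedHandler(void)
{
    int oldIrql = ModelAcquireSpinLock();
    ModelReleaseSpinLock(oldIrql);
}

/* A handler that returns on some path without releasing (invented example). */
static void LeakyHandler(void)
{
    (void)ModelAcquireSpinLock();
}

/* Models the system-call exit path: enter at PASSIVE_LEVEL, run the handler,
   then report the 0x4A code if the level is still raised. Returns 0 if clean. */
static int ModelSystemCall(void (*handler)(void))
{
    g_modelIrql = MODEL_PASSIVE_LEVEL;
    handler();
    if (g_modelIrql > MODEL_PASSIVE_LEVEL)
        return MODEL_IRQL_GT_ZERO_AT_SYSTEM_SERVICE;
    return 0;
}
```

In this model the check only runs at the very end of the call, which is consistent with the stack in the dump showing nothing but the exit path: by the time the bugcheck fires, the code that left the IRQL raised has already returned.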

So far still no luck reproducing this in our environment.
I’m going to see if the customer will be gracious enough to let me try and debug it in their environment.

I’ve been looking at the cancellation routine code, thinking maybe the user mode app is crashing with an IRP in flight.

I see something that looks a little odd in the cancel routine:

OldIRQL = KeGetCurrentIrql();

if (OldIRQL < DISPATCH_LEVEL)
    KeAcquireSpinLock(&DeviceExtension->TSPISpinLock, &OldIRQL);
else
    KeAcquireSpinLockAtDpcLevel(&DeviceExtension->TSPISpinLock);

… (do some stuff with spinlock held)

if (OldIRQL < DISPATCH_LEVEL)
    KeReleaseSpinLock(&DeviceExtension->TSPISpinLock, OldIRQL);
else
    KeReleaseSpinLockFromDpcLevel(&DeviceExtension->TSPISpinLock);

I don’t see the advantage of using KeAcquireSpinLockAtDpcLevel/KeReleaseSpinLockFromDpcLevel in a cancellation routine.
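
As far as I know, KeAcquireSpinLock is documented as callable at IRQL <= DISPATCH_LEVEL, so the conditional form and a plain acquire/release pair should leave the IRQL in the same state; the AtDpcLevel/FromDpcLevel variants just skip the raise and lower. A small user-mode model of that reasoning follows. It is a sketch only: all Model* and Run* names are hypothetical, and it models the IRQL bookkeeping, not the lock itself.

```c
#include <assert.h>

/* User-mode model of IRQL bookkeeping only. All names are hypothetical. */
enum { MODEL_PASSIVE = 0, MODEL_DISPATCH = 2 };

static int g_cancelModelIrql = MODEL_PASSIVE;

/* Plain acquire: save the current level, then raise to DISPATCH. */
static void ModelAcquire(int *oldIrql)
{
    *oldIrql = g_cancelModelIrql;
    g_cancelModelIrql = MODEL_DISPATCH;
}

/* Plain release: restore the saved level. */
static void ModelRelease(int oldIrql)
{
    g_cancelModelIrql = oldIrql;
}

/* AtDpcLevel variants: the caller is already at DISPATCH, so no level change. */
static void ModelAcquireAtDpcLevel(void)
{
    assert(g_cancelModelIrql == MODEL_DISPATCH);
}

static void ModelReleaseFromDpcLevel(void)
{
    assert(g_cancelModelIrql == MODEL_DISPATCH);
}

/* The conditional pattern from the cancel routine; returns the exit level. */
static int RunConditional(int entryIrql)
{
    int oldIrql;

    g_cancelModelIrql = entryIrql;
    oldIrql = g_cancelModelIrql;

    if (oldIrql < MODEL_DISPATCH)
        ModelAcquire(&oldIrql);
    else
        ModelAcquireAtDpcLevel();

    /* work with the lock held would go here */

    if (oldIrql < MODEL_DISPATCH)
        ModelRelease(oldIrql);
    else
        ModelReleaseFromDpcLevel();

    return g_cancelModelIrql;
}

/* The same work with an unconditional acquire/release pair. */
static int RunUnconditional(int entryIrql)
{
    int oldIrql;

    g_cancelModelIrql = entryIrql;
    ModelAcquire(&oldIrql);
    /* work with the lock held would go here */
    ModelRelease(oldIrql);
    return g_cancelModelIrql;
}
```

Both forms exit at the level they entered, so the conditional looks like a micro-optimization rather than a correctness requirement. Note also that a cancel routine is entered at DISPATCH_LEVEL with the system cancel spin lock held, so unless that lock has already been dropped with IoReleaseCancelSpinLock, only the else branches would ever run there.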

So I was finally able to reproduce this in our environment and I found that downgrading the vendor’s driver to an older version stopped it from crashing. I’m still not sure if their driver is crashing or if it’s behaving in a way that causes ours to crash. Will be debugging it further now that I can reproduce it.

Just thought I would follow up with the resolution to this issue.

There was a certain condition in our code that would cause KeReleaseSpinLock to get called twice on the same lock.
I was surprised that SDV didn’t flag it.

The way I finally found it was by writing wrapper routines around KeAcquireSpinLock and KeReleaseSpinLock that incremented and decremented a counter. If the counter ever went above 1 or below 0, I manually bugchecked the system.
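
For the archives, the idea can be sketched roughly as below. This is a user-mode model so it compiles outside the WDK, not the actual wrappers: TRACKED_LOCK, TrackedAcquire and TrackedRelease are hypothetical names, the held flag stands in for the real KSPIN_LOCK, and the violation flag marks the spot where the driver version would call KeBugCheckEx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-mode model of the counting wrappers. All names are hypothetical. */
typedef struct {
    bool held;       /* stand-in for the real spin lock state */
    long balance;    /* acquires minus releases; should only ever be 0 or 1 */
    bool violation;  /* set where a driver would call KeBugCheckEx */
} TRACKED_LOCK;

/* Wrapper for the acquire: take the lock, then count it. */
static void TrackedAcquire(TRACKED_LOCK *lock)
{
    assert(lock != NULL);
    lock->held = true;            /* driver: KeAcquireSpinLock(...) */
    lock->balance++;
    if (lock->balance > 1)
        lock->violation = true;   /* acquired twice without a release */
}

/* Wrapper for the release: count it first, and refuse an unmatched release. */
static void TrackedRelease(TRACKED_LOCK *lock)
{
    assert(lock != NULL);
    lock->balance--;
    if (lock->balance < 0) {
        lock->violation = true;   /* released without a matching acquire */
        return;
    }
    lock->held = false;           /* driver: KeReleaseSpinLock(...) */
}
```

In a real driver the counter would need to be touched only while the lock is held (increment after the acquire, decrement before the release, as ordered above) or updated with InterlockedIncrement/InterlockedDecrement; otherwise the bookkeeping itself can race. Stopping at the moment of the unmatched release is what makes this useful, since the offending call path is still on the stack at that point.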

This was tricky to debug because there wasn’t an immediate crash and in our case the timing had to be just right for it to travel down the code path where the 2nd release of the lock occurred.

Thanks for following up. That’s always helpful for the archives.

KeAcquireSpinLock and KeReleaseSpinLock are really very simple functions, and – as you’ve discovered – do no error checking. It IS disappointing that SDV didn’t catch this. Hmmmm…

Again, thanks for following up. Great bug!

Peter