Hi,
I’m experiencing a live-lock when using a CSQ (cancel-safe queue) on a
multi-processor machine. The issue doesn’t happen on a single-processor
machine. First off, my driver is a top-level driver that creates a single
worker thread, which handles queued IOCTLs for user requests in system
context. I use an ERESOURCE in my CSQ callback routines, as below:
//
VOID MyDrvAcquireLock(
    IN PIO_CSQ Csq,
    OUT PKIRQL Irql
    )
{
    PMyDrv_DEVICE_EXTENSION devExtension;
    devExtension = CONTAINING_RECORD(Csq,
        MyDrv_DEVICE_EXTENSION, CancelSafeQueue);
    KeEnterCriticalRegion();
    ExAcquireResourceExclusiveLite( &devExtension->IrpQueueLock, TRUE );
}
//
VOID MyDrvReleaseLock(
    IN PIO_CSQ Csq,
    IN KIRQL Irql
    )
{
    PMyDrv_DEVICE_EXTENSION devExtension;
    devExtension = CONTAINING_RECORD(Csq,
        MyDrv_DEVICE_EXTENSION, CancelSafeQueue);
    ExReleaseResourceLite( &devExtension->IrpQueueLock );
    KeLeaveCriticalRegion();
}
Here is what my cleanup dispatch routine looks like:
NTSTATUS
MyDrvCleanup(
    IN PDEVICE_OBJECT DeviceObject,
    IN PIRP Irp
    )
{
    PMyDrv_DEVICE_EXTENSION devExtension;
    LIST_ENTRY tempQueue;
    PLIST_ENTRY thisEntry;
    PIRP pendingIrp;
    PIO_STACK_LOCATION pendingIrpStack, irpStack;
    devExtension = DeviceObject->DeviceExtension;
    irpStack = IoGetCurrentIrpStackLocation(Irp);
    while ((pendingIrp = IoCsqRemoveNextIrp(&devExtension->CancelSafeQueue,
                                            irpStack->FileObject)) != NULL)
    {
        // Cancel the IRP
        pendingIrp->IoStatus.Information = 0;
        pendingIrp->IoStatus.Status = STATUS_CANCELLED;
        MyDrv_KDPRINT(("Cleanup cancelled irp %p\n", pendingIrp));
        IoCompleteRequest(pendingIrp, IO_NO_INCREMENT);
    }
    // Finally complete the cleanup IRP
    Irp->IoStatus.Information = 0;
    Irp->IoStatus.Status = STATUS_SUCCESS;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return STATUS_SUCCESS;
}
After the worker thread processes a large volume of requests, the machine
hangs. Looking at the stack of the thread in my process that is sending
IOCTLs to the driver, I see the following trace every time:
THREAD 81557da8 Cid 0708.01b0 Teb: 7ffde000 Win32Thread: e15d9798 RUNNING on processor 1
IRP List:
816da360: (0006,0094) Flags: 00000404 Mdl: 00000000
Not impersonating
DeviceMap e15531d0
Owning Process 8166f1f8 Image: MyUserProg.exe
Wait Start TickCount 35456 Ticks: 16146 (0:00:04:12.281)
Context Switch Count 89534 LargeStack
UserTime 00:00:00.0546
KernelTime 00:04:17.0562
Win32 Start Address MyUserProg. (0x0043d73a)
Start Address kernel32!BaseProcessStartThunk (0x77e813f2)
Stack Init f7b23000 Current f7b22bec Base f7b23000 Limit f7b1e000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0 DecrementCount 16
ChildEBP RetAddr
f7b22bf4 80527571 hal!KeReleaseQueuedSpinLock+0x51 (FPO: [0,1,0])
f7b22c14 f7ab5ab4 nt!ExAcquireResourceExclusiveLite+0x65 (FPO: [Non-Fpo])
f7b22c28 f7ac2934 MyDrvdrv!MyDrvAcquireLock+0x24 (FPO: [Non-Fpo]) (CONV: stdcall)
f7b22c40 f7ab5ded MyDrvdrv!WdmlibIoCsqRemoveNextIrp+0x16 (FPO: [Non-Fpo])
f7b22c6c 804eb605 MyDrvdrv!MyDrvCleanup+0x2d (FPO: [Non-Fpo]) (CONV: stdcall)
f7b22c7c 8056a12c nt!IopfCallDriver+0x31 (FPO: [0,0,1])
f7b22ca8 805a0a63 nt!IopCloseFile+0x23a (FPO: [Non-Fpo])
f7b22cd8 805a041b nt!ObpDecrementHandleCount+0x119 (FPO: [Non-Fpo])
f7b22d00 805a04b1 nt!ObpCloseHandleTableEntry+0x14b (FPO: [Non-Fpo])
f7b22d48 805a05d7 nt!ObpCloseHandle+0x85 (FPO: [Non-Fpo])
f7b22d58 80531814 nt!NtClose+0x19 (FPO: [1,0,0])
f7b22d58 7ffe0304 nt!KiSystemService+0xc9 (FPO: [0,0] TrapFrame @ f7b22d64)
0012e894 77f5b5d4 SharedUserData!SystemCallStub+0x4 (FPO: [0,0,0])
0012e898 77e7a683 ntdll!ZwClose+0xc (FPO: [1,0,0])
0012e8a0 10091f0b kernel32!CloseHandle+0x4d (FPO: [1,0,0])
As can be seen, I run into this situation when the user app calls
CloseHandle. Here is a snippet from the !stacks command, which shows that
this thread is RUNNING on processor 1 rather than waiting, so it’s a live-lock:
[8166f1f8 MyUserProg.exe]
708.0001b0 81557da8 0003f12 RUNNING hal!KeReleaseQueuedSpinLock+0x51
Like I said earlier, this *only* happens when I run this code on a Pentium
dual-core machine; it doesn’t happen on a single-processor machine.
My questions are:
- Are there any known issues with using CSQ on a multi-processor machine?
- Or is this due to using an ERESOURCE in a multi-processor environment? If
so, what locking primitive is suggested so that this works in both uni- and
multi-processor environments?
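For comparison, the spin-lock form of the two callbacks that I could switch
to would look roughly like the sketch below. It is untested, and I don’t know
whether it avoids the hang. IrpQueueSpinLock is a hypothetical KSPIN_LOCK
field that I would have to add to MyDrv_DEVICE_EXTENSION and initialize with
KeInitializeSpinLock before calling IoCsqInitialize. MyDrvAcquireSpinLock and
MyDrvReleaseSpinLock are placeholder names.
//
#include <wdm.h>
// Sketch only: IrpQueueSpinLock is a hypothetical KSPIN_LOCK member of
// MyDrv_DEVICE_EXTENSION; it is not part of the driver shown above.
VOID MyDrvAcquireSpinLock(
    IN PIO_CSQ Csq,
    OUT PKIRQL Irql
    )
{
    PMyDrv_DEVICE_EXTENSION devExtension;
    devExtension = CONTAINING_RECORD(Csq,
        MyDrv_DEVICE_EXTENSION, CancelSafeQueue);
    // Raises to DISPATCH_LEVEL and stores the previous IRQL in *Irql.
    KeAcquireSpinLock( &devExtension->IrpQueueSpinLock, Irql );
}
//
VOID MyDrvReleaseSpinLock(
    IN PIO_CSQ Csq,
    IN KIRQL Irql
    )
{
    PMyDrv_DEVICE_EXTENSION devExtension;
    devExtension = CONTAINING_RECORD(Csq,
        MyDrv_DEVICE_EXTENSION, CancelSafeQueue);
    // Restores the IRQL that the acquire callback saved.
    KeReleaseSpinLock( &devExtension->IrpQueueSpinLock, Irql );
}
With this form the insert, remove and peek callbacks would run at
DISPATCH_LEVEL while the lock is held, so they could not touch pageable
memory or block.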
Thanks in advance,
Chandra