Conceptual issue with linked list

Hi everybody,

I am using doubly linked lists in my KMDF driver and randomly get LIST_HEAD corruption which I don’t understand why.
Just wondering if there’s something conceptually wrong with my use of doubly linked lists.

I have three lists using LIST_ENTRY; each list has their own KSPIN_LOCK to protect accesses via InsertTailList() and RemoveHeadList() using KeAcquireInStackQueuedSpinLock() and KeReleaseInStackQueuedSpinLock().

Entries are moved between the lists in the secondary interrupt handler (DPC = DISPATCH_LEVEL), but also via IOCTLs at PASSIVE_LEVEL.
My understanding is that using KeAcquireInStackQueuedSpinLock will raise to DISPATCH_LEVEL in any instance and also protect on SMP systems; and therefore the LIST_HEAD should never get corrupted.
This system works well, sometimes for hours, but randomly I get a BSOD with the list corruption (KERNEL_SECURITY_CHECK_FAILURE 139).

Is there anything conceptually wrong with this approach?
Should I use a single lock for all three lists?

Thanks,
Timber

  1. The KLOCK_QUEUE_HANDLE you pass to the APIs must be located on the stack, do not share a KLOCK_QUEUE_HANDLE in your device extension or other context

  2. Acquiring the lock does raise the IRQL to DISPATCH_LEVEL for the processor that you are executing on at that moment. It does not block other threads running on other cores from executing code at DISPATCH_LEVEL (unless it is a single core machine). Are you acquiring the spin lock in your DPC or assuming that IRQL alone is protecting the list? Note that in your DPC you should call KeAcquireInStackQueuedSpinLockAtDpcLevel as the IRQL is known.

Thanks Doron.
Yes, the KLOCK_QUEUE_HANDLE handle is on the stack.

I have a wrapper function for accessing the lists and am using KeAcquireInStackQueuedSpinLock because it can be used at any level.
I will change that to KeAcquireInStackQueuedSpinLockAtDpcLevel in the appropriate code paths when I am optimising the driver. I think it’s mostly for optimisation that I should be using KeAcquireInStackQueuedSpinLockAtDpcLevel in the DPC, right?

So I guess on an SMP machine my driver could be executing the DPC and accessing the lists while the user calls in via the IOCTL but their app runs on the other processor and hence corrupting the lists?
How can I protect against that, if that’s the case?

Remember that you also have to protect all the READS of your list, not just Insert and Remove. The routines that update the lists can temporarily leave the pointers in an inconsistent state.

There are no other accesses to the lists other than InsertTailList() and RemoveHeadList()
RemoveHeadList either returns a valid item that can be processed or it doesn’t and if there is an item that has been processed it just gets put into the list.

Running this system on an SMP machine, how can I protect accesses if KeAcquireInStackQueuedSpinLock is not enough?
Surely I can somehow safely access driver managed lists via an IOCTL even if the driver’s DPC and the userland app run on different processors.

You were not clear in our initial post if all accesses were guarded by the spinlock. Based on that ambiguity, I had to assume you were potentially not using the spinlock to guard access in the DPC. If you are using the spin lock to guard all access to the list, I would recommend looking at if

  1. you are checking for the list to be empty before removing an item

  2. after dropping the lock, are you manipulating/dereferencing LIST_ENTRY::Flink/Blink and potentially reaching back into list entries still in the list? I have guarded against this by either initializing the list entry after removal (so flink/blink point back to itself) or set both fields to null to catch accidental access.

You didn’t mention the details of the bugcheck. How is the list corrupted? how much debugging of the state of the corrupted list have you done?

Surely I can somehow safely access driver managed lists via an IOCTL even if the driver’s DPC and the userland app run on different processors.

Sure, as along as you serialize the accesses with a spin lock.

Doron, sorry that it wasn’t clear; yes all list accesses are protected by the lock and once I get a list item I don’t touch the item’s LIST_ENTRY.
When removing an item I acquire the lock, call IsListEmpty() and then RemoveHeadList() if the list was not empty.

When I get an item off a list should I initialize the item’s LIST_ENTRY? But that wouldn’t be the cause for the issue, i.e. list head corruption, I think.

The bugcheck KERNEL_SECURITY_CHECK_FAILURE states: A LIST_ENTRY has been corrupted (i.e. double remove)
I have added some code to record each list’s add/remove calls so that I can look at the history when the bugcheck occurs.
All add/removes seem ok, but the list head is corrupt and it points to an item that is not on the list, but is on another list.

Is using an SMP machine a red herring in terms of KeAcquireInStackQueuedSpinLock not fully protecting the lists when driver and userland app can run on different processors?

@Tim_Roberts said:
Sure, as along as you serialize the accesses with a spin lock.

Ok thanks, so KeAcquireInStackQueuedSpinLock() is enough even on SMP machines.
I got confused as it was mentioned that it only raises IRQL only on the one processor, but the spin lock still protects across processors.

Is the second list also protected with a spinlock? When the item is inserted into the second list, do you use a second list entry or reuse the list entry for when the object is in the first list ?

Each of the three lists has their own spin lock.
The item only has one LIST_ENTRY and that is re-used for all three lists.
An item can only ever be in one list and is only ever processed when it’s not in any list and while being processed it’s not shared between threads.

Given your brief description of the bug check it sounds like you are inserting the object into two lists simultaneously. To debug you can initialize the object list entry immediately after removing it from a list and then check IsListEmpty(object->entry) every time you insert it onto a list. If false, bug check manually and figure out why the state doesn’t match your expectation.

I will try that, thanks; it should give me another clue why that’s happening.
Thanks so much for all your pointers, Doron and Tim! Much appreciated.