Question on InStackQueued Spinlocks (and Threaded DPC)...

Hi

A) InStackQueued Spinlocks

WDK says

“The driver allocates a KLOCK_QUEUE_HANDLE structure that it passes by pointer to KeAcquireInStackQueuedSpinLock. The driver passes the same structure by pointer to KeReleaseInStackQueuedSpinLock when it releases the spin lock.
****Drivers should normally allocate the structure on the stack each time they acquire the lock.****”

  1. Why does KLOCK_QUEUE_HANDLE need to be on the stack? Can I just have it on the NPAGED heap?
  2. Is it merely not advised to have it anywhere other than the stack, or is it actually illegal? I guess the former, but I just want to confirm.
  3. Also, can I acquire the InStackQueued lock in one thread and release it on a different thread?

B) Threaded DPC

WDK Says

" ***On server systems, where overall system performance is more important than system latency, threaded DPCs work in the identical manner as ordinary DPCs do***.
It is only on client systems, where a high system latency causes the system to appear unresponsive, that threaded DPCs can be preempted by real-time threads."

  1. But on 2K8 Server my threaded DPC gets called at both passive and dispatch levels. What did I miss here? Does the OS not enforce that? If so, why include that text in the WDK verbatim?

–thx

  1. Each thread that acquires an in-stack queued spinlock must have its own unique KLOCK_QUEUE_HANDLE pointer value. Why? Because if the lock is already held, each waiter’s handle is placed into a linked list of waiters. If two threads shared the same lock handle, it would corrupt this list, since one handle cannot represent both threads. This also means you cannot put the KLOCK_QUEUE_HANDLE in your device extension.

  2. You could allocate it each time you attempt to acquire the lock, but that is just silly because the allocation can fail. Just put it on the stack.

  3. No, you would leak the IRQL on the acquiring thread. The lock must be released on the same thread that acquired it. Only some dispatcher objects can be acquired in one thread and released in another.
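
To make the "linked list of waiters" point concrete, here is a minimal portable sketch of a generic MCS-style queued lock in C11. It is an illustration of the general technique only, not the Windows kernel's implementation: `QueueNode`, `QueuedLock`, and the `queued_lock_*` functions are hypothetical names, with `QueueNode` playing the role that KLOCK_QUEUE_HANDLE plays in the discussion above. IRQL handling is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for KLOCK_QUEUE_HANDLE: one node per acquirer. */
typedef struct QueueNode {
    _Atomic(struct QueueNode *) next;  /* next waiter in line, if any */
    atomic_bool waiting;               /* true while this node must spin */
} QueueNode;

/* Hypothetical stand-in for the spin lock: points at the last queued node. */
typedef struct {
    _Atomic(QueueNode *) tail;
} QueuedLock;

void queued_lock_init(QueuedLock *lock)
{
    atomic_init(&lock->tail, (QueueNode *)NULL);
}

/* Link `node` onto the queue. Returns true if the lock was free and is now
 * owned by `node`, false if `node` was queued behind another node. */
bool queued_lock_enqueue(QueuedLock *lock, QueueNode *node)
{
    atomic_store(&node->next, (QueueNode *)NULL);
    atomic_store(&node->waiting, true);
    QueueNode *prev = atomic_exchange(&lock->tail, node);
    if (prev == NULL) {
        atomic_store(&node->waiting, false);
        return true;
    }
    atomic_store(&prev->next, node);   /* predecessor now points at us */
    return false;
}

void queued_lock_acquire(QueuedLock *lock, QueueNode *node)
{
    if (!queued_lock_enqueue(lock, node)) {
        /* Spin only on our own node, not on the shared lock word. */
        while (atomic_load(&node->waiting)) { }
    }
}

void queued_lock_release(QueuedLock *lock, QueueNode *node)
{
    QueueNode *next = atomic_load(&node->next);
    if (next == NULL) {
        QueueNode *expected = node;
        if (atomic_compare_exchange_strong(&lock->tail, &expected,
                                           (QueueNode *)NULL)) {
            return;                    /* nobody was waiting */
        }
        /* A waiter swapped the tail but has not linked itself in yet. */
        while ((next = atomic_load(&node->next)) == NULL) { }
    }
    atomic_store(&next->waiting, false);  /* hand off in FIFO order */
}
```

Because each node carries its own `next` pointer and `waiting` flag, a node that is already in the queue cannot be enqueued a second time without overwriting that state, which is the list corruption described in answer 1. A node declared as a local variable in the acquiring function is unique to that acquisition by construction.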

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@yahoo.com
Sent: Thursday, December 04, 2008 11:50 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] Question on InStackQueued Spinlocks (and Threaded DPC)…

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

> Also can I acquire the InStackQueued lock in one thread and release it on a different thread?

What do you think yourself??? Spinlocks work on a per-CPU, rather than per-thread, basis. It follows that if a spinlock has been acquired by CPU X, it can be released only by CPU X. Spinlocks are held at elevated IRQL, which means a context switch cannot occur on CPU X while the spinlock is being held. The only situation in which the above scenario is possible is when you lower IRQL while the spinlock is being held, but, unless you are just desperate to deadlock, you will never do anything like that…

Anton Bassov.

See below…

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@yahoo.com
Sent: Friday, December 05, 2008 2:50 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] Question on InStackQueued Spinlocks (and Threaded DPC)…

Hi

A) InStackQueued Spinlocks

WDK says

“The driver allocates a KLOCK_QUEUE_HANDLE structure that it passes by
pointer to KeAcquireInStackQueuedSpinLock. The driver passes the same
structure by pointer to KeReleaseInStackQueuedSpinLock when it releases the
spin lock.
****Drivers should normally allocate the structure on the stack each time
they acquire the lock.****”

  1. Why does KLOCK_QUEUE_HANDLE need to be on the stack? Can I just have it on
     the NPAGED heap?
************************************************************************
It doesn’t *need* to be on the stack, but the idea is that the stack will be
allocated in the local memory of the CPU that is running the current thread
that is setting the lock. This avoids interprocessor bus contention and
local memory contention in NUMA architectures. If you use a heap address,
there is no control over the memory contention issue, so this would be an
Extremely Bad Strategy.

Also, if it is allocated anywhere else, it has to be on a per-call basis,
and this means you must know each location in your driver that can acquire
the lock, and have a KQH which is exclusive to that location. This creates
weird dependencies and fragile-under-maintenance code, replaces clean
component interfaces with module interdependencies, and generally
violates most standards of good programming.
************************************************************************

  2. Is it merely not advised to have it anywhere other than the stack, or is
     it actually illegal? I guess the former, but I just want to confirm.
************************************************************************
As indicated, it would be an exceptionally poor design decision to put it in
a heap-allocated page. The whole point of a QSL is that it is designed to
reduce memory conflicts in large multiprocessor environments, so why would
you want to defeat this? (The FIFO queueing is a cool side benefit.)
************************************************************************

  3. Also, can I acquire the InStackQueued lock in one thread and release it
     on a different thread?
************************************************************************
No. Because, once you have acquired it, no other thread can run; a spin
lock blocks scheduling because you are raised to DISPATCH_LEVEL. There’s no
way any other thread can run. The question doesn’t even make sense. How is
it that any other thread could *possibly* know when a spin lock could be
released? Nothing else can run on that CPU until the thread that set the
lock releases the lock, so on a uniprocessor, you have instant deadlock if
you think you can do this. In a multiprocessor system, there’s no way for
the thread that set it to release control to another CPU. This issue
applies to any locking structure (for example, ONLY the thread that sets a
mutex [user or kernel space] is permitted to release the mutex, and this is
enforced). So other than the fact that it makes no logical sense, creates
deadlock, and could not occur under any imaginable conditions of correct
code, there’s nothing wrong with the idea.

Note that it wouldn’t make sense for either a queued spin lock or a Spin
Lock Classic. It wouldn’t make sense for *any* mutual-exclusion mechanism
to allow other than the owner of the lock to release the lock.
************************************************************************

Thanks.
I was converting legacy locks to queued spinlocks in an existing codebase (I think I am hitting the ‘too much time at DPC’ issue), so I was looking for a route to do that.
Anyway, I will allocate the handle on the stack, which gives me all the benefits my stack needs.
Yes, a spinlock having no thread affinity is plain bad…

Several issues:

You cannot set a spin lock if you need more than 10usec to do your action.
You have to redesign your code so you meet the timing requirements. Legacy
to queued spin locks is not going to change that problem; they both have the
same limitations that are supposed to be obeyed.
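
One common way to restructure code so that the hold time stays short is to do only pointer manipulation under the lock and move the slow work outside it. The following is a portable C11 sketch of that pattern, not driver code: `WorkQueue`, `WorkItem`, and the `work_queue_*` / `process_detached` functions are hypothetical names, and a plain `atomic_flag` stands in for the kernel spin lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct WorkItem {
    struct WorkItem *next;
    int payload;
} WorkItem;

typedef struct {
    atomic_flag busy;   /* stand-in for a spin lock */
    WorkItem *head;     /* protected by `busy` */
} WorkQueue;

#define WORK_QUEUE_INIT { ATOMIC_FLAG_INIT, NULL }

void work_queue_lock(WorkQueue *q)
{
    while (atomic_flag_test_and_set(&q->busy)) { }  /* spin until free */
}

void work_queue_unlock(WorkQueue *q)
{
    atomic_flag_clear(&q->busy);
}

/* Push one item: two pointer writes under the lock. */
void work_queue_push(WorkQueue *q, WorkItem *item)
{
    work_queue_lock(q);
    item->next = q->head;
    q->head = item;
    work_queue_unlock(q);
}

/* Detach the whole list in constant time; the lock is held only for
 * the pointer swap, regardless of how many items are queued. */
WorkItem *work_queue_detach_all(WorkQueue *q)
{
    work_queue_lock(q);
    WorkItem *list = q->head;
    q->head = NULL;
    work_queue_unlock(q);
    return list;
}

/* The potentially slow per-item work runs with the lock NOT held. */
int process_detached(WorkItem *list)
{
    int sum = 0;
    for (; list != NULL; list = list->next) {
        sum += list->payload;
    }
    return sum;
}
```

The caller detaches with `work_queue_detach_all` and then hands the private list to `process_detached`. The time spent inside the lock no longer depends on the amount of queued work, which is the property that matters here whether the lock is a legacy or a queued spin lock.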

Spin locks also do not have recursive acquisition semantics.
joe

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@yahoo.com
Sent: Friday, December 05, 2008 2:54 PM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] Question on InStackQueued Spinlocks (and Threaded
DPC)…

Another possibility is that the time-out is caused not by long hold times but by some bug in your code. Since you were asking about releasing a spin lock from a different thread: if your existing code does that, you are in trouble. When you hit the time-out, you need to debug it to figure out the root cause.
Thanks,
Alex

> Date: Fri, 5 Dec 2008 23:57:15 -0500
> From: xxxxx@flounder.com
> Subject: RE: [ntdev] Question on InStackQueued Spinlocks (and Threaded DPC)…
> To: xxxxx@lists.osr.com



Thanks.

The issue I am seeing is at http://www.osronline.com/showThread.cfm?link=143802

[?0] I think it is the ‘too much time at DPC’ issue?

A)

>>>>>>
You cannot set a spin lock if you need more than 10usec to do your action.
You have to redesign your code so you meet the timing requirements.
Legacy to queued spin locks is not going to change that problem;
they both have the same limitations that are supposed to be obeyed.
<<<<<<<

My understanding is that QSLs (queued spin locks) have two improvements over legacy spin locks:
a) Waiters are serviced in FIFO rather than LIFO order
b) By having the KLOCK_QUEUE_HANDLE on the stack, you reduce memory conflicts in large multiprocessor environments

I assumed (a) is the primary benefit.
But earlier it was pointed out to me that (b) is the primary benefit.

[?1] W.R.T. (a), with legacy spin locks:
-My first thread (T1 on proc 1) is blocked on the spin lock
-Suppose threads T2 and T3 from *other procs* also get blocked
-The lock becomes available (released by T4 on proc 4, etc.)
-But T2 and T3 get serviced first
-So can’t (in theory) threads on other procs constantly get ahead in the wait list and starve T1 for enough ticks that proc 1 spits out that “too much time at DPC” error?

[?1.1] I know the issue is still in my stack, but did the legacy locks add to the delay here, sort of?

[B]

>>>>>>>
…Spin locks also do not have recursive acquisition semantics.

…Another possibility is that time-out is not caused by long hold times, but by some bug in your code.
<<<<<<<<
[?2] Does Driver Verifier (DV) flag a recursive acquisition attempt as a deadlock?
I did not get a DV BSOD during this, so I assumed it is the ‘too much time at DPC’ error.

[C]

>>>>
…Since you were asking about releasing spin-lock from a
different thread, if your existing code does that you are in trouble.

***…In a multiprocessor system, there’s no way for
the thread that set it to release control to another CPU…

<<<

Again, see http://www.osronline.com/showThread.cfm?link=143802.
I do DPC redirection.

*** [?3] Is this possible?
-T1 on proc 1 acquired the lock
-It queued a DPC to another proc
-T2 on the other proc then tried to release the lock acquired in step 1

Blatantly bad, but I threw that question out to learn the consequences in case a different thread/proc released the lock.

I looked at the code again; obviously that’s not the case :)

[?3.1] But in case that happens, does DV have any sort of bugcheck for it?

–thx

Hi,
I looked at your other thread, and you only show the stack for the thread hitting the time out. Usually to debug something like this you would want to at least figure out who is holding the lock and why the lock is being held for a long time. What are the call stacks for other processors? Enabling DPC ETW tracing might also help understand what is happening.
Alex

> Date: Sat, 6 Dec 2008 01:46:47 -0500
> From: xxxxx@yahoo.com
> To: xxxxx@lists.osr.com
> Subject: RE:[ntdev] Question on InStackQueued Spinlocks (and Threaded DPC)…



>>>>
Usually to debug something like this you would want to at
least figure out who is holding the lock and why the lock is being held for a
long time. What are the call stacks for other processors?
<<<<<<<<<

It was a 2-proc machine. The other proc did not have anything on it AFAIK (just a few stack frames involving that intelpoidle+xxx or something like that…).
I will dump that too next time.

I will try DPC ETW tracing…

Thx…