Deadlock in worker threads

Alex_Grig Member Posts: 3,238
I see a deadlock on a 24-thread (two six-core) system. All "delayed" worker threads are stuck in nt!KeSetSystemGroupAffinityThread. Examples:

0: kd> !thread fffffa801eeed8a0
THREAD fffffa801eeed8a0 Cid 0004.0fb4 Teb: 0000000000000000 Win32Thread: 0000000000000000 ????
Not impersonating
DeviceMap fffff8a000008500
Owning Process fffffa80136f2040 Image: System
Attached Process N/A Image: N/A
Wait Start TickCount 3518454 Ticks: 45279 (0:00:11:46.356)
Context Switch Count 691
UserTime 00:00:00.000
KernelTime 00:00:00.046
Win32 Start Address nt!ExpWorkerThread (0xfffff800054f2910)
Stack Init fffff88008d9cdb0 Current fffff88008d9bf90
Base fffff88008d9d000 Limit fffff88008d97000 Call 0
Priority 15 BasePriority 12 UnusualBoost 3 ForegroundBoost 0 IoPriority 2 PagePriority 5
Child-SP RetAddr : Args to Child : Call Site
fffff880`08d9bfd0 fffff800`054bd372 : fffff880`026a4180 fffffa80`1eeed8a0 fffffa80`1eeedaa0 00000000`00000000 : nt!KiSwapContext+0x7a
fffff880`08d9c110 fffff800`054bd9ac : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KeSetSystemGroupAffinityThread+0x18a
fffff880`08d9c180 fffff800`057a6bbc : fffffa80`3a5f7460 fffff880`08d9c240 fffff880`096fffd0 fffff800`0566cbc0 : nt!PopExecuteOnTargetProcessors+0xdc
fffff880`08d9c220 fffff800`057f2419 : 00000000`00000000 fffff880`08d9c700 00000000`00000018 fffff8a0`09dcc1c0 : nt!PpmCapturePerformanceDistribution+0x9c
fffff880`08d9c290 fffff800`057f2949 : fffff8a0`09dcc1c0 00000000`00000000 00000000`00000006 00000000`00000000 : nt!ExpQuerySystemInformation+0x14d9
fffff880`08d9c640 fffff800`054e78d3 : fffff8a0`09de0000 fffff800`054e687d 00000000`00000041 fffff8a0`09dcc000 : nt!NtQuerySystemInformation+0x4d
fffff880`08d9c680 fffff800`054e3e70 : fffff880`0198def4 00000000`00020000 fffff880`08d9c844 fffff8a0`09dcc198 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`08d9c680)
fffff880`08d9c818 fffff880`0198def4 : 00000000`00020000 fffff880`08d9c844 fffff8a0`09dcc198 fffffa80`00000000 : nt!KiServiceLinkage
fffff880`08d9c820 fffff880`0198e3ed : fffffa80`18b5ce70 fffff880`01c583b7 00000000`20206f49 fffffa80`1eeed8a0 : cng!GatherRandomKey+0x294
fffff880`08d9cbe0 fffff800`057def4d : 00000000`00000001 00000000`00000001 fffffa80`1ef58900 fffffa80`1eeed8a0 : cng!scavengingWorkItemRoutine+0x3d
fffff880`08d9cc80 fffff800`054f2a21 : fffff800`05685658 fffff800`057def01 fffffa80`1eeed800 00370036`00310020 : nt!IopProcessWorkItem+0x3d
fffff880`08d9ccb0 fffff800`05785cce : 0075006c`005c0033 fffffa80`1eeed8a0 00000000`00000080 fffffa80`136f2040 : nt!ExpWorkerThread+0x111
fffff880`08d9cd40 fffff800`054d9fe6 : fffff880`02715180 fffffa80`1eeed8a0 fffff880`027204c0 00330020`00300020 : nt!PspSystemThreadStartup+0x5a
fffff880`08d9cd80 00000000`00000000 : fffff880`08d9d000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16

0: kd> !thread fffffa801efe5a10
THREAD fffffa801efe5a10 Cid 0004.08ec Teb: 0000000000000000 Win32Thread: 0000000000000000 ????
Not impersonating
DeviceMap fffff8a000008500
Owning Process fffffa80136f2040 Image: System
Attached Process N/A Image: N/A
Wait Start TickCount 3518454 Ticks: 45279 (0:00:11:46.356)
Context Switch Count 687
UserTime 00:00:00.000
KernelTime 00:00:00.046
Win32 Start Address nt!ExpWorkerThread (0xfffff800054f2910)
Stack Init fffff88009212db0 Current fffff88009211f90
Base fffff88009213000 Limit fffff8800920d000 Call 0
Priority 15 BasePriority 12 UnusualBoost 3 ForegroundBoost 0 IoPriority 2 PagePriority 5
Child-SP RetAddr : Args to Child : Call Site
fffff880`09211fd0 fffff800`054bd372 : fffff880`026a4180 fffffa80`1efe5a10 fffffa80`1efe5c10 00000000`00000000 : nt!KiSwapContext+0x7a
fffff880`09212110 fffff800`054bd9ac : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KeSetSystemGroupAffinityThread+0x18a
fffff880`09212180 fffff800`057a6bbc : fffffa80`3a5f7460 fffff880`09212240 fffff880`0971ffd0 fffff800`0566cbc0 : nt!PopExecuteOnTargetProcessors+0xdc
fffff880`09212220 fffff800`057f2419 : 00000000`00000000 fffff880`09212700 00000000`00000018 fffff8a0`0b2fc1c0 : nt!PpmCapturePerformanceDistribution+0x9c
fffff880`09212290 fffff800`057f2949 : fffff8a0`0b2fc1c0 00000000`00000000 00000000`00000006 00000000`00000000 : nt!ExpQuerySystemInformation+0x14d9
fffff880`09212640 fffff800`054e78d3 : fffff8a0`0b310000 fffff800`054e687d 00000000`00000041 fffff8a0`0b2fc000 : nt!NtQuerySystemInformation+0x4d
fffff880`09212680 fffff800`054e3e70 : fffff880`0198def4 00000000`00020000 fffff880`09212844 fffff8a0`0b2fc198 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`09212680)
fffff880`09212818 fffff880`0198def4 : 00000000`00020000 fffff880`09212844 fffff8a0`0b2fc198 fffffa80`00000000 : nt!KiServiceLinkage
fffff880`09212820 fffff880`0198e3ed : fffffa80`18b5ce70 fffff880`01c583b7 00000000`20206f49 fffffa80`1efe5a10 : cng!GatherRandomKey+0x294
fffff880`09212be0 fffff800`057def4d : 00000000`00000001 00000000`00000001 fffffa80`3a048290 fffffa80`1efe5a10 : cng!scavengingWorkItemRoutine+0x3d
fffff880`09212c80 fffff800`054f2a21 : fffff800`05685658 fffff800`057def01 fffffa80`1efe5a00 00000000`00000000 : nt!IopProcessWorkItem+0x3d
fffff880`09212cb0 fffff800`05785cce : 00000000`00000000 fffffa80`1efe5a10 00000000`00000080 fffffa80`136f2040 : nt!ExpWorkerThread+0x111
fffff880`09212d40 fffff800`054d9fe6 : fffff880`028e2180 fffffa80`1efe5a10 fffff880`028ed4c0 00000000`00000246 : nt!PspSystemThreadStartup+0x5a
fffff880`09212d80 00000000`00000000 : fffff880`09213000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16

0: kd> !thread fffffa80136d5b60
THREAD fffffa80136d5b60 Cid 0004.0d34 Teb: 0000000000000000 Win32Thread: 0000000000000000 ????
Not impersonating
DeviceMap fffff8a000008500
Owning Process fffffa80136f2040 Image: System
Attached Process N/A Image: N/A
Wait Start TickCount 3518454 Ticks: 45279 (0:00:11:46.356)
Context Switch Count 701
UserTime 00:00:00.000
KernelTime 00:00:00.062
Win32 Start Address nt!ExpWorkerThread (0xfffff800054f2910)
Stack Init fffff880093bbdb0 Current fffff880093baf90
Base fffff880093bc000 Limit fffff880093b6000 Call 0
Priority 15 BasePriority 12 UnusualBoost 3 ForegroundBoost 0 IoPriority 2 PagePriority 5
Child-SP RetAddr : Args to Child : Call Site
fffff880`093bafd0 fffff800`054bd372 : fffff880`026a4180 fffffa80`136d5b60 fffffa80`136d5d60 00000000`00000000 : nt!KiSwapContext+0x7a
fffff880`093bb110 fffff800`054bd9ac : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KeSetSystemGroupAffinityThread+0x18a
fffff880`093bb180 fffff800`057a6bbc : fffffa80`3a5f7460 fffff880`093bb240 fffff880`096bffd0 fffff800`0566cbc0 : nt!PopExecuteOnTargetProcessors+0xdc
fffff880`093bb220 fffff800`057f2419 : 00000000`00000000 fffff880`093bb700 00000000`00000018 fffff8a0`09d8c1c0 : nt!PpmCapturePerformanceDistribution+0x9c
fffff880`093bb290 fffff800`057f2949 : fffff8a0`09d8c1c0 00000000`00000000 00000000`00000006 00000000`00000000 : nt!ExpQuerySystemInformation+0x14d9
fffff880`093bb640 fffff800`054e78d3 : fffff8a0`09da0000 fffff800`054e687d 00000000`00000041 fffff8a0`09d8c000 : nt!NtQuerySystemInformation+0x4d
fffff880`093bb680 fffff800`054e3e70 : fffff880`0198def4 00000000`00020000 fffff880`093bb844 fffff8a0`09d8c198 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`093bb680)
fffff880`093bb818 fffff880`0198def4 : 00000000`00020000 fffff880`093bb844 fffff8a0`09d8c198 fffffa80`00000000 : nt!KiServiceLinkage
fffff880`093bb820 fffff880`0198e3ed : fffffa80`18b5ce70 00000000`00000000 00000000`20206f49 fffffa80`136d5b60 : cng!GatherRandomKey+0x294
fffff880`093bbbe0 fffff800`057def4d : 00000000`00000001 00000000`00000001 fffffa80`2aacb2d0 fffffa80`136d5b60 : cng!scavengingWorkItemRoutine+0x3d
fffff880`093bbc80 fffff800`054f2a21 : fffff800`05685658 fffff800`057def01 fffffa80`136d5b00 00000000`00000000 : nt!IopProcessWorkItem+0x3d
fffff880`093bbcb0 fffff800`05785cce : 00000000`00000000 fffffa80`136d5b60 00000000`00000080 fffffa80`136f2040 : nt!ExpWorkerThread+0x111
fffff880`093bbd40 fffff800`054d9fe6 : fffff880`02715180 fffffa80`136d5b60 fffff880`027204c0 00000000`00000246 : nt!PspSystemThreadStartup+0x5a
fffff880`093bbd80 00000000`00000000 : fffff880`093bc000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16

0: kd> !thread fffffa801370b680
THREAD fffffa801370b680 Cid 0004.0044 Teb: 0000000000000000 Win32Thread: 0000000000000000 ????
Not impersonating
DeviceMap fffff8a000008500
Owning Process fffffa80136f2040 Image: System
Attached Process N/A Image: N/A
Wait Start TickCount 3518454 Ticks: 45279 (0:00:11:46.356)
Context Switch Count 25026
UserTime 00:00:00.000
KernelTime 00:00:04.399
Win32 Start Address nt!ExpWorkerThread (0xfffff800054f2910)
Stack Init fffff88002fd4db0 Current fffff88002fd3f90
Base fffff88002fd5000 Limit fffff88002fcf000 Call 0
Priority 15 BasePriority 12 UnusualBoost 3 ForegroundBoost 0 IoPriority 2 PagePriority 5
Child-SP RetAddr : Args to Child : Call Site
fffff880`02fd3fd0 fffff800`054bd372 : fffff880`026a4180 fffffa80`1370b680 fffffa80`1370b880 00000000`00000000 : nt!KiSwapContext+0x7a
fffff880`02fd4110 fffff800`054bd9ac : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KeSetSystemGroupAffinityThread+0x18a
fffff880`02fd4180 fffff800`057a6bbc : fffffa80`3a5f7460 fffff880`02fd4240 fffff880`00b44fd0 fffff800`0566cbc0 : nt!PopExecuteOnTargetProcessors+0xdc
fffff880`02fd4220 fffff800`057f2419 : 00000000`00000000 fffff880`02fd4700 00000000`00000018 fffff8a0`0258f1c0 : nt!PpmCapturePerformanceDistribution+0x9c
fffff880`02fd4290 fffff800`057f2949 : fffff8a0`0258f1c0 00000000`00000000 00000000`00000006 00000000`00000000 : nt!ExpQuerySystemInformation+0x14d9
fffff880`02fd4640 fffff800`054e78d3 : fffff8a0`025a0000 fffff800`054e687d 00000000`00000041 fffff8a0`0258f000 : nt!NtQuerySystemInformation+0x4d
fffff880`02fd4680 fffff800`054e3e70 : fffff880`0198def4 00000000`00020000 fffff880`02fd4844 fffff8a0`0258f198 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`02fd4680)
fffff880`02fd4818 fffff880`0198def4 : 00000000`00020000 fffff880`02fd4844 fffff8a0`0258f198 fffffa80`00000000 : nt!KiServiceLinkage
fffff880`02fd4820 fffff880`0198e3ed : fffffa80`18b5ce70 00000000`000007ff 00000000`20206f49 fffffa80`1370b680 : cng!GatherRandomKey+0x294
fffff880`02fd4be0 fffff800`057def4d : 00000000`00000001 00000000`00000001 fffffa80`1fa00160 fffffa80`1370b680 : cng!scavengingWorkItemRoutine+0x3d
fffff880`02fd4c80 fffff800`054f2a21 : fffff800`05685658 fffff800`057def01 fffffa80`1370b600 fffff800`05685658 : nt!IopProcessWorkItem+0x3d
fffff880`02fd4cb0 fffff800`05785cce : 00000000`00000000 fffffa80`1370b680 00000000`00000080 fffffa80`136f2040 : nt!ExpWorkerThread+0x111
fffff880`02fd4d40 fffff800`054d9fe6 : fffff880`02871180 fffffa80`1370b680 fffff880`0287c4c0 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
fffff880`02fd4d80 00000000`00000000 : fffff880`02fd5000 fffff880`02fcf000 fffff880`02fd49e0 00000000`00000000 : nt!KxStartSystemThread+0x16


Does anybody know what's going on?

Also, there is no hard deadlock on any processor. There is one runaway thread (not holding any lock), but why would it block all other threads on a 24-thread box?

Comments

  • Mark_Roddy Member - All Emails Posts: 4,307
    On Tue, Apr 12, 2011 at 2:22 PM, wrote:

    > PopExecuteOnTargetProcessors


    So you have one runaway thread and a routine that is attempting to execute
    "on all (target) processors" - perhaps this is the cause of the problem?
  • Alex_Grig Member Posts: 3,238
    The runaway thread is a private one, not one of the worker threads. Also, it runs at PASSIVE_LEVEL and default priority:

    Priority 16 BasePriority 8 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5

    , so it can be preempted.

    BUT the worker threads seem to be at:

    Priority 15 BasePriority 12 UnusualBoost 3 ForegroundBoost 0 IoPriority 2 PagePriority 5

    Could that be a problem?

    Also, it doesn't seem to be a "routine that wants to execute on all processors". If it were so, it would make more sense to use DPC-type facilities.

    As it stands, such behavior is a clear design flaw. A thread could be simply a default priority thread (or several such threads) running a multi-hour calculation in user mode, and that could cause all system worker threads to get stuck? And the system gets unresponsive then? Because the worker threads are also used by HID drivers. WTF?

    A big Windows deficiency is the lack of detection of runaway (or CPU-intensive) threads and demotion of their priority. It was a very big gripe of mine in the single-CPU era, and I thought in this age it would not be so. I was mistaken.
  • Scott_Noone_(OSR) Administrator Posts: 3,142
    >Also, it runs on PASSIVE_LEVEL and default priority:

    Priority 16 BasePriority 8 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5

    16 isn't the default priority of threads in the O/S. Note there that the
    BasePriority is 8, which is much more likely to be the default priority of
    the thread (usually 7-9, though it's O/S specific).

    >Could that be a problem?

    Sure. Priority 16 is higher than priority 15.

    >Also, it doesn't seem to be a "routine that wants to execute on all processors".
    >If it were so, it would make more sense to use DPC-type facilities.

    Why? The code here is trying to capture some per-processor info and the way
    it does that is to change the affinity of the thread in a loop capturing
    along the way. Seems reasonable to me.

    >A thread could be simply a default priority thread (or several such threads)
    >running a multi-hour calculation in user mode, and that could cause all
    >system worker threads to get stuck? And the system gets unresponsive then?
    >Because the worker threads are also used by HID drivers. WTF?

    16 is a real time thread priority and, as such, it can be (ab)used to bring
    the entire system to its knees. This is why it requires that your token have
    the SeIncreaseBasePriorityPrivilege enabled, which by default is only
    available to administrators.

    -scott

    --
    Scott Noone
    Consulting Associate and Chief System Problem Analyst
    OSR Open Systems Resources, Inc.
    http://www.osronline.com
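
Editor's sketch: the interaction discussed in the thread can be illustrated with a toy model. It is not how the Windows scheduler is implemented. It only assumes what the thread states: the worker changes its affinity to one processor at a time, and a strictly higher priority thread that never waits keeps the worker from running on the processor it occupies. The function name `walk_processors` and the `busy` mapping are hypothetical, introduced for illustration only.

```python
def walk_processors(worker_priority, busy, cpu_count):
    """Toy model of a thread visiting each processor in turn.

    worker_priority: priority of the visiting thread (higher wins).
    busy: hypothetical mapping {cpu: priority} of CPU-bound threads
          that never wait, keyed by the processor they occupy.
    Returns (visited, stuck_on): the processors the worker managed to
    run on, and the processor it is left waiting for (None if it
    finished the walk).
    """
    visited = []
    for cpu in range(cpu_count):
        occupant = busy.get(cpu)
        # Assumption: only a strictly higher priority occupant blocks
        # the worker; a lower one is preempted, an equal one is shared.
        if occupant is not None and occupant > worker_priority:
            return visited, cpu
        visited.append(cpu)
    return visited, None


if __name__ == "__main__":
    # Worker at priority 15, runaway thread at priority 16 on CPU 5.
    print(walk_processors(15, {5: 16}, 24))
    # Same runaway thread at priority 8: the worker completes the walk.
    print(walk_processors(15, {5: 8}, 24))
```

In this model every worker that starts the same walk stops at the same processor, which is consistent with all four stacks above sitting in nt!KeSetSystemGroupAffinityThread with an identical wait time.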

