Hi
The product that I work on has a minifilter driver and a couple of plugin drivers. I have come across a case where, when the customer runs a few of their batch processes, the system becomes very sluggish over a period of time.
The forced crash dump was shared with Microsoft, and they said the cause is a large number of zombie processes, amounting to hundreds, and that it is caused by one of our drivers. I tried to list the processes with their respective threads using !process 0 2 (details removed for brevity):
…
PROCESS 856ba0a0 SessionId: 0 Cid: 070c Peb: 7ffdf000 ParentCid: 17e0
DirBase: 7de5d020 ObjectTable: 00000000 HandleCount: 0.
Image: ABC.Batch.1100.exe
No active threads
PROCESS 84f0ad90 SessionId: 0 Cid: 11d0 Peb: 7ffdf000 ParentCid: 11a4
DirBase: 7f5d3ee0 ObjectTable: 00000000 HandleCount: 0.
Image: ABC.Batch.1200.exe
No active threads
…
I cannot find the terminated thread of the ABC.Batch.1200.exe process, but Microsoft reports that the terminated thread is:
THREAD 856b6740 Cid 070c.0ed4 Teb: 00000000 Win32Thread: 00000000 TERMINATED
To ascertain the details of the thread, I ran the following command:
0: kd> !thread 856b6740
THREAD 856b6740 Cid 070c.0ed4 Teb: 00000000 Win32Thread: 00000000 TERMINATED
Not impersonating
DeviceMap 9afc5538
Owning Process 856ba0a0 Image: ABC.Batch.1200.exe
Attached Process N/A Image: N/A
Wait Start TickCount 11215 Ticks: 11496104 (2:01:53:46.625)
Context Switch Count 683
UserTime 00:00:00.343
KernelTime 00:00:00.218
Win32 Start Address 0x009750de
Stack Init 0 Current a21a4c08 Base a21a5000 Limit a21a2000 Call 0
Priority 8 BasePriority 6 PriorityDecrement 0 IoPriority 1 PagePriority 3
While trying to find more details
0: kd> !object 0x856b6740
Object: 856b6740 Type: (838aa040) Thread
ObjectHeader: 856b6728 (old version)
HandleCount: 1 PointerCount: 1
and with obtrace
kd> !obtrace 0x856b6740
Object: 856b6740
Image: cmd.exe
Sequence (+/-) Tag Stack
…
5facf +1 nt!ObReferenceObjectSafe+40 <----MS states this as cause of the leak
nt!PsLookupProcessThreadByCid+17
nt!PsOpenThread+17f
nt!NtOpenThread+2c
nt!KiSystemServicePostCall+0
nt!ZwOpenThread+11
nt!IopProcessWorkItem+23
nt!ExpWorkerThread+fd
nt!PspSystemThreadStartup+9d
…
Nowhere do I see any mention of our product’s driver.
How can I get a list of terminated threads? Kindly let me know if anyone has information on this.
Thanks
Debbrat
Processes awaiting final thread termination should show up in the global process list (“!process 0”).
The stack backtrace they are claiming is the issue would be for a worker routine. Thus, you should review your work routines: look for anywhere you call IoQueueWorkItem and see if you’re calling ZwOpenThread from any of the work routines used by those queued work items.
While your code isn’t on the stack, there are plausible reasons that this can happen. IopProcessWorkItem itself doesn’t call ZwOpenThread (run “uf nt!IopProcessWorkItem” and analyze the assembly code: there are no calls to ZwOpenThread). Thus, the only way this sequence can occur is from a work routine.
Tony
OSR
Hi Tony,
Thanks for responding.
I ran “!process 0”, but it did not give me any terminated thread info. Not a single terminated thread was displayed.
We are calling IoQueueWorkItem, and in the worker thread the driver finally calls ZwOpenThread. But this only happens with a few of the customer’s batch processes; the zombie process issue does not occur with other processes.
I even tried “!stacks”; it displays the process name but no stack. I reckon that when a thread is terminated its stack is destroyed, but I am not sure about that.
Thanks
Debbrat
The terminated threads are no longer executing, so they wouldn’t have any
call stacks associated with them. Even if you could see a call stack it
wouldn’t matter much as the problem isn’t the fact that the threads have
terminated, it’s that someone is not releasing references to them. This is
typically what we mean by a thread or process becoming a “zombie”: all
execution is complete but the OS can’t delete the thread/process structures
because someone has a reference.
In the shown thread the handle count is non-zero (HandleCount: 1), so someone still has an
active handle to this thread. If your driver calls ZwOpenThread I would say
that’s a pretty big indication that you have a handle leak somewhere. Have
you checked ALL code paths to make sure that you eventually ZwClose every
handle returned by ZwOpenThread? Including error paths?
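As a rough illustration of what "every path closes the handle" can look like, here is a minimal user-mode sketch of the single-exit cleanup pattern. Everything in it is a stand-in invented for illustration: sim_open_thread and sim_close play the roles of ZwOpenThread and ZwClose, sim_query_thread is a hypothetical step that can fail after the open succeeds, and the counter only exists so the leak would be visible. It is not your driver's code and not the real kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins so the pattern compiles and runs in user mode.
   These are NOT the real ZwOpenThread / ZwClose. */
typedef int sim_status;            /* 0 = success, negative = failure */

static int g_open_handles = 0;     /* handles opened but not yet closed */

static sim_status sim_open_thread(int thread_id, int *handle_out)
{
    if (thread_id <= 0) {
        return -1;                 /* open failed: no handle was produced */
    }
    *handle_out = thread_id;       /* pretend the id doubles as the handle */
    g_open_handles++;
    return 0;
}

static void sim_close(int handle)
{
    assert(handle > 0 && g_open_handles > 0);
    g_open_handles--;
}

/* Hypothetical step that may fail while the handle is already open. */
static sim_status sim_query_thread(int handle, bool force_failure)
{
    (void)handle;
    return force_failure ? -2 : 0;
}

static int open_handle_count(void)
{
    return g_open_handles;
}

/* Work routine body: every exit goes through the cleanup label, so a
   handle that was opened is closed on success and on failure alike. */
static sim_status process_thread_work(int thread_id, bool force_failure)
{
    int handle = 0;                /* 0 means "no handle held" */
    sim_status status;

    status = sim_open_thread(thread_id, &handle);
    if (status != 0) {
        goto cleanup;
    }

    status = sim_query_thread(handle, force_failure);
    if (status != 0) {
        goto cleanup;              /* the error path still releases the handle */
    }

    /* ... further work with the handle would go here ... */

cleanup:
    if (handle != 0) {
        sim_close(handle);
    }
    return status;
}
```

In a real work routine the same shape would apply with a HANDLE initialized to NULL, NT_SUCCESS checks on each call, and ZwClose under the cleanup label. The thing to look for in review is any early return between the open and the close that bypasses that label.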
To answer your original question: there is no list of terminated threads.
When the execution of the thread is complete, the thread performs teardown
processing and removes itself from the parent process thread list. The
thread then transitions to the Terminated state and remains in memory until
the reference count drops to zero.
If you want to find them yourself you could always search non-paged pool for
the tag “Thre”, which is used for thread objects:
!poolfind Thre
You would then need to go through each result to determine if it is a zombie
or not.
-scott
OSR
@OSRDrivers
Hi Scott,
Thanks for your reply.
We do call ZwClose, but I will check again.
I will also check the threads.
Thanks
Debbrat
It sounds like there is some unusual path out of the code that doesn’t clean up the handle, which is why you’re only seeing it in a specific usage scenario (your customer site). So my advice would be to code review the error paths…
Tony
OSR
Hi Tony and Scott,
Thanks for your inputs. I will check all the error paths pertaining to ZwOpenThread.
Debbrat