Help with analyzing this dump (Sluggish System)

Hi,

I am trying to analyze an issue with the driver for our PCI card. The hardware
is a loopback device and is a bus master DMA. An application reads and writes
data to measure its throughput.

On a Free Build of Windows XP SP3 it all works fine, but on a Checked Build of
same OS the system becomes very sluggish. It looks as if it has frozen but is
not, as mouse pointer does move after few seconds and the debug prints in our
driver (in ISR and DPC) are continuously being printed.

I set out to debug this with “!locks” but got some output which is confusing,
and hence need some help analyzing it.

Here I go.
First I check the locks :


0: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held
locks…

Resource @ 0x8a480768 Exclusively owned
Contention Count = 4865661
NumberOfSharedWaiters = 1
NumberOfExclusiveWaiters = 13
Threads: 8a3e9450-01<*> 83cd0950-01
Threads Waiting On Exclusive Access:
8a3ecd58 83d327e8 83d32da8 8ac57020
83c9f2e8 83c44a18 8a446da8 83d45b30
8a319450 83c7cda8 83c9cda8 8a4dbda8
870e5020

KD: Scanning for held
locks…

Resource @ 0x83dd7798 Shared 1 owning threads
Threads: 8af95023-01<*> *** Actual Thread 8af95020
KD: Scanning for held locks.

Resource @ 0x83c27b30 Shared 1 owning threads
Threads: 8af95023-01<*> *** Actual Thread 8af95020
4723 total locks, 3 locks currently held

It tells that the thread 8a3e9450 has currently acquired the lock exclusively,
and one thread is waiting for shared access while 13 other threads are waiting
for an exclusive access.

The stack of the thread which has acquired it is :

0: kd> !thread 8a3e9450
THREAD 8a3e9450 Cid 0418.055c Teb: 7ffa5000 Win32Thread: e2849778 STANDBY
Not impersonating
DeviceMap e1009218
Owning Process 0 Image:
Attached Process 8a339a30 Image: svchost.exe
Wait Start TickCount 499712 Ticks: 10226 (0:00:02:39.781)
Context Switch Count 769 LargeStack
UserTime 00:00:00.000
KernelTime 00:00:00.000
Win32 Start Address 0x7730b153
Start Address 0x7c810669
Stack Init ba0c8000 Current ba0c7bd0 Base ba0c8000 Limit ba0c5000 Call 0
Priority 14 BasePriority 8 PriorityDecrement 4 DecrementCount 16
ChildEBP RetAddr Args to Child
ba0c7be8 80502d50 8a3e94c0 8a3e9450 804faf40 nt!KiSwapContext+0x2f (FPO: [Uses
EBP] [0,0,4])
ba0c7bf4 804faf40 00000000 8a480768 00000000 nt!KiSwapThread+0x8a (FPO: [0,0,0])
ba0c7c1c 80534f23 00000000 00000000 00000000 nt!KeWaitForSingleObject+0x1c2
(FPO: [5,5,4])
ba0c7c54 80535420 000025ff e2849778 00000000 nt!ExpWaitForResource+0x2f (FPO:
[0,5,0])
ba0c7c74 bf800b5e 8a480768 00000001 bf802ff0
nt!ExAcquireResourceExclusiveLite+0x8e (FPO: [2,3,0])
ba0c7c80 bf802ff0 00000000 e2849778 00000000 win32k!EnterCrit+0x14 (FPO:
[0,0,0])
ba0c7ca8 bf801bc8 000025ff 00000000 00000001 win32k!xxxSleepThread+0x19a (FPO:
[3,5,4])
ba0c7cec bf81a0a7 ba0c7d18 000025ff 00000000
win32k!xxxRealInternalGetMessage+0x418 (FPO: [6,9,4])
ba0c7d4c 8054089c 0178ff98 00000000 00000000 win32k!NtUserGetMessage+0x27 (FPO:
[Non-Fpo])
ba0c7d4c 7c90e514 0178ff98 00000000 00000000 nt!KiFastCallEntry+0xfc (FPO: [0,0]
TrapFrame @ ba0c7d64)
WARNING: Frame IP not in any known module. Following frames may be wrong.
0178ff4c 00000000 00000000 00000000 00000000 0x7c90e514
-------

My first doubt here is that if this thread has granted access for this resource
then why does it still need to wait with KeWaitForSingleObject?
If it is waiting for some other resource, how can I find out what is that
resource?

Also from the stack it looks like the first parameter for KeWaitForSingleObject
(which should be the object on which it waits) is NULL!

Here are the details of the first resource:
-----
0: kd> dt _ERESOURCE -b 8a480768
nt!_ERESOURCE
+0x000 SystemResourcesList : _LIST_ENTRY [0x8ac4b2e8 - 0xba65a11c]
+0x000 Flink : 0x8ac4b2e8
+0x004 Blink : 0xba65a11c
+0x008 OwnerTable : 0x8a36f470
+0x00c ActiveCount : 1
+0x00e Flag : 0x80
+0x010 SharedWaiters : 0x8a3d11f8
+0x014 ExclusiveWaiters : 0x8ab9e738
+0x018 OwnerThreads :
[00] _OWNER_ENTRY
+0x000 OwnerThread : 0x8a3e9450
+0x004 OwnerCount : 1
+0x004 TableSize : 1
[01]
+0x000 OwnerThread : 0x83cd0950
+0x004 OwnerCount : 1
+0x004 TableSize : 1
+0x028 ContentionCount : 0x4a3e7d
+0x02c NumberOfSharedWaiters : 1
+0x02e NumberOfExclusiveWaiters : 0xd
+0x030 Address : (null)
+0x030 CreatorBackTraceIndex : 0
+0x034 SpinLock : 0
-----

Any help is greatly appreciated.
Thanks,

Suresh

When the debug prints are transmitted, the kernel is pretty much frozen. This is why the system is sluggish.

Thanks Alex, but the debug prints in our driver were added AFTER we noticed this sluggish system issue. In other words the issue is there also without those our debug prints.
As such we can see that without adding those debug prints if we put breakpoints in our code then those breakpoints get hit. This also rules out the deadlock.

The debug version of the OS is not a good platform for performance
analysis. I’m having trouble with the concept that there is a problem
when “on a Free Build of Windows XP SP3 it all works fine, but on a
Checked Build of same OS the system becomes very sluggish”. Are you
really going to waste resources on resolving this? Why?

Mark Roddy

On Thu, Jul 14, 2011 at 8:58 AM, wrote:
> Hi,
>
> I am trying to analyze an issue with the driver for our PCI card. The hardware
> is a loopback device and is a bus master DMA. An application reads and writes
> data to measure its throughput.
>
> On a Free Build of Windows XP SP3 it all works fine, but on a Checked Build of
> same OS the system becomes very sluggish. It looks as if it has frozen but is
> not, as mouse pointer does move after few seconds and the debug prints in our
> driver (in ISR and DPC) are continuously being printed.
>
> I set out to debug this with “!locks” but got some output which is confusing,
> and hence need some help analyzing it.
>
> Here I go.
> First I check the locks :
>
> -------
> 0: kd> !locks
> DUMP OF ALL RESOURCE OBJECTS
> KD: Scanning for held
> locks…
>
> Resource @ 0x8a480768 ? ?Exclusively owned
> ? ?Contention Count = 4865661
> ? ?NumberOfSharedWaiters = 1
> ? ?NumberOfExclusiveWaiters = 13
> ? ? Threads: 8a3e9450-01<> 83cd0950-01
> ? ? Threads Waiting On Exclusive Access:
> ? ? ? ? ? ? ?8a3ecd58 ? ? ? 83d327e8 ? ? ? 83d32da8 ? ? ? 8ac57020
> ? ? ? ? ? ? ?83c9f2e8 ? ? ? 83c44a18 ? ? ? 8a446da8 ? ? ? 83d45b30
> ? ? ? ? ? ? ?8a319450 ? ? ? 83c7cda8 ? ? ? 83c9cda8 ? ? ? 8a4dbda8
> ? ? ? ? ? ? ?870e5020
>
> KD: Scanning for held
> locks…
> …
>
> Resource @ 0x83dd7798 ? ?Shared 1 owning threads
> ? ? Threads: 8af95023-01<
> Actual Thread 8af95020
> KD: Scanning for held locks.
>
> Resource @ 0x83c27b30 ? ?Shared 1 owning threads
> ? ? Threads: 8af95023-01<
>
* Actual Thread 8af95020
> 4723 total locks, 3 locks currently held
> -------
>
> It tells that the thread 8a3e9450 has currently acquired the lock exclusively,
> and one thread is waiting for shared access while 13 other threads are waiting
> for an exclusive access.
>
> The stack of the thread which has acquired it is :
> -------
> 0: kd> !thread 8a3e9450
> THREAD 8a3e9450 ?Cid 0418.055c ?Teb: 7ffa5000 Win32Thread: e2849778 STANDBY
> Not impersonating
> DeviceMap ? ? ? ? ? ? ? ? e1009218
> Owning Process ? ? ? ? ? ?0 ? ? ? Image: ? ? ? ?
> Attached Process ? ? ? ? ?8a339a30 ? ? ? Image: ? ? ? ? svchost.exe
> Wait Start TickCount ? ? ?499712 ? ? ? ? Ticks: 10226 (0:00:02:39.781)
> Context Switch Count ? ? ?769 ? ? ? ? ? ? ? ? LargeStack
> UserTime ? ? ? ? ? ? ? ? ?00:00:00.000
> KernelTime ? ? ? ? ? ? ? ?00:00:00.000
> Win32 Start Address 0x7730b153
> Start Address 0x7c810669
> Stack Init ba0c8000 Current ba0c7bd0 Base ba0c8000 Limit ba0c5000 Call 0
> Priority 14 BasePriority 8 PriorityDecrement 4 DecrementCount 16
> ChildEBP RetAddr ?Args to Child
> ba0c7be8 80502d50 8a3e94c0 8a3e9450 804faf40 nt!KiSwapContext+0x2f (FPO: [Uses
> EBP] [0,0,4])
> ba0c7bf4 804faf40 00000000 8a480768 00000000 nt!KiSwapThread+0x8a (FPO: [0,0,0])
> ba0c7c1c 80534f23 00000000 00000000 00000000 nt!KeWaitForSingleObject+0x1c2
> (FPO: [5,5,4])
> ba0c7c54 80535420 000025ff e2849778 00000000 nt!ExpWaitForResource+0x2f (FPO:
> [0,5,0])
> ba0c7c74 bf800b5e 8a480768 00000001 bf802ff0
> nt!ExAcquireResourceExclusiveLite+0x8e (FPO: [2,3,0])
> ba0c7c80 bf802ff0 00000000 e2849778 00000000 win32k!EnterCrit+0x14 (FPO:
> [0,0,0])
> ba0c7ca8 bf801bc8 000025ff 00000000 00000001 win32k!xxxSleepThread+0x19a (FPO:
> [3,5,4])
> ba0c7cec bf81a0a7 ba0c7d18 000025ff 00000000
> win32k!xxxRealInternalGetMessage+0x418 (FPO: [6,9,4])
> ba0c7d4c 8054089c 0178ff98 00000000 00000000 win32k!NtUserGetMessage+0x27 (FPO:
> [Non-Fpo])
> ba0c7d4c 7c90e514 0178ff98 00000000 00000000 nt!KiFastCallEntry+0xfc (FPO: [0,0]
> TrapFrame @ ba0c7d64)
> WARNING: Frame IP not in any known module. Following frames may be wrong.
> 0178ff4c 00000000 00000000 00000000 00000000 0x7c90e514
> -------
>
> My first doubt here is that if this thread has granted access for this resource
> then why does it still need to wait with KeWaitForSingleObject?
> If it is waiting for some other resource, how can I find out what is that
> resource?
>
> Also from the stack it looks like the first parameter for KeWaitForSingleObject
> (which should be the object on which it waits) is NULL!
>
> Here are the details of the first resource:
> -----
> 0: kd> dt _ERESOURCE -b 8a480768
> nt!_ERESOURCE
> ? +0x000 SystemResourcesList : _LIST_ENTRY [0x8ac4b2e8 - 0xba65a11c]
> ? ? ?+0x000 Flink ? ? ? ? ? ?: 0x8ac4b2e8
> ? ? ?+0x004 Blink ? ? ? ? ? ?: 0xba65a11c
> ? +0x008 OwnerTable ? ? ? : 0x8a36f470
> ? +0x00c ActiveCount ? ? ?: 1
> ? +0x00e Flag ? ? ? ? ? ? : 0x80
> ? +0x010 SharedWaiters ? ?: 0x8a3d11f8
> ? +0x014 ExclusiveWaiters : 0x8ab9e738
> ? +0x018 OwnerThreads ? ? :
> ? ?[00] _OWNER_ENTRY
> ? ? ?+0x000 OwnerThread ? ? ?: 0x8a3e9450
> ? ? ?+0x004 OwnerCount ? ? ? : 1
> ? ? ?+0x004 TableSize ? ? ? ?: 1
> ? ?[01]
> ? ? ?+0x000 OwnerThread ? ? ?: 0x83cd0950
> ? ? ?+0x004 OwnerCount ? ? ? : 1
> ? ? ?+0x004 TableSize ? ? ? ?: 1
> ? +0x028 ContentionCount ?: 0x4a3e7d
> ? +0x02c NumberOfSharedWaiters : 1
> ? +0x02e NumberOfExclusiveWaiters : 0xd
> ? +0x030 Address ? ? ? ? ?: (null)
> ? +0x030 CreatorBackTraceIndex : 0
> ? +0x034 SpinLock ? ? ? ? : 0
> -----
>
> Any help is greatly appreciated.
> Thanks,
>
> Suresh
>
>
> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer
>

Well, this is a third-party project. So we are going to hand over the complete solution (hardware and driver with source code) to our client.
The client will definately check it out on a checked build at some point of time and will run into the issue.

Apart from this, it is quite an interesting issue. The interrupt rate and driver code is exactly the same as in free build. Also there should be a proper way to debug such scenario, enabling us to pinpoint whatever be the reason.

Btw, I forgot to add that the issue is somehow related to interrupts. When we switch from interrupt mode to polling mode the issue goes away.