Windows System Software -- Consulting, Training, Development -- Unique Expertise, Guaranteed Results

Home NTDEV

More Info on Driver Writing and Debugging


The free OSR Learning Library has more than 50 articles on a wide variety of topics about writing and debugging device drivers and Minifilters. From introductory level to advanced. All the articles have been recently reviewed and updated, and are written using the clear and definitive style you've come to expect from OSR over the years.


Check out The OSR Learning Library at: https://www.osr.com/osr-learning-library/


Before Posting...

Please check out the Community Guidelines in the Announcements and Administration Category.

Measuring the DPC and SINGLE_DPC_TIMEOUT_EXCEEDED

Neo-2Neo-2 Member Posts: 32

We have encountered an ASSERTION,
.
Assertion: *** DPC execution time exceeds system limit
This is NOT a break in update time
This is a BUG in a DPC routine
Perform a stack trace to find the culprit
The time out limit will be doubled on continuation
Use gh to continue!!
.
DPC_TIMEOUT_TYPE: SINGLE_DPC_TIMEOUT_EXCEEDED
DPC_RUNTIME: 501
DPC_TIME_LIMIT: 500
FAULTING_IP:
nt!KeAccumulateTicks+18756a
fffff805`7a8b862a cd2c int 2Ch
ERROR_CODE: (NTSTATUS) 0xc0000420 - An assertion failure has occurred.
EXCEPTION_CODE: (NTSTATUS) 0xc0000420 - An assertion failure has occurred.
EXCEPTION_CODE_STR: c0000420

ffffc601e2aa1bb0 fffff80112715a4f : 0001311fcccea0dd ffffc601e2a85180 0000000000000286 000000000055d1bd : nt!KeAccumulateTicks+0x18756a
ffffc601e2aa1c10 fffff8011260847c : 0000000000000000 ffff9a0313ab8b00 ffff838137e5a680 ffff9a0313ab8bb0 : nt!KeClockInterruptNotify+0xcf
ffffc601e2aa1f30 fffff80112780a15 : ffff9a0313ab8b00 0000000000000000 0000000000000001 ffffc601e2a85180 : hal!HalpTimerClockIpiRoutine+0x1c
ffffc601e2aa1f60 fffff8011284f92a : ffff838137e5a680 ffff9a0313ab8b00 ffff9a070ba97000 ffff9a0313ab8b00 : nt!KiCallInterruptServiceRoutine+0xa5
ffffc601e2aa1fb0 fffff8011284fe77 : 0000000004d11000 0000000000000000 0000000000000002 0000000000000000 : nt!KiInterruptSubDispatchNoLockNoEtw+0xfa
ffff838137e5a600 fffff8011271edd1 : 0000000000000000 0000000000010008 0000000000000460 ffff9a070737a000 : nt!KiInterruptDispatchNoLockNoEtw+0x37
ffff838137e5a790 fffff801129b942d : ffff9a0313aa0500 fffff801129b7eef 0000000000000460 ffff9a070ba96ba0 : nt!KiAcquireKobjectLockSafe+0x31
ffff838137e5a7c0 fffff801128a9d57 : 0000000000000460 0000000000000000 0000000000000460 0000000000000000 : nt!ExpCheckForWorker+0xf9
ffff838137e5a820 fffff801129baf5c : ffff9a031340b000 0000000000000000 000000005843424e fffff8011272b023 : nt!ExpFreePoolChecks+0x170c1b
ffff838137e5a850 fffff801129eb9c2 : ffff9a031340b000 ffff9a03c76f0b70 0000000000000000 ffff800000000000 : nt!ExpFreeHeapSpecialPool+0x54
ffff838137e5a8a0 fffff80f123a8bf1 : ffff9a070ba96ba0 ffff9a070ba96ba0 000044da05c7ef00 ffff9a03192f3dc8 : nt!ExFreePoolWithTag+0x3992
ffff838137e5aa50 fffff80112f447d7 : ffff9a070ba96ba0 ffff9a03c76f0b70 ffff9a070ba96ba0 fffff80f127b7665 : VerifierExt!ExFreePoolWithTag_wrapper+0x11
ffff838137e5aa80 fffff80112f4474c : ffff9a031fdc8580 0000000000000000 0000000000000000 ffff9a03732e3930 : nt!VerifierExFreePoolWithTag+0x57
ffff838137e5aab0 fffff8011042bbfa : 0000000000000000 ffff9a070ba96ba0 ffff9a031fdc8580 ffff9a031bba3000 : nt!VerifierExFreePool+0x1c
.
Q1. It is not a Bug Check 0x133 DPC_WATCHDOG_VIOLATION but an ASSERTION. Not sure why bug check did not happen ?
.
Started debugging and want to measure the time taken in each DPC (Average, Min and the Max values)
So, added a KeQueryPerformanceCounter() at the beginning and at the end of the DPCHandler took the difference and divided by PerformanceFrequency to obtain the time taken.
The min value was around 20us but the max value obtained was 2Secs, and there are few completions finishing in millisecs.
Later, realized that this may not be the correct way to measure DPC and it might include the time the DPC gets interrupted by the ISR.
.
Found the following ways to measure the DPC,
a) https://docs.microsoft.com/en-us/windows-hardware/drivers/devtest/example-15--measuring-dpc-isr-time
b) xperf from Windows SDK(Windows Performance Toolkit)
.
From the ETL log the following is observed, 649.477935200(Fragment Enter), 650.533282500(Fragment Exit), 1,057.932500(Exclusive Duration)
It looks like the xperf/tracelog both using a similar logic of have a KeQueryPerformanceCounter() at the entry and exit and measuring the difference.
Also, to note that in this run there is no ASSERTION or Bug Check, which means the DPC/ISR did not actually took 1Sec or exceed the limit.
It appears to be that the DPC got interrupted and got the control back after 1 sec.
In other runs, the max time went up to 2 and 3secs.
.
Q2. Is the understanding of taking more time is correct i.e., is it becausAe the DPC is interrupted and then resumed back after the elapsed time and hence showing more time in the tools ?
.
The original issue is longer time taken by the DPC, which is under investigation and in process to understand where the driver is spending time.
Q3. If the DPC is waiting for the spinlock (KeAcquireInStackQueuedSpinLockAtDpcLevel), Will this waiting time considered for the Watchdog ticks ?
.

Comments

  • Peter_Viscarola_(OSR)Peter_Viscarola_(OSR) Administrator Posts: 8,485

    Q1. It is not a Bug Check 0x133 DPC_WATCHDOG_VIOLATION but an ASSERTION. Not sure why bug check did not happen ?

    Because the debugger is attached, would be my guess.

    Q2. Is the understanding of taking more time is correct i.e., is it becausAe the DPC is interrupted and then resumed back after the elapsed time and hence showing more time in the tools ?

    I'm not sure I fully understand your scenario. But, yes... if your DPC is interrupted then (as far as I understand it) the time taken in that ISR will be counted against your DPC for the purposes of the Watchdog timer.

    The min value was around 20us but the max value obtained was 2Secs,

    Crazy, huh? Yeah... I bet if you computed it, you'd find the AVERAGE latency to be pretty short as well. But 2 Secs is unreasonably long... something bad is definitely happening.

    What kind of device is this, by the way? A GPIO device?? Anything unusual about your device, your ISR, or your DPC that you can share???

    Q3. If the DPC is waiting for the spinlock (KeAcquireInStackQueuedSpinLockAtDpcLevel), Will this waiting time considered for the Watchdog ticks ?

    Yes. Definitely.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • Neo-2Neo-2 Member Posts: 32

    @Peter_Viscarola_(OSR) said:

    Q1. It is not a Bug Check 0x133 DPC_WATCHDOG_VIOLATION but an ASSERTION. Not sure why bug check did not happen ?

    Because the debugger is attached, would be my guess.

    Q2. Is the understanding of taking more time is correct i.e., is it becausAe the DPC is interrupted and then resumed back after the elapsed time and hence showing more time in the tools ?

    I'm not sure I fully understand your scenario. But, yes... if your DPC is interrupted then (as far as I understand it) the time taken in that ISR will be counted against your DPC for the purposes of the Watchdog timer.

    This is pretty old regular NIC driver and HW. The HW interrupts and in the ISR the corresponding DPC is queued and will handle accordingly. The DPC runs a loop and consume messages. The max messages the DPC can process is set to 256 before yielding and from the debug logs it is observed that for most of the time when the 256 message limit is hit the time taken is around 2msecs(max) and for this unusual long time in DPC the number of messages processed ranges from 4-24.
    I have just started using xperf and with the limited knowledge we could see at the time of issue, only the DPC is running and there is no call to the ISR. Also, checked in the CPU timeline there is no other driver being executed which appears to be like all the 2secs is being used by the DPC itself, strange no ASSERT/Bug Check. But, the observations might we very well wrong(will need to learn how to study xperf etl in detail and efficiently)

    The min value was around 20us but the max value obtained was 2Secs,

    Crazy, huh? Yeah... I bet if you computed it, you'd find the AVERAGE latency to be pretty short as well. But 2 Secs is unreasonably long... something bad is definitely happening.

    Very true from the xperf etl log, the driver's average is 245us and the rogue DPC handler is 86us and among other DPC handlers the max average is 1msec.

    What kind of device is this, by the way? A GPIO device?? Anything unusual about your device, your ISR, or your DPC that you can share???

    This is a regular PCI-NIC device nothing unusual. Only the test case is newer i.e., the customer is using his application with Bi-Directional traffic. The assert if hits will be after 8-10hrs of running the traffic and it is very inconsistent. The test ran multiple 48+hrs without any ASSERT.

    Q3. If the DPC is waiting for the spinlock (KeAcquireInStackQueuedSpinLockAtDpcLevel), Will this waiting time considered for the Watchdog ticks ?

    Yes. Definitely.

    Hmm, then probably the DPC is probably waiting long for the spinlock to be acquired.

    Peter

  • Peter_Viscarola_(OSR)Peter_Viscarola_(OSR) Administrator Posts: 8,485

    The assert if hits will be after 8-10hrs of running the traffic and it is very inconsistent. The test ran multiple 48+hrs without any ASSERT.

    What you're seeing, in my experience, is CLASSIC Windows behavior. Average ISR to DPC latency, and average DPC runtime, can be low... but every once in a while... things can get out of hand. We've had clients coming to us complaining of strange problems like this that happen "once a day, overnight, sometimes" -- It's one of the least attractive things of kernel-mode programming on Windows.

    All I can advise is that (1) if this is a closed system that you're building, identify the offending component(s) and get them off the system, or (2) if your hardware and driver can be installed in a wide-variety of systems, you're just going to have to "suck it up" and provide mitigations... like queuing a work item. I know, that's not super helpful. Sorry.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • Neo-2Neo-2 Member Posts: 32

    @Peter_Viscarola_(OSR) said:

    The assert if hits will be after 8-10hrs of running the traffic and it is very inconsistent. The test ran multiple 48+hrs without any ASSERT.

    What you're seeing, in my experience, is CLASSIC Windows behavior. Average ISR to DPC latency, and average DPC runtime, can be low... but every once in a while... things can get out of hand. We've had clients coming to us complaining of strange problems like this that happen "once a day, overnight, sometimes" -- It's one of the least attractive things of kernel-mode programming on Windows.

    All I can advise is that (1) if this is a closed system that you're building, identify the offending component(s) and get them off the system, or (2) if your hardware and driver can be installed in a wide-variety of systems, you're just going to have to "suck it up" and provide mitigations... like queuing a work item. I know, that's not super helpful. Sorry.

    No, it is not a closed system and it open to work on wide-variety of systems. Sure, We would continue to work on this issue.
    Thank you very much for your inputs on the questions asked in earlier thread, that was helpful in understanding .

    Peter

Sign In or Register to comment.

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Upcoming OSR Seminars
OSR has suspended in-person seminars due to the Covid-19 outbreak. But, don't miss your training! Attend via the internet instead!
Developing Minifilters 24 May 2021 Live, Online
Writing WDF Drivers 14 June 2021 Live, Online
Internals & Software Drivers 27 September 2021 Live, Online
Kernel Debugging 15 November 2021 Live, Online