Windows System Software -- Consulting, Training, Development -- Unique Expertise, Guaranteed Results

Before Posting...
Please check out the Community Guidelines in the Announcements and Administration Category.

High CPU Utilization in WaitForSingleObject

ravigargravigarg Member Posts: 3
edited July 29 in NTDEV

I have a driver and a userland service which exchange packets. I am getting unexpected (high) CPU utilization (~7% to 10% single-core) in WaitForSingleObject. Profiler shows lots of spin time for WaitForSingleObject. Can somebody please help me on ways to performance optimize the following

    // **UserLand Code**
    void OpenDevice()
    {
        m_DeviceHandle = CreateFile( DEVICE_NAME, MAXIMUM_ALLOWED, 0, NULL, OPEN_EXISTING, FILE_FLAG_OVERLAPPED | FILE_ATTRIBUTE_NORMAL, INVALID_HANDLE_VALUE );
    }
    void workerThread()
    { 
        HANDLE hIoEvent = CreateEvent(NULL, FALSE, TRUE, ""); 
        OVERLAPPED ol;
        memset(&ol, 0, sizeof(OVERLAPPED));
        ol.hEvent = hIoEvent;
        do { 
            DeviceIoControl(m_DeviceHandle, CONTROL_CODE( MY_ASYNC_EVENT, METHOD_OUT_DIRECT), NULL, 0, outBuf, sizeof(outBuf), &returnedCount, &ol);            
            ret = WaitForSingleObjects(hIoEvent, INFINITE); **//High CPU Utilization (Spin) here**
            switch(ret)
            {
            case WAIT_OBJECT_0:
                // data avialble
                GetOverlappedResult(m_DeviceHandle, &ol, &bytesRead, FALSE); 
                break;
            }
        } while(true);
        CloseHandle(hIoEvent);
    } 

    // **Driver code** 
    static
    NTSTATUS IoctlDeviceIoControl(PIRP  pIrp,   PIO_STACK_LOCATION   pIrpStack, PULONG  pOutBufLen)
    {
        switch (ioFunctionCode) {
            case MY_ASYNC_EVENT: 
                IoMarkIrpPending(pIrp);        
                IoSetCancelRoutine(pIrp, EventCancelIRP);                
                // Get the packet from Queue and update pIrp                
                IoSetCancelRoutine(pIrp, NULL);
                IoCompleteRequest(pIrp, IO_NO_INCREMENT);
            break;
        }
    }

I have also tried to use IoCompletionPort, the time spent in WaitForSingleObject now goes in GetQueuedCompletionStatus

Comments

  • Peter_Viscarola_(OSR)Peter_Viscarola_(OSR) Administrator Posts: 7,515

    You’re sure you’re seeing this time in the WaitForSingleObject code and not something you’re doing as a result of waking? Seems... unlikely.

    I’m not sure it matter but the driver code snip you provided doesn’t make any sense. At all. What does “get the packet from the queue” mean? Why are you synchronously completing an IRP that you marked pending? Why are you writing a WDM driver?

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • Tim_RobertsTim_Roberts Member - All Emails Posts: 13,166

    I don't doubt that you're spending ELAPSED time in WaitForSingleObject and GetQueuedCompletionStatus, but not CPU time. What profiler are you using?

    Tim Roberts, [email protected]
    Providenza & Boekelheide, Inc.

  • ravigargravigarg Member Posts: 3


    I am using Intel VTune Amplifier.
    @Peter_Viscarola_(OSR) I skipped a lot of driver code in the above snippet to have my query concise (assuming that is not causing the CPU issue)
    "get the packet from the queue" we have packet-list which has packet captured at this driver level and then sent to userland. IRP is completed asynchronously (I can put more driver details if that helps to solve the issue)

  • Peter_Viscarola_(OSR)Peter_Viscarola_(OSR) Administrator Posts: 7,515

    That snip from VTune shows the time being spent in CreateFile, not wait.

    I’m very confused.

    Peter

    Peter Viscarola
    OSR
    @OSRDrivers

  • ravigargravigarg Member Posts: 3
    edited July 30

    @Peter_Viscarola_(OSR) Spin time of WaitForSingleObject is 3.264s (which is high considering the test sample).

    From VTune Documentation: Spin Time
    Spin time is Wait Time during which the CPU is busy. This often occurs when a synchronization API causes the CPU to poll while the software thread is waiting. Some Spin Time may be preferable to the alternative of increased thread context switches. Too much Spin Time, however, can reflect lost opportunity for productive work.

  • anton_bassovanton_bassov Member Posts: 5,065

    Spin time is Wait Time during which the CPU is busy. This often occurs when a synchronization API causes the CPU
    to poll while the software thread is waiting. Some Spin Time may be preferable to the alternative of increased thread context switches.
    Too much Spin Time, however, can reflect lost opportunity for productive work.

    Well, I don't really know what they mean by " synchronization API causes the CPU to poll", but the above seems to be of zero relevance to KeWaitForSingleObject(). At the kernel level, it may apply to Solaris/Illumos adaptive mutex acquisition that involves waiter spinning as long as the mutex owner is in a running state. In the Windows kernel world it may, probably, (I am not so sure about it) apply to FAST_MUTEX acquisition, which is conceptually similar to the userland critical section locking that involves a limited number of spins before calling
    WaitFor SingleObject(). However, I don't think KeWaitForSingleObject() may work this way....

    Anton Bassov

Sign In or Register to comment.

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Upcoming OSR Seminars
Writing WDF Drivers 21 Oct 2019 OSR Seminar Space & ONLINE
Internals & Software Drivers 18 Nov 2019 Dulles, VA
Kernel Debugging 30 Mar 2020 OSR Seminar Space
Developing Minifilters 27 Apr 2020 OSR Seminar Space & ONLINE