Thread boost and dynamic priority

I am working on an audio driver for a USB device. The kernel driver only talks to a user space app that I am also developing. The user space app plays back the audio that it receives from the driver. The driver is meant to replace an already existing driver that I don’t have the source code for, but I can run it and compare the result to what I am working on.

I have some shared memory between the driver and the user mode app that I write to in the driver, and then I use KeSetEvent to set an event that was passed from the user mode app. I wait for this event in the user mode app, and when it is signaled I read from the buffer.

The thread that I am using to wait for the event in the user mode app calls SetThreadPriority(GetCurrentThread(), THREAD_BASE_PRIORITY_LOWRT).
I can see that this is set with GetThreadPriority(GetCurrentThread()).

I am also changing the priority of the process with SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS).

I have used GetProcessPriorityBoost and GetThreadPriorityBoost, and they both tell me that priority boosting is not disabled for either the thread or the process.

If I put some logging in the callback that I get from the library that comes with the driver that I am trying to replace, then I can see that they have the same values for the above settings as I have in my thread / process.

I signal the event in the driver with KeSetEvent(kernelEvent_, IO_SOUND_INCREMENT, FALSE).

If I use Process Explorer to inspect the thread, I can see that “Base Pri” and “Dyn Pri” are both set to 15 for my thread.

If I do the same with the library that comes with the driver that I am trying to replace, then I can see that “Base Pri” is set to 15, but “Dyn Pri” is set to 31.

How can I make the dynamic priority of my thread go to 31 also?

Why do you think this is necessary? What problem are you trying to solve? Manually adjusting the priority rarely gives a satisfactory result.

One issue with the driver I am writing is that when I stream audio from the device and I start Edge, then I sometimes hear cracks and pops. This is not the case with the driver that I am trying to replace.
I have captured USB traffic with a Beagle 480; I am not losing any packets, and the timing of the packets is the same when I compare the two drivers. If I write the audio samples to a WAV file in the user mode app, the stream is intact. Since I can see that the dynamic priority is different, and since that is related to when the thread can deliver the samples, I wanted to have them be the same.

changing thread priority may mask your problems, but this is a terrible design

ok, thanks for the feedback!

Although you don’t think that it is a good idea, I am still interested in knowing how to achieve base priority 15 and dynamic priority 31. Maybe it does not solve anything, but since I’ve been trying to get to those numbers without luck it would be nice to know how to do it.
I am of course also interested in other ways to improve the situation.

Let me expand slightly - using events to communicate between UM and KM is a poor choice. Events may sound like a good idea, but in practice the overhead is high and, as you observe, the latency is not deterministic. High thread priority will reduce the chances of a delay, but on a loaded system even high priority threads won’t be scheduled for a while - potentially an unbounded time, though in practice a few thread quanta should be the maximum.

Either IRPs or shared memory (allocated with a long lived IRP) have better performance characteristics and are easier to work with

To answer your specific question, use SetPriorityClass with REALTIME_PRIORITY_CLASS and then SetThreadPriority with THREAD_PRIORITY_TIME_CRITICAL. Priority boosting is not relevant for you - that’s something that processes with mouse and keyboard focus get so the user can interact with them more swiftly. This is largely obsolete now that even low end systems have many cores

As was mentioned your process will need REALTIME_PRIORITY_CLASS to reach priority 31 (or anything above 15).

With a priority of 31, you can also receive “non-preemptive scheduling” from the dispatcher if you also create a job object for the process and set JobObjectBasicLimitInformation with a SchedulingClass of 9. Note that scheduling class is not the same as priority.

Your process may also want to avoid hard pagefaults by setting a minimum working set for the process and calling VirtualLock on any virtual memory that could be important for processing the audio.

Also, remember that even at the highest possible priority, you are not guaranteed not to be preempted. If another program does the same thing, you will compete equally with it to be scheduled, and hardware interrupts and code running at high IRQL always take precedence. If the consequence is only poor audio on a maxed-out system, then that trade-off is probably okay.

@MBond2
I am using a long lived IRP to map memory between the user mode app and the kernel driver. To notify the user mode app when that memory has been written to, I have tried a call to DeviceIoControl passing an OVERLAPPED, and then completing the WDFREQUEST when the memory has been written to. After that I need to make a call to DeviceIoControl again so that there is another WDFREQUEST to complete the next time the memory has been written to. I guess this is also some overhead…
As previously described, I also tried using events, and my experience is that events work better in my scenario.

I am setting the priority for the process and the thread like this:

    LOG_INFO("GetPriorityClass before: {}", GetPriorityClass(GetCurrentProcess()));
    if (!SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS)) {
      LOG_INFO("Setting priority class failed");
    }
    LOG_INFO("GetPriorityClass after: {}", GetPriorityClass(GetCurrentProcess()));

    LOG_INFO("Thread prio before: {}", GetThreadPriority(GetCurrentThread()));
    if (!SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL)) {
      LOG_INFO("Setting thread priority failed");
    }
    LOG_INFO("Thread prio after: {}", GetThreadPriority(GetCurrentThread()));

    if (BOOL isThreadPriorityBoostDisabled = false;
        GetThreadPriorityBoost(GetCurrentThread(), &isThreadPriorityBoostDisabled)) {
      LOG_INFO("thread priority boost disabled: {}", isThreadPriorityBoostDisabled);
    }

    if (BOOL isProcessPriorityBoostDisabled = false;
        GetProcessPriorityBoost(GetCurrentProcess(), &isProcessPriorityBoostDisabled)) {
      LOG_INFO("is process priority boost disabled: {}", isProcessPriorityBoostDisabled);
    }

LOG_INFO is just a macro that is using libfmt.

The above prints:

GetPriorityClass before: 32
GetPriorityClass after: 128
Thread prio before: 0
Thread prio after: 15
thread priority boost disabled: 0
is process priority boost disabled: 0

I was expecting GetPriorityClass after: 256. Any ideas why that is not happening? If I use Process Explorer I can still see that the thread has Base Pri 15 and Dyn Pri 15.

If I only use:
    const auto taskHandle = AvSetMmThreadCharacteristics(L"Pro Audio", &taskIndex);
    AvSetMmThreadPriority(taskHandle, AVRT_PRIORITY_CRITICAL);

then I get Base Pri 26 and Dyn Pri 26.

@Daniel_Terhell , thanks, I will experiment with that to see if it makes a difference!

Have the application send multiple notification requests to the driver. Sending one at a time widens the gaps where the driver has a notification without a request to complete.

Yes, I had multiple memory buffers mapped, each with its own WDFREQUEST queued up, and they are written to and signaled in a round robin fashion.

It is not typical to use both a long lived IRP to share memory between UM and KM and also DeviceIoControl to signal changes within that memory region. Normally, interlocked operations and memory barriers are used. That’s kind of the whole point of this kind of design - avoid many IRPs and UM / KM transitions when you know that there will always be new data, so polling the memory makes sense.
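As a sketch of this polling scheme, here is a single-producer / single-consumer version using C++ std::atomic with release/acquire ordering in place of the Windows Interlocked functions; the struct layout and buffer size are illustrative:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// The producer (the driver, in the real design) publishes a frame and
// then bumps a sequence number with release ordering; the consumer
// polls the sequence with acquire ordering, so the payload write is
// guaranteed visible before the new sequence value is observed.
struct SharedRegion {
  std::atomic<uint32_t> sequence{0};  // incremented after each write
  uint32_t samples[256];              // the audio payload
};

void Publish(SharedRegion& r, const uint32_t* data, size_t n) {
  for (size_t i = 0; i < n; ++i) r.samples[i] = data[i];
  r.sequence.fetch_add(1, std::memory_order_release);
}

// Spin (in a real loop: pause, yield, or sleep briefly) until the
// sequence advances past lastSeen, then read the payload.
uint32_t PollNext(SharedRegion& r, uint32_t& lastSeen) {
  uint32_t seq;
  while ((seq = r.sequence.load(std::memory_order_acquire)) == lastSeen) {
    // e.g. _mm_pause(), SwitchToThread(), or a short Sleep()
  }
  lastSeen = seq;
  return r.samples[0];
}
```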

In the olden days, when KM / UM transitions were maybe more expensive than normal context switches (exception processing overhead), using shared memory along with events made a lot of sense, but it has never made sense to use shared memory and IRPs - that has the overhead of both with the advantages of neither design

Audio (and video) have the unique property that jitter is less desirable than data loss - a few pops or clicks are better than distortion of the entire playback. That makes shared memory desirable

but most kinds of IO are the other way around and they should use many pending IRPs with OVERLAPPED IO (think 100 or more pending) to transfer data efficiently. Beware of completion sequence discrepancies caused by thread preemption etc.

ok, so if you had shared memory and polled it to check when the memory has been written to, you would spin and sleep a bit until some variable in shared memory is updated?

but it has never made sense to use shared memory and IRPs - that has the overhead of both with the advantages of neither design

I can see the overhead involved in having to queue up a new IRP that should be signaled each time the shared memory (allocated with METHOD_OUT_DIRECT) has been written to, but I don’t understand what the overhead of the shared memory is.

If one were to use OVERLAPPED IO with METHOD_BUFFERED, then to me it seems that the overhead is the same as having to queue up IRPs that are used to signal that the shared memory has been written to, except that with shared memory there is no copying of the data involved.

re shared memory, beyond the coherency protocol (hardware dependent), you have to implement some scheme for dividing a large chunk of memory into smaller ones and indicating which of these holds data that is ready to be consumed by the other party, then a way of checking for those chunks and indicating that the consumed buffers are available for more data again. If your problem cannot accept lost or corrupted data, the work required to implement this scheme is significant, and your performance won’t be better than just using standard IRPs. The advantage of shared memory comes when you can relax at least one of those conditions - for example, if you do not need to be certain that a block of data has been consumed before overwriting it, or if you can assume that a block of data now contains valid data based on elapsed time rather than a specific signal

IRPs assure the guaranteed delivery of consistent data between contexts. Using them as signals that other memory may be read is both overkill and wrong when the order of that other data matters