Setting LOW_REALTIME_PRIORITY for driver created thread

Hi All,

I am working on an NDIS miniport driver. In my driver, I create some threads to handle NDIS packet processing in the send/receive path.
When I use LOW_REALTIME_PRIORITY for the thread that processes packets in the receive path, I observe good download speed.
Is there any risk in using LOW_REALTIME_PRIORITY for a thread created with PsCreateSystemThread?

Thanks,
Parsa

Nope. No risk at all. Except for the fact that if the work thread runs on the same CPU as the thread queuing the work, it’s likely to preempt the queuing thread. After considering that for a while, if you’re good with that, party on!

Peter


Yep, that’s called priority inversion, and it’s a real problem to watch for. Generally I will set the thread priority to idle, then WaitForXObject until it’s got something to do, then bump it up to whatever it needs, then drop it back down and repeat. That may seem counterintuitive, but especially for a driver that is getting data from somewhere else and passing it along, you really want your threads idle until they are needed.
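The idle-until-needed pattern above might look roughly like this in a system thread. This is a minimal sketch, not tested code: `g_WorkAvailable` and `g_Shutdown` are placeholder globals (the event would be initialized elsewhere with KeInitializeEvent), and the choice of `LOW_PRIORITY + 1` as the parked priority is an illustrative assumption.

```c
/* Sketch: park a PsCreateSystemThread worker at a very low priority,
 * wait for work, bump up only while processing, then drop back down. */
#include <ntddk.h>

extern KEVENT  g_WorkAvailable;   /* placeholder: signaled when packets queue up */
extern BOOLEAN g_Shutdown;        /* placeholder: set at driver unload */

VOID RxWorkerThread(_In_ PVOID Context)
{
    UNREFERENCED_PARAMETER(Context);

    for (;;) {
        /* Park near the bottom of the range so we never compete
         * for the CPU while there is nothing to do. */
        KeSetPriorityThread(KeGetCurrentThread(), LOW_PRIORITY + 1);

        KeWaitForSingleObject(&g_WorkAvailable, Executive,
                              KernelMode, FALSE, NULL);
        if (g_Shutdown)
            break;

        /* Bump up only for the duration of the actual work... */
        KeSetPriorityThread(KeGetCurrentThread(), LOW_REALTIME_PRIORITY);

        /* ...drain the packet queue here... */

        /* ...and the loop drops the priority again before waiting. */
    }

    PsTerminateSystemThread(STATUS_SUCCESS);
}
```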


Except for the fact that if the work thread runs on the same CPU as the thread queuing the work, it’s likely to preempt the queuing thread.
After considering that for a while, if you’re good with that, party on!

The point that is, IMHO, worth considering here is that the OP is speaking about ingress network packets being processed by an NDIS miniport driver. These packets are more than likely to get enqueued in the context of a DPC, rather than in the context of a thread. Therefore, the scenario that you have described simply never occurs under these circumstances, so it is a non-issue. This is the only reason the OP seems so happy about the performance - he just has not yet encountered the potential drawbacks of this approach.

However, if he tries the same approach with the egress packets, things are not necessarily going to work THAT well, because those packets may get enqueued not only in the context of a DPC but in the context of a thread as well. Once we are speaking about a thread priority that falls into the realtime range, the scenario that you have described may indeed arise quite naturally, and I am not sure the OP will be too happy about the performance then.

Therefore, some extra consideration may indeed be needed. The first thing that comes to mind is simply making the worker thread wait (probably with a timeout) on an event that gets signaled (which may be done in the context of either a thread or a DPC) only when the number of packets in the queue exceeds some minimal threshold, which solves the issue of superfluous context switches.
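A minimal sketch of that batching idea follows. All the names, the threshold, and the 2 ms timeout are illustrative assumptions, not anything from the thread; the event is assumed to be an auto-reset (SynchronizationEvent) initialized elsewhere.

```c
/* Sketch: the enqueue side (thread or DPC) signals the event only once
 * the queue reaches a threshold; the worker also wakes on a timeout so
 * that a trickle of packets below the threshold is not stranded. */
#include <ntddk.h>

#define RX_BATCH_THRESHOLD 16     /* illustrative value */

extern KEVENT     g_BatchReady;   /* placeholder, SynchronizationEvent */
extern LIST_ENTRY g_RxQueue;      /* placeholder packet queue */
extern KSPIN_LOCK g_RxLock;
extern LONG       g_QueueDepth;

/* Runs at IRQL <= DISPATCH_LEVEL, so callable from a DPC or a thread. */
VOID RxEnqueue(_In_ PLIST_ENTRY Packet)
{
    KIRQL irql;
    LONG depth;

    KeAcquireSpinLock(&g_RxLock, &irql);
    InsertTailList(&g_RxQueue, Packet);
    depth = ++g_QueueDepth;
    KeReleaseSpinLock(&g_RxLock, irql);

    if (depth >= RX_BATCH_THRESHOLD)
        KeSetEvent(&g_BatchReady, IO_NO_INCREMENT, FALSE);
}

VOID RxWorkerThread(_In_ PVOID Context)
{
    LARGE_INTEGER timeout;

    UNREFERENCED_PARAMETER(Context);
    timeout.QuadPart = -2 * 10 * 1000;   /* relative 2 ms, 100 ns units */

    for (;;) {
        KeWaitForSingleObject(&g_BatchReady, Executive,
                              KernelMode, FALSE, &timeout);
        /* Drain whatever is queued, full batch or not... */
    }
}
```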

This is, apparently, the most simplistic (and probably the most reasonable) approach that can be taken. If you want something more elaborate, you may want to experiment with playing around with the thread priorities, i.e. take the approach that Mr. Howard has mentioned. However, it seems (at least to me) to be simply overkill here…

Anton Bassov

Leaving aside what sort of processing a dedicated thread could be required to do, I infer TCP data flowing primarily inbound to the Windows host, with some short commands and lots of ACKs going back - hence ‘download speed’.

This pattern is common, but it should not dictate your design. At a layer lower than yours, a well-written NIC driver will implement a hybrid polling method: sometimes relying on tight loops to poll the hardware for more data, and sometimes waiting for interrupts. The reason is that network traffic rates can rise and fall dramatically. When traffic levels are high, it takes everything a tight loop can do to get packets from the device and pass them on to higher levels, which presumably can use other cores to continue processing (the TCP window size regulates this behaviour in the case that the machine doesn’t have the capacity to handle the throughput; other protocols have other methods). But much of the time the volume is not high, and it is very much a waste to poll for something that isn’t there.

Some years ago, I was asked to look into a system created by another company. When the system had no work to do, the CPU usage flatlined at 100%, but when they submitted work, the CPU usage dropped. And the more work they gave it to do, the lower the CPU usage became. Figuring that something they had done had somehow broken the measurements that task manager provides, they asked me to check. Task manager was working just fine, but the system in question had a threading model so poor that I won’t describe it.
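The hybrid interrupt/poll model described above can be sketched very roughly as below. The hardware-access helpers and the budget value are placeholders I invented for illustration; a real miniport would do this against its own receive ring and interrupt-mask registers.

```c
/* Sketch: after an RX interrupt fires (and interrupts are masked),
 * poll the ring with a budget. If the ring drains before the budget
 * runs out, traffic is light, so re-enable interrupts. If the budget
 * is exhausted, traffic is heavy, so stay in polling mode. */

#define RX_POLL_BUDGET 64                    /* illustrative value */

extern BOOLEAN HwRxRingHasPackets(VOID);     /* placeholder */
extern VOID    HwIndicateOnePacket(VOID);    /* placeholder */
extern VOID    HwEnableRxInterrupt(VOID);    /* placeholder */
extern VOID    RequeuePoll(VOID);            /* placeholder, e.g. requeue a DPC */

VOID RxPoll(VOID)
{
    ULONG processed = 0;

    while (processed < RX_POLL_BUDGET && HwRxRingHasPackets()) {
        HwIndicateOnePacket();
        processed++;
    }

    if (processed < RX_POLL_BUDGET) {
        /* Ring drained: go back to interrupt-driven mode. */
        HwEnableRxInterrupt();
    } else {
        /* Budget exhausted under load: keep interrupts masked
         * and schedule another poll pass. */
        RequeuePoll();
    }
}
```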

So the first question is: why do you need a thread at all? And presuming that you do, can you check or ensure that you will have sufficient other cores to do the other kinds of processing that any single stream of TCP (I presume) data will require - on top of whatever else the system might be doing?