Windows System Software -- Consulting, Training, Development -- Unique Expertise, Guaranteed Results
The free OSR Learning Library has more than 50 articles on a wide variety of topics about writing and debugging device drivers and Minifilters. From introductory level to advanced. All the articles have been recently reviewed and updated, and are written using the clear and definitive style you've come to expect from OSR over the years.
Check out The OSR Learning Library at: https://www.osr.com/osr-learning-library/
My driver needs to communicate with an app very frequently (it sends a notification to the app every time a keypress happens). I have tried pending IOCTLs, using a dedicated thread that submits a new IOCTL every time it gets a reply from the driver. However, when this happens too fast, the driver no longer has a pending request on which it can send a reply. Currently, I'm using multiple threads inside the app, each of which creates its own pending request, so the driver should always have one present, even while the app is processing the response from a previous one. I'm planning to use a thread-safe queue inside the app (something like https://github.com/cameron314/concurrentqueue ) to send the responses from the worker threads that submitted the requests to the main thread of the app. Would an event created with CreateEvent (whose handle I pass to the driver so the driver can signal it) be queued if an event is already being processed by the app? Say the app got a response from the event and, while it's processing that response and doing something based on it, the driver signals the event again: would the app process that signal after it finishes the current work, or would it just be lost, like it happens when there is no pending request available with inverted IOCTL calls? Can a driver send data to the app using an event, like it can using the OutputBuffer of a pending IOCTL? I would rather recode the inverted-call part to use events than have to deal with thread safety inside the app.
Upcoming OSR Seminars

OSR has suspended in-person seminars due to the Covid-19 outbreak. But don't miss your training! Attend via the internet instead!

| Seminar | Dates | Format |
|---|---|---|
| Kernel Debugging | 13-17 May 2024 | Live, Online |
| Developing Minifilters | 1-5 Apr 2024 | Live, Online |
| Internals & Software Drivers | 11-15 Mar 2024 | Live, Online |
| Writing WDF Drivers | 26 Feb - 1 Mar 2024 | Live, Online |
Comments
This has to be your app taking too long. I've developed KMDF drivers that were able to process 100,000 IOCTLs per second without any special processing, and as many as 600,000 with some special techniques, on a dual-processor system.
Events are not a great idea here. Remember that they have only two states: set or not set. If you signal from the driver that there is a keystroke, how do you handle the case where the app hasn't processed it yet and N more keystrokes come in? All the driver can do is set or clear the event; there is no data with it.
You need to change your app to either process data faster, or make the IOCTL able to report N events at once.
I know it's not the driver that can't process the IOCTLs; it's the app that can't process that many IOCTLs in a single thread. I can't make the IOCTL report N events instead of one, as the app needs to get each event and process it as fast as possible (latency is an issue here). As I said, by creating more threads in the app (even 2 threads instead of one fixes it most of the time), each of which submits IOCTL requests that the driver uses for notification, it works just fine; but now I have to implement a way to send the data from all the threads that got a response from the driver to the main thread, while still keeping latency as low as possible. My driver and app would never need to process 100k IOCTLs per second; the load will probably never go over 1k IOCTLs per second. Since you say an event can't carry data, I have no option other than the method I'm currently using, plus a semaphore/concurrent-queue system to pass the data to the main thread.
This sentence makes me think you're not using overlapped I/O, so you're sending one request at a time per file handle. Is that possibly correct? Because there's no way to implement what you're talking about without using overlapped I/O. Your app can submit 15 IOCTLs, if it wants, thereby ensuring the driver's queue never runs dry.
Also remember that completing an IOCTL does set an event. The event communication is already done for you, PLUS you get to return data.
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
You are correct, I'm opening a file handle per thread without using overlapped I/O. Wouldn't overlapped I/O behave similarly, though, except that it would use just one file handle? If there are too many calls to be processed in one thread, would the system create more threads for the app so it can process all the data? If so, I would still need a semaphore/concurrent queue/message loop to send the data to the main thread when that happens; or maybe I could rethink the app a little so there is no need to process the data in the main thread, and each thread can process its data without handing it off.
Edit: By main thread, I do not mean the main thread of the app, but ONE thread in which the data is going to be processed (it doesn't need to be the app's main thread, but the data shouldn't be processed in many threads).
Sigh. We’ve seen several posts like this over the past two weeks. Odd.
Have the app open one handle for overlapped I/O. Have the app queue many (10, 20, 175) IOCTLs in advance. They will sit on the driver's queue... and there will always be plenty of them. Have the app handle the completions via a completion port. As soon as the app gets one completion, have it send another IOCTL to the driver to replace the one that was just completed.
Problem solved.
“One IOCTL per keypress” isn’t fast, or even a lot of I/O. Do not use an event for this. Use an IOCTL like you’re doing. Your problem seems to ME to be entirely with your user-mode handling of the operations.
Peter
Peter Viscarola
OSR
@OSRDrivers
Switched to overlapped I/O, currently keeping 15 IOCTLs in the queue (using a single file handle), and it works fine now. The speed is great and it doesn't seem to miss any "event"; the driver always has a pending IOCTL on which it can reply. What would be the safest/recommended way to submit another IOCTL after receiving a response from a previous one? The completion-port callback runs on another thread (which makes sense; otherwise it would block the main thread), but the file handle was created on the main thread. Isn't a handle thread-specific and unsafe to use from another thread? (If that's not the case, I can call DeviceIoControl directly inside the completion-port callback and I don't need any other solution or a mutex.) Otherwise, my idea is to keep a global integer with the number of currently pending IOCTLs: whenever an IOCTL completes, decrease the number; whenever it drops below a threshold (checked, say, by a loop in the thread that opened the file handle), submit another IOCTL, using a mutex to safely access and edit that count. Would that be a good implementation?
There are good books on this, such as 'Windows System Programming' by Johnson Hart and 'Windows Internals' by Mark Russinovich.
No. Handles are PROCESS specific (they live in the process handle table) not thread specific.
That's exactly what most people do.
Peter
Peter Viscarola
OSR
@OSRDrivers
ROFLMAO.....
Just to give you an idea: it takes you approximately 150 ms to blink an eye. Although it may seem instantaneous to you, this time is, in actuality, sufficient for 10 threads to use up their quantum. I hope this example is sufficient to convince you that, if you look at the whole thing from the OS's perspective, any human-generated events happen to be, in actuality, EXTREMELY infrequent ones.
Just look at the numbers that Don provided, and recall that a typical human would hardly generate even a dozen keystrokes per second. Therefore, you most certainly don't need to use events here, let alone share a buffer between an app and a driver.
Assuming that we are speaking about a properly written app, this scenario may occur only if your target app is starved of CPU time. Although it may happen under some special circumstances (for example, when network traffic is really high, or if some high-priority threads take up all available CPU time), this is very unlikely to occur on a more or less regular basis. Therefore, the only conclusion here is that your app is just poorly designed. Fix this part, and I can assure you that the whole problem will be gone.....
Anton Bassov
Anton beat me to it: keyboard input might have been high speed in 1980, but it is certainly not in 2020. But it is also clear that the threading model needs work too. With or without IOCP, the stock thread pool, or anything custom, you should be able to handle this level of load with a single thread and less than 1% CPU usage. You should also re-evaluate your expectations if you think that 15 pending IRPs is a lot. Think 1,500 before you start to worry on commodity hardware, and many thousands on specialized systems.
Now, before the OP runs away, understand that we contribute here with the hope of helping, not hindering, your progress. The big question of how this should work was posed around 1990 and was solved in favor of this pattern. Although many think that the solutions other systems have might be better, nothing has been proven or demonstrated as better. Possibly pmem will prove to be, and NUMA remains a challenge to support correctly; but, from a certain point of view, that's like saying it is better to teach that 2 + 2 makes 5 because it will be harder to explain later the exceptional cases where that might result versus the normal ones where it can't.
is less frequent to more frequent.
So Anton has a point.
Pro
On what operating system?
Not on Windows, certainly.
Peter
Peter Viscarola
OSR
@OSRDrivers
the presence of high-frequency interrupts?
Pro
Seriously?
On modern processors, and with ISRs of reasonable length, everything gets serviced just fine.
The whole idea of “make the most urgent device the most important IRQ” ceased being important around the time that we stopped using wire cutters to set the IRQ of plug-in boards.
Windows establishes IRQL by round robin assignment.
P
Peter Viscarola
OSR
@OSRDrivers
You mean 40+-year-old UNIX versions written for the PDP-11, right? These systems, indeed, made heavy use of spl(), because hardware interrupts had to be prioritized relative to one another on the PDP-11. In fact, the very concept of spl() (as well as its IRQL cousin) is based solely upon PDP-11 specifics. Therefore, it lost any practical meaning when UNIX got ported to other architectures. In practical terms, it hung on for quite a while because, as long as we are speaking about UP systems, it works as a synchronization method just fine, so no one really bothered to clean it up. However, it does not work this way on MP systems, so it had to be replaced with mutexes when MP systems became common. Therefore, no major modern UNIX derivative (at least no open-source one) really uses it any more. For example, FreeBSD abandoned it and replaced it with mutexes ages ago, and Solaris/Illumos uses it only for disabling interrupts. It is still used by both NetBSD and OpenBSD, but those UNIX derivatives would hardly qualify as major ones, right?
Anton Bassov
pro
Right after my assertion, I realised I was talking about the old approach. Not having been in the Windows kernel for two years or more, what came to mind was IRQL and round-robin assignment, along with coalescing...
Pro
Have you so quickly forgotten the lessons I taught you?
You continue to misunderstand and repeat incorrect info about the PDP-11 and SPL. Do not talk about things you know nothing about, at least when there are folks who know better within earshot.
Peter
Peter Viscarola
OSR
@OSRDrivers
Oh, come on.....
I really hope that you are not going to argue against my assertion that the PDP-11 prioritized interrupts relative to one another at the hardware level, right? Although I haven't had a chance to get any personal experience with the PDP-11 (ironically, we are of the same age, i.e. "born in 1969"), I am still in a position to read the publicly available documentation.
http://gordonbell.azurewebsites.net/digital/pdp 11 handbook 1969.pdf
This doc is pretty long, but here is a Wiki article that references it, and the article happens to be much shorter:
https://en.wikipedia.org/wiki/PDP-11
Here is the relevant excerpt from it
[begin quote]
The PDP-11 operated at a priority level from 0 through 7, declared by three bits in the Processor Status Word (PSW), and high-end models could operate in a choice of modes, Kernel (privileged), User (application), and sometimes Supervisor, according to two bits in the PSW.
To request an interrupt, a bus device would assert one of four common bus lines, BR4 through BR7, until the processor responded. Higher numbers indicated greater urgency, perhaps that data might be lost or a desired sector might rotate out of contact with the read/write heads unless the processor responded quickly. The printer's readiness for another character was the lowest priority (BR4), as it would remain ready indefinitely. If the processor were operating at level 5, then BR6 and BR7 would be in order. If the processor were operating at 3 or lower, it would grant any interrupt; if at 7, it would grant none. Bus requests that were not granted were not lost but merely deferred. The device needing service would continue to assert its bus request.
Whenever an interrupt exceeded the processor's priority level, the processor asserted the corresponding bus grant, BG4 through BG7. The bus-grant lines were not common lines but were a daisy chain: The input of each gate was the output of the previous gate in the chain. A gate was on each bus device, and a device physically closer to the processor was earlier in the daisy chain. If the device had made a request, then on sensing its bus-grant input, it could conclude it was in control of the bus, and did not pass the grant signal to the next device on the bus. If the device had not made a request, it propagated its bus-grant input to its bus-grant output, giving the next closest device the chance to reply. (If devices did not occupy adjacent slots to the processor board, "grant continuity cards" inserted into the empty slots propagated the bus-grant line.)
Once in control of the bus, the device dropped its bus request and placed on the bus the memory address of its two-word vector. The processor saved the program counter (PC) and PSW, entered Kernel mode, and loaded new values from the specified vector. For a device at BR6, the new PSW in its vector would typically specify 6 as the new processor priority, so the processor would honor more urgent requests (BR7) during the service routine, but defer requests of the same or lower priority. With the new PC, the processor jumped to the service routine for the interrupting device. That routine operated the device, at least removing the condition that caused the interrupt. The routine ended with the RTI (ReTurn from Interrupt) instruction, which restored PC and PSW as of just before the processor granted the interrupt.
If a bus request were made in error and no device responded to the bus grant, the processor timed out and performed a trap that would suggest bad hardware.
[end quote]
Anton Bassov
OMG - it looks like, in actuality, I was not THAT wrong on that particular occasion.
Look at the following lines taken from the excerpt that I quoted in my previous post.
[begin quote]
Whenever an interrupt exceeded the processor's priority level, the processor asserted the corresponding bus grant, BG4 through BG7. The bus-grant lines were not common lines but were a daisy chain: The input of each gate was the output of the previous gate in the chain. A gate was on each bus device, and a device physically closer to the processor was earlier in the daisy chain. If the device had made a request, then on sensing its bus-grant input, it could conclude it was in control of the bus, and did not pass the grant signal to the next device on the bus. If the device had not made a request, it propagated its bus-grant input to its bus-grant output, giving the next closest device the chance to reply. (If devices did not occupy adjacent slots to the processor board, "grant continuity cards" inserted into the empty slots propagated the bus-grant line.)
[end quote]
Anton Bassov
tl;dr (your posts above.)
Anton, stop it. You asserted in the previous thread (a) that the PDP-11 has a non-uniform cost for accessing hardware... which is false, and (b) that "the OS prioritising hardware interrupts to one another" was unique to the PDP-11, or done in software, or something... I'm honestly not sure WTF you're saying. Interrupt priorities on the PDP-11 are a hardware concept, like they are on the IBM PC. They are reflected in the PSW, where the priority is set by either the SPL or MTPS instruction. Hardware interrupt priorities were not a new, or unique, concept on the PDP-11.
So, be quiet Anton. Contribute something useful to the questioners or go back to being quiet... as you have been for the past several months.
Your algorithm for being here should be:
Peter
Peter Viscarola
OSR
@OSRDrivers
I have to laugh when Anton starts talking about PDP-11s. Until Windows, I had written drivers for more PDP-11 OSes (DOS, RT-11, RSX-11M, and Unix) than for any other system, and I never thought about "interrupt priority". I even worked on a few research OSes on 11s and never thought about the subject. I didn't look at Anton's references; I just grabbed my hardcopy of the PDP-11 manual from 1969.
Don Burn
Windows Driver Consulting
Website: http://www.windrvr.com
It's interesting that I missed the whole PDP-11 revolution. I went straight from mainframes to Windows. Now, when the conversation drifts over to the peripheral processors on Control Data Cyber machines, I'll be ready.
Tim Roberts, [email protected]
Providenza & Boekelheide, Inc.
Ah, yes... The Minicomputer Revolution.
It was a pretty terrific learning-ground for OS development. One person could understand (and, er, write) the entire OS... and the OS source code (in assembly language) probably didn’t run to 500 printed pages.
And the assembly language was MUCH easier than IBM BAL. Hmmmm.... Now, there’s a topic. JCL, anyone?
Peter
Peter Viscarola
OSR
@OSRDrivers