IOCTL or shared event for fast (frequent) communication

However, when doing that too fast, the driver doesn’t have a pending queue anymore on which it can send a reply.

This sentence makes me think you’re not using overlapped I/O, so you’re sending one request at a time per file handle. Is that possibly correct? Because there’s no way to implement what you’re talking about without using overlapped I/O. Your app can submit 15 ioctls, if it wants, thereby ensuring the driver queue never runs dry.

Also remember that completing an ioctl does set an event. The event communication is already done for you, PLUS you get to return data.


@Tim_Roberts said:

However, when doing that too fast, the driver doesn’t have a pending queue anymore on which it can send a reply.

This sentence makes me think you’re not using overlapped I/O, so you’re sending one request at a time per file handle. Is that possibly correct? Because there’s no way to implement what you’re talking about without using overlapped I/O. Your app can submit 15 ioctls, if it wants, thereby ensuring the driver queue never runs dry.

Also remember that completing an ioctl does set an event. The event communication is already done for you, PLUS you get to return data.

You are correct, I’m opening a file handle per thread without using overlapped I/O. Wouldn’t overlapped I/O have similar behavior, though, except that it would use just one file handle? If there are too many calls to be processed in one thread, would the system create more threads for the app so it can process all the data? If that’s the case, it means I still need a semaphore/concurrent queue/message loop to send the data to the main thread when that happens. Or maybe I could rethink the app a little bit more so there wouldn’t be a need to process the data in the main thread, and each thread could process the data without having to send it to the main thread.

Edit: By “main thread” I do not mean the main thread of the app, but ONE thread in which the data is going to be processed (it doesn’t need to be the app’s main thread, but the data shouldn’t be processed in many threads).

Sigh. We’ve seen several posts like this over the past two weeks. Odd.

Have the app open one handle for overlapped I/O. Have the app queue many (10, 20, 175) IOCTLs in advance. They will sit in the driver’s queue… and there will always be plenty of them. Have the app handle the completions via a completion port. As soon as the app gets one completion, have it send another IOCTL to the driver to replace the one that was just completed.

Problem solved.

“One IOCTL per keypress” isn’t fast, or even a lot of I/O. Do not use an event for this. Use an IOCTL like you’re doing. Your problem seems to ME to be entirely with your user mode handling of the operations.

Peter


@“Peter_Viscarola_(OSR)” said:
Sigh. We’ve seen several posts like this over the past two weeks. Odd.

Have the app open one handle for overlapped I/O. Have the app queue many (10, 20, 175) IOCTLs in advance. They will sit in the driver’s queue… and there will always be plenty of them. Have the app handle the completions via a completion port. As soon as the app gets one completion, have it send another IOCTL to the driver to replace the one that was just completed.

Problem solved.

“One IOCTL per keypress” isn’t fast, or even a lot of I/O. Do not use an event for this. Use an IOCTL like you’re doing. Your problem seems to ME to be entirely with your user mode handling of the operations.

Peter

Switched to overlapped I/O, currently keeping 15 IOCTLs queued (using a single file handle), and it works fine now: the speed is great and it doesn’t seem to miss any “event” (the driver always has a pending IOCTL on which it can reply). What would be the safest/recommended way to send another IOCTL after receiving a response from a previous one? The completion port callback seems to run on another thread (which makes sense, otherwise it would block the main thread), but the file handle is created in the main thread. Isn’t a handle thread-specific, and therefore not safe to use from another thread? (If that’s not the case, it means I can call DeviceIoControl to send another IOCTL directly inside the completion port callback, and I don’t have to find other solutions or use a mutex.) Otherwise, my idea is to keep a global integer counting how many IOCTLs are currently in flight: whenever an IOCTL completes, decrement the count; whenever it drops below a threshold (checked by a loop in the thread that opened the file handle, maybe?), send another IOCTL, using a mutex to safely access and edit that count variable. Would that be a good implementation?

Isn’t a handle thread-specific and not safe to call from another thread

There are good books on this, such as Windows System Programming by Johnson Hart and Windows Internals by Mark Russinovich.


Isn’t a handle thread-specific

No. Handles are PROCESS specific (they live in the process handle table) not thread specific.

it means I can call DeviceIoControl to open another IOCTL directly inside the completion port callback

That’s exactly what most people do.

Peter


My driver needs to communicate with an app very frequently (it sends a notification to the app every time a keypress happens).

ROFLMAO…

Just to give you an idea, it takes you approx. 150 ms to blink an eye. Although it may seem instantaneous to you, this time is, in actuality, sufficient for 10 threads to use up their quantum. I hope this example is sufficient to convince you that, if you look at the whole thing from the OS’s perspective, any human-generated events happen to be, in actuality, EXTREMELY infrequent ones. Just look at the numbers that Don provided, and recall that a typical human would hardly generate even a dozen keystrokes per second. Therefore, you most certainly don’t need to use events here, let alone share a buffer between an app and a driver.

However, when doing that too fast, the driver doesn’t have a pending queue anymore on which it can send a reply.

Assuming that we are speaking about a properly-written app, this scenario may occur only if your target app is starved of CPU time. Although it may happen under some special circumstances (for example, when the network traffic is really high, or if some high-priority threads take up all available CPU time), this is very unlikely to occur on a more or less regular basis. Therefore, the only conclusion here is that your app is just poorly designed. Fix this part, and I can assure you that the whole problem will be gone…

Anton Bassov

Anton beat me to it. Keyboard input might have been high speed in 1980, but it is certainly not in 2020. But it is also clear that the threading model needs work too. With or without IOCP, the stock thread pool, or anything custom, you should be able to handle this level of load with a single thread and less than 1% CPU usage. You should also adjust your expectations if you think that 15 pending IRPs is a lot. Think 1,500 before you start to worry on commodity hardware, and many thousands on specialized systems.


Now, before the OP runs away, understand that we contribute here in the hope of helping, not hindering, your progress. The big question of how this should work was posed around 1990 and was settled in favor of this pattern. Although many think that the solutions other systems use might be better, nothing has been proven or demonstrated as better. Possibly pmem will prove to be, and NUMA remains a challenge to support correctly, but from a certain point of view that’s like saying it is better to teach that 2 + 2 make 5 because it will be harder to explain later the exceptional cases where that might result versus the normal ones where it can’t.


Interrupt priorities are usually set from slow speed to higher speed. That is, from less frequent to more frequent.

So Anton has a point.

Pro

Interrupt priorities are usually set from slow speed to higher speed

On what operating system?

Not on Windows, certainly.

Peter

*nix design is based on that. Otherwise, how would the keyboard get attention in the presence of high-frequency interrupts?

Pro

Otherwise, how would the keyboard get attention in the presence of high-frequency interrupts?

Seriously?

On modern processors, and with ISRs of reasonable length, everything gets serviced just fine.

The whole idea of “make the most urgent device the most important IRQ” ceased being important around the time that we stopped using wire cutters to set the IRQ of plug-in boards.

Windows establishes IRQL by round robin assignment.

P

*nix design is based on that. Otherwise, how would the keyboard get attention in the presence of high-frequency interrupts?

You mean 40+ year-old UNIX versions written for the PDP-11, right? These systems, indeed, made heavy use of spl(), because hardware interrupts had to be prioritized against one another on the PDP-11. In fact, the very concept of spl() (as well as its IRQL cousin) is based solely upon PDP-11 specifics.

Therefore, it lost any practical meaning when UNIX got ported to other architectures. In practical terms, it hung on for quite a while because, as long as we are speaking about UP systems, it may work just fine as a synchronization method, so no one really bothered to clean it up. However, it does not work this way on MP systems, so it had to be replaced with mutexes when MP systems became common.

Therefore, no major modern UNIX derivative (at least no open-source one) really uses it any more. For example, FreeBSD abandoned it and replaced it with mutexes ages ago, and Solaris/Illumos uses it only for disabling interrupts. It is still used by both NetBSD and OpenBSD, but these UNIX derivatives would hardly qualify as major ones, right?

Anton Bassov

Thanks, Peter, for enlightening me on this.

pro

Yep, thanks!

Right after my assertions I realised I was talking about the old approach. Not having been in the Windows kernel for two years or more, IRQL and the round-robin assignment (along with coalescing) had slipped my mind…

Pro

These systems, indeed, made heavy use of spl(), because hardware interrupts had to be prioritized against one another on the PDP-11. In fact, the very concept of spl() (as well as its IRQL cousin) is based solely upon PDP-11 specifics.

Have you so quickly forgotten the lessons I taught you?

You continue to misunderstand and repeat incorrect info about the PDP-11 and SPL. Do not talk about things you know nothing about, at least when folks who know better are within earshot.

Peter

You continue to misunderstand and repeat incorrect info about the PDP-11 and SPL. Do not talk about things you know nothing about, at least when folks who know better are within earshot.

Oh, come on…

I really hope that you are not going to argue against my assertion that the PDP-11 prioritized interrupts against one another at the hardware level, right?

Although I haven’t had a chance to get any personal experience with the PDP-11 (ironically, we are of the same age, i.e. “born in 1969”), I am still in a position to read the publicly available documentation.

http://gordonbell.azurewebsites.net/digital/pdp%2011%20handbook%201969.pdf

This doc is pretty long, but here is a Wiki article that references it, and the article happens to be much shorter:

https://en.wikipedia.org/wiki/PDP-11

Here is the relevant excerpt from it

[begin quote]

The PDP-11 operated at a priority level from 0 through 7, declared by three bits in the Processor Status Word (PSW), and high-end models could operate in a choice of modes, Kernel (privileged), User (application), and sometimes Supervisor, according to two bits in the PSW.

To request an interrupt, a bus device would assert one of four common bus lines, BR4 through BR7, until the processor responded. Higher numbers indicated greater urgency, perhaps that data might be lost or a desired sector might rotate out of contact with the read/write heads unless the processor responded quickly. The printer’s readiness for another character was the lowest priority (BR4), as it would remain ready indefinitely. If the processor were operating at level 5, then BR6 and BR7 would be in order. If the processor were operating at 3 or lower, it would grant any interrupt; if at 7, it would grant none. Bus requests that were not granted were not lost but merely deferred. The device needing service would continue to assert its bus request.

Whenever an interrupt exceeded the processor’s priority level, the processor asserted the corresponding bus grant, BG4 through BG7. The bus-grant lines were not common lines but were a daisy chain: The input of each gate was the output of the previous gate in the chain. A gate was on each bus device, and a device physically closer to the processor was earlier in the daisy chain. If the device had made a request, then on sensing its bus-grant input, it could conclude it was in control of the bus, and did not pass the grant signal to the next device on the bus. If the device had not made a request, it propagated its bus-grant input to its bus-grant output, giving the next closest device the chance to reply. (If devices did not occupy adjacent slots to the processor board, “grant continuity cards” inserted into the empty slots propagated the bus-grant line.)

Once in control of the bus, the device dropped its bus request and placed on the bus the memory address of its two-word vector. The processor saved the program counter (PC) and PSW, entered Kernel mode, and loaded new values from the specified vector. For a device at BR6, the new PSW in its vector would typically specify 6 as the new processor priority, so the processor would honor more urgent requests (BR7) during the service routine, but defer requests of the same or lower priority. With the new PC, the processor jumped to the service routine for the interrupting device. That routine operated the device, at least removing the condition that caused the interrupt. The routine ended with the RTI (ReTurn from Interrupt) instruction, which restored PC and PSW as of just before the processor granted the interrupt.

If a bus request were made in error and no device responded to the bus grant, the processor timed out and performed a trap that would suggest bad hardware.

[end quote]

Anton Bassov

Have you so quickly forgotten the lessons I taught you?

OMG - it looks like, in actuality, I was not THAT wrong on that particular occasion.

Look at the following lines taken from the excerpt that I quoted in my previous post.

[begin quote]

Whenever an interrupt exceeded the processor’s priority level, the processor asserted the corresponding bus grant, BG4 through BG7. The bus-grant lines were not common lines but were a daisy chain: The input of each gate was the output of the previous gate in the chain. A gate was on each bus device, and a device physically closer to the processor was earlier in the daisy chain. If the device had made a request, then on sensing its bus-grant input, it could conclude it was in control of the bus, and did not pass the grant signal to the next device on the bus. If the device had not made a request, it propagated its bus-grant input to its bus-grant output, giving the next closest device the chance to reply. (If devices did not occupy adjacent slots to the processor board, “grant continuity cards” inserted into the empty slots propagated the bus-grant line.)

[end quote]

Anton Bassov

tl;dr (your posts above.)

Anton, stop it. You asserted in the previous thread (a) that the PDP-11 has a non-uniform cost for accessing hardware… which is false, and (b) that “the OS prioritising hardware interrupts to one another” was unique to the PDP-11, or done in software, or something… I’m honestly not sure WTF you’re saying. Interrupt priorities on the PDP-11 are a hardware concept, like they are on the IBM PC. They are reflected in the PSW, where the priority is set by either the SPL or MTPS instruction. Hardware interrupt priorities were not a new, or unique, concept to the PDP-11.

So, be quiet Anton. Contribute something useful to the questioners or go back to being quiet… as you have been for the past several months.

Your algorithm for being here should be:

if (AntonHasSomethingHelpfulForTheDiscussion()) {
    if (!ThisWillAnnoyPeter()) {
        if (ThingToBePostedIsDefinitelyUseful()) {
            PostTheComment();
        }
    }
}

Peter