ISR To DPC communication

ajitabhs · January 13, 2022, 9:36pm

Hi Guys,

The context here is very simple. The ISR kicks in and requests a DPC for further processing. The DPC runs and processes the interrupt. The DPC needs to be able to handle multiple interrupts, since not every “enqueueDpc” call necessarily results in a DPC being queued.

This also means that there has to be a mechanism to communicate multiple interrupts to the DPC for processing. Earlier I used the ExInterlocked list routines to achieve this. The problem is that they disable interrupts through the EFlags register, which means that every time we insert or remove an entry in the queue, there is a brief period (about the time of two interlocked pointer exchanges) during which interrupts on that core are disabled. It would have been better if the interrupt were disabled only for the specific device using the APIC; clearing the interrupt flag in EFlags does not seem optimal.

Another option is to acquire the interrupt spin lock. This will be more efficient than the interlocked operations, but I really do not want to do this unless there is no other option available to achieve this communication.

I was hoping that the KMDF framework (the framework I am using to write the driver) would take care of this by providing such a mechanism, but it looks like that is not the case.

Any thoughts??

AJ

Peter_Viscarola_OSR · January 13, 2022, 11:11pm

The Framework has nothing specific to help in this situation. The typical thing we do is use the Interrupt Spin Lock, as you described.

Peter

Tim_Roberts · January 14, 2022, 1:26am

This also means that there has to be a mechanism to communicate multiple interrupts to DPC for processing.

I typically try to engineer things so the DPC merely assumes “interrupts occurred”, without having to know that “exactly 3 interrupts occurred”. The DPC should be able to handle whatever the hardware is complaining about, without knowing how many times it whined.

Peter_Viscarola_OSR · January 14, 2022, 4:54pm

Mr. Roberts’ advice is excellent.

Remember that the ratio of DPC invocations to ISR invocations is not 1:1. The ISR can potentially run many times before the DPC is invoked a single time.
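To make the ratio concrete, here is a minimal user-mode model in C. It is only a sketch: `model_t`, `model_isr`, `model_queue_dpc` and `model_dpc` are hypothetical names, not WDF or WDM APIs. The one behaviour it borrows from the real system is that a request to queue a DPC that is already queued does nothing and reports that it did nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-mode model only: none of these names are WDF or WDM APIs. */
typedef struct {
    bool     dpc_queued;    /* models "the DPC object is already queued"   */
    uint32_t isr_runs;      /* how many times the ISR ran                  */
    uint32_t dpc_runs;      /* how many times the DPC ran                  */
    uint32_t work_pending;  /* events raised by the hardware, not handled  */
    uint32_t work_done;     /* events the DPC has handled                  */
} model_t;

/* Returns true only if this call actually queued the DPC. */
static bool model_queue_dpc(model_t *m)
{
    assert(m != NULL);
    if (m->dpc_queued)
        return false;       /* already queued: the request is absorbed */
    m->dpc_queued = true;
    return true;
}

/* One interrupt: note that something happened, then ask for the DPC. */
static bool model_isr(model_t *m)
{
    assert(m != NULL);
    m->isr_runs++;
    m->work_pending++;
    return model_queue_dpc(m);
}

/* One DPC run: handle everything that accumulated, not just one event. */
static void model_dpc(model_t *m)
{
    assert(m != NULL);
    m->dpc_queued = false;  /* the DPC leaves the queue before it runs */
    m->dpc_runs++;
    while (m->work_pending > 0) {
        m->work_pending--;
        m->work_done++;
    }
}
```

Calling `model_isr` three times and `model_dpc` once leaves `isr_runs` at 3 and `dpc_runs` at 1, with all three events handled by that single DPC run. That is the situation the DPC has to be written for.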

“How do I communicate what needs to be done from my ISR to my DPC” is a very common KMDF question. When building WDF, having some sort of “standard” method for communicating from ISR to DPC was on the list of things to do, but it fell below the line. It’s super-hard to solve this in a generic way that’d be useful to “most types of driver”. Each device and driver is different, and many times the solution simply comes down to “acquire the Interrupt Spin Lock in your DpcForIsr and interrogate the device registers or in-memory structures to determine what needs done.”

Peter

craig_howard · January 14, 2022, 8:43pm

Typically my ISRs are triggered by an FPGA going from one state into another specific state, such as DMA complete, servo motor operation complete, sensor full/empty, missile fueled and armed, etc., so the ISR is very simple: it examines the MSI to get some context for the interrupt, tells the FPGA to halt that specific interrupt (not all of them, just the one that triggered) and queues the DPC.

The DPC handler “knows” the context of what happened, so it examines the FPGA to get more info and clean things up, then re-enables the specific interrupt that triggered things in the first place.

Doing it this way, I avoid interrupt storms, I can have a specific DPC handler for each interrupt rather than the “mother of all switch statements”, and I can guarantee serialization of interrupts of a specific type.

It’s important to note that each interrupt is stateless with respect to the other interrupts; the missile being fueled has nothing to do with the servos opening the missile launcher doors, nor with the programming of the target coordinates. That’s by design, to ensure that item A doesn’t block item B which is waiting on item C which itself is waiting on item A …
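The mask-in-ISR, unmask-in-DPC pattern described in this post can be sketched as a small user-mode model in C. Everything here is an assumption made for illustration: the `fpga_model_t` structure, the per-source mask word and the count of 8 sources are hypothetical and do not describe any real FPGA register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_SOURCES 8u            /* assumed number of interrupt sources */

typedef struct {
    uint32_t mask;                 /* bit n set: source n is masked        */
    uint32_t dpc_pending;          /* bit n set: DPC for source n queued   */
    uint32_t handled[NUM_SOURCES]; /* per-source count of DPC runs         */
} fpga_model_t;

/* The device tries to raise source `src`. Returns true if an interrupt
 * was delivered, i.e. the modelled ISR ran. */
static bool fpga_raise(fpga_model_t *f, unsigned src)
{
    assert(src < NUM_SOURCES);
    if (f->mask & (1u << src))
        return false;              /* masked: no interrupt, so no storm */

    /* ISR body: halt only this source, then queue its own DPC. */
    f->mask        |= 1u << src;
    f->dpc_pending |= 1u << src;
    return true;
}

/* DPC for one source: do the work, then re-enable that source only. */
static void fpga_dpc(fpga_model_t *f, unsigned src)
{
    assert(src < NUM_SOURCES);
    f->dpc_pending &= ~(1u << src);
    f->handled[src]++;             /* stands in for the real processing */
    f->mask &= ~(1u << src);
}
```

Because a source stays masked from its ISR until its DPC finishes, a second event from the same source cannot interrupt in between, which is where the per-type serialization comes from. Other sources are unaffected.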

ajitabhs · January 14, 2022, 9:27pm

@Tim_Roberts : Thanks for a great idea.

However, my device is an NVMe device with no interrupt status register, and the only way it tells me what happened is through an MSI interrupt with a particular message ID that identifies the source of the interrupt. If I have to find out later everything that has happened to the device since the last interrupt, it will be a huge task in itself. That would mean inspecting every completion queue that the driver created and figuring out whether any completion was posted, by looking at the phase bit of the entry where the driver expects the next completion. That is not very efficient, and I would spend a very long time in the DPC.

The message tells me which completion queue has finished. I need to retain this information to process the completion efficiently.
Let me check with my hardware team whether I can get a status that I can read in my DPC, giving me a bitmap of all the queues that have completions pending. That would really ease my life.
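A bitmap of this kind can also be kept in software, which retains the message ID without a list and without the interrupt spin lock. The sketch below is a user-mode model using C11 atomics; in a Windows driver the corresponding primitives would be `InterlockedOr` and `InterlockedExchange`. The names `g_pending`, `isr_note_queue`, `dpc_take_pending` and `next_queue` are hypothetical, and the limit of 32 queues is an assumption taken from the figure used later in the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_QUEUES 32u             /* assumed: one bit per completion queue */

/* Bit q set: at least one interrupt for queue q since the last DPC pass. */
static _Atomic uint32_t g_pending;

/* ISR side: record which queue signalled. One atomic OR, no lock. */
static void isr_note_queue(unsigned message_id)
{
    assert(message_id < MAX_QUEUES);
    atomic_fetch_or(&g_pending, UINT32_C(1) << message_id);
}

/* DPC side: take the whole bitmap and reset it in one atomic step. */
static uint32_t dpc_take_pending(void)
{
    return atomic_exchange(&g_pending, 0);
}

/* Pop the lowest set bit from a snapshot; returns -1 when it is empty. */
static int next_queue(uint32_t *snapshot)
{
    for (unsigned q = 0; q < MAX_QUEUES; q++) {
        uint32_t bit = UINT32_C(1) << q;
        if (*snapshot & bit) {
            *snapshot &= ~bit;
            return (int)q;
        }
    }
    return -1;
}
```

A bit records “at least one interrupt for this queue”, not how many, so the DPC would still need to drain each flagged queue until it is empty. A bit that the ISR sets after the exchange is not lost: it stays in `g_pending` for the next DPC run.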

MBond2 · January 15, 2022, 12:45am

Well, I think you need to talk with your hardware team more. Forgetting whatever might happen on the host side, MSI interrupts are not guaranteed to be delivered 1:1 by the hardware from the device to the host.

But if you really have an NVMe device, then why are you writing a driver at all? That’s like writing a custom driver for an HID-compliant mouse instead of relying on the in-box Microsoft ones. I have seen it done for the purpose of adding side-channel interfaces like low-level format, or to support down-level OS versions.

If this is truly NVMe, then I don’t expect you to need any hardware or firmware changes. But I have no way of knowing any of the details, so I might be completely wrong.

ajitabhs · January 25, 2022, 10:42pm

Well,
It’s a partial NVMe device. The initialization and queue setup are all NVMe, but the commands are custom for custom hardware. I do understand that MSIs are not guaranteed to be delivered 1:1, and that they are posted writes with no guaranteed delivery either. Now there are two ways to do this. The first is conventional ISR-to-DPC communication.
The second is to go and look for interrupt sources; in my case this means going through all the completion queues looking for a phase bit that has flipped. So if there are 32 completion queues (just a number), would it be cheaper to read 32 memory locations than to acquire the interrupt spin lock?
I would prefer reading the 32 memory locations to acquiring an interrupt spin lock.
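For reference, the “read 32 memory locations” option can be sketched as follows. This is a simplified user-mode model, not the poster's driver: the entry layout (`cqe_t`, with the phase tag in bit 0 of `status`), the queue depth of 4 and all function names are assumptions made for illustration. The phase-tag convention follows NVMe: the expected phase starts at 1 and flips each time the head wraps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CQ_DEPTH 4u                /* assumed: tiny queue for illustration */
#define NUM_CQS  32u

typedef struct {
    uint16_t command_id;
    uint16_t status;               /* bit 0: phase tag (simplified layout) */
} cqe_t;

typedef struct {
    volatile cqe_t entries[CQ_DEPTH]; /* written by the device            */
    uint16_t head;                    /* next entry the driver expects    */
    uint16_t phase;                   /* phase value marking a new entry  */
} cq_t;

static void cq_init_all(cq_t *cqs, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        for (size_t e = 0; e < CQ_DEPTH; e++) {
            cqs[i].entries[e].command_id = 0;
            cqs[i].entries[e].status = 0;
        }
        cqs[i].head = 0;
        cqs[i].phase = 1;          /* first pass: new entries carry phase 1 */
    }
}

/* One memory read: does the entry at the head carry the expected phase? */
static bool cq_has_work(const cq_t *q)
{
    return (q->entries[q->head].status & 1u) == q->phase;
}

/* Consume every new entry in one queue; returns how many were found. */
static unsigned cq_drain(cq_t *q)
{
    unsigned n = 0;
    assert(q != NULL);
    while (cq_has_work(q)) {
        /* A real driver would complete entries[head].command_id here. */
        n++;
        if (++q->head == CQ_DEPTH) {
            q->head = 0;
            q->phase ^= 1u;        /* wrapped: expect the opposite phase */
        }
    }
    return n;
}

/* The "poll everything" DPC: one head read per idle queue. */
static unsigned scan_all(cq_t *cqs, size_t count)
{
    unsigned total = 0;
    for (size_t i = 0; i < count; i++)
        total += cq_drain(&cqs[i]);
    return total;
}
```

For an idle queue the cost is the single read in `cq_has_work`, so a scan of 32 idle queues is 32 reads of host memory. How that compares with taking the interrupt spin lock depends on the system and is not something this sketch can measure.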

Any takers on this??

-AJ

MBond2 · January 26, 2022, 1:59am

So it is not an NVME device. It is a device with a protocol similar to NVMe?

Next, consider your question more carefully. If your device is running ‘slowly’, then you will send it some command, that command will complete and indicate that by updating a completion queue and triggering an interrupt. Your ISR will run and trigger the DPC. The DPC then handles the completion of that one command (leave aside exactly how it does that for now), and then a ‘long’ time will pass before the next command is issued and the cycle repeats.

But if your device is running ‘quickly’, then you will be sending it many commands all the time. When each one completes, it will update the appropriate completion queue. Then the device will try to trigger an interrupt. Sometimes this will generate an actual MSI, but often it won’t, because an interrupt is already pending that no CPU has consumed. Eventually the chipset will manage to get the attention of some CPU and your ISR will run. The ISR triggers the DPC and sometime later, possibly after multiple ISR runs, the DPC runs. By the time that happens, many commands may have completed, and your DPC should plan to handle all of them. Actually, if a high load remains for a long time, switching from interrupts to polling improves the performance of the system overall as well as the device throughput. Understanding how to optimize the performance of a scheme like this is a hard problem.

So to your actual question: is it better to acquire a spin lock or to poll on many device memory locations? As usual with performance questions, the answer is ‘it depends’. If you are running on a desktop-class system with relatively few cores, all of them on the same die, acquiring a spin lock is quite straightforward. Contention notwithstanding, it takes a small number of µops in the CPU to execute the lock compare-exchange. When you move to a larger system, one with multiple CPU dies, the cost of those lock operations increases. Instead of just wires within a single chunk of silicon, coherency protocols have to be used when executing those instructions, and that can take substantially longer. It is almost impossible to determine exactly what the best sequence should be in a real system, but in general, in the first case acquiring the lock is certainly better, and in the second case acquiring the lock might be better or might be worse, depending on how many cores there are, how much contention there is, how many device memory locations there are, and how slow it is to access them (is the core that runs the DPC near the device, or does the CPU interconnect need to be involved?).
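To show what “the lock compare-exchange” refers to, here is a toy spin lock built on C11 atomics. It is an illustration only, not how Windows implements the interrupt spin lock (the real one also raises IRQL, among other things), and the `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int held;               /* 0: free, 1: owned */
} toy_spinlock_t;

/* Uncontended path: a single locked compare-exchange on one cache line. */
static bool toy_try_acquire(toy_spinlock_t *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->held, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

static void toy_acquire(toy_spinlock_t *l)
{
    while (!toy_try_acquire(l)) {
        /* Wait with plain loads so that waiting cores do not keep
         * requesting exclusive ownership of the cache line. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0) {
        }
    }
}

static void toy_release(toy_spinlock_t *l)
{
    assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

The compare-exchange needs exclusive ownership of the cache line holding `held`. When the previous owner was a core on the same die that is cheap; when it was a core on another die, the line has to travel over the interconnect first, which is the cost difference described above.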

But the question that you should really be thinking about is: how independent or co-dependent are the commands that complete? If they are independent, or independent enough, and you have 32 hardware completion queues, then you should have 32 software completion paths, calling 32 separate ISRs and DPCs, even if they all happen to be the same function in the code.