ISR-DPC long delay

I have a PCIe FPGA design with a ping-pong buffer. When a buffer fills up, it generates an interrupt. The ISR clears the interrupt and then immediately queues a DPC. The DPC runs when IRQL drops and completes before the next interrupt arrives.
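For anyone following along, that ISR/DPC split in KMDF terms looks roughly like the sketch below. The register layout, the write-1-to-clear behaviour and ProcessFilledBuffer() are placeholders for illustration, not the actual hardware interface.

    #include <ntddk.h>
    #include <wdf.h>

    /* Hypothetical per-device context; the real design's registers differ.
     * (Context type is registered at device creation, not shown here.) */
    typedef struct _DEVICE_CONTEXT {
        volatile ULONG *IntStatusReg;   /* interrupt status register, write-1-to-clear (assumed) */
        ULONG           FilledHalf;     /* which half of the ping-pong buffer just filled */
    } DEVICE_CONTEXT, *PDEVICE_CONTEXT;

    WDF_DECLARE_CONTEXT_TYPE(DEVICE_CONTEXT);

    static VOID ProcessFilledBuffer(PDEVICE_CONTEXT Ctx, ULONG Half);  /* placeholder */

    /* ISR: runs at DIRQL. Acknowledge the device, capture the volatile state,
     * queue the DPC, and get out quickly. */
    BOOLEAN
    EvtInterruptIsr(WDFINTERRUPT Interrupt, ULONG MessageId)
    {
        PDEVICE_CONTEXT ctx = WdfObjectGet_DEVICE_CONTEXT(WdfInterruptGetDevice(Interrupt));
        ULONG status = READ_REGISTER_ULONG(ctx->IntStatusReg);

        UNREFERENCED_PARAMETER(MessageId);

        if (status == 0) {
            return FALSE;                                /* not our interrupt */
        }

        WRITE_REGISTER_ULONG(ctx->IntStatusReg, status); /* clear the interrupt */
        ctx->FilledHalf = (status & 0x2) ? 1 : 0;        /* hypothetical bit layout */

        WdfInterruptQueueDpcForIsr(Interrupt);
        return TRUE;
    }

    /* DPC: runs at DISPATCH_LEVEL once IRQL drops below DIRQL. This is where
     * the filled half of the ping-pong buffer actually gets consumed. */
    VOID
    EvtInterruptDpc(WDFINTERRUPT Interrupt, WDFOBJECT AssociatedObject)
    {
        PDEVICE_CONTEXT ctx = WdfObjectGet_DEVICE_CONTEXT(AssociatedObject);

        UNREFERENCED_PARAMETER(Interrupt);

        ProcessFilledBuffer(ctx, ctx->FilledHalf);
    }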

The time between my interrupts is 64ms. As far as I know I am not currently missing any DPCs; at least, the software engineers using the hardware are not complaining about missed buffers.

However... I do remember that a while ago I did have intermittent problems with missed buffers. I was at a loss as to what was causing them, until I did a clean reinstall of Windows; after reinstalling the software and drivers the problem went away.

My point is that you do not want to plan for an interrupt for every chunk of data. You want to plan for an interrupt when your hardware has new data after a long quiet period. If there is always data, then you don't need to fire an interrupt at all; you just keep detecting new data from your device and processing it. Of course you can't do that indefinitely from a DPC, and the hardware may have a hard time knowing when a new interrupt is actually needed, but as a design pattern, one interrupt per packet of data only makes sense for the slowest data rates, where the data comes in bursts.
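In code, that pattern might look something like the sketch below (KMDF-flavoured; HardwareHasData(), ConsumeOneBuffer() and the mask/unmask helpers are placeholders for whatever the hardware actually exposes):

    /* "Interrupt only when idle": the ISR masks the device interrupt and queues
     * this DPC; the DPC then keeps draining without further interrupts and only
     * re-arms the hardware when it runs out of work. A real driver would also
     * bound the time spent here (e.g. by handing off to a passive-level work
     * item), since a DPC must not run at DISPATCH_LEVEL indefinitely. */
    VOID
    EvtInterruptDpc(WDFINTERRUPT Interrupt, WDFOBJECT AssociatedObject)
    {
        PDEVICE_CONTEXT ctx = WdfObjectGet_DEVICE_CONTEXT(AssociatedObject);

        UNREFERENCED_PARAMETER(Interrupt);

        for (;;) {
            while (HardwareHasData(ctx)) {
                ConsumeOneBuffer(ctx);          /* no interrupt per buffer */
            }

            /* Looks idle: re-arm the interrupt, then check once more to close
             * the race where data arrived between the last check and the
             * unmask. If it did, mask again and keep draining. */
            UnmaskDeviceInterrupt(ctx);
            if (!HardwareHasData(ctx)) {
                break;
            }
            MaskDeviceInterrupt(ctx);
        }
    }

The hardware only has to signal "data arrived after a quiet period"; while the driver is keeping up, no interrupts fire at all.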

Thanks MBond2!

In my original design, data was streaming in and an interrupt was triggered for every packet, but this created an interrupt storm the CPU couldn't handle. My current design generates an interrupt once every 120 packets (32MB, which is half the buffer), so in effect it functions as a ping-pong buffer.
Of course I can refine the design to trigger an interrupt every 60 packets or fewer, but this won't change the data streaming rate or the buffer size, which still leaves me at the point where data that wasn't processed within ~100[msec] is lost for good.

Think about it like this:

  1. The incoming data rate is 64MB per 100[msec].
  2. The interrupt rate can be set however I wish (an interrupt every (x)MB = every (y)[msec]), but there is a trade-off, since my CPU has to respond to each interrupt.
  3. If something starves the DPC for more than 100[msec], it will starve it for that same period no matter whether, in that window, I get 4, 8 or 16 interrupts (see the little calculation after this list).
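To put numbers on point 3, here is a trivial standalone calculation using the figures above; it just shows that the overwrite deadline does not move no matter how finely the interrupts are spread:

    /* The deadline is fixed by buffer size and data rate, not by how many
     * interrupts are spread across the buffer. */
    #include <stdio.h>

    int main(void)
    {
        const double rate_mb_per_ms = 64.0 / 100.0;  /* 64MB per 100[msec] */
        const double buffer_mb      = 64.0;          /* total ping-pong buffer */

        /* Time until unprocessed data starts being overwritten. */
        const double deadline_ms = buffer_mb / rate_mb_per_ms;

        for (int ints = 2; ints <= 16; ints *= 2) {
            printf("%2d interrupts per buffer -> one every %5.1f ms, "
                   "deadline still %.0f ms\n",
                   ints, deadline_ms / ints, deadline_ms);
        }
        return 0;
    }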

Correct?

Let me give you a practical example. I worked with a PCIe telemetry device that produced 100MB/sec in 8KB packets. We were using the Xilinx PCIe IP, which uses a set of descriptors where the hardware sets a bit when a packet has been written. I set up a 4MB circular buffer, which held about 500 packets.

Using that setup, we didn't need interrupts at all. The user-mode process reading in the telemetry handled packets until it ran out of filled packets. At that point, we just delayed a while.
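Roughly, the user-mode side of such a setup looks like the sketch below. The descriptor layout, the 'done' flag and HandlePacket() are illustrative only, not the Xilinx IP's actual descriptor format:

    /* User-mode consumer polling a descriptor ring that the driver has mapped
     * into the process. Descriptor layout and helper names are assumptions. */
    #include <stdint.h>
    #include <windows.h>

    #define RING_ENTRIES 512              /* ~4MB of 8KB packets */
    #define PACKET_SIZE  (8 * 1024)

    typedef struct {
        volatile uint32_t done;           /* set by hardware once the packet is written */
        uint32_t          length;         /* bytes actually transferred */
        uint8_t           data[PACKET_SIZE];
    } DESCRIPTOR;

    void HandlePacket(const uint8_t *data, uint32_t length);   /* application-specific */

    void ConsumeLoop(DESCRIPTOR *ring)    /* ring mapped into user space by the driver */
    {
        unsigned head = 0;

        for (;;) {
            DESCRIPTOR *d = &ring[head];

            if (!d->done) {
                /* Ran out of filled packets: back off instead of spinning.
                 * The ring holds ~40ms worth of data at 100MB/s, so a coarse
                 * sleep leaves plenty of slack. */
                Sleep(1);
                continue;
            }

            /* On weakly ordered CPUs a read barrier would be needed between
             * checking 'done' and reading the payload. */
            HandlePacket(d->data, d->length);

            d->done = 0;                  /* hand the slot back to the hardware */
            head = (head + 1) % RING_ENTRIES;
        }
    }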

We ran two such boards simultaneously, without working up a sweat.

Take a step back and think about what interrupts are for. An interrupt is a signal from a hardware peripheral that it needs attention from the CPU. If the code running on the CPU already knows that the hardware needs attention, or shortly will need attention, raising an interrupt is not only unnecessary but counterproductive.

When handling input that arrives at no predictable rate, possibly with long pauses and gaps (network packets, a user's keystrokes, etc.), interrupts are very useful, because the CPU can ignore the hardware for long periods of time and do something else. When the hardware does need attention, it raises an interrupt and the CPU services it. If there happens to be a burst of input, the CPU should handle all of it, then go away and do something else until next time.

When handling input that arrives at a predictable rate (within a reasonable jitter), interrupts don't help much beyond an initial one to start processing. They just add overhead by forcing context switches.

Sounds interesting, but I still don't understand how this would work if something in the system is freezing my DPC for >100[msec].

If a DPC (running at DISPATCH_LEVEL) is blocked, then why would a general thread running at IRQL < DISPATCH_LEVEL not be blocked as well?

There are almost no uniprocessor machines these days. But interrupt and DPC scheduling is not easy, and interrupts and DPCs are never evenly distributed across the cores. A thread might be scheduled and running on one core at the same time that a DPC queued to another core is being delayed.

Reducing the total number of interrupts will also 'just help' overall system performance.

This is a much more complicated topic nowadays, when we have asymmetric multiprocessing: performance cores and efficiency cores in the same system, NUMA, and a whole bunch of other stuff. I am not aware of any solid research in this area, but Microsoft must have done something about it. The Linux guys too.