Viewpoints on storport miniports doing significant work in the ISR

I’ve recently looked at a couple of storage miniports (such as the Microsoft AHCI driver in the WDK samples), and was surprised at how much work they do in the ISR. For example, they call the storport SRB completion function from the ISR, which seems especially odd: you can’t complete the original IRP at DIRQL, so storport has to move the request from its pending queue to a to-be-completed queue and queue a DPC; when the DPC runs, it takes the request off the to-be-completed queue and performs the actual completion.
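To make the pattern concrete, here is roughly what those miniports do — a sketch only, with the dequeue helper and device-extension layout invented for illustration:

```c
/* Sketch of a storport miniport completing a request from its
 * HwStorInterrupt routine. DequeueCompletedSrb() is hypothetical. */
BOOLEAN
MiniportInterrupt(_In_ PVOID DeviceExtension)
{
    PSCSI_REQUEST_BLOCK srb = DequeueCompletedSrb(DeviceExtension);

    if (srb == NULL) {
        return FALSE;           /* not our interrupt */
    }

    srb->SrbStatus = SRB_STATUS_SUCCESS;

    /* This runs at DIRQL, so storport cannot complete the IRP here;
     * it re-queues the request internally and defers the actual
     * completion to its own DPC. */
    StorPortNotification(RequestComplete, DeviceExtension, srb);
    return TRUE;
}
```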

Except for a few very special cases with excellent reasons, nearly every hardware driver I’ve written (in 20+ years) did little more in the ISR than queue a DPC, and then did the real work in the DPC. Good drivers would even dynamically shift between polling in a thread and ISR/DPC processing, because at high data rates you need to be careful not to consume a core entirely at DPC level, to avoid DPC watchdog timeouts and ugly thread-starvation behavior. Once a thread is scheduled on a core, if that core stays at DPC/ISR level for a while, the thread will not get moved to a different core.
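The minimal-ISR pattern I mean looks roughly like this in WDM terms — a sketch, with the device extension, the interrupt-pending check, and the ack helper all hypothetical:

```c
/* Sketch of the classic "ack and defer" ISR pattern (WDM-style).
 * DEVICE_EXTENSION, DeviceInterruptPending(), and DeviceAckInterrupt()
 * are hypothetical stand-ins for real device-specific code. */

typedef struct _DEVICE_EXTENSION {
    PKINTERRUPT Interrupt;
    KDPC        Dpc;        /* initialized with KeInitializeDpc() */
    /* ... device state ... */
} DEVICE_EXTENSION, *PDEVICE_EXTENSION;

BOOLEAN
MyIsr(_In_ PKINTERRUPT Interrupt, _In_ PVOID Context)
{
    PDEVICE_EXTENSION devExt = (PDEVICE_EXTENSION)Context;

    if (!DeviceInterruptPending(devExt)) {
        return FALSE;                   /* shared line, not ours */
    }

    DeviceAckInterrupt(devExt);         /* quiesce the hardware */

    /* Defer everything else; KeInsertQueueDpc() is a no-op if the
     * DPC is already queued, so bursts of interrupts coalesce. */
    KeInsertQueueDpc(&devExt->Dpc, NULL, NULL);
    return TRUE;
}

VOID
MyDpc(_In_ PKDPC Dpc, _In_opt_ PVOID DeferredContext,
      _In_opt_ PVOID SystemArgument1, _In_opt_ PVOID SystemArgument2)
{
    PDEVICE_EXTENSION devExt = (PDEVICE_EXTENSION)DeferredContext;

    /* Drain completions, complete IRPs, restart I/O --
     * all at DISPATCH_LEVEL, where normal spinlocks suffice. */
}
```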

I’ve written multiple high-performance NIC drivers that queued a DPC from the ISR, and their performance seemed just fine, so I’m skeptical that doing lots of work in the ISR is a performance win. Doing significant work in the ISR also vastly complicates locking: you can’t just take a normal spinlock; you must take an interrupt lock on the correct MSI-X interrupt, or else structure your code so that your DPC-level StartIo path can tolerate interrupts, and make sure the ISR doesn’t change anything your DPC-level code was using. Without well-defined locking, it’s very fragile from a maintenance viewpoint.
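For anyone following along, the two standard ways DPC-level code can safely touch state the ISR also touches are sketched below — again hypothetical names, assuming the interrupt object was saved in the device extension at connect time:

```c
/* Sketch: synchronizing DISPATCH_LEVEL code with an ISR.
 * DEVICE_EXTENSION and TouchSharedState() are hypothetical. */

BOOLEAN
TouchSharedState(_In_opt_ PVOID Context)    /* KSYNCHRONIZE_ROUTINE */
{
    /* Runs at the interrupt's synchronization IRQL with its
     * interrupt spinlock held -- safe against the ISR. */
    return TRUE;
}

VOID
StartIoAtDispatchLevel(PDEVICE_EXTENSION devExt)
{
    /* An ordinary KeAcquireSpinLock() is NOT enough here: the ISR
     * runs above DISPATCH_LEVEL and can interrupt this code while
     * it holds the lock. Option 1: take the interrupt spinlock,
     * which raises IRQL to the interrupt's synchronization level. */
    KIRQL oldIrql = KeAcquireInterruptSpinLock(devExt->Interrupt);
    /* ... touch state shared with the ISR ... */
    KeReleaseInterruptSpinLock(devExt->Interrupt, oldIrql);

    /* Option 2: run a callback synchronized with the ISR. */
    KeSynchronizeExecution(devExt->Interrupt, TouchSharedState, devExt);
}
```

With MSI-X, the lock/interrupt pairing matters: synchronizing against the wrong message’s interrupt object protects nothing.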

It’s always been my understanding that spending any more than the minimal amount of time in an ISR was bad practice.

Can anybody clarify why storport miniports might have this architecture?

Thanks,
Jan