You can’t do this with the current version of NDIS. NDIS’s tricks only help when you can delegate your ISR to NDIS.
It’s certainly possible that registering our handlers with NDIS won’t magically resolve our HLK findings
Yeah, NDIS Receive Side Throttling (RST) isn’t magic. It does a bit of work to avoid hitting a DPC timeout (bugcheck 0x133, DPC_WATCHDOG_VIOLATION), but it certainly doesn’t make the datapath run any faster. If your “HLK findings” are 0x133 bugchecks, you’ll want to either use RST or replicate what it’s doing. If they’re something other than 0x133 bugchecks, RST is unlikely to help, and you’ll need to investigate those findings on their own.
If you’re seeing 0x133 bugchecks and want to replicate RST in your driver, here’s a thumbnail sketch of what RST actually does.
The current implementation of NDIS has a trivial feedback loop that tries to keep the total time spent in a single DPC below some percentage of the DPC time limit. If the previous indication took too long, NDIS lowers the NBL limit (the maximum number of NBLs it will indicate in one DPC). If the previous indication came in well under the time budget, NDIS raises the NBL limit.
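To make the shape of that feedback loop concrete, here’s a minimal, portable sketch. It is not NDIS’s code: the names (`rx_throttle`, `rx_throttle_update`), the bounds, the choice of halving on overrun and adding a fixed step when under budget, and the definition of “well under budget” as less than half the budget are all my assumptions. The text above only says the limit goes down when a DPC runs long and up when it runs short.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bounds (assumptions, not NDIS's real values). */
#define RX_MIN_NBL_LIMIT 16u
#define RX_MAX_NBL_LIMIT 4096u

typedef struct {
    uint32_t nbl_limit;   /* max NBLs to indicate in the next DPC */
    uint64_t budget_us;   /* target time per DPC: some % of the DPC time limit */
} rx_throttle;

void rx_throttle_init(rx_throttle *t, uint32_t initial_limit, uint64_t budget_us)
{
    assert(budget_us > 0);
    assert(initial_limit >= RX_MIN_NBL_LIMIT && initial_limit <= RX_MAX_NBL_LIMIT);
    t->nbl_limit = initial_limit;
    t->budget_us = budget_us;
}

/* Feed back how long the previous indication took. */
void rx_throttle_update(rx_throttle *t, uint64_t elapsed_us)
{
    if (elapsed_us > t->budget_us) {
        /* Over budget: back off quickly by halving, but never below the floor. */
        uint32_t next = t->nbl_limit / 2;
        t->nbl_limit = next < RX_MIN_NBL_LIMIT ? RX_MIN_NBL_LIMIT : next;
    } else if (elapsed_us < t->budget_us / 2) {
        /* "Well under" budget (assumed: under half): grow by a fixed step. */
        uint32_t next = t->nbl_limit + RX_MIN_NBL_LIMIT;
        t->nbl_limit = next > RX_MAX_NBL_LIMIT ? RX_MAX_NBL_LIMIT : next;
    }
    /* Between half and full budget: leave the limit alone. */
}
```

Halving on overrun and growing slowly otherwise is just one reasonable way to keep the loop from oscillating; any scheme that shrinks fast and grows cautiously fits the description.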
You’re not expected to implement this feedback loop yourself; many IHVs with homemade RST do just fine by hardcoding a constant maximum number of NBLs, like 128 or 1024. (Higher-end gear tends to use a higher NBL limit, with the expectation that if someone paid a lot for the NIC, they probably also paid a lot for the CPU, so the system can handle more packets in the same time interval.)
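With a hardcoded cap, the per-DPC logic reduces to “indicate at most N NBLs, then report whether anything is left.” The sketch below models only that logic. `rx_queue`, `rx_dpc`, and `indicate_batch` are hypothetical names, the receive ring is reduced to a counter, and `indicate_batch` stands in for building NBLs and calling NdisMIndicateReceiveNetBufferLists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hardcoded per-DPC cap; 128 is one of the example values above. */
#define RX_MAX_NBLS_PER_DPC 128u

/* Hypothetical stand-in for a receive queue: only counts are tracked. */
typedef struct {
    size_t pending;     /* completed descriptors not yet indicated */
    size_t indicated;   /* NBLs handed up so far */
    unsigned requeues;  /* follow-up DPCs requested */
} rx_queue;

/* Stand-in for building an NBL chain and indicating it up the stack. */
static void indicate_batch(rx_queue *q, size_t count)
{
    assert(count <= q->pending);
    q->pending -= count;
    q->indicated += count;
}

/* One pass of a receive DPC: indicate at most RX_MAX_NBLS_PER_DPC NBLs.
   Returns true if packets remain and another DPC should be queued. */
bool rx_dpc(rx_queue *q)
{
    size_t batch = q->pending < RX_MAX_NBLS_PER_DPC ? q->pending : RX_MAX_NBLS_PER_DPC;
    if (batch > 0)
        indicate_batch(q, batch);
    if (q->pending > 0) {
        /* A real driver would queue another DPC here (e.g. KeInsertQueueDpc)
           rather than looping, so that this DPC returns promptly. */
        q->requeues++;
        return true;
    }
    return false;
}
```

With 300 packets pending, this takes three DPCs (128, 128, 44) and requests two follow-ups. When NDIS owns the DPC, the same pair of values shows up as the MaxNblsToIndicate and MoreNblsPending fields of NDIS_RECEIVE_THROTTLE_PARAMETERS.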
If more packets are available than the limit allows, queue another DPC to indicate the next batch. That gets you past the “single DPC” watchdog, but it won’t help with the “cumulative DPC” watchdog, because back-to-back DPCs never let the processor drop below DISPATCH_LEVEL. For that, you need to leave small gaps between the DPCs. The easiest way to do this is to schedule a KTIMER that’s due to expire on the next clock tick. Unfortunately, if the processor is otherwise idle, it then sits idle until the timer fires, wasting cycles that could have gone to receive processing. So you can race a low-priority thread running at PASSIVE_LEVEL against the timer: the thread picks up any spare CPU cycles, while the timer guarantees you run again within a short deadline (approximately half a clock-interrupt interval).
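The tricky part of racing the timer against the thread is making sure only one of them processes each batch. One simple way is a claim flag that is set when work is left over and atomically swapped back to false by whichever path arrives first. The sketch below uses C11 atomics in place of the kernel’s Interlocked routines. The names (`rx_resched`, `on_timer_fired`, `on_thread_woke`) and the single-flag design are my assumptions, not how NDIS implements it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical coordination state for racing a next-tick timer against a
   low-priority PASSIVE_LEVEL thread. */
typedef struct {
    atomic_bool armed;        /* a batch is waiting for someone to run it */
    atomic_uint timer_wins;   /* diagnostics only */
    atomic_uint thread_wins;
} rx_resched;

void rx_resched_init(rx_resched *r)
{
    assert(r != NULL);
    atomic_init(&r->armed, false);
    atomic_init(&r->timer_wins, 0);
    atomic_init(&r->thread_wins, 0);
}

/* Called when a DPC finishes its batch with packets still pending. A real
   driver would also arm a KTIMER due on the next tick and signal the event
   the low-priority thread waits on (assumed design). */
void rx_resched_arm(rx_resched *r)
{
    atomic_store(&r->armed, true);
}

/* Exactly one caller per arm sees true: whoever gets here first. */
static bool rx_resched_claim(rx_resched *r)
{
    return atomic_exchange(&r->armed, false);
}

/* Timer path: bounds how long the next batch can be delayed. */
bool on_timer_fired(rx_resched *r)
{
    if (!rx_resched_claim(r))
        return false;             /* the thread already took this batch */
    atomic_fetch_add(&r->timer_wins, 1);
    /* ... indicate the next batch ... */
    return true;
}

/* Thread path: uses idle CPU time so an idle core needn't wait for the tick. */
bool on_thread_woke(rx_resched *r)
{
    if (!rx_resched_claim(r))
        return false;             /* the timer already took this batch */
    atomic_fetch_add(&r->thread_wins, 1);
    /* ... indicate the next batch ... */
    return true;
}
```

The winner could also cancel the loser (for example with KeCancelTimer), but the claim flag alone makes a late loser a harmless no-op.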
If you’re very familiar with the NT scheduler, you might wonder whether using a Threaded DPC would save you some trouble. Unfortunately, NDIS disallows receive indications from Threaded DPCs, because our current implementation of TCPIP is not compatible with them.
All this is a lot of complexity to build into a NIC driver, so I’m not really asking that you do it. Really, NDIS owes you a library API to do all this work for you. And NDIS can do a better job of scheduling than any individual miniport, since NDIS has more system-wide visibility. E.g., NDIS can determine that CPU4 is not used for anything else, so NDIS can disable the DPC watchdog on CPU4 and use the CPU exclusively for receive operations.