I'm developing an NDIS Filter Driver that handles high-throughput UDP traffic from an FPGA connected via a 10 GbE NIC. The workload is as follows:
- The FPGA sends UDP packets (1448 bytes payload) at line rate.
- My filter driver intercepts these packets in FilterReceiveNetBufferLists.
- For each matching packet, the driver must generate and send a small ACK response immediately.
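
For scale, here is a rough back-of-the-envelope sketch of the per-packet time budget at line rate. It assumes IPv4 with no IP options, no VLAN tag, and standard Ethernet framing (preamble and inter-frame gap included). None of those details are confirmed above, and the function names are purely illustrative.

```c
#include <stdint.h>

/* Per-frame overhead on the wire, in bytes.
 * Assumptions: IPv4 without options, no VLAN tag, standard Ethernet framing. */
enum {
    UDP_HEADER_BYTES       = 8,
    IPV4_HEADER_BYTES      = 20,
    ETH_HEADER_BYTES       = 14,
    ETH_FCS_BYTES          = 4,
    ETH_PREAMBLE_SFD_BYTES = 8,
    ETH_INTERFRAME_GAP     = 12
};

/* Total bytes one UDP datagram occupies on the wire, including framing. */
static uint32_t wire_bytes_per_packet(uint32_t udp_payload_bytes)
{
    return udp_payload_bytes + UDP_HEADER_BYTES + IPV4_HEADER_BYTES +
           ETH_HEADER_BYTES + ETH_FCS_BYTES + ETH_PREAMBLE_SFD_BYTES +
           ETH_INTERFRAME_GAP;
}

/* Packets per second when the link is fully saturated. */
static double packets_per_second(uint32_t udp_payload_bytes,
                                 double link_bits_per_second)
{
    return link_bits_per_second /
           (8.0 * (double)wire_bytes_per_packet(udp_payload_bytes));
}

/* Average time available per packet, in nanoseconds. */
static double nanoseconds_per_packet(uint32_t udp_payload_bytes,
                                     double link_bits_per_second)
{
    return 1e9 / packets_per_second(udp_payload_bytes, link_bits_per_second);
}
```

Under these assumptions, `packets_per_second(1448, 10e9)` comes out to roughly 825,000 packets per second, which leaves about 1.2 microseconds per packet for both receive processing and ACK generation.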
However, profiling with Windows Performance Analyzer (WPA) shows that calling NdisFSendNetBufferLists() directly from the receive path introduces significant overhead and becomes a bottleneck under load.
My code:
    if (!NdisIsNblCountedQueueEmpty(&ackQ)) {
        UINT32 ackQCnt = (UINT32)ackQ.NblCount;
        RC_ETW_NBL_CHAIN_ACK_START(PortNumber, ackQCnt, cpuIndex);
        NdisFSendNetBufferLists(pFilter->FilterHandle,
                                NdisGetNblChainFromNblCountedQueue(&ackQ),
                                PortNumber,
                                sendFlags // SendFlags
                                );
        RC_ETW_NBL_CHAIN_ACK_STOP(PortNumber, ackQCnt, cpuIndex);
    }
Question:
Is there a better way to send ACKs efficiently in this scenario?