Hi all,
My NDIS miniport is somewhat complete; however, before I get into WHQL
testing, I’m trying to finish a few optimizations.
Recall that I am writing a virtual NIC and bus that use a shared-memory
ring buffer on top of a Linux KVM host.
At the moment I’m very concerned about my rx performance, which is a
surprise since essentially all I do is pull frames off the ring and call
*Indicate*(). Currently, tx performance is 3x better than rx
performance.
I’m using netperf for performance testing at the moment. netperf merely
sends packets to a netperf server instance for a specified period of
time and records results.
My questions…
- Currently I set the TCP/UDP checksum flags (as appropriate) on rx
since the host’s NIC driver performs the checksums. I assume that if I
do not, the protocol stack above me would perform the checksums.
Is that correct?
- Based on a previous email, I chain the incoming frames together as a
list of NET_BUFFER_LISTs and pass that list to
NdisMIndicateReceiveNetBufferLists(). Supposedly this is faster
than calling *indicate* for each NBL separately. However, in one test I
changed the processing to indicate individual NBLs and
got slightly better rx results. It’s not clear to me that these results
are statistically significant, but I want to verify that chaining is
‘better’ from a Microsoft point of view. Is it?
- I currently set OID_802_3_MAXIMUM_LIST_SIZE to 0 (zero) for
simplicity’s sake. This specific test is not using multicast addressing;
however, would it be better to add multicast addressing code to the
driver? Since the upper layers need to deal with overflow anyway, it’s
unclear to me how including multicast address handling would help.
- Should I consider chaining NET_BUFFERs instead of chaining
NET_BUFFER_LISTs?
- It is possible for a currently executing DPC to re-queue itself,
and I strongly suspect this happens. IIUC, you cannot *queue* the same
DPC more than once; however, once the DPC is executing, you can queue
another instance. Is this correct? (Yes, I am aware of the SMP issues wrt
same.)
- During a PAUSE, I may (I have to think about this some more) have
an ‘in-flight’ tx multi-NET_BUFFER NET_BUFFER_LIST that has not yet been
completed. Is this an error? (I suspect so.) If so, how would it
manifest itself?
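For reference, the chaining described in the second question amounts to roughly the following. This is only a simplified user-mode model, not the actual driver code: MODEL_NBL, chain_frames() and chain_length() are hypothetical names, and the struct models nothing but the Next link of a NET_BUFFER_LIST. In the real driver the chain head and the count would be passed as the NetBufferLists and NumberOfNetBufferLists arguments of a single NdisMIndicateReceiveNetBufferLists() call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for NET_BUFFER_LIST; only the Next link is modeled. */
typedef struct MODEL_NBL {
    struct MODEL_NBL *Next;  /* next entry in the chain, NULL at the tail */
    int FrameId;             /* illustrative payload identifier */
} MODEL_NBL;

/*
 * Link an array of per-frame entries into one NULL-terminated chain,
 * preserving the order in which the frames were pulled off the ring.
 * Returns the chain head, or NULL when there are no frames.
 */
static MODEL_NBL *chain_frames(MODEL_NBL *nbls, size_t count)
{
    assert(nbls != NULL || count == 0);
    if (count == 0) {
        return NULL;
    }
    for (size_t i = 0; i + 1 < count; i++) {
        nbls[i].Next = &nbls[i + 1];
    }
    nbls[count - 1].Next = NULL;
    return &nbls[0];
}

/* Walk the chain and count its entries, as the receiver of the chain would. */
static size_t chain_length(const MODEL_NBL *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->Next) {
        n++;
    }
    return n;
}
```

The alternative I tested is the same loop without the linking step: each entry keeps Next == NULL and is indicated on its own, so there is one indicate call per frame instead of one per batch.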
Thanks,
-PWM