> I notice that unless I set the SO_SNDBUF to 0, it is still
> copying the data to an internal buffer (I notice this because
> unless I set the SNDBUF to 0, I get the notification that my
> final write has finished, close the socket, and the final write
> hasn’t actually made it onto the wire and disappears.)
Yes. This is how zero-copy is activated in Windows.
Note: TCP may need to retransmit the data, and with zero-copy the only source
of the data is the app’s IRP+MDL. So TCP/AFD has no choice other than to pend
the send request until all ACKs arrive. That is how it works with zero-copy.
Note: TDI_SEND is always pended until all ACKs arrive, because TCP does no
buffering (buffering is AFD’s task). So, with SO_SNDBUF != 0, AFD does the
buffering. With SO_SNDBUF == 0, AFD does no buffering and simply converts
socket writes to TDI_SENDs.
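A minimal sketch of the option change discussed above, assuming a plain setsockopt call is what switches AFD buffering off. The helper names set_send_buffer and get_send_buffer are hypothetical. The #ifdef exists only so the sketch also compiles outside Windows, where SO_SNDBUF = 0 does not carry the zero-copy meaning described here (other stacks may clamp the value).

```c
#ifdef _WIN32
#include <winsock2.h>   /* link with ws2_32; call WSAStartup() first */
typedef SOCKET sock_t;
typedef int optlen_t;
#else
#include <sys/socket.h>
typedef int sock_t;
typedef socklen_t optlen_t;
#endif

/* Hypothetical helper: set the socket send buffer size in bytes.
 * Per the post, 0 makes AFD stop buffering sends on Windows.
 * Returns 0 on success, nonzero on failure. */
static int set_send_buffer(sock_t s, int bytes)
{
    return setsockopt(s, SOL_SOCKET, SO_SNDBUF,
                      (const char *)&bytes, sizeof bytes);
}

/* Hypothetical helper: read back the current send buffer size.
 * Returns the size in bytes, or -1 on failure. */
static int get_send_buffer(sock_t s)
{
    int bytes = 0;
    optlen_t len = sizeof bytes;
    if (getsockopt(s, SOL_SOCKET, SO_SNDBUF, (char *)&bytes, &len) != 0)
        return -1;
    return bytes;
}
```

With the buffer at 0, the memory handed to an overlapped send has to stay valid until the completion arrives, since per the explanation above it is the only copy of the data available for retransmission.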
> If I set SO_SNDBUF to 0, does the Nagle algorithm still
> function?
Definitely not. If you issue lots of tiny sends with zero-copy, then each
send is blocked until all of its ACKs arrive, and the next send cannot
proceed. This gives Nagle send coalescing no chance, since TCP has
only 1 pending send at a time.
That is not bad anyway. Nagle is for tiny sends, while zero-copy is for huge ones. You
can switch SO_SNDBUF on and off on the fly (yes, this is fast, faster than data sends; the
gut feeling that option changes are more expensive than the traffic
is wrong).
For example: send the tiny header with Nagle, then turn zero-copy on, send the huge
data, then turn zero-copy off again, and so on. Very beneficial, BTW; I have tried this in
practice.
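The header/body pattern just described might look roughly like this with blocking sends. This is only a sketch: send_message, send_all and set_send_buffer are hypothetical helper names, and buffered_size is an assumed value picked by the caller (the post does not name one). The #ifdef is only there so the sketch builds outside Windows as well.

```c
#ifdef _WIN32
#include <winsock2.h>   /* link with ws2_32; call WSAStartup() first */
typedef SOCKET sock_t;
#else
#include <sys/socket.h>
typedef int sock_t;
#endif

/* Set SO_SNDBUF in bytes; returns 0 on success. */
static int set_send_buffer(sock_t s, int bytes)
{
    return setsockopt(s, SOL_SOCKET, SO_SNDBUF,
                      (const char *)&bytes, sizeof bytes);
}

/* Blocking send of exactly len bytes; returns 0 on success, -1 on error. */
static int send_all(sock_t s, const char *buf, int len)
{
    while (len > 0) {
        int n = (int)send(s, buf, len, 0);
        if (n <= 0)
            return -1;
        buf += n;
        len -= n;
    }
    return 0;
}

/* Hypothetical helper: small header through the buffered path, large body
 * with SO_SNDBUF = 0, then restore the buffer size afterwards. */
static int send_message(sock_t s, const char *hdr, int hdr_len,
                        const char *body, int body_len, int buffered_size)
{
    int rc;
    if (send_all(s, hdr, hdr_len) != 0)
        return -1;
    if (set_send_buffer(s, 0) != 0)
        return -1;
    rc = send_all(s, body, body_len);
    /* Restore buffering even if the body send failed. */
    if (set_send_buffer(s, buffered_size) != 0)
        return -1;
    return rc;
}
```

A call such as send_message(s, hdr, hdr_len, body, body_len, 8192) would then cover one header plus payload; the 8192 is an arbitrary example value, not a recommendation from the post.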
> For example, does the TCP stack use a gather
> write from multiple buffers to assemble the packet?
With zero-copy, TCP usually has only 1 outstanding send at a time (unless
you post lots of overlapped sends to the socket).
> Or does setting SO_SNDBUF to 0 effectively disable
> Nagle as well (effectively set TCP_NODELAY)?
Not directly, only effectively. I believe that if you use lots of overlapped
sends with zero-copy, then Nagle will be active.
Nagle is TCP’s behaviour, while zero-copy is AFD’s (TCP is always zero-copy).
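Since SO_SNDBUF and TCP_NODELAY are separate socket options, one way to see the "not directly" part is to read TCP_NODELAY back after changing SO_SNDBUF. The sketch below assumes nothing beyond the standard getsockopt/setsockopt calls; nagle_disabled and disable_nagle are hypothetical helper names.

```c
#ifdef _WIN32
#include <winsock2.h>     /* link with ws2_32; call WSAStartup() first */
typedef SOCKET sock_t;
typedef int optlen_t;
#else
#include <sys/socket.h>
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_NODELAY */
typedef int sock_t;
typedef socklen_t optlen_t;
#endif

/* Returns 1 if TCP_NODELAY is set (Nagle off), 0 if not, -1 on error. */
static int nagle_disabled(sock_t s)
{
    int flag = 0;
    optlen_t len = sizeof flag;
    if (getsockopt(s, IPPROTO_TCP, TCP_NODELAY, (char *)&flag, &len) != 0)
        return -1;
    return flag != 0;
}

/* Hypothetical helper: turn Nagle off explicitly, independently of
 * whatever SO_SNDBUF is set to. Returns 0 on success. */
static int disable_nagle(sock_t s)
{
    int on = 1;
    return setsockopt(s, IPPROTO_TCP, TCP_NODELAY,
                      (const char *)&on, sizeof on);
}
```

If Nagle really must be off for a given socket, setting TCP_NODELAY explicitly is the clearer choice than relying on the side effect of SO_SNDBUF = 0.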
> Also, is setting SO_RCVBUF and SO_SNDBUF to 0 Microsoft
> recommended practice for a high performance server using
> completion ports?
Setting both to 0 means zero-copy.
SO_RCVBUF = 0 will force TCP to close the receive window immediately on the next
incoming packet if there are no pending reads on the socket (a very degenerate
case). But if there are pending reads, then SO_RCVBUF = 0 gives zero-copy
(well, single-copy: from the NDIS_PACKET to the app’s MDL; I believe this is done
in AFD’s TdiClientEventReceive callback called by TCP). Otherwise, there will
be 2 copies: from the NDIS_PACKET to AFD’s buffer and from AFD’s buffer to the
app.
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com