At 12:19 AM 07/14/2000 +0100, John Sullivan wrote:
> On Thursday 13 July 2000 you wrote:
> In the first case, the client’s stack has wedged (at least on that one
> socket). In the second, the server’s stack has wedged.
What does “wedged” mean?
> In the third
> case, either stack could be doing something truly bizarre, but more
> likely is that random packet loss on the wire is preventing the
> reliable transmission of data. (How many clients active at once? What
> volume of data for each one? What speed network? Two machines is ok,
> but get to about 50% saturation with several clients (10? 20?) and the
> performance dies as collisions take over.)
> If you’re using multiple sockets with blocking send(), you presumably
> have multiple threads? In which case are you being *very* careful not
> to access a socket from more than one thread at a time? MSDN warns
> against this.
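
For reference, being “careful” here would mean serializing access to each
socket, roughly as in the sketch below. This is illustrative only (the names
are mine), and as noted next it doesn’t apply to my setup anyway:

#include <winsock2.h>   /* winsock2.h pulls in windows.h for CRITICAL_SECTION */

/* Sketch only: serialize all send() calls on a shared socket. */
static CRITICAL_SECTION g_sock_lock;

void sock_lock_init(void)          /* call once, before any threads start */
{
    InitializeCriticalSection(&g_sock_lock);
}

int locked_send(SOCKET s, const char *buf, int len, int flags)
{
    int rc;
    EnterCriticalSection(&g_sock_lock);   /* one thread in send() at a time */
    rc = send(s, buf, len, flags);
    LeaveCriticalSection(&g_sock_lock);
    return rc;
}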
Two machines, one client and one server. The client machine is running four
copies of a single-threaded test program; each copy opens a single socket and
uses blocking calls on it. There is zero chance a socket is being accessed by
more than one thread.
> One last possibility, are you absolutely sure that the server is
> reading the data to free up buffer space? If the server stops reading
> for any reason, the TCP window will be exhausted and the client will
> block indefinitely. (Log bytes in and out of send/recv calls at both
> ends. Comparing the last couple of totals during a wedge should tell
> you whether there’s more data than you expected “stuck in the pipe”.)
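
That kind of accounting would look roughly like this at each end (the wrapper
names are mine; this is a sketch, not the actual test code):

#include <winsock2.h>
#include <stdio.h>

static unsigned long g_bytes_sent = 0;    /* running totals, compared  */
static unsigned long g_bytes_recvd = 0;   /* against the peer's totals */

/* Route all traffic through these so every byte is counted. */
int counted_send(SOCKET s, const char *buf, int len, int flags)
{
    int rc = send(s, buf, len, flags);
    if (rc != SOCKET_ERROR)
        g_bytes_sent += (unsigned long)rc;
    return rc;
}

int counted_recv(SOCKET s, char *buf, int len, int flags)
{
    int rc = recv(s, buf, len, flags);
    if (rc > 0)
        g_bytes_recvd += (unsigned long)rc;
    return rc;
}

/* During a wedge, dump both totals at both ends and compare. */
void dump_totals(void)
{
    printf("sent=%lu recvd=%lu\n", g_bytes_sent, g_bytes_recvd);
}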
The test data is as follows: the client sends a “request”, in this case
around 22K in size, by composing it in a contiguous block in memory and then
using a loop around a send() call to transmit the whole block (sketched
below). The server works on it for a few hundred milliseconds, then sends a
“response” of a couple of KB. The client then starts again with another,
identical request. The same socket is used throughout, opened at client start
and held open the entire time.
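
That loop is the usual send-until-done pattern, roughly as follows (the names
are mine; the real code differs in detail):

#include <winsock2.h>

/* Keep calling send() until the whole request has been accepted.
   On failure, *sent says how far we got. */
int send_all(SOCKET s, const char *buf, int len, int *sent)
{
    *sent = 0;
    while (*sent < len) {
        int rc = send(s, buf + *sent, len - *sent, 0);
        if (rc == SOCKET_ERROR)
            return SOCKET_ERROR;     /* caller checks WSAGetLastError() */
        *sent += rc;
    }
    return len;
}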
I’ve altered the client to detect the send() error and report various
things, including how many bytes of the request have been sent and how many
remain to be sent. When the error occurs, zero bytes of the new request
have been sent. In other words, for the new request there have been zero
successful send() calls… the very first one reports the error. Note that
the client has just finished receiving the response from the previous
iteration. More significantly, the last successful send() will have occurred
just a few hundred milliseconds earlier… and we know that send() was
successful, because otherwise this client wouldn’t have received the response
it just finished receiving.
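
The detection and report amount to something like this, given the send_all()
sketch above (again a reconstruction, not the actual code):

#include <winsock2.h>
#include <stdio.h>

extern int send_all(SOCKET s, const char *buf, int len, int *sent);

/* In the failing case, sent comes back 0: the very first send() of
   the new request is the one that reports the error. */
void do_request(SOCKET sock, const char *request, int request_len)
{
    int sent = 0;
    if (send_all(sock, request, request_len, &sent) == SOCKET_ERROR) {
        fprintf(stderr, "send() failed: err=%d, %d bytes sent, %d remaining\n",
                WSAGetLastError(), sent, request_len - sent);
    }
}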
The error can take hundreds, and in some cases tens of thousands, of
iterations to show up. And when one of the four copies of the test client
has experienced the error, the other three continue to run just fine. The
failed client can be restarted and it, too, will resume running without
problems.
Thanks!