TDI problem: bug in TCP?

During testing of my TDI client code, I found a problem and determined
its cause.
It looks very much like a bug in TCP.
Is it really a bug in TCP? I know one stupid workaround at the TDI level,
but are there any smarter ones?

PROBLEM:

  • if the TDI client returns STATUS_DATA_NOT_ACCEPTED from ClientEventReceive
    in response to the last data portion (FIN)
    AND
  • the connection is “half-closed”
    THEN:
the TDI client receives an abortive disconnect and thus loses the end of
the received data stream.
    From the point of view of the other party, the connection was closed
    gracefully.

DETAILS:

By a “half-closed connection” I mean that the machine R (receiver) has called
TDI_DISCONNECT_RELEASE, thus sending a TCP FIN to the machine S (sender).
This was done long ago (immediately after connecting), because the app’s
logic does not require any R->S data transfers, only S->R.
So the sender’s TCB on S is in the CLOSE_WAIT state, while the receiver’s TCB
on R is in the FIN_WAIT_2 state, during the whole data transfer.
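
For reference, the half-close on R is issued right after the connect
completes, roughly like this (a sketch under my assumptions; ConnDevObj and
ConnFileObj are the connection endpoint’s device and file objects from my
code, not standard names):

#include <ntddk.h>
#include <tdikrnl.h>

/* Sketch: gracefully shut down the R->S direction right after TDI_CONNECT
 * completes, so R only receives from then on. Runs at PASSIVE_LEVEL. */
static NTSTATUS
HalfCloseConnection(PDEVICE_OBJECT ConnDevObj, PFILE_OBJECT ConnFileObj)
{
    KEVENT          evt;
    IO_STATUS_BLOCK iosb;
    PIRP            irp;
    NTSTATUS        status;

    KeInitializeEvent(&evt, NotificationEvent, FALSE);

    irp = TdiBuildInternalDeviceControlIrp(TDI_DISCONNECT,
                                           ConnDevObj, ConnFileObj,
                                           &evt, &iosb);
    if (irp == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;

    TdiBuildDisconnect(irp, ConnDevObj, ConnFileObj,
                       NULL, NULL,              /* no completion routine */
                       NULL,                    /* no timeout */
                       TDI_DISCONNECT_RELEASE,  /* graceful: sends our FIN */
                       NULL, NULL);             /* no connection info */

    status = IoCallDriver(ConnDevObj, irp);
    if (status == STATUS_PENDING) {
        KeWaitForSingleObject(&evt, Executive, KernelMode, FALSE, NULL);
        status = iosb.Status;
    }
    return status;
}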

When it receives a FIN segment, TCP first indicates all the data in it to the
TDI client and waits until all of it has been consumed by TDI_RECEIVE IRPs
(or by ClientEventReceive itself).
Only after that does TCP call ClientEventDisconnect/TDI_DISCONNECT_RELEASE.
So ClientEventReceive has absolutely no notion of whether it is processing a
FIN segment.

According to the TDI documentation, if the TDI client applies backpressure
(like SO_RCVBUF in sockets), it can return STATUS_DATA_NOT_ACCEPTED from
ClientEventReceive, saving the BytesAvailable counter. It must then send a
TDI_RECEIVE IRP to TCP to consume this initially rejected data once it has
enough space for it.
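
The pattern, roughly, is the one below (a sketch; MY_CONNECTION,
HaveBufferSpace() and CopyIndicatedData() are placeholders from my own code,
not standard TDI symbols):

/* Same includes as above (ntddk.h, tdikrnl.h).
 * Sketch of a ClientEventReceive handler that applies backpressure. */
static NTSTATUS
MyClientEventReceive(
    PVOID TdiEventContext,
    CONNECTION_CONTEXT ConnectionContext,
    ULONG ReceiveFlags,
    ULONG BytesIndicated,
    ULONG BytesAvailable,
    ULONG *BytesTaken,
    PVOID Tsdu,
    PIRP *IoRequestPacket)
{
    MY_CONNECTION *conn = (MY_CONNECTION *)ConnectionContext;

    UNREFERENCED_PARAMETER(TdiEventContext);
    UNREFERENCED_PARAMETER(ReceiveFlags);

    *IoRequestPacket = NULL;

    if (!HaveBufferSpace(conn, BytesAvailable)) {
        /* Backpressure: take nothing now, remember how much is pending,
         * and drain it later with a TDI_RECEIVE IRP once the app frees
         * some space. This is the path that eventually rejects the
         * FIN-bearing indication as well. */
        conn->PendingBytes = BytesAvailable;
        *BytesTaken = 0;
        return STATUS_DATA_NOT_ACCEPTED;
    }

    /* Normal path: copy the indicated bytes and accept them. */
    CopyIndicatedData(conn, Tsdu, BytesIndicated);
    *BytesTaken = BytesIndicated;
    return STATUS_SUCCESS;
}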

Note that the TDI client’s decision to return STATUS_DATA_NOT_ACCEPTED
cannot depend on whether this is a FIN segment, because the TDI client does
not know that at this point.
So if the TDI client ever returns STATUS_DATA_NOT_ACCEPTED, sooner or later
it will return it for the FIN segment, thus hitting the bug I’m describing.

In this case, TCP sends two ACKs to the machine S. The first acknowledges the
data on arrival. The second is a “window-only” ACK (it acknowledges nothing
new, only opens the window), sent by the TDI_RECEIVE logic in TCP (just after
the IRP completion) when the app consumes the initially rejected data and the
backpressure is gone.
If the consuming app is slow, several seconds can pass between the FIN
arrival (with its first ACK) and the second ACK.

The sequence of events is (I can provide Network Monitor logs):

S->R FIN with data

TCP on R accepts the packet
TCP on R sends the ACK

R->S ACK which acknowledges the data

TCP on R calls ClientEventReceive
ClientEventReceive returns STATUS_DATA_NOT_ACCEPTED
several seconds pass
the app declares it is ready to consume more data
TDI client sends TDI_RECEIVE IRP
TCP provides the TDI client with more data and completes the IRP
TCP sends the ACK to open a window

R->S ACK - acknowledges no data, window-open only.

Thus the sender receives two ACKs for its FIN segment.

If this were a connection open in both directions (the sender’s TCB was in
ESTABLISHED and moved to FIN_WAIT_1 after the sender sent its FIN segment),
then the first ACK would move the sender to FIN_WAIT_2, and the second ACK
arriving in FIN_WAIT_2 would do no harm.

But if this is a “half-closed” connection (the sender was in CLOSE_WAIT and
moved to LAST_ACK when sending its FIN segment), then, according to RFC 793,
the first ACK simply destroys the sender’s TCB. The sender then treats the
second ACK as a stray and sends an RST in response to it (again per RFC 793).
This RST response is indeed visible in the Network Monitor trace.

The Network Monitor log was:

S->R FIN with data
R->S ACK to the FIN (immediately)
R->S ACK to the FIN (window-only ACK, several seconds later, when the test
     app had consumed several screenfuls of data)
S->R RST (immediately)

This RST causes TCP on the receiver to call
ClientEventDisconnect/TDI_DISCONNECT_ABORT, thus aborting a connection that
was, in fact, closed gracefully, and causing data loss.
ClientEventDisconnect/TDI_DISCONNECT_RELEASE is not called in this case at
all.

This is the bug I mean. The sender’s behaviour seems to be 100% correct per
RFC 793.
What is strange is the receiver’s behaviour. Why does TCP send two ACKs for
the FIN segment? Sending any window-opening ACK in response to a FIN
(especially from the TIME_WAIT state, as happens here) is surely nonsense.

I see no smart workaround in the TDI client code, because ClientEventReceive
does not know whether it is processing the FIN segment or not.
I also do not know of any way of sending a TDI_RECEIVE IRP that will not
cause a window-opening ACK to be sent (the IRP I post is sketched below).
Nor will the TDI client receive
ClientEventDisconnect/TDI_DISCONNECT_RELEASE until it has consumed all the
data previously rejected with STATUS_DATA_NOT_ACCEPTED.
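
For reference, the TDI_RECEIVE IRP I post to drain the previously rejected
data looks roughly like this (a sketch; RecvMdl, RecvLength, Context and
ReceiveComplete are placeholders from my code). Completing this IRP is
exactly what makes TCP send the window-update ACK discussed above:

/* Same includes as above (ntddk.h, tdikrnl.h). */
static NTSTATUS ReceiveComplete(PDEVICE_OBJECT DeviceObject, PIRP Irp,
                                PVOID Context);  /* my completion routine */

/* Sketch: post a TDI_RECEIVE IRP to consume data that was previously
 * rejected with STATUS_DATA_NOT_ACCEPTED. */
static NTSTATUS
PostDrainReceive(PDEVICE_OBJECT ConnDevObj, PFILE_OBJECT ConnFileObj,
                 PMDL RecvMdl, ULONG RecvLength, PVOID Context)
{
    PIRP irp = IoAllocateIrp(ConnDevObj->StackSize, FALSE);

    if (irp == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;

    /* ReceiveComplete must free the IRP itself and return
     * STATUS_MORE_PROCESSING_REQUIRED, because the IRP was allocated
     * with IoAllocateIrp and is not owned by the I/O manager. */
    TdiBuildReceive(irp, ConnDevObj, ConnFileObj,
                    ReceiveComplete, Context,
                    RecvMdl,             /* buffer described by an MDL */
                    TDI_RECEIVE_NORMAL,  /* normal, non-expedited data */
                    RecvLength);

    return IoCallDriver(ConnDevObj, irp);
}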

The only workaround I see is to disable the STATUS_DATA_NOT_ACCEPTED
backpressure entirely on a half-closed connection, after the
reverse-direction FIN has been sent (see the sketch below).
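
In code, the workaround amounts to a check like this at the top of the
ClientEventReceive sketch above (SentReverseFin is a flag I keep in my own
connection context, set when I issue TDI_DISCONNECT_RELEASE; it is not a TDI
field):

    if (!conn->SentReverseFin && !HaveBufferSpace(conn, BytesAvailable)) {
        /* Connection still open in both directions: backpressure is safe. */
        conn->PendingBytes = BytesAvailable;
        *BytesTaken = 0;
        return STATUS_DATA_NOT_ACCEPTED;
    }

    /* Half-closed: accept unconditionally (buffering past the normal limit
     * if necessary), so a FIN-bearing indication is never rejected and TCP
     * never needs to send the second, window-only ACK. */
    CopyIndicatedData(conn, Tsdu, BytesIndicated);
    *BytesTaken = BytesIndicated;
    return STATUS_SUCCESS;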

Has anybody seen this?
Does anybody know how WinSock and AFD.SYS handle this problem?
Or do they just have the same bug?

regards,
Max

