async reads and IO manager ordering guarantee

MBond2 · July 10, 2020, 2:38am

Thanks Peter for the vote of confidence.

I have been focused on this kind of stuff since around 2005, but I am always nervous that there is some larger issue that I have missed. Even after all this time, it seems hard to transition to the status of an expert. Even when there are obvious things that I might rail against or for

MBond2 · July 10, 2020, 2:43am

I think the suggestion of a ‘strand’ demonstrates a complete lack of understanding.

pindumb · July 10, 2020, 5:21am

A ‘strand’ does not solve the ordering problem by itself, but it avoids the big lock around WSARecv when assigning sequence numbers. With a ‘strand’, the only lock is on the queue.

See also: http://www.lenholgate.com/blog/2014/07/efficient-multi-threading.html

MBond2 · July 11, 2020, 12:38am

Can you elaborate further? I don’t think we are using consistent terminology, but I am interested in anything that may help.

David_R_Cattley · July 11, 2020, 2:16pm

I have been following this closely since I happen to be working on a Boost.Asio-based socket reader and have been looking to make the worker part of the processing multi-threaded to spread the (independent) loads across cores.

That said, I do not have a TCP stream to deal with but a UDP stream that already has sequence numbers in the payload. So delivery “de-ordering” by multiple readers/workers on the underlying IOCP is not a problem I was worried about; it is just one more reason the UDP packets might be “out of sequence” by the time my processing is “done” and the results need to be merged.

I may be wrong (as I often am) but my understanding in the TCP case is that the I/O Manager (or more importantly, the TCP stack) does guarantee the completion order of read requests. The issue at hand as I read it is that a set of UM threads picking up these ordered completions themselves have no “run ordering” guarantee.

I also understand that TCP and the I/O Manager will complete read requests in the order of submission. It would seem, then, that tagging each request with a sequence number is sufficient to give the re-ordering hint on the completion side. If multiple threads are refilling the receive request pool, then a lock is needed over the entire operation sequence of:

allocating the next sequence number (ticket)
assigning the sequence number to the request
posting the request

This, in my understanding, aligns with the terse comment that “only a lock around the queue is required” and yes, the Boost.Asio strand abstraction applied to the posting (enqueuing) side seems like it would at least get sequence numbers assigned with minimum contention.
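To make the three steps concrete, here is a minimal sketch of how I picture the posting side. The names `ReadRequest`, `TicketedPoster` and `post_fn` are my own inventions for illustration; `post_fn` stands in for whatever actually submits the read (WSARecv, or an Asio async receive), which I have left out so the sketch stays portable.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for one outstanding overlapped read.
struct ReadRequest {
    std::uint64_t ticket = 0;  // submission order, assigned under the lock
};

// Performs "take ticket, tag request, post request" as one indivisible step.
class TicketedPoster {
public:
    // post_fn is an assumption: it represents the real submit call.
    explicit TicketedPoster(std::function<void(ReadRequest&)> post_fn)
        : post_fn_(std::move(post_fn)) {
        assert(post_fn_ != nullptr);
    }

    void post(ReadRequest& req) {
        std::lock_guard<std::mutex> guard(mutex_);
        req.ticket = next_ticket_++;  // allocate the ticket and tag the request
        post_fn_(req);                // submit while still holding the lock
    }

private:
    std::mutex mutex_;
    std::uint64_t next_ticket_ = 0;
    std::function<void(ReadRequest&)> post_fn_;
};
```

The point of holding the lock across the submit call is that ticket order and submission order cannot diverge. Releasing it before `post_fn_` runs would let two threads swap places between taking a ticket and posting.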

The dispatch side of the queue, however, if serviced by multiple workers, must already have a work plan that understands that the work will need to be merged to regain completion order. Otherwise, why dispatch across multiple workers?
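For the merge step, something along these lines is what I have in mind. It is only a sketch: `ReorderBuffer`, `complete` and the `deliver` callback are hypothetical names, and I am assuming tickets start at 0 and each one is completed exactly once.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hands results downstream in ticket order, whichever worker finishes first.
class ReorderBuffer {
public:
    using Deliver = std::function<void(std::uint64_t, const std::string&)>;

    explicit ReorderBuffer(Deliver deliver) : deliver_(std::move(deliver)) {
        assert(deliver_ != nullptr);
    }

    // Called by any worker once it has processed the read tagged `ticket`.
    void complete(std::uint64_t ticket, std::string result) {
        std::lock_guard<std::mutex> guard(mutex_);
        pending_.emplace(ticket, std::move(result));
        // Drain every result that is now contiguous with the last one delivered.
        for (auto it = pending_.find(next_); it != pending_.end();
             it = pending_.find(next_)) {
            deliver_(it->first, it->second);  // note: runs under the lock
            pending_.erase(it);
            ++next_;
        }
    }

private:
    std::mutex mutex_;
    std::map<std::uint64_t, std::string> pending_;  // finished but not yet deliverable
    std::uint64_t next_ = 0;                        // next ticket to deliver
    Deliver deliver_;
};
```

If workers complete tickets 2, 0, 1 in that order, nothing is delivered after the first call, ticket 0 is delivered after the second, and tickets 1 and 2 are delivered together after the third.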

So after re-reading this thread multiple times, I conclude that either

(1) there really is no problem here, just a design challenge

or

(2) MBond2 is actually claiming that TCP is broken in Windows

or

(3) MBond2 is wishing that TCP had put some more information in the completion, like the TCP Sequence Number value of the first octet in the receive buffer, so that tagging the OVERLAPPED would be unnecessary as a means of tracking completion order.

So is this a helpful and interesting illumination of a challenging design problem, a bug, or a feature request?

-dave

MBond2 · July 16, 2020, 11:14pm

First, TCP is not broken on Windows. It works very reliably for many applications, including my own. Your number 3 is correct. That is the whole point. I wish I could be ignorant of the order in which I submit the reads, and rely on something in what comes back to tell me the order in which they were actually fulfilled. It absolutely should not be in the data buffer, as that would break the TCP stream integrity, but possibly in the unused Offset and OffsetHigh members of the OVERLAPPED struct.

The advantage of this is not that I could do anything more correctly than I do now, but that I would not need to hold a large lock when submitting a new read, around a smaller lock that also needs to be acquired. All I really care about is the order of acquisition of that smaller lock.
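To illustrate the difference, here is a rough, portable sketch. `OverlappedStandIn` is a simplified stand-in I made up so this compiles anywhere; the real OVERLAPPED uses a union and Windows types. `TaggedRead` shows the tagging that has to be done today, and `hypothetical_fulfil_order` shows the wished-for behaviour, which Windows does not implement.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the Win32 OVERLAPPED layout (assumption: only the
// commonly documented fields, without the union or Windows typedefs).
struct OverlappedStandIn {
    std::uintptr_t Internal = 0;
    std::uintptr_t InternalHigh = 0;
    std::uint32_t Offset = 0;
    std::uint32_t OffsetHigh = 0;
    void* hEvent = nullptr;
};

// Today: the application tags each request itself by extending the struct,
// which means the ticket must be assigned in submission order under a lock.
struct TaggedRead : OverlappedStandIn {
    std::uint64_t ticket = 0;
};

// Recover the tagged request from the pointer handed back on completion.
// Only valid if the pointer really refers to a TaggedRead.
inline TaggedRead* from_completion(OverlappedStandIn* ov) {
    assert(ov != nullptr);
    return static_cast<TaggedRead*>(ov);
}

// Wished-for and purely hypothetical: if the stack wrote the fulfilment order
// into Offset/OffsetHigh, the application could read it back like this.
inline std::uint64_t hypothetical_fulfil_order(const OverlappedStandIn& ov) {
    return (static_cast<std::uint64_t>(ov.OffsetHigh) << 32) | ov.Offset;
}
```

With the second form, the submitting threads would not need to agree on an order among themselves at all. The ordering information would come from wherever the smaller lock is taken.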