Partial asynchronous WskSends

Hi experts,
I would like to better understand the behavior of Windows kernel sockets with regard to asynchronous sends over TCP.
From my experience with BSD sockets, non-blocking sends can send fewer bytes than requested, so I need special code that monitors how much was sent and repeatedly sends what is left. This also means that, for the receiving end to see meaningful results, one must serialize all send calls.
Now the NT kernel sockets give us asynchronous capability. However, I won’t be able to issue multiple asynchronous sends (from multiple threads or from one thread) if partial sends are possible, since the receiving end might then get mixed messages.
My question is: is it possible to have partial sends with NT kernel sockets, and if it is not possible, is it allowed to issue multiple concurrent sends and expect that the sent messages won’t mix in the TCP stream? For example, if one message is AAAA and the other is BBBB, I don’t want AABBAABB, but AAAABBBB and BBBBAAAA are totally acceptable.

Many thanks,
Eran.

The kernel sockets interface basically uses an MDL chain to specify the fragments to send. It is also an IRP-based interface, and I would not expect to get a successful IRP completion unless all the fragments were sent. You could get an error, like a disconnect, reported as an error status in the IRP completion.

If you have multiple threads/processors issuing send IRPs to the socket concurrently, each will be sent as an uninterrupted stream of fragments, although there is no guarantee as to which thread’s data is actually sent first unless you explicitly serialize issuing the sends.

Jan
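
[Editor's note: for concreteness, a minimal sketch, not from the thread, of what an asynchronous send through WSK looks like: the data is described by an MDL or MDL chain in a WSK_BUF, the request is carried by an IRP, and the outcome arrives in the IRP completion routine. The helper names IssueSend and SendComplete are illustrative.]

#include <ntddk.h>
#include <wsk.h>

/* Completion routine for the send IRP allocated below. */
static NTSTATUS
SendComplete(PDEVICE_OBJECT DeviceObject, PIRP Irp, PVOID Context)
{
    UNREFERENCED_PARAMETER(DeviceObject);
    UNREFERENCED_PARAMETER(Context);

    /* Per the discussion above: on success, Irp->IoStatus.Information is the
     * number of bytes sent, and a successful completion should mean all
     * fragments went out; a disconnect or similar shows up as an error
     * status rather than a short completion. */
    if (!NT_SUCCESS(Irp->IoStatus.Status)) {
        /* handle the failed send here, e.g. tear down the connection */
    }

    IoFreeIrp(Irp);
    return STATUS_MORE_PROCESSING_REQUIRED;   /* we own and just freed the IRP */
}

/* Issue one asynchronous send on an already-connected WSK socket.  The
 * caller owns DataBuffer (and its MDL chain), which must stay valid until
 * the completion routine runs. */
NTSTATUS
IssueSend(PWSK_SOCKET Socket, PWSK_BUF DataBuffer)
{
    PIRP irp;
    PWSK_PROVIDER_CONNECTION_DISPATCH dispatch;

    irp = IoAllocateIrp(1, FALSE);
    if (irp == NULL)
        return STATUS_INSUFFICIENT_RESOURCES;

    IoSetCompletionRoutine(irp, SendComplete, NULL, TRUE, TRUE, TRUE);

    dispatch = (PWSK_PROVIDER_CONNECTION_DISPATCH)Socket->Dispatch;

    /* Usually returns STATUS_PENDING; the outcome arrives in SendComplete. */
    return dispatch->WskSend(Socket, DataBuffer, 0, irp);
}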


Thanks Jan for your valuable input.
So it seems calling multiple WskSends should be OK.
Can I also assume that if I get an error on one send, I will get an error on all the remaining concurrent sends? Otherwise, I can still get the mixed-messages scenario. Again, assuming AAAA and BBBB, and the first failed, is it possible to get AABBBB?

Thanks,
Eran.

I can’t speak as to ‘kernel sockets’, but asynchronous I/O is the only sane
way to manage user-level socket programming. I use the MFC CAsyncSocket
class. It has specified event callbacks, so the way you do a send is to
call the send method. This will return either a negative value (an error
code) or a non-negative value (the number of bytes sent). You use this
value to track the buffer. Suppose your buffer is 1000 bytes. You do a
send of 1000 bytes, and it sends 100. So you keep 900 bytes around until you
get a “send” notification. Then you send the 900-byte buffer. It sends
200 bytes, so you keep 700. Repeat until bytes-to-send goes to 0. This is
not “special” code; this is plain-vanilla asynchronous socket programming,
so I have no idea why you would think it is somehow “special”.
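
[Editor's note: a minimal sketch of that bookkeeping, assuming a plain non-blocking Winsock send() with select() standing in for the writability notification rather than CAsyncSocket's OnSend callback; send_all is an illustrative helper name.]

#include <winsock2.h>

/* Send the whole buffer, retrying after partial sends; returns len on
 * success or -1 on a hard error. */
int send_all(SOCKET s, const char *buf, int len)
{
    int off = 0;

    while (off < len) {
        int n = send(s, buf + off, len - off, 0);
        if (n == SOCKET_ERROR) {
            if (WSAGetLastError() == WSAEWOULDBLOCK) {
                /* Non-blocking socket: wait until it is writable again
                 * (CAsyncSocket would deliver this as the OnSend callback). */
                fd_set w;
                FD_ZERO(&w);
                FD_SET(s, &w);
                if (select(0, NULL, &w, NULL, NULL) == SOCKET_ERROR)
                    return -1;
                continue;
            }
            return -1;                 /* hard error */
        }
        off += n;                      /* partial send: keep the remainder */
    }
    return len;
}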

The TCP protocol guarantees that the packets will arrive in the correct order,
and I don’t see why the receiving end needs the send calls to be “serialized”;
certainly there is no need to “serialize” them on the sending side, unless
you are referring to the quite ordinary
don’t-send-until-you-get-a-ready-to-send-notification model, which is not
“special” code but very, very ordinary asynchronous socket programming.

It would make no sense to do network transmissions from multiple
independent threads; to the best of my knowledge, there is no guarantee
that transmissions would not be interlaced in unpredictable ways even if
you are using synchronous sockets. MFC solves this problem by simply not
supporting multithreaded use of sockets; only one thread at a time can
“own” a socket. So if I am doing asynchronous socket programming, I
either do it from the main GUI thread (because it doesn’t block the
thread) or use a dedicated thread for the purpose.

Microsoft used to have an MSDN article about multithreaded socket
programming; the only thing wrong with it was that it got socket
programming wrong, multithreading wrong, and synchronization wrong. One
of my clients called me in to fix their product about two years into its
development, and (I found out later) their programmer had built the entire
system on this horrible example. I told them that there was nothing wrong
with their product that a total rewrite could not fix, and they pulled the
plug on it after having spent over a half-million dollars on its
development. I rewrote the Microsoft example code, salvaging practically
nothing, and you can find the article on www.flounder.com/mvp_tips.htm,
search for asynchronous.

You should not expect multithreaded socket sends to arrive in the correct
sequence; that may be an accident of one particular version of the OS
socket support. To implement this correctly requires writing code that
limits the use of the socket to a single thread. Fortunately, this is
trivial to accomplish.
joe
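
[Editor's note: one way to sketch the "single thread owns the socket" arrangement, assuming a user-mode Winsock socket and illustrative names (SendItem, enqueue_send, sender_thread, start_sender): producers hand complete messages to a queue, and one worker thread performs every send, so messages never interleave on the stream.]

#include <winsock2.h>
#include <windows.h>
#include <stdlib.h>
#include <string.h>

typedef struct SendItem {
    struct SendItem *next;
    int              len;
    char             data[1];          /* message bytes follow the header */
} SendItem;

static CRITICAL_SECTION   g_lock;
static CONDITION_VARIABLE g_cv;
static SendItem          *g_head, *g_tail;

/* Producers (any thread) hand off a complete message. */
int enqueue_send(const char *buf, int len)
{
    SendItem *it = (SendItem *)malloc(sizeof(SendItem) + len);
    if (it == NULL)
        return -1;
    it->next = NULL;
    it->len  = len;
    memcpy(it->data, buf, len);

    EnterCriticalSection(&g_lock);
    if (g_tail) g_tail->next = it; else g_head = it;
    g_tail = it;
    LeaveCriticalSection(&g_lock);
    WakeConditionVariable(&g_cv);
    return 0;
}

/* The one thread that owns the socket. */
static DWORD WINAPI sender_thread(LPVOID arg)
{
    SOCKET s = (SOCKET)(ULONG_PTR)arg;

    for (;;) {
        SendItem *it;
        int off;

        EnterCriticalSection(&g_lock);
        while (g_head == NULL)
            SleepConditionVariableCS(&g_cv, &g_lock, INFINITE);
        it = g_head;
        g_head = it->next;
        if (g_head == NULL)
            g_tail = NULL;
        LeaveCriticalSection(&g_lock);

        /* Send this message completely before touching the next one. */
        off = 0;
        while (off < it->len) {
            int n = send(s, it->data + off, it->len - off, 0);
            if (n == SOCKET_ERROR)
                break;                  /* hard error: give up on this message */
            off += n;
        }
        free(it);
    }
}

void start_sender(SOCKET s)
{
    InitializeCriticalSection(&g_lock);
    InitializeConditionVariable(&g_cv);
    CreateThread(NULL, 0, sender_thread, (LPVOID)(ULONG_PTR)s, 0, NULL);
}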


As far as I know, there is no guarantee that if one send fails, the next
one must also fail. There is no particular problem in limiting network
sends to a single thread, which is about the only sane approach I would
consider.
joe


I can think of excellent reasons to do socket sends from multiple threads. Say you have a higher-layer protocol that builds messages, and the callers of that protocol come in on multiple threads. You DO want to spread the load of things like message CRC calculation or message encryption across processors; queuing the send to the socket is just one last little step. This is essentially multiple streams/requests multiplexed across one TCP connection, a pretty common way of doing things, as in CIFS or iSCSI.

Jan


> Can I also assume that if I get an error on one send, I will get an error
> on all the remaining concurrent sends? Otherwise, I can still get the
> mixed-messages scenario. Again, assuming AAAA and BBBB, and the first
> failed, is it possible to get AABBBB?

That would depend on the transport protocol to a large extent.

So what transport protocol are you using? TCP?

But let’s back up a second. You have multiple producers (threads, I guess)
and they are asynchronous. You have a single ‘endpoint’ over which
communication is occurring. Transmission is interleaved into the flow on a
first-come, first-served basis.

So either you have some message blocking semantics provided by the transport
or your individual writers must be coordinating some form of message
blocking.

So what is the logical state of your channel if an error occurs? Is it
recoverable?

Is the ‘error’ you are worrying about a ‘failed send’ or a ‘partial send’?

If the channel is subject to partial sends then you need your own queue, and
you cannot rely on the transport to sort it out for you. This is why UM
sockets (in non-blocking mode) lead to single-writer patterns: they can
accept only part of your send request for stream protocols.

But you are in KM. So nod politely at all of that wisdom provided for a
problem you do not have and ask yourself what WSK does. It is IRP-based.
Does TCP complete a WSK send ‘partially’ when the channel is still
available?

If the channel is dead then it is dead. All pending sends are going to
complete as failed.

If you are using some other transport then some other analysis is required.
I assume you are using TCP, because if you were using UDP then none of this
matters. The channel is neither reliable nor ordered, so how could you
possibly care about the success or order of sends when they are not assured
to be delivered, or delivered in order, anyway?

Good Luck,
Dave Cattley

I agree with Jan on this one and indeed have written such multiplexors.
The only time one really needs to interpose additional [re] serialization is
when the problem itself dictates it [message order semantics]. Absent
that, the concurrent solution works just fine if built carefully.

Dave Cattley

Thank you all for your answers.
Jan is correct in comparing my protocol to CIFS/iSCSI: indeed I am using totally independent RPC messages that ride on the same socket, and as such I don’t care about the order in which the RPCs arrive as long as they don’t get mixed in the stream. My protocol obviously can handle errors, and if a socket goes down, I can recover after reconnection. The only thing I want to avoid is the receiving end seeing, in the same TCP stream, a partial RPC followed by a successful one. Receiving a partial RPC and then having the socket closed is totally acceptable.
Joseph, I totally agree that code that keeps track of how much of the buffer you’ve sent and resends what is left is not special (although it is somewhat more complicated when you deal with MDL chains…), however in my opinion, with an asynchronous socket interface such a mechanism would ideally not be needed, since the requests are already queued in the kernel, and as such there is no reason on earth for the kernel to complete a request partially unless the user really asked for that.

Again, thank you all for your advice.
Best regards,
Eran.

Do not make the common error of assuming that if you send N packets, each with K
bytes, via TCP/IP, the receiver sees N packets, which it can receive
using N receives into buffers K bytes long. TCP/IP is a stream protocol,
meaning that the receiver gets the same bytes sent, in the same order they
were sent. There is no guarantee beyond this, and there is most especially
no guarantee of the sender and receiver seeing the same packet boundaries.
This week’s implementation of the sender’s and receiver’s network stacks
may happen to preserve the boundaries, but when you
add in fragmentation from intermediate routers, and a fixed set of
resources in the sender and receiver under conditions where other network
traffic is competing for those resources, all bets are off. Using protocols
like RPC means the RPC mechanisms deal with sending and receiving integral
packets for you; if you are using TCP/IP directly, then it becomes your
responsibility.

If it appears to work “correctly”, that is, you issue N sends of K bytes
each and your receiver issues N receives of K bytes each, that is a
transient accident. Don’t rely on such behavior.

There are reasons that a send will only send a limited amount and leave it
up to the app to re-issue the request. First and foremost, that is how
Berkeley Sockets was specified to behave. Why was that? Because kernel
resources are finite. Where is the kernel going to hold all the data? It
has to have internal buffers, which hold a copy of the data. Suppose six
applications all want to send a 1GB file at the same time. You can’t
block, and the 32-bit kernel can’t get 6GB of internal buffering. Game
over. There is no way to guarantee that one app won’t wait indefinitely.
But using smaller packet sizes guarantees that each app will get a turn,
and while there is no real “fairness” guarantee, statistically the delay
waiting for a fast-turnover small resource will not impact the apparent
non-blocking nature of the call.

So no matter what you think it should be like, we have to deal with the
reality of what it is. Which is why there is no “special” code required.
All asynchronous socket code looks pretty much the same.
joe
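
[Editor's note: since TCP preserves only the byte stream, the receiver has to do its own message framing. A minimal sketch, assuming simple length-prefix framing; the recv_full and recv_message helpers are illustrative and not part of Eran's protocol.]

#include <winsock2.h>
#include <stdlib.h>

/* Read exactly len bytes, looping over short receives. */
static int recv_full(SOCKET s, char *buf, int len)
{
    int got = 0;
    while (got < len) {
        int n = recv(s, buf + got, len - got, 0);
        if (n <= 0)
            return -1;          /* error, or the peer closed mid-message */
        got += n;               /* recv may return fewer bytes than asked for */
    }
    return got;
}

/* Read one length-prefixed message; the caller frees the returned buffer. */
char *recv_message(SOCKET s, unsigned long *out_len)
{
    u_long net_len;
    unsigned long len;
    char *msg;

    if (recv_full(s, (char *)&net_len, sizeof(net_len)) < 0)
        return NULL;

    len = ntohl(net_len);       /* sender wrote the length in network byte order */
    msg = (char *)malloc(len);
    if (msg == NULL || recv_full(s, msg, (int)len) < 0) {
        free(msg);
        return NULL;
    }
    *out_len = len;
    return msg;
}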
