TDI ClientEventReceive with TDI_RECEIVE_COPY_LOOKAHEAD

Does anyone have the experience to say whether they would consider this
expected or not?

A TDI client driver I’m looking at is getting called on
ClientEventReceive with TDI_RECEIVE_COPY_LOOKAHEAD, TDI_RECEIVE_NORMAL
and TDI_RECEIVE_AT_DISPATCH_LEVEL (0xA20), and both BytesIndicated and
BytesAvailable are exactly 1460. (In a case where 2400 bytes of data
has been received.)
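
For reference, the 0xA20 value can be decoded with a small user-mode sketch like the one below. The numeric flag values are the ones commonly found in the DDK's tdi.h and are worth verifying against your own headers; describe_receive_flags is a hypothetical helper written for this illustration, not part of TDI.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flag values as recalled from the DDK's tdi.h; verify against your headers. */
#define TDI_RECEIVE_NORMAL            0x00000020UL
#define TDI_RECEIVE_COPY_LOOKAHEAD    0x00000200UL
#define TDI_RECEIVE_ENTIRE_MESSAGE    0x00000400UL
#define TDI_RECEIVE_AT_DISPATCH_LEVEL 0x00000800UL

/* Hypothetical helper: writes the names of the set flags into out,
 * separated by '|'. Only the four flags discussed here are known to it. */
static void describe_receive_flags(unsigned long flags, char *out, size_t out_len)
{
    static const struct { unsigned long bit; const char *name; } known[] = {
        { TDI_RECEIVE_NORMAL,            "NORMAL" },
        { TDI_RECEIVE_COPY_LOOKAHEAD,    "COPY_LOOKAHEAD" },
        { TDI_RECEIVE_ENTIRE_MESSAGE,    "ENTIRE_MESSAGE" },
        { TDI_RECEIVE_AT_DISPATCH_LEVEL, "AT_DISPATCH_LEVEL" },
    };
    size_t i;

    assert(out != NULL && out_len > 0);
    out[0] = '\0';
    for (i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (flags & known[i].bit) {
            if (out[0] != '\0')
                strncat(out, "|", out_len - strlen(out) - 1);
            strncat(out, known[i].name, out_len - strlen(out) - 1);
        }
    }
}
```

With these values, 0xA20 is NORMAL, COPY_LOOKAHEAD and AT_DISPATCH_LEVEL, with the ENTIRE_MESSAGE bit clear.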

This was falling through logic in the driver that expected
BytesIndicated < BytesAvailable before the posting of a TDI_RECEIVE
IRP was required. And when I look at the DDK documentation, it’s
implied that BytesIndicated would be less than BytesAvailable in the
TDI_RECEIVE_COPY_LOOKAHEAD case unless TDI_RECEIVE_ENTIRE_MESSAGE was
present (which it’s not).

Is this a case of something I’m still not clearly deciphering, a case
where the DDK documentation just never was completely correct, or is
there in fact something “unusual” about this behavior?

For what it’s worth, this is Windows XP Professional SP2 with the
in-box 3Com 3C920 driver, and the receive is of TCP data sent from a
remote user-mode Winsock client. If I create a TDI_RECEIVE IRP I can
successfully obtain all 2400 bytes that were sent. So the question is
just with regard to the BytesIndicated and BytesAvailable indications
being received, given the flags.

Thanks.

Alan Adams

> implied that BytesIndicated would be less than BytesAvailable in the
> TDI_RECEIVE_COPY_LOOKAHEAD case unless TDI_RECEIVE_ENTIRE_MESSAGE was
> present (which it’s not).

Correct, there can be BytesIndicated == BytesAvailable and no
TDI_RECEIVE_ENTIRE_MESSAGE. In this case, you nevertheless have the whole data
portion at hand with Ptr/Length pair, and need no TDI_RECEIVE IRP.

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

“Maxim S. Shatskih” wrote:

> > implied that BytesIndicated would be less than BytesAvailable in the
> > TDI_RECEIVE_COPY_LOOKAHEAD case unless TDI_RECEIVE_ENTIRE_MESSAGE was
> > present (which it’s not).
>
> Correct, there can be BytesIndicated == BytesAvailable and no
> TDI_RECEIVE_ENTIRE_MESSAGE. In this case, you nevertheless have the whole data
> portion at hand with Ptr/Length pair, and need no TDI_RECEIVE IRP.

But even though theoretically you don’t need a TDI_RECEIVE in that
case, you have no basis on which to not send one anyway, correct?
(Unless your own message data tells you how much total data you’re
expecting to get.)

Since BytesIndicated == BytesAvailable and TDI_RECEIVE_ENTIRE_MESSAGE
being absent will /also/ occur in occasions where there is more data
you’ll need a TDI_RECEIVE to retrieve.

At least that’s what I’m saying I’m seeing here. A 2400-byte message
received via TCP will call the ClientEventReceive with BytesIndicated
and BytesAvailable both set to 1460, and TDI_RECEIVE_ENTIRE_MESSAGE
isn’t set.

I’ll get the full 2400 by asking for it via TDI_RECEIVE, but I didn’t
have any indication through TDI that 2400 bytes had been received. I
knew 2400 only because I control the client, too.

Just wanting to confirm this is supposed to be “unknown quantity”
receive processing, and the TDI_RECEIVE needs to be setup to try a
guesstimate buffer and keep expanding that allocation until TDI
returns an error when I try to TDI_RECEIVE more bytes (which would be
my first and only indication there is no more / how large the received
data actually was).

As opposed to this situation not being expected, and I am supposed to
know there are 2400 bytes to be retrieved through TDI before I send my
TDI_RECEIVE IRP.

Note that if I send a 2400-byte message via UDP, the
ClientEventReceive actually has BytesIndicated 1460 and BytesAvailable
2400, which is I guess what I “expected” in the TCP case as well.

Thanks for the consideration.

Alan Adams

Alan,

UDP *is* a message oriented protocol and so the transport knows the boundary
of the message and so the semantics are a bit different.

TCP does not have any concept of message size. It is a STREAM protocol.
The size of a ‘message’ is one octet. It (the protocol) has no way of
communicating a message boundary and thus the fact that your sender has
placed 2400 octets into the receive window and it has (or will have) arrived
at the receiver by the time the TDI_RECEIVE IRP is processed is really just
coincidence, no matter how repeatable it may seem. Out of curiosity, do
you *ever* see TDI_RECEIVE_ENTIRE_MESSAGE set?

Expecting TCP to do message blocking is incorrect, as it would be for any
stream type protocol. A sequenced packet protocol is what might provide
ordered, reliable delivery of ‘messages’.

AFAIK; if you process all of the data in a receive indication (i.e., you
‘accept’ all of the octets indicated when BytesIndicated==BytesAvailable)
you are going to continue to get receive callbacks for the ‘next’ segment
(receive) that occurs. If you only accept a portion of the data, or, the
entire amount of data available is not ‘indicated’, the transport is *not*
going to call the receive indication again until *after* you read the
remainder with a TDI_RECEIVE IRP.

The only reason to *not* return a TDI_RECEIVE IRP is that you do not have
any assurance that it will be completed immediately, which is really the
whole point of the event receive mechanism (to keep from having to have a
bunch of rx IRPs queued in the transport). That said, I can only hazard a
guess as to what the transport might do: complete the IRP with an error,
queue it, or complete it with whatever data might be available (including the
possibility that it is zero octets). Sure, in your case the NIC has probably
indicated multiple packets into the transport, so by the time the transport
comes back around out of the NIC receive indication it has more data to
satisfy the (now queued) IRP.

It is probably better to just stay inside the box of behavior that says to
return an IRP when BytesIndicated < BytesAvailable (and, of course, you want
the data). This is the case of the ‘data’ being in more than one MDL
(NDIS_BUFFER) and is the ultimate reason that a non-chained receive
indication needs to return an IRP to ‘finish’ the read.

Good Luck,
Dave Cattley
Consulting Engineer
Systems Software Development

David,
I totally agree with your suggestion. I ran into the same problem as Alan
in the last few days. It’s actually a design problem for me, since I try
to process the application data strictly along message boundaries, just
as you pointed out (many user-level applications deal with the TCP/IP
stack this way). However, it may be a little late for me to rewrite this
part of the logic, so I found a workaround. A TDI_RECEIVE IRP can be
completed before it has received all the data it requested, so I have to
resend the TDI_RECEIVE IRP from the completion routine. I’m afraid there
is a risk, even if the probability is low, of overflowing the DPC stack
this way. Do you have any thoughts on that?
Anyway, the right way should be to submit a TDI_RECEIVE IRP only when
bytes_indicated < bytes_available, and never to submit a TDI_RECEIVE IRP
whose buffer size is larger than bytes_available - *bytes_taken.
BTW: we can encounter TDI_RECEIVE_ENTIRE_MESSAGE in the real world.


Best Regards,
hanzhu


Questions? First check the Kernel Driver FAQ at http://www.osronline.com/article.cfm?id=256

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

> > Correct, there can be BytesIndicated == BytesAvailable and no
> > TDI_RECEIVE_ENTIRE_MESSAGE. In this case, you nevertheless have the whole data
> > portion at hand with Ptr/Length pair, and need no TDI_RECEIVE IRP.
>
> But even though theoretically you don’t need a TDI_RECEIVE in that
> case, you have no basis on which to not send one anyway, correct?

BytesIndicated == BytesAvailable is this basis.

> and BytesAvailable both set to 1460, and TDI_RECEIVE_ENTIRE_MESSAGE
> isn’t set.

I doubt TDI_RECEIVE_ENTIRE_MESSAGE is ever used by TCP. TCP is not a
message-oriented protocol; its byte stream has no message boundaries. The
information about sender-side message sizes is 100% lost on the receiver side.

BTW - how do you know that the message is 2400 bytes? From which
ClientEventReceive parameters?

> As opposed to this situation not being expected, and I am supposed to
> know there are 2400 bytes to be retrieved through TDI before I send my
> TDI_RECEIVE IRP.

You have no way to know. TCP has no message boundaries, so the receiver has no
information at all about the portion sizes used by the sender. It simply
waits for the next portion, and then, if it arrives very soon, concatenates
them and indicates them as one data chunk.

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

> It is probably better to just stay inside the box of behavior that says to
> return an IRP when BytesIndicated < BytesAvailable (and, of course, you want
> the data). This is the case of the ‘data’ being in more than one MDL
The rules which were successful for me:

  • if BytesIndicated == BytesAvailable - then all the data is here, available by
    pointer; consume it.
  • if BytesIndicated < BytesAvailable - consume BytesIndicated bytes (available
    by pointer), and return a TDI_RECEIVE IRP for the size of ( BytesAvailable -
    BytesIndicated ). The completion routine of this IRP is guaranteed to be called
    before any subsequent ClientEventReceive, and will also do consumption.
  • if, while within ClientEventReceive, I’m out of memory for consumption (over
    my own limit set by SO_RCVBUF) - then I return STATUS_DATA_NOT_ACCEPTED.
    This blocks TCP’s receive indication path completely for a while.
  • then, when I’m no longer out of memory, I send the TDI_RECEIVE IRP. This
    unblocks the previously blocked TCP receiver.

From what I remember, the BytesIndicated < BytesAvailable case occurs when the
data arrives at TCP as a non-contiguous chain of several NDIS_PACKETs, or as an
NDIS_PACKET with several NDIS_BUFFERs. In this case, TCP indicates the first
buffer size as BytesIndicated (this is the amount of data directly available
by pointer), and the total packet chain size as BytesAvailable.
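
The rules above can be modelled as a small decision function. This is a user-mode sketch only, not kernel code: the type and function names are hypothetical, and the "out of memory" test (remaining buffer space smaller than BytesAvailable) is an assumption standing in for whatever SO_RCVBUF-style limit the client enforces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; each action notes the status the real handler would return. */
typedef enum {
    ACTION_CONSUME_ALL,      /* copy the indicated bytes, return STATUS_SUCCESS */
    ACTION_CONSUME_AND_POST, /* copy the indicated bytes, hand back a TDI_RECEIVE IRP
                                and return STATUS_MORE_PROCESSING_REQUIRED */
    ACTION_REFUSE            /* return STATUS_DATA_NOT_ACCEPTED, post TDI_RECEIVE later */
} receive_action;

typedef struct {
    receive_action action;
    size_t bytes_taken;  /* value to report through *BytesTaken */
    size_t irp_length;   /* length of the TDI_RECEIVE IRP to return, 0 if none */
} receive_decision;

static receive_decision decide_receive(size_t bytes_indicated,
                                       size_t bytes_available,
                                       size_t buffer_space)
{
    receive_decision d = { ACTION_REFUSE, 0, 0 };

    assert(bytes_indicated <= bytes_available);

    /* Assumed policy: refuse when the whole indication would not fit. */
    if (buffer_space < bytes_available)
        return d;

    d.bytes_taken = bytes_indicated;
    if (bytes_indicated == bytes_available) {
        d.action = ACTION_CONSUME_ALL;
    } else {
        d.action = ACTION_CONSUME_AND_POST;
        d.irp_length = bytes_available - bytes_indicated;
    }
    return d;
}
```

For the TCP case in this thread (1460 indicated, 1460 available) the function consumes everything and posts no IRP; for the UDP case (1460 indicated, 2400 available) it asks for a 940-byte TDI_RECEIVE.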

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

> in the last few days. It’s actually a design problem for me since I try
> to deal with the application data strictly following the message
> boundary just as you pointed out(Many user level

Then you need to add the “message length” words or message delimiters to your
data flow.
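
One common way to do this is a length prefix. The sketch below is user-mode C under stated assumptions: a 4-byte big-endian length header and a fixed 8 KB reassembly buffer are choices made for illustration, and frame_buffer, frame_feed and frame_next are hypothetical names. Each received chunk, whatever its size, is appended with frame_feed, and frame_next hands back a message only once all of it has arrived.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HEADER_LEN 4u     /* assumed: 4-byte big-endian payload length */
#define FRAME_BUFFER_CAP 8192u  /* illustrative cap, not a TDI limit */

typedef struct {
    uint8_t data[FRAME_BUFFER_CAP];
    size_t  used;
} frame_buffer;

/* Append one received chunk. Returns 0 on success, -1 if it does not fit. */
static int frame_feed(frame_buffer *fb, const uint8_t *chunk, size_t len)
{
    assert(fb != NULL);
    if (len == 0)
        return 0;
    assert(chunk != NULL);
    if (len > FRAME_BUFFER_CAP - fb->used)
        return -1;
    memcpy(fb->data + fb->used, chunk, len);
    fb->used += len;
    return 0;
}

/* Extract the next complete message, if any.
 * Returns 1 and sets *out_len when a payload was copied to out,
 * 0 when more data is needed, -1 when the message cannot be delivered
 * (larger than out, or larger than the buffer could ever hold). */
static int frame_next(frame_buffer *fb, uint8_t *out, size_t out_cap, size_t *out_len)
{
    uint32_t body;

    assert(fb != NULL && out != NULL && out_len != NULL);
    if (fb->used < FRAME_HEADER_LEN)
        return 0;
    body = ((uint32_t)fb->data[0] << 24) | ((uint32_t)fb->data[1] << 16) |
           ((uint32_t)fb->data[2] << 8)  |  (uint32_t)fb->data[3];
    if (body > FRAME_BUFFER_CAP - FRAME_HEADER_LEN || body > out_cap)
        return -1;
    if (fb->used - FRAME_HEADER_LEN < body)
        return 0;
    memcpy(out, fb->data + FRAME_HEADER_LEN, body);
    fb->used -= FRAME_HEADER_LEN + body;
    /* Slide any bytes belonging to the following message to the front. */
    memmove(fb->data, fb->data + FRAME_HEADER_LEN + body, fb->used);
    *out_len = body;
    return 1;
}
```

Calling frame_next in a loop after every frame_feed also covers the case where several messages arrive in a single indication.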

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

“David R. Cattley” wrote:

David and Maxim,

Thanks for being willing to correct my expectations. What’s being
said regarding TCP not being “message based” of course makes sense, in
addition to how I was incorrectly following the “is really just
coincidence” point David raised.

> Out of curiosity, do you ever see TDI_RECEIVE_ENTIRE_MESSAGE set?

Yes. If I arbitrarily send myself a 400 byte message via TCP instead
of 2400 bytes, I’m seeing BytesIndicated 400, BytesAvailable 400, and
TDI_RECEIVE_NORMAL, TDI_RECEIVE_COPY_LOOKAHEAD,
TDI_RECEIVE_ENTIRE_MESSAGE, and TDI_RECEIVE_AT_DISPATCH_LEVEL flagged.

As described earlier, when I send myself the 2400 byte TCP message, I
have BytesIndicated 1460, BytesAvailable 1460, and TDI_RECEIVE_NORMAL,
TDI_RECEIVE_COPY_LOOKAHEAD, and TDI_RECEIVE_AT_DISPATCH_LEVEL flagged.

At this point I believe that everything in my environment is working
exactly in line with the descriptions you’ve both asserted back to me.
I can see how the flow Maxim laid out would react properly within this
environment.

Alan Adams

Alan,

Thanks for the feedback on this question.

>> Out of curiosity, do you *ever* see TDI_RECEIVE_ENTIRE_MESSAGE set?
>
> Yes. If I arbitrarily send myself a 400 byte message via TCP instead of
> 2400 bytes, I’m seeing BytesIndicated 400, BytesAvailable 400, and
> TDI_RECEIVE_NORMAL, TDI_RECEIVE_COPY_LOOKAHEAD, TDI_RECEIVE_ENTIRE_MESSAGE,
> and TDI_RECEIVE_AT_DISPATCH_LEVEL flagged.

I am wondering if TCP might have set TDI_RECEIVE_ENTIRE_MESSAGE in response
to the PSH flag being set. This could explain why your 2400 octet ‘send’
resulted in the first segment *without* the TDI_RECEIVE_ENTIRE_MESSAGE since
it probably would have arrived (the first segment) without the PSH set yet
the second segment might have had the PSH flag set by the sender. In any
event, the PSH flag is not a ‘blocking’ mechanism in TCP but just a hint and
so you really are stuck with treating it only as a stream of octets.
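
To make that guess concrete, here is a user-mode sketch of how a single send might be cut into segments. It assumes an MSS of 1460 (a 1500-byte Ethernet MTU minus 20 bytes each of IP and TCP header) and assumes the sender sets PSH only on the last segment of each send; real stacks are free to behave differently, and segment_send is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t length; /* payload octets in this segment */
    int    psh;    /* 1 if PSH is assumed to be set on this segment */
} segment;

/* Split one send of send_len octets into MSS-sized segments, marking PSH
 * on the final one. Returns the number of segments written to out. */
static size_t segment_send(size_t send_len, size_t mss, segment *out, size_t out_cap)
{
    size_t count = 0;

    assert(mss > 0 && out != NULL);
    while (send_len > 0 && count < out_cap) {
        size_t len = send_len < mss ? send_len : mss;
        send_len -= len;
        out[count].length = len;
        out[count].psh = (send_len == 0);
        count++;
    }
    return count;
}
```

Under these assumptions a 2400-octet send becomes a 1460-octet segment without PSH followed by a 940-octet segment with PSH, while a 400-octet send is a single segment with PSH, which would be consistent with the flags Alan observed.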

Good Luck,
Dave Cattley
Consulting Engineer
Systems Software Development

> event, the PSH flag is not a ‘blocking’ mechanism in TCP but just a hint and

Correct, PSH is not a message boundary.

Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

Alan,
After struggling with this for several days, I believe the solution Maxim and
David outlined is the only proper way to handle it. It also strictly
follows the guidelines in the docs. There are several pitfalls I ran into,
some of which are hit only rarely.

  1. The TDI client has to consume all the data indicated in the
    ClientEventReceive callback if BytesIndicated == BytesAvailable.
  2. The client has to return a TDI_RECEIVE IRP that copies all of the
    available data it has not consumed in one go.
  3. Sometimes several messages (from the upper-level protocol’s
    perspective) are indicated at once. Be prepared to process them in one
    callback. This may be related to the implementation of the miniport driver.

If I violate one of the above rules, some OS versions will no longer invoke my
ClientEventReceive. I’m changing the logic to match what the folks here
described.

Best Regards,
hanzhu
