NdisReset criteria

Dear Folks,

We have a NIC miniport driver that we modified to make a duplicate
of each packet and send each copy to a different destination. Due
to the nature of the hardware interface, this causes throughput to
be dramatically reduced. Also, it adds a little overhead to packet
processing in the driver.

Initially, we had a problem with NDIS resetting our driver, so we
call NdisMSetAttributesEx with

NDIS_ATTRIBUTE_IGNORE_PACKET_TIMEOUT

and

NDIS_ATTRIBUTE_IGNORE_REQUEST_TIMEOUT

set. This prevents NDIS from calling our NdisReset handler.
However, we see throughput drop periodically at 4 second intervals.
This corresponds to the default period that NDIS uses when calling
the reset handler in the case that a driver is slow to respond to
send and OID requests.

Apparently, NDIS is still unhappy with our responsiveness, but
instead of calling our reset handler, it does something else to
cause a hiccough in our throughput.

We tried setting our data rate to a lower value, but this had no effect.

Does anyone know what criteria NDIS uses to determine if a driver is
responding sluggishly and needs to be reset?

Does anyone know if there are any buggy intermediate drivers in the
NDIS stack that might forget to set the ignore timeout flags?

Thank you,
Cullen

Is NDIS calling your MiniportReset or MiniportCheckForHang handler (do you
have one?)?

Did you set the CheckForHangTimeInSeconds in NdisMSetAttributesEx() to some
larger value?

From NdisMSetAttributesEx…


CheckForHangTimeInSeconds
Specifies the interval in seconds at which NDIS should call the
MiniportCheckForHang function, if any. Specifying zero for this parameter
indicates that NDIS should call MiniportCheckForHang at NDIS’s default
two-second interval and that NDIS should call the MiniportReset function at
the default four-second time-out interval for pending sends and requests
that NDIS holds queued to the caller subsequently.
Specifying a value greater than two extends both the check-for-hang and
time-out intervals. NDIS uses double the specified check-for-hang interval
as its time-out interval for the caller.

If the caller sets NDIS_ATTRIBUTE_DESERIALIZE in AttributeFlags, NDIS does
not queue pending sends for the miniport driver. Instead, such a
deserialized driver must manage its own queuing of subsequent send requests
internally whenever it has insufficient resources to transmit an incoming
send immediately.
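
To make the quoted interval arithmetic concrete, here is a minimal sketch in Python (a model only, not driver code). The name `ndis_intervals` is invented for illustration, and treating 1 and 2 like the default is an assumption, since the quoted text only covers zero and values greater than two.

```python
DEFAULT_CHECK_FOR_HANG_SECONDS = 2  # NDIS default, per the quoted documentation


def ndis_intervals(check_for_hang_time_in_seconds):
    """Return (check_for_hang_interval, reset_timeout) in seconds.

    Models only what the quoted NdisMSetAttributesEx text states:
    zero selects the 2-second / 4-second defaults, and a value greater
    than two extends both, with the time-out being double the interval.
    """
    if check_for_hang_time_in_seconds < 0:
        raise ValueError("interval cannot be negative")
    if check_for_hang_time_in_seconds > 2:
        interval = check_for_hang_time_in_seconds
    else:
        # Assumption: 1 and 2 behave like the default; the quoted text
        # does not say what happens for these values.
        interval = DEFAULT_CHECK_FOR_HANG_SECONDS
    return interval, 2 * interval


if __name__ == "__main__":
    for requested in (0, 10, 60):
        interval, timeout = ndis_intervals(requested)
        print(f"requested={requested}s check-for-hang={interval}s reset time-out={timeout}s")
```

Under this reading, a large CheckForHangTimeInSeconds only postpones the reset; it does not disable it.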

Good Luck,
Dave Cattley
Consulting Engineer
Systems Software Development

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Cullen
Sent: Tuesday, October 23, 2007 5:49 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] NdisReset criteria

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

Dear Mr. Cattley,

David R. Cattley wrote:

> Is NDIS calling your MiniportReset or MiniportCheckForHang handler (do you
> have one?)?

We have both. NDIS does call our MiniportCheckForHang handler and
we return FALSE. However, if that is the only change we make, NDIS
still calls our reset handler every four seconds.

> Did you set the CheckForHangTimeInSeconds in NdisMSetAttributesEx() to some
> larger value?

Previously, we set this value to a large number. NDIS waits exactly
that number of seconds and then calls our reset handler. Ideally,
we would like NDIS not to call our reset handler.

> From NdisMSetAttributesEx…

No need to duplicate MSDN here. I have the docs installed. It
would be helpful if someone could answer my actual question:

Does anyone know what criteria NDIS uses to determine if a driver is
responding sluggishly and needs to be reset?

Thank you,
Cullen

> -----Original Message-----

From: xxxxx@lists.osr.com [mailto:bounce-304034-
xxxxx@lists.osr.com] On Behalf Of Cullen
Sent: Wednesday, October 24, 2007 9:19 AM
To: Windows System Software Devs Interest List
Cc: Jason Yates
Subject: Re:[ntdev] NdisReset criteria

> Does anyone know what criteria NDIS uses to determine if a driver is
> responding sluggishly and needs to be reset?
[PCAUSA] That is fairly clear in the DDK documentation for MiniportReset.

If your driver is deserialized, then NDIS should never call your
MiniportReset for a stalled send.

That said, do you complete the send immediately after you have made your
copy or do you wait to complete the send until all of your slow sends have
completed? I would think that you should complete the original send before
calling your slow interface.

In some cases loopback could be a problem. For some types of packets
(broadcast, multicast in particular…) higher-level protocols depend on
seeing a loopback of send packets for their operation. It is conceivable
that a higher-level protocol is initiating a reset because it did not see a
loopback packet that it expected to see in a reasonable time.

MiniportReset can also be called if MiniportQueryInformation or
MiniportSetInformation stalls. It is unlikely, but is there some problem in
these paths?

Believe me, David knows his stuff and we all are giving the best advice we
can. I don’t think that either David or myself have seen or heard of this
particular behavior in our years working in this area. So, there is clearly
something unusual specifically related to your hardware or driver design.

Good luck,

Thomas F. Divine


Dear Mr. Divine,

Thomas F. Divine wrote:

> -----Original Message-----
> From: xxxxx@lists.osr.com [mailto:bounce-304034-
> xxxxx@lists.osr.com] On Behalf Of Cullen
> Sent: Wednesday, October 24, 2007 9:19 AM
> To: Windows System Software Devs Interest List
> Cc: Jason Yates
> Subject: Re:[ntdev] NdisReset criteria
>
> > Does anyone know what criteria NDIS uses to determine if a driver is
> > responding sluggishly and needs to be reset?
> [PCAUSA] That is fairly clear in the DDK documentation for MiniportReset.
>
> If your driver is deserialized, then NDIS should never call your
> MiniportReset for a stalled send.

The driver is built as an NDIS 4.0 driver so it is not deserialized.
It is a very old driver. It has been a long time since we tried
it, but I believe we had problems building the driver for higher
versions of NDIS.

> That said, do you complete the send immediately after you have made your
> copy or do you wait to complete the send until all of your slow sends have
> completed? I would think that you should complete the original send before
> calling your slow interface.

We send the original packet first. If we are able to send the
original packet immediately, we return NDIS_STATUS_SUCCESS in our
send handler. Otherwise, we return NDIS_STATUS_PENDING, and send
the original packet later. We do not indicate status on the
duplicate packet.
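
For readers following along, the send path described above can be modeled roughly as follows. This is a toy Python model, not driver code: the class and method names are invented, the status values are symbolic stand-ins rather than the real numeric codes, and writing the duplicate immediately after the original is an assumption.

```python
from collections import deque

# Symbolic stand-ins for the NDIS status codes (not the real numeric values).
NDIS_STATUS_SUCCESS = "NDIS_STATUS_SUCCESS"
NDIS_STATUS_PENDING = "NDIS_STATUS_PENDING"


class SendPathModel:
    """Toy model of the send handler behaviour described in this post."""

    def __init__(self):
        self.pending = deque()  # originals waiting for the hardware
        self.completed = []     # originals whose completion was indicated later
        self.wire = []          # everything written to the device, in order

    def send(self, packet, hardware_ready):
        """Send now if the hardware has room, otherwise queue and pend."""
        if hardware_ready:
            self._transmit(packet)
            return NDIS_STATUS_SUCCESS
        self.pending.append(packet)
        return NDIS_STATUS_PENDING

    def hardware_became_ready(self):
        """Send one queued original and indicate its completion."""
        if not self.pending:
            return None
        packet = self.pending.popleft()
        self._transmit(packet)
        self.completed.append(packet)
        return packet

    def _transmit(self, packet):
        self.wire.append(("original", packet))
        # Assumption: the copy follows immediately; no status is indicated for it.
        self.wire.append(("duplicate", packet))
```

In this model the time a packet spends in `pending` is what a send time-out would measure, so the duplicate matters only insofar as it delays the next queued original.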

Our interface is slow for all packets. It uses port I/O.

> In some cases loopback could be a problem. For some types of packets
> (broadcast, multicast in particular…) higher-level protocols depend on
> seeing a loopback of send packets for their operation. It is conceivable
> that a higher-level protocol is initiating a reset because it did not see a
> loopback packet that it expected to see in a reasonable time.
>
> MiniportReset can also be called if MiniportQueryInformation or
> MiniportSetInformation stalls. It is unlikely, but is there some problem in
> these paths?

There are no problems in those paths. Our changes to the code have
not included changes to those paths. The only thing I can think of
that would be a problem is a situation where the CPU cannot keep up
with the load. However, monitoring CPU using Task Manager shows
that CPU usage never gets above 60%.

> Believe me, David knows his stuff and we all are giving the best advice we
> can. I don’t think that either David or myself have seen or heard of this
> particular behavior in our years working in this area. So, there is clearly
> something unusual specifically related to your hardware or driver design.

I believe that the people who follow this group know their stuff or
I would not have come here looking for help. However, it would be
nice if a Microsoft representative could answer the key question here:

Does anyone know what criteria NDIS uses to determine if a driver is
responding sluggishly and needs to be reset?

Thank you,
Cullen

I am sorry to say that the solution to your problem would almost certainly
be to implement your driver using current NDIS 5 models.

Thomas F. Divine

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:bounce-304048-
xxxxx@lists.osr.com] On Behalf Of Cullen
Sent: Wednesday, October 24, 2007 10:48 AM
To: Windows System Software Devs Interest List
Cc: Jason Yates
Subject: Re:[ntdev] NdisReset criteria


Hi

Are you pending packets too long?

My recent experience of this came from pending packets via a queue; it turned
out that one packet was being held off on the send path.

I think this is normally the main reason reset is called. As an experiment,
what happens if you fail packets instead of pending them? Yes, I know this
will drop packets, but it might show something up.
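
One way to look for a single held-off packet is to timestamp each queued send and check the ages. A minimal sketch in Python (a model, not driver code); the function name and the queue layout are hypothetical, and the 4-second default is the time-out figure quoted earlier in the thread.

```python
def stalled_packets(pending, now, timeout_seconds=4.0):
    """Return ids of queued sends that have been pending longer than the time-out.

    `pending` is an iterable of (packet_id, queued_at_seconds) pairs and
    `now` is the current time in the same units.
    """
    return [packet_id
            for packet_id, queued_at in pending
            if now - queued_at > timeout_seconds]


if __name__ == "__main__":
    queue = [("p1", 0.0), ("p2", 3.5), ("p3", 3.9)]
    # Only p1 has been waiting for more than four seconds at t = 4.5.
    print(stalled_packets(queue, now=4.5))
```

A queue that never looks longer than one entry can still contain an old entry if the same packet keeps getting passed over.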

Regards

Steve

----- Original Message -----
From: “Cullen”
Newsgroups: ntdev
To: “Windows System Software Devs Interest List”
Cc: “Jason Yates”
Sent: Wednesday, October 24, 2007 3:48 PM
Subject: Re:[ntdev] NdisReset criteria


Dear Mr. Divine,

Thomas F. Divine wrote:

> I am sorry to say that the solution to your problem would almost certainly
> be to implement your driver using current NDIS 5 models.

If NDIS was not resetting the driver before we started duplicating
packets, why should it start resetting the driver just because we
are duplicating packets? The only observable effect of the packet
duplication is that throughput is halved. We still indicate packets
in a timely manner.

If we report our supported data rate during a request for
OID_GEN_LINK_SPEED as half the original value, will that have any
effect on NDIS’ decision making? We already tried reporting
a lower value for this query, but it did not have any effect on
whether or not NDIS reset our driver.

I am still trying to get the answer to this question:

“Does anyone know what criteria NDIS uses to determine if a driver
is responding sluggishly and needs to be reset?”

Useful answers to this question would be along the lines of

* NDIS resets the driver if it does not get a successful
packet completion indication within X number of seconds
of calling the send packet handler

* NDIS resets the driver if it does not respond to an OID
request after X number of seconds

Any response that replaced X in one of the above samples with a
number would greatly increase our ability to troubleshoot and solve
this problem.

Thank you,
Cullen

> -----Original Message-----

From: xxxxx@lists.osr.com [mailto:bounce-304064-
xxxxx@lists.osr.com] On Behalf Of Cullen
Sent: Wednesday, October 24, 2007 1:02 PM
To: Windows System Software Devs Interest List
Cc: Jason Yates
Subject: Re:[ntdev] NdisReset criteria

> If NDIS was not resetting the driver before we started duplicating
> packets, why should it start resetting the driver just because we
> are duplicating packets? The only observable effect of the packet
> duplication is that throughput is halved. We still indicate packets
> in a timely manner.
[PCAUSA] What does “timely” mean (in seconds…).

> If we report our supported data rate during a request for
> OID_GEN_LINK_SPEED as half the original value, will that have any
> effect on NDIS’ decision making? We already tried reporting
> a lower value for this query, but it did not have any effect on
> whether or not NDIS reset our driver.

[PCAUSA] The system may not even ask for link speed.

Example traces of the OIDs made on an XP system are at:

http://ndis.com/papers/802_11_logs/XP-802_11g.htm

Although this was for a wireless adapter, the OID_GEN information is
typical.

> I am still trying to get the answer to this question:
>
> “Does anyone know what criteria NDIS uses to determine if a driver
> is responding sluggishly and needs to be reset?”

The DDK documentation for MiniportReset is as good as it gets.

Using a serialized driver is just asking for pain…

Thomas F. Divine

Dear Mr. Divine,

Thomas F. Divine wrote:

> -----Original Message-----
> Thomas F. Divine wrote:
> > I am sorry to say that the solution to your problem would almost
> certainly
> > be to implement your driver using current NDIS 5 models.
>
> If NDIS was not resetting the driver before we started duplicating
> packets, why should it start resetting the driver just because we
> are duplicating packets? The only observable effect of the packet
> duplication is that throughput is halved. We still indicate packets
> in a timely manner.
[PCAUSA] What does “timely” mean (in seconds…).

We indicate them immediately if we are able to send them
immediately. We indicate them when they are sent if they are
queued. I have never seen the queue longer than one packet (just
double-checked that in WinDbg), so they are going out as fast as the
OS is sending them to us.

> If we report our supported data rate during a request for
> OID_GEN_LINK_SPEED as half the original value, will that have any
> effect on NDIS’ decision making? We already tried setting reporting
> a lower value for this query, but it did not have any effect on
> whether or not NDIS reset our driver.
>
[PCAUSA] The system may not even ask for link speed.

If I set a breakpoint in the debugger, we do get that request (just
double-checked that in WinDbg).

> I am still trying to get the answer to this question:
>
> “Does anyone know what criteria NDIS uses to determine if a driver
> is responding sluggishly and needs to be reset?”
>
> Useful answers to this question would be along the lines of
>
> * NDIS resets the driver if it does not get a successful
> packet completion indication within X number of seconds
> of calling the send packet handler
>
> * NDIS resets the driver if it does not respond to an OID
> request after X number of seconds
>
> Any response that replaced X in one of the above samples with a
> number would greatly increase our ability to troubleshoot and solve
> this problem.
>
> The DDK documentation for MiniportReset is as good as it gets.

That seems a little strange for an interface that should be well
documented by now.

> Using a serialized driver is just asking for pain…

I agree. I have no choice. Not my decision. I have no control
over it. However, if my question was answered, it would not matter
if the driver was serialized or not.

Thanks,
Cullen

Cullen,

I apologize for the reference to the DDKs. It is hard to know when that is
appropriate and by no means did I mean to offend.

The only honest answer I can give you to your specific question is “I don’t
know precisely all of the criteria that NDIS uses to gate if and when it
will call MiniportReset()”. AFAIK, if the duration that the current send
has been pending or the current request has been pending exceeds the nominal
4-second timeout, NDIS will whack your MiniportReset(). If there are other
cases, then, I have not seen them occur.
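
Written out as a predicate, the two conditions above look something like the following. This is a Python model of the behaviour as described in this thread, not of NDIS internals; the function name is invented, and the assumption that each ignore flag simply suppresses its own check is mine.

```python
def reset_expected(send_pending_for, request_pending_for, timeout=4.0,
                   ignore_packet_timeout=False, ignore_request_timeout=False):
    """Return True if a MiniportReset call would be expected under this model.

    send_pending_for / request_pending_for: seconds the current send or
    request has been pending, or None if nothing is pending.
    ignore_*: stand for NDIS_ATTRIBUTE_IGNORE_PACKET_TIMEOUT and
    NDIS_ATTRIBUTE_IGNORE_REQUEST_TIMEOUT (assumed to disable one check each).
    """
    send_stalled = (send_pending_for is not None
                    and send_pending_for > timeout
                    and not ignore_packet_timeout)
    request_stalled = (request_pending_for is not None
                       and request_pending_for > timeout
                       and not ignore_request_timeout)
    return send_stalled or request_stalled


if __name__ == "__main__":
    print(reset_expected(5.0, None))                              # stalled send
    print(reset_expected(5.0, None, ignore_packet_timeout=True))  # suppressed
    print(reset_expected(1.0, 4.5))                               # stalled request
```

If a driver sees resets that this model does not predict, that points at a trigger other than the two named here.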

The ‘X’ you are looking for is documented in the DDK in the comments for
MiniportCheckForHang() as “… around four seconds”, along with the only two
conditions I am aware of that are measured against that time period:
Request Completion and Send Completion. I recognize that that is not a
satisfactory answer from me or the DDK documents.

That said, I can offer some ideas for you to investigate and when you figure
it out, please close the loop here so we can learn something too.

As a possible experiment, have you tried running your scenario with the
checked version of NDIS.SYS with a debugger attached and the ‘NDIS
Verifier’, Debug Level, and Debug Areas enabled verbosely? It may well be
that NDIS will spit out a ‘reason’ for deciding to call the MiniportReset()
entry.

Continuing on the ‘taking a long time’ path: You say that the hardware is
‘slow’ to send packets (I think you mentioned that it was PIO based). I
re-read the posts but did not grok from them an actual quantitative measure
of ‘slow’, especially relative to the reported link speed. Perhaps I missed
it (sorry). Is it possible that with single sends the driver/hardware is
literally just keeping up with the combined send & request rate (both are
competing for the same Miniport lock in a serialized driver)?

Does NDIS call your MiniportReset() when you contrive the scenario to *not*
be sending packets? For instance, if you unbind all protocols from the
adapter? In that scenario you would only have requests that could be
backing up or some yet to be understood criteria to trigger the reset. By
eliminating the send activity, you could perhaps intuit that the send path
*is* the issue.

You noted that CPU utilization reported by Taskman does not exceed 60%. How
does that compare with the ‘non-duplicating’ driver? That difference would
give you a rough estimate of the CPU cost of sending a packet. That cost
should reasonably be almost ‘noise’ in the measurement.
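
As a rough illustration of that estimate, here is the arithmetic in Python. The function name is invented, and apart from the 60% figure mentioned in the thread, the numbers in the example (40% baseline, 500 packets per second) are hypothetical placeholders, not measurements.

```python
def extra_cpu_seconds_per_packet(util_with_dup_pct, util_without_dup_pct,
                                 packets_per_second):
    """Estimate the added CPU time per packet from two utilization readings.

    Utilization values are percentages of one CPU, as shown by Task Manager,
    taken at the same packet rate with and without duplication.
    """
    if packets_per_second <= 0:
        raise ValueError("packets_per_second must be positive")
    extra_fraction = (util_with_dup_pct - util_without_dup_pct) / 100.0
    return extra_fraction / packets_per_second


if __name__ == "__main__":
    # Hypothetical readings: 60% with duplication, 40% without, at 500 packets/s.
    cost = extra_cpu_seconds_per_packet(60.0, 40.0, 500.0)
    print(f"about {cost * 1e6:.0f} microseconds of extra CPU per packet")
```

The two readings need to be taken at the same offered packet rate, otherwise the difference mixes per-packet cost with a change in load.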

I don’t believe (but I can’t say that I know specifically) that NDIS itself
cares about the link data rate and that it uses it to scale any timeouts.
Bound protocols, however, are encouraged to be aware of the link datarate
and to *not* pound send data into the link such that a large backlog occurs.
You might try reducing the values reported in OID_GEN_TRANSMIT_BUFFER_SPACE
and OID_GEN_TRANSMIT_BLOCK_SIZE to effectively say that the driver can send
only a single packet at a time. Protocols that are nice enough to ask about
these operational characteristics might back down on the rate of packet
transmission.
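
To show why small values for those two OIDs would read as "one packet at a time", here is a sketch of how a protocol might turn them into a frame count. It is a Python model only: the function name is invented, and whether any particular protocol performs this calculation is an assumption.

```python
import math


def frames_that_fit(transmit_buffer_space, transmit_block_size, frame_bytes):
    """Estimate how many frames of a given size fit in the reported buffer.

    transmit_buffer_space: bytes reported for OID_GEN_TRANSMIT_BUFFER_SPACE.
    transmit_block_size:   bytes reported for OID_GEN_TRANSMIT_BLOCK_SIZE.
    frame_bytes:           size of the frames being sent.
    """
    if transmit_block_size <= 0 or frame_bytes <= 0:
        raise ValueError("block size and frame size must be positive")
    # Each frame occupies a whole number of blocks.
    blocks_per_frame = math.ceil(frame_bytes / transmit_block_size)
    return transmit_buffer_space // (blocks_per_frame * transmit_block_size)


if __name__ == "__main__":
    # Reporting one full-size Ethernet frame of space leaves room for one frame.
    print(frames_that_fit(1514, 1514, 1514))
    # A larger buffer with small blocks admits several frames.
    print(frames_that_fit(16384, 256, 1514))
```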

Please keep in mind that we are your colleagues here, not oracles. Sometimes
the best we can do is share with you our experiences and help you find a
path to the answer. Sure, sometimes (others more frequently than me for
sure!) there is the one word, line, or paragraph answer and that is that.
Honestly, we are trying to help. You might consider that there might not
actually be an answer to your question but that does not mean there is not a
solution to your problem. NDIS was created by many people over many years
and has resiliently adapted to new requirements while maintaining backward
compatibility over a remarkable number of releases. It may well be that no
person can answer your question in the terms you have asked it. The
Miniport guts have been ‘improved’ in fits and starts by a few people and
even if they sat down and read through the source code, it might not reveal
a documentable answer (unfortunately). You of course always have the option
of calling MSFT Product Support, opening a case with them, and having
someone with that access try and answer your question.

Good Luck,
-dave

David R. Cattley
Consulting Engineer
Systems Software Development

P.S. Mr. Cattley is fine, by the way. I spoke with him tonight. He & Mom
are watching the Red Sox. Me, I go by just Dave :wink:

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Cullen
Sent: Wednesday, October 24, 2007 2:01 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] NdisReset criteria


Dear Mr. Dave,

David R. Cattley wrote:

> I apologize for the reference to the DDKs. It is hard to know
> when that is appropriate and by no means did I mean to offend.

No need to apologize. While I do not post frequently, I do follow
this thread, and I have seen situations where pasting a quote from
the docs was necessary. I was just trying to let you know that I
have the full docs installed. I guess I could have phrased it more
gently, but I have a co-worker who tells me I would benefit from
enrolling in a charm school.

The only honest answer I can give you to your specific question
is “I don’t know precisely all of the criteria that NDIS uses to
gate if and when it will call MiniportReset()”. AFAIK, if the
duration that the current send has been pending or the current
request has been pending exceeds the nominal 4-second timeout,
NDIS will whack your MiniportReset(). If there are other cases,
then, I have not seen them occur.

The ‘X’ you are looking for is documented in the DDK in the
comments for MiniportCheckForHang() as “… around four seconds”,
as are the only two conditions I am aware of that are
measured against that time period: Request Completion and Send
Completion. I recognize that that is not a satisfactory answer
from me or the DDK documents.

Well, I guess I will have to settle for that.
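
As a rough illustration of the timing discussed above, the following sketch models the two documented triggers (a send or a request pending too long) against the nominal timeout. It is a simplified model, not NDIS source code: the function names are hypothetical, and the rule that the timeout is twice the check-for-hang interval is an assumption inferred from the documented 2-second and 4-second defaults.

```c
#include <stdbool.h>

/* Documented NDIS default interval for MiniportCheckForHang, in seconds. */
#define DEFAULT_CHECK_FOR_HANG_SECONDS 2u

/* Hypothetical helper: zero means "use the NDIS default interval". */
unsigned check_interval_seconds(unsigned check_for_hang_time)
{
    return check_for_hang_time == 0 ? DEFAULT_CHECK_FOR_HANG_SECONDS
                                    : check_for_hang_time;
}

/* Assumption: the pending send/request timeout is twice the check
 * interval, which gives the documented 4 seconds for the default. */
unsigned pending_timeout_seconds(unsigned check_for_hang_time)
{
    return 2u * check_interval_seconds(check_for_hang_time);
}

/* Hypothetical model of the decision: a reset is expected only when the
 * oldest pending send or request has outlived the timeout and the
 * matching "ignore timeout" attribute flag was not set. */
bool reset_expected(unsigned pending_seconds,
                    unsigned check_for_hang_time,
                    bool ignore_timeout)
{
    if (ignore_timeout)
        return false;
    return pending_seconds > pending_timeout_seconds(check_for_hang_time);
}
```

Under these assumptions, a send that has been pending for 5 seconds with the default interval would be expected to trigger a reset, unless the corresponding ignore-timeout flag is set.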

That said, I can offer some ideas for you to investigate and when
you figure it out, please close the loop here so we can learn
something too.

Sounds like a plan.

As a possible experiment, have you tried running your scenario
with the checked version of NDIS.SYS with a debugger attached and
the ‘NDIS Verifier’, Debug Level, and Debug Areas enabled
verbosely? It may well be that NDIS will spit out a ‘reason’ for
deciding to call the MiniportReset() entry.

I will do that tomorrow morning. Unfortunately, it’s too late in
the day for me to get knee deep in experiments. At this time of
day, I’ll spend more time making mistakes than making progress.

Continuing on the ‘taking a long time’ path: You say that the
hardware is ‘slow’ to send packets (I think you mentioned that it
was PIO based). I re-read the posts but did not grok from them
an actual quantitative measure of ‘slow’, especially relative to
the reported link speed. Perhaps I missed it (sorry).

The hardware is slow in that packets are sent to the device one at a
time using PIO over an ISA bus. Also, the device needs an extra
wait state during each PIO write. To send a packet, the driver has
to wait for the device to indicate that hardware resources are
available for a send. There is no DMA or other facility that would
allow faster communication with the hardware. The hardware
resources are limited, but even if the device could buffer many
packets, we still have to send them using slow ISA PIO with an extra
wait state.

Our TCP throughput using Iperf is above 5.5 Mbps when we only have
one destination (no duplication of packets). When we have two
destinations, the throughput drops to 2.25 Mbps.

UDP throughput using Iperf is 11 Mbps. We do not duplicate UDP
packets, so that number is the same for any number of destinations.

If you assume the TCP is sending and receiving using the same
medium, then the 5.5 Mbps indicates that the ACKs are using half the
available bandwidth. With two destinations, we have twice the
number of ACKs, and therefore half of the half.
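
The halving argument above can be written out as a small calculation. This is only a sketch of that reasoning, with a hypothetical function name; it assumes the 11 Mbps UDP figure represents the full capacity of the medium and that each halving step is exact.

```c
/* Estimated TCP throughput under the "half of the half" reasoning:
 * ACK traffic is assumed to take half of the medium, and each extra
 * destination is assumed to halve the remaining share again. */
double expected_tcp_mbps(double udp_mbps, unsigned destinations)
{
    double mbps = udp_mbps / 2.0;      /* one destination: half for ACKs */
    for (unsigned i = 1; i < destinations; ++i)
        mbps /= 2.0;                   /* each further destination halves it */
    return mbps;
}
```

With 11 Mbps as input, this gives 5.5 Mbps for one destination and 2.75 Mbps for two. The measured 2.25 Mbps is somewhat below that estimate, so the halving model accounts for most, but not all, of the drop.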

Is it possible that with single sends the driver/hardware is
literally just keeping up with the combined send & request rate
(both are competing for the same Miniport lock in a serialized
driver)?

It is entirely possible. I will dig around and see if there are a
couple of signal lines I can toggle when sends and receives occur.

Does NDIS call your MiniportReset() when you contrive the
scenario to *not* be sending packets? For instance, if you
unbind all protocols from the adapter? In that scenario you
would only have requests that could be backing up or some yet to
be understood criteria to trigger the reset. By eliminating the
send activity, you could perhaps intuit that the send path *is*
the issue.

We duplicate packets when we have two destinations. If I use the
same driver binary and create a setup with only one destination, we
only send one packet, and the resets do not occur. So there is
something about the duplicate sends that makes NDIS unhappy.

You noted that CPU utilization reported by Taskman does not
exceed 60%. How does that compare with the ‘non-duplicating’
driver? That difference would give you a rough estimate of the
CPU cost of sending a packet. That cost should reasonably be
almost ‘noise’ in the measurement.

CPU usage is pinned at 60% for both versions of the driver. The
hardware engineer believes that the ISA bus speed is the limiter
here, and that the CPU spends a lot of time waiting for the PIO to
finish. He has scoped this out as best as he can with the test
equipment we have available.

I don’t believe (but I can’t say that I know specifically) that
NDIS itself cares about the link data rate or that it uses it to
scale any timeouts. Bound protocols, however, are encouraged to
be aware of the link datarate and to *not* pound send data into
the link such that a large backlog occurs. You might try reducing
the values reported in OID_GEN_TRANSMIT_BUFFER_SPACE and
OID_GEN_TRANSMIT_BLOCK_SIZE to effectively say that the driver
can send only a single packet at a time. Protocols that are nice
enough to ask about these operational characteristics might back
down on the rate of packet transmission.

We report OID_GEN_TRANSMIT_BUFFER_SPACE as 0x2000. I won’t embarrass
the original coder here by posting his comment, but it indicates
that the value is arbitrary. We report OID_GEN_TRANSMIT_BLOCK_SIZE
as 1514, which is the MTU plus the Ethernet II header size.
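
To put those two values side by side, here is a small sketch of the arithmetic. The macro and function names are hypothetical; the 1500-byte MTU and 14-byte Ethernet II header are the standard figures implied by the 1514 above.

```c
#define ETHERNET_MTU           1500u   /* standard Ethernet payload size */
#define ETHERNET_HEADER_SIZE     14u   /* Ethernet II header */
#define REPORTED_BUFFER_SPACE 0x2000u  /* value the driver currently reports */

/* Largest frame the driver accepts: MTU plus header. */
unsigned max_frame_size(void)
{
    return ETHERNET_MTU + ETHERNET_HEADER_SIZE;
}

/* Number of full-size frames that fit in the advertised buffer space. */
unsigned frames_that_fit(unsigned buffer_space)
{
    return buffer_space / max_frame_size();
}
```

By this reckoning, 0x2000 (8192 bytes) advertises room for five full-size frames. Reporting a buffer space equal to one block (1514) would be one way to say that only a single packet can be in flight, as suggested above.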

Please keep in mind that we are your colleagues here, not oracles.
Sometimes the best we can do is share with you our experiences and
help you find a path to the answer. Sure, sometimes (others more
frequently than me for sure!) there is the one word, line, or
paragraph answer and that is that. Honestly, we are trying to
help. You might consider that there might not actually be an
answer to your question but that does not mean there is not a
solution to your problem. NDIS was created by many people over
many years and has resiliently adapted to new requirements while
maintaining backward compatibility over a remarkable number of
releases. It may well be that no person can answer your question
in the terms you have asked it. The Miniport guts have been
‘improved’ in fits and starts by a few people and even if they
sat down and read through the source code, it might not reveal a
documentable answer (unfortunately). You of course always have
the option of calling MSFT Product Support, opening a case with
them, and having someone with that access try and answer your
question.

Once again, the need for charm school seems to have affected my
communication.

I thank everyone who has responded for their time and patience.
There may not be an answer to my specific question, but it appears
that I still have some options for solving the problem thanks to you
and everyone else on this list.

Good Luck,

Thank you, I will probably need it at this rate.

P.S. Mr. Cattley is fine, by the way. I spoke with him tonight.
He & Mom are watching the Red Sox. Me, I go by just Dave :wink:

lol! I know the interweb is breaking down barriers to
communication, but I sometimes feel that manners are being trampled
as a side effect. Also, I grew up in the South, and I was taught to
address people more formally than is the general rule these days. I
apologize for any discomfort.

Thank you very much,
Cullen

Cullen,

You might also try the following:

Instead of returning NDIS_STATUS_PENDING from MiniportSend() and holding onto
the packet given to you, define some small number of buffers into which you
will queue packets, say, two for starters. When MiniportSend() is called,
if you have a free buffer, copy the packet to the buffer and return
NDIS_STATUS_SUCCESS. If you don’t have any buffers, return
NDIS_STATUS_RESOURCES. NDIS should hold off sending you any more packets
until you call NdisMSendResourcesAvailable() which you should do *instead*
of using NdisMSendComplete().
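
The control flow of that suggestion can be sketched as a small stand-alone model. This is not driver code: the status enum, the Adapter structure, and the function names are hypothetical stand-ins for MiniportSend(), NDIS_STATUS_SUCCESS, NDIS_STATUS_RESOURCES, and the call to NdisMSendResourcesAvailable(), so that the logic can be compiled and tried outside the DDK.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for NDIS_STATUS_SUCCESS / NDIS_STATUS_RESOURCES / failure. */
typedef enum { SEND_SUCCESS, SEND_RESOURCES, SEND_FAILURE } SendStatus;

#define TX_SLOTS     2      /* "two for starters" */
#define MAX_FRAME 1514      /* MTU plus Ethernet II header */

typedef struct {
    unsigned char data[TX_SLOTS][MAX_FRAME];
    size_t        length[TX_SLOTS];
    int           in_use[TX_SLOTS];
    int           starved;              /* a send was refused for lack of a slot */
    int           resources_signalled;  /* simulated NdisMSendResourcesAvailable() calls */
} Adapter;

/* Models MiniportSend(): copy the packet into a free slot and return
 * success at once, or report that no resources are available. */
SendStatus adapter_send(Adapter *a, const void *frame, size_t len)
{
    if (len > MAX_FRAME)
        return SEND_FAILURE;
    for (int i = 0; i < TX_SLOTS; ++i) {
        if (!a->in_use[i]) {
            memcpy(a->data[i], frame, len);
            a->length[i] = len;
            a->in_use[i] = 1;
            return SEND_SUCCESS;     /* the packet is not held pending */
        }
    }
    a->starved = 1;
    return SEND_RESOURCES;           /* NDIS should hold off further sends */
}

/* Models the hardware finishing with a slot. A real driver would call
 * NdisMSendResourcesAvailable() here rather than NdisMSendComplete(). */
void adapter_slot_done(Adapter *a, int slot)
{
    if (!a->in_use[slot])
        return;
    a->in_use[slot] = 0;
    if (a->starved) {
        a->starved = 0;
        a->resources_signalled++;
    }
}
```

With two slots, the third back-to-back send is refused, and the next slot completion is what tells NDIS to resume. Whether NDIS then refrains from resetting the miniport is exactly the open question, so this only illustrates the mechanics.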

You could also do this with a single buffer, of course, and there were NICs
that did more or less just this thing. Keep in mind that most of the NDIS
world left serialized miniports behind a long time ago and probably have not
had to think about it for a long time (save for the occasional NDIS WAN
Miniport here & there) so we don’t exactly have this stuff at the top of the
heap!

It just might be that NDIS will be happy enough with this situation instead
of holding that packet for a long time and it will forego whacking your
MiniportReset(). On the other hand, it might still be upset about a large
backlog of send packets in the (hidden) Miniport Send Queue and still think
your NIC has gone off the rails. I don’t know but it may be worth a shot.
Replacing the ‘completion’ handling with a call to
NdisMSendResourcesAvailable() in the single packet buffer case is probably
not that big a deal to code in.

Just a thought.

Good Luck,
-dave

David R. Cattley
Consulting Engineer
Systems Software Development

Have you put a breakpoint in your reset handler and looked at the call
stack at the moment the reset is called?


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

You are sending the same TCP packet to two locations? In that case when
two acks come back, do you send them both to TCP? If this is the case,
this might cause TCP to be very confused, as it means that it will get
many acks out of order.

Can you try creating a filter that drops received packets from one
location?

Please note that your communication is not that reliable in any case,
as one ACK is enough to cause TCP to release the packet.

Thanks
Tzachi

Dear Mr. Dar,

Tzachi Dar wrote:

You are sending the same TCP packet to two locations? In that case when
two acks come back, do you send them both to TCP? If this is the case,
this might cause TCP to be very confused, as it means that it will get
many acks out of order.

Only one destination responds to the packets.

Can you try creating a filter that drops received packets from one
location?

See the above answer.

Please note that your communication is not that reliable in any case,
as one ACK is enough to cause TCP to release the packet.

Since only one destination responds to the packets, we do not have
reliability problems.

Thank you,
Cullen

Dear Dave,

David R. Cattley wrote:

That said, I can offer some ideas for you to investigate and when you figure
it out, please close the loop here so we can learn something too.

I am somewhat confused today. I modified the driver to call
NdisMSetAttributesEx with AttributeFlags set to zero so that I could
attempt to debug the reset problem, and now my reset handler does
not get called.

Also, I am no longer having intermittent throughput problems.
Previously, in this test setup, I was seeing throughput drop to zero
about once every 60 seconds.

I would like to thank everyone for their input and ideas. I hope to
try some of them out as soon as I can recreate the problem.

Thank you,
Cullen