Consider the following options:
If you are not using async sockets, do so. See if this gives any
improvement.
I am not aware of any mechanism on the sender side that reports when the stack is about to discard a packet, but you should explore this possibility. I don’t hold out much hope for this approach.
Here’s a question for network experts: is sending handled by a
PASSIVE_LEVEL thread in the kernel? If so, what is its priority? If it
is from a PASSIVE_LEVEL thread, would there be any way to run this thread
at higher priority? It may just be that on a heavily-loaded system the
bottleneck is not how fast the bits get on the wire, but how fast packets
are dequeued for sending, and that may be influenced by thread priorities.
Embedded systems are easy: they have nothing else to do but service the requests, and in general you can prove that they will meet all necessary timing windows (absolutely, in the case of cyclic executive models; general schedulable RTOS systems can be modeled using Rate Monotonic Analysis or the newer AADLv2 modeling). But general-purpose operating systems such as Windows, Linux, Mac OS X, etc. are not provable using these techniques, and in general do not treat shoving network packets out as fast as possible as one of their design criteria. So you are expecting Windows (or any other general-purpose OS) to meet requirements that were either never thought of as important or were specifically relegated to being unimportant. This is ultimately a losing game.
There are workarounds: maintain an app-level queue and meter the rate at which you hand packets down. When you have sent “just enough”, stop sending for some delta-T, then resume. You will want to make the parameters “tunable” at runtime to deal with specific network drivers or system loads, so plan for that.
If I were doing this, I would have one thread whose responsibility was managing the network connection. It would receive packets from all the other threads and queue them. I would then have a dequeue thread that actually sent the packets. It would use a semaphore to block sending after N packets had been sent, and a timer would ReleaseSemaphore() by some value, perhaps just 1, perhaps 4, 8, or 17. This throttling-by-timer is a bit hokey, but it is cleaner than most other models, and it does limit what you send down.

Your tunable parameters, if all packets are the same size, are the number of packets, the timer interval, and the amount by which you release the semaphore. You adjust these until you stop losing packets on send. If packets vary in size, then your value N for the semaphore is computed from packet size, or maybe you change the timer or the release count based on some function of total bytes transmitted (that is, if you have lots of short packets you may have a lot more buffer space than if you have nothing but long packets, and with a mix you have to account for the kernel usage with a statistical model).

For better response time, I might use a multimedia timer (which executes its callback function in a separate thread), which gives another tunable parameter: the precision of the timer. For example, I can ask for 5 ms resolution, or 0, which gives maximum resolution at the cost of increased overall system overhead. If the network stack can’t give you feedback on UDP loss at transmission time, you need something like this to approximate a decent throttling algorithm.
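
To make the shape of this concrete, here is a minimal user-mode sketch of that arrangement. The assumptions are mine, not from the discussion above: a single connected UDP socket, fixed-size packets, producers enqueueing directly (the connection-manager and dequeue roles are collapsed into one sender thread), and error handling omitted. The constants are placeholders for the tunable parameters just described.

/*
 * Sketch only: throttled UDP sender paced by a semaphore that a
 * multimedia timer refills. BURST_MAX, RELEASE_COUNT and
 * TIMER_PERIOD_MS are the tunable parameters described above.
 */
#include <winsock2.h>
#include <windows.h>
#include <mmsystem.h>          /* timeSetEvent() */
#include <stdlib.h>
#pragma comment(lib, "ws2_32.lib")
#pragma comment(lib, "winmm.lib")

#define BURST_MAX        32    /* max send credits outstanding        */
#define RELEASE_COUNT     4    /* credits granted per timer tick      */
#define TIMER_PERIOD_MS   5    /* tick interval                       */
#define TIMER_RESOLUTION  0    /* 0 = best resolution, more overhead  */

typedef struct _PACKET {
    struct _PACKET *next;
    int             len;
    char            data[1472];        /* typical max UDP payload     */
} PACKET;

static CRITICAL_SECTION g_qLock;       /* protects the queue          */
static PACKET          *g_qHead, *g_qTail;
static HANDLE           g_qNotEmpty;   /* auto-reset: data was queued */
static HANDLE           g_sendCredits; /* semaphore: packets we may send now */
static SOCKET           g_sock;        /* connected UDP socket        */

/* Called from any producer thread: queue one packet for transmission. */
void EnqueuePacket(PACKET *p)
{
    p->next = NULL;
    EnterCriticalSection(&g_qLock);
    if (g_qTail) g_qTail->next = p; else g_qHead = p;
    g_qTail = p;
    LeaveCriticalSection(&g_qLock);
    SetEvent(g_qNotEmpty);
}

/* Multimedia timer callback (runs in its own thread): grant more credits.
 * If the semaphore is already at BURST_MAX the release simply fails,
 * which is exactly the throttling we want. */
static void CALLBACK TimerTick(UINT id, UINT msg, DWORD_PTR user,
                               DWORD_PTR r1, DWORD_PTR r2)
{
    ReleaseSemaphore(g_sendCredits, RELEASE_COUNT, NULL);
}

/* Dedicated sender thread: dequeue and transmit, paced by the semaphore. */
static DWORD WINAPI SenderThread(LPVOID unused)
{
    for (;;) {
        PACKET *p;

        WaitForSingleObject(g_sendCredits, INFINITE);  /* wait for a credit */

        EnterCriticalSection(&g_qLock);
        p = g_qHead;
        if (p) { g_qHead = p->next; if (!g_qHead) g_qTail = NULL; }
        LeaveCriticalSection(&g_qLock);

        if (!p) {                       /* queue empty: return the credit */
            ReleaseSemaphore(g_sendCredits, 1, NULL);
            WaitForSingleObject(g_qNotEmpty, INFINITE);
            continue;
        }
        send(g_sock, p->data, p->len, 0);
        free(p);
    }
    return 0;
}

void StartThrottledSender(void)
{
    InitializeCriticalSection(&g_qLock);
    g_qNotEmpty   = CreateEvent(NULL, FALSE, FALSE, NULL);
    g_sendCredits = CreateSemaphore(NULL, BURST_MAX, BURST_MAX, NULL);
    timeSetEvent(TIMER_PERIOD_MS, TIMER_RESOLUTION, TimerTick, 0, TIME_PERIODIC);
    CreateThread(NULL, 0, SenderThread, NULL, 0, NULL);
}

The event is only there so the sender can sleep while the queue is empty; the actual pacing comes from the semaphore and the timer tick that grants more send credits.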
Note that you can create a “test load” and run a “set parameters” option in your program that sends sequence-numbered, and perhaps time-stamped, test messages to a test client. It starts with a low transmission rate based on the various throttling parameters, then increases the rate until the test client starts reporting packet loss. Thus, the tunable parameters can be self-adjusting for a given network, machine load, and NIC, possibly even taking the phase of the moon into account. This minimizes the need for the end user to twiddle with these complex parameters.
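
A calibration pass along those lines might look something like the sketch below. The end-of-batch marker and the count reply from the test client are protocol details invented for the sketch; it assumes Winsock is already initialized and s is a UDP socket connected to a cooperating test client that counts what it receives.

/* Sketch: ramp up the send rate until the test client reports loss. */
#include <winsock2.h>
#include <windows.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define BATCH_SIZE   1000
#define PAYLOAD_SIZE 1024

/* Returns the largest packets-per-tick value that showed no loss. */
int CalibrateReleaseCount(SOCKET s, int timerPeriodMs)
{
    int perTick, best = 1;
    char payload[PAYLOAD_SIZE] = {0};

    for (perTick = 1; perTick <= 64; perTick *= 2) {
        uint32_t seq, reported = 0;
        int got;

        /* Send one batch of sequence-numbered test datagrams at this rate. */
        for (seq = 0; seq < BATCH_SIZE; ) {
            int burst;
            for (burst = 0; burst < perTick && seq < BATCH_SIZE; burst++, seq++) {
                memcpy(payload, &seq, sizeof(seq));      /* sequence number */
                send(s, payload, sizeof(payload), 0);
            }
            Sleep((DWORD)timerPeriodMs);                 /* crude pacing */
        }

        /* Ask the test client how many datagrams of this batch arrived. */
        seq = UINT32_MAX;                                /* end-of-batch marker */
        memcpy(payload, &seq, sizeof(seq));
        send(s, payload, sizeof(payload), 0);
        got = recv(s, (char *)&reported, sizeof(reported), 0);

        printf("rate %d/tick: client saw %u of %d\n", perTick, reported, BATCH_SIZE);
        if (got != (int)sizeof(reported) || reported < BATCH_SIZE)
            break;                                       /* loss: keep previous rate */
        best = perTick;
    }
    return best;
}

You would then feed the best value back into the throttling parameters (RELEASE_COUNT in the earlier sketch) and perhaps repeat the pass periodically, since the right numbers can change with machine load.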
joe
Hello,
Yes, our receivers are prepared for packet loss and for out-of-order packets, and we are aware that any component may silently discard UDP packets. But there is a difference between a drop rate of fewer than 10 packets per day (as observed on the embedded devices) and 10 per second.
Certainly some customers will try to use low-end switches, but most customers will follow our requirements and use Cisco etc., and we can also recommend certain NICs.
So what I’m asking for are some hints on getting as close as possible to the loss rate of the embedded devices on the same network. To that end I switched to Winsock Kernel, which was a big improvement, but I’m probably not going to write a real hardware driver for a particular NIC.
As a first step I wanted to figure out exactly where the packets are lost, which, from my impression, seems to be somewhere between my call to WskSendTo() and the packet leaving the PC on the wire. The next step is how to widen this bottleneck, perhaps by modifying certain buffer or QoS settings in the NIC’s parameters or inside Winsock Kernel. Following some earlier posts I’ll run some tests with QoS settings and different packet rates and sizes. I’d also like to do some tests with SO_SNDBUF, but as mentioned in my first post I still receive STATUS_INVALID_PARAMETER whenever I try to get or set the buffer size (exactly as described in
http://www.osronline.com/showthread.cfm?link=157209).
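
For anyone who wants to reproduce it, the failing set is shaped roughly like the sketch below. This is simplified and not my exact code: the 1 MB value is just a test value, completion handling on the IRP is omitted, and whether the IRP should be supplied or left NULL for this option is one of the things I’m unsure about.

/* Sketch of the SO_SNDBUF attempt on a WSK datagram socket. */
#include <ntddk.h>
#include <wsk.h>

NTSTATUS TrySetSendBuffer(PWSK_SOCKET Socket, PIRP Irp)
{
    ULONG sndBuf = 1 << 20;   /* 1 MB, arbitrary value for the experiment */
    const WSK_PROVIDER_DATAGRAM_DISPATCH *dispatch =
        (const WSK_PROVIDER_DATAGRAM_DISPATCH *)Socket->Dispatch;

    /* A real call sets a completion routine on Irp first (IoSetCompletionRoutine). */
    return dispatch->Basic.WskControlSocket(
        Socket,
        WskSetOption,         /* set a socket option           */
        SO_SNDBUF,            /* option: send buffer size      */
        SOL_SOCKET,           /* option level                  */
        sizeof(sndBuf),       /* input size                    */
        &sndBuf,              /* input buffer                  */
        0, NULL,              /* no output buffer for a set    */
        NULL,                 /* OutputSizeReturned not needed */
        Irp);                 /* IRP requirement per the docs  */
}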
Regards,
Johannes