Achieving lowest network latency and highest packet priority?

Hello,

I am trying to implement a network communication module in which each packet (UDP datagram) is sent in real time (or as fast as possible), since timing really matters in this scenario. Using Winsock, my tests indicated that latency is very low under optimal conditions but, for example, if a network file transfer is started, my ping times increase dramatically (150-600ms). The UDP datagrams I want to send are very small (50 bytes every 10ms), so I would like those packets to have priority over non-urgent traffic such as a file transfer over the LAN.

Basically, what I would like to do is a sort of “Audio over Ethernet” solution with software only (no dedicated Ethernet hardware).

What would you suggest to achieve the lowest possible latency for sending small packets on a LAN at predefined intervals, with priority over other transfers? Is there a level below Winsock for sending packets, or would the solution be something like writing an NDIS miniport driver and using some QoS features?

Thanks in advance for your input!

Have a look at the Winsock 2 QOS / Flowspec functions, especially
WSAConnect().
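
As a rough illustration of what a flow specification for the stream in the question might contain, the sketch below derives the numbers from "50 bytes every 10 ms". The field names mirror the Winsock FLOWSPEC structure, but the 2x bucket and peak factors and the 1 ms latency target are assumptions made for illustration, not recommended values. The result is a plain dictionary and is not passed to any real API.

```python
# Illustrative only: computes values one might place in a Winsock FLOWSPEC
# for the stream described in the question. Nothing here calls Winsock.

def flowspec_for_stream(payload_bytes, interval_s, latency_us=1000):
    """Return FLOWSPEC-like fields for a fixed-size, fixed-interval stream."""
    token_rate = round(payload_bytes / interval_s)  # sustained bytes per second
    return {
        "TokenRate": token_rate,               # bytes/s
        "TokenBucketSize": 2 * payload_bytes,  # bytes; assumed 2x burst allowance
        "PeakBandwidth": 2 * token_rate,       # bytes/s; assumed 2x headroom
        "Latency": latency_us,                 # microseconds; assumed target
        "MaxSduSize": payload_bytes,           # largest datagram in the flow
        "MinimumPolicedSize": payload_bytes,   # smallest datagram in the flow
    }

spec = flowspec_for_stream(payload_bytes=50, interval_s=0.010)
print(spec["TokenRate"])  # 5000 bytes/s of payload
```

The point of the numbers is that the stream itself is tiny (5000 bytes/s of payload), so any QoS request is about latency and ordering relative to bulk traffic rather than about bandwidth.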

Mark.

The only approach that I can think of that might improve latency, etc. is to
minimize user-to-kernel I/O.

Perhaps you could develop a kernel-mode UDP client that sends these packets.
Timing in the kernel is still problematic, but at least you wouldn’t have
the additional timing uncertainty associated with user-to-kernel I/O.
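
Before moving anything into the kernel, it may be worth measuring how much timing uncertainty user mode actually adds. The sketch below is one way to do that under stated assumptions: it wakes on absolute 10 ms deadlines and records how late each wake-up was. The function name and defaults are hypothetical, and the results depend on the OS timer resolution and on system load.

```python
import time

def measure_tick_lateness(interval_s=0.010, ticks=50):
    """Wake on absolute deadlines and record how late each wake-up was (seconds)."""
    lateness = []
    start = time.perf_counter()
    for i in range(1, ticks + 1):
        deadline = start + i * interval_s  # absolute, so errors do not accumulate
        while True:
            remaining = deadline - time.perf_counter()
            if remaining <= 0:
                break
            time.sleep(remaining)
        lateness.append(time.perf_counter() - deadline)
        # A real sender would transmit its datagram at this point.
    return lateness

if __name__ == "__main__":
    late = measure_tick_lateness()
    print(f"worst wake-up lateness: {max(late) * 1e3:.3f} ms")
```

Running it once on an idle machine and once during a file transfer gives a first indication of whether the delay comes from scheduling on the sending side or from somewhere further down the stack.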

On Vista and later platforms there is a Winsock Kernel (WSK) facility that
provides convenient kernel access to TCP/IP. This could be a fairly simple
legacy (non-PnP) KMDF driver.

You would not write an NDIS miniport: that is the HW adapter vendor’s job.

Good luck!

Thomas F. Divine


From:
Sent: Monday, December 05, 2011 5:08 PM
To: “Windows System Software Devs Interest List”
Subject: [ntdev] Achieving lowest network latency and highest packet
priority?

> —
> NTDEV is sponsored by OSR
>
> For our schedule of WDF, WDM, debugging and other seminars visit:
> http://www.osr.com/seminars
>
> To unsubscribe, visit the List Server section of OSR Online at
> http://www.osronline.com/page.cfm?name=ListServer

I’d first suggest you hook up a network sniffer and look at the wire
traffic. I was going to suggest you were possibly getting head-of-line
blocking on the NIC, but creating a latency of 150 ms on a gigabit link
would require on the order of 15 megabytes to be queued ahead of your audio
UDP datagram. If there were multiple streams of file transfers, and each was
queuing giant send offload packets (which can reach, I believe, 256k each),
then you might have 15 megabytes of data queued to the NIC ahead of your UDP
packet. You could also turn on NDIS tracing (for example with the commands in
netsh) and look at the network events and their timing in the resulting ETW
trace file.
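
The queue-depth estimate can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch that assumes the full 1 Gb/s line rate and ignores framing overhead; the 256 KiB large-send size is the figure quoted above, taken as an assumption.

```python
def queued_bytes_for_delay(delay_s, link_bits_per_s=1_000_000_000):
    """Bytes that must sit ahead of a packet to delay it by delay_s at line rate."""
    return delay_s * link_bits_per_s / 8

def delay_for_queued_bytes(queued_bytes, link_bits_per_s=1_000_000_000):
    """Seconds needed to drain queued_bytes at line rate."""
    return queued_bytes * 8 / link_bits_per_s

LSO_SEND_BYTES = 256 * 1024  # assumed size of one large-send packet

backlog = queued_bytes_for_delay(0.150)
print(backlog / 1e6)                       # megabytes queued for a 150 ms delay
print(backlog / LSO_SEND_BYTES)            # number of 256 KiB sends that represents
print(delay_for_queued_bytes(15e6) * 1e3)  # milliseconds to drain a 15 MB backlog
```

At raw line rate a 150 ms delay corresponds to about 18.75 MB, or roughly 70 large sends, and a 15 MB backlog drains in about 120 ms. Either way the order of magnitude is the same: tens of megabytes would have to be queued ahead of the datagram.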

You ARE only going over a local gigabit Ethernet link, and not over the
Internet or other much slower WAN link?

Jan

We have an NDIS Intermediate driver and we were seeing packet delays. Our tests showed that the delay was in the delivery of the packet to the application. From the time our driver indicated the packet up the stack until it was received by an application we saw up to 8 seconds of delay during high CPU or high disk usage. Delay during file transfer could be due to the disk having priority over delivery of packets. For XP and older OS’s both network and disk are handled only by CPU0.

Larry C

What kind of IO model does the application use? For this kind of demanding,
and inherently stateless, IO, I would recommend using an IOCP and keeping
several IOPs pending at all times. The exact number to pend depends on
available resources (RAM, number of CPUs, and OS version are the key factors).
If the application uses the select model (worst perf on Windows) or sync IO
(baseline perf on Windows), the context switch delay can be significant.
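
For a quick experiment with a completion-style receiver, without writing the IOCP plumbing by hand, something like the following might do. It is a minimal sketch using Python's asyncio. As far as I know the default event loop on Windows (the proactor loop, Python 3.8 and later) is built on I/O completion ports, while on other platforms it is selector-based. It does not give control over how many receives are pending, and the class and function names are hypothetical.

```python
import asyncio

class AudioReceiver(asyncio.DatagramProtocol):
    """Collects datagrams; the event loop keeps a receive outstanding for us."""

    def __init__(self, expected, done):
        self.expected = expected
        self.done = done
        self.received = []

    def datagram_received(self, data, addr):
        self.received.append(data)
        if len(self.received) >= self.expected and not self.done.done():
            self.done.set_result(None)

async def loopback_demo(count=5, payload=b"x" * 50):
    """Send `count` 50-byte datagrams to a local receiver, one every 10 ms."""
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    rx_transport, rx = await loop.create_datagram_endpoint(
        lambda: AudioReceiver(count, done), local_addr=("127.0.0.1", 0))
    rx_addr = rx_transport.get_extra_info("sockname")
    tx_transport, _ = await loop.create_datagram_endpoint(
        asyncio.DatagramProtocol, remote_addr=rx_addr)
    try:
        for _ in range(count):
            tx_transport.sendto(payload)
            await asyncio.sleep(0.010)
        await asyncio.wait_for(done, timeout=2.0)
    finally:
        tx_transport.close()
        rx_transport.close()
    return rx.received

if __name__ == "__main__":
    print(len(asyncio.run(loopback_demo())), "datagrams received")
```

This only exercises the loopback path, so it says nothing about the NIC. It is meant as a starting point for comparing receive-side delivery delay against a select-based or synchronous receiver.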

BTW, on newer OSes the IOCP model is encapsulated in the work queue APIs.

IOCP is definitely the way to go for efficient handling of network
traffic, I’ve easily had over 800 MB/s sustained using that model.

select() on Windows is not the horror it used to be. Back in the first
days of Winsock 1.x it genuinely was implemented as a polling loop,
which is why things like the EventSelect mechanism were created and
articles recommended avoiding select() on Windows altogether.

When Winsock was ported to the NT 4 code base, the internal
implementation of select() changed to waiting on handles, though the
implementation for LSPs using non-system handles had to wait years for
SP4 to be brought into the fold.

The upshot is that generally select() is fine at 1 Gb/s network speeds
but above that the efficiency of IOCP makes a big difference.

One other thing comes to mind: the app isn’t using callback functions
for send/recv, is it? These look like a great idea, but in practice they
perform abysmally because the callback has to run in the same thread
context as the original call to send/recv. So if that thread doesn’t go
into a wait state, callback delivery can be delayed to a very
significant degree.

Mark.

>packets. For XP and older OS’s both network and disk are handled only by CPU0.

Wrong at least for disks.

Well, maybe the controller’s ISR is serviced by only one CPU (which is also doubtful, since SCSIPORT/STORPORT have strong synchronization logic in them) - but the rest of the stack can execute on any CPU.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com