How to increase throughput using IRP based winsock

Dear All,
We are currently using Windows kernel sockets (WSK) to communicate with a client device. IRPs are used to signal completion of send and receive requests. Measuring with KeQueryPerformanceCounter, we see that requests take a long time to complete after WskSend or WskReceive is called, which in turn reduces performance. WSK_FLAG_NODELAY is set on sends. In the registry, TcpAckFrequency and TCPNoDelay are set to 1 for the TCP/IP interface. Please suggest any options for improving WSK performance when using IRPs.

Is the network IRP completion time drastically longer than the packet round trip time? A network send IRP can’t be completed until the local machine knows the remote end has received it, as it might still need the buffer to retransmit a packet.

Jan

How many IRP’s do you have in your pool? I had a customer with a similar
complaint, adding more IRP’s fixed the problem.

Don Burn
Windows Driver Consulting
Website: http://www.windrvr.com

NTDEV is sponsored by OSR

Visit the list online at:
http:

MONTHLY seminars on crash dump analysis, WDF, Windows internals and software
drivers!
Details at http:

To unsubscribe, visit the List Server section of OSR Online at
http:

@Don Burn
We allocate an IRP for every send or receive request using IoAllocateIrp, and build the WSK buffer with IoAllocateMdl followed by MmBuildMdlForNonPagedPool. Once the request completes, we free the IRP and the WSK buffer. We are not currently using a pool of IRPs.

I’ve always created a pool of IRP’s with the MDL and buffer, then used
those. The allocation costs can be high.

Don Burn
Windows Driver Consulting
Website: http://www.windrvr.com
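
The preallocation idea above can be modeled in user-mode C as a fixed-size free list. This is only a sketch of the pattern, not WSK code: `request_t`, `request_pool_t`, `pool_init`, `pool_acquire`, `pool_release`, and the two size constants are hypothetical names chosen for illustration. In a driver, each entry would hold an IRP from IoAllocateIrp together with its MDL and buffer, the IRP would be reset with IoReuseIrp before being used again, and the list would need a lock or an interlocked list because completions can run concurrently.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4            /* assumed depth; size it to the requests in flight */
#define BUF_SIZE  (64 * 1024)  /* assumed per-request buffer size */

/* Hypothetical stand-in for "IRP + MDL + buffer", allocated once up front. */
typedef struct request {
    struct request *next;      /* free-list link */
    unsigned char buf[BUF_SIZE];
} request_t;

typedef struct {
    request_t slots[POOL_SIZE];
    request_t *free_list;
} request_pool_t;

/* Put every slot on the free list. */
void pool_init(request_pool_t *p) {
    assert(p != NULL);
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        p->slots[i].next = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* Take a request from the pool; returns NULL when all are in flight. */
request_t *pool_acquire(request_pool_t *p) {
    assert(p != NULL);
    request_t *r = p->free_list;
    if (r != NULL)
        p->free_list = r->next;
    return r;
}

/* Return a completed request to the pool for reuse. */
void pool_release(request_pool_t *p, request_t *r) {
    assert(p != NULL && r != NULL);
    r->next = p->free_list;
    p->free_list = r;
}
```

The send path would call `pool_acquire` instead of allocating, and the completion routine would call `pool_release` instead of freeing. A NULL return from `pool_acquire` doubles as a simple limit on the number of outstanding requests.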

My understanding was that IRPs and MDLs already come from an OS pool, although I agree with Don that some CPU cycles can potentially be shaved off using a private local pool. It hasn’t been established that things are CPU-cycle limited, so personally I would analyze where the bottleneck is before trying to shave CPU cycles.

You haven’t told us what you consider “too slow”. Are you doing 10M tiny requests/sec or 100 huge transfers/sec? As I remember, kernel winsock transfers are not buffered, so if you do tiny writes you will get many tiny packets, which will be very bad for performance. If you look at some basic performance numbers, is the CPU use high or low? If 99% of the request time is spent waiting for a network reply packet, it won’t help much to shave a few cycles of CPU time off. Are you using a single socket or many sockets?

The first thing I usually do when looking at network performance is I put a packet sniffer on the wire and look at things like remote node response time.

I’d also suggest adding some ETW tracing, which will generate pretty high resolution (microsecond or less) performance data. It’s easy to include context data so you can filter traces on a specific request. The ETW TraceLogging facility is very easy to get useful data from. If the performance really is CPU cycle/memory bandwidth limited, Intel’s VTune tool is the best way I know to measure where the time goes.

Jan

@Jan Bottorff
I am doing a huge transfer, around 1gb of data in total, sent through the socket 64kb at a time.

We are using a single socket for both reads and writes. Would using multiple sockets make any difference?

From the Wireshark logs, the RTT (158ms for writing 64kb of data) is almost equal to the winsock IRP completion time (158ms); there is not much difference.

But transfer speeds are relatively very low. We are testing on a 5ghz(ac) network.

What do you mean by “very low,” exactly?  As soon as you mention that
you are testing a wifi network, alarm bells go off all over.  Wifi
performance is heavily dependent on a hundred different environmental
issues, like router location, antenna orientation, building composition,
furniture location, RF interference, reflections, wall positions, and
many others.

The best focused tests in the real world are getting about 300Mbps from
802.11ac, which would be 40 MB/s, about the speed of USB 2.  Your
gigabyte transfer would take 25 to 30 seconds.  What are you actually
seeing?


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
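
The estimate above can be checked with a few lines of C. This is a minimal sketch: the helper names `mbps_to_mbytes_per_s` and `transfer_seconds` are made up for illustration, and the calculation ignores protocol overhead and retransmissions.

```c
#include <assert.h>

/* Convert a link rate in megabits/s to megabytes/s (8 bits per byte). */
double mbps_to_mbytes_per_s(double mbps) {
    assert(mbps >= 0.0);
    return mbps / 8.0;
}

/* Seconds needed to move `mbytes` megabytes over a link of `mbps` megabits/s. */
double transfer_seconds(double mbytes, double mbps) {
    assert(mbytes >= 0.0 && mbps > 0.0);
    return mbytes * 8.0 / mbps;
}
```

For 300 Mbps, `mbps_to_mbytes_per_s(300.0)` gives 37.5 MB/s, and `transfer_seconds(1024.0, 300.0)` gives roughly 27 seconds, which is consistent with the 25 to 30 second range quoted above.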

Are you saying that you send 64kb (KB?) of data on a socket connection and then wait for an ack from the other end before sending the next block?

If so, your performance is doomed to be terrible for the same reason that many well known and understood TCP and UDP protocols are: the “chatty” problem, or more appropriately the lack of sufficient pipelining of data on high bandwidth, high latency networks, which nowadays includes most LANs as well as the traditional problem with WAN throughput. Many protocol-specific workarounds have been implemented, but the best advice is to follow the original advice from the designers of the TCP protocol and imagine that there must be a certain “window” of data in transit at all times. The size of such a window will naturally depend on how much data there is to send, and also on the ability of the remote host to process or buffer it (usually not a concern nowadays), but more importantly on the bandwidth-latency product of the current network path, which is of course subject to change without notice during the course of your communications.

I don’t know what a ‘5ghz(ac) network’ is, but 158ms for an IRP completion is an eternity unless it is synchronous and waiting for some remote response.
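
To put rough numbers on the pipelining argument, the sketch below computes the throughput of a send-then-wait loop and the amount of data that would have to be in flight to fill a link. It takes the figures from this thread at face value (64 KB blocks read as 65536 bytes, 158 ms per round trip) and assumes a 300 Mbit/s link purely for illustration. The function names are hypothetical, and the 158 ms figure probably includes transmission time, so treat the results as an order-of-magnitude guide only.

```c
#include <assert.h>

/* Throughput in Mbit/s when one block is sent and the sender then waits
   a full round trip before sending the next one (no pipelining). */
double stop_and_wait_mbps(double block_bytes, double rtt_s) {
    assert(block_bytes > 0.0 && rtt_s > 0.0);
    return block_bytes * 8.0 / rtt_s / 1e6;
}

/* Bandwidth-latency product: bytes that must be in flight to keep a link
   of link_mbps busy across a round trip of rtt_s seconds. */
double bdp_bytes(double link_mbps, double rtt_s) {
    assert(link_mbps > 0.0 && rtt_s > 0.0);
    return link_mbps * 1e6 * rtt_s / 8.0;
}

/* Number of block-sized requests to keep outstanding, rounded up. */
unsigned long requests_in_flight(double link_mbps, double rtt_s,
                                 unsigned long block_bytes) {
    assert(block_bytes > 0);
    double bdp = bdp_bytes(link_mbps, rtt_s);
    unsigned long n = (unsigned long)(bdp / (double)block_bytes);
    if ((double)n * (double)block_bytes < bdp)
        n++;
    return n;
}
```

With these inputs, `stop_and_wait_mbps(65536.0, 0.158)` comes out at about 3.3 Mbit/s no matter how fast the link is, while `bdp_bytes(300.0, 0.158)` is about 5.9 MB, or 91 outstanding 64 KB requests. The gap between those two figures is the cost of waiting for each completion before issuing the next request.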

On Jan 10, 2018, at 4:19 PM, xxxxx@hotmail.com wrote:
>
> I don’t know what a ‘5ghz(ac) network’ is,

Oh, I think you probably do. 802.11ac wifi has its channels in the 5GHz band.

Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Tim Roberts wrote:

> The best focused tests in the real world are getting about 300Mbps
> from 802.11ac, which would be 40 MB/s, about the speed of USB 2.

Glug glug glug…

Ah yes, wireless fun! I just didn’t recognize that particular way of describing it.

Again for the OP: over a high latency network, you will need a pipeline of some kind. All Wifi is high latency, even when you test in an EM-dampened room with nothing but the test harness.

Can we assume TCP?

@MM
Yes, it’s a TCP socket (AF_INET, SOCK_STREAM, IPPROTO_TCP).
@tim
> 802.11ac wifi has its channels in the 5GHz band
Yes, it does.

Creating multiple IRPs for data transfer increased performance a little. Currently the write speed is around 4~5 mbps and the read speed around 8 mbps on an 802.11ac 5ghz channel. Previously we waited for IRP completion after each socket send and receive.

We are testing in a shielded room.

xxxxx@gmail.com wrote:

> @MM
> yes its TCP socket(AF_INET, SOCK_STREAM, IPPROTO_TCP).
> @tim
> 802.11ac wifi has its channels in the 5GHz band

True, but that is not related to the data rate.  The data rate depends
on the channel width, the modulation scheme, and the number of separate
antennae you are using.

> creating multiple irps for data transfer increased performance little currently write speed is around 4~5 mbps and read around 8 mbps with 802.11 ac 5ghz channel. previously after each socket send and receive, waiting for IRP completion.
>
> Testing in shielded room.

How wide are the channels, how many antennae, how many streams, what
kind of modulation?  There’s a chart on the 802.11ac page on Wikipedia
that describes the range of data rates that you should expect.  I’d bet
real dollars that your operating system processing is not impacting your
transfer rate in any way.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
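
As a rough illustration of how channel width, modulation, and stream count combine, the sketch below computes a nominal 802.11ac PHY rate. The subcarrier counts and symbol times are the standard values as I understand them from the published rate tables, and the function names are made up for this example. PHY rate is an upper bound on the air interface, not the throughput an application will see.

```c
#include <assert.h>

/* Data subcarriers for each 802.11ac channel width in MHz; 0 if unknown. */
int vht_data_subcarriers(int width_mhz) {
    switch (width_mhz) {
    case 20:  return 52;
    case 40:  return 108;
    case 80:  return 234;
    case 160: return 468;
    default:  return 0;
    }
}

/* Nominal PHY rate in Mbit/s.
   bits_per_subcarrier: 1 for BPSK, 2 QPSK, 4 16-QAM, 6 64-QAM, 8 256-QAM.
   coding_rate: e.g. 0.5, 0.75, 5.0/6.0.
   short_gi: nonzero selects the 400 ns guard interval (3.6 us symbols),
             zero selects the 800 ns guard interval (4.0 us symbols). */
double vht_phy_rate_mbps(int width_mhz, int bits_per_subcarrier,
                         double coding_rate, int streams, int short_gi) {
    assert(bits_per_subcarrier > 0 && streams > 0);
    assert(coding_rate > 0.0 && coding_rate <= 1.0);
    double symbol_us = short_gi ? 3.6 : 4.0;
    double bits_per_symbol = vht_data_subcarriers(width_mhz)
                           * bits_per_subcarrier * coding_rate * streams;
    return bits_per_symbol / symbol_us;  /* bits per microsecond == Mbit/s */
}
```

For example, one stream on an 80 MHz channel with 256-QAM, rate 5/6 coding, and the short guard interval gives about 433 Mbit/s, while one stream on a 20 MHz channel with BPSK and rate 1/2 coding gives 6.5 Mbit/s. Both are far above the few mbps reported in this thread.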