
How to increase throughput using IRP based winsock

davs_shrus Posts: 11
Dear All,
We are currently using Windows kernel sockets (WSK) to communicate with a client device, with IRPs used to signal completion of send and receive requests. Timing with KeQueryPerformanceCounter shows that requests take a long time to complete after WskSend or WskReceive is called, which in turn reduces performance. We pass WSK_FLAG_NODELAY on sends, and in the registry for the TCP/IP interface we have set TcpAckFrequency and TcpNoDelay to 1. Please suggest any options for completing requests faster and getting better performance from WSK using IRPs.

Comments

  • Jan_Bottorff Posts: 468
    Is the network IRP completion time drastically longer than the packet round trip time? A network send IRP can't be completed until the local machine knows the remote end has received it, as it might still need the buffer to retransmit a packet.

    Jan

  • Don_Burn Posts: 1,623
    How many IRP's do you have in your pool? I had a customer with a similar
    complaint, adding more IRP's fixed the problem.


    Don Burn
    Windows Driver Consulting
    Website: http://www.windrvr.com





    ---
    NTDEV is sponsored by OSR

    Visit the list online at:
    <http://www.osronline.com/showlists.cfm?list=ntdev>;

    MONTHLY seminars on crash dump analysis, WDF, Windows internals and software
    drivers!
    Details at <http://www.osr.com/seminars>;

    To unsubscribe, visit the List Server section of OSR Online at
    <http://www.osronline.com/page.cfm?name=ListServer>;
  • davs_shrus Posts: 11
    @Don Burn
    We allocate an IRP for every send or receive request with IoAllocateIrp, and build the WSK buffer by allocating an MDL with IoAllocateMdl and calling MmBuildMdlForNonPagedPool. Once the request completes, we free the IRP and the WSK buffer. We are not currently using a pool of preallocated IRPs.

  • Don_Burn Posts: 1,623
    I've always created a pool of IRP's with the MDL and buffer, then used
    those. The allocation costs can be high.


    Don Burn
    Windows Driver Consulting
    Website: http://www.windrvr.com
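The preallocated pool described above can be sketched in portable C. This is only a user-mode analogue under stated assumptions: the `Request` struct is a hypothetical stand-in for one IRP plus its MDL and buffer, `POOL_SIZE` and `BUFFER_SIZE` are made-up values, and the free list is not synchronized. In a real driver the list would need a spin lock or an interlocked list because completion routines can run concurrently, and a recycled IRP would have to be reinitialized (for example with IoReuseIrp) before being sent again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SIZE   8        /* hypothetical number of requests kept ready */
#define BUFFER_SIZE 65536    /* hypothetical per-request buffer size */

/* Hypothetical stand-in for one preallocated IRP + MDL + data buffer. */
typedef struct Request {
    struct Request *next;    /* free-list link */
    unsigned char  *buffer;  /* allocated once, reused for every transfer */
} Request;

typedef struct RequestPool {
    Request  slots[POOL_SIZE];
    Request *free_list;
} RequestPool;

/* Allocate every buffer up front. Returns 0 on success, -1 on failure. */
static int pool_init(RequestPool *pool)
{
    assert(pool != NULL);
    pool->free_list = NULL;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        pool->slots[i].buffer = malloc(BUFFER_SIZE);
        if (pool->slots[i].buffer == NULL) {
            /* Undo the allocations made so far. */
            while (i-- > 0) {
                free(pool->slots[i].buffer);
            }
            pool->free_list = NULL;
            return -1;
        }
        pool->slots[i].next = pool->free_list;
        pool->free_list = &pool->slots[i];
    }
    return 0;
}

/* Take a request from the pool; NULL means every request is in flight. */
static Request *pool_acquire(RequestPool *pool)
{
    Request *r = pool->free_list;
    if (r != NULL) {
        pool->free_list = r->next;
    }
    return r;
}

/* Return a request to the pool, as a completion routine would. */
static void pool_release(RequestPool *pool, Request *r)
{
    assert(r != NULL);
    r->next = pool->free_list;
    pool->free_list = r;
}

/* Free all buffers once no request is in flight any more. */
static void pool_destroy(RequestPool *pool)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        free(pool->slots[i].buffer);
    }
    pool->free_list = NULL;
}
```

With this shape, the send path pays for allocation once at startup; each transfer only pops and pushes a list entry. When `pool_acquire` returns NULL, the caller has to wait for a completion, which also puts a natural cap on the amount of data in flight.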



  • Jan_Bottorff Posts: 468
    My understanding was that IRPs and MDLs already come from an OS pool, although I agree with Don that some CPU cycles can potentially be shaved off by using a private local pool. It hasn't been established that things are CPU-cycle limited, so personally I would analyze where the bottleneck is before trying to shave CPU cycles.

    You haven't told us what you consider "too slow". Are you doing 10M tiny requests/sec, or 100 huge transfers/sec? As I remember, kernel Winsock transfers are not buffered, so if you do tiny writes you will get many tiny packets, which is very bad for performance. If you look at some basic performance numbers, is CPU use high or low? If 99% of the request time is spent waiting for a network reply packet, it won't help much to shave a few cycles of CPU time off. Are you using a single socket or many sockets?

    The first thing I usually do when looking at network performance is put a packet sniffer on the wire and look at things like remote node response time.

    I'd also suggest adding some ETW tracing, which will generate pretty high-resolution (microsecond or better) performance data. It's easy to include context data so you can filter traces on a specific request. The ETW TraceLogging facility is very easy to get useful data from. If the performance really is limited by CPU cycles or memory bandwidth, Intel's VTune tool is the best way I know to measure where the time goes.

    Jan


  • davs_shrus Posts: 11
    @Jan Bottorff
    I am doing a huge transfer, around 1gb of data in total, with each individual transfer sending 64kb of data through the socket.

    We are using a single socket for reading and writing data. Would using multiple sockets make any difference?

    From the Wireshark logs, the RTT (158ms for writing 64kb of data) is almost equal to the Winsock IRP completion time (158ms); there is not much difference.

    But transfer speeds are relatively very low. We are testing in 5ghz(ac) network.
  • Tim_Roberts Posts: 12,622
    xxxxx@gmail.com wrote:
    > I am doing a huge transfer, around 1gb of data in total, with each individual transfer sending 64kb of data through the socket.
    >
    > We are using a single socket for reading and writing data. Would using multiple sockets make any difference?
    >
    > From the Wireshark logs, the RTT (158ms for writing 64kb of data) is almost equal to the Winsock IRP completion time (158ms); there is not much difference.
    >
    > But transfer speeds are relatively very low. We are testing in 5ghz(ac) network.

    What do you mean by "very low," exactly?  As soon as you mention that
    you are testing a wifi network, alarm bells go off all over.  Wifi
    performance is heavily dependent on a hundred different environmental
    issues, like router location, antenna orientation, building composition,
    furniture location, RF interference, reflections, wall positions, and
    many others.

    The best focused tests in the real world are getting about 300Mbps from
    802.11ac, which would be 40 MB/s, about the speed of USB 2.  Your
    gigabyte transfer would take 25 to 30 seconds.  What are you actually
    seeing?

    --
    Tim Roberts, xxxxx@probo.com
    Providenza & Boekelheide, Inc.
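The arithmetic behind that estimate can be checked with a small sketch. The helper names are hypothetical, and 1 Mbps is taken as 1,000,000 bits per second. At 300 Mbps the exact figure is 37.5 MB/s, which the post rounds to 40 MB/s; a gigabyte then takes roughly 27 seconds (about 29 seconds if "1gb" means 2^30 bytes), consistent with the 25 to 30 second range quoted.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a link rate in megabits per second to megabytes per second. */
static double mbps_to_megabytes_per_sec(double mbps)
{
    return mbps / 8.0;
}

/* Seconds needed to move `bytes` over a link running at `mbps`
 * (1 Mbps = 1,000,000 bits per second), ignoring protocol overhead. */
static double transfer_seconds(uint64_t bytes, double mbps)
{
    assert(mbps > 0.0);
    return (double)bytes * 8.0 / (mbps * 1e6);
}
```

For example, `transfer_seconds(1000000000ULL, 300.0)` gives the time for a decimal gigabyte, and passing `1073741824ULL` gives the time for a binary one. Both ignore TCP/IP and Wi-Fi framing overhead, so real transfers would take somewhat longer.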


  • MBond Posts: 843
    Are you saying that you send 64kb (KB?) of data on a socket connection and then wait for an ack from the other end before sending the next block?

    If so, your performance is doomed to be terrible, for the same reason that affects many well-known and well-understood TCP and UDP protocols: the "chatty" problem, or more precisely the lack of sufficient pipelining of data on high-bandwidth, high-latency networks, which nowadays includes most LANs as well as the traditional problem with WAN throughput. Many protocol-specific workarounds have been implemented, but the best advice is to follow the original advice from the designers of the TCP protocol and imagine that there must be a certain "window" of data in transit at all times. The size of such a window will naturally depend on how much data there is to send, and on the ability of the remote host to process or buffer it (usually not a concern nowadays), but more importantly on the bandwidth-latency product of the _current_ network path, which is of course subject to change without notice during the course of your communications.

    I don't know what a '5ghz(ac) network' is, but 158ms for an IRP completion is an eternity unless it is synchronous and waiting for some remote response.
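The window argument can be put into numbers with a short sketch. It assumes that "64kb" means 65,536 bytes and takes the reported 158 ms at face value as the time for one send-and-wait cycle; both are assumptions, and the function names are hypothetical. Sending one block per cycle caps throughput at about 3.3 Mbps regardless of how fast the link is, while keeping a 300 Mbps path full at that latency would need roughly 5.9 MB in flight, on the order of 90 such blocks outstanding at once.

```c
#include <assert.h>

/* Throughput in Mbps when `window_bytes` are delivered per round trip
 * of `rtt_ms` milliseconds (1 Mbps = 1,000,000 bits per second). */
static double window_limited_mbps(double window_bytes, double rtt_ms)
{
    assert(rtt_ms > 0.0);
    return window_bytes * 8.0 / (rtt_ms / 1000.0) / 1e6;
}

/* Bandwidth-latency product: bytes that must be in flight to sustain
 * `target_mbps` over a path with a round trip of `rtt_ms`. */
static double required_window_bytes(double target_mbps, double rtt_ms)
{
    assert(target_mbps >= 0.0 && rtt_ms >= 0.0);
    return target_mbps * 1e6 / 8.0 * (rtt_ms / 1000.0);
}
```

The practical consequence is the one stated above: issue many sends without waiting for each completion, so that the data outstanding approaches the bandwidth-latency product of the path.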



  • Tim_Roberts Posts: 12,622
    On Jan 10, 2018, at 4:19 PM, xxxxx@hotmail.com wrote:
    >
    > I don’t know what a ‘5ghz(ac) network’ is,

    Oh, I think you probably do. 802.11ac wifi has its channels in the 5GHz band.

    Tim Roberts, xxxxx@probo.com
    Providenza & Boekelheide, Inc.


  • Chris_Aseltine Posts: 1,228
    Tim Roberts wrote:

    > The best focused tests in the real world are getting about 300Mbps
    > from 802.11ac, which would be 40 MB/s, about the speed of USB 2.

    Glug glug glug...
  • MBond Posts: 843
    Ah yes, wireless fun! I just didn't recognize that particular way of describing it.

    Again for the OP: over a high-latency network, you will need a pipeline of some kind. All Wifi is high latency, even when you test in an EM-dampened room with nothing but the test harness.

    Can we assume TCP?



  • davs_shrus Posts: 11
    @MM
    Yes, it is a TCP socket (AF_INET, SOCK_STREAM, IPPROTO_TCP).
    @tim
    > 802.11ac wifi has its channels in the 5GHz band
    Yes, it does.

    Creating multiple IRPs for the data transfer increased performance a little: write speed is currently around 4~5 mbps and read speed around 8 mbps on an 802.11ac 5ghz channel. Previously we waited for IRP completion after each socket send and receive.

    We are testing in a shielded room.
  • Tim_Roberts Posts: 12,622
    xxxxx@gmail.com wrote:
    > @MM
    > Yes, it is a TCP socket (AF_INET, SOCK_STREAM, IPPROTO_TCP).
    > @tim
    > 802.11ac wifi has its channels in the 5GHz band

    True, but that is not related to the data rate.  The data rate depends
    on the channel width, the modulation scheme, and the number of separate
    antennae you are using.


    > Creating multiple IRPs for the data transfer increased performance a little: write speed is currently around 4~5 mbps and read speed around 8 mbps on an 802.11ac 5ghz channel. Previously we waited for IRP completion after each socket send and receive.
    >
    > Testing in shielded room.

    How wide are the channels, how many antennae, how many streams, what
    kind of modulation?  There's a chart on the 802.11ac page on Wikipedia
    that describes the range of data rates that you should expect.  I'd bet
    real dollars that your operating system processing is not impacting your
    transfer rate in any way.

    --
    Tim Roberts, xxxxx@probo.com
    Providenza & Boekelheide, Inc.
