
Issue with TCP/IP callbacks: all callbacks received on one particular core in a multi-core environment

Vikash Member Posts: 5

Hi All,

I’m developing a block device driver that talks to a storage device over TCP/IP. I’m using WskSend and WskReceive to send data to and receive data from the target device.
My PC has 8 cores. When I issue commands from multiple cores (C0, C1, C2, C3, C4, C5, C6, C7), all of the WSK callbacks run on one particular core (C0).

These are the calls I’m making:
IoSetCompletionRoutine(irp, CommandCallback, pContext, TRUE, TRUE, TRUE);

WskSend(WskSocket, &wskbuf, WSK_FLAG_NODELAY, irp);

WskReceive(WskSocket, &wskbuf, WSK_FLAG_WAITALL, irp);

OS is Windows Server 2012 R2 & Windows 10.

Why do the callbacks all run on one particular core? They run in the NIC driver's context. Can't they be scheduled to run on multiple cores, which I believe would improve performance?

Vikash Kumar


  • Jeffrey_Tippet_[MSFT] Member - All Emails Posts: 552

    If you have a single socket, then the OS will try to keep it on a single core, for cache locality. Obviously the OS can't dictate which core you're on when you initiate a new WskSend, but callbacks will be funnelled up on the core for that socket. (Note that the core may change over time, due to rebalancing.)

    Windows can distribute a network workload across multiple CPUs -- this is called RSS. But the unit of granularity of RSS is the socket; Windows goes to some great lengths to avoid splitting a socket across multiple cores. Imagine if it did not: you could have the WskReceive completions running on two processors in parallel, but the first thing you're likely to do is grab a spinlock to protect your internal state... thus defeating all the parallelism.

    If you want to throw more cores at a networking problem, then you will need (a) a network driver that supports RSS, and (b) to split your workload across multiple sockets.
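
    A rough user-mode model of the mapping described above, not actual Windows or NIC code. It assumes the usual RSS arrangement in which a hash of the connection's 4-tuple selects a slot in an indirection table, and that slot names a CPU. The hash below is an FNV-1a stand-in (real RSS uses a keyed Toeplitz hash), and flow_t, TABLE_SIZE, flow_hash, table_init and cpu_for_flow are hypothetical names chosen for illustration.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Number of indirection-table slots. Illustrative value; must be a power of two. */
    #define TABLE_SIZE 128u

    /* One TCP connection, identified by its 4-tuple. */
    typedef struct {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
    } flow_t;

    /* Fold the low nbytes bytes of v into an FNV-1a hash state. */
    static uint32_t mix(uint32_t h, uint32_t v, int nbytes) {
        for (int i = 0; i < nbytes; i++) {
            h ^= (v >> (8 * i)) & 0xffu;
            h *= 16777619u;
        }
        return h;
    }

    /* Stand-in for the RSS hash: depends only on the 4-tuple, never on the payload. */
    uint32_t flow_hash(const flow_t *f) {
        uint32_t h = 2166136261u;
        h = mix(h, f->src_ip, 4);
        h = mix(h, f->dst_ip, 4);
        h = mix(h, f->src_port, 2);
        h = mix(h, f->dst_port, 2);
        return h;
    }

    /* Spread the table slots round-robin over cpu_count CPUs. */
    void table_init(uint8_t table[TABLE_SIZE], unsigned cpu_count) {
        assert(cpu_count > 0 && cpu_count <= 256);
        for (unsigned i = 0; i < TABLE_SIZE; i++) {
            table[i] = (uint8_t)(i % cpu_count);
        }
    }

    /* The low bits of the hash pick a slot; the slot names the CPU. */
    unsigned cpu_for_flow(const uint8_t table[TABLE_SIZE], const flow_t *f) {
        return table[flow_hash(f) & (TABLE_SIZE - 1u)];
    }
    ```

    In this model every packet of one connection maps to the same CPU, because the 4-tuple never changes. Two different connections may or may not map to different CPUs: that depends on the hash values and on how the table is populated, not on which core issued the send.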

  • Vikash Member Posts: 5

    Hi Jeffrey,

    Thanks for the reply. My system has 8 cores and I have created 8 sockets. I push each I/O to a socket chosen by the number of the core it was issued on.

    I was expecting the callbacks to arrive on different cores, but they always arrive on one particular core for all the connections.

    I checked the NIC driver properties in Device Manager on my PC and RSS is enabled. I also tried several different RSS load-balancing settings, but the behaviour is still the same.
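
The per-core socket selection described in this thread could look roughly like the following. This is a user-mode sketch under stated assumptions, not code from either poster: socket_pool_t, MAX_SOCKETS, pool_index_for_cpu and pool_socket_for_cpu are hypothetical names, and the void pointers stand in for the PWSK_SOCKET values a driver would hold. In a kernel driver the cpu_index argument would presumably come from something like KeGetCurrentProcessorNumberEx.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound on pooled sockets. Illustrative value only. */
#define MAX_SOCKETS 64

/* A fixed pool of already-connected sockets (opaque handles in this sketch). */
typedef struct {
    void  *sockets[MAX_SOCKETS];
    size_t count;
} socket_pool_t;

/* Map the issuing CPU to a pool slot; wraps if there are more CPUs than sockets. */
size_t pool_index_for_cpu(const socket_pool_t *pool, unsigned cpu_index) {
    assert(pool->count > 0 && pool->count <= MAX_SOCKETS);
    return cpu_index % pool->count;
}

/* Return the socket on which a request issued from cpu_index should be sent. */
void *pool_socket_for_cpu(const socket_pool_t *pool, unsigned cpu_index) {
    return pool->sockets[pool_index_for_cpu(pool, cpu_index)];
}
```

With 8 sockets and 8 cores this gives each core its own socket for sends. It only chooses where a send is issued; as the reply above notes, the core on which completions are delivered is decided per socket by the network stack.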
