Issue with TCP/IP callbacks: all callbacks received on one particular core in a multi-core environment

Hi All,

I’m developing a block device driver that talks to a storage device over TCP/IP. I’m using WskSend and WskReceive to send data to and receive data from the target device.
My PC has 8 cores. When I issue commands from multiple cores (C0, C1, C2, C3, C4, C5, C6, C7), all the WSK callbacks run on one particular core (C0).

These are the APIs I’m using:
IoSetCompletionRoutine(irp, CommandCallback, pContext, TRUE, TRUE, TRUE);

WskSend(WskSocket, &wskbuf, WSK_FLAG_NODELAY, irp);

WskReceive(WskSocket, &wskbuf, WSK_FLAG_WAITALL, irp);

The OS is Windows Server 2012 R2 and Windows 10.

Why do the callbacks run on one particular core? These callbacks run in the NIC driver's context. Can’t they be scheduled to run on multiple cores? I believe that would improve performance.

Regards,
Vikash Kumar

If you have a single socket, then the OS will try to keep it on a single core, for cache locality. Obviously the OS can’t dictate which core you’re on when you initiate a new WskSend, but the callbacks will be funnelled up on the core for that socket. (Note that the core may change over time, due to rebalancing.)

Windows can distribute a network workload across multiple CPUs: this is called RSS (Receive Side Scaling). But the unit of granularity of RSS is the socket; Windows goes to great lengths to avoid splitting a socket across multiple cores. Imagine if it did not: you could have the WskReceive completions running on two processors in parallel, but the first thing you’re likely to do is grab a spinlock to protect your internal state… thus defeating all the parallelism.

If you want to throw more cores at a networking problem, then you will need (a) a network driver that supports RSS, and (b) to split your workload across multiple sockets.

Hi Jeffrey,

Thanks for the reply. My system has 8 cores, and I have created 8 sockets. I push each I/O to a socket chosen by the number of the core it is issued from.

I expected the callbacks to arrive on different cores, but for all the connections they always arrive on one particular core.

I checked the NIC driver properties in Device Manager on my PC, and RSS is enabled. I also tried several different RSS load-balancing settings, but the behaviour is the same.