Implementing PROCESSOR_GROUPS

Hello,

I am trying to implement the CPU groups feature in a Windows miniport driver. I see the Interrupt member of _IO_RESOURCE_DESCRIPTOR defined as
struct {
#if defined(NT_PROCESSOR_GROUPS)
IRQ_DEVICE_POLICY AffinityPolicy;
USHORT Group;
#else
IRQ_DEVICE_POLICY AffinityPolicy;
#endif
IRQ_PRIORITY PriorityPolicy;
KAFFINITY TargetedProcessors;
} Interrupt;

The documentation describes the Group member as follows: Specifies a processor group number. Group is a valid (but optional) member of u.Interrupt only in Windows 7 and later versions of Windows. This member exists only if NT_PROCESSOR_GROUPS is defined at compile time.

So, I need help defining NT_PROCESSOR_GROUPS at compile time. Which header files do I need to include to get this #define? Any other information on implementing this feature would be helpful.

Thanks

You define it yourself, if you need to use the processor group feature.

You can #define it in the source code, or set it as a preprocessor define using a compiler option.

It’s that easy.

Peter
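
To illustrate the point about defining the macro yourself, here is a minimal, portable sketch. It uses stand-in types (everything prefixed sketch_ is hypothetical and is not part of the WDK) to show how the Group member appears only when NT_PROCESSOR_GROUPS is defined before the structure is declared. In a real driver the define would have to precede the first WDK header include, or be passed as a preprocessor definition on the compiler command line.

```c
/* Sketch with stand-in types; the real definitions live in the WDK headers. */
#include <stddef.h>

/* In a real driver this line must come before the first WDK #include,
   or be supplied as a compiler preprocessor definition instead. */
#define NT_PROCESSOR_GROUPS 1

typedef int sketch_irq_policy;               /* stand-in for IRQ_DEVICE_POLICY */
typedef int sketch_irq_priority;             /* stand-in for IRQ_PRIORITY */
typedef unsigned long long sketch_kaffinity; /* stand-in for KAFFINITY */

struct sketch_interrupt {
#if defined(NT_PROCESSOR_GROUPS)
    sketch_irq_policy AffinityPolicy;
    unsigned short Group;                    /* present only with the macro */
#else
    sketch_irq_policy AffinityPolicy;
#endif
    sketch_irq_priority PriorityPolicy;
    sketch_kaffinity TargetedProcessors;
};

/* Returns 1 when this translation unit was compiled with the Group member. */
int sketch_has_group_member(void)
{
#if defined(NT_PROCESSOR_GROUPS)
    return offsetof(struct sketch_interrupt, Group) > 0;
#else
    return 0;
#endif
}
```

If the #define is removed, any code that touches Group fails to compile, which is a quick way to confirm the macro is reaching every source file that needs it.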

Thanks a lot Peter. I tried that and it worked. However, I see that the RSS indirection table does not show group information.

I have configured groupsize to be 4 and I have 8 logical processors. The output below shows the correct CPU group and processor number for MaxProcessor and RssProcessorArray. However, the RSS indirection table passed down by the stack in OID_GEN_RECEIVE_SCALE_PARAMETERS contains processors from group 0 only, and I am not sure why. Does the RSS indirection table use processors from group 0 only by default? And is there a command or some other way to change which processors appear in the indirection table?

I have also set the appropriate processor and group information for interrupts in the IO_RESOURCE_LIST members:
ird->u.Interrupt.Group
ird->u.Interrupt.TargetedProcessors
ird->u.Interrupt.AffinityPolicy = IrqPolicySpecifiedProcessors
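
As a rough illustration of the three assignments above, the sketch below fills a stand-in descriptor for one processor identified as group:number. The struct, enum, and function names are hypothetical and only mirror the shape of u.Interrupt. The one point it is meant to show is that the affinity mask is relative to the group, so processor 1:3 is bit 3 of the mask together with Group = 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for IRQ_DEVICE_POLICY; values are illustrative, not the WDK's. */
enum sketch_policy {
    SKETCH_POLICY_DEFAULT = 0,
    SKETCH_POLICY_SPECIFIED_PROCESSORS = 1
};

/* Stand-in for the group-aware part of u.Interrupt. */
struct sketch_interrupt_desc {
    enum sketch_policy AffinityPolicy;
    uint16_t Group;
    uint64_t TargetedProcessors;   /* bit mask, relative to Group */
};

/* Target exactly one processor, given as group:number.
   Returns 0 on success, -1 on a bad argument. */
int sketch_target_processor(struct sketch_interrupt_desc *d,
                            uint16_t group, unsigned number)
{
    if (d == NULL || number >= 64)   /* a group holds at most 64 processors */
        return -1;
    d->AffinityPolicy = SKETCH_POLICY_SPECIFIED_PROCESSORS;
    d->Group = group;
    d->TargetedProcessors = (uint64_t)1 << number;
    return 0;
}
```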

PS C:\Windows\system32> Get-NetAdapterRSS -Name Ethernet2
Name : Ethernet2
InterfaceDescription : Ethernet Adapter
Enabled : True
NumberOfReceiveQueues : 8
Profile : NUMAStatic
BaseProcessor: [Group:Number] : 0:0
MaxProcessor: [Group:Number] : 1:3
MaxProcessors : 8
RssProcessorArray: [Group:Number/NUMA Distance] : 0:0/0 0:1/0 0:2/0 0:3/0 1:0/0 1:1/0 1:2/0 1:3/0
IndirectionTable: [Group:Number] : 0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3
0:0 0:1 0:2 0:3 0:0 0:1 0:2 0:3

Thanks

Sorry, you’re well into NDIS now and hence well beyond me.

Perhaps @“Jeffrey_Tippet_[MSFT]” will come around and comment.

Peter

If you haven't already, look at RSSProfile at https://docs.microsoft.com/en-us/windows-hardware/drivers/network/standardized-inf-keywords-for-rss
It seems Set-NetAdapterRSS has a -Profile parameter to change the profile.

@msr has the right idea.

I’ll fill in the background a bit. The RSS indirection table is the result of a handshake between several components, so it’s not always clear where it comes from.

First, the driver INF sets up some keywords, like *RSS. The administrator has a chance to edit these keywords.

Next, the administrator can set some RSS configuration, like MaxProcessors, via Set-NetAdapterRss. These aren’t exactly keywords, since they don’t come from the driver INF.

NDIS captures the processor topology at boot, and sifts through it:

  • “Hyperthreaded” processors are ignored
  • Processors are sorted by their NUMA distance to the NIC (which is itself ideally obtained from ACPI SRAT)
  • Processors that are excluded by the administrator’s RSS configuration above are ignored, e.g. any processor number less than BaseProcessor.

NDIS publishes (NdisGetRssProcessorInformation) the resulting RSS candidate processor list to both the NIC driver (so it can allocate interrupt vectors) and protocol drivers (so they can cook up the final indirection table).
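
The sifting steps described above can be sketched roughly as follows. This is not NDIS code: the rss_cpu struct, the smt_sibling flag, and the function names are assumptions made for illustration, and the real implementation may order or combine the steps differently.

```c
#include <stddef.h>

/* Hypothetical description of one logical processor. */
struct rss_cpu {
    unsigned short group;
    unsigned char  number;
    int            smt_sibling;    /* nonzero: extra hyperthread on a core */
    unsigned       numa_distance;  /* distance to the NIC, smaller is closer */
};

/* True when the processor sorts before the configured base processor. */
static int rss_before_base(const struct rss_cpu *c,
                           unsigned short base_group, unsigned char base_number)
{
    return c->group < base_group ||
           (c->group == base_group && c->number < base_number);
}

/* Copies the surviving candidates into out (which must hold n entries),
   ordered by NUMA distance, and returns how many were kept. */
size_t rss_build_candidates(const struct rss_cpu *all, size_t n,
                            unsigned short base_group, unsigned char base_number,
                            struct rss_cpu *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (all[i].smt_sibling)
            continue;                          /* ignore hyperthreads */
        if (rss_before_base(&all[i], base_group, base_number))
            continue;                          /* excluded by configuration */
        /* Stable insertion sort keeps ties in their original order. */
        size_t pos = count;
        while (pos > 0 && out[pos - 1].numa_distance > all[i].numa_distance) {
            out[pos] = out[pos - 1];
            pos--;
        }
        out[pos] = all[i];
        count++;
    }
    return count;
}
```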

Finally, the protocol driver makes the ultimate decision on which processors to put into the indirection table. The protocol is allowed to select any processors it likes from the RSS candidate processor set. NDIS does inform the protocol of the “RSS profile” that the administrator selected, but the protocol is not obligated to honor the profile. In fact, Windows comes with 2 protocol drivers that can use RSS: TCPIP and VMSWITCH, and currently only the former honors the RSS profile.

So to return to the original question: the precise choice of processor numbers in the indirection table comes from either TCPIP or VMSWITCH. If you don’t have an external vSwitch over the NIC, it’s TCPIP. By default, TCPIP tries to avoid spanning processor groups (actually, NUMA nodes – all nodes are groups, but not all groups are nodes). But you can nudge it into doing so, using different Profile hints.

If, on the other hand, you’re using VMSWITCH, then you currently don’t get even that amount of control over its indirection table algorithm. (It’s possible that VMSWITCH will implement support for some or all RSS profiles in a future release, or add some other mechanism for more administrator control.)
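
To make the group-spanning question concrete, here is a small sketch of a protocol filling an indirection table round-robin from the candidate list. It is not TCPIP's actual algorithm, which is not described in this thread. The proc_id struct and the span_groups switch are assumptions, chosen only so that the non-spanning case reproduces the pattern in the Get-NetAdapterRSS output earlier in the thread.

```c
#include <stddef.h>

/* Hypothetical processor identifier, printed elsewhere as group:number. */
struct proc_id {
    unsigned short group;
    unsigned char  number;
};

/* Fills table round-robin from cands. When span_groups is zero, only
   candidates in the same group as cands[0] are used.
   Returns the number of entries written (0 if there are no candidates). */
size_t fill_indirection_table(const struct proc_id *cands, size_t n_cands,
                              int span_groups,
                              struct proc_id *table, size_t table_len)
{
    if (n_cands == 0)
        return 0;

    size_t next = 0;
    for (size_t i = 0; i < table_len; i++) {
        /* Skip candidates outside the first group unless spanning is on.
           cands[0] is always eligible, so this loop terminates. */
        while (!span_groups && cands[next].group != cands[0].group)
            next = (next + 1) % n_cands;
        table[i] = cands[next];
        next = (next + 1) % n_cands;
    }
    return table_len;
}
```

With the eight candidates 0:0 through 1:3 and a 128-entry table, the non-spanning case repeats 0:0 0:1 0:2 0:3, while the spanning case cycles through all eight processors.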

Hello Jeffrey

Thanks for your response. I agree that RSSProfile can be used to tell the protocol driver to select RSS processors based on their NUMA distance. However, how do we tell it to use processors from different groups? There is no VMSwitch, so TCPIP would be deciding the indirection table.

My system does not have 64 processors, so to test the processor groups feature, I have split 8 logical processors into 2 groups using

bcdedit.exe /set groupsize 4
bcdedit.exe /set groupaware on
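
For reference, the arithmetic behind this test setup can be sketched as below. It assumes processors are packed into groups sequentially by a flat index, which matches the 0:0 through 1:3 numbering seen in this thread but ignores NUMA boundaries and hot-add space, which the real assignment takes into account. The function and struct names are hypothetical.

```c
/* Hypothetical group:number pair. */
struct group_number {
    unsigned group;
    unsigned number;
};

/* Splits a flat processor index into group:number.
   group_size must be nonzero. */
struct group_number split_flat_index(unsigned flat_index, unsigned group_size)
{
    struct group_number r;
    r.group  = flat_index / group_size;
    r.number = flat_index % group_size;
    return r;
}

/* Number of groups needed for n processors (rounds up).
   group_size must be nonzero. */
unsigned group_count(unsigned n, unsigned group_size)
{
    return (n + group_size - 1) / group_size;
}
```

With 8 logical processors and a group size of 4, this gives 2 groups, and the last processor (flat index 7) becomes 1:3, which matches the MaxProcessor value reported by Get-NetAdapterRSS.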

So, when I run Get-NetAdapterRSS, MaxProcessor and RssProcessorArray show group information, but I still do not see both groups used in the indirection table. I tried different RSS profiles as well, with the same result.

I also printed the output of NdisGetRssProcessorInformation() and it, too, shows the proper group information.

FilterResourceRequirements: NdisGetRssProcessorInformation() provided revision 2
FilterResourceRequirements: NdisGetRssProcessorInformation() provided max: group 1 num 3
FilterResourceRequirements: NdisGetRssProcessorInformation() provided profile: 4

Accordingly, interrupts are assigned to the appropriate processor in each group. So, I am not sure what is missing or why the TCPIP protocol driver does not use processors from group 1 in the indirection table.

Hmm, my understanding of TCPIP is that it will choose to span processor groups, if you set it to Numa or NumaStatic modes. But I didn’t write that algorithm, so maybe I’m wrong. It’s possible that TCPIP only uses NUMA nodes that all share the same processor group, but that seems like a weird and unnecessary limitation.

I’ll ask around internally and report back if I find out something interesting.

Meanwhile, you can also try your luck with the HLK: it has a dedicated test for RSS, which should exercise the feature more exhaustively than TCPIP does.

Thanks Jeffrey. Yes, my next step is to run the CPU group test and the RSS tests of the HLK and see the results.

Let me know if you get any other information regarding this.

Thanks

this statement can’t be right

actually, NUMA nodes – all nodes are groups, but not all groups are nodes

https://docs.microsoft.com/en-us/windows/win32/procthread/processor-groups

Processor groups span NUMA nodes, and special applications that are aware of processor groups can span them when ordinary processes don't. SQL Server is again the canonical example.

I am trying to implement the CPU groups feature in a Windows miniport driver.

What I would advise you to do is to take a short break from your “endeavours”, and to read the article that Marion refers to

I have configured groupsize to be 4 and I have 8 logical processors.

https://docs.microsoft.com/en-us/windows/win32/procthread/processor-groups

[begin quote]

Support for systems that have more than 64 logical processors is based on the concept of a processor group, which is a static set of up to 64 logical processors that is treated as a single scheduling entity. Processor groups are numbered starting with 0. Systems with fewer than 64 logical processors always have a single group, Group 0.

Windows Server 2008, Windows Vista, Windows Server 2003 and Windows XP: Processor groups are not supported.

When the system starts, the operating system creates processor groups and assigns logical processors to the groups. If the system is capable of hot-adding processors, the operating system allows space in groups for processors that might arrive while the system is running. The operating system minimizes the number of groups in a system. For example, a system with 128 logical processors would have two processor groups with 64 processors in each group, not four groups with 32 logical processors in each group.

[end quote]

Anton Bassov

this statement can’t be right

actually, NUMA nodes – all nodes are groups, but not all groups are nodes

It certainly isn’t :wink:

I had forgotten the complicated relationship between nodes and groups, and incorrectly thought I could simplify the relationship down to one parenthetical. I don’t think it’s possible to describe in one sentence – interested parties had better just read the pages that you’ve linked.

I’m glad that you agree 'cause I was worried for a moment that what I ‘know’ might not be right