improve CPU cores affinity to NUMA nodes v8 #12336

Closed
wants to merge 10 commits

Conversation

lukashino
Contributor

@lukashino lukashino commented Jan 3, 2025

Follow-up of #12316

Redmine tickets:
https://redmine.openinfosecfoundation.org/issues/7036
https://redmine.openinfosecfoundation.org/issues/2321

This work allows more precise thread assignment, and it is done either:

  • automatically - by picking cores of the same NUMA locality as the interface from the worker-cpu-set
  • manually - you can specify interface-specific settings in the threading section

This works with AF-PACKET and DPDK (and should work with other capture modes as well). The primary target is the workers runmode, but it also works with autofp receive threads.

To try this PR:

  1. Compile Suricata from this branch.
  2. Modify the newly generated suricata.yaml according to your setup (a minimal sketch follows these steps).
  3. Run as usual.
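
A minimal sketch of the relevant threading section (illustrative only - the full reference is in suricata.yaml.in; autopin assumes a build configured with --enable-hwloc, and the interface name "ens1f0" is just an example):

threading:
  autopin: yes # pick worker CPUs from the same NUMA node as the interface
  cpu-affinity:
    worker-cpu-set:
      cpu: [ "all" ]
      mode: "exclusive"
      interface-specific-cpu-set:
        - interface: "ens1f0"
          cpu: [ "all" ]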

Describe changes:

v8:

  • interface-specific settings now don't require hwloc dependency
  • CPU affinity refactoring changes segregated to a separate commit
  • added docs - upgrade docs and general threading docs
  • a small change in the title

v7:

  • hwloc is now an optional dependency - it can be enabled with --enable-hwloc in the configure step,
  • autopinning / autooptimizing is configurable from threading.autopin suri.yaml option,
  • making CI happy (and with that fixing a few bugs)

v6:

  • added support for autofp mode - receive-cpu-set can now contain per-interface CPU affinity settings - here comes a question: should worker-cpu-set in autofp be somehow configurable to the receive-cpu-set threads?
  • YAML node "per-iface" changed to "interface-specific-cpu-set"
  • list form of (management/worker/...)-cpu-set was changed to simple tree nodes
  • cpu/prio/mode properties of (management/worker/...)-cpu-set nodes moved one YAML layer up
  • DPDK (net_bonding / PCIe address / ... ) / AFP interfaces are supported

On YAML changes:

The CPU affinity section in suricata.yaml has been changed as follows (see suricata.yaml.in for the full picture):

threading:
  cpu-affinity:
    worker-cpu-set:
      cpu: [ "all" ]
      mode: "exclusive"
      interface-specific-cpu-set:
        - interface: "net_bonding0" # 0000:3b:00.0 # net_bonding0 # ens1f0
          cpu: [ "all" ]
        - interface: "net_bonding1"
          cpu: [ "all" ]

Below is the reasoning for it.

When starting this work I wanted to follow the same structure as the CPU set nodes, that is:

threading:
  cpu-affinity:
    - worker-cpu-set:
        cpu: [ "all" ]
        mode: "exclusive"

converted to JSON it looks like:

{
    "threading": {
        "cpu-affinity": [
            {
                "worker-cpu-set": {
                    "cpu": [
                        "all"
                    ],
                    "mode": "exclusive"
                }
            }
        ]
    }
}

So worker-cpu-set is an object by itself: the list item holds a single key named "worker-cpu-set", and only that key contains the children properties (cpu, prio, etc.).

But when adding the interface-specific node I found out that having the interface names directly as the keys of the object storing the properties (cpu, prio, etc.) lets the YAML loader change the name to something else, in this case net_bonding0 becomes net-bonding0:

threading:
  cpu-affinity:
    - worker-cpu-set:
        cpu: [ "all" ]
        mode: "exclusive"
        interface-specific-cpu-set:
          - net_bonding0:
              cpu: [ "all" ]
          - net_bonding1:
              cpu: [ "all" ]

So adding it as a value:

threading:
  cpu-affinity:
    - worker-cpu-set:
        cpu: [ "all" ]
        mode: "exclusive"
        interface-specific-cpu-set:
          - interface: net_bonding0
              cpu: [ "all" ]
          - interface: net_bonding1
              cpu: [ "all" ]

yields an invalid YAML construct.

Therefore, for interface-specific CPU affinity settings, I ended up putting the properties on the same level as the interface name itself. This follows the structure already established in e.g. the capture-mode interface definitions (dpdk.interfaces, af-packet).
Note: as part of this I also converted the base *-cpu-set nodes from list items to plain mapping nodes (Ticket 2321).
The interface-specific CPU affinity design I ended up with is:

threading:
  cpu-affinity:
    worker-cpu-set:
      cpu: [ "all" ]
      mode: "exclusive"
      interface-specific-cpu-set:
        - interface: "net_bonding0" # 0000:3b:00.0 # net_bonding0 # ens1f0
          cpu: [ "all" ]
        - interface: "net_bonding1"
          cpu: [ "all" ]

JSON visualization of this:

{
    "threading": {
        "cpu-affinity": {
            "worker-cpu-set": {
                "cpu": [
                    "all"
                ],
                "mode": "exclusive",
                "interface-specific-cpu-set": [
                    {
                        "interface": "net_bonding0",
                        "cpu": [
                            "all"
                        ]
                    },
                    {
                        "interface": "net_bonding1",
                        "cpu": [
                            "all"
                        ]
                    }
                ]
            }
        }
    }
}

This leaves the base {worker,receive,management}-cpu-set nodes in their original form, as we know them. The internal code can handle this because the names of these nodes are known beforehand, so it can "step down" to the children level to inspect the CPU affinity properties. The dynamic part - interface-specific-cpu-set - contains all properties on the same level.
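
For illustration, a minimal sketch of that lookup pattern, assuming Suricata's ConfNode API (ConfGetNode, ConfNodeLookupChild, ConfNodeLookupChildValue) - this is a paraphrase, not the PR's actual code:

#include <string.h>
#include "conf.h"

/* Known, fixed node names let the code "step down" one level; the dynamic
 * interface-specific list is matched by the value of its "interface" key. */
static ConfNode *FindInterfaceSpecificSet(const char *iface)
{
    ConfNode *wcpu = ConfGetNode("threading.cpu-affinity.worker-cpu-set");
    if (wcpu == NULL)
        return NULL;

    ConfNode *list = ConfNodeLookupChild(wcpu, "interface-specific-cpu-set");
    if (list == NULL)
        return NULL;

    ConfNode *item;
    TAILQ_FOREACH(item, &list->head, next) {
        const char *name = ConfNodeLookupChildValue(item, "interface");
        if (name != NULL && strcmp(name, iface) == 0)
            return item; /* cpu/prio/mode live directly under this item */
    }
    return NULL;
}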

Other unsuccessful YAML designs

Specifying the NUMA affinity in the individual capture modes

Example

dpdk:
  - interface: "3b:00.0"
    threads: [ 2, 4, 6, 8]

But management / other threads would still have to be configured elsewhere. I liked the proposed approach more because all affinity-related settings are centralized.

Not having list nodes in the YAML

threading:
  worker-cpu-set:
    net_bonding0:
      cpu: [ 1, 3, 5 ]

The YAML library changes net_bonding to net-bonding when the YAML is loaded - so this is a no-go.

Have interface-related properties as children

threading:
  worker-cpu-set:
    - interface: net_bonding0
        cpu: [ 1, 3, 5 ]

This is a disallowed construct in YAML, so another no-go. A similar pattern is currently used with the *-cpu-set nodes, but the difference here is that the parent list item has a value (in this case net_bonding0). That is what causes problems. Interface-related properties therefore need to sit right under "interface".

PRO TIP: A good way to visualize different YAML structures is by using YAML-to-JSON converters. Currently it creates an object like:

[
  {
    "interface": "3b:00.0",
    "mode": "exclusive"
  }
]

Lukas Sismis added 10 commits January 3, 2025 16:24
Split the code into multiple functions for easier readability.
Part of Ticket 2321 work to remove unnecessary lists from
the config file.

Ticket: 2321
Provide backward compatibility with the previous configuration
format to allow smooth transition to the new format.
The commit adds docs about the new format and the introduced changes.
Using the new configuration format, it is now possible to set CPU affinity
settings per interface.

The threading.autopin option has been added to automatically use CPUs from the
same NUMA node as the interface. The autopin option requires
hwloc-devel / hwloc-dev to be installed and the --enable-hwloc flag in the
configure step (see the build sketch below).

Ticket: 7036
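
For reference, a build enabling the optional hwloc support might look like this (assuming the hwloc development package is already installed):

./configure --enable-hwloc
make -j$(nproc)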

codecov bot commented Jan 3, 2025

Codecov Report

Attention: Patch coverage is 46.68588% with 185 lines in your changes missing coverage. Please review.

Project coverage is 83.20%. Comparing base (6f937c7) to head (a717c60).
Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12336      +/-   ##
==========================================
- Coverage   83.26%   83.20%   -0.07%     
==========================================
  Files         912      913       +1     
  Lines      257643   257877     +234     
==========================================
+ Hits       214521   214557      +36     
- Misses      43122    43320     +198     
Flag Coverage Δ
fuzzcorpus 61.13% <0.00%> (-0.01%) ⬇️
livemode 19.43% <46.68%> (+0.03%) ⬆️
pcap 44.36% <0.34%> (-0.07%) ⬇️
suricata-verify 62.79% <0.34%> (-0.07%) ⬇️
unittests 59.13% <0.00%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown.

@suricata-qa

Information: QA ran without warnings.

Pipeline 24070

if (cpu_node != NULL) {
    BuildCpuset(setname, cpu_node, &taf->cpu_set);
} else {
    SCLogInfo("Unable to find 'cpu' node for set %s", setname);
Contributor

What are the consequences of not finding the CPU node for the set?

An info message is emitted -- should it be a warning? If there are no consequences, should it be a debug?

Contributor Author

@lukashino lukashino Jan 7, 2025

Changed to a warning, as the cpu setting in the cpu-affinity node is quite crucial. Other properties - mode, priority - were left as debug printouts, as they are less important in my opinion. When they are missing, the default values are picked up, e.g.:

{
    .name = "worker-cpu-set",
    .mode_flag = EXCLUSIVE_AFFINITY,
    .prio = PRIO_MEDIUM,
    .lcpu = { 0 },
},

if (node != NULL) {
    BuildCpuset(setname, node, cpuset);
} else {
    SCLogDebug("Unable to find '%s' priority for set %s", priority, setname);
Contributor

Same comment regarding consequences and message level.

Contributor Author

Staying at debug, as it is not critical for prio and mode, and sane defaults are picked up.


static bool AllCPUsUsed(ThreadsAffinityType *taf)
{
    if (taf->lcpu < UtilCpuGetNumProcessorsOnline()) {
Contributor

Suggest return taf->lcpu >= UtilCpuGetNumProcessorsOnline();

Contributor Author

Ah, I see why the comment has the "Outdated" label - this was changed in a subsequent commit.
When implementing the new per-interface CPU affinity settings I also refactored the code a bit. To avoid having two changes (feature + refactoring) in one commit, I divided it into two - first the refactoring of the old codebase, then the implementation of the newly added feature.

This gets changed in the follow-up commit to this:

static bool AllCPUsUsed(ThreadsAffinityType *taf)
{
    for (int i = 0; i < MAX_NUMA_NODES; i++) {
        if (taf->lcpu[i] < UtilCpuGetNumProcessorsOnline()) {
            return false;
        }
    }
    return true;
}

so if it is ok, I would leave it as is.

static uint16_t GetNextAvailableCPU(ThreadsAffinityType *taf)
{
    uint16_t cpu = taf->lcpu;
    int attempts = 0;
Contributor

Wondering why there are 2 attempts; this seems non-deterministic? What will change between the 1st and 2nd attempts?

Contributor Author

This is relevant for the newer version too.

TLDR: To make it more readable, I added an extra check for an empty CPU set.

Full explanation:

There are two attempts to try to get some CPU; if you end up needing a third pass, you know your CPU affinity is incorrectly configured (empty).

So say you have 4 CPU cores.
Affinity is configured as:
CPU: [ 1,2 ]

Your interface is configured to run on 4 workers.

The affinity is converted to a bitmask - 0 1 1 0 (a 1 marks the indexes of the CPUs we want to run on).
The affinity variable (taf) has a property lcpu - the last CPU used - so we know on which core to continue between worker assignments.

Worker 1 - gets core 1, sets lcpu to 1 in the first pass.
Worker 2 - gets core 2, sets lcpu to 2 in the first pass.
Worker 3 - gets core 1, sets lcpu to 1 in the second pass - it had to evaluate core 3, then lcpu was reset to 0 and the number of attempts was increased by one.
Worker 4 - gets core 2, sets lcpu to 2 in the first pass.

If you run into the third pass then you know something is wrong. I believe that can only happen with an empty CPU set, i.e. cpu: [ ].

But then I reevaluated the point, and to make it easier to comprehend (so one does not have to dig all the way down to here), I added an extra check prior to running this function, checking the CPU count:
if (CPU_COUNT(&taf->cpu_set) == 0) {

and added an extra comment to this function.
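
To make the walkthrough concrete, here is a condensed sketch of the logic described above (a paraphrase using the pre-refactor scalar lcpu and minimal stand-in types, not the PR's exact code):

#define _GNU_SOURCE
#include <sched.h>  /* cpu_set_t, CPU_COUNT, CPU_ISSET */
#include <stdint.h>

/* Stand-in for the relevant fields of Suricata's ThreadsAffinityType. */
typedef struct {
    cpu_set_t cpu_set; /* configured affinity mask */
    uint16_t lcpu;     /* last CPU used, to resume round-robin assignment */
} AffinitySketch;

static uint16_t GetNextAvailableCPUSketch(AffinitySketch *taf, uint16_t ncpus)
{
    /* The extra up-front guard: an empty set would otherwise exhaust
     * both attempts below. */
    if (CPU_COUNT(&taf->cpu_set) == 0)
        return 0; /* misconfigured - caller should warn */

    uint16_t cpu = taf->lcpu;
    int attempts = 0;
    while (attempts < 2) {
        if (cpu >= ncpus) { /* wrap around; needing a third pass means failure */
            cpu = 0;
            attempts++;
            continue;
        }
        if (CPU_ISSET(cpu, &taf->cpu_set)) {
            taf->lcpu = cpu + 1; /* continue after this core next time */
            return cpu;
        }
        cpu++;
    }
    return 0; /* unreachable with a non-empty set */
}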

FatalError("unknown cpu_affinity node");
}
}
SCLogConfig("Found affinity definition for \"%s\"", setname);
Contributor

nit: suggest consistent terminology usage -- both "affinity" and "CPU affinity" are used in messages.

Contributor Author

Tried to convert all "affinity" usages to "CPU affinity".

    if_obj = HwLocDeviceGetByPcie(topology, tv->iface_name);
}

if (if_obj != NULL) {
Contributor

Suggest using this to conditionalize the call to HwlocObjectDump:

  if (sc_log_global_log_level >= SC_LOG_DEBUG)

Contributor Author

FYI, I've found out there is an SCLogGetLogLevel() function to get the log level.

}
#endif /* HAVE_HWLOC */

static bool CPUIsFromNuma(uint16_t ncpu, uint16_t numa)
Contributor

This is called from a while loop in one instance. Could the CPU set for the NUMA node be retrieved once and then compared multiple times instead?
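
For illustration, one way to do that with the hwloc API (a hedged sketch with hypothetical helper names, not the PR's code):

#include <hwloc.h>
#include <stdbool.h>
#include <stdint.h>

/* Resolve the NUMA node's cpuset once, outside the loop... */
static hwloc_const_bitmap_t NumaNodeCpuset(hwloc_topology_t topo, uint16_t numa)
{
    hwloc_obj_t node = hwloc_get_obj_by_type(topo, HWLOC_OBJ_NUMANODE, numa);
    return (node != NULL) ? node->cpuset : NULL;
}

/* ...then test candidate CPUs against it cheaply inside the loop. */
static bool CpusetHasCpu(hwloc_const_bitmap_t cpuset, uint16_t ncpu)
{
    return cpuset != NULL && hwloc_bitmap_isset(cpuset, ncpu);
}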

}
}

if (cpu >= 0)
Contributor

nit: prefer use of { and }

if (!printed) {
    printed = true;
    SCLogWarning("threading.autopin option is enabled but hwloc support is not available. "
                 "Please recompile Suricata with hwloc support to enable this feature.");
Contributor

nit: should mention reconfiguring, similar to existing log messages, e.g.,

"NFQUEUE not enabled. Make sure to pass --enable-nfqueue to configure when"

@@ -175,6 +175,7 @@ int RunModeSetLiveCaptureAutoFp(ConfigIfaceParserFunc ConfigParser,
        FatalError("TmThreadsCreate failed");
    }
    tv_receive->printable_name = printable_threadname;
    tv_receive->iface_name = dev ? strdup(dev) : NULL;
Contributor

handle strdup failure.
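
A sketch of the suggested check, reusing names from the hunk above:

tv_receive->iface_name = dev ? strdup(dev) : NULL;
if (dev != NULL && tv_receive->iface_name == NULL) {
    FatalError("failed to duplicate interface name"); /* strdup failed (OOM) */
}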

@victorjulien
Member

Can you update the PR description to start with what the scheme is in the PR? It has a helpful discussion of what it does not do, which leaves me guessing a bit :)

@lukashino
Contributor Author

lukashino commented Jan 6, 2025

@victorjulien updated the PR description. The full example can be seen in suricata.yaml.in.
Reading through it, I thought that maybe the discussion there is a bit lengthy and winds into unnecessary details, but the idea was to lay down the details and arguments for why I did what I did so I won't forget about them. You can view suricata.yaml.in and we can discuss on top of that.

@jlucovsky thanks for the feedback, I am going to address that shortly.

@ct0br0

ct0br0 commented Jan 8, 2025

skimming through and had a few questions.

it looks like if you use all then anything on that numa will be used?

does it automatically avoid, or could you specify not to use, other threads - say management threads? !management or just !0, any core you want?

could this work for pcap mode too? specify say numa0

kind of a nit, "all" doesn't really sound direct, may seem like you're just going to use every core you can. "auto" or "numa" or some combination may be more specific.

@lukashino
Contributor Author

Moving part of the discussion points to #12359

@lukashino lukashino closed this Jan 8, 2025