# improve CPU cores affinity to NUMA nodes v8 #12336

## Conversation
Split the code into multiple functions for easier readability.
Part of Ticket 2321 work to remove unnecessary lists from the config file. Ticket: 2321
Provide backward compatibility with the previous configuration format to allow a smooth transition to the new format. The commit adds docs about the new format and the introduced changes.
Using the new configuration format, it is now possible to set CPU affinity settings per interface. The threading.autopin option has been added to automatically use CPUs from the same NUMA node as the interface. The autopin option requires hwloc-devel / hwloc-dev to be installed and the --enable-hwloc flag passed to the configure script. Ticket: 7036
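For illustration, a minimal sketch of what enabling the option could look like in `suricata.yaml` (placement and values are assumptions based on the commit message above, not confirmed verbatim by this thread):

```yaml
threading:
  autopin: yes          # pin threads to CPUs on the interface's NUMA node
  cpu-affinity:
    worker-cpu-set:
      cpu: [ "all" ]    # autopin narrows the choice to the interface's NUMA node
```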
Codecov Report - Attention: Patch coverage is

Additional details and impacted files:

| Coverage Diff | master | #12336 | +/- |
|---|---:|---:|---:|
| Coverage | 83.26% | 83.20% | -0.07% |
| Files | 912 | 913 | +1 |
| Lines | 257643 | 257877 | +234 |
| Hits | 214521 | 214557 | +36 |
| Misses | 43122 | 43320 | +198 |
Flags with carried forward coverage won't be shown.

Information: QA ran without warnings. Pipeline 24070
```c
if (cpu_node != NULL) {
    BuildCpuset(setname, cpu_node, &taf->cpu_set);
} else {
    SCLogInfo("Unable to find 'cpu' node for set %s", setname);
```
What are the consequences of not finding the CPU node for the set?
An info message is emitted -- should it be a warning? If there are no consequences, should it be a debug?
Changed to a warning, as the cpu setting in the cpu-affinity node is quite crucial. The other properties (mode, priority) are left as debug printouts since they are less important in my opinion. When they are missing, the default values are picked up, e.g.:
```c
{
    .name = "worker-cpu-set",
    .mode_flag = EXCLUSIVE_AFFINITY,
    .prio = PRIO_MEDIUM,
    .lcpu = { 0 },
},
```
```c
if (node != NULL) {
    BuildCpuset(setname, node, cpuset);
} else {
    SCLogDebug("Unable to find '%s' priority for set %s", priority, setname);
```
Same comment regarding consequences and message level.
Staying at debug, as it is not critical for prio and mode, and sane defaults are picked up.
`src/util-affinity.c` (outdated)
```c
static bool AllCPUsUsed(ThreadsAffinityType *taf)
{
    if (taf->lcpu < UtilCpuGetNumProcessorsOnline()) {
```
Suggest `return taf->lcpu >= UtilCpuGetNumProcessorsOnline();`
Ah, I see why the comment has the "Outdated" label - this was changed in a subsequent commit.
When implementing the new per-interface CPU affinity settings I also refactored the code a bit. To avoid having two changes (feature + refactoring) in one commit, I divided it into two - first the refactoring of the old codebase, then the newly added feature.
This gets changed in the follow-up commit to:
```c
static bool AllCPUsUsed(ThreadsAffinityType *taf)
{
    for (int i = 0; i < MAX_NUMA_NODES; i++) {
        if (taf->lcpu[i] < UtilCpuGetNumProcessorsOnline()) {
            return false;
        }
    }
    return true;
}
```
So if it is OK, I would leave it as is.
`src/util-affinity.c` (outdated)
```c
static uint16_t GetNextAvailableCPU(ThreadsAffinityType *taf)
{
    uint16_t cpu = taf->lcpu;
    int attempts = 0;
```
Wondering why there are 2 attempts; this seems non-deterministic? What will change between the 1st and 2nd attempts?
This is relevant for the newer version too.
TL;DR: to make it more readable, I added an extra check for an empty CPU set.
Full explanation:
There are two attempts to get a CPU; if you ever reach a third pass, you know your CPU affinity is incorrectly configured (empty).
So say you have 4 CPU cores, affinity is configured as `cpu: [ 1, 2 ]`, and your interface is configured to run 4 workers. The affinity is converted to bits as `0 1 1 0` (a 1 marks the indexes of the CPUs we want to run on). The affinity variable (`taf`) has a property `lcpu` - the last CPU used - so it knows on which core to continue between worker assignments.
- Worker 1 gets core 1, sets lcpu to 1 in the first pass
- Worker 2 gets core 2, sets lcpu to 2 in the first pass
- Worker 3 gets core 1, sets lcpu to 1 in the second pass - it had to evaluate core 3, then lcpu was reset to 0 and the number of attempts was increased by one
- Worker 4 gets core 2, sets lcpu to 2 in the first pass
If you run into the third pass, you know something is wrong. I believe that can only happen with an empty CPU set, i.e. `cpu: [ ]`.
But then I reevaluated the point and, to make it easier to comprehend (so you don't have to read this far), I added an extra check before this function runs - `if (CPU_COUNT(&taf->cpu_set) == 0)` - and added an extra comment to the function.
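For illustration, a minimal sketch of the wrap-around logic described above (not the exact Suricata implementation; it assumes the pre-refactor single `lcpu` index and the codebase's `UtilCpuGetNumProcessorsOnline()` helper):

```c
/* Sketch only: round-robin pick of the next allowed CPU.
 * Two wrap-arounds guarantee the whole CPU space was scanned;
 * a third pass is only reachable with an empty cpu_set, which
 * the caller now rules out via CPU_COUNT(&taf->cpu_set) == 0. */
static uint16_t GetNextAvailableCPUSketch(ThreadsAffinityType *taf)
{
    uint16_t ncpus = UtilCpuGetNumProcessorsOnline();
    uint16_t cpu = taf->lcpu;
    int attempts = 0;

    while (attempts < 2) {
        if (cpu >= ncpus) {
            cpu = 0;       /* wrap around ... */
            attempts++;    /* ... and count the completed pass */
        }
        if (CPU_ISSET(cpu, &taf->cpu_set)) {
            taf->lcpu = cpu + 1;   /* continue from here for the next worker */
            return cpu;
        }
        cpu++;
    }
    return 0; /* unreachable with a non-empty set */
}
```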
```c
        FatalError("unknown cpu_affinity node");
    }
}
SCLogConfig("Found affinity definition for \"%s\"", setname);
```
nit: suggest consistent terminology usage -- both "affinity" and "CPU affinity" are used in messages.
Tried to convert all "affinity" usages to "CPU affinity".
```c
    if_obj = HwLocDeviceGetByPcie(topology, tv->iface_name);
}

if (if_obj != NULL) {
```
Suggest using `if (sc_log_global_log_level >= SC_LOG_DEBUG)` to conditionalize the call to `HwlocObjectDump`.
FYI, I've found out there is an `SCLogGetLogLevel()` function to get the log level.
```c
}
#endif /* HAVE_HWLOC */

static bool CPUIsFromNuma(uint16_t ncpu, uint16_t numa)
```
This is called from a `while` loop in one instance. Could the CPU set for the NUMA node be retrieved once and then compared against multiple times instead?
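For illustration, a sketch of that suggestion (assuming the hwloc 2.x API and an already-loaded topology; the function name is hypothetical):

```c
#include <hwloc.h>

/* Sketch: look the NUMA node up once, then do cheap bitmap tests per CPU
 * inside the loop, instead of resolving the node on every iteration. */
static void ForEachCPUOnNuma(hwloc_topology_t topology, unsigned numa, unsigned ncpus)
{
    hwloc_obj_t node = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NUMANODE, numa);
    if (node == NULL)
        return;
    for (unsigned cpu = 0; cpu < ncpus; cpu++) {
        if (hwloc_bitmap_isset(node->cpuset, cpu)) {
            /* `cpu` belongs to NUMA node `numa` */
        }
    }
}
```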
```c
    }
}

if (cpu >= 0)
```
nit: prefer use of `{` and `}`.
```c
if (!printed) {
    printed = true;
    SCLogWarning("threading.autopin option is enabled but hwloc support is not available. "
                 "Please recompile Suricata with hwloc support to enable this feature.");
```
nit: should mention reconfiguring, similar to existing log messages, e.g.:
"NFQUEUE not enabled. Make sure to pass --enable-nfqueue to configure when"
```diff
@@ -175,6 +175,7 @@ int RunModeSetLiveCaptureAutoFp(ConfigIfaceParserFunc ConfigParser,
         FatalError("TmThreadsCreate failed");
     }
     tv_receive->printable_name = printable_threadname;
+    tv_receive->iface_name = dev ? strdup(dev) : NULL;
```
Handle `strdup` failure.
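For illustration, one way the check could look (a sketch; the `FatalError` reaction mirrors how the surrounding runmode code handles setup-time failures):

```c
/* Sketch: fail hard if duplicating the interface name fails, instead of
 * silently carrying a NULL iface_name for a configured device. */
tv_receive->iface_name = NULL;
if (dev != NULL) {
    tv_receive->iface_name = strdup(dev);
    if (tv_receive->iface_name == NULL) {
        FatalError("Failed to duplicate interface name \"%s\"", dev);
    }
}
```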
Can you update the PR description to start with what the scheme is in the PR? It has a helpful discussion of what it does not do, which leaves me guessing a bit :)
@victorjulien updated the PR description. The full example can be seen in `suricata.yaml.in`. @jlucovsky thanks for the feedback, I am going to address that shortly.
Skimming through and had a few questions. It looks like if you use it, does it do it automatically, or could you specify not to use other threads, say management threads? Could this work for pcap mode too? Kind of a nit: "all" doesn't really sound direct - it may seem like you're just going to use every core you can. "auto" or "numa" or some combination may be more specific.
Moving part of the discussion points to #12359 |
Follow-up of #12316
Redmine tickets:
- https://redmine.openinfosecfoundation.org/issues/7036
- https://redmine.openinfosecfoundation.org/issues/2321

This work allows more precise thread assignment, done either automatically (via threading.autopin) or manually (via the interface-specific CPU affinity settings).
This works with AF-PACKET and DPDK (and should work with other capture modes as well). The primary target is the workers runmode, but it also works with autofp receive threads.
To try this PR:
Describe changes:
- v8:
- v7:
- v6:
On YAML changes: the CPU affinity section in `suricata.yaml` has been changed (see `suricata.yaml.in` for the full picture); a sketch of the new layout follows, and below it is the reasoning for it.
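A hedged reconstruction of the new layout, pieced together from the descriptions in this PR (values illustrative, not the exact `suricata.yaml.in` content):

```yaml
threading:
  cpu-affinity:
    management-cpu-set:
      cpu: [ 0 ]
    worker-cpu-set:
      cpu: [ "all" ]
      mode: "exclusive"
      prio:
        default: "high"
      interface-specific-cpu-set:
        - interface: "net_bonding0"
          cpu: [ 1, 3, 5 ]
          mode: "exclusive"
          prio:
            default: "high"
```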
When starting this work I wanted to follow the same structure as the CPU-set nodes, that is (a sketch, assuming the list-free layout from Ticket 2321):
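```yaml
cpu-affinity:
  worker-cpu-set:
    cpu: [ "all" ]
    mode: "exclusive"
    prio:
      default: "high"
```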
Converted to JSON, it looks like:
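(a reconstruction matching the YAML sketch above)

```json
{
  "cpu-affinity": {
    "worker-cpu-set": {
      "cpu": [ "all" ],
      "mode": "exclusive",
      "prio": { "default": "high" }
    }
  }
}
```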
So worker-cpu-set is an object by itself: it holds a key named "worker-cpu-set", and only that key contains the children properties (cpu, prio, etc.).
But when adding the interface-specific node, I found out that having the interface names directly as keys of the object storing the properties (cpu, prio, etc.) lets the YAML loader change the name to something else - in this case to `net-bonding0`.
.So adding it as a value:
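(a hypothetical reconstruction of the construct in question, per the "parent list item has a value" note further below)

```yaml
interface-specific-cpu-set:
  - interface: "net_bonding0"
      cpu: [ 1, 2 ]   # properties nested under a key that already holds a value
```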
yields an invalid YAML construct.
Therefore, for interface-specific CPU affinity settings, I ended up putting the properties on the same level as the interface name itself. This follows the same structure as established in e.g. the capture-mode interface definitions (dpdk.interfaces, af-packet).
Note: as part of this, I converted the base *-cpu-set nodes from list items to plain mapping nodes, as part of Ticket 2321.
The CPU affinity interface-specific design I ended up with is:
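(a sketch reconstructed from the surrounding description; values illustrative)

```yaml
worker-cpu-set:
  cpu: [ "all" ]
  interface-specific-cpu-set:
    - interface: "net_bonding0"   # name and properties on the same level
      cpu: [ 1, 3, 5 ]
      mode: "exclusive"
      prio:
        default: "high"
```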
JSON visualization of this:
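(again a reconstruction, matching the sketch above)

```json
{
  "worker-cpu-set": {
    "cpu": [ "all" ],
    "interface-specific-cpu-set": [
      {
        "interface": "net_bonding0",
        "cpu": [ 1, 3, 5 ],
        "mode": "exclusive",
        "prio": { "default": "high" }
      }
    ]
  }
}
```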
This leaves the base {worker,receive,management}-cpu-set nodes in their original setting, as we know them. The internal code can handle this because their names are known beforehand, so it can "step down" to the children level to inspect the cpu-affinity properties. The dynamic part - interface-specific-cpu-set - contains all properties on the same level.
### Other unsuccessful YAML designs

#### Specifying the NUMA affinity in the individual capture modes
Example
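(a hypothetical sketch of this rejected design; the exact keys were not shown)

```yaml
dpdk:
  interfaces:
    - interface: 0000:3b:00.0
      cpu-affinity: [ 1, 3, 5 ]   # affinity defined per capture interface
```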
But management / other threads would still need to be configured elsewhere. I liked the chosen approach more because all affinity-related settings stay centralized.
#### Not having list nodes in the YAML
The YAML library changes net_bonding to net-bonding when the YAML is loaded - so this is a no-go.
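(a hypothetical sketch: the interface name used directly as a mapping key)

```yaml
interface-specific-cpu-set:
  net_bonding0:        # loaded back as "net-bonding0" - the key name gets mangled
    cpu: [ 1, 2 ]
```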
#### Have interface-related properties as children
This is a disallowed construct in YAML, so another no-go. A similar pattern is currently used with the *-cpu-set nodes, but the difference here is that the parent list item has a value (in this case net_bonding0). That is what causes the problems. Interface-related properties therefore need to sit right under "interface".
PRO TIP: a good way to visualize different YAML structures is by using YAML-to-JSON converters. Currently it creates an object like:
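(a reconstruction of the current base structure, consistent with the examples above)

```json
{
  "cpu-affinity": {
    "management-cpu-set": { "cpu": [ 0 ] },
    "worker-cpu-set": { "cpu": [ "all" ] }
  }
}
```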