Useful or unnecessary cruft? (get numa node & bind tx thread for ZC performance) #12

mzpqnxow · 2023-12-09T15:23:30Z

I wanted your opinion on something...

Currently, when I use ZC mode, I have simple bash wrappers that determine the NUMA node for an interface and invoke masscan with taskset, binding the masscan process to that node for the best performance in the tx loop
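For reference, the node lookup part of such a wrapper can be done in a few lines of C by reading sysfs. This is only a minimal sketch, not code from masscan or PF_RING: the function name `iface_numa_node` is made up for illustration, and it assumes the interface is visible under `/sys/class/net` by its kernel name (so any ZC-style prefix on the device name would have to be stripped first).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: return the NUMA node the kernel reports for a
 * network interface, or -1 if it is unknown. The kernel itself reports
 * -1 on non-NUMA systems, and virtual interfaces (e.g. "lo") have no
 * device/ directory at all, so both cases end up as -1 here. */
int iface_numa_node(const char *ifname)
{
    char path[256];
    int node = -1;
    int len;
    FILE *fp;

    assert(ifname != NULL);

    len = snprintf(path, sizeof(path),
                   "/sys/class/net/%s/device/numa_node", ifname);
    if (len < 0 || (size_t)len >= sizeof(path))
        return -1; /* name too long to be a real interface */

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1; /* no such interface, or no backing device */

    if (fscanf(fp, "%d", &node) != 1)
        node = -1;
    fclose(fp);
    return node;
}
```

The result could then be used to pick the CPU list for taskset, or be fed to an in-process affinity call instead.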

I'll be honest: I did this strictly for correctness and don't know how much it helps performance, or when. Presumably it only matters near --rate 1000000

(It's also possible to set and use huge pages allocated specifically to a node, though I don't currently do that and am not sure it matters)

It's only a few lines of bash and IMO not appropriate for inclusion in the masscan repo

What are your thoughts on adding this natively within masscan? The objective is to make it easier to do without requiring any wrappers outside/around masscan

It would be wrapped with ifdef(linux) and would only apply if:

- ZC is present/active
- (probably) a specific command-line flag is given

I would have done this long ago but I didn't feel like figuring out how to do it properly in C as I wasn't familiar with the interfaces. But I recently noticed that the PF_RING examples have small, simple functions to do it all

References:

- busid2node - determine the NUMA node for a device (requires knowing the bus ID for the device, which can be retrieved easily with a PF_RING API call)
- bind2node - self-explanatory
- bindthread2core - self-explanatory

These 3 functions are not part of the PF_RING API. If implemented, they would be forklifted from the examples and live in the masscan code
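To give a feel for the size of the change, here is a rough sketch of the binding side using only sysfs and `sched_setaffinity`. It is not the PF_RING example code, and the names `cpulist_to_set`, `bind_to_node` and `pin_to_core` are hypothetical stand-ins; it assumes Linux with glibc and that the node's CPUs are listed in `/sys/devices/system/node/nodeN/cpulist`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a kernel cpulist string such as "0-3,8-11" into a cpu_set_t.
 * Returns 0 on success, -1 on malformed or empty input. */
int cpulist_to_set(const char *list, cpu_set_t *set)
{
    const char *p = list;

    assert(list != NULL && set != NULL);
    CPU_ZERO(set);

    while (*p != '\0' && *p != '\n') {
        char *end;
        long lo = strtol(p, &end, 10);
        long hi;

        if (end == p || lo < 0)
            return -1;
        hi = lo;
        p = end;
        if (*p == '-') {            /* range "lo-hi" */
            p++;
            hi = strtol(p, &end, 10);
            if (end == p || hi < lo)
                return -1;
            p = end;
        }
        if (hi >= CPU_SETSIZE)
            return -1;
        for (long cpu = lo; cpu <= hi; cpu++)
            CPU_SET((int)cpu, set);
        if (*p == ',')
            p++;
        else if (*p != '\0' && *p != '\n')
            return -1;
    }
    return CPU_COUNT(set) > 0 ? 0 : -1;
}

/* Restrict the calling thread to the CPUs of one NUMA node. */
int bind_to_node(int node)
{
    char path[64], buf[4096];
    cpu_set_t set;
    char *line;
    FILE *fp;

    snprintf(path, sizeof(path),
             "/sys/devices/system/node/node%d/cpulist", node);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    line = fgets(buf, sizeof(buf), fp);
    fclose(fp);
    if (line == NULL || cpulist_to_set(buf, &set) != 0)
        return -1;
    return sched_setaffinity(0, sizeof(set), &set); /* 0 = calling thread */
}

/* Pin the calling thread to a single core. */
int pin_to_core(int core)
{
    cpu_set_t set;

    if (core < 0 || core >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Called from the tx thread before the transmit loop starts, `bind_to_node(node)` would have roughly the same effect as launching under taskset with that node's CPU list, except that it only affects the one thread.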

Note that the function(s) to get the bus ID of a device are part of the PF_RING API, so dlopen/dlsym would have to retrieve them (same as how the other PF_RING functions are loaded/resolved)

There are probably very few people who would benefit from this (besides me, perhaps) and the benefit is small since it can already be done as a wrapper by anyone who cares to look up the commands (for the most part)

Do you think it's worth implementing an "auto-pin to NIC NUMA node" in C or is it better left to the user?

If added, I think it would remain undocumented or "experimental" because I don't want anyone to have to support those trying to understand what it does or how it works (or doesn't work)

I can't really convince myself either way and don't have strong feelings about it; since it would be a PR to your fork, I decided to punt the decision to you 😊

I'm happy to send the PR if you want it (after copying, pasting and testing)

BTW, did you see that Rob is starting to pull your patchset upstream? Hopefully that makes your life easier in the long run

Thanks

mzpqnxow · 2024-09-08T14:18:19Z

Maybe some day :)

mzpqnxow closed this as completed Sep 8, 2024
