Useful or unnecessary cruft? (get numa node & bind tx thread for ZC performance) #12

Closed
mzpqnxow opened this issue Dec 9, 2023 · 1 comment

Comments

mzpqnxow commented Dec 9, 2023

Hey @p-l-

I wanted your opinion on something...

Currently, when I use ZC mode, I rely on simple bash wrappers that determine the NUMA node for an interface and then invoke masscan under taskset, binding the masscan process to CPUs on that node for the best performance in the tx loop
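For illustration, the node lookup that such a wrapper performs could look roughly like this in C. This is only a sketch, not code from masscan or PF_RING: it assumes the standard Linux sysfs attribute `/sys/class/net/<ifname>/device/numa_node`, the function names `read_int_file` and `iface_numa_node` are made up for the example, and the caller is assumed to pass the plain kernel interface name (without any `zc:` prefix).

```c
#include <assert.h>
#include <stdio.h>

/* Read one integer from a sysfs-style text file; returns -1 on any failure. */
int read_int_file(const char *path)
{
    FILE *fp;
    int value = -1;

    assert(path != NULL);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%d", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}

/* NUMA node of a kernel network interface (e.g. "eth0").
 * Returns -1 if unknown; sysfs itself reports -1 on non-NUMA hosts. */
int iface_numa_node(const char *ifname)
{
    char path[256];
    int n;

    assert(ifname != NULL);
    n = snprintf(path, sizeof(path),
                 "/sys/class/net/%s/device/numa_node", ifname);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    return read_int_file(path);
}
```

A result of -1 means there is nothing sensible to pin to, so a caller would simply skip the binding step in that case.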

I'll be honest, I did this strictly for correctness and don't know how much it helps performance, or when; presumably it only matters near --rate 1000000

(It's also possible to set and use huge pages allocated specifically to a node, though I don't currently do that and am not sure it matters)

It's only a few lines of bash and IMO not appropriate for inclusion in the masscan repo

What are your thoughts on adding this natively within masscan? The objective is to make it easier to do without requiring any wrappers outside/around masscan

It would be wrapped with ifdef(linux) and would only apply if:

  1. ZC is present/active
  2. (probably) a specific command-line flag is given
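The two conditions above, plus the Linux-only guard, could be expressed as a single check. This is a hypothetical sketch: `struct pin_config`, its fields, and `should_auto_pin` are invented names, and masscan's real configuration structure and flag handling are not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical option state; masscan's real configuration struct differs. */
struct pin_config {
    int zc_active;      /* nonzero when a PF_RING ZC interface is in use */
    int pin_requested;  /* nonzero when the (hypothetical) opt-in flag was given */
};

/* Returns 1 only when pinning should be attempted, otherwise 0. */
int should_auto_pin(const struct pin_config *cfg)
{
    assert(cfg != NULL);
#if defined(__linux__)
    return cfg->zc_active && cfg->pin_requested;
#else
    /* NUMA pinning is only sketched for Linux; do nothing elsewhere. */
    return 0;
#endif
}
```

Keeping the decision in one small function means non-Linux builds compile the same call site and simply never pin.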

I would have done this long ago but I didn't feel like figuring out how to do it properly in C as I wasn't familiar with the interfaces. But I recently noticed that the PF_RING examples have small, simple functions to do it all

References:

  • busid2node - determine the NUMA node for a device (requires knowing the bus ID for the device, which can be retrieved easily with a PF_RING API call)
  • bind2node - self-explanatory
  • bindthread2core - self-explanatory
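To give a rough idea of what node binding involves, here is an independent sketch. It is not the PF_RING example code: it assumes the Linux sysfs file `/sys/devices/system/node/node<N>/cpulist` and glibc's `sched_setaffinity`, and the names `parse_cpulist` and `pin_self_to_node` are made up for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a kernel cpulist string such as "0-3,8-11" into a cpu_set_t.
 * Returns the number of CPUs in the set, or -1 on malformed input. */
int parse_cpulist(const char *list, cpu_set_t *set)
{
    const char *p = list;

    assert(list != NULL && set != NULL);
    CPU_ZERO(set);
    while (*p != '\0' && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        long last = first;
        long cpu;

        if (end == p || first < 0)
            return -1;
        if (*end == '-') {              /* a range "first-last" */
            p = end + 1;
            last = strtol(p, &end, 10);
            if (end == p || last < first)
                return -1;
        }
        if (last >= CPU_SETSIZE)
            return -1;
        for (cpu = first; cpu <= last; cpu++)
            CPU_SET((int)cpu, set);
        p = end;
        if (*p == ',')
            p++;
        else if (*p != '\0' && *p != '\n')
            return -1;
    }
    return CPU_COUNT(set);
}

/* Restrict the calling thread to the CPUs of one NUMA node.
 * Returns 0 on success, -1 on failure. */
int pin_self_to_node(int node)
{
    char path[128];
    char buf[4096];
    cpu_set_t set;
    FILE *fp;

    if (node < 0)
        return -1;
    snprintf(path, sizeof(path),
             "/sys/devices/system/node/node%d/cpulist", node);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    if (parse_cpulist(buf, &set) <= 0)
        return -1;
    /* pid 0 means "the calling thread" for sched_setaffinity on Linux */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Called from the tx thread before it enters its loop, this limits only that thread. Pinning to a single core rather than a whole node would use the same affinity call with a one-CPU set.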

These 3 functions are not part of the PF_RING API. If implemented, they would be forklifted from the examples and live in the masscan code

Note that the function(s) to get the bus ID of a device are part of the PF_RING API, so they would have to be resolved via dlopen/dlsym (the same way the other PF_RING functions are loaded/resolved)

There are probably very few people who would benefit from this (besides me, perhaps) and the benefit is small since it can already be done as a wrapper by anyone who cares to look up the commands (for the most part)

Do you think it's worth implementing an "auto-pin to NIC NUMA node" in C or is it better left to the user?

If added, I think it would remain undocumented or "experimental" because I don't want anyone to have to support those trying to understand what it does or how it works (or doesn't work)

I can't really convince myself either way and don't have strong feelings about it; since it would be a PR to your fork, I decided to punt the decision to you 😊

I'm happy to send the PR if you want it (after copying, pasting and testing)

BTW, did you see that rob is starting to pull your patchset upstream? Hopefully that makes your life easier in the long run

Thanks


mzpqnxow commented Sep 8, 2024

Maybe some day :)

mzpqnxow closed this as completed Sep 8, 2024