[BUG] Wrong id to pin threads on Cascade Lake #668

Closed
SeiDPierre opened this issue Mar 19, 2025 · 9 comments

@SeiDPierre

SeiDPierre commented Mar 19, 2025

Describe the bug
Let's say I'm running likwid-perfctr -C 0-3 -M 1 -g ENERGY ./bin/bt.B.x. When monitoring CPU usage with htop, I can see that 0, 2 and 3 are working but not 1. I would expect to see 1 working as well.
Also, whatever I choose (e.g. 10 threads, pinned from 0 to 9; 20 threads, 9-29), the second ID is never working.

To Reproduce / Env

  • I'm running on an Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz processor.
  • LIKWID version (-v) is 5.1.1
  • Debian trixie
  • OpenMP only, run with class B BT benchmark from NAS.
  • Not using any marker

Some debug output from LIKWID
Attached is the output of
likwid-topology -V 3 and likwid-perfctr -V 3 -C 0-3 -M 1 -g ENERGY ./bin/bt.B.x

To reproduce
export OMP_NUM_THREADS=4
and
likwid-perfctr -C 0-3 -M 1 -g ENERGY ./bin/bt.B.x

@SeiDPierre SeiDPierre added the bug label Mar 19, 2025
@SeiDPierre
Author

Here are the files:
out_perfctr.txt
out_topology.txt

@TomTheBear
Member

From the output, it seems that your runtime is starting one thread more than expected. LIKWID sees all threads (progress, library, shepherd threads, ...), not just the application level threads (OpenMP threads). Since you identified that HWThread 1 is not used, I would try to skip that one: likwid-perfctr -s 0x1 .... The skipmask is a bitmask, thus 0x1 skips the first thread that is started (the application process is always pinned). If you want to skip the second started thread, use 0x2. For skipping the first two threads, use 0x3.
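To make the bitmask reading concrete, here is a minimal Python sketch of how a skipmask could be decoded. It is not LIKWID code; the function name `skipped_threads` is hypothetical, and the sketch only assumes what is stated above: bit i of the mask refers to the i-th thread that gets started, counted from zero.

```python
def skipped_threads(skipmask: int, num_created: int) -> list[int]:
    """Return the 0-based creation indices of the threads a skipmask excludes.

    Hypothetical helper for illustration: bit i set means the i-th
    started thread is skipped, i.e. not pinned.
    """
    return [i for i in range(num_created) if (skipmask >> i) & 1]


if __name__ == "__main__":
    for mask in (0x0, 0x1, 0x2, 0x3):
        print(hex(mask), skipped_threads(mask, num_created=5))
```

With five started threads, 0x1 selects index 0 (the first started thread), 0x2 selects index 1, and 0x3 selects both.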

A general remark about your system: the hwthread numbering is odd. All even-numbered hwthreads are on socket 0 and all odd-numbered ones are on socket 1. So 0-3 resolves to 0,1,2,3 and consequently puts two threads on one socket and two threads on the other. If that's your plan, no problem, but I would assume you want to have the four hwthreads close together. I would use the domain syntax for that: S0:0-3 gives you the first 4 physical hwthreads on socket 0, in your case 0,2,4,6.
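The difference between the plain list and the domain syntax can be sketched in a few lines of Python. This is only a model of the numbering described above, not LIKWID's resolver: `socket_of` and `resolve_socket_domain` are hypothetical names, SMT siblings are ignored, and the count of 8 hwthreads is chosen purely for illustration.

```python
def socket_of(hwthread: int) -> int:
    # Assumption from the remark above: even IDs on socket 0, odd IDs on socket 1.
    return hwthread % 2


def resolve_socket_domain(socket: int, first: int, last: int, num_hwthreads: int) -> list[int]:
    """Pick entries first..last (inclusive) from the hwthreads of one socket."""
    members = [hw for hw in range(num_hwthreads) if socket_of(hw) == socket]
    return members[first:last + 1]


if __name__ == "__main__":
    plain = [0, 1, 2, 3]  # what "-C 0-3" resolves to
    print([socket_of(hw) for hw in plain])                   # [0, 1, 0, 1]
    print(resolve_socket_domain(0, 0, 3, num_hwthreads=8))   # [0, 2, 4, 6]
```

The first line shows the plain list alternating between sockets; the second shows the socket-local selection staying on socket 0.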

Last remark: ENERGY counters exist only once per socket, so the first hwthread of each socket in your CPU set measures the energy and the other ones return zero. In your output, hwthreads 0 and 1 are on different sockets, so both measure the energy. Hwthreads 2 and 3 do not measure it.
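As a rough illustration of that rule, the following Python sketch picks the first hwthread per socket from a CPU set. The helper name `energy_readers` is hypothetical, and the even/odd socket mapping is an assumption matching this particular system.

```python
from typing import Callable, Iterable


def energy_readers(cpuset: Iterable[int], socket_of: Callable[[int], int]) -> list[int]:
    """Return the hwthreads that would report energy: the first one seen per socket."""
    first_per_socket: dict[int, int] = {}
    for hw in cpuset:
        # setdefault keeps the first hwthread encountered for each socket.
        first_per_socket.setdefault(socket_of(hw), hw)
    return sorted(first_per_socket.values())


if __name__ == "__main__":
    even_odd = lambda hw: hw % 2  # assumption: even IDs on socket 0, odd on socket 1
    print(energy_readers([0, 1, 2, 3], even_odd))  # [0, 1]
    print(energy_readers([0, 2, 4, 6], even_odd))  # [0]
```

For the CPU set 0,1,2,3 two hwthreads report energy (one per socket); for 0,2,4,6 only hwthread 0 does.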

@SeiDPierre
Author

SeiDPierre commented Mar 20, 2025

Indeed, if I run with -s 0x1 for 0,2,4,6, I can see 4 threads running on 0,2,4,6. Is there an explanation why? I'm not sure I understand.

@SeiDPierre
Author

Does it mean likwid defines/modifies some OpenMP env variables?

@TomTheBear
Member

LIKWID's pinning mechanism overrides the pthread_create() call. This function is the foundation of almost all threading libraries out there, so as soon as almost anything wants to create a (kernel) thread, it calls this function. LIKWID intercepts the call and thus sees all threads that are created. But libraries and runtimes sometimes start not only the worker threads (OpenMP threads in your case) but also some threads for internal management. The most prominent are threads that advance asynchronous operations in the background. Those threads should not be pinned, but it is very difficult to find out whether a thread is a worker thread or a runtime thread during the interception of pthread_create(). So, in order to pin only the worker threads, the user has to tell likwid-perfctr (in fact the pinning code of likwid-pin) which threads to skip. Most OpenMP implementations do not use runtime threads, so a skipmask of 0x0 (the default) fits, but in your case the runtime starts a runtime thread and then the four worker/OpenMP threads. To skip the runtime thread, you have to specify a skipmask of 0x1.
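The effect of the interception order can be modeled with a small Python simulation. This is a sketch under stated assumptions, not LIKWID's implementation: the function `simulate_pinning`, the thread names, and the number of created threads are hypothetical, and the model assumes that the application process takes the first hwthread, that each intercepted thread takes the next one in creation order, and that placement wraps around when the list is exhausted.

```python
def simulate_pinning(cpus: list[int], created: list[str], skipmask: int = 0) -> dict:
    """Map thread names to hwthreads; skipped threads map to None (left unpinned)."""
    placement = {"main": cpus[0]}  # the application process is always pinned
    next_slot = 1
    for idx, name in enumerate(created):
        if (skipmask >> idx) & 1:
            placement[name] = None  # skipped: not pinned by the tool
            continue
        # Assumption: round-robin over the CPU list once it is exhausted.
        placement[name] = cpus[next_slot % len(cpus)]
        next_slot += 1
    return placement


if __name__ == "__main__":
    # Hypothetical creation order: one runtime thread first, then the workers.
    created = ["runtime", "worker1", "worker2", "worker3"]
    print(simulate_pinning([0, 1, 2, 3], created))
    print(simulate_pinning([0, 2, 4, 6], created, skipmask=0x1))
```

In this model, without a skipmask the idle runtime thread occupies hwthread 1 and the last worker wraps around to hwthread 0, which would look like "0, 2 and 3 are working but not 1" in htop. With a skipmask of 0x1 the workers land on 2, 4 and 6 next to the main thread on 0.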

LIKWID defines OpenMP environment variables only when they are not already set; it never modifies existing ones. The user is always right, so if you export OMP_NUM_THREADS=10 in your environment, it will not be changed. If nothing is specified by the user, the OMP_NUM_THREADS variable is set to the number of hardware threads given on the command line (in your case 4).
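The "set only if unset" behavior described here corresponds to a simple default rule, sketched below in Python. The function `with_default_omp_threads` is a hypothetical illustration of that rule, not the code LIKWID runs.

```python
def with_default_omp_threads(env: dict[str, str], cpus: list[int]) -> dict[str, str]:
    """Return a copy of env where OMP_NUM_THREADS defaults to the CPU list length."""
    new_env = dict(env)
    # setdefault leaves a user-provided value untouched.
    new_env.setdefault("OMP_NUM_THREADS", str(len(cpus)))
    return new_env


if __name__ == "__main__":
    print(with_default_omp_threads({}, [0, 1, 2, 3])["OMP_NUM_THREADS"])
    print(with_default_omp_threads({"OMP_NUM_THREADS": "10"}, [0, 1, 2, 3])["OMP_NUM_THREADS"])
```

The first call yields 4 because nothing was exported; the second keeps the user's value of 10.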

@SeiDPierre
Author

Thanks for your explanations and remarks!

@TomTheBear
Member

Which OpenMP runtime are you using? One of the common ones (gomp or LLVM/Intel OpenMP)? Can you provide the compiler version if it is GCC or the Intel compiler?

@SeiDPierre
Author

gcc (Debian 13.3.0-12) 13.3.0 and gomp

@TomTheBear
Member

Thx. I will check that version; it is currently not installed on our systems. I'm closing the issue, but feel free to re-open it if you encounter further related issues with LIKWID on your system.
