[Issue]: Build uses ~100 cpu-hours #1614
Comments
Hi @G-Ragghianti. Internal ticket has been created to investigate this issue. Thanks!
Hi @G-Ragghianti, sorry for the inconvenience this is causing! We're aware of severe build time increases in several ROCm components post-6.2, with For now, I'd recommend setting
Thanks for looking at it. I encourage a re-evaluation on the use of the loky/joblib for the hipblaslt build. One option that would help spack users out is if it were easy to disable loky job management via cmake. Then the spack package could disable or limit the unnecessary CPU use. I'm also surprised that loky/joblib uses a busy spin method for the multiprocess communication.
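To illustrate why a busy-spin wait is costly compared with a blocking wait, here is a minimal standard-library sketch. It is not how loky or joblib is implemented; the helper names (`cpu_cost_of_wait`, `spin_wait`, `blocking_wait`) and the 0.2 s delay are hypothetical and chosen only for the demonstration. It measures the CPU time a process burns while doing nothing but waiting for a signal.

```python
import threading
import time


def cpu_cost_of_wait(wait_fn, delay=0.2):
    """Return the CPU seconds this process consumes while waiting
    `delay` wall-clock seconds for an event to be set."""
    ready = threading.Event()
    timer = threading.Timer(delay, ready.set)  # sets the event after `delay`
    start = time.process_time()                # process-wide CPU time
    timer.start()
    wait_fn(ready)
    timer.join()
    return time.process_time() - start


def spin_wait(ready):
    # Busy spin: poll the flag in a tight loop, keeping a core busy
    # even though no useful work is being done.
    while not ready.is_set():
        pass


def blocking_wait(ready):
    # Blocking wait: the OS parks the thread until the event is set,
    # so almost no CPU time is consumed.
    ready.wait()


spin_cpu = cpu_cost_of_wait(spin_wait)
block_cpu = cpu_cost_of_wait(blocking_wait)
print(f"spin wait:     {spin_cpu:.3f} CPU-seconds")
print(f"blocking wait: {block_cpu:.3f} CPU-seconds")
```

The spin variant consumes CPU time roughly proportional to how long it waits, while the blocking variant stays near zero. Multiplied across one idle worker per core over a multi-hour build, that difference would be consistent with the kind of overhead described in this issue.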
@G-Ragghianti Thanks for raising this issue. I'm on the team working to improve resource consumption during build, and rest assured, we have certainly identified joblib as a key offender for the reasons you mention. We're actively working on decoupling the parallelization layer from the build steps, after which we may either replace joblib or at the very least, make improvements to address your ask.
Oh wow. This is more than I had hoped for. Thanks a lot!
Problem Description
I noticed that this project is the longest build in the ROCm stack that we are using. We often build the stack from source via spack due to complications with using the binary distributions. The build currently takes around 6 hours to finish; however, it is worse than that: it actually uses about 100 CPU-hours. This appears to be due to a built-in job distribution system which launches a process for each CPU core on the system. These processes sit in a spin-wait state while the distribution of jobs is very slow and does not use all the workers. This results in an extreme waste of CPU cycles on systems with many cores. I have a Dockerfile which I used to reproduce this, along with the cmake and make output:
Dockerfile
Cmake:
Make:
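For a sense of scale, the two figures reported above (about 100 CPU-hours over about 6 hours of wall-clock time) can be turned into an average number of busy cores. This is a back-of-the-envelope sketch; the function name `average_busy_cores` is hypothetical, and the inputs are the approximate values quoted in the description, not new measurements.

```python
def average_busy_cores(cpu_hours, wall_hours):
    """Average number of cores kept busy over the whole build."""
    return cpu_hours / wall_hours


# Approximate figures from the problem description.
avg = average_busy_cores(cpu_hours=100, wall_hours=6)
print(f"average busy cores: {avg:.1f}")  # 16.7
```

In other words, the build keeps roughly 17 cores occupied on average for its entire duration, even though, per the description, job distribution is slow and not all workers are doing useful work.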
Operating System
Rockylinux 9
CPU
Any
GPU
Other
Other
No response
ROCm Version
ROCm 6.2.3
ROCm Component
hipBLASLt
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response