[Issue]: Build uses ~100 cpu-hours #1614

Open
G-Ragghianti opened this issue Feb 3, 2025 · 5 comments

Comments

@G-Ragghianti

G-Ragghianti commented Feb 3, 2025

Problem Description

I noticed that this project has the longest build in the ROCm stack that we use. We often build the stack from source via Spack because of complications with the binary distributions. The build currently takes around 6 hours of wall-clock time, but the real cost is worse: it consumes about 100 CPU-hours. This appears to be caused by a built-in job distribution system that launches one worker process per CPU core. The workers sit in a spin-wait state while jobs are distributed very slowly and never keep all of the workers busy, which wastes an extreme number of CPU cycles on systems with many cores. I have a Dockerfile that reproduces this, shown here along with the cmake and make output:

Dockerfile

FROM rockylinux:9
RUN dnf -y group install development
COPY rocm.repo /etc/yum.repos.d/
RUN dnf -y install epel-release
RUN dnf -y --enablerepo=crb install perl-File-BaseDir perl-URI-Encode
RUN dnf -y install hipblaslt
RUN git clone https://github.com/ROCm/hipBLASLt /tmp/hipblaslt
WORKDIR /tmp/hipblaslt
RUN dnf -y install cmake
RUN dnf -y install rocm-hip-sdk rocprim rocm-ml-sdk rocm-openmp-sdk rocm-developer-tools
RUN dnf -y install msgpack-devel time
RUN mkdir build && \
    cd build && \
    cmake .. 2>&1 | tee cmake.log
RUN cd build && \
    /usr/bin/time make 2>&1 | tee make.log

Cmake:

-- The CXX compiler identification is Clang 18.0.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm/bin/amdclang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.43.5") 
-- Setting build type to 'Release' as none was specified.
-- Using amdclang to build for amdgpu backend

*******************************************************************************
*------------------------------- ROCMChecks WARNING --------------------------*
  Options and properties should be set on a cmake target where possible. The
  variable 'CMAKE_CXX_FLAGS' may be set by the cmake toolchain, either by
  calling 'cmake -DCMAKE_CXX_FLAGS=" -D__HIP_HCC_COMPAT_MODE__=1"'
  or set in a toolchain file and added with
  'cmake -DCMAKE_TOOLCHAIN_FILE=<toolchain-file>'. ROCMChecks now calling:
CMake Warning at /opt/rocm/share/rocmcmakebuildtools/cmake/ROCMChecks.cmake:46 (message):
  'CMAKE_CXX_FLAGS' is set at /tmp/hipblaslt/CMakeLists.txt:<line#> shown
  below:
Call Stack (most recent call first):
  CMakeLists.txt:9223372036854775807 (rocm_check_toolchain_var)
  CMakeLists.txt:139 (set)


*-----------------------------------------------------------------------------*
*******************************************************************************


*******************************************************************************
*------------------------------- ROCMChecks WARNING --------------------------*
  Options and properties should be set on a cmake target where possible. The
  variable 'CMAKE_CXX_FLAGS' may be set by the cmake toolchain, either by
  calling 'cmake -DCMAKE_CXX_FLAGS=" -D__HIP_HCC_COMPAT_MODE__=1 -O3"'
  or set in a toolchain file and added with
  'cmake -DCMAKE_TOOLCHAIN_FILE=<toolchain-file>'. ROCMChecks now calling:
CMake Warning at /opt/rocm/share/rocmcmakebuildtools/cmake/ROCMChecks.cmake:46 (message):
  'CMAKE_CXX_FLAGS' is set at /tmp/hipblaslt/CMakeLists.txt:<line#> shown
  below:
Call Stack (most recent call first):
  CMakeLists.txt:9223372036854775807 (rocm_check_toolchain_var)
  CMakeLists.txt:144 (set)


*-----------------------------------------------------------------------------*
*******************************************************************************

-- Performing Test COMPILER_HAS_TARGET_ID_gfx908_xnack_on
-- Performing Test COMPILER_HAS_TARGET_ID_gfx908_xnack_on - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx908_xnack_off
-- Performing Test COMPILER_HAS_TARGET_ID_gfx908_xnack_off - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a_xnack_on
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a_xnack_on - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a_xnack_off
-- Performing Test COMPILER_HAS_TARGET_ID_gfx90a_xnack_off - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx942
-- Performing Test COMPILER_HAS_TARGET_ID_gfx942 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1100
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1100 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1101
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1101 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1200
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1200 - Success
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1201
-- Performing Test COMPILER_HAS_TARGET_ID_gfx1201 - Success
-- AMDGPU_TARGETS: gfx908:xnack+;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-;gfx942;gfx1100;gfx1101;gfx1200;gfx1201
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success
-- Python_ROOT is unset. Setting Python_ROOT to /usr.
-- Configure Python_ROOT variable if a different installation is preferred.
-- Found Python: /usr/bin/python3.9 (found version "3.9.18") found components: Interpreter 
'/usr/bin/python3.9' '-m' 'venv' '/tmp/hipblaslt/build/virtualenv' '--system-site-packages' '--clear'
'/tmp/hipblaslt/build/virtualenv/bin/python3.9' '-m' 'pip' 'install' '--upgrade' 'pip'
Requirement already satisfied: pip in ./virtualenv/lib/python3.9/site-packages (21.2.3)
Collecting pip
  Downloading pip-25.0-py3-none-any.whl (1.8 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.2.3
    Uninstalling pip-21.2.3:
      Successfully uninstalled pip-21.2.3
Successfully installed pip-25.0
'/tmp/hipblaslt/build/virtualenv/bin/python3.9' '-m' 'pip' 'install' '--upgrade' 'setuptools'
Requirement already satisfied: setuptools in ./virtualenv/lib/python3.9/site-packages (53.0.0)
Collecting setuptools
  Downloading setuptools-75.8.0-py3-none-any.whl.metadata (6.7 kB)
Downloading setuptools-75.8.0-py3-none-any.whl (1.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 11.2 MB/s eta 0:00:00
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 53.0.0
    Uninstalling setuptools-53.0.0:
      Successfully uninstalled setuptools-53.0.0
Successfully installed setuptools-75.8.0
'/tmp/hipblaslt/build/virtualenv/bin/python3.9' '-m' 'pip' 'install' '/tmp/hipblaslt/tensilelite'
-- Adding /tmp/hipblaslt/build/virtualenv to CMAKE_PREFIX_PATH
-- The C compiler identification is Clang 18.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/rocm/bin/amdclang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Tensile script: /tmp/hipblaslt/build/virtualenv/lib64/python3.9/site-packages/Tensile/bin/TensileCreateLibrary
-- Tensile_CREATE_COMMAND: /tmp/hipblaslt/build/virtualenv/bin/python3.9;/tmp/hipblaslt/build/virtualenv/lib64/python3.9/site-packages/Tensile/bin/TensileCreateLibrary;--code-object-version=4;--cxx-compiler=amdclang++;--library-format=msgpack;--architecture=gfx908:xnack+_gfx908:xnack-_gfx90a:xnack+_gfx90a:xnack-_gfx942_gfx1100_gfx1101_gfx1200_gfx1201;--build-id=sha1;/tmp/hipblaslt/library/src/amd_detail/rocblaslt/src/Tensile/Logic/asm_full;/tmp/hipblaslt/build/Tensile;HIP
Setup source kernel targets
archs for source kernel compilation: gfx908,gfx90a,gfx942,gfx1100,gfx1101,gfx1200,gfx1201
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
-- Configuring done (28.0s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/hipblaslt/build

Make:

[  2%] Generating Tensile Libraries

################################################################################
# Tensile Create Library
# HIP Version:         6.3.42133-1b9c17779
# Cxx Compiler:        /opt/rocm/bin/amdclang++ (version 18.0.0)
# C Compiler:          /opt/rocm/bin/amdclang (version 18.0.0)
# Assembler:           /opt/rocm/bin/amdclang++ (version 18.0.0)
# Offload Bundler:     /opt/rocm/lib/llvm/bin/clang-offload-bundler (version 18.0.0)
# Code Object Version: 4
...
...
...
[ 83%] Building CXX object library/CMakeFiles/hipblaslt.dir/src/amd_detail/rocblaslt/src/rocblaslt_auxiliary.cpp.o
[ 86%] Building CXX object library/CMakeFiles/hipblaslt.dir/src/amd_detail/rocblaslt/src/rocblaslt_mat.cpp.o
[ 89%] Building CXX object library/CMakeFiles/hipblaslt.dir/src/amd_detail/rocblaslt/src/utility.cpp.o
[ 91%] Building CXX object library/CMakeFiles/hipblaslt.dir/src/amd_detail/rocblaslt/src/rocblaslt_transform.cpp.o
[ 94%] Building CXX object library/CMakeFiles/hipblaslt.dir/src/amd_detail/rocblaslt/src/UserDrivenTuningParser.cpp.o
[ 97%] Building CXX object library/CMakeFiles/hipblaslt.dir/src/amd_detail/rocblaslt/src/tensile_host.cpp.o
[100%] Linking CXX shared library libhipblaslt.so
[100%] Built target hipblaslt
330684.60user 17773.57system 6:01:20elapsed 1607%CPU (0avgtext+0avgdata 124557448maxresident)k
17455320inputs+337551070outputs (41790269major+2925761090minor)pagefaults 0swaps
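
For readers checking the "~100 cpu-hours" figure in the title, the arithmetic can be reproduced from the `/usr/bin/time` line above. This is only a small sketch; the variable names are made up for illustration, and the numbers are copied from that line.

```python
# Figures copied from the /usr/bin/time summary at the end of the make output.
user_s = 330684.60                   # CPU seconds spent in user mode
system_s = 17773.57                  # CPU seconds spent in kernel mode
elapsed_s = 6 * 3600 + 1 * 60 + 20   # wall-clock time 6:01:20 in seconds

cpu_s = user_s + system_s
cpu_hours = cpu_s / 3600
avg_busy_cores = cpu_s / elapsed_s   # same ratio that time reports as %CPU

print(f"CPU time: {cpu_hours:.1f} cpu-hours")          # CPU time: 96.8 cpu-hours
print(f"Average busy cores: {avg_busy_cores:.1f}")     # Average busy cores: 16.1
```

The second value matches the `1607%CPU` field: on average about 16 cores were busy for the full six hours.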

Operating System

Rocky Linux 9

CPU

Any

GPU

Other

Other

No response

ROCm Version

ROCm 6.2.3

ROCm Component

hipBLASLt

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

@ppanchad-amd

Hi @G-Ragghianti. Internal ticket has been created to investigate this issue. Thanks!

@schung-amd

Hi @G-Ragghianti, sorry for the inconvenience this is causing! We're aware of severe build time increases in several ROCm components post-6.2, with hipBLASLt being a particularly notable offender. We're attacking this from several angles and have some improvements in the pipeline already which should cut down the build time and binary size significantly. I don't have any firm timelines on this, but it's a high priority issue for us.

For now, I'd recommend setting AMDGPU_TARGETS to reflect only the architectures you need to build for, which should help cut down the build time and size.
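
As a rough sketch of what that could look like in a script-driven or Spack-style build, the snippet below assembles a configure command that restricts `AMDGPU_TARGETS`. The helper name `amdgpu_targets_arg` is hypothetical, and it assumes the variable can be passed as an ordinary `-D` cache entry using the same semicolon-separated form that the cmake log prints; how much time this saves in practice is not measured here.

```python
def amdgpu_targets_arg(targets):
    """Return a CMake -D argument for AMDGPU_TARGETS (hypothetical helper)."""
    if not targets:
        raise ValueError("at least one target is required")
    # CMake lists are semicolon-separated, as in the AMDGPU_TARGETS log line.
    return "-DAMDGPU_TARGETS=" + ";".join(targets)


# Example: configure for gfx90a only instead of the nine targets in the log.
configure_cmd = ["cmake", "..", amdgpu_targets_arg(["gfx90a:xnack+", "gfx90a:xnack-"])]
print(configure_cmd)
```

When the command is typed in a shell rather than passed as an argument list, the value needs quoting so the semicolons are not treated as command separators.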

@G-Ragghianti
Author

Thanks for looking at it. I encourage a re-evaluation of the use of loky/joblib for the hipBLASLt build. One option that would help Spack users is an easy way to disable loky job management via CMake; the Spack package could then disable or limit the unnecessary CPU use. I'm also surprised that loky/joblib uses a busy-spin method for multiprocess communication.
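
To illustrate why a spin-wait is so much more expensive than a blocking wait, here is a minimal standard-library sketch. It does not reproduce what loky/joblib actually does internally; the function names are made up for the example, and it only compares the CPU time consumed by an idle waiter under each approach.

```python
import threading
import time


def cpu_used(wait_fn, seconds):
    """Return the process CPU time consumed while wait_fn(seconds) runs."""
    start = time.process_time()
    wait_fn(seconds)
    return time.process_time() - start


def spin_wait(seconds):
    """Poll the clock in a tight loop until the deadline passes."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass


def blocking_wait(seconds):
    """Sleep in the kernel until the timeout expires; uses almost no CPU."""
    threading.Event().wait(timeout=seconds)


spin_cpu = cpu_used(spin_wait, 0.2)
block_cpu = cpu_used(blocking_wait, 0.2)
print(f"spin-wait CPU seconds:     {spin_cpu:.3f}")
print(f"blocking-wait CPU seconds: {block_cpu:.3f}")
```

A spinning waiter burns roughly one core for as long as it waits, so one idle waiter per core on a many-core machine adds up quickly, which is consistent with the user and system times reported above.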

@bstefanuk
Contributor

bstefanuk commented Feb 4, 2025

@G-Ragghianti Thanks for raising this issue. I'm on the team working to improve resource consumption during build, and rest assured, we have certainly identified joblib as a key offender for the reasons you mention. We're actively working on decoupling the parallelization layer from the build steps, after which we may either replace joblib or at the very least, make improvements to address your ask.

@G-Ragghianti
Author

Oh wow. This is more than I had hoped for. Thanks a lot!
