@pavelToman
Contributor

@pavelToman pavelToman commented Oct 7, 2025

@pavelToman
Contributor Author

@boegelbot please test @ jsc-zen3-a100
EB_ARGS="jax-0.4.25-gfbf-2023a-CUDA-12.1.1.eb"
CORE_CNT=16

@boegelbot

@pavelToman: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=3951 EB_ARGS="jax-0.4.25-gfbf-2023a-CUDA-12.1.1.eb" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_3951 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 8188

Test results coming soon (I hope)...

- notification for comment with ID 3376559036 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@pavelToman
Contributor Author

@boegelbot please test @ jsc-zen3
EB_ARGS="jax-0.7.0-gfbf-2025a.eb jax-0.6.2-gfbf-2024a.eb jax-0.4.25-gfbf-2023a.eb --parallel 4"
CORE_CNT=16

@boegelbot

@pavelToman: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=3951 EB_ARGS="jax-0.7.0-gfbf-2025a.eb jax-0.6.2-gfbf-2024a.eb jax-0.4.25-gfbf-2023a.eb --parallel 4" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_3951 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 8189

Test results coming soon (I hope)...

- notification for comment with ID 3376790731 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot

Test report by @boegelbot

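Overview of tested easyconfigs (in order)

  • SUCCESS jax-0.7.0-gfbf-2025a.eb
  • FAIL (build issue) jax-0.6.2-gfbf-2024a.eb (partial log available at https://gist.github.com/boegelbot/8da3ef7f55b11cba746edfc478d95126)
  • SUCCESS jax-0.4.25-gfbf-2023a.eb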

Build succeeded for 2 out of 3 (3 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/16e86dbf6f4a44ae31ee7a48ae9c6033 for a full test report.

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS jax-0.4.25-gfbf-2023a-CUDA-12.1.1.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 580.82.07, Python 3.9.21
See https://gist.github.com/boegelbot/86fa24c90782e6c857287476fd945c9b for a full test report.

@pavelToman
Contributor Author

pavelToman commented Oct 8, 2025

Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS jax-0.7.0-gfbf-2025a.eb
  • FAIL (build issue) jax-0.6.2-gfbf-2024a.eb (partial log available at https://gist.github.com/boegelbot/8da3ef7f55b11cba746edfc478d95126)
  • SUCCESS jax-0.4.25-gfbf-2023a.eb

Build succeeded for 2 out of 3 (3 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/16e86dbf6f4a44ae31ee7a48ae9c6033 for a full test report.

jax-0.6.2 failed with:

ERROR EasyBuild encountered an error (at easybuild/easybuild-framework/easybuild/base/exceptions.py:126 in __init__): 
Lock /project/def-maintainers/boegelbot/rocky9/zen3/software/.locks/_project_def-maintainers_boegelbot_rocky9_zen3_software_jax_0.6.2-gfbf-2024a.lock already exists, aborting! 
(at easybuild/easybuild-framework/easybuild/tools/filetools.py:2142 in check_lock)
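
The lock name in this error looks like the installation directory with every `/` replaced by `_`, plus a `.lock` suffix. A minimal Python sketch of that naming and of the existence check, inferred only from the path in the message above; `lock_path` and `check_lock_sketch` are hypothetical helpers, not EasyBuild's actual `check_lock` implementation:

```python
import os
import posixpath


def lock_path(install_dir, locks_dir):
    """Build a lock path the way the error message suggests (an assumption)."""
    # "/a/b/c" becomes "_a_b_c.lock"
    name = install_dir.replace("/", "_") + ".lock"
    return posixpath.join(locks_dir, name)


def check_lock_sketch(install_dir, locks_dir, ignore_locks=False):
    """Raise if the lock already exists, unless locks are ignored (hypothetical)."""
    path = lock_path(install_dir, locks_dir)
    if os.path.exists(path) and not ignore_locks:
        raise RuntimeError(f"Lock {path} already exists, aborting!")
    return path
```

One plausible cause is a lock left behind by an interrupted or concurrent build of the same easyconfig; the log does not say which. Retesting with `--ignore-locks` skips the check.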

@pavelToman
Contributor Author

@boegelbot please test @ jsc-zen3
EB_ARGS="jax-0.6.2-gfbf-2024a.eb --parallel 4 --ignore-locks"
CORE_CNT=16

@boegelbot

@pavelToman: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=3951 EB_ARGS="jax-0.6.2-gfbf-2024a.eb --parallel 4 --ignore-locks" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_3951 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 8201

Test results coming soon (I hope)...

- notification for comment with ID 3380123581 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel changed the title from "Update jaxlib.py for jax-0.6.2 with CUDA-12.6.0" to "Update jaxlib easyblock for jax 0.6.2 with CUDA-12.6.0" on Oct 8, 2025
@boegel added the update label on Oct 8, 2025
@boegel added this to the next release (5.2.0?) milestone on Oct 8, 2025

@boegelbot

Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS jax-0.6.2-gfbf-2024a.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/60a841fa3c1ca4e51b2149ccd3a0b4b7 for a full test report.

@pavelToman
Contributor Author

Test report by @pavelToman

Overview of tested easyconfigs (in order)

  • SUCCESS jax-0.4.25-gfbf-2023a-CUDA-12.1.1.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
node4006.donphan.os - Linux RHEL 9.6, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz, 1 x NVIDIA NVIDIA A2, 580.82.07, Python 3.9.21
See https://gist.github.com/pavelToman/0eea099566bb1c56ce106e10b93758d2 for a full test report.

@pavelToman
Contributor Author

Test report by @pavelToman

Overview of tested easyconfigs (in order)

  • SUCCESS jax-0.4.25-gfbf-2023a-CUDA-12.1.1.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
node3908.accelgor.os - Linux RHEL 9.6, x86_64, AMD EPYC 7413 24-Core Processor, 1 x NVIDIA NVIDIA A100-SXM4-80GB, 580.82.07, Python 3.9.21
See https://gist.github.com/pavelToman/b42435a3c7bebad7224d13faff6ed7a7 for a full test report.

@pavelToman
Contributor Author

pavelToman commented Oct 10, 2025

Test report by @pavelToman

Overview of tested easyconfigs (in order)

Build succeeded for 0 out of 1 (1 easyconfigs in total)
node4303.litleo.os - Linux RHEL 9.6, x86_64, AMD EPYC 9454P 48-Core Processor, 1 x NVIDIA NVIDIA H100 NVL, 580.82.07, Python 3.9.21
See https://gist.github.com/pavelToman/a582b7d33c75370248ba91e56eb5d9d6 for a full test report.

EDIT:
This is the same problem with the H100 GPU and CUDA-12.1.1 as described in #3852 (comment).
