trz42 (Collaborator) commented Aug 27, 2024

This PR just tests whether the tweaked hooks solve the build issue for SentencePiece on aarch64 CPUs. See #585.

1 out of 2 required modules missing:

* libmad/0.15.1b-GCCcore-12.3.0 (libmad-0.15.1b-GCCcore-12.3.0.eb)

and

12 out of 137 required modules missing:

* parameterized/0.9.0-GCCcore-12.3.0 (parameterized-0.9.0-GCCcore-12.3.0.eb)
* tqdm/4.66.1-GCCcore-12.3.0 (tqdm-4.66.1-GCCcore-12.3.0.eb)
* Scalene/1.5.26-GCCcore-12.3.0 (Scalene-1.5.26-GCCcore-12.3.0.eb)
* gperftools/2.12-GCCcore-12.3.0 (gperftools-2.12-GCCcore-12.3.0.eb)
* SentencePiece/0.2.0-GCC-12.3.0 (SentencePiece-0.2.0-GCC-12.3.0.eb)
* imageio/2.33.1-gfbf-2023a (imageio-2.33.1-gfbf-2023a.eb)
* tensorboard/2.15.1-gfbf-2023a (tensorboard-2.15.1-gfbf-2023a.eb)
* libmad/0.15.1b-GCCcore-12.3.0 (libmad-0.15.1b-GCCcore-12.3.0.eb)
* SoX/14.4.2-GCCcore-12.3.0 (SoX-14.4.2-GCCcore-12.3.0.eb)
* NLTK/3.8.1-foss-2023a (NLTK-3.8.1-foss-2023a.eb)
* scikit-image/0.22.0-foss-2023a (scikit-image-0.22.0-foss-2023a.eb)
* PyTorch-bundle/2.1.2-foss-2023a (PyTorch-bundle-2.1.2-foss-2023a.eb)

trz42 added the labels aarch64 (related to Arm 64-bit targets) and 2023.06-software.eessi.io (2023.06 version of software.eessi.io) on Aug 27, 2024

eessi-bot bot commented Aug 27, 2024

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi-hpc.org-2023.06-compat, eessi-hpc.org-2023.06-software, eessi.io-2023.06-software, eessi.io-2023.06-compat

Instance boegel-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software

eessi-bot bot commented Aug 27, 2024

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat, eessi-hpc.org-2023.06-software, eessi.io-2023.06-software

trz42 (Collaborator, Author) commented Aug 27, 2024

bot: build arch:aarch64/generic repo:eessi.io-2023.06-software

eessi-bot bot commented Aug 27, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)

Updates by the bot instance boegel-bot-deucalion (click for details)
  • account trz42 has NO permission to send commands to the bot

eessi-bot bot commented Aug 27, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted

eessi-bot bot commented Aug 27, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_688/17105

date | job status | comment
Aug 27 20:37:22 UTC 2024 | submitted | job id 17105 awaits release by job manager
Aug 27 20:38:12 UTC 2024 | released | job awaits launch by Slurm scheduler
Aug 27 20:39:16 UTC 2024 | running | job 17105 is running
Aug 27 21:47:17 UTC 2024 | finished | 😢 FAILURE
Details
✅ job output file slurm-17105.out
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1724792235.tar.gz
size: 142 MiB (149144297 bytes)
entries: 4815
modules under 2023.06/software/linux/aarch64/generic/modules/all
gperftools/2.12-GCCcore-12.3.0.lua
imageio/2.33.1-gfbf-2023a.lua
libmad/0.15.1b-GCCcore-12.3.0.lua
NLTK/3.8.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
SentencePiece/0.2.0-GCC-12.3.0.lua
SoX/14.4.2-GCCcore-12.3.0.lua
tensorboard/2.15.1-gfbf-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/aarch64/generic/software
gperftools/2.12-GCCcore-12.3.0
imageio/2.33.1-gfbf-2023a
libmad/0.15.1b-GCCcore-12.3.0
NLTK/3.8.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
SentencePiece/0.2.0-GCC-12.3.0
SoX/14.4.2-GCCcore-12.3.0
tensorboard/2.15.1-gfbf-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/aarch64/generic
2023.06/init/easybuild/eb_hooks.py
Aug 27 21:47:17 UTC 2024 | test result | 😁 SUCCESS
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-17105.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
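
For readers unfamiliar with the bot output: the build "Details" list above reads like a set of pattern checks on the job output file. A minimal sketch of that idea follows. The function names are hypothetical, the patterns are treated as literal text, and the rule that a build only succeeds when every check passes is an assumption that merely agrees with the results shown in this thread. The bot's real check script may use regular expressions and extra logic.

```python
from typing import Dict, List, Tuple

# (pattern, must_be_present) pairs copied from the "Details" list in the bot comment.
# Treating them as literal substrings is an assumption made for this sketch.
BUILD_CHECKS: List[Tuple[str, bool]] = [
    ("ERROR:", False),
    ("FAILED:", False),
    ("required modules missing:", False),
    ("No missing installations", True),
    (".tar.gz created!", True),
]


def check_build_log(text: str) -> Dict[str, bool]:
    """Map each pattern to True if its check passes for the given job output."""
    return {
        pattern: (pattern in text) == must_be_present
        for pattern, must_be_present in BUILD_CHECKS
    }


def build_succeeded(text: str) -> bool:
    """Assumed rule: the build counts as successful only if every check passes."""
    return all(check_build_log(text).values())
```

With this rule, an output that contains none of the error patterns but also lacks "No missing installations" still counts as a failure, which is the pattern seen for the short-lived jobs later in this thread.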

trz42 (Collaborator, Author) commented Aug 28, 2024

Try to use bash from compat layer...

bot: build arch:aarch64/generic repo:eessi.io-2023.06-software

eessi-bot bot commented Aug 28, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42
    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software

eessi-bot bot commented Aug 28, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted

eessi-bot bot commented Aug 28, 2024

688.diff:29: trailing whitespace.

error: cannot apply binary patch to 'scripts/2023.06/aarch64/bash' without full index line
error: scripts/2023.06/aarch64/bash: patch does not apply
error: cannot apply binary patch to 'scripts/2023.06/x86_64/bash' without full index line
error: scripts/2023.06/x86_64/bash: patch does not apply

Unable to download or merge changes between the source branch and the destination branch.
Tip: This can usually be resolved by syncing your branch and resolving any merge conflicts.

trz42 (Collaborator, Author) commented Aug 28, 2024

Second attempt to use bash from compat layer (now obtaining it from CVMFS)...

bot: build arch:aarch64/generic repo:eessi.io-2023.06-software

eessi-bot bot commented Aug 28, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)

eessi-bot bot commented Aug 28, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)

eessi-bot bot commented Aug 28, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted

eessi-bot bot commented Aug 28, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_688/17164

date | job status | comment
Aug 28 22:46:21 UTC 2024 | submitted | job id 17164 awaits release by job manager
Aug 28 22:46:39 UTC 2024 | released | job awaits launch by Slurm scheduler
Aug 28 22:47:40 UTC 2024 | running | job 17164 is running
Aug 28 22:49:42 UTC 2024 | finished | 😢 FAILURE
Details
✅ job output file slurm-17164.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Aug 28 22:49:42 UTC 2024 | test result | 🤷 UNKNOWN
  • Job test file _bot_job17164.test does not exist in job directory or reading it failed.

trz42 (Collaborator, Author) commented Aug 28, 2024

One more...

bot: build arch:aarch64/generic repo:eessi.io-2023.06-software

eessi-bot bot commented Aug 28, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)

eessi-bot bot commented Aug 28, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted

eessi-bot bot commented Aug 28, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_688/17165

date | job status | comment
Aug 28 22:49:51 UTC 2024 | submitted | job id 17165 awaits release by job manager
Aug 28 22:50:45 UTC 2024 | released | job awaits launch by Slurm scheduler
Aug 28 22:51:47 UTC 2024 | running | job 17165 is running
Aug 28 22:55:51 UTC 2024 | finished | 😢 FAILURE
Details
✅ job output file slurm-17165.out
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
No artefacts were created or found.
Aug 28 22:55:51 UTC 2024 | test result | 🤷 UNKNOWN
  • Job test file _bot_job17165.test does not exist in job directory or reading it failed.

trz42 (Collaborator, Author) commented Aug 28, 2024

+1

bot: build arch:aarch64/generic repo:eessi.io-2023.06-software

eessi-bot bot commented Aug 28, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)

eessi-bot bot commented Aug 28, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted

eessi-bot bot commented Aug 28, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_688/17166

date | job status | comment
Aug 28 22:56:39 UTC 2024 | submitted | job id 17166 awaits release by job manager
Aug 28 22:56:53 UTC 2024 | released | job awaits launch by Slurm scheduler
Aug 28 22:57:55 UTC 2024 | running | job 17166 is running
Aug 29 00:07:19 UTC 2024 | finished | 😢 FAILURE
Details
✅ job output file slurm-17166.out
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1724887036.tar.gz
size: 142 MiB (149143012 bytes)
entries: 4815
modules under 2023.06/software/linux/aarch64/generic/modules/all
gperftools/2.12-GCCcore-12.3.0.lua
imageio/2.33.1-gfbf-2023a.lua
libmad/0.15.1b-GCCcore-12.3.0.lua
NLTK/3.8.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
SentencePiece/0.2.0-GCC-12.3.0.lua
SoX/14.4.2-GCCcore-12.3.0.lua
tensorboard/2.15.1-gfbf-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/aarch64/generic/software
gperftools/2.12-GCCcore-12.3.0
imageio/2.33.1-gfbf-2023a
libmad/0.15.1b-GCCcore-12.3.0
NLTK/3.8.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
SentencePiece/0.2.0-GCC-12.3.0
SoX/14.4.2-GCCcore-12.3.0
tensorboard/2.15.1-gfbf-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/aarch64/generic
2023.06/init/easybuild/eb_hooks.py
Aug 29 00:07:19 UTC 2024 | test result | 😁 SUCCESS
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-17166.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

bedroge (Collaborator) commented Aug 29, 2024

      /bin/sh: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so)

@trz42 looks like you need to do the same for /bin/sh?
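
The error message above names three things: the program that was started, the C library it loaded, and the shared library that needs a newer glibc symbol version. A small sketch for pulling those fields out of such a line, for example when scanning a long build log, is shown here. The function name and the regular expression are illustrative assumptions, not part of any EESSI tooling.

```python
import re
from typing import Dict, Optional

# Matches dynamic-loader messages of the form quoted in the comment above.
GLIBC_ERROR = re.compile(
    r"(?P<program>\S+): (?P<libc>\S+): version `(?P<version>GLIBC_[0-9.]+)' not found "
    r"\(required by (?P<required_by>[^)]+)\)"
)

# The message from this thread, joined onto one line.
SAMPLE = (
    "/bin/sh: /lib/aarch64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found "
    "(required by /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/"
    "software/gperftools/2.12-GCCcore-12.3.0/lib64/libtcmalloc_minimal.so)"
)


def parse_glibc_error(line: str) -> Optional[Dict[str, str]]:
    """Return program, libc, version and required_by for a matching line, else None."""
    match = GLIBC_ERROR.search(line)
    return match.groupdict() if match else None
```

For the sample line, the program is /bin/sh and the loaded libc is the host's /lib/aarch64-linux-gnu/libc.so.6, while the library that asks for GLIBC_2.34 lives under /cvmfs/software.eessi.io.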

trz42 (Collaborator, Author) commented Aug 29, 2024

Also replace /bin/sh with sh (which usually symlinks to bash)...

bot: build arch:aarch64/generic repo:eessi.io-2023.06-software
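
What a bare sh resolves to depends on PATH in the build environment. A minimal sketch for checking this from Python follows; the helper name is hypothetical, and nothing here is specific to EESSI.

```python
import os
import shutil
from typing import Optional, Tuple


def resolve_shell(name: str = "sh") -> Optional[Tuple[str, str]]:
    """Return (path found via PATH, fully resolved target), or None if not on PATH."""
    found = shutil.which(name)
    if found is None:
        return None
    # realpath follows symlinks, so this shows whether sh points at bash or something else.
    return found, os.path.realpath(found)
```

Running this inside and outside the build environment would show whether sh comes from the host or from another prefix on PATH, and which binary it finally points to.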

eessi-bot bot commented Aug 29, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)

eessi-bot bot commented Aug 29, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted

eessi-bot bot commented Aug 29, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_688/17215

date | job status | comment
Aug 29 15:19:29 UTC 2024 | submitted | job id 17215 awaits release by job manager
Aug 29 15:19:34 UTC 2024 | released | job awaits launch by Slurm scheduler
Aug 29 15:25:37 UTC 2024 | running | job 17215 is running
Aug 29 17:32:22 UTC 2024 | finished | 🤷 UNKNOWN
  • Job results file _bot_job17215.result does not exist in job directory or reading it failed.
  • No artefacts were found/reported.
Aug 29 17:32:22 UTC 2024 | test result | 🤷 UNKNOWN
  • Job test file _bot_job17215.test does not exist in job directory or reading it failed.

trz42 (Collaborator, Author) commented Aug 29, 2024

Revert change on /bin/sh and build libmad first...

bot: build arch:aarch64/generic repo:eessi.io-2023.06-software

eessi-bot bot commented Aug 29, 2024

Updates by the bot instance eessi-bot-mc-aws (click for details)

eessi-bot bot commented Aug 29, 2024

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build arch:aarch64/generic repo:eessi.io-2023.06-software from trz42

    • expanded format: build architecture:aarch64/generic repository:eessi.io-2023.06-software
  • handling command build architecture:aarch64/generic repository:eessi.io-2023.06-software resulted in:

    • no jobs were submitted

eessi-bot bot commented Aug 29, 2024

New job on instance eessi-bot-mc-aws for architecture aarch64-generic for repository eessi.io-2023.06-software in job dir /project/def-users/SHARED/jobs/2024.08/pr_688/17216

date | job status | comment
Aug 29 18:43:35 UTC 2024 | submitted | job id 17216 awaits release by job manager
Aug 29 18:44:31 UTC 2024 | released | job awaits launch by Slurm scheduler
Aug 29 18:50:33 UTC 2024 | running | job 17216 is running
Aug 29 20:00:01 UTC 2024 | finished | 😢 FAILURE
Details
✅ job output file slurm-17216.out
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.gz created!
Artefacts
eessi-2023.06-software-linux-aarch64-generic-1724958582.tar.gz
size: 142 MiB (149142604 bytes)
entries: 4815
modules under 2023.06/software/linux/aarch64/generic/modules/all
gperftools/2.12-GCCcore-12.3.0.lua
imageio/2.33.1-gfbf-2023a.lua
libmad/0.15.1b-GCCcore-12.3.0.lua
NLTK/3.8.1-foss-2023a.lua
parameterized/0.9.0-GCCcore-12.3.0.lua
Scalene/1.5.26-GCCcore-12.3.0.lua
scikit-image/0.22.0-foss-2023a.lua
SentencePiece/0.2.0-GCC-12.3.0.lua
SoX/14.4.2-GCCcore-12.3.0.lua
tensorboard/2.15.1-gfbf-2023a.lua
tqdm/4.66.1-GCCcore-12.3.0.lua
software under 2023.06/software/linux/aarch64/generic/software
gperftools/2.12-GCCcore-12.3.0
imageio/2.33.1-gfbf-2023a
libmad/0.15.1b-GCCcore-12.3.0
NLTK/3.8.1-foss-2023a
parameterized/0.9.0-GCCcore-12.3.0
Scalene/1.5.26-GCCcore-12.3.0
scikit-image/0.22.0-foss-2023a
SentencePiece/0.2.0-GCC-12.3.0
SoX/14.4.2-GCCcore-12.3.0
tensorboard/2.15.1-gfbf-2023a
tqdm/4.66.1-GCCcore-12.3.0
other under 2023.06/software/linux/aarch64/generic
2023.06/init/easybuild/eb_hooks.py
Aug 29 20:00:01 UTC 2024 | test result | 😁 SUCCESS
ReFrame Summary
[ PASSED ] Ran 18/18 test case(s) from 18 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-17216.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

trz42 (Collaborator, Author) commented Aug 30, 2024

The last job failed while building the wheel for torchtext: the failure occurred when running ninja, which uses /bin/sh. Ninja may have this path hard-coded instead of using sh from the compat layer. See below:

$ strings /cvmfs/software.eessi.io/versions/2023.06/software/linux/aarch64/generic/software/Ninja/1.11.1-GCCcore-12.3.0/bin/ninja | grep /bin
#!/usr/bin/env python
/bin/sh

We could try to figure out how to use sh (and maybe env) from the compat layer instead of the host's /bin/sh (and /usr/bin/env). That means looking into how ninja was built, changing that, and rebuilding it. We could probably test whether such a change solves the problem by installing a modified ninja into some other location using EESSI-extend/2023.06-easybuild.
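
To repeat the check above on other binaries or on a rebuilt ninja, here is a rough Python equivalent of the `strings ... | grep /bin` pipeline. It is a sketch only: the function name is hypothetical, and it handles plain ASCII runs, which approximates but does not exactly reproduce what `strings` does.

```python
import re
import sys
from pathlib import Path
from typing import List

# Runs of 4 or more printable ASCII bytes; 4 is also the default minimum length of `strings`.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def embedded_strings(binary: Path, needle: bytes = b"/bin") -> List[str]:
    """Return the printable strings in `binary` that contain `needle`, in file order."""
    data = binary.read_bytes()
    return [
        match.group().decode("ascii")
        for match in PRINTABLE_RUN.finditer(data)
        if needle in match.group()
    ]


if __name__ == "__main__":
    # Usage: pass one or more binaries on the command line; with no arguments nothing is printed.
    for arg in sys.argv[1:]:
        for line in embedded_strings(Path(arg)):
            print(f"{arg}: {line}")
```

If a rebuilt ninja no longer lists /bin/sh in this output, the hard-coded path is gone from the binary; whether that fixes the torchtext build would still need a test build.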

laraPPr (Collaborator) commented Jun 27, 2025

@trz42 Can you retarget this pr?

laraPPr added and then removed the clean-up label (label description: "This label indicates that the pr was because the original pr required a significant update.") on Jul 1, 2025

trz42 (Collaborator, Author) commented Aug 13, 2025

Closing this. If we want to take the package up again, we need to start over fresh.

trz42 closed this on Aug 13, 2025