@DevManpreet5 DevManpreet5 commented Sep 21, 2025

Motivation

Extend CI matrix to test against ROCm 7.0.
Closes #170

Technical Details

  • .github/workflows/iris-tests-apptainer.yml: Extended CI matrix to test both ROCm 6.3.1 and 7.0
  • apptainer/iris-rocm6.3.1.def: Apptainer definition for ROCm 6.3.1
  • apptainer/iris-rocm7.0.def: Apptainer definition for ROCm 7.0
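
As a rough illustration of the matrix described above, the sketch below maps each ROCm version to its definition file and prints the build command a CI job might run. The `def_for_rocm` helper and the `iris-rocm<version>.sif` output names are assumptions for illustration; only the `.def` paths come from this PR, and the actual workflow YAML may be organized differently.

```shell
#!/bin/sh
# Hypothetical helper: map a ROCm version to its Apptainer definition file.
# Only the .def paths are taken from the PR; everything else is illustrative.
def_for_rocm() {
    echo "apptainer/iris-rocm$1.def"
}

# Print (rather than run) one build command per matrix entry.
print_build_commands() {
    for version in 6.3.1 7.0; do
        # The .sif output name is an assumption, not the workflow's real name.
        echo "apptainer build iris-rocm${version}.sif $(def_for_rocm "$version")"
    done
}

print_build_commands
```

Printing the commands keeps the sketch runnable on a machine without Apptainer installed; a real job would run them and fail on a non-zero exit status.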

Test Plan

Could not perform local testing since all AMD cloud droplets are currently out of stock.
Submitting as a Draft PR to validate changes via CI.

Test Result

N/A (pending)

Submission Checklist

@DevManpreet5
Author

@neoblizz @maawad Hi, could you please approve and run the workflow for this draft PR? Thanks!

@mawad-amd
Collaborator

Thanks for the PR Manpreet!

@mawad-amd
Collaborator

Looks like the ROCm 7.0 Apptainer is broken. I suggest trying to build that locally on any system.

@DevManpreet5
Author

DevManpreet5 commented Sep 22, 2025

@mawad-amd Could you look at the new checks? The previously failing check now passes, but the 1-rank test is failing. Interestingly, 6.3 worked, but 7.0 failed because pip tried to write to a read-only path. Maybe something is different in the 7.0 Docker image? Should I force a user install?

@DevManpreet5
Author

@mawad-amd Hi, I am getting [Errno 30] Read-only file system: '/opt/venv/lib/python3.10/site-packages/urllib3'.
I was thinking of using pip install --user -e . (https://luminousmen.medium.com/why-use-pip-install-user-2df0259c8fb7), while Copilot suggests modifying the workflow step to create and activate a new virtual environment before installing packages.

Which approach would you suggest?
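
For reference, the two options can be compared with a small check. The sketch below only tests whether the environment's site-packages directory is writable and reports which route applies; `choose_install_route` and the route names are hypothetical. One thing to keep in mind: pip normally refuses `--user` inside a virtual environment whose user site-packages are not visible, so `pip install --user -e .` may not work against `/opt/venv` as-is.

```shell
#!/bin/sh
# Hypothetical helper: report how to install given a site-packages path.
choose_install_route() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        echo "in-place"
    else
        echo "fresh-venv"
    fi
}

route=$(choose_install_route "/opt/venv/lib/python3.10/site-packages")
echo "install route: $route"

# If the route is "fresh-venv", a workflow step could look like this
# (left commented out because it needs network access and a project checkout;
# the $HOME/iris-venv location is an assumption):
#   python3 -m venv "$HOME/iris-venv"
#   . "$HOME/iris-venv/bin/activate"
#   pip install -e .
```

Note that a fresh virtual environment would not see packages already installed in `/opt/venv`, so dependencies shipped in the image would need to be reinstalled or exposed some other way.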

@mawad-amd
Collaborator

There seems to be something wrong with the apptainer image itself. See log here.

I think we will have to fix that first, then see if the other problem you are pointing out still persists. Do you still have problems getting AMD GPU access (a single GPU is fine)?

@DevManpreet5
Author

@mawad-amd Thanks. Oh, the build-apptainer-7.0 job passed, so I didn't check the logs. I'll look into it. Yes, I do have access to an AMD GPU now, but just building the Apptainer image fails with:

INFO:    Extracting OCI image...
INFO:    Inserting Apptainer configuration...
INFO:    Running post scriptlet
ERROR  : Failed to set mount propagation: Permission denied
FATAL:   While performing build: while running engine: while running %post section: exit status 1

This happens for all the Apptainer definitions, so I will have to figure some things out. Thanks for your time!
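
Since the build job reported success while its log still pointed to a broken image, one option is to scan the build log for ERROR and FATAL lines and fail the step explicitly. The following is a minimal sketch; `scan_build_log` is a hypothetical helper, and the line prefixes are taken from the output quoted above rather than from Apptainer's documentation.

```shell
#!/bin/sh
# Hypothetical helper: return non-zero if a build log has ERROR or FATAL lines.
scan_build_log() {
    # grep exits 0 when it finds a match, so a match means failure here.
    if grep -E '^(ERROR|FATAL)' "$1"; then
        return 1
    fi
    return 0
}

# Demo using lines quoted in this thread.
log=$(mktemp)
cat > "$log" <<'EOF'
INFO:    Extracting OCI image...
INFO:    Running post scriptlet
ERROR  : Failed to set mount propagation: Permission denied
FATAL:   While performing build: while running engine: while running %post section: exit status 1
EOF

if scan_build_log "$log"; then
    echo "build log clean"
else
    echo "build log contains errors"
fi
rm -f "$log"
```

The matching lines are echoed before the verdict, which makes the failing step easier to spot in CI output.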

@mawad-amd mawad-amd marked this pull request as ready for review October 4, 2025 03:38
@mawad-amd mawad-amd requested a review from neoblizz as a code owner October 4, 2025 03:38
@Copilot Copilot AI review requested due to automatic review settings October 4, 2025 03:38
@mawad-amd mawad-amd requested a review from BKP as a code owner October 4, 2025 03:38
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds ROCm 7.0 support to the Iris framework by extending the CI matrix to test against both ROCm 6.3.1 and 7.0. The changes address issue #170 by creating separate Apptainer definitions for each ROCm version and updating the GitHub workflow to build and test images for both versions.

  • Added Apptainer definition files for ROCm 6.3.1 and 7.0 with version-specific configurations
  • Extended CI workflow matrix to test both ROCm versions in parallel
  • Updated build and test job names to include ROCm version identification

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File | Description
apptainer/iris-rocm7.0.def | New Apptainer definition for ROCm 7.0 environment with PyTorch 2.8.0
apptainer/iris-rocm6.3.1.def | New Apptainer definition for ROCm 6.3.1 environment
.github/workflows/iris-tests-apptainer.yml | Updated CI workflow to support matrix builds for multiple ROCm versions

@mawad-amd mawad-amd changed the title from "Added ROCm 7.0 support ( fix #170)" to "Added CI for ROCm 7.0" Oct 4, 2025
Development

Successfully merging this pull request may close these issues.

[Feature]: CI for ROCm 7.0