Port install_rvs to the orch fixture + Tier-1 dispatch tests#173
Closed
atnair-amd wants to merge 2 commits into
Closed
Port install_rvs to the orch fixture + Tier-1 dispatch tests#173atnair-amd wants to merge 2 commits into
atnair-amd wants to merge 2 commits into
Conversation
Switch cvs/tests/health/install/install_rvs.py from the legacy phdl/shdl Pssh handles to the shared orch fixture in cvs/tests/conftest.py, mirroring rvs_cvs.py. 21 phdl.exec / hdl.exec callsites become orch.exec. Drop the head-only-on-NFS dispatch (config_dict['nfs_install'] is True). On the canonical mi300_health_config.json the JSON value is the string "True" while the test compared with Python `is True`, so the branch was dead in shipped configs and head-only behavior was never engaged. Custom configs that relied on JSON boolean true to install from a single head node will now install on all nodes in parallel; this matches what cvs run install_rvs already did via parallel SSH. Layer-3 doc updates so install_rvs is no longer omitted from the "which suites consume the orchestrator" claims: - docs/_includes/orchestrator-scope.rst: scope note now names rvs_cvs and install_rvs. - docs/reference/configuration-files/cluster-file.rst: L42 (container backend scope) and L107 (orchestrator key scope) updated. - cvs/tests/health/README.md: Container-mode example pins both suites. - docs/how-to/run-cvs-tests.rst: container-mode note pins both suites. Verified via cvs run install_rvs in baremetal Tier-2 on a single MI300X node (a06u01): pytest exit 0, "RVS installation and verification completed successfully" emitted, auto-bundle written. Container Tier-2 not exercised in this commit because the orch fixture's container backend has two pre-existing regressions independent of this port: MultiProcessPssh.exec drops the `detailed=` kwarg (PR #136 backward-compat regression) which blocks orch.setup_containers(), and cvs/core/runtimes/docker.py exec does not wrap commands in `bash -c` so shell chains like `cd X; cmake ...` get split between host and container. Both will be fixed in follow-up PRs.
Pin three properties of the migrated install_rvs.py against both
BaremetalOrchestrator and ContainerOrchestrator orch instances:
1. dispatch flows through orch.exec (not orch.exec_on_head);
2. orch.exec(cmd, timeout=...) is accepted by both backends;
3. the legacy nfs_install head-only branch is gone (regression canary
if someone re-introduces it).
Tests stub orch.exec / orch.exec_on_head directly on the constructed
orchestrator instance, sidestepping the signature drift between
BaremetalOrchestrator.exec(cmd, hosts=None, timeout=None) and
ContainerOrchestrator.exec(cmd, hosts=None, timeout=None, detailed=False).
Run from the repo root:
python -m unittest \
cvs.tests.health.install.unittests.test_install_rvs_dispatch -v
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
cvs/tests/health/install/install_rvs.pyfrom the legacyphdl/shdlPssh handles to the sharedorchfixture (cvs/tests/conftest.py), mirroringrvs_cvs.py. 21 call sites becomeorch.exec.config_dict['nfs_install'] is Truehead-only dispatch (the JSON value is the string"True"in the canonical config,is Truewas always false, so head-only mode was never engaged in shipped configs).BaremetalOrchestratorandContainerOrchestrator(cvs/tests/health/install/unittests/test_install_rvs_dispatch.py).Commits
eb43701— Port install_rvs to the orch fixture; update orchestrator scope docs (5 files, +34 / -91).cf34cbf— Add Tier-1 dispatch tests for install_rvs orch port (2 files, +134).Test plan
Tier-1 — mocked dispatch tests (devbox)
python -m unittest cvs.tests.health.install.unittests.test_install_rvs_dispatch -v:Tier-1 proof of detection (mutate-revert per
incremental-pr-work.mdc§2)M1 — change
install_rvs.py:217orch.exec('which rvs', timeout=30)→orch.exec_on_head('which rvs', timeout=30). Both tests FAIL:Revert → 2 PASS.
M2 — change
install_rvs.py:217timeout=30→timeout=99. Both tests FAIL on the same assertion (actual call showstimeout=99). Revert → 2 PASS.Tier-2 — real cluster (single MI300X node
10.245.128.47/a06u43, ROCm 7.2.0, RVS pre-installed)Baremetal orchestrator:
Container orchestrator (
cvs-harness/rocm-dev:rvs-installed):PYTHONPATH=...is the devbox-local workaround for the editable install in~/repos/cvs/.cvs_venv/pointing at the main cvs repo rather than this worktree; it ensures the cvs CLI loads the portedinstall_rvs.py.Notes
MultiProcessPssh.execdropping thedetailed=kwarg. That bug was fixed independently in Add upload_file and download_file support #161 (commitd666287, "Add upload_file and download_file support"), which now sits between this branch's base and HEAD. The branch was rebased onto currentmainto pick it up; container Tier-2 has since been exercised end-to-end and passes (see above).nfs_installhead-only branch was deliberately dropped. On the canonicalmi300_health_config.json, JSON had"nfs_install": "True"(a string) and the test compared withis True, so head-only was never engaged. Hand-rolled configs that relied on JSON booleantruewill install on all nodes in parallel instead of just the head node; this matches whatcvs run install_rvsalready did via parallel SSH in shipped configs.Out of scope / Future additions
nfs_installpre/post matrix (8 stdout diffs across baremetal/container ×nfs_install=true/false× pre/post) — covered in theorch_refactor-suite-install-rvs-test.mdplan; queued as a separate PR.install_transferbench/install_agfhc/install_rocblas/install_babelstream— queued per-suite as those migrations land.