Commit
Install correct dependency and adjust slurm config
Dramatic speedup in test time (~10-fold)
berland committed Jan 23, 2025
1 parent ab55946 commit 450d793
Showing 2 changed files with 7 additions and 8 deletions.
9 changes: 6 additions & 3 deletions .github/workflows/test_ert_with_slurm.yml
@@ -34,7 +34,7 @@ jobs:
run: |
set -e
- sudo apt install slurmd slurmctld -y
+ sudo apt install libpmix-dev slurmd slurmctld -y
sudo mkdir /var/spool/slurm
sudo chown slurm /var/spool/slurm
@@ -43,9 +43,11 @@ jobs:
ClusterName=localcluster
SlurmUser=slurm
SlurmctldHost=localhost
SchedulerType=sched/builtin # Avoid default backfill scheduler which adds delays
SelectType=select/cons_tres # Select nodes based on consumable resources
SelectTypeParameters=CR_Core # Cores are the consumable resource
StateSaveLocation=/var/spool/slurm
PriorityType=priority/basic # Tests depend on FIFO scheduling
ProctrackType=proctrack/linuxproc # Use /proc to track processes
PartitionName=LocalQ Nodes=ALL Default=YES MaxTime=INFINITE State=UP
PartitionName=AlternativeQ Nodes=ALL Default=YES MaxTime=INFINITE State=UP
@@ -59,12 +61,13 @@ jobs:
sudo mv slurm.conf /etc/slurm/
sudo systemctl start slurmd # The compute node slurm daemon
sudo systemctl start slurmctld # The slurm controller daemon
sleep 1
systemctl status slurmd
systemctl status slurmctld
# Show partition and node information configured:
sinfo
scontrol show nodes
- name: Verify slurm cluster works
# Timeout is set low to catch a misconfigured cluster where srun will hang.
@@ -77,7 +80,7 @@ jobs:
run: |
set -e
export _ERT_TESTS_ALTERNATIVE_QUEUE=AlternativeQ
- pytest tests/ert/unit_tests/scheduler/test_{generic,slurm}_driver.py --slurm \
+ pytest tests/ert/unit_tests/scheduler/test_{generic,slurm}_driver.py -sv --slurm \
    -n 8 --durations=10 -k "not (LsfDriver or LocalDriver or OpenPBSDriver)"
scontrol show job
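
Note on the slurm.conf settings above: a minimal sketch, not part of this commit, of how the scalar settings in the generated config could be sanity-checked before the daemons are started. The helper name `parse_slurm_conf` and the `CONF` excerpt are hypothetical and exist only for illustration; the sketch relies on slurm.conf treating everything after `#` on a line as a comment.

```python
# Hypothetical helper, not part of the ert repository.
CONF = """\
ClusterName=localcluster
SchedulerType=sched/builtin # Avoid default backfill scheduler which adds delays
SelectType=select/cons_tres # Select nodes based on consumable resources
PriorityType=priority/basic # Tests depend on FIFO scheduling
PartitionName=LocalQ Nodes=ALL Default=YES MaxTime=INFINITE State=UP
"""


def parse_slurm_conf(text: str) -> dict[str, str]:
    """Return the scalar Key=Value settings, ignoring trailing comments."""
    settings: dict[str, str] = {}
    for raw_line in text.splitlines():
        # Drop everything after '#', then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Partition and node lines may repeat and carry several fields; skip them.
        if key in ("PartitionName", "NodeName"):
            continue
        settings[key] = value.strip()
    return settings


settings = parse_slurm_conf(CONF)
assert settings["SchedulerType"] == "sched/builtin"
assert settings["PriorityType"] == "priority/basic"
```

A check like this would only confirm that the file contains the intended values; whether slurmctld accepts them still has to be verified with `systemctl status` and `sinfo`, as the workflow already does.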
6 changes: 1 addition & 5 deletions tests/ert/unit_tests/scheduler/test_slurm_driver.py
@@ -373,11 +373,7 @@ async def test_kill_before_submit_is_finished(
):
os.chdir(tmp_path)

- if pytestconfig.getoption("slurm"):
- # Allow more time when tested on a real compute cluster to avoid false positives.
- job_kill_window = 5
- test_grace_time = 10
- elif sys.platform.startswith("darwin"):
+ if sys.platform.startswith("darwin"):
# Mitigate flakiness on low-power test nodes
job_kill_window = 5
test_grace_time = 10
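
Note on the hunk above: after this change the relaxed timings apply only on macOS, and a run with `--slurm` falls through to the default branch like any other platform. The sketch below restates that selection logic in isolation. The function name `pick_kill_timings` and its default parameters are hypothetical, because the default branch of the test is not shown in this diff.

```python
import sys


def pick_kill_timings(
    platform: str,
    default_kill_window: float,
    default_grace_time: float,
) -> tuple[float, float]:
    """Return (job_kill_window, test_grace_time) for the given platform.

    The defaults are passed in by the caller because the real default
    values are elided from the diff.
    """
    if platform.startswith("darwin"):
        # Mitigate flakiness on low-power test nodes
        return 5, 10
    return default_kill_window, default_grace_time


# Example call; the default values 1 and 2 are placeholders, not the test's values.
job_kill_window, test_grace_time = pick_kill_timings(sys.platform, 1, 2)
```

Dropping the `--slurm` special case suggests that the reconfigured cluster schedules jobs quickly enough for the ordinary timings, in line with the roughly 10-fold speedup stated in the commit message.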