6.0.0
- Add Slurm cluster management daemon.
- Update default Slurm version to 23.02.2.
- Make `slurm_cluster` root module use Terraform 1.3 and optional object fields.
- Reconfigure is now a service on the instances.
- Move from project metadata to GCS bucket to store cluster files.
- Factored out nodeset modules (regular, dynamic) from partition module.
- Replace `zone_policy_*` with `zones` in nodeset module.
- Replace `access_config` with `enable_public_ip` and `network_tier`.
- Add partition options `default`, `resume_timeout`, `suspend_time`, `suspend_timeout`.
- Increase `nodeset_name` length to 15 characters (from 7).
- Remove `partition_name` length limit.
- Add `bandwidth_tier` support to instance templates.
- Move `spot` preemptible support to instance template.
- Fix login template name not using `group_name` in name schema.
- Add `enable_login` to toggle creation of login node resources.
- Remove partition level startup-scripts and network mounts.
- Fix Ubuntu 20.04 NVIDIA install.
- Change partition level placement policy to nodeset level.
- Use `topology.conf` to prioritize nodes within nodesets.
- Remove debian-10 and vanilla rocky-linux-8 images from build process and support.
- Fix threads per core inference.
- Upgrade Slurm to 23.02.3.
Full Changelog: 5.7.4...6.0.0