Releases: SchedMD/slurm-gcp

6.1.1

30 Aug 20:14
  • Fix suspend issue with TPU nodes.
  • Add TPU job example.
  • Changed Slurm dependency from man2html to man2html-base and man2html-core to
    reduce image size.
  • Changed default Docker image name to remove the OS reference.

Full Changelog: 6.1.0...6.1.1

6.1.0

30 Aug 20:13
  • Add on_host_maintenance to packer module to support instances with GPUs.
  • Fix retry of powering up static nodes on failure.
  • Add support for H3 machines and enumerated multi-socket processors.
  • Fix munge failing after manual reboot of node.
  • [Beta feature] Added support for TPU-vm nodes.
  • [Beta feature] Added support for TPU-vm multi-rank nodes.
  • Add ignore_prefer_validation to SchedulerParameters in generated cloud.conf.
  • Remove unaltered centos-7 image from actively published and supported images.
  • Upgrade installed Slurm to 23.02.4.
  • Fix CUDA install on Ubuntu 20.04.

Full Changelog: 6.0.0...6.1.0

6.0.0

30 Aug 20:13
  • Add Slurm cluster management daemon.
  • Update default Slurm version to 23.02.2.
  • Make slurm_cluster root module use terraform 1.3 and optional object fields.
  • Reconfigure now runs as a service on the instances.
  • Move from project metadata to GCS bucket to store cluster files.
  • Factored out nodeset modules (regular, dynamic) from partition module.
  • Replace zone_policy_* with zones in nodeset module.
  • Replace access_config with enable_public_ip and network_tier.
  • Add partition options default, resume_timeout, suspend_time,
    suspend_timeout.
  • Increase nodeset_name length to 15 characters (from 7).
  • Remove partition_name length limit.
  • Add bandwidth_tier support to instance templates.
  • Move spot preemptible support to instance template.
  • Fix login template name not using group_name in name schema.
  • Add enable_login to toggle creation of login node resources.
  • Remove partition level startup-scripts and network mounts.
  • Fix Ubuntu 20.04 NVIDIA install.
  • Change partition level placement policy to nodeset level.
  • Use topology.conf to prioritize nodes within nodesets.
  • Remove debian-10 and vanilla rocky-linux-8 images from build process and
    support.
  • Fix threads per core inference.
  • Upgrade Slurm to 23.02.3.

Full Changelog: 5.7.4...6.0.0