Releases: SchedMD/slurm-gcp

6.3.1

09 Jan 19:59
  • Add reserved property for nodeset_tpu
  • Update Lustre repository URL

Full Changelog: 6.3.0...6.3.1

5.10.1

09 Jan 19:58
  • Add maintenance_interval

Full Changelog: 5.10.0...5.10.1

6.3.0

03 Jan 15:57
  • Upgrade installed Slurm to 23.02.7
  • Fix deprecation warning in google_secret_manager_secret.
  • Fix TPU delete_node API return message.

Full Changelog: 6.2.0...6.3.0

5.10.0

14 Dec 23:11
  • Upgrade Slurm to 23.02.7
  • Fix slurmsync on reconfig when removing nodes.

Full Changelog: 5.9.1...5.10.0

6.2.0

01 Nov 18:24
  • Reverse logic in valid_placement_nodes
  • Add slurm_gcp_plugin support.
  • Add reservation affinity to nodesets via reservation_name option.
  • Base TPU node conf on TPU version instead of TPU model.
  • Add support for TPUv4
  • Upgrade installed Slurm to 23.02.5

Full Changelog: 6.1.2...6.2.0

5.9.1

05 Oct 22:07
  • Use reservation placement policy if placement is enabled and a reservation is
    specified.

Full Changelog: 5.9.0...5.9.1

5.9.0

21 Sep 17:57
  • Remove spurious log message on resume, referring to "Reservation name".
  • Support A3 VMs in compact placement policies.
  • Migrate from network overrides in bulkInsert to honoring instance templates.
  • Add additional_networks support to instance template and partition_nodes.
  • Support Tier 1 networking in instance templates.
  • Reverse logic in valid_placement_nodes

Full Changelog: 5.8.0...5.9.0

6.1.2

30 Aug 20:09
  • Fix accelerator-optimized machine type SMT handling.
  • Prefix user-visible errors with their source.
  • Fix accelerator-optimized machine type socket handling.
  • Only compare config.yaml blob to cache file.
  • Fix login nodes appearing as compute nodes in Slurm output.
  • Add enable_debug_logging and extra_logging_flags to Terraform.
  • Only attempt static node resume when node is powered down.
  • Fix CUDA on Ubuntu by installing CUDA via runfile alongside NVIDIA driver from signed repo.
  • Fix conf generation issue on reconfiguration.

Full Changelog: 6.1.1...6.1.2

5.8.0

30 Aug 20:10
  • Fix login nodes not reconfiguring when enable_reconfigure=true.
  • Do not temporarily disable partitions during reconfigure process.
  • Fix login nodes appearing as compute nodes in Slurm output.
  • Only attempt static node resume when node is powered down.
  • Fix CUDA on Ubuntu by installing CUDA via runfile alongside NVIDIA driver from
    signed repo.
  • Fix conf generation issue on reconfiguration.

Full Changelog: 5.7.6...5.8.0

5.7.6

30 Aug 20:12
  • Prefix user-visible errors with their source.
  • Fix accelerator-optimized machine type SMT handling.
  • Fix accelerator-optimized machine type socket handling.

Full Changelog: 5.7.5...5.7.6