Releases: stackhpc/ansible-slurm-appliance
v2.8.2
v2.8.1
What's Changed
This is a minor update on v2.8 to avoid unnecessary deletion and recreation of compute nodes:
- Avoid compute node replacements due to optional user_data by @MoteHue in #847
- Use 'nodegroups' param in controlled rebuild docs by @MoteHue in #849
For images see release v2.8.
Full Changelog: v2.8...v2.8.1
v2.8
What's Changed
Improved GPU configuration/support
- Support automatic GRES configuration for NVIDIA GPUs by @sjpb in #820
- Add option to install nvidia-fabricmanager by @claudia-lola in #836
- Add support for GRES to ondemand apps by @sjpb in #837
- Adds bandwidth.yml playbook for NVIDIA nvbandwidth by @claudia-lola in #834
- Make eessi configure gpu node automatically by @claudia-lola in #841
- Bump CUDA to 13.0.2 and NVIDIA driver to 580.105.08 by @priteau in #823
Slurm configuration
- Bump OpenHPC role to v1.4.0 by @sjpb in #818. Adds:
  - Enable use of custom Slurm builds by @sjpb in stackhpc/ansible-role-openhpc#163
  - CI: Switch to latest rockylinux/rockylinux images by @priteau in stackhpc/ansible-role-openhpc#198
  - Add support for mpi.conf templating by @bertiethorpe in stackhpc/ansible-role-openhpc#201
- Bump OpenHPC role to v1.4.1 by @bertiethorpe in #822 - fixes mpi.conf templating
New/improved features
- Add support for InfiniBand interfaces to NHC by @sjpb in #821
- Add tool to set image properties by @sjpb in #829
Docs and other
- Improve pulp docs by @sjpb in #819
- Fix gpg check for cernvmfs installs by @bertiethorpe in #816
- Remove ansible-lint warnings by @bertiethorpe in #817
- Replace whitespace in NHC mount checks by @sjpb in #824
- Allow fixed ip lists to be longer than nodes list by @sjpb in #830
- Add retries to CI tofu apply by @bertiethorpe in #833
- Don't install hpl source during extra builds by @sjpb in #828
- Add docs for eessi by @claudia-lola in #827
- Fix ansible-ssh changes due to linting by @sjpb in #838
- Use Ark repofiles for additional repos by @bertiethorpe in #832
- Set image properties for CI image build and sync by @bertiethorpe in #839
- Run trivy scans on main, to help reporting by @JohnGarbutt in #842
- Describe buildenv in EESSI docs by @claudia-lola in #845
Full Changelog: v2.7...v2.8
Images
Two new images are available:
- RL8: openhpc-RL8-251119-1202-332ac921
- RL9: openhpc-RL9-251119-1202-332ac921
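The image names listed in these notes appear to share one layout: `openhpc-<RL8|RL9>-<YYMMDD>-<HHMM>-<8 hex characters>`. That layout is inferred only from the names on this page, not from documentation, and reading the middle fields as a build date and time is likewise an assumption. Under those assumptions, a minimal Python sketch for splitting a name into its parts (for example, to sort images by age) could look like the following. The names `parse_image_name`, `ImageName`, `distro`, `built` and `suffix` are hypothetical and are not part of the appliance.

```python
import re
from datetime import datetime
from typing import NamedTuple

# Assumed layout, inferred only from the names in these release notes:
#   openhpc-<RL8|RL9>-<YYMMDD>-<HHMM>-<8 lowercase hex characters>
IMAGE_NAME = re.compile(
    r"^openhpc-(?P<distro>RL\d+)-(?P<date>\d{6})-(?P<time>\d{4})-(?P<suffix>[0-9a-f]{8})$"
)


class ImageName(NamedTuple):
    distro: str      # e.g. "RL9"
    built: datetime  # assumed to be the build date and time
    suffix: str      # trailing 8-character hex string


def parse_image_name(name: str) -> ImageName:
    """Split an image name into its parts, or raise ValueError if it does not match."""
    match = IMAGE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised image name: {name!r}")
    # Combine the date and time fields and parse them as two-digit-year timestamps.
    built = datetime.strptime(match["date"] + match["time"], "%y%m%d%H%M")
    return ImageName(match["distro"], built, match["suffix"])


if __name__ == "__main__":
    names = [
        "openhpc-RL9-251119-1202-332ac921",
        "openhpc-RL9-251002-1456-1d21952c",
    ]
    # Print the images oldest first, using the assumed build timestamp.
    for image in sorted((parse_image_name(n) for n in names), key=lambda i: i.built):
        print(image.distro, image.built.isoformat(), image.suffix)
```

A name that does not fit the assumed layout raises `ValueError` rather than being guessed at, so a change in the naming scheme would surface immediately.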
v2.7
What's Changed
- Fix image sync workflow for new larger fat images by @sjpb in #805
- Bump codeserver app to 2025.09.1 to remove password prompt by @bertiethorpe in #806
- Use (group) syntax in access.conf by @priteau in #804
- Remove extra lines in activate scripts by @priteau in #803
- Update fatimages to include OnDemand codeserver password fix by @bertiethorpe in #808
- Improve build group definitions by @sjpb in #788
- Expose FIPs in inventory hosts file by @claudia-lola in #807
- Allow VS Code Remote SSH while blocking NFS mounts by @priteau in #799
- Delete build VMs in CI nightly cleanup by @sjpb in #777
Images
Two new images are available:
- RockyLinux 8: openhpc-RL8-251002-1537-1d21952c
- RockyLinux 9: openhpc-RL9-251002-1456-1d21952c
New Contributors
- @claudia-lola made their first contribution in #807
Full Changelog: v2.6.1...v2.7
v2.6.1
What's Changed
- Fix .caas secrets not persisting post-reimage and skip tofu vars validation for .caas by @wtripp180901 in #798
Images
No new images at this release, see https://github.com/stackhpc/ansible-slurm-appliance/releases/tag/v2.6.
Full Changelog: v2.6...v2.6.1
v2.6
What's Changed
- Add validation for tofu-templated vars by @sjpb in #775
- Fix error message for state_volume_provisioning validation by @sjpb in #780
- Enable linting by @maxstack in #732
- Define login subgroups in Ansible inventory by @priteau in #727
- Fix label in Jupyter Notebook form by @priteau in #787
- Ignore changes to port binding and dhcp options by @sjpb in #778
- Expose vgpu group in site inventory by @priteau in #786
- Add documentation for OpenTofu remote state by @sjpb in #784
- Remove unused cloudalchemy alertmanager role by @sjpb in #781
- Fix various typos by @priteau in #796
- Update dnf repo snapshots (+ source repos, removes RL8 Lustre build CI) by @sjpb in #792
- Validate nodegroup names by @sjpb in #793
- Bump Open OnDemand to v4 & install apps in fatimage by @bertiethorpe in #782
- Support software raid root disks in stackhpc images by @sjpb in #785
- Pin bcrypt to 4.3.0 by @wtripp180901 in #801
Images
Two new images are available:
- RockyLinux 8: openhpc-RL8-250925-1639-62d67ae3
- RockyLinux 9: openhpc-RL9-250925-1639-62d67ae3
New Contributors
Full Changelog: v2.4.2...v2.6
v2.5
What's Changed
- Refactor Pulp repo definitions and add more Pulp documentation by @wtripp180901 in #760
- Fix incorrect use of "partition" in OpenTofu node group variable definitions by @sjpb in #771
- Bump Pulp snapshots for RL 9.6 by @priteau in #772
- Add support for setting server group in scheduler hints by @sjpb in #773
- Bump CUDA to 13.0.1 and NVIDIA driver to 580.82.07 by @priteau in #776
- Make CaaS specific role: persist_openhpc_secrets idempotent by @bertiethorpe in #774
Full Changelog: v2.4.1...v2.5
Images
Two new images are available:
- RockyLinux 8: openhpc-RL8-250820-0800-767addd8
- RockyLinux 9: openhpc-RL9-250908-2047-d90ebd0e
v2.4.1
What's Changed
Full Changelog: v2.4...v2.4.1
Images
No new images at this release, see https://github.com/stackhpc/ansible-slurm-appliance/releases/tag/v2.4.
v2.4
What's Changed
- Remove ansible.cfg from CaaS environment by @bertiethorpe in #766
- Add filesystems docs by @MoteHue in #710
- CaaS pre-hook fix for galaxy requirements validation by @bertiethorpe in #767
- Production end to end deployment docs by @MoteHue in #678
Full Changelog: v2.3...v2.4
Images
No new images are provided at this release
v2.3
Important
The images in this release are missing the code-server and rstudio Open OnDemand apps. If these apps are required, use v2.7 instead, which restores them and also fixes an issue with the code-server login.
Key Changes
- Move cookiecutter Tofu to new site environment by @wtripp180901 in #751
Caution
This change will cause merge conflicts with existing site environments. See the PR #751 description for steps to resolve them.
- Support clusters with no outbound internet by @sjpb in #717
- Add support for topology-aware scheduling by @wtripp180901 in #737
- Add Rstudio, VSCode, Matlab to OOD application catalogue by @bertiethorpe in #738
What's Changed
All PRs:
- Add LICENSE file by @priteau in #725
- Bump DOCA to 2.9.3 by @priteau in #728
- Support clusters with no outbound internet by @sjpb in #717
- Add s-nail package by @priteau in #713
- CI: Use GitHub token for Packer workflows by @priteau in #729
- Fix detection of CUDA package version by @priteau in #731
- Update production.md docs by @sjpb in #730
- Allow using a specific FIP for a build VM by @sjpb in #734
- Fix convenience variables for proxy by @sjpb in #735
- Add support for topology-aware scheduling by @wtripp180901 in #737
- Bump openstack cli to get working FIP commands by @sjpb in #740
- Support passing freeipa server cert on client enrolment by @sjpb in #739
- Deprecate ofed role by @sjpb in #741
- Allow adding additional dnf repos to rewrite by @sjpb in #744
- Add metadata service to no_proxy defaults by @sjpb in #745
- Make rebuilding slurm optional for cuda builds by @sjpb in #746
- Address various issues with production docs by @sjpb in #747
- Update appliance to Rocky 9.6 + Update Lustre to 2.15.7 by @wtripp180901 in #699
- Move HPL source download to fatimage build by @bertiethorpe in #743
- Bump openhpc role by @sjpb in #690
- Move cookiecutter Tofu to new site environment by @wtripp180901 in #751
- Upload CI images to Leafcloud S3 for image syncing by @bertiethorpe in #752
- Don't set topology plugin if topology group not enabled by @sjpb in #754
- Fixup additional_nodes inventory group by @sjpb in #749
- Allow specifying security groups for individual login groups by @wtripp180901 in #758
- Allow setting volume type for node group extra_volumes parameter by @sjpb in #755
- Add Rstudio, VSCode, Matlab to OOD application catalogue by @bertiethorpe in #738
- Allow setting config_drive by @wtripp180901 in #756
- Add option for additional user data by @wtripp180901 in #757
- Configure process tracking and accounting to use cgroups by @bertiethorpe in #762
- Bump CUDA to 13.0 and NVIDIA driver to 580 by @priteau in #764
- Fix parse error in pingmatrix output by @oneswig in #761
Full Changelog: v2.2...v2.3
New Contributors
Images
Two new images are available:
- RockyLinux 8: openhpc-RL8-250808-1727-faa44755
- RockyLinux 9: openhpc-RL9-250808-1727-faa44755