What's Changed
- Added `batch-launch` command to launch multiple models in a batch. This feature currently supports only single-node launches (one or more GPUs within a single node).
- Added `cleanup` command to remove logs using various filters
- Decoupled all environment settings into an environment.yaml file, which further simplifies migration to different clusters
- Enabled RDMA/InfiniBand support in the default container environment
- Adapted `vec-inf` package to Vector Killarney cluster:
  - Use Apptainer instead of Singularity
  - Set account and working directory as required fields; both can also be set with environment variables
  - QoS and partition are now optional; added a resource type for specifying the type of compute
- Added `env` and `config` options for parsing additional environment variables and directly specifying a different config to use for the launch command
- Refactored the code base and removed redundant code in generated scripts
- `list` command now sorts display by model name
- Added model tracking sheet for tracking cached model weights and model configurations on Killarney
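
The Killarney change above makes account and working directory required while allowing them to come from environment variables. A minimal sketch of that kind of fallback is shown here. It is not the package's actual implementation: the function name, the field keys, and the `EXAMPLE_*` variable names are all hypothetical, since these notes do not state the real ones.

```python
import os

# Hypothetical variable names for illustration only; the real names used by
# the package are not stated in these notes.
ENV_FALLBACKS = {
    "account": "EXAMPLE_ACCOUNT",
    "work_dir": "EXAMPLE_WORK_DIR",
}


def resolve_required(cli_args, environ=None):
    """Return required fields, preferring explicit values over the environment."""
    environ = os.environ if environ is None else environ
    resolved, missing = {}, []
    for field, env_name in ENV_FALLBACKS.items():
        # An explicitly supplied value wins; otherwise fall back to the environment.
        value = cli_args.get(field) or environ.get(env_name)
        if value:
            resolved[field] = value
        else:
            missing.append(f"{field} (or ${env_name})")
    if missing:
        # Fail early with every missing field listed, not just the first one.
        raise ValueError("Missing required settings: " + ", ".join(missing))
    return resolved
```

Collecting all missing fields before raising means a user on a new cluster sees every setting they still need to provide in a single error.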
New Contributors
Contributors
@XkunW @scarere @kohankhaki