
Commit dc4c614

rueian authored and mjacar committed
[core][autoscaler] add release tests on RAY_UP_enable_autoscaler_v2=1 (#54786)
## Why are these changes needed?

Since ray 2.48.0 has been released, the autoscaler v2 in the `latest` ray image should have cluster launcher support, and thus we can test autoscaler v2 on the cluster launcher in release tests. This PR adds those tests.

<img width="1684" height="637" alt="image" src="https://github.com/user-attachments/assets/b9032008-fd8b-41ef-839e-10be0692779c" />

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - [x] Unit tests
  - [x] Release tests
  - [ ] This PR is not tested :(

---------

Signed-off-by: Rueian <rueian@anyscale.com>
Signed-off-by: Michael Acar <michael.j.acar@gmail.com>
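The diff in this commit attaches a `variations` list with `v1` and `v2` entries to each cluster launcher test. As a rough sketch of how such a list could expand into separate tests, the Python below assumes that each variation is deep-merged over the base test definition and that `__suffix__` is appended to the test name after a dot. Both the merge rule and the helpers `deep_update` and `expand_variations` are illustrative assumptions, not the release tooling's actual code.

```python
import copy


def deep_update(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base` (hypothetical helper)."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def expand_variations(test: dict) -> list:
    """Return one test per variation; the '<name>.<suffix>' naming is an assumption."""
    base = {k: v for k, v in test.items() if k != "variations"}
    variations = test.get("variations")
    if not variations:
        return [copy.deepcopy(base)]
    expanded = []
    for variation in variations:
        overrides = {k: v for k, v in variation.items() if k != "__suffix__"}
        new_test = deep_update(base, overrides)
        new_test["name"] = f"{base['name']}.{variation['__suffix__']}"
        expanded.append(new_test)
    return expanded


# Definition mirroring the aws_cluster_launcher_minimal entry touched by this commit.
minimal = {
    "name": "aws_cluster_launcher_minimal",
    "run": {
        "timeout": 1200,
        "script": "python launch_and_verify_cluster.py aws/example-minimal.yaml",
    },
    "variations": [
        {"__suffix__": "v1"},
        {
            "__suffix__": "v2",
            "run": {
                "script": "RAY_UP_enable_autoscaler_v2=1 python "
                "launch_and_verify_cluster.py aws/example-minimal.yaml"
            },
        },
    ],
}

for t in expand_variations(minimal):
    print(t["name"], "->", t["run"]["script"])
```

Under these assumptions the `v1` variation keeps the base `run` block unchanged, while `v2` overrides only `run.script` and inherits `run.timeout`.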
1 parent b6fc13c commit dc4c614

1 file changed: +67 -0 lines changed

release/release_tests.yaml

Lines changed: 67 additions & 0 deletions
@@ -4006,6 +4006,11 @@
     timeout: 2400
     script: python launch_and_verify_cluster.py aws/tests/aws_cluster.yaml --num-expected-nodes 2 --retries 10
 
+  variations:
+    - __suffix__: v1
+    - __suffix__: v2
+      run:
+        script: RAY_UP_enable_autoscaler_v2=1 python launch_and_verify_cluster.py aws/tests/aws_cluster.yaml --num-expected-nodes 2 --retries 10
 
 - name: aws_cluster_launcher_nightly_image
   group: cluster-launcher-test
@@ -4021,6 +4026,11 @@
     timeout: 2400
     script: python launch_and_verify_cluster.py aws/tests/aws_cluster.yaml --num-expected-nodes 2 --retries 10 --docker-override nightly
 
+  variations:
+    - __suffix__: v1
+    - __suffix__: v2
+      run:
+        script: RAY_UP_enable_autoscaler_v2=1 python launch_and_verify_cluster.py aws/tests/aws_cluster.yaml --num-expected-nodes 2 --retries 10 --docker-override nightly
 
 - name: aws_cluster_launcher_latest_image
   group: cluster-launcher-test
@@ -4036,6 +4046,11 @@
     timeout: 2400
     script: python launch_and_verify_cluster.py aws/tests/aws_cluster.yaml --num-expected-nodes 2 --retries 10 --docker-override latest
 
+  variations:
+    - __suffix__: v1
+    - __suffix__: v2
+      run:
+        script: RAY_UP_enable_autoscaler_v2=1 python launch_and_verify_cluster.py aws/tests/aws_cluster.yaml --num-expected-nodes 2 --retries 10 --docker-override latest
 
 - name: aws_cluster_launcher_release_image
   group: cluster-launcher-test
@@ -4051,6 +4066,11 @@
     timeout: 2400
     script: python launch_and_verify_cluster.py aws/tests/aws_cluster.yaml --num-expected-nodes 2 --retries 10 --docker-override commit
 
+  variations:
+    - __suffix__: v1
+    - __suffix__: v2
+      run:
+        script: RAY_UP_enable_autoscaler_v2=1 python launch_and_verify_cluster.py aws/tests/aws_cluster.yaml --num-expected-nodes 2 --retries 10 --docker-override commit
 
 
 - name: aws_cluster_launcher_minimal
@@ -4067,6 +4087,12 @@
     timeout: 1200
     script: python launch_and_verify_cluster.py aws/example-minimal.yaml
 
+  variations:
+    - __suffix__: v1
+    - __suffix__: v2
+      run:
+        script: RAY_UP_enable_autoscaler_v2=1 python launch_and_verify_cluster.py aws/example-minimal.yaml
+
 - name: aws_cluster_launcher_full
   group: cluster-launcher-test
   working_dir: ../python/ray/autoscaler/
@@ -4081,6 +4107,12 @@
     timeout: 3000
     script: python launch_and_verify_cluster.py aws/example-full.yaml --num-expected-nodes 2 --retries 20 --docker-override latest
 
+  variations:
+    - __suffix__: v1
+    - __suffix__: v2
+      run:
+        script: RAY_UP_enable_autoscaler_v2=1 python launch_and_verify_cluster.py aws/example-full.yaml --num-expected-nodes 2 --retries 20 --docker-override latest
+
 - name: gcp_cluster_launcher_minimal
   group: cluster-launcher-test
   working_dir: ../python/ray/autoscaler/
@@ -4098,6 +4130,12 @@
     timeout: 1200
     script: python launch_and_verify_cluster.py gcp/example-minimal-pinned.yaml
 
+  variations:
+    - __suffix__: v1
+    - __suffix__: v2
+      run:
+        script: RAY_UP_enable_autoscaler_v2=1 python launch_and_verify_cluster.py gcp/example-minimal-pinned.yaml
+
 - name: gcp_cluster_launcher_full
   group: cluster-launcher-test
   working_dir: ../python/ray/autoscaler/
@@ -4115,6 +4153,12 @@
     timeout: 4800
     script: python launch_and_verify_cluster.py gcp/example-full.yaml --num-expected-nodes 2 --retries 30
 
+  variations:
+    - __suffix__: v1
+    - __suffix__: v2
+      run:
+        script: RAY_UP_enable_autoscaler_v2=1 python launch_and_verify_cluster.py gcp/example-full.yaml --num-expected-nodes 2 --retries 30 --docker-override latest
+
 - name: gcp_cluster_launcher_latest_image
   group: cluster-launcher-test
   working_dir: ../python/ray/autoscaler/
@@ -4132,6 +4176,12 @@
     timeout: 3600
     script: python launch_and_verify_cluster.py gcp/example-full.yaml --num-expected-nodes 2 --retries 20 --docker-override latest
 
+  variations:
+    - __suffix__: v1
+    - __suffix__: v2
+      run:
+        script: RAY_UP_enable_autoscaler_v2=1 python launch_and_verify_cluster.py gcp/example-full.yaml --num-expected-nodes 2 --retries 20 --docker-override latest
+
 - name: gcp_cluster_launcher_nightly_image
   group: cluster-launcher-test
   working_dir: ../python/ray/autoscaler/
@@ -4149,6 +4199,11 @@
     timeout: 3600
     script: python launch_and_verify_cluster.py gcp/example-full.yaml --num-expected-nodes 2 --retries 20 --docker-override nightly
 
+  variations:
+    - __suffix__: v1
+    - __suffix__: v2
+      run:
+        script: RAY_UP_enable_autoscaler_v2=1 python launch_and_verify_cluster.py gcp/example-full.yaml --num-expected-nodes 2 --retries 20 --docker-override nightly
 
 - name: gcp_cluster_launcher_release_image
   group: cluster-launcher-test
@@ -4167,6 +4222,12 @@
     timeout: 3600
     script: python launch_and_verify_cluster.py gcp/example-full.yaml --num-expected-nodes 2 --retries 20 --docker-override commit
 
+  variations:
+    - __suffix__: v1
+    - __suffix__: v2
+      run:
+        script: RAY_UP_enable_autoscaler_v2=1 python launch_and_verify_cluster.py gcp/example-full.yaml --num-expected-nodes 2 --retries 20 --docker-override commit
+
 - name: gcp_cluster_launcher_gpu_docker
   group: cluster-launcher-test
   working_dir: ../python/ray/autoscaler/
@@ -4184,6 +4245,12 @@
     timeout: 1200
     script: python launch_and_verify_cluster.py gcp/example-gpu-docker.yaml
 
+  variations:
+    - __suffix__: v1
+    - __suffix__: v2
+      run:
+        script: RAY_UP_enable_autoscaler_v2=1 python launch_and_verify_cluster.py gcp/example-gpu-docker.yaml
+
 - name: autoscaler_aws
   group: autoscaler-test
   working_dir: autoscaling_tests
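
Each `v2` variation in this diff is meant to rerun the base script with `RAY_UP_enable_autoscaler_v2=1` set. A reviewer could check that pattern mechanically. The sketch below is a minimal, hypothetical check (the helper `extra_v2_args` is not part of the repository) over three script pairs copied by hand from the hunks above, with test names inferred from the surrounding context lines. It reports whatever a `v2` script carries beyond the environment prefix plus the base script.

```python
ENV_PREFIX = "RAY_UP_enable_autoscaler_v2=1 "

# (test name, base script, v2 script), copied by hand from hunks in this diff.
CASES = [
    (
        "aws_cluster_launcher_minimal",
        "python launch_and_verify_cluster.py aws/example-minimal.yaml",
        "RAY_UP_enable_autoscaler_v2=1 python launch_and_verify_cluster.py "
        "aws/example-minimal.yaml",
    ),
    (
        "gcp_cluster_launcher_full",
        "python launch_and_verify_cluster.py gcp/example-full.yaml "
        "--num-expected-nodes 2 --retries 30",
        "RAY_UP_enable_autoscaler_v2=1 python launch_and_verify_cluster.py "
        "gcp/example-full.yaml --num-expected-nodes 2 --retries 30 "
        "--docker-override latest",
    ),
    (
        "gcp_cluster_launcher_gpu_docker",
        "python launch_and_verify_cluster.py gcp/example-gpu-docker.yaml",
        "RAY_UP_enable_autoscaler_v2=1 python launch_and_verify_cluster.py "
        "gcp/example-gpu-docker.yaml",
    ),
]


def extra_v2_args(base_script, v2_script):
    """Return what v2 adds after ENV_PREFIX + base, or None if the shape differs."""
    expected = ENV_PREFIX + base_script
    if not v2_script.startswith(expected):
        return None
    return v2_script[len(expected):].strip()


report = {name: extra_v2_args(base, v2) for name, base, v2 in CASES}
for name, extra in report.items():
    print(f"{name}: extra v2 arguments = {extra!r}")
```

For these three pairs, only `gcp_cluster_launcher_full` differs from its base script: the `v2` variation also passes `--docker-override latest`.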
