Description
Which component are you using?:
/area cluster-autoscaler
What version of the component are you using?:
Component version: 1.32.0
What k8s version are you using (`kubectl version`)?:

```
$ kubectl version
Client Version: v1.31.2
Kustomize Version: v5.4.2
Server Version: v1.32.5-eks-5d4a308
```
What environment is this in?:
We are running cluster-autoscaler on an AWS EKS cluster.
What did you expect to happen?:
cluster-autoscaler should be able to delete placeholder nodes from an ASG whose name contains `/`.
What happened instead?:
cluster-autoscaler failed to delete placeholder nodes from an ASG whose name contains `/`, because the placeholder node was not recognized as belonging to a known ASG.
```
W0709 11:37:32.508503 30155 static_autoscaler.go:867] Error while trying to delete nodes from k8s-node/edge-prod-aeus2a/rika-test: aws:///eu-south-2a/i-placeholder-k8s-node/edge-prod-aeus2a/rika-test-1 doesn't belong to a known asg
```
Due to the above error, cluster-autoscaler stopped all scaling operations after failing to fix the node group size once the `MaxNodeStartupTime` window elapsed.
```
E0709 11:42:02.380948 30155 static_autoscaler.go:441] Failed to fix node group sizes: failed to decrease k8s-node/edge-prod-aeus2a/rika-test: attempt to delete existing nodes targetSize:3 delta:-2 existingNodes: 3
```
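The failure mode suggested by the log can be illustrated with a small Go sketch. This is not the actual cluster-autoscaler code; it only shows how naively splitting a provider ID of the form `aws:///<zone>/<instance-id>` on `/` goes wrong once the placeholder name (`i-placeholder-<asg-name>-<index>`, as seen in the log above) embeds an ASG name that itself contains `/`:

```go
package main

import (
	"fmt"
	"strings"
)

// parseInstanceID is a simplified, hypothetical parser (not the real
// cluster-autoscaler implementation) that treats the last "/"-separated
// segment of a provider ID as the instance ID.
func parseInstanceID(providerID string) string {
	parts := strings.Split(providerID, "/")
	return parts[len(parts)-1]
}

func main() {
	// ASG name and provider ID taken from the error log in this issue.
	asgName := "k8s-node/edge-prod-aeus2a/rika-test"
	providerID := fmt.Sprintf("aws:///eu-south-2a/i-placeholder-%s-1", asgName)

	// The "/" characters inside the ASG name add extra separators, so only
	// the trailing fragment of the placeholder name survives the split and
	// the full ASG name can no longer be recovered from the instance ID.
	fmt.Println(parseInstanceID(providerID)) // "rika-test-1" instead of the full placeholder ID
}
```

Under this assumption, the extracted "instance ID" (`rika-test-1`) matches no registered placeholder, which would explain the "doesn't belong to a known asg" warning.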
How to reproduce it (as minimally and precisely as possible):
- Create an ASG whose name includes `/`.
- Trigger a cluster-autoscaler scale-up to that ASG while it lacks capacity (in our case, we used a machine type that does not have sufficient resources available).
- At this point, cluster-autoscaler will try to delete the placeholder node, but it will fail because the node is not recognized as part of a known ASG.
- If this situation persists beyond the `MaxNodeStartupTime` duration, cluster-autoscaler will halt all scaling operations for all ASGs.
Anything else we need to know?: