cluster-autoscaler: add ignore-instance-creation-stockout-errors #8322
base: master
Hi @dsafdsa1. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: dsafdsa1. It still needs approval from an approver in each of the affected files.
Branch updated from d9c87ff to d7dbcd0.
The behaviour introduced in this PR doesn't seem related to any universal autoscaling logic. It reads more like "treat this VM as if it were in a different state than we (the cloud provider) tell you it is, because we (the cloud provider) also tell you this group is special in this one specific way". If we want to always ignore the errors from certain instances, why not modify the affected cloud provider(s) so they don't attach these errors to instances from this group in the first place? When we fill the instances cache, we fetch by node group anyway, so this information is easily accessible. I'm open to counterarguments, especially if this is a use case that other cloud providers could share, or if we use the stockout information somewhere else where we actually want to see it. Code-wise, LGTM.
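The reviewer's alternative, dropping stockout errors inside the cloud provider while it fills the instances cache per node group, could look roughly like the minimal Go sketch below. Everything in it is hypothetical: `Instance`, `ErrorInfo`, `ErrorClass`, `filterStockoutErrors`, the per-group `ignoreStockouts` map and the `"STOCKOUT"` code are simplified stand-ins, not the autoscaler's actual cloud provider API.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for cloud provider instance types.
type ErrorClass int

const (
	OtherErrorClass ErrorClass = iota
	OutOfResourcesErrorClass // e.g. a zonal stockout
)

type ErrorInfo struct {
	Class ErrorClass
	Code  string // hypothetical provider-specific error code
}

type Instance struct {
	ID    string
	Error *ErrorInfo // nil when the instance has no creation error
}

// filterStockoutErrors clears out-of-resources errors from the instances of a
// node group that is configured to ignore them; all other errors are kept.
func filterStockoutErrors(instances []Instance, ignoreStockouts bool) []Instance {
	if !ignoreStockouts {
		return instances
	}
	out := make([]Instance, 0, len(instances))
	for _, inst := range instances {
		if inst.Error != nil && inst.Error.Class == OutOfResourcesErrorClass {
			inst.Error = nil // inst is a copy, so the caller's slice is untouched
		}
		out = append(out, inst)
	}
	return out
}

func main() {
	// Instances are fetched per node group, so the group's setting is at hand.
	ignoreStockouts := map[string]bool{"spot-pool": true}
	fetched := map[string][]Instance{
		"spot-pool": {{ID: "a", Error: &ErrorInfo{Class: OutOfResourcesErrorClass, Code: "STOCKOUT"}}},
		"default":   {{ID: "b", Error: &ErrorInfo{Class: OutOfResourcesErrorClass, Code: "STOCKOUT"}}},
	}
	cache := map[string][]Instance{}
	for group, instances := range fetched {
		cache[group] = filterStockoutErrors(instances, ignoreStockouts[group])
	}
	fmt.Println(cache["spot-pool"][0].Error == nil) // true
	fmt.Println(cache["default"][0].Error == nil)   // false
}
```

With this approach the core autoscaling logic never sees the stockout for the flagged group, which is the reviewer's point: the special case stays local to the provider.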
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR adds an option to ignore instance creation stockout errors, which would normally trigger a node group backoff and cause the scale-up to fail. Those nodes are recognized as `unregistered` and not `unready`, so cluster health, as governed by settings like `max-total-unready-percentage` and `ok-total-unready-count`, is not affected.
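A minimal Go sketch of the behaviour described above, under stated assumptions: `Node`, `shouldBackOff` and `clusterHealthy` are hypothetical names rather than cluster-autoscaler code, the `ignoreStockouts` boolean stands in for the new option (whether it is set globally or per node group is not modeled), and the health check only assumes the documented meaning of `max-total-unready-percentage` and `ok-total-unready-count` (the cluster counts as unhealthy once both the unready count and the unready percentage are exceeded).

```go
package main

import "fmt"

// Hypothetical, simplified model of the behaviour described above; these are
// not the cluster-autoscaler's real types or function names.
type Node struct {
	Registered bool // false while the VM has no corresponding Kubernetes Node
	Ready      bool
}

// shouldBackOff reports whether a failed instance creation should put the
// node group into backoff. ignoreStockouts stands in for the option this PR adds.
func shouldBackOff(ignoreStockouts, isStockout bool) bool {
	if isStockout && ignoreStockouts {
		return false // keep waiting for capacity instead of failing the scale-up
	}
	return true
}

// clusterHealthy counts only registered nodes that are not ready, so VMs stuck
// in a stockout (still unregistered) never count as unready. The cluster is
// unhealthy only when the unready count exceeds okUnreadyCount AND the unready
// share of registered nodes exceeds maxUnreadyPercentage.
func clusterHealthy(maxUnreadyPercentage float64, okUnreadyCount int, nodes []Node) bool {
	registered, unready := 0, 0
	for _, n := range nodes {
		if !n.Registered {
			continue
		}
		registered++
		if !n.Ready {
			unready++
		}
	}
	overCount := unready > okUnreadyCount
	overPercent := float64(unready) > maxUnreadyPercentage/100*float64(registered)
	return !(overCount && overPercent)
}

func main() {
	fmt.Println(shouldBackOff(true, true))  // false
	fmt.Println(shouldBackOff(false, true)) // true

	// Two ready nodes plus ten VMs that never registered because of a stockout.
	nodes := []Node{{Registered: true, Ready: true}, {Registered: true, Ready: true}}
	for i := 0; i < 10; i++ {
		nodes = append(nodes, Node{Registered: false})
	}
	fmt.Println(clusterHealthy(45, 3, nodes)) // true
}
```

Because the stocked-out VMs never become registered nodes, they only ever show up on the `unregistered` side, which is why the unready thresholds stay untouched in this model.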
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: