-
Notifications
You must be signed in to change notification settings - Fork 1.6k
KEP-5313: Placement Decision API for multicluster scheduling #5314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Hi @mikeshng. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@mikeshng: GitHub didn't allow me to assign the following users: zhiying-lin. Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/cc @awels |
@iholder101: GitHub didn't allow me to request PR reviews from the following users: awels. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
9ca10ab
to
3406d3d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to simplify it a bit.
At the same time, will try to suggest sharing our MCO one as the placement.
3406d3d
to
881956f
Compare
280c3b3
to
971facb
Compare
Closed threads I believe are resolved. Feel free to reopen or comment if there's more to discuss. Thanks! |
971facb
to
f04fa5a
Compare
This looks good to me. There are a few typos etc. but we can take care of that later (I’ll follow up). I take it we’ll revisit the graduation criteria, test plans etc. after the initial merge, is that right? /lgtm @JeremyOT ping |
/ok-to-test |
f04fa5a
to
272a8e6
Compare
New changes are detected. LGTM label has been removed. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: mikeshng The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Right, thanks @skitt ! Just pushed the update-toc check fix. |
|
||
### Terminology | ||
|
||
- **Placement**: A scheduler request that asks "where should this workload run?". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we need to formalize what a Placement request is before it makes sense to make decisions implementation agnostic?
As is, it seems like you can't really swap out the consumer because it needs to know what the Placement meant to the scheduler references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the KEP to acknowledge this limitation.
While consumers may still need scheduler specific knowledge for complex scenarios, this API still provides value by standardizing the "where" (cluster list) output, enabling basic workload distribution and reducing integration complexity even without full placement request standardization.
WDYT Jeremy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second to Mike. The value of this API is that it provides a single interface for any projects that have a multi-cluster component to out source the scheduling part. For example, Argo's applicationSet can take a cluster generator, Multi-kueue now supports external scheduling. One thing in common between those projects are they all assume that there is an external scheduling component instead of trying to do it in their own project. With the schedulingDecision API, we can now implement a common controller that allows those project to tap into the scheduling capabilities that cluster managers projects (OCM/KubeFleet/Karma.. etc) provide. I think there is clear value in it alone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel Ryan has hit the bullseye, that this represent a common scheduling decision for any project to consume, where they don't necessarily need to be concerned with the implementation. The end result being, these projects or the end consumer could pick a scheduler or multi-cluster implementation that suites their needs, and still have it work with the consumer. ArgoCD ApplicationSets etc... being some of those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x-post to comment in API example here: https://github.com/kubernetes/enhancements/pull/5314/files#r2304565590
Regarding this part in the KEP:
Placement: A scheduler request that asks "where should this workload run?". Note: This KEP does not standardize the Placement request format itself, only the PlacementDecision output. Consumers may still need scheduler specific knowledge to fully understand placement intent, though basic workload distribution can be achieved by simply deploying to the clusters listed in decisions.
and these comments from this thread:
While consumers may still need scheduler specific knowledge for complex scenarios
this represent a common scheduling decision for any project to consume, where they don't necessarily need to be concerned with the implementation.
I think the comment from @skitt linked above brings up that even in the basic case, without knowing more about the Placement
object in this KEP, how can a consumer be using this to out sourcing placement fully since it will need to know about the work it submitted for placement?
// ClusterDecision references a target ClusterProfile for placement. | ||
type ClusterDecision struct { | ||
// Reference to the target ClusterProfile. | ||
ClusterProfileRef ClusterProfileRef `json:"clusterProfileRef" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ObjectReference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, updated to ObjectReference as suggested.
The scheduler may choose to populate the reason for each decision for consumers/end-users | ||
(ie, for debugging purposes). | ||
|
||
- **Update / Reschedule**: The scheduler may add or remove clusters in decisions at any time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What action is the consumer expected to take on change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added actions and examples on change.
When the cluster set itself has not changed, this stable ordering produces an identical set of clusters, | ||
so the API server skips the write and no extra change events reach consumers. | ||
|
||
- **Delete**: When a placement is no longer required, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean that a placement is no longer required? What change triggers this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated explanation of what it means to be no longer required and examples of toggles.
Signed-off-by: Mike Ng <ming@redhat.com>
272a8e6
to
a9ba48c
Compare
Triage notes:
|
apiGroup: multicluster.x-k8s.io | ||
kind: Placement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this example be reworked to avoid relying on an object that hasn’t been defined yet, as far as I’m aware? If this KEP were approved, what would implementations look like? Are there common characteristics that an object referenced here would have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would a non-scheduler-specific or aware consumer use this without knowledge of the Placement
object?
/sig multicluster