KEP-5313: Placement Decision API for multicluster scheduling #5314

mikeshng · 2025-05-17T23:48:47Z

One-line PR description: Add a new KEP to introduce the Placement Decision API for multicluster scheduling

Issue link: Placement Decision API for multicluster scheduling #5313

Other comments:

/sig multicluster

k8s-ci-robot · 2025-05-17T23:48:57Z

Hi @mikeshng. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

mikeshng · 2025-05-17T23:51:49Z

/assign @deads2k @RainbowMango @zhiying-lin

CC @corentone @elgnay @haoqing0110 @jnpacker @qiujian16 @ryanzhang-oss

k8s-ci-robot · 2025-05-17T23:51:52Z

@mikeshng: GitHub didn't allow me to assign the following users: zhiying-lin.

Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @deads2k @RainbowMango @zhiying-lin

CC @corentone @elgnay @haoqing0110 @jnpacker @qiujian16 @ryanzhang-oss


keps/sig-multicluster/5313-placement-decision-api/README.md

iholder101 · 2025-05-19T10:39:01Z

/cc @awels
FYI

k8s-ci-robot · 2025-05-19T10:39:04Z

@iholder101: GitHub didn't allow me to request PR reviews from the following users: awels.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @awels
FYI


corentone

Trying to simplify it a bit.

At the same time, will try to suggest sharing our MCO one as the placement.

keps/sig-multicluster/5313-placement-decision-api/README.md

mikeshng · 2025-06-03T16:35:25Z

Closed threads I believe are resolved. Feel free to reopen or comment if there's more to discuss. Thanks!

lauralorenz · 2025-06-03T16:39:34Z

Triage notes: Now waiting for @skitt and @JeremyOT feedback as recent community comments have been addressed and we feel it is ready for you all to look at!

keps/sig-multicluster/5313-placement-decision-api/README.md

skitt · 2025-06-16T15:55:18Z

This looks good to me. There are a few typos etc. but we can take care of that later (I’ll follow up). I take it we’ll revisit the graduation criteria, test plans etc. after the initial merge, is that right?

/lgtm

@JeremyOT ping

skitt · 2025-06-16T15:55:35Z

/ok-to-test

k8s-ci-robot · 2025-06-16T17:24:52Z

New changes are detected. LGTM label has been removed.

k8s-ci-robot · 2025-06-16T17:24:59Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mikeshng
Once this PR has been reviewed and has the lgtm label, please ask for approval from skitt. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

keps/sig-multicluster/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mikeshng · 2025-06-16T18:18:50Z

This looks good to me. There are a few typos etc. but we can take care of that later (I’ll follow up). I take it we’ll revisit the graduation criteria, test plans etc. after the initial merge, is that right?

Right, thanks @skitt !

Just pushed the update-toc check fix.

JeremyOT · 2025-07-22T15:03:09Z

keps/sig-multicluster/5313-placement-decision-api/README.md

+
+### Terminology
+
+- **Placement**: A scheduler request that asks "where should this workload run?".


Don't we need to formalize what a Placement request is before it makes sense to make decisions implementation agnostic?

As is, it seems like you can't really swap out the consumer, because it needs to know what the referenced Placement meant to the scheduler.

Updated the KEP to acknowledge this limitation.
While consumers may still need scheduler-specific knowledge for complex scenarios, this API provides value by standardizing the "where" (cluster list) output. That enables basic workload distribution and reduces integration complexity, even without full placement request standardization.
WDYT Jeremy?

Second to Mike. The value of this API is that it provides a single interface for any project with a multi-cluster component to outsource the scheduling part. For example, Argo's ApplicationSet can take a cluster generator, and MultiKueue now supports external scheduling. One thing those projects have in common is that they all assume there is an external scheduling component instead of trying to do it in their own project. With the schedulingDecision API, we can now implement a common controller that allows those projects to tap into the scheduling capabilities that cluster manager projects (OCM, KubeFleet, Karmada, etc.) provide. I think there is clear value in that alone.

I feel Ryan has hit the bullseye: this represents a common scheduling decision for any project to consume, where they don't necessarily need to be concerned with the implementation. The end result is that these projects, or the end consumer, could pick a scheduler or multi-cluster implementation that suits their needs and still have it work with the consumer. Argo CD ApplicationSets are one example of such a consumer.

x-post to comment in API example here: https://github.com/kubernetes/enhancements/pull/5314/files#r2304565590

Regarding this part in the KEP:

Placement: A scheduler request that asks "where should this workload run?". Note: This KEP does not standardize the Placement request format itself, only the PlacementDecision output. Consumers may still need scheduler specific knowledge to fully understand placement intent, though basic workload distribution can be achieved by simply deploying to the clusters listed in decisions.

and these comments from this thread:

While consumers may still need scheduler-specific knowledge for complex scenarios

this represents a common scheduling decision for any project to consume, where they don't necessarily need to be concerned with the implementation.

I think the comment from @skitt linked above raises the point that, even in the basic case, a consumer knows nothing more about the Placement object in this KEP. How can it use this API to fully outsource placement, since it will need to know about the work it submitted for placement?

JeremyOT · 2025-07-22T15:04:01Z

keps/sig-multicluster/5313-placement-decision-api/README.md

+// ClusterDecision references a target ClusterProfile for placement.
+type ClusterDecision struct {
+  // Reference to the target ClusterProfile.
+  ClusterProfileRef ClusterProfileRef `json:"clusterProfileRef"`


ObjectReference?

Yes, updated to ObjectReference as suggested.
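
For illustration only, the suggested change might look roughly like the sketch below. The `ClusterDecision` name and the `clusterProfileRef` JSON key come from the hunk above; the local `ObjectReference` type, its fields, and the `Reason` field are hypothetical stand-ins, since this thread does not show which reference type or field set the KEP finally adopted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ObjectReference is a hypothetical, trimmed-down stand-in for a Kubernetes
// object reference. The KEP may use a different type or field set.
type ObjectReference struct {
	APIVersion string `json:"apiVersion,omitempty"`
	Kind       string `json:"kind,omitempty"`
	Namespace  string `json:"namespace,omitempty"`
	Name       string `json:"name"`
}

// ClusterDecision references a target ClusterProfile for placement.
type ClusterDecision struct {
	// Reference to the target ClusterProfile.
	ClusterProfileRef ObjectReference `json:"clusterProfileRef"`
	// Reason is an assumed field name for the optional per-decision reason.
	Reason string `json:"reason,omitempty"`
}

// encodeDecision serializes a decision to JSON so the wire shape is visible.
func encodeDecision(d ClusterDecision) (string, error) {
	b, err := json.Marshal(d)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	d := ClusterDecision{
		ClusterProfileRef: ObjectReference{
			Kind:      "ClusterProfile",
			Namespace: "fleet", // example namespace, not from the KEP
			Name:      "cluster-a",
		},
	}
	s, err := encodeDecision(d)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(s)
}
```

A generic reference type keeps the decision entry decoupled from a dedicated `ClusterProfileRef` type, which appears to be the motivation behind the review comment.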

JeremyOT · 2025-07-22T15:04:43Z

keps/sig-multicluster/5313-placement-decision-api/README.md

+  The scheduler may choose to populate the reason for each decision for consumers/end-users
+  (i.e., for debugging purposes).
+
+- **Update / Reschedule**: The scheduler may add or remove clusters in decisions at any time.


What action is the consumer expected to take on change?

Added actions and examples on change.
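
The thread does not quote the actions that were added to the KEP, so the following is only a sketch of one plausible consumer reaction to an update, assuming the consumer deploys to newly listed clusters and cleans up from clusters that were dropped. The function name `diffClusters` and the use of plain cluster names are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// diffClusters compares the cluster names from the previous and current
// decision lists and reports which clusters were added and which were removed.
// Results are sorted so the output is deterministic.
func diffClusters(previous, current []string) (added, removed []string) {
	prev := make(map[string]struct{}, len(previous))
	for _, name := range previous {
		prev[name] = struct{}{}
	}
	cur := make(map[string]struct{}, len(current))
	for _, name := range current {
		cur[name] = struct{}{}
	}
	for name := range cur {
		if _, ok := prev[name]; !ok {
			added = append(added, name)
		}
	}
	for name := range prev {
		if _, ok := cur[name]; !ok {
			removed = append(removed, name)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	added, removed := diffClusters(
		[]string{"cluster-a", "cluster-b"},
		[]string{"cluster-b", "cluster-c"},
	)
	// Assumed consumer policy: deploy to added clusters, clean up removed ones.
	fmt.Println("deploy to:", added)
	fmt.Println("clean up from:", removed)
}
```

Whether a consumer should remove workloads immediately or drain them gradually is a policy choice this sketch does not address.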

JeremyOT · 2025-07-22T15:13:21Z

keps/sig-multicluster/5313-placement-decision-api/README.md

+  When the cluster set itself has not changed, this stable ordering produces an identical set of clusters,
+  so the API server skips the write and no extra change events reach consumers.
+
+- **Delete**: When a placement is no longer required,


What does it mean that a placement is no longer required? What change triggers this?

Updated explanation of what it means to be no longer required and examples of toggles.
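
The stable-ordering behavior quoted in the hunk above can be sketched as follows. This is a client-side illustration rather than the API server's own no-op detection, and it assumes that the stable order is a lexical sort by cluster name and that names are unique; neither detail is confirmed in this thread. `normalize` and `needsWrite` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"sort"
)

// normalize returns a sorted copy of the cluster names so that the same set
// of clusters always serializes in the same order.
func normalize(clusters []string) []string {
	out := append([]string(nil), clusters...)
	sort.Strings(out)
	return out
}

// needsWrite reports whether the newly computed cluster list differs from the
// stored one once both are put into the stable order.
func needsWrite(stored, computed []string) bool {
	a, b := normalize(stored), normalize(computed)
	if len(a) != len(b) {
		return true
	}
	for i := range a {
		if a[i] != b[i] {
			return true
		}
	}
	return false
}

func main() {
	stored := []string{"cluster-a", "cluster-b"}
	// Same set in a different order: no write, so no change event.
	fmt.Println(needsWrite(stored, []string{"cluster-b", "cluster-a"}))
	// Different set: the decision list has to be updated.
	fmt.Println(needsWrite(stored, []string{"cluster-a", "cluster-c"}))
}
```

Sorting before comparing means a scheduler that recomputes the same set in a different order does not trigger spurious updates for consumers.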

Signed-off-by: Mike Ng <ming@redhat.com>

lauralorenz · 2025-08-12T16:40:55Z

Triage notes:

Conversation right now is about the origin position of having a placement decision API without a placement request API.
For now added it as a non-goal -- is that enough?
If not: at least looking for more motivation on the limits and scope of the placement decision API by itself without the placement request part
Want to determine if this KEP can continue forward with the "one half" of the situation (response part without request)

skitt · 2025-08-27T16:24:05Z

keps/sig-multicluster/5313-placement-decision-api/README.md

+  apiGroup: multicluster.x-k8s.io
+  kind: Placement


Can this example be reworked to avoid relying on an object that hasn’t been defined yet, as far as I’m aware? If this KEP were approved, what would implementations look like? Are there common characteristics that an object referenced here would have?

How would a non-scheduler-specific or aware consumer use this without knowledge of the Placement object?

k8s-ci-robot added the sig/multicluster (relevant to SIG Multicluster) and cncf-cla: yes (the PR's author has signed the CNCF CLA) labels May 17, 2025

k8s-ci-robot requested review from JeremyOT and skitt May 17, 2025 23:48

k8s-ci-robot added the kind/kep (KEP tracking issues and PRs modifying the KEP directory) and needs-ok-to-test (requires an org member to verify it is safe to test) labels May 17, 2025

k8s-ci-robot added the size/XL label (a PR that changes 500-999 lines, ignoring generated files) May 17, 2025

k8s-ci-robot assigned deads2k and RainbowMango May 17, 2025

mikeshng mentioned this pull request May 17, 2025: Placement Decision API for multicluster scheduling #5313 (Open, 4 tasks)

haoqing0110 reviewed May 19, 2025

keps/sig-multicluster/5313-placement-decision-api/README.md (outdated, resolved)

zhiying-lin reviewed May 19, 2025

keps/sig-multicluster/5313-placement-decision-api/README.md (outdated, resolved)

keps/sig-multicluster/5313-placement-decision-api/README.md (outdated, resolved)

keps/sig-multicluster/5313-placement-decision-api/README.md (resolved)

mikeshng force-pushed the placement-decision-api branch from 9ca10ab to 3406d3d May 19, 2025 16:02

corentone reviewed May 19, 2025

qiujian16 reviewed May 20, 2025

RainbowMango reviewed May 20, 2025

keps/sig-multicluster/5313-placement-decision-api/README.md (outdated, resolved)

RainbowMango reviewed May 20, 2025

keps/sig-multicluster/5313-placement-decision-api/README.md (resolved)

mikeshng force-pushed the placement-decision-api branch from 3406d3d to 881956f May 25, 2025 15:34

zhiying-lin reviewed May 26, 2025

mikeshng force-pushed the placement-decision-api branch 4 times, most recently from 280c3b3 to 971facb May 27, 2025 21:56

ryanzhang-oss reviewed Jun 3, 2025

mikeshng force-pushed the placement-decision-api branch from 971facb to f04fa5a June 3, 2025 22:51

k8s-ci-robot assigned skitt Jun 16, 2025

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Jun 16, 2025

k8s-ci-robot added the ok-to-test label (a non-member PR verified by an org member that is safe to test) and removed the needs-ok-to-test label Jun 16, 2025

mikeshng force-pushed the placement-decision-api branch from f04fa5a to 272a8e6 June 16, 2025 17:24

k8s-ci-robot removed the lgtm label Jun 16, 2025

mikeshng mentioned this pull request Jul 14, 2025: TAG Workloads Foundation Tech Lead Nomination cncf/toc#1660 (Closed)

JeremyOT reviewed Jul 22, 2025

KEP Placement Decision API

a9ba48c

Signed-off-by: Mike Ng <ming@redhat.com>

mikeshng force-pushed the placement-decision-api branch from 272a8e6 to a9ba48c August 11, 2025 18:30

skitt mentioned this pull request Aug 27, 2025: add WG AI Gateway kubernetes/community#8521 (Open)

skitt reviewed Aug 27, 2025


ryanzhang-oss · 2025-08-12 (edited)