Commit 507a64b

promote KEP-4742 to beta
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
1 parent ef7e11d commit 507a64b

2 files changed: +50 -44 lines changed

keps/sig-node/4742-node-topology-downward-api/README.md

Lines changed: 47 additions & 41 deletions
@@ -12,10 +12,10 @@
1212
- [Risks and Mitigations](#risks-and-mitigations)
1313
- [Design Details](#design-details)
1414
- [Test Plan](#test-plan)
15-
- [Prerequisite testing updates](#prerequisite-testing-updates)
16-
- [Unit tests](#unit-tests)
17-
- [Integration tests](#integration-tests)
18-
- [e2e tests](#e2e-tests)
15+
- [Prerequisite testing updates](#prerequisite-testing-updates)
16+
- [Unit tests](#unit-tests)
17+
- [Integration tests](#integration-tests)
18+
- [e2e tests](#e2e-tests)
1919
- [Graduation Criteria](#graduation-criteria)
2020
- [Alpha](#alpha)
2121
- [Beta](#beta)
@@ -59,7 +59,6 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
5959

6060
[kubernetes.io]: https://kubernetes.io/
6161
[kubernetes/enhancements]: https://git.k8s.io/enhancements
62-
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
6362
[kubernetes/website]: https://git.k8s.io/website
6463

6564
## Summary
@@ -76,7 +75,8 @@ promoting a more secure and consistent solution.
7675
Topology awareness is crucial for a growing number of applications and workloads on Kubernetes.
7776
Knowing the Pod's location within the cluster's topology allows for significant performance
7877
optimizations and improved resilience. End-user feedback has highlighted the following key use cases:
79-
* High-bandwidth AI/ML workloads, especially those employing distributed training or inference,
78+
79+
- High-bandwidth AI/ML workloads, especially those employing distributed training or inference,
8080
demonstrate a substantial need for topology awareness. The performance of these workloads is
8181
highly sensitive to communication latency between GPUs. By preferentially scheduling Pods
8282
requiring GPU-to-GPU communication within the same zone or rack, training and inference
@@ -85,10 +85,10 @@ optimizations and improved resilience. End-user feedback has highlighted the fol
8585
within the machine learning framework itself (e.g., Ray, PyTorch Distributed, etc.). Kubernetes provides the
8686
necessary foundation for these frameworks to operate efficiently, but the frameworks are ultimately responsible
8787
for leveraging that foundation.
88-
* CNI plugins can leverage topology information to establish more efficient network paths,
88+
- CNI plugins can leverage topology information to establish more efficient network paths,
8989
reducing latency and increasing throughput. For instance, a CNI could prioritize connections
9090
within the same availability zone or rack.
91-
* Distributed stateful applications, such as sharded databases, can use topology information to improve fault tolerance.
91+
- Distributed stateful applications, such as sharded databases, can use topology information to improve fault tolerance.
9292
By spreading replicas and traffic across different failure domains (e.g., zones, racks), these applications can achieve
9393
higher availability and resilience to failures in any one topology.
9494

@@ -98,16 +98,16 @@ we aim to simplify access to this information for Pods via the Downward API.
9898

9999
### Goals
100100

101-
* Values from Node labels `topology.k8s.io/zone`, `topology.k8s.io/region` and `kubernetes.io/hostname` are made
101+
- Values from Node labels `topology.kubernetes.io/zone`, `topology.kubernetes.io/region` and `kubernetes.io/hostname` are made
102102
available via downward API
103-
* Additional node labels can be made available via downward API using admission webhooks that mutate `pods/binding`.
103+
- Additional node labels can be made available via downward API using admission webhooks that mutate `pods/binding`.
104104

105105
### Non-Goals
106106

107-
* Exposing non-standard node labels by default
108-
* Enhancements to topology-aware scheduling
109-
* Changes to standard topology labels in Kubernetes
110-
* Changes to downward API
107+
- Exposing non-standard node labels by default
108+
- Enhancements to topology-aware scheduling
109+
- Changes to standard topology labels in Kubernetes
110+
- Changes to downward API
111111

112112
## Proposal
113113

@@ -117,6 +117,7 @@ adding topology labels that match those of the target Node. The Binding REST imp
117117
from the `pods/binding` subresource to the assigned Pod's labels.
118118

119119
Using the downward API to retrieve topology information will look similar to the following:
120+
120121
```
121122
apiVersion: v1
122123
kind: Pod
@@ -147,36 +148,35 @@ spec:
147148

148149
### User Stories
149150

150-
* As an ML engineer, I want to optimize GPU-to-GPU communication during training which requires topology-awareness in my training code.
151-
* As a database developer, I want to leverage topology information to improve fault tolerance of sharded databases on Kubernetes.
152-
* As a developer of a Kubernetes CNI plugin, I want to pass topology information down to the CNI plugin to optimize data paths for container networks.
151+
- As an ML engineer, I want to optimize GPU-to-GPU communication during training which requires topology-awareness in my training code.
152+
- As a database developer, I want to leverage topology information to improve fault tolerance of sharded databases on Kubernetes.
153+
- As a developer of a Kubernetes CNI plugin, I want to pass topology information down to the CNI plugin to optimize data paths for container networks.
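
To make the first user story concrete, here is a minimal sketch of how training code might consume topology values once they reach the container. It assumes the Pod spec maps the topology labels to environment variables through the downward API (a `fieldRef` on `metadata.labels[...]`). The variable names `TOPOLOGY_ZONE`, `TOPOLOGY_REGION` and `NODE_HOSTNAME` and the helpers `read_topology` and `same_zone` are hypothetical and are not defined by this KEP.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Topology:
    zone: Optional[str]
    region: Optional[str]
    hostname: Optional[str]


def read_topology(env: Mapping[str, str] = os.environ) -> Topology:
    """Read topology values from (hypothetical) downward API env vars.

    Unset or empty variables become None, which covers Nodes that do not
    carry the corresponding label.
    """
    return Topology(
        zone=env.get("TOPOLOGY_ZONE") or None,
        region=env.get("TOPOLOGY_REGION") or None,
        hostname=env.get("NODE_HOSTNAME") or None,
    )


def same_zone(a: Topology, b: Topology) -> bool:
    """True only when both zones are known and equal."""
    return a.zone is not None and a.zone == b.zone


if __name__ == "__main__":
    print(read_topology())
```

Because the values are passed at creation time and may go stale, a consumer like this should treat them as hints rather than guarantees.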
153154

154155
### Notes/Constraints/Caveats (Optional)
155156

156157
### Risks and Mitigations
157158

158-
* Scope creep. Allowing additional node information or node label info could
159+
- Scope creep. Allowing additional node information or node label info could
159160
create security issues. This is mitigated by limiting the node labels
160161
to strictly those that are standardized through KEP-1659.
161162

162-
* Exposing sensitive data as node labels to pods. This is mitigated by ensuring
163+
- Exposing sensitive data as node labels to pods. This is mitigated by ensuring
163164
standard topology labels are available to Pods.
164165

165-
* Stale data. Information obtained through node labels is like information
166+
- Stale data. Information obtained through node labels is like information
166167
attained through a configmap or secret mounted to a pod, being passed on
167168
creation but not guaranteed to be immutable and thus should be treated as so.
168169

169-
170170
## Design Details
171171

172-
* A built-in Kubernetes admission plugin, `PodTopologyLabels` will be introduced in kube-apiserver
173-
* The `PodTopologyLabels` admission plugin is responsible for mutating `pods/binding` subresource, adding topology labels matching the target Node.
174-
* `PodTopologyLabels` admission will overwrite `topology.k8s.io/*` labels on Pods.
175-
* A feature gate, `PodTopologyLabelsAdmission` will be introduced in v1.33. Alpha and disabled by default.
172+
- A built-in Kubernetes admission plugin, `PodTopologyLabels` will be introduced in kube-apiserver
173+
- The `PodTopologyLabels` admission plugin is responsible for mutating `pods/binding` subresource, adding topology labels matching the target Node.
174+
- `PodTopologyLabels` admission will overwrite `topology.kubernetes.io/*` labels on Pods.
175+
- A feature gate, `PodTopologyLabelsAdmission` will be introduced in v1.33. Alpha and disabled by default.
176176
The `PodTopologyLabels` admission plugin can only be set when this feature gate is enabled.
177-
* The Binding REST implementation will be updated to copy all labels from `pods/binding` subresource into Pods.
177+
- The Binding REST implementation will be updated to copy all labels from `pods/binding` subresource into Pods.
178178
At this point we will overwrite Pod labels in Binding that are allowed to be exposed via Downward API.
179-
* For exposing additional node labels, at the discretion of the cluster admin, a mutating admission webhook can be used to mutate labels of the `pods/binding` subresource.
179+
- For exposing additional node labels, at the discretion of the cluster admin, a mutating admission webhook can be used to mutate labels of the `pods/binding` subresource.
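
The two steps described in this list can be sketched as follows. This is an illustrative Python model, not the kube-apiserver implementation: the function names are hypothetical, and treating the three labels from the Goals section as a fixed allow-list is an assumption of the sketch.

```python
from typing import Dict, Mapping

# Labels named in the Goals section; using them as a fixed allow-list
# is an assumption of this sketch.
TOPOLOGY_LABEL_KEYS = (
    "topology.kubernetes.io/zone",
    "topology.kubernetes.io/region",
    "kubernetes.io/hostname",
)


def topology_labels_for_binding(node_labels: Mapping[str, str]) -> Dict[str, str]:
    """Admission step: select the topology labels present on the target Node.

    Labels the Node does not carry are simply not copied.
    """
    return {k: node_labels[k] for k in TOPOLOGY_LABEL_KEYS if k in node_labels}


def apply_binding_labels(
    pod_labels: Mapping[str, str], binding_labels: Mapping[str, str]
) -> Dict[str, str]:
    """Binding step: copy binding labels onto the Pod, overwriting existing keys."""
    merged = dict(pod_labels)
    merged.update(binding_labels)
    return merged


if __name__ == "__main__":
    node = {"topology.kubernetes.io/zone": "zone-a", "team": "infra"}
    pod = {"app": "trainer", "topology.kubernetes.io/zone": "stale"}
    print(apply_binding_labels(pod, topology_labels_for_binding(node)))
```

In this model, a mutating admission webhook that exposes extra node labels would add its own keys to the binding labels before the second step runs.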
180180

181181
### Test Plan
182182

@@ -189,29 +189,32 @@ to implement this enhancement.
189189
##### Unit tests
190190

191191
Unit tests will be added for:
192-
* New `PodTopologyLabels` admission plugin
193-
* Binding REST implementation
192+
193+
- New `PodTopologyLabels` admission plugin
194+
- Binding REST implementation
194195

195196
Unit tests will also ensure behavior is exercised when the feature gate is enabled.
196197

197198
##### Integration tests
198199

199200
Integration tests will be added to test the following behavior:
200-
* Pods contain topology labels when `PodTopologyLabels` admission plugin is enabled.
201-
* Topology labels can be expressed in Pod Downward API
202-
* Node labels outside standard topology labels are disallowed
203-
* Topology labels are empty if underlying Node does not specify a topology
201+
202+
- Pods contain topology labels when `PodTopologyLabels` admission plugin is enabled.
203+
- Topology labels can be expressed in Pod Downward API
204+
- Node labels outside standard topology labels are disallowed
205+
- Topology labels are empty if underlying Node does not specify a topology
204206

205207
Integration tests will also ensure behavior is exercised when the feature gate is enabled.
206208

207209
##### e2e tests
208210

209211
E2E tests will be added to test the following scenarios:
210-
* Pods contain topology labels when `PodTopologyLabels` admission plugin is enabled.
211-
* A Pod using downward API to retrieve the underlying topology information about the Node
212-
* A Pod attempting to use downward API to retrieve Node labels that are not the standard topology labels
213-
* A Pod using downward API on a Node that does not contain any topology information.
214-
* Use MutatingAdmissionPolicy to add a custom node label that can be used via downward API.
212+
213+
- Pods contain topology labels when `PodTopologyLabels` admission plugin is enabled.
214+
- A Pod using downward API to retrieve the underlying topology information about the Node
215+
- A Pod attempting to use downward API to retrieve Node labels that are not the standard topology labels
216+
- A Pod using downward API on a Node that does not contain any topology information.
217+
- Use MutatingAdmissionPolicy to add a custom node label that can be used via downward API.
215218

216219
E2E tests will also ensure behavior is exercised when the feature gate is enabled.
217220

@@ -225,7 +228,8 @@ E2E tests will also ensure behavior is exercised when the feature gate is enable
225228

226229
#### Beta
227230

228-
TODO after Alpha.
231+
- Fix standard topology label used in PodTopologyLabels admission controller (topology.k8s.io -> topology.kubernetes.io)
232+
- Unit, integration and e2e tests
229233

230234
#### GA
231235

@@ -290,7 +294,9 @@ Tests will be added to ensure feature gate works as expected.
290294

291295
### Rollout, Upgrade and Rollback Planning
292296

293-
TODO for Beta.
297+
Manual testing will be exercised to ensure that PodTopologyLabelsAdmission can be enabled and then disabled.
298+
When disabled, existing Pods with topology labels will continue to run with those labels and new Pods will no longer
299+
contain topology labels.
294300

295301
###### How can a rollout or rollback fail? Can it impact already running workloads?
296302

@@ -333,7 +339,6 @@ N/A
333339

334340
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
335341

336-
337342
- [X] Metrics
338343
- Metric name: `pod_scheduling_attempts`
339344
- [Optional] Aggregation method:
@@ -402,6 +407,7 @@ Revert feature gate and stop consuming downward API.
402407
## Implementation History
403408

404409
- `v1.33`: initial KEP is accepted and alpha implementation is complete
410+
- `v1.34`: fix topology labels from topology.k8s.io to topology.kubernetes.io
405411

406412
## Drawbacks
407413

keps/sig-node/4742-node-topology-downward-api/kep.yaml

Lines changed: 3 additions & 3 deletions
@@ -21,17 +21,17 @@ see-also:
2121
replaces:
2222

2323
# The target maturity stage in the current dev cycle for this KEP.
24-
stage: alpha
24+
stage: beta
2525

2626
# The most recent milestone for which work toward delivery of this KEP has been
2727
# done. This can be the current (upcoming) milestone, if it is being actively
2828
# worked on.
29-
latest-milestone: "v1.33"
29+
latest-milestone: "v1.34"
3030

3131
# The milestone at which this feature was, or is targeted to be, at each stage.
3232
milestone:
3333
alpha: "v1.33"
34-
# beta: "v1.34"
34+
beta: "v1.34"
3535
# stable: "v1.35"
3636

3737
# The following PRR answers are required at alpha release
