@@ -76,7 +75,8 @@ promoting a more secure and consistent solution.
76
75
Topology awareness is crucial for a growing number of applications and workloads on Kubernetes.
77
76
Knowing a Pod's location within the cluster's topology allows for significant performance
78
77
optimizations and improved resilience. End-user feedback has highlighted the following key use cases:
79
-
* High-bandwidth AI/ML workloads, especially those employing distributed training or inference,
78
+
79
+
- High-bandwidth AI/ML workloads, especially those employing distributed training or inference,
80
80
demonstrate a substantial need for topology awareness. The performance of these workloads is
81
81
highly sensitive to communication latency between GPUs. By preferentially scheduling Pods
82
82
requiring GPU-to-GPU communication within the same zone or rack, training and inference
@@ -85,10 +85,10 @@ optimizations and improved resilience. End-user feedback has highlighted the fol
85
85
within the machine learning framework itself (e.g., Ray or PyTorch Distributed). Kubernetes provides the
86
86
necessary foundation for these frameworks to operate efficiently, but the frameworks are ultimately responsible
87
87
for leveraging that foundation.
88
-
* CNI plugins can leverage topology information to establish more efficient network paths,
88
+
- CNI plugins can leverage topology information to establish more efficient network paths,
89
89
reducing latency and increasing throughput. For instance, a CNI could prioritize connections
90
90
within the same availability zone or rack.
91
-
* Distributed stateful applications, such as sharded databases, can use topology information to improve fault tolerance.
91
+
- Distributed stateful applications, such as sharded databases, can use topology information to improve fault tolerance.
92
92
By spreading replicas and traffic across different failure domains (e.g., zones, racks), these applications can achieve
93
93
higher availability and resilience to failures in any single failure domain.
94
94
@@ -98,16 +98,16 @@ we aim to simplify access to this information for Pods via the Downward API.
98
98
99
99
### Goals
100
100
101
-
* Values from Node labels `topology.k8s.io/zone`, `topology.k8s.io/region` and `kubernetes.io/hostname` are made
101
+
- Values from Node labels `topology.kubernetes.io/zone`, `topology.kubernetes.io/region` and `kubernetes.io/hostname` are made
102
102
available via downward API
103
-
* Additional node labels can be made available via downward API using admission webhooks that mutate `pods/binding`.
103
+
- Additional node labels can be made available via downward API using admission webhooks that mutate `pods/binding`.
104
104
105
105
### Non-Goals
106
106
107
-
* Exposing non-standard node labels by default
108
-
* Enhancements to topology-aware scheduling
109
-
* Changes to standard topology labels in Kubernetes
110
-
* Changes to downward API
107
+
- Exposing non-standard node labels by default
108
+
- Enhancements to topology-aware scheduling
109
+
- Changes to standard topology labels in Kubernetes
110
+
- Changes to downward API
111
111
112
112
## Proposal
113
113
@@ -117,6 +117,7 @@ adding topology labels that match those of the target Node. The Binding REST imp
117
117
from the `pods/binding` subresource to the assigned Pod's labels.
118
118
119
119
Using the downward API to retrieve topology information will look similar to the following:
120
+
120
121
```
121
122
apiVersion: v1
122
123
kind: Pod
@@ -147,36 +148,35 @@ spec:
147
148
148
149
### User Stories
149
150
150
-
* As an ML engineer, I want to optimize GPU-to-GPU communication during training which requires topology-awareness in my training code.
151
-
* As a database developer, I want to leverage topology information to improve fault tolerance of sharded databases on Kubernetes.
152
-
* As a developer of a Kubernetes CNI plugin, I want to pass topology information down to the CNI plugin to optimize data paths for container networks.
151
+
- As an ML engineer, I want to optimize GPU-to-GPU communication during training, which requires topology awareness in my training code.
152
+
- As a database developer, I want to leverage topology information to improve fault tolerance of sharded databases on Kubernetes.
153
+
- As a developer of a Kubernetes CNI plugin, I want to pass topology information down to the CNI plugin to optimize data paths for container networks.
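
As an illustration of the first user story, the sketch below shows how training code might read topology labels from inside a Pod. It assumes the Pod mounts `metadata.labels` through a downward API volume, which writes one `key="value"` pair per line. The mount path `/etc/podinfo/labels` and the function names `parse_labels` and `read_topology` are hypothetical, and the parsing is deliberately minimal: it relies on label values not containing quotes, which Kubernetes label syntax does not allow.

```python
from pathlib import Path

# Standard topology labels named in the Goals section of this KEP.
TOPOLOGY_KEYS = (
    "topology.kubernetes.io/zone",
    "topology.kubernetes.io/region",
    "kubernetes.io/hostname",
)


def parse_labels(text: str) -> dict[str, str]:
    """Parse downward API label lines of the form key="value"."""
    labels = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, _, raw = line.partition("=")
        # Strip the surrounding double quotes if both are present.
        if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
            raw = raw[1:-1]
        labels[key] = raw
    return labels


def read_topology(path: str = "/etc/podinfo/labels") -> dict[str, str]:
    """Return only the topology labels; empty if the file is missing.

    The default path is a hypothetical mount point, not one defined by the KEP.
    """
    file = Path(path)
    if not file.exists():
        return {}
    labels = parse_labels(file.read_text())
    return {k: v for k, v in labels.items() if k in TOPOLOGY_KEYS}
```

Because the labels may be absent (for example, when the Node carries no topology labels), callers should treat a missing key as "unknown" rather than as an error.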
153
154
154
155
### Notes/Constraints/Caveats (Optional)
155
156
156
157
### Risks and Mitigations
157
158
158
-
* Scope creep. Allowing additional node information or node label info could
159
+
- Scope creep. Allowing additional node information or node label info could
159
160
create security issues. This is mitigated by limiting the node labels
160
161
to strictly those that are standardized through KEP-1659.
161
162
162
-
* Exposing sensitive data as node labels to pods. This is mitigated by ensuring
163
+
- Exposing sensitive data as node labels to pods. This is mitigated by ensuring only
163
164
standard topology labels are available to Pods.
164
165
165
-
* Stale data. Information obtained through node labels is like information
166
+
- Stale data. Information obtained through node labels is like information
166
167
obtained through a ConfigMap or Secret mounted to a Pod, being passed on
167
168
creation but not guaranteed to be immutable, and should be treated accordingly.
168
169
169
-
170
170
## Design Details
171
171
172
-
* A built-in Kubernetes admission plugin, `PodTopologyLabels` will be introduced in kube-apiserver
173
-
* The `PodTopologyLabels` admission plugin is responsible for mutating `pods/binding` subresource, adding topology labels matching the target Node.
174
-
* `PodTopologyLabels` admission will overwrite `topology.k8s.io/*` labels on Pods.
175
-
* A feature gate, `PodTopologyLabelsAdmission` will be introduced in v1.33. Alpha and disabled by default.
172
+
- A built-in Kubernetes admission plugin, `PodTopologyLabels`, will be introduced in kube-apiserver.
173
+
- The `PodTopologyLabels` admission plugin is responsible for mutating the `pods/binding` subresource, adding topology labels matching the target Node.
174
+
- `PodTopologyLabels` admission will overwrite `topology.kubernetes.io/*` labels on Pods.
175
+
- A feature gate, `PodTopologyLabelsAdmission`, will be introduced in v1.33 as Alpha and disabled by default.
176
176
The `PodTopologyLabels` admission plugin can only be set when this feature gate is enabled.
177
-
* The Binding REST implementation will be updated to copy all labels from `pods/binding` subresource into Pods.
177
+
- The Binding REST implementation will be updated to copy all labels from the `pods/binding` subresource into Pods.
178
178
At this point we will overwrite Pod labels in Binding that are allowed to be exposed via Downward API.
179
-
* For exposing additional node labels, at the discretion of the cluster admin, a mutating admission webhook can be used to mutate labels of the `pods/binding` subresource.
179
+
- For exposing additional node labels, at the discretion of the cluster admin, a mutating admission webhook can be used to mutate labels of the `pods/binding` subresource.
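
The flow described in these bullets can be sketched roughly as follows. This is a simplified Python model, not the kube-apiserver implementation: the function names `mutate_binding_labels` and `apply_binding_to_pod` are hypothetical, labels are modeled as plain dictionaries, and the set of copied keys is assumed to be the three labels listed under Goals.

```python
# Assumption: the copied labels are the three named under Goals.
TOPOLOGY_KEYS = (
    "topology.kubernetes.io/zone",
    "topology.kubernetes.io/region",
    "kubernetes.io/hostname",
)


def mutate_binding_labels(binding_labels: dict, node_labels: dict) -> dict:
    """Model of the admission step.

    Adds the target Node's topology labels to the labels of the
    pods/binding object. Non-topology Node labels are never copied.
    """
    result = dict(binding_labels)
    for key in TOPOLOGY_KEYS:
        if key in node_labels:
            result[key] = node_labels[key]  # overwrite any existing value
    return result


def apply_binding_to_pod(pod_labels: dict, binding_labels: dict) -> dict:
    """Model of the Binding REST step.

    Copies every label on the binding into the Pod's labels,
    overwriting Pod labels that share a key.
    """
    merged = dict(pod_labels)
    merged.update(binding_labels)
    return merged
```

In this model, a mutating admission webhook that exposes additional node labels would simply add more keys to the binding labels before `apply_binding_to_pod` runs; the built-in plugin itself only ever contributes the standard topology keys.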
180
180
181
181
### Test Plan
182
182
@@ -189,29 +189,32 @@ to implement this enhancement.
189
189
##### Unit tests
190
190
191
191
Unit tests will be added for:
192
-
* New `PodTopologyLabels` admission plugin
193
-
* Binding REST implementation
192
+
193
+
- New `PodTopologyLabels` admission plugin
194
+
- Binding REST implementation
194
195
195
196
Unit tests will also ensure behavior is exercised when the feature gate is enabled.
196
197
197
198
##### Integration tests
198
199
199
200
Integration tests will be added to test the following behavior:
200
-
* Pods contain topology labels when `PodTopologyLabels` admission plugin is enabled.
201
-
* Topology labels can be expressed in Pod Downward API
202
-
* Node labels outside standard topology labels are disallowed
203
-
* Topology labels are empty if underlying Node does not specify a topology
201
+
202
+
- Pods contain topology labels when `PodTopologyLabels` admission plugin is enabled.
203
+
- Topology labels can be expressed in Pod Downward API
204
+
- Node labels outside standard topology labels are disallowed
205
+
- Topology labels are empty if underlying Node does not specify a topology
204
206
205
207
Integration tests will also ensure behavior is exercised when the feature gate is enabled.
206
208
207
209
##### e2e tests
208
210
209
211
E2E tests will be added to test the following scenarios:
210
-
* Pods contain topology labels when `PodTopologyLabels` admission plugin is enabled.
211
-
* A Pod using downward API to retrieve the underlying topology information about the Node
212
-
* A Pod attempting to use downward API to retrieve Node labels that are not the standard topology labels
213
-
* A Pod using downward API on a Node that does not contain any topology information.
214
-
* Use MutatingAdmissionPolicy to add a custom node label that can be used via downward API.
212
+
213
+
- Pods contain topology labels when `PodTopologyLabels` admission plugin is enabled.
214
+
- A Pod using downward API to retrieve the underlying topology information about the Node
215
+
- A Pod attempting to use downward API to retrieve Node labels that are not the standard topology labels
216
+
- A Pod using downward API on a Node that does not contain any topology information.
217
+
- Use MutatingAdmissionPolicy to add a custom node label that can be used via downward API.
215
218
216
219
E2E tests will also ensure behavior is exercised when the feature gate is enabled.
217
220
@@ -225,7 +228,8 @@ E2E tests will also ensure behavior is exercised when the feature gate is enable
225
228
226
229
#### Beta
227
230
228
-
TODO after Alpha.
231
+
- Fix the standard topology label prefix used in the `PodTopologyLabels` admission controller (`topology.k8s.io` -> `topology.kubernetes.io`)
232
+
- Unit, integration and e2e tests
229
233
230
234
#### GA
231
235
@@ -290,7 +294,9 @@ Tests will be added to ensure feature gate works as expected.
290
294
291
295
### Rollout, Upgrade and Rollback Planning
292
296
293
-
TODO for Beta.
297
+
Manual testing will be performed to ensure that the `PodTopologyLabelsAdmission` feature gate can be enabled and then disabled.
298
+
When disabled, existing Pods with topology labels will continue to run with those labels and new Pods will no longer
299
+
contain topology labels.
294
300
295
301
###### How can a rollout or rollback fail? Can it impact already running workloads?
296
302
@@ -333,7 +339,6 @@ N/A
333
339
334
340
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?