You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
@@ -394,17 +395,25 @@ The pod will be allocated on the 4 connected devices - mla0, mla1, mla4, and mla
394
395
395
396
#### Runaway expressions
396
397
397
-
A malicious or buggy workload can specify CEL expressions that degrade the performance of constraint evaluation and scheduling. We will specify a limit on evaluation cost for the expression. There is already a mechanism to cap this today with CEL selectors that we can reuse.[Reference](https://github.com/kubernetes/kubernetes/blob/6188e5cb7b2f106b047493b7b498c1882723cab4/pkg/apis/resource/types.go#L910-L933)
398
+
A malicious or buggy workload can specify CEL expressions that degrade the performance of constraint evaluation and scheduling. We will specify a limit on evaluation cost for the expression. There is already a mechanism to cap this today with CEL selectors that we can reuse [Reference](https://github.com/kubernetes/kubernetes/blob/6188e5cb7b2f106b047493b7b498c1882723cab4/pkg/apis/resource/types.go#L910-L933). Additionally, the scheduler supports a configurable timeout during filtering that can limit this evaluation [Reference](https://github.com/kubernetes/kubernetes/pull/132033).
398
399
399
400
#### Performance at scale
400
401
401
402
The feature depends on exhaustive search for devices. The worst-case performance impact of calculating all combinations is comparable to that of the pathological test case for existing MatchAttribute constraint. Scalability testing is needed to validate the performance of allocation.
402
403
403
404
Additionally, with the introduction of [workload-aware scheduling](https://github.com/kubernetes/kubernetes/issues/132192), the performance might take a hit as filtering can be executed multiple times.
404
405
405
-
As a partial mitigation for the performance impact, the scheduler plugin supports a configurable timeout that can be applied for a node. [Reference](https://github.com/kubernetes/enhancements/blob/5b1270421f5bc3315fe191c29e0356bc91cbfe6b/keps/sig-node/4381-dra-structured-parameters/README.md?plain=1#L2025-L2032)
406
+
##### Mitigation
406
407
407
-
We will need to revisit this solution before beta release.
408
+
To minimize the performance impact, users are recommended to trim down the solution space using existing filtering mechanisms before utilizing this constraint type, i.e. define it in the spec template after mechanisms such as CEL selectors and MatchAttribute. This will limit how many device combinations need to be evaluated. The feature is not expected to reach the scheduler timeout as the combination space has been reduced by previous filtering.
409
+
410
+
Note: Users are discouraged from specifying a CEL expression which takes into account the order of devices since this will result in permutational complexity instead of combinational complexity. If users specify expensive queries, it should be noted that the evaluation cost limit and scheduler timeout described earlier will limit the exhaustive search.
411
+
412
+
For example, if there are 12 devices to choose from and a user specifies needing 6 of those devices, the maximum number of calculations taken should be:
413
+
414
+
12C6 = 12! / (6!)(6!) = 924
415
+
416
+
If the user specifies an order in which these devices should be evaluated, i.e. set should contain device0 first, then device1, device2, etc., this will result in permutational complexity, and is strongly discouraged. Feature will not work and return an error in this case since it crosses the preset threshold for evaluation cost.
408
417
409
418
## Design Details
410
419
@@ -732,7 +741,7 @@ No
732
741
733
742
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
734
743
735
-
Yes, through feature gates. If the feature gate is disabled in kube-apiserver, kube-scheduler, or kube-controller-manager, the feature will not work. Existing running Pods continue unaffected. New workloads cannot use the feature, and pod creation will fail if attempting to use the feature. Existing pods that need rescheduling will remain in pending state. Pod description will notify users to enable the feature gate. The reason for not allowing existing pods to re-schedule is because users expect optimal placement for these pods, which cannot happen when the feature is disabled
744
+
Yes, through feature gates. If the feature gate is disabled in kube-apiserver, kube-scheduler, or kube-controller-manager, the feature will not work. Existing running Pods continue unaffected. New workloads cannot use the feature, and pod creation will fail if attempting to use the feature. Existing pods that need rescheduling will remain in pending state. Pod description will notify users to enable the feature gate. The reason for not allowing existing pods to re-schedule is because users expect optimal placement for these pods, which cannot happen when the feature is disabled.
736
745
737
746
###### What happens if we reenable the feature if it was previously rolled back?
0 commit comments