- [Pod Construction Changes](#pod-construction-changes)
@@ -186,21 +190,45 @@ type VolumeNodeResources struct {
186
190
187
191
#### CSIDriver
188
192
189
-
A new field, `NodeAllocatableUpdatePeriodSeconds`, will be added to the `CSIDriverSpec` struct. This field allows a CSI driver to specify the interval at which the Kubelet should periodically query a driver's `NodeGetInfo` RPC endpoint to update the `CSINode` object. If this field is not set, updates will only occur in response to volume attachment failures as a result of no capacity.
193
+
A new field, `NodeAllocatableUpdatePeriodSeconds`, will be added to the `CSIDriverSpec` struct. This field allows a CSI driver to specify the interval at which the Kubelet should periodically query a driver's `NodeGetInfo` RPC endpoint to update the `CSINode` object. If this field is not set, no updates occur (neither periodic nor upon detecting capacity-related failures), and the allocatable count remains static.
190
194
191
195
```golang
 // CSIDriverSpec is the specification of a CSIDriver.
 type CSIDriverSpec struct {
 	...
-	// NodeAllocatableUpdatePeriodSeconds specifies the interval between periodic updates of
-	// the CSINode allocatable capacity for this driver. If not set, periodic updates
-	// are disabled, and updates occur only upon detecting capacity-related failures.
-	// The minimum allowed value for this field is 10 seconds.
-	// +optional
+	// nodeAllocatableUpdatePeriodSeconds specifies the interval between periodic updates of
+	// the CSINode allocatable capacity for this driver. When set, both periodic updates and
+	// updates triggered by capacity-related failures are enabled. If not set, no updates
+	// occur (neither periodic nor upon detecting capacity-related failures), and the
+	// allocatable.count remains static. The minimum allowed value for this field is 10 seconds.
+	//
+	//
+	// This field is mutable.
+	//
+	// +featureGate=MutableCSINodeAllocatableCount
+	// +optional
 	NodeAllocatableUpdatePeriodSeconds *int64
 }
```
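The 10-second minimum stated in the field comment could be checked along the following lines. This is a minimal sketch, not the KEP's validation code: the helper `validateUpdatePeriod` and the constant `minUpdatePeriodSeconds` are hypothetical names introduced for illustration, and the real API validation would report errors through Kubernetes' own validation types rather than a plain `error`.

```golang
package main

import "fmt"

// minUpdatePeriodSeconds mirrors the 10-second minimum stated in the field
// comment. The constant name is hypothetical.
const minUpdatePeriodSeconds int64 = 10

// validateUpdatePeriod is a hypothetical helper. A nil period means the field
// is not set, which is valid (no updates occur). A set value must be at least
// minUpdatePeriodSeconds.
func validateUpdatePeriod(period *int64) error {
	if period == nil {
		return nil
	}
	if *period < minUpdatePeriodSeconds {
		return fmt.Errorf("nodeAllocatableUpdatePeriodSeconds must be at least %d, got %d",
			minUpdatePeriodSeconds, *period)
	}
	return nil
}

func main() {
	tooShort, valid := int64(5), int64(60)
	fmt.Println(validateUpdatePeriod(nil))       // unset field
	fmt.Println(validateUpdatePeriod(&tooShort)) // below the minimum
	fmt.Println(validateUpdatePeriod(&valid))    // accepted
}
```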
203
213
214
+
#### VolumeError
215
+
216
+
A new field, `ErrorCode`, will be added to the `VolumeError` struct to facilitate detection of capacity-related errors:
217
+
218
+
```golang
+// Captures an error encountered during a volume operation.
+type VolumeError struct {
+	...
+	// errorCode is a numeric gRPC code representing the error encountered during Attach or Detach operations.
+	//
+	// This is an optional field that requires the MutableCSINodeAllocatableCount feature gate being enabled to be set.
+	//
+	// +featureGate=MutableCSINodeAllocatableCount
+	// +optional
+	ErrorCode *int32
+}
```
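A consumer of this field might detect a capacity-related failure roughly as follows. This is a sketch under stated assumptions: `volumeError` is a simplified stand-in for the real `VolumeError` struct, and `isResourceExhausted` is a hypothetical helper. The only external fact used is that the gRPC `RESOURCE_EXHAUSTED` status code has the numeric value 8.

```golang
package main

import "fmt"

// grpcResourceExhausted is the numeric value of the gRPC RESOURCE_EXHAUSTED
// status code.
const grpcResourceExhausted int32 = 8

// volumeError is a simplified stand-in for VolumeError. Only the fields needed
// for this illustration are included.
type volumeError struct {
	Message   string
	ErrorCode *int32
}

// isResourceExhausted is a hypothetical helper. It returns false for a nil
// error or a nil ErrorCode (for example, when the feature gate is disabled and
// the field is never populated).
func isResourceExhausted(e *volumeError) bool {
	return e != nil && e.ErrorCode != nil && *e.ErrorCode == grpcResourceExhausted
}

func main() {
	code := grpcResourceExhausted
	withCode := &volumeError{Message: "attach limit reached", ErrorCode: &code}
	withoutCode := &volumeError{Message: "no code recorded"}

	fmt.Println(isResourceExhausted(withCode))
	fmt.Println(isResourceExhausted(withoutCode))
}
```

Using a pointer for `ErrorCode` lets callers distinguish "no code recorded" from the gRPC `OK` code, whose numeric value is 0.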
231
+
204
232
#### Validation Changes
205
233
206
234
The [ValidateCSINodeUpdate](https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/storage/validation/validation.go#L304) function in the API validation code path will be modified to allow updates to the `Allocatable.Count`
@@ -226,20 +254,53 @@ func ValidateCSINodeUpdate(new, old *storage.CSINode) field.ErrorList {
226
254
227
255
This updated logic allows the `Allocatable.Count` field to be modified when the feature gate is enabled, while ensuring all other fields remain immutable. When the feature gate is disabled, it falls back to the existing validation logic for backward compatibility.
228
256
229
-
#### Volume Plugin Manager
257
+
#### CSI Node Updater
258
+
259
+
A new plugin-level updater will be implemented in `kubernetes/pkg/volume/csi/csi_node_updater.go` to manage periodic updates of CSINode allocatable counts. This updater watches for changes to CSIDriver objects and manages per-driver update goroutines based on the `NodeAllocatableUpdatePeriodSeconds` setting.
230
260
231
-
A new goroutine will be started in VolumePluginMgr’s [Run()](https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/plugins.go#L953) func if the `NodeAllocatableUpdatePeriodSeconds` is set to a nonzero value. This goroutine will periodically trigger updates to the `CSINode` object based on the specified interval:
go wait.Until(pm.updateCSINodeInfo, pm.csiNodeUpdateInterval, stopCh)
264
+
```golang
+// csiNodeUpdater watches for changes to CSIDriver objects and manages the lifecycle
+// of per-driver goroutines that periodically update CSINodeDriver.Allocatable information
+type csiNodeUpdater struct {
+	// Informer for CSIDriver objects
+	driverInformer cache.SharedIndexInformer
+
+	// Map of driver names to stop channels for update goroutines
+	driverUpdaters sync.Map
+
+	// Ensures the updater is only started once
+	once sync.Once
+}
```
277
+
#### Update behavior
278
+
279
+
When a `CSIDriver` object is added or updated with `NodeAllocatableUpdatePeriodSeconds` set, the updater checks if the driver is installed on the node before running periodic updates.
280
+
281
+
When `NodeAllocatableUpdatePeriodSeconds` is modified, the updater automatically adjusts by stopping the old goroutine and starting a new one. Setting the period to 0 or nil stops updates entirely. Driver uninstallation or `CSIDriver` object deletion also stops the update goroutine for that specific driver.
282
+
283
+
```golang
+func (u *csiNodeUpdater) runPeriodicUpdate(driverName string, period time.Duration, stopCh <-chan struct{}) {
+	ticker := time.NewTicker(period)
+	defer ticker.Stop()
+
+	for {
+		select {
+		case <-ticker.C:
+			if err := updateCSIDriver(driverName); err != nil {
+				klog.ErrorS(err, "Failed to update CSIDriver", "driver", driverName)
+			}
+		case <-stopCh:
+			return
+		}
 	}
 }
```
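The stop-and-restart behavior described under "Update behavior" could be organized around a map of stop channels, as in the sketch below. It is not the KEP's implementation: `driverUpdaters`, `reconcile`, `stop`, and `waitForStop` are hypothetical names, the driver name `csi.example.com` is a placeholder, and the sketch assumes that calls for a given driver are serialized by the caller. The run function is injected so the example stays independent of the real update logic.

```golang
package main

import (
	"fmt"
	"sync"
	"time"
)

// driverUpdaters is a simplified stand-in for csiNodeUpdater: it maps driver
// names to the stop channel of their running update goroutine.
type driverUpdaters struct {
	stops sync.Map // driver name -> chan struct{}
}

// stop closes and forgets the stop channel for driverName, if one exists.
// It returns true when a goroutine was signalled to stop.
func (d *driverUpdaters) stop(driverName string) bool {
	ch, loaded := d.stops.LoadAndDelete(driverName)
	if !loaded {
		return false
	}
	close(ch.(chan struct{}))
	return true
}

// reconcile applies the current period for a driver. Any existing goroutine is
// stopped first; a new one is started only for a non-nil, positive period.
// It returns true when a new goroutine was started.
func (d *driverUpdaters) reconcile(driverName string, periodSeconds *int64,
	run func(string, time.Duration, <-chan struct{})) bool {
	d.stop(driverName)
	if periodSeconds == nil || *periodSeconds <= 0 {
		return false
	}
	stopCh := make(chan struct{})
	d.stops.Store(driverName, stopCh)
	go run(driverName, time.Duration(*periodSeconds)*time.Second, stopCh)
	return true
}

// waitForStop returns a run function that blocks until stopped and then
// reports the driver name on done. A real loop would also select on a ticker.
func waitForStop(done chan<- string) func(string, time.Duration, <-chan struct{}) {
	return func(name string, _ time.Duration, stopCh <-chan struct{}) {
		<-stopCh
		done <- name
	}
}

func main() {
	var d driverUpdaters
	done := make(chan string, 1)
	run := waitForStop(done)

	period := int64(10)
	fmt.Println("started:", d.reconcile("csi.example.com", &period, run))
	// Clearing the period stops the goroutine and starts nothing new.
	fmt.Println("started:", d.reconcile("csi.example.com", nil, run))
	fmt.Println("stopped goroutine for:", <-done)
}
```

Stopping before starting keeps at most one goroutine per driver, which matches the description of a period change as "stopping the old goroutine and starting a new one".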
240
300
241
-
In case of a failure during the `updateCSINodeInfo` call, the `Allocatable.Count` will retain its current value and `updateCSINodeInfo` will be retried.
301
+
#### Error handling
242
302
303
+
If `updateCSIDriver()` fails, the error is logged but the allocatable count retains its current value. Updates continue at the configured interval regardless of individual failures.
243
304
244
305
#### NodeInfoManager Interface Extension
245
306
@@ -262,7 +323,7 @@ This table explains how updates to the `CSINode.Spec.Drivers[*].Allocatable.Coun
262
323
|**Feature Flag Status**|**`NodeAllocatableUpdatePeriodSeconds`**|**Behavior**|
| Enabled | Set | Periodic updates occur at the defined interval, and reactive updates occur when an invalid state is detected (volume attachment failures due to `ResourceExhausted`)|
265
-
| Enabled | Not set |Updates occur only in response to volume attachment failures (`ResourceExhausted` errors)|
326
+
| Enabled | Not set |No updates occur; `Allocatable.Count` remains static|
266
327
| Disabled | Set |`NodeAllocatableUpdatePeriodSeconds` is ignored; `Allocatable.Count` remains static and immutable |
267
328
| Disabled | Not set | No updates occur; `Allocatable.Count` remains static and immutable |
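The table can be read as a small decision function. The sketch below encodes the rows as they stand after this change (the added "Enabled / Not set" row, not the removed one); `updateBehavior` and `behaviorFor` are hypothetical names used only for illustration.

```golang
package main

import "fmt"

// updateBehavior summarizes one row of the behavior table.
type updateBehavior struct {
	Periodic bool // periodic updates at the configured interval
	Reactive bool // updates triggered by ResourceExhausted attachment failures
}

// behaviorFor is a hypothetical helper that encodes the table: updates of
// either kind happen only when the feature gate is enabled and
// NodeAllocatableUpdatePeriodSeconds is set. In every other case
// Allocatable.Count remains static.
func behaviorFor(featureGateEnabled bool, periodSeconds *int64) updateBehavior {
	active := featureGateEnabled && periodSeconds != nil
	return updateBehavior{Periodic: active, Reactive: active}
}

func main() {
	period := int64(60)
	fmt.Printf("enabled, set:      %+v\n", behaviorFor(true, &period))
	fmt.Printf("enabled, not set:  %+v\n", behaviorFor(true, nil))
	fmt.Printf("disabled, set:     %+v\n", behaviorFor(false, &period))
	fmt.Printf("disabled, not set: %+v\n", behaviorFor(false, nil))
}
```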
268
329
@@ -271,7 +332,7 @@ This table explains how updates to the `CSINode.Spec.Drivers[*].Allocatable.Coun
271
332
272
333
To address race conditions where the scheduler assigns stateful pods to nodes with insufficient capacity, Kubelet's pod construction process during [WaitForAttachAndMount](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/volumemanager/volume_manager.go#L393) will now handle `ResourceExhausted` errors returned by CSI drivers during the `ControllerPublishVolume` RPC.
273
334
274
-
The `ResourceExhausted` error is directly reported on the `VolumeAttachment` object associated with the relevant attachment. To facilitate easier detection of `ResourceExhausted` errors from `VolumeAttachment` statuses, we propose adding a `StatusCode` field to the [VolumeError](https://github.com/kubernetes/api/blob/master/storage/v1/types.go#L219) struct.
335
+
The `ResourceExhausted` error is directly reported on the `VolumeAttachment` object associated with the relevant attachment. To facilitate easier detection of `ResourceExhausted` errors from `VolumeAttachment` statuses, we propose adding an `ErrorCode` field to the [VolumeError](https://github.com/kubernetes/api/blob/master/storage/v1/types.go#L219) struct.
- Components depending on the feature gate: `kube-apiserver`, `kube-controller-manager`, `kubelet`.
552
+
- Components depending on the feature gate: `kube-apiserver`, `kubelet`.
494
553
495
554
###### Does enabling the feature change any default behavior?
496
555
@@ -556,13 +615,28 @@ rollout. Similarly, consider large clusters and how enablement/disablement
556
615
will rollout across nodes.
557
616
-->
558
617
618
+
The rollout or rollback of this feature is designed such that it cannot fail in a way that impacts cluster operation.
619
+
620
+
During rollout, if the API server or Kubelet doesn't support the feature, or if there's a version mismatch, update attempts to `CSINode.Allocatable` will fail gracefully, maintaining the existing value. This ensures that the worst-case scenario is simply a continuation of the current behavior, rather than a failure state.
621
+
622
+
For rollback, disabling the feature gate will immediately stop any updates to the allocatable property. Kubernetes will continue using the last known value, which may be outdated but won't cause operational issues.
623
+
624
+
In essence, the feature's best-effort nature and feature gate protection make it resilient against rollout or rollback failures. The primary risk is temporary inconsistency in reported capacities during transition periods, but this does not impact running workloads or overall cluster stability.
625
+
559
626
###### What specific metrics should inform a rollback?
560
627
561
628
<!--
562
629
What signals should users be paying attention to when the feature is young
563
630
that might indicate a serious problem?
564
631
-->
565
632
633
+
Since this feature implements best-effort updates to `CSINode.Allocatable`, the only signals that would necessitate a rollback are:
634
+
635
+
- Unexpected kubelet crashes after enabling the feature.
636
+
- API server crashes related to CSINode updates.
637
+
638
+
In both cases, component crashes would be evident through standard monitoring of node and control plane health. Outside of these scenarios, there are no specific metrics that would require rolling back this feature, as failed updates simply maintain existing values.
639
+
566
640
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
567
641
568
642
<!--
@@ -571,12 +645,22 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
571
645
are missing a bunch of machinery and tooling and can't do that now.
572
646
-->
573
647
648
+
Yes, the following test scenarios were validated in the Alpha release:
649
+
650
+
- Upgrade path: API server and Kubelet upgrades were tested with the feature gate enabled, confirming that CSINode updates begin working once both components support the feature.
651
+
652
+
- Downgrade path: Confirmed that when the feature gate is disabled or components are downgraded, `CSINode.Allocatable` remains at its last value and becomes immutable again.
653
+
654
+
- Upgrade->downgrade->upgrade path: Verified that the full cycle works as expected, with CSINode updates resuming when the feature is re-enabled without requiring additional configuration.
655
+
574
656
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
575
657
576
658
<!--
577
659
Even if applying deprecation policies, they may still surprise some users.
578
660
-->
579
661
662
+
No.
663
+
580
664
### Monitoring Requirements
581
665
582
666
<!--
@@ -594,6 +678,8 @@ checking if there are objects with field X set) may be a last resort. Avoid
594
678
logs or events for this purpose.
595
679
-->
596
680
681
+
An operator can determine if this feature is in use by checking the CSIDriver objects in their cluster for the `nodeAllocatableUpdatePeriodSeconds` field. If this field is set on a CSI driver, the feature is being used. This is similar to how operators check for other CSI capabilities through fields in the CSIDriver object, such as `fsGroupPolicy` or `podInfoOnMount`.
682
+
597
683
###### How can someone using this feature know that it is working for their instance?
598
684
599
685
<!--
@@ -605,13 +691,9 @@ and operation of this feature.
605
691
Recall that end users cannot usually observe component logs or access metrics.
606
692
-->
607
693
608
-
-[ ] Events
609
-
- Event Reason:
610
-
-[ ] API .status
611
-
- Condition name:
612
-
- Other field:
613
-
-[ ] Other (treat as last resort)
614
-
- Details:
694
+
- [X] API .status
695
+
  - `VolumeAttachment.Status.Errors[].ErrorCode` will be populated with the gRPC error code when a `ResourceExhausted` error occurs during a driver's `ControllerPublishVolume` RPC.
696
+
  - `CSINode.Spec.Drivers[*].Allocatable.Count` will be updated periodically based on the `nodeAllocatableUpdatePeriodSeconds` configuration in the CSIDriver object.
615
697
616
698
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
617
699
@@ -630,18 +712,19 @@ These goals will help you determine what you need to measure (SLIs) in the next
630
712
question.
631
713
-->
632
714
715
+
For this enhancement, the following SLOs are reasonable:
716
+
717
+
- 99.9% of CSINode updates (both periodic and reactive) should complete within 1 second of being triggered.
718
+
- The introduction of this feature should not increase the overall API server error rate (5xx errors) by more than 0.1%.
719
+
- No measurable impact on pod startup latency, as CSINode updates are performed asynchronously.
720
+
633
721
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
634
722
635
723
<!--
636
724
Pick one or more of these and delete the rest.
637
725
-->
638
726
639
-
-[ ] Metrics
640
-
- Metric name:
641
-
-[Optional] Aggregation method:
642
-
- Components exposing the metric:
643
-
-[ ] Other (treat as last resort)
644
-
- Details:
727
+
Not applicable. The feature operates in a best-effort manner: either `CSINode.Spec.Drivers[*].Allocatable` is updated or it keeps its existing value. Standard API server and kubelet health metrics are sufficient to monitor the overall cluster health.
645
728
646
729
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
647
730
@@ -650,6 +733,12 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co
650
733
implementation difficulties, etc.).
651
734
-->
652
735
736
+
While the following metrics could provide additional visibility into the feature's operation, they weren't added because API server health metrics already indirectly measure the success of CSINode updates - if the API server is healthy, we expect updates to succeed:
737
+
738
+
- `csi_node_updates_total`: Could track `CSINode.Spec.Drivers[*].Allocatable` updates attempted (periodic/reactive).
739
+
- `csi_node_update_errors_total`: Could track failed update attempts.
740
+
- `csi_node_update_duration_seconds`: Could track update latency.
741
+
653
742
### Dependencies
654
743
655
744
<!--
@@ -673,6 +762,10 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
673
762
- Impact of its degraded performance or high-error rates on the feature:
674
763
-->
675
764
765
+
This feature primarily depends on CSI drivers implementing the `NodeGetInfo` RPC to report volume attachment limits. If a CSI driver is unavailable, the `CSINode.Spec.Drivers[*].Allocatable` value remains at its last known value. Degraded performance or high error rates in CSI drivers may cause periodic or reactive updates to fail, but this only results in using the last known value, with no impact on existing workloads.
766
+
767
+
Beyond CSI drivers, which are already a requirement for volume operations, this feature introduces no additional service dependencies. It builds upon existing Kubernetes components (kubelet and API server) and their normal operation.
768
+
676
769
### Scalability
677
770
678
771
<!--
@@ -705,7 +798,7 @@ Yes, there will be new API calls to update the `CSINode` object:
705
798
```
706
799
API call type: PATCH
707
800
Estimated throughput: Depends on the `NodeAllocatableUpdatePeriodSeconds` setting and the frequency of volume attachment failures.
708
-
Originating component: Kubelet, KCM
801
+
Originating component: Kubelet
709
802
```
710
803
711
804
###### Will enabling / using this feature result in introducing new API types?
@@ -800,6 +893,8 @@ details). For now, we leave it here.
800
893
801
894
###### How does this feature react if the API server and/or etcd is unavailable?
802
895
896
+
When the API server is unavailable, `CSINode` update attempts fail and are logged; however, the periodic update goroutines continue running and retry at their configured intervals. Additionally, `ResourceExhausted` errors cannot trigger immediate updates since `VolumeAttachment` statuses cannot be read. Existing allocatable values remain unchanged and stateful workloads continue running normally.
897
+
803
898
###### What are other known failure modes?
804
899
805
900
<!--
@@ -815,8 +910,12 @@ For each of them, fill in the following information by copying the below templat
815
910
- Testing: Are there any tests for failure mode? If not, describe why.
816
911
-->
817
912
913
+
No other known failure modes.
914
+
818
915
###### What steps should be taken if SLOs are not being met to determine the problem?
819
916
917
+
N/A
918
+
820
919
## Implementation History
821
920
822
921
<!--
@@ -830,6 +929,10 @@ Major milestones might include:
830
929
- when the KEP was retired or superseded
831
930
-->
832
931
932
+
- 2024-08-08 - Enhancement proposed in sig-storage.
933
+
- 2024-09-25 - Enhancement officially submitted to Kubernetes.
934
+
- 2025-04-23 - Kubernetes v1.33: Enhancement implemented and released in Alpha.