-
Notifications
You must be signed in to change notification settings - Fork 367
Allow provisioning to proceed to prevent leaking resources #1448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: xing-yang The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/assign @jsafrane |
|
I can see it makes sense to continue provisioning when it started with a valid snapshot. But does it make sense to start it with a snapshot that is being deleted? Do we have a good way how to distinguish those two? For volume cloning, the provisioner adds a finalizer to PVC that is being cloned. Should it add a similar finalizer to VolumeSnapshot that is being restored? We could then easily see if provisioning started before a VolumeSnapshot got DeletionTimestamp:
|
|
That's a good point. We do have a finalizer added to VolumeSnapshot being used as a source to create PVC. I can add a check for that to differentiate an already started provisioning from a new provisioning case. |
Check for finalizers on VolumeSnapshot to differentiate between provisioning that has already started vs new provisioning attempts. When a VolumeSnapshot has DeletionTimestamp set: - If it has finalizers: provisioning was started before deletion, continue to prevent resource leaks. The external-snapshotter adds finalizers when a snapshot is used as a data source. - If it has no finalizers: this is a new provisioning attempt, reject with an error. This ensures the CSI driver can complete in-flight provisioning operations and properly clean up resources, while preventing new provisioning from snapshots that are being deleted.
| snapshotNotBound = "snapshot %s not bound" | ||
|
|
||
| pvcCloneFinalizer = "provisioner.storage.kubernetes.io/cloning-protection" | ||
| pvcCloneFinalizer = "provisioner.storage.kubernetes.io/cloning-protection" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we discuss the approach between finalizer and in memory cache or persistent storage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
External-provisioner already keeps retrying PVC provisioning. See logs below.
I1125 17:27:23.655056 1 controller.go:1082] Temporary error received, adding PVC 2844879f-0f2f-4d6d-bf0d-e02ae676b8e7 to claims in progress”.
Here the CSI driver returns a non-final error, the PVC is added to “Claims in Progress” to be retried even if PVC itself is deleted.
This check for finalizer is just to differentiate between a new provisioning request vs a provisioning retry. If it is a new provisioning request (the finalizer is not added yet), the external provisioner will fail the request and won't call the CSI driver. If it is a retry (finalizer is already added), the external provisioner will call CSI driver to retry. This allows the CSI driver to complete the volume provisioning and return success. After that, the volume will be deleted if the deletionTimestamp was added to PVC earlier.
|
/lgtm |
What type of PR is this?
/kind bug
What this PR does / why we need it:
The external-provisioner should continue to retry provisioning the PVC by calling the CSI driver even if VolumeSnapshot is being deleted. CSI driver should be idempotent and it will handle retries.
Which issue(s) this PR fixes:
Fixes #1449
Special notes for your reviewer:
Does this PR introduce a user-facing change?: