Description
What happened:
A volume is leaked when creating a PVC from a VolumeSnapshot times out and the user then deletes both the PVC and the VolumeSnapshot.
The create-volume-from-snapshot call times out, the user deletes the PVC and then the VolumeSnapshot, and the CSI driver returns a non-final error code. However, the external-provisioner never deletes the volume once it is eventually created.
In the external-provisioner logs, I see "I1125 17:27:23.655056 1 controller.go:1082] Temporary error received, adding PVC 2844879f-0f2f-4d6d-bf0d-e02ae676b8e7 to claims in progress". So after the CSI driver returns a non-final error, the PVC is added to "claims in progress" so that provisioning will be retried.
However, the retry fails because the VolumeSnapshot data source is being deleted: "E1125 17:27:23.666986 1 controller.go:957] error syncing claim "2844879f-0f2f-4d6d-bf0d-e02ae676b8e7": failed to provision volume with StorageClass "wcpglobal-storage-profile": error getting handle for DataSource Type VolumeSnapshot by Name test-snapshot: snapshot test-snapshot is currently being deleted".
As a result, the CSI driver never receives a retry call. The volume is eventually created on the storage backend, but it is leaked because no PVC or PV points to it.
Although the VolumeSnapshot has a deletionTimestamp, it carries a finalizer that prevents it from being removed while it is in use as the data source of a PVC. The finalizer is removed once the volume is created, so the VolumeSnapshot is eventually deleted, but it takes a while.
Note: If I delete only the pending PVC and leave the VolumeSnapshot in place, the volume is deleted after it is successfully created. However, when creating a PVC from a VolumeSnapshot takes a long time, it is typical for a backup application to delete the PVC and then also delete the VolumeSnapshot.
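The failure sequence above can be sketched as a small simulation. This is not the actual external-provisioner code; the driver, claim, and volume names are hypothetical, and the point is only to show how skipping the retry while the data source is being deleted orphans a volume that the backend finishes creating anyway:

```python
class FakeDriver:
    """Hypothetical CSI-like driver where CreateVolume keeps running on the
    backend even after the call times out on the caller's side."""
    def __init__(self):
        self.volumes = set()
        self.calls = 0

    def create_volume(self, name):
        self.calls += 1
        if self.calls == 1:
            # The backend keeps working; the volume appears later anyway.
            self.volumes.add(name)
            raise TimeoutError("DeadlineExceeded")  # non-final error
        self.volumes.add(name)  # idempotent: re-creating is a no-op
        return name

def provision(driver, snapshot_deleting):
    """Mimics the provisioner's retry loop for a claim in progress."""
    try:
        return driver.create_volume("vol-from-snap")
    except TimeoutError:
        pass  # PVC is added to "claims in progress"; a retry is expected
    if snapshot_deleting:
        # Current behavior: syncing the claim fails before the driver is
        # called again, so the already-created volume is never adopted.
        return None
    return driver.create_volume("vol-from-snap")

driver = FakeDriver()
pv = provision(driver, snapshot_deleting=True)
leaked = pv is None and "vol-from-snap" in driver.volumes
print(leaked)  # True: the volume exists on the backend but no PV points to it
```

With `snapshot_deleting=False` the retry reaches the driver and the second, idempotent `create_volume` call returns the volume, which matches the behavior described in the note above.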
What you expected to happen:
Both the volume and the VolumeSnapshot should be deleted.
The external-provisioner should continue to retry provisioning the PVC by calling the CSI driver even while the VolumeSnapshot is being deleted. The CSI driver is required to be idempotent, so it can handle the retries.
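The idempotency contract that makes this safe can be illustrated with a minimal sketch (hypothetical class and volume names, not the real driver API): a retried CreateVolume with the same name must return the existing volume rather than create a duplicate, so the provisioner can always retry until it gets a volume ID and then delete it once the PVC is gone:

```python
class IdempotentDriver:
    """Hypothetical driver honoring the CSI idempotency requirement."""
    def __init__(self):
        self.volumes = {}

    def create_volume(self, name):
        # Idempotent: a retry with the same name returns the existing volume.
        if name not in self.volumes:
            self.volumes[name] = f"vol-{name}"
        return self.volumes[name]

    def delete_volume(self, name):
        # Idempotent: deleting a missing volume is not an error.
        self.volumes.pop(name, None)

driver = IdempotentDriver()
first = driver.create_volume("pvc-2844")   # initial (slow) attempt
retry = driver.create_volume("pvc-2844")   # retry after the non-final error
assert first == retry                      # same volume, no duplicate

# Once the PVC is deleted, the provisioner can delete the provisioned volume.
driver.delete_volume("pvc-2844")
print(len(driver.volumes))  # 0 -> nothing leaked
```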
How to reproduce it:
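A rough sketch of the reproduction, based on the description above. The class names, object names, and size are placeholders; the driver must be one where creating a volume from a snapshot takes longer than the CSI call timeout:

```yaml
# Placeholder manifests; adjust the class names to your driver.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: test-snapshot
spec:
  volumeSnapshotClassName: example-snapclass   # placeholder
  source:
    persistentVolumeClaimName: source-pvc      # placeholder existing PVC
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc
spec:
  storageClassName: example-sc                 # placeholder
  dataSource:
    name: test-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
```

While the restored PVC is still Pending (CreateVolume timing out with a non-final error), delete the PVC and then the VolumeSnapshot, e.g. `kubectl delete pvc restored-pvc` followed by `kubectl delete volumesnapshot test-snapshot`, and verify on the storage backend that the volume is eventually created but never deleted.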
Anything else we need to know?:
Environment:
- Driver version:
- Kubernetes version (use `kubectl version`):
- OS (e.g. from /etc/os-release):
- Kernel (e.g. `uname -a`):
- Install tools:
- Others: