
Resources leak when a pending PVC is deleted while being created from a snapshot that is also being deleted #1449

@xing-yang

Description


What happened:
A volume is leaked when creating a PVC from a VolumeSnapshot times out and the user then deletes both the PVC and the VolumeSnapshot.

Creating the volume from the snapshot times out, and the user then deletes the PVC and the VolumeSnapshot. The CSI driver returns a non-final error code, which means the volume creation may still be in progress. However, the provisioner never deletes the volume once it is finally created.

In the external-provisioner logs, I see “I1125 17:27:23.655056 1 controller.go:1082] Temporary error received, adding PVC 2844879f-0f2f-4d6d-bf0d-e02ae676b8e7 to claims in progress”. This shows that after the CSI driver returns a non-final error, the PVC is added to the claims in progress so that provisioning will be retried.
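To make the retry mechanism concrete, here is a minimal, self-contained Go sketch of how a provisioner might classify CSI errors and track claims in progress. It is modeled loosely on the behavior described above and is not the external-provisioner's actual code. The `Code` type is a stand-in for gRPC status codes, and `CSIError`, `isFinalError`, `recordResult` and `inProgress` are illustrative names. Which codes count as non-final (timeouts and transient codes) is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Code is a hypothetical stand-in for a gRPC status code returned by a CSI driver.
type Code int

const (
	InvalidArgument Code = iota
	DeadlineExceeded
	Unavailable
	ResourceExhausted
	Aborted
)

// CSIError is a hypothetical error type carrying the driver's status code.
type CSIError struct{ Code Code }

func (e *CSIError) Error() string { return fmt.Sprintf("CSI error, code %d", e.Code) }

// isFinalError reports whether the driver has definitely stopped working on the
// request. Timeouts and transient codes are non-final: the driver may still be
// creating the volume in the background. Errors that did not come from the
// driver are conservatively treated as non-final too (an assumption).
func isFinalError(err error) bool {
	var csiErr *CSIError
	if !errors.As(err, &csiErr) {
		return false
	}
	switch csiErr.Code {
	case DeadlineExceeded, Unavailable, ResourceExhausted, Aborted:
		return false
	default:
		return true
	}
}

// claimsInProgress holds the UIDs of PVCs whose CreateVolume must be retried.
var claimsInProgress sync.Map

// recordResult updates claimsInProgress after one CreateVolume attempt.
func recordResult(pvcUID string, err error) {
	if err != nil && !isFinalError(err) {
		claimsInProgress.Store(pvcUID, struct{}{})
		return
	}
	// Success or a final error: nothing is running in the background any more.
	claimsInProgress.Delete(pvcUID)
}

// inProgress reports whether a retry is still owed for the given PVC.
func inProgress(pvcUID string) bool {
	_, ok := claimsInProgress.Load(pvcUID)
	return ok
}

func main() {
	uid := "2844879f-0f2f-4d6d-bf0d-e02ae676b8e7"

	recordResult(uid, &CSIError{Code: DeadlineExceeded}) // timeout: keep tracking
	fmt.Println(inProgress(uid))

	recordResult(uid, &CSIError{Code: InvalidArgument}) // final error: stop tracking
	fmt.Println(inProgress(uid))
}
```

The important property is that a timeout leaves the claim tracked, so the provisioner owes the driver another call. The rest of this issue is about that owed call never happening.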

However, the retry failed because the VolumeSnapshot data source is being deleted: “E1125 17:27:23.666986 1 controller.go:957] error syncing claim "2844879f-0f2f-4d6d-bf0d-e02ae676b8e7": failed to provision volume with StorageClass "wcpglobal-storage-profile": error getting handle for DataSource Type VolumeSnapshot by Name test-snapshot: snapshot test-snapshot is currently being deleted”.

As a result, the CSI driver is never called again. The volume is eventually created, but it is leaked because no PVC or PV points to it.

Although the VolumeSnapshot has a deletionTimestamp, it also has a finalizer that prevents it from being deleted while it is used as the source for creating a PVC. The finalizer is removed once the volume is created, so the VolumeSnapshot is eventually deleted, although this takes a while.

Note: If I delete only the pending PVC and not the VolumeSnapshot, the volume is deleted after it is created successfully. However, when creating a PVC from a VolumeSnapshot takes a long time, it is typical for a backup application to delete the PVC and then delete the VolumeSnapshot as well.

What you expected to happen:
Both the volume and the VolumeSnapshot should be deleted.

The external-provisioner should keep retrying provisioning of the PVC by calling the CSI driver, even if the VolumeSnapshot is being deleted. The CSI driver should be idempotent, so it can handle these retries.

How to reproduce it:

Anything else we need to know?:

Environment:

  • Driver version:
  • Kubernetes version (use kubectl version):
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
