retain valid certs on fetch failures #1567

Open
wants to merge 2 commits into
base: master
from

deveshdama

Retain existing valid certificates when new fetch attempts fail, improving service reliability during CA outages. Implements backoff scheduling that respects certificate expiry times.
This PR addresses Istio issue #56452.
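
A minimal sketch of the behavior described above, not the code in this PR: on a failed fetch, keep serving an existing certificate that has not expired and schedule the next attempt no later than its expiry. The `CertState` variants, the `on_fetch_failure` function, and the choice to cap the retry time at the expiry instant are assumptions made for illustration.

```rust
use std::time::{Duration, Instant};

/// Hypothetical, simplified stand-in for the certificate state tracked per identity.
#[derive(Debug, Clone, PartialEq)]
enum CertState {
    /// A usable certificate that expires at the given instant.
    Available { expires_at: Instant },
    /// No usable certificate; carries the last fetch error.
    Unavailable(String),
}

/// Decide the next state and the next refresh time after a failed fetch.
/// `existing_expiry` is the expiry of the certificate currently held, if any.
fn on_fetch_failure(
    existing_expiry: Option<Instant>,
    err: &str,
    retry_delay: Duration,
    now: Instant,
) -> (CertState, Instant) {
    match existing_expiry {
        // The held certificate is still valid: retain it and retry later,
        // but never later than the moment it expires (assumed policy).
        Some(expires_at) if expires_at > now => {
            let refresh_at = (now + retry_delay).min(expires_at);
            (CertState::Available { expires_at }, refresh_at)
        }
        // No certificate, or it has already expired: report the failure.
        _ => (CertState::Unavailable(err.to_string()), now + retry_delay),
    }
}
```

With this shape, a CA outage shorter than the remaining certificate lifetime never surfaces as `Unavailable` to callers.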

@deveshdama deveshdama requested a review from a team as a code owner June 4, 2025 05:22
@istio-policy-bot

😊 Welcome @deveshdama! This is either your first contribution to the Istio ztunnel repo, or it's been
a while since you've been here.

You can learn more about the Istio working groups, Code of Conduct, and contribution guidelines
by referring to Contributing to Istio.

Thanks for contributing!

Courtesy of your friendly welcome wagon.

@istio-testing istio-testing added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-ok-to-test labels Jun 4, 2025
@istio-testing
Contributor

Hi @deveshdama. Thanks for your PR.

I'm waiting for an Istio member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@keithmattix
Contributor

/ok-to-test

@istio-testing istio-testing added ok-to-test Set this label to allow normal testing to take place for a PR not submitted by an Istio org member. and removed needs-ok-to-test labels Jun 4, 2025
- fetches now record failed attempts as well.
- validate that valid certificates are retained across fetch attempts despite CA failures
@istio-testing istio-testing added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 4, 2025
@deveshdama deveshdama requested a review from howardjohn June 9, 2025 17:10
@deveshdama
Author

@howardjohn, can you please take a look?

},
// we don't have a valid existing certificate
None => {
tracing::debug!(%id, "certificate fetch failed ({err}) and no valid existing certificate, retrying in {retry_delay:?}");
Contributor

I think we could make this a warn rather than a debug. Do we continue to retry indefinitely?

tracing::debug!(%id, "certificate fetch failed ({err}), retrying in {retry:?}");
let refresh_at = Instant::now() + retry;
(CertState::Unavailable(err), refresh_at)
tracing::debug!(%id, "certificate fetch failed ({err}), retrying in {retry_delay:?}");
Contributor

Could we move this log into the Some((valid_cert, cert_expiry_instant)) arm, and clarify in the message that we are using the existing valid certificate?

@jaellio
Contributor

jaellio commented Jun 23, 2025

@keithmattix or @Stevenjin8 could you PTAL?

@@ -362,6 +362,10 @@ impl Worker {
// Note that we are using a backoff-per-unique-identity-request. This is to prevent issues
// when a cert cannot be fetched for Pod A, but that should not stall retries for
// pods B, C, and D.

// Check if we should retain the existing valid certificate
Contributor

Can we move this above the big comment block?
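
For readers following the thread, the backoff-per-unique-identity idea from the code comment in the diff above can be sketched roughly as follows. This is an illustration under stated assumptions, not ztunnel's implementation: the `Backoffs` type, its string identity keys, and the doubling-with-cap policy are all hypothetical.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical tracker that keeps an independent backoff per identity,
/// so repeated failures for one pod do not delay retries for the others.
struct Backoffs {
    base: Duration,
    max: Duration,
    /// Consecutive failure count, keyed by identity.
    failures: HashMap<String, u32>,
}

impl Backoffs {
    fn new(base: Duration, max: Duration) -> Self {
        Self { base, max, failures: HashMap::new() }
    }

    /// Record a failure for `id` and return the delay before its next retry.
    fn next_delay(&mut self, id: &str) -> Duration {
        let n = self.failures.entry(id.to_string()).or_insert(0);
        // Double the delay for each consecutive failure (assumed policy).
        let factor = 2u32.saturating_pow(*n);
        *n = n.saturating_add(1);
        // Cap the delay so a long outage does not push retries out forever.
        self.base.saturating_mul(factor).min(self.max)
    }

    /// Forget the failure history for `id` after a successful fetch.
    fn reset(&mut self, id: &str) {
        self.failures.remove(id);
    }
}
```

Because the failure count is keyed by identity, a failing fetch for one pod only lengthens that pod's own retry delay.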

Labels
ok-to-test Set this label to allow normal testing to take place for a PR not submitted by an Istio org member. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

7 participants