retain valid certs on fetch failures #1567

deveshdama · 2025-06-04T05:22:15Z

Retain existing valid certificates when new fetch attempts fail, improving service reliability during CA outages. Implements backoff scheduling that respects certificate expiry times.
This PR addresses Istio issue #56452.
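
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `CertState` shape and the `on_fetch_failure` helper are hypothetical, and clamping the retry time to the certificate's expiry is only one plausible reading of "backoff scheduling that respects certificate expiry times".

```rust
use std::time::{Duration, Instant};

/// Simplified stand-in for the worker's per-identity state
/// (hypothetical; not ztunnel's actual type).
#[derive(Debug, PartialEq)]
enum CertState {
    Available { expires_at: Instant },
    Unavailable(String),
}

/// Decide the next state and refresh time after a failed fetch.
/// `existing` is the expiry of the certificate currently held, if any.
fn on_fetch_failure(
    existing: Option<Instant>,
    err: String,
    retry_delay: Duration,
    now: Instant,
) -> (CertState, Instant) {
    match existing {
        // The held certificate is still valid: keep serving it and retry,
        // but never schedule the retry later than the certificate's expiry
        // (assumed policy).
        Some(expires_at) if expires_at > now => {
            let refresh_at = (now + retry_delay).min(expires_at);
            (CertState::Available { expires_at }, refresh_at)
        }
        // No certificate, or it has already expired: surface the failure.
        _ => (CertState::Unavailable(err), now + retry_delay),
    }
}

fn main() {
    let now = Instant::now();
    let expiry = now + Duration::from_secs(60);
    let (state, refresh_at) = on_fetch_failure(
        Some(expiry),
        "CA unreachable".to_string(),
        Duration::from_secs(300),
        now,
    );
    println!("{state:?}, retrying in {:?}", refresh_at - now);
}
```

With this shape, a CA outage shorter than the remaining certificate lifetime never moves an identity to the unavailable state.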

istio-policy-bot · 2025-06-04T05:22:20Z

😊 Welcome @deveshdama! This is either your first contribution to the Istio ztunnel repo, or it's been
a while since you've been here.

You can learn more about the Istio working groups, Code of Conduct, and contribution guidelines
by referring to Contributing to Istio.

Thanks for contributing!

Courtesy of your friendly welcome wagon.

istio-testing · 2025-06-04T05:22:27Z

Hi @deveshdama. Thanks for your PR.

I'm waiting for an Istio member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

keithmattix · 2025-06-04T16:28:54Z

/ok-to-test

src/identity/manager.rs

- Fetches now record failed attempts as well.
- Validate that valid certificates are retained across fetch attempts despite CA failures.

src/identity/caclient.rs

deveshdama · 2025-06-11T20:43:58Z

@howardjohn can you please take a look?

jaellio · 2025-06-23T17:51:05Z

src/identity/manager.rs

+                                },
+                                // we don't have a valid existing certificate
+                                None => {
+                                    tracing::debug!(%id, "certificate fetch failed ({err}) and no valid existing certificate, retrying in {retry_delay:?}");


I think we could make this a warn rather than a debug. Do we continue to retry indefinitely?

jaellio · 2025-06-23T17:53:54Z

src/identity/manager.rs

-                            tracing::debug!(%id, "certificate fetch failed ({err}), retrying in {retry:?}");
-                            let refresh_at = Instant::now() + retry;
-                            (CertState::Unavailable(err), refresh_at)
+                            tracing::debug!(%id, "certificate fetch failed ({err}), retrying in {retry_delay:?}");


Could we move this log inside the Some((valid_cert, cert_expiry_instant)) arm, and clarify that we are using the existing valid certificate?
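
One way the two logging suggestions in this review could fit together, as a sketch only: the `Level` enum, the `failure_log` helper, and the `has_valid_cert` flag are hypothetical stand-ins for the `tracing` macros and the match arms in the diff, chosen so the example runs with the standard library alone.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the tracing levels discussed in the review.
#[derive(Debug, PartialEq)]
enum Level {
    Debug,
    Warn,
}

/// Build the log line for a failed fetch.
/// `has_valid_cert` mirrors whether the `Some(..)` arm would be taken.
fn failure_log(
    id: &str,
    err: &str,
    retry_delay: Duration,
    has_valid_cert: bool,
) -> (Level, String) {
    if has_valid_cert {
        // Existing certificate is still usable: low-severity message that
        // states explicitly that the existing certificate is being kept.
        (
            Level::Debug,
            format!(
                "{id}: certificate fetch failed ({err}), using existing valid certificate, retrying in {retry_delay:?}"
            ),
        )
    } else {
        // Nothing to fall back on: raise the severity to a warning.
        (
            Level::Warn,
            format!(
                "{id}: certificate fetch failed ({err}) and no valid existing certificate, retrying in {retry_delay:?}"
            ),
        )
    }
}

fn main() {
    let (level, line) = failure_log("pod-a", "CA unreachable", Duration::from_secs(5), false);
    println!("{level:?}: {line}");
}
```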

jaellio · 2025-06-23T17:56:45Z

@keithmattix or @Stevenjin8 could you PTAL?

Stevenjin8 · 2025-07-03T19:29:04Z

src/identity/manager.rs

@@ -362,6 +362,10 @@ impl Worker {
                            // Note that we are using a backoff-per-unique-identity-request. This is to prevent issues
                            // when a cert cannot be fetched for Pod A, but that should not stall retries for
                            // pods B, C, and D.
+
+                            // Check if we should retain the existing valid certificate


Can we move this above the big comment block?

retain valid certs on fetch failures

36e06cf

deveshdama requested a review from a team as a code owner June 4, 2025 05:22

istio-testing added the size/M and needs-ok-to-test labels Jun 4, 2025

istio-testing added the ok-to-test label and removed the needs-ok-to-test label Jun 4, 2025

howardjohn reviewed Jun 4, 2025

src/identity/manager.rs (outdated)

better unit tests.

1dd9e33

- Fetches now record failed attempts as well.
- Validate that valid certificates are retained across fetch attempts despite CA failures.

istio-testing added the size/L label and removed the size/M label Jun 4, 2025

deveshdama commented Jun 4, 2025

src/identity/caclient.rs

deveshdama requested a review from howardjohn June 9, 2025 17:10

jaellio reviewed Jun 23, 2025

Stevenjin8 requested changes Jul 3, 2025