@domsolutions domsolutions commented Oct 24, 2025

Motivation

There was an issue where the ServerNotify gRPC call blocks and prevents the Server CR reconcile loop from completing. This blocks other Server CRs from reconciling, so the scheduler is not notified about them and the Servers remain in an unready state.

Summary of changes

  • changed from gRPC's built-in exponential backoff retry (retrying 100 times, which is effectively forever) to a constant backoff; if the call eventually fails, the reconcile loop handles exponential backoff
  • ctx generally should not be stored as part of a struct's state for a request, so it is now passed as a function argument
  • added ctx in a few places where it was missing from IO calls

Note:

A further improvement to mitigate blocking is to allow multiple reconcile workers; a separate story has been created for this.

Checklist

  • Added/updated unit tests
  • Added/updated documentation
  • Checked for typos in variable names, comments, etc.
  • Added licences for new files

Testing

@domsolutions domsolutions requested a review from lc525 as a code owner October 24, 2025 14:33
@domsolutions domsolutions changed the title from "fix: blocked on ServerNotify when expoentially backing off retrying" to "fix(operator): Blocking gRPC calls" Oct 24, 2025
@lc525 lc525 left a comment

lgtm, with some minor comments/questions. A much better end-state than where we started.

@domsolutions domsolutions merged commit b09e513 into v2 Oct 30, 2025
5 checks passed
@domsolutions domsolutions deleted the CSM-1069/fix-controller-scheduler-block branch October 30, 2025 09:15