Skip to content

Conversation

@paulgmiller
Copy link
Member

@paulgmiller paulgmiller commented Nov 25, 2025

Reason for Change:
CNS should not signal its healthz till it can write out a conflist. Going ready

  1. makes it it harder for both engineers and systems to know CNS isn't getting to a good state.
  2. makes it easier for a bad rollout to takea bug to all nodes.

Refactors

  1. This moves alot of ncsync logic out of internalapi.go to synchostnc.go
  2. It adds a new wait group like object (networkContainerSyncState) to httprestservice
  3. Changes direct calls/loops in main to SyncHostNCVersion to use StartSyncHostNCVersionLoop for consistencey and better mangemetn of starting wait
  4. has nodesubent also startwat/signal readiness and not write the conflist direcl
  5. moves conf list writing to a wait function that also makes sure waits on nc (nodesubbnet and syncHostNCVersion) have signaled.

Tries to hide as much as possible to make movign to subpackage easier in future but pretty tangled with HttpRestServer still.

Notes:

Copilot finished reviewing on behalf of paulgmiller November 25, 2025 01:20
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a synchronization mechanism to prevent CNS from marking itself as ready before the CNI conflist has been successfully generated. It refactors the SyncHostNCVersion loop code by extracting it into a new file and adds a WaitForNCSynced function that blocks until the conflist is written out.

Key changes:

  • Added WaitForNCSynced call in main.go before marking the service as ready
  • Refactored duplicate SyncHostNCVersion loop code into a new StartSyncHostNCVersionLoop function
  • Moved sync-related functions to a new file synchostnc.go for better code organization

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
cns/service/main.go Adds WaitForNCSynced call before marking service ready; refactors duplicate SyncHostNCVersion loop initialization to use StartSyncHostNCVersionLoop
cns/restserver/synchostnc.go New file containing refactored sync functionality including StartSyncHostNCVersionLoop, SyncHostNCVersion, syncHostNCVersion, MustGenerateCNIConflistOnce, and WaitForNCSynced
cns/restserver/restserver.go Adds ncSynced channel field; removes MustGenerateCNIConflistOnce (moved to synchostnc.go)
cns/restserver/internalapi.go Removes SyncHostNCVersion and syncHostNCVersion functions (moved to synchostnc.go); removes unused imports
cns/healthserver/healthz.go Minor formatting improvement (adds blank line)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.

Comment on lines 189 to 190
service.generateCNIConflistOnce.Do(func() {
close(service.ncSynced)
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ncSynced channel is closed before the conflist generation completes. If Generate() or Close() fail and panic, the channel will already be closed, potentially allowing WaitForNCSynced to return success even though the conflist generation failed. Consider moving close(service.ncSynced) to after both Generate() and Close() succeed, or handle it in a defer that checks if the operations succeeded.

Copilot uses AI. Check for mistakes.

func (service *HTTPRestService) StartSyncHostNCVersionLoop(ctx context.Context, cnsconfig configuration.CNSConfig) {
//do we need a sync.once to protect this? should we error if this is called twice?
service.ncSynced = make(chan struct{})
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ncSynced channel is unconditionally recreated without protection. If StartSyncHostNCVersionLoop is called multiple times (which could happen in error scenarios or test code), it will create a new channel and orphan any goroutines waiting on the previous channel. Consider using sync.Once to ensure the channel is only initialized once, or return an error if it's already initialized, as suggested by the comment on line 21.

Copilot uses AI. Check for mistakes.
// saved NC successfully. UpdateIPsForNodeSubnet is called only when IPs are fetched from NMAgent.
// We now have IPs to serve IPAM requests. Generate conflist to indicate CNS is ready
service.MustGenerateCNIConflistOnce()
service.ncSynced = make(chan struct{}) // in case this is called multiple times
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bad comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also can revert though maybe a commetn here is good.

return nil
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert or push ready channel into here.

@paulgmiller paulgmiller changed the title CNS should not mark itself as ready if it can't write out a conflist. fix: CNS should not mark itself as ready if it can't write out a conflist. Nov 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant