Skip to content

feat(zrpc): migrate kube resolver from Endpoints to EndpointSlice API #4987

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

soasurs
Copy link
Contributor

@soasurs soasurs commented Jul 8, 2025

Motivation

Kubernetes is gradually deprecating the legacy core/v1 Endpoints API in favor of the newer discovery/v1 EndpointSlice API.
Compared with Endpoints, EndpointSlice objects:

  • Scale horizontally—each slice covers a subset of endpoints, avoiding the single-object size limit.
  • Reduce API-server churn—only slices that change are updated.
  • Provide topology and readiness metadata that can be leveraged for smarter routing.

To keep go-zero up-to-date with upstream best practices and to prevent future breakage as clusters disable the old API, we need to adopt EndpointSlice.


What this PR does

  1. Event Path Migration

    • Replaces all core/v1 Endpoints types with discovery/v1 EndpointSlice in the kube resolver.
    • Adjusts event-handler logic to iterate over EndpointSlice.Endpoints[].Addresses[].
  2. Informer Updates

    • Swaps the informer factory from Core().V1().Endpoints() to Discovery().V1().EndpointSlices().
    • Uses the recommended label selector kubernetes.io/service-name=<svc>.
  3. Initial Sync & Port Resolution

    • Lists existing EndpointSlice objects for the target service at startup.
    • Resolves svc.Port from the first available slice port when the user omits it.
  4. Constants & Clean-ups

    • Renames nameSelectorserviceSelector for clarity.
    • Adds explicit error reporting when no matching slices are found.

Impact

  • Transparent to callers – existing zrpc resolver API remains unchanged.
  • Better scalability & performance – large services no longer hit the 1000-endpoint hard limit.
  • Future-proof – compatible with clusters that have disabled the legacy Endpoints API.
  • RBAC change required – controllers using this resolver must be granted get, list, watch on EndpointSlices (discovery.k8s.io group).

No other functional regressions are expected. All unit tests pass against Kubernetes ≥1.21 (the minimum version that GA-supports EndpointSlice).

Copy link

codecov bot commented Jul 8, 2025

Codecov Report

Attention: Patch coverage is 57.89474% with 16 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
zrpc/resolver/internal/kubebuilder.go 0.00% 16 Missing ⚠️

📢 Thoughts on this report? Let us know!

@kevwan kevwan force-pushed the feat/zrpc-kube-resolver-discovery-v1 branch from dc48d57 to ca7e131 Compare July 12, 2025 16:02
@kevwan
Copy link
Contributor

kevwan commented Jul 12, 2025

Thanks for your contribution!

This PR will be merged in v1.9.0

@kevwan kevwan requested review from kevwan and Copilot July 12, 2025 16:04
@kevwan kevwan self-assigned this Jul 12, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR migrates the zrpc Kubernetes resolver from using the legacy core/v1 Endpoints API to the newer discovery/v1 EndpointSlice API, updating event handling, informer setup, initial synchronization, and port resolution.

  • Swap informer and event-handler logic to work with EndpointSlice and its Endpoints[].Addresses[].
  • Implement initial listing of EndpointSlice objects and derive service port from the first available slice.
  • Rename selector constant and add explicit error when no slices are found.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
zrpc/resolver/internal/kubebuilder.go Replace Endpoints informer with EndpointSlices, update port resolution and initial sync
zrpc/resolver/internal/kube/eventhandler.go Update event handler and update logic to handle discovery/v1.EndpointSlice
zrpc/resolver/internal/kube/eventhandler_test.go Update tests to use EndpointSlice types and verify address-change handling
Comments suppressed due to low confidence (1)

zrpc/resolver/internal/kubebuilder.go:62

  • The logic that lists EndpointSlices and assigns svc.Port from the first slice isn't covered by existing tests. Consider adding unit tests for port resolution and the error path when no slices or ports are available.
	if svc.Port == 0 {

svc.Port = int(endpoints.Subsets[0].Ports[0].Port)
// Since this resolver is used for in-cluster service discovery,
// there should be at least one port available.
svc.Port = int(*endpointSlices.Items[0].Ports[0].Port)
Copy link
Preview

Copilot AI Jul 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code assumes that each EndpointSlice has a non-empty Ports slice and will panic if Ports is empty. Add a check for len(endpointSlices.Items[0].Ports) > 0 (and a nil-Port pointer) before dereferencing, and return a clear error if missing.

Copilot uses AI. Check for mistakes.

Comment on lines +114 to +116
for _, endpointSlice := range endpointSlices.Items {
handler.Update(&endpointSlice)
}
Copy link
Preview

Copilot AI Jul 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling Update for each EndpointSlice resets the resolver state per slice, so after processing multiple slices only the last slice’s endpoints remain. Instead, merge addresses across all slices into one update or call OnAdd per slice to accumulate state.

Suggested change
for _, endpointSlice := range endpointSlices.Items {
handler.Update(&endpointSlice)
}
var allEndpoints []string
for _, endpointSlice := range endpointSlices.Items {
for _, endpoint := range endpointSlice.Endpoints {
for _, address := range endpoint.Addresses {
allEndpoints = append(allEndpoints, address)
}
}
}
allEndpoints = subset(allEndpoints, subsetSize)
addrs := make([]resolver.Address, 0, len(allEndpoints))
for _, val := range allEndpoints {
addrs = append(addrs, resolver.Address{
Addr: fmt.Sprintf("%s:%d", val, svc.Port),
})
}
if err := cc.UpdateState(resolver.State{
Addresses: addrs,
}); err != nil {
logx.Error(err)
}

Copilot uses AI. Check for mistakes.

@kevwan kevwan added this to the v1.9.0 milestone Jul 13, 2025
@kevwan kevwan force-pushed the feat/zrpc-kube-resolver-discovery-v1 branch from ca7e131 to 45868b0 Compare July 16, 2025 16:02
kevwan
kevwan previously approved these changes Jul 18, 2025
soasurs added 3 commits July 18, 2025 22:45
* Replace core/v1 Endpoints with discovery/v1 EndpointSlice in event handler
* Update handler logic to iterate over EndpointSlice.Endpoints[].Addresses[]
* Switch informer to watch EndpointSlices using label selector "kubernetes.io/service-name"
* Retrieve service port from the first EndpointSlice port when not specified
* Rename constant `nameSelector` → `serviceSelector` for clarity
* Improve error handling for missing EndpointSlices

Signed-off-by: soasurs <soasurs@gmail.com>
Signed-off-by: soasurs <soasurs@gmail.com>
…v1 EndpointSlice

* Replace legacy core/v1 Endpoints fixtures with discovery/v1 EndpointSlice
* Adapt test data: switch Subsets/Addresses to Endpoints[].Addresses[]
* Preserve existing test scenarios for add, delete, and update event flows

Signed-off-by: soasurs <soasurs@gmail.com>
@kevwan kevwan force-pushed the feat/zrpc-kube-resolver-discovery-v1 branch from 45868b0 to d69d042 Compare July 18, 2025 14:45
@kevwan kevwan dismissed their stale review July 18, 2025 15:06

When there’s a pod change, the EndpointSlice returned isn’t the full set, but only the changed slice, right? We need to carefully analyze the notification mechanism. For example, if it’s split into 10 slices and a new pod is added, only one slice is returned — then traffic might become skewed toward that part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants