Skip to content

feat: support endpoint override policy based routing #6458

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Xunzhuo
Copy link
Member

@Xunzhuo Xunzhuo commented Jul 3, 2025

What type of PR is this?

feat: support host override policy based routing

What this PR does / why we need it:

Support host override policy based routing, a typical scenario is the LLM Endpoint Picker.

Which issue(s) this PR fixes:

Fixes #6456

Release Notes: Yes

Use in this way:

- apiVersion: gateway.envoyproxy.io/v1alpha1
  kind: BackendTrafficPolicy
  metadata:
    namespace: default
    name: policy-for-header-override
  spec:
    targetRef:
      group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: httproute
    loadBalancer:
      type: RoundRobin
      endpointOverride:
        extractFrom:
        - header: "x-gateway-destination-endpoint"
        - metadata:
            key: envoy.lb
            path:
            - key: x-gateway-destination-endpoint
- apiVersion: gateway.networking.k8s.io/v1
  kind: HTTPRoute
  metadata:
    namespace: default
    name: httproute
  spec:
    hostnames:
    - gateway.envoyproxy.io
    parentRefs:
    - namespace: envoy-gateway
      name: inference-gateway
      sectionName: http
    rules:
    - matches:
      - path:
          value: "/v1"
      backendRefs:
      - name: fallback-inference-service
        port: 8080

@Xunzhuo Xunzhuo marked this pull request as ready for review July 3, 2025 09:51
@Xunzhuo Xunzhuo requested a review from a team as a code owner July 3, 2025 09:51
@Xunzhuo Xunzhuo force-pushed the feat-host-override branch 3 times, most recently from 483e17a to a9e8062 Compare July 3, 2025 09:59
Copy link
Member Author

@Xunzhuo Xunzhuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image

e2e passed locally.

Copy link

codecov bot commented Jul 3, 2025

Codecov Report

Attention: Patch coverage is 74.31907% with 66 lines in your changes missing coverage. Please review.

Project coverage is 70.63%. Comparing base (e8fefff) to head (3944377).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
internal/xds/translator/cluster.go 71.42% 57 Missing and 9 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6458      +/-   ##
==========================================
- Coverage   70.68%   70.63%   -0.05%     
==========================================
  Files         220      220              
  Lines       37701    37867     +166     
==========================================
+ Hits        26648    26747      +99     
- Misses       9490     9549      +59     
- Partials     1563     1571       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Xunzhuo Xunzhuo force-pushed the feat-host-override branch from a9e8062 to 364e8a8 Compare July 3, 2025 13:49
@Xunzhuo
Copy link
Member Author

Xunzhuo commented Jul 4, 2025

/retest

@Xunzhuo Xunzhuo force-pushed the feat-host-override branch from 364e8a8 to 9fb2b51 Compare July 4, 2025 14:26
Comment on lines 207 to 212
// FallbackPolicy defines the child LB policy to use in case neither header nor metadata with selected hosts is present.
// If not specified, defaults to LeastRequest.
//
// +optional
// +kubebuilder:default="LeastRequest"
FallbackPolicy *LoadBalancerType `json:"fallbackPolicy,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, I will perfer the HostOverrideSettings as a common enhancement to all exist lb policy. It self needn't to be treated as an independent lb policy in the control plane.

That's say, we needn't the FallbackPolicy and the HostOverride policy. If the optional HostOverrideSettings is configured, then we can think the override lb policy of data plane is used, and the original LoadBalancerType , etc. could be used to construct the fallback policy of data plane. This should could simplify our API and avoid the users to configure nested override host lb.

I am not expert of the the gateway/control plane, only a comment from the point of the API.

cc @Xunzhuo @arkodg

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure I like this approach

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

piggybacking off this

endpointOverride:
 - header: x-custom-host

or

endpointOverride:
  from:
   - header: x-custom-host

which is similar to

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use endpoint instead of host in the API, so also suggested that change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then i am ok with option 2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool lets use extractFrom to keep the API surface similar

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clipboard_Screenshot_1752206423

wanna make sure the logics in envoy, take the picture above as an example: how does envoy pick the endpoint, from the index 0 to the end, if found then break the selection?

cc @wbpcode @yanavlasov

@arkodg arkodg added this to the v1.5.0-rc.1 Release milestone Jul 9, 2025
@Xunzhuo Xunzhuo changed the title feat: support host override policy based routing feat: support endpoint override policy based routing Jul 10, 2025
@Xunzhuo Xunzhuo force-pushed the feat-host-override branch 4 times, most recently from 91de7b0 to be0b252 Compare July 10, 2025 14:14
@Xunzhuo Xunzhuo requested a review from arkodg July 10, 2025 14:17
Signed-off-by: bitliu <bitliu@tencent.com>
@Xunzhuo Xunzhuo force-pushed the feat-host-override branch from be0b252 to 3944377 Compare July 10, 2025 14:25
// If set this field then it will take precedence over the header field.
//
// +optional
Metadata *EndpointOverrideMetadataKey `json:"metadata,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we reuse

type RateLimitCostMetadata struct {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should move it into shared types to reuse

Copy link
Member Author

@Xunzhuo Xunzhuo Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emmm I checked that, they are in different cases: endpoint override is to build metadatav3.MetadataKey

Ratelimit is to build the routev3.RateLimit_HitsAddend, they have different format input

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @wbpcode @mathetake curious why 2 different metadata types exist ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and do we need to add this to the API ?
will ext proc write this value ? if so extProc's API in EG

type ExtProcMetadata struct {

uses the term writableNamespaces

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhaohuabing does path need to be a [] ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @wbpcode @mathetake curious why 2 different metadata types exist ?

The RateLimitCostMetadata should be a control plane abstractino that used to construct the hits_addend. The Envoy only provide a MetadataKey to access the Metadata.

But as @zhaohuabing said, key is confusing. The Envoy use key for both metadata namespace and metadata key/path. So, sound good to me to reuse the Envoy MetadataKey or create a new shared variant for EG.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhaohuabing does path need to be a [] ?

As shared MetadataKey. I think [] will provide more flexibility.

Copy link
Member

@zhaohuabing zhaohuabing Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhaohuabing does path need to be a [] ?

Not necessarily, but it's semantically clearer than something like a:b:c:d.

@Xunzhuo
Copy link
Member Author

Xunzhuo commented Jul 11, 2025

/retest

@Xunzhuo
Copy link
Member Author

Xunzhuo commented Jul 11, 2025

Endpoint Picker General Implementation Logics

This is how we generally implemented the EPP logics in Envoy based API Gateway, no matter the control plane is Envoy Gateway, Istio, or KGateway:

  1. Control Plane: it tells envoy how to route (host override lbpolicy or original cluster) and tells envoy how to connect to the epp ext-proc (http ext-proc filter + route level epp ext-proc config override, if the extproc need to read/write the metadata we should also set receiving_namespaces/forwarding_namespaces at ext-proc config)
  2. Data Plane: epp ext proc selects the endpoint and adding it to metadata or header. Envoy routes to that endpoint based on the control plane sent rules.

Envoy Original Dst Cluster vs Host Override LbPolicy

Original Dst Cluster: It is easy to implement and don't need the real cluster endpoints. But it does not support fallback, which means if the selection is failed, the routing will fail immediately.

Host Override LbPolicy: It is a bit complexer than original dst cluster to implement, it requires the real cluster endpoints, and the selected endpoint should be in the endpoints, otherwise it will fallback. So when Gateway implements the InferencePool with host override lbpolicy, we usually need a real service selects the inference workload endpoints, and the host override lbpolicy is working on the kubernetes service, and the endpoint selection logics in EndpointPicker should also select the endpoints in the same endpoints (Istio creates a service with the same labels selectors with the InferencePool Selectors)

How to implement the Endpoint Picker logics in Envoy Gateway?

Different AI Gateway based on Envoy Gateway has different approaches to reach the above goal:

Envoy AI Gateway: use Envoy Gateway Extension Server.

It edits cluster, route, listener to make this work, this is quite challenging since it is a complex work, which need to work well with the existing config. This is not suitable for adopters like AIBrix.

The default EPP implementation is GIE.

AIBrix Inference Gateway: use Envoy Gateway CRD configuration.

The EPP imlementation is AIBrix Gateway Plugin.(Similar to GIE, it provides intelligently endpoint picker)

  1. v1 (currently): use envoy gateway EnvoyPatchPolicy (patch original cluster config) + EnvoyExtensionPolicy (add epp ext-proc config to gateway), this is static and not easy to maintain or orchestrate.
  2. v2 (planning): use envoy gateway btp (add host override lb policy) + eep (add epp ext-proc config to gateway, also add receiving/forwarding ns with 'envoy.lb' if needed), this can largely improve UX, and also add fallback abilities to the GW.
  3. v3 (after v2): use controller to automatically do what we configure manually in v2, and support InferencePool API. Simplify UX and can adopt GIE conformance test.

// EndpointOverride defines the configuration for endpoint override.
// When specified, the load balancer will attempt to route requests to endpoints
// based on the override information extracted from request headers or metadata.
// If no valid override endpoint is found, the configured load balancer policy will be used as fallback.
Copy link
Member

@zhaohuabing zhaohuabing Jul 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
// If no valid override endpoint is found, the configured load balancer policy will be used as fallback.
// If the override endpoints are not available, the configured load balancer policy will be used as fallback.

// EndpointOverride defines the configuration for endpoint override.
// When specified, the load balancer will attempt to route requests to endpoints
// based on the override information extracted from request headers or metadata.
// If no valid override endpoint is found, the configured load balancer policy will be used as fallback.
Copy link
Member

@zhaohuabing zhaohuabing Jul 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
// If no valid override endpoint is found, the configured load balancer policy will be used as fallback.
// If the override endpoints are not available, the configured load balancer policy will be used as fallback.

//
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:MaxItems=10
ExtractFrom []EndpointOverrideExtractFrom `json:"extractFrom"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will EndpointOverrideSources (field name) and EndpointOverrideSource (type name) better? Then the naming will align with the data plane.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the word endpoint is repeated endpointOverride.endpointOverrideSources which shouldnt be required since the parent structure already has it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support endpoint picker based on host override policy
4 participants