feat: support endpoint override policy based routing #6458

Xunzhuo · 2025-07-03T09:50:34Z

What type of PR is this?

feat: support host override policy based routing

What this PR does / why we need it:

Support routing based on a host override policy. A typical use case is the LLM Endpoint Picker.

Which issue(s) this PR fixes:

Fixes #6456

Release Notes: Yes

Example usage:

- apiVersion: gateway.envoyproxy.io/v1alpha1
  kind: BackendTrafficPolicy
  metadata:
    namespace: default
    name: policy-for-header-override
  spec:
    targetRef:
      group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: httproute
    loadBalancer:
      type: RoundRobin
      endpointOverride:
        extractFrom:
        - header: "x-gateway-destination-endpoint"
        - metadata:
            key: envoy.lb
            path:
            - key: x-gateway-destination-endpoint
- apiVersion: gateway.networking.k8s.io/v1
  kind: HTTPRoute
  metadata:
    namespace: default
    name: httproute
  spec:
    hostnames:
    - gateway.envoyproxy.io
    parentRefs:
    - namespace: envoy-gateway
      name: inference-gateway
      sectionName: http
    rules:
    - matches:
      - path:
          value: "/v1"
      backendRefs:
      - name: fallback-inference-service
        port: 8080
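The usage example above maps onto a small set of API types. The following Go sketch shows one way those types and their validation could look. The struct names mirror the YAML keys and are hypothetical, the 1 to 10 bound comes from the kubebuilder markers quoted later in this thread, and the "exactly one of header or metadata per entry" rule is an assumption rather than the merged validation.

```go
package main

import "fmt"

// EndpointOverride is a hypothetical mirror of loadBalancer.endpointOverride.
// Upstream names in api/v1alpha1/loadbalancer_types.go may differ.
type EndpointOverride struct {
	// ExtractFrom lists the sources checked for an override endpoint.
	ExtractFrom []EndpointOverrideExtractFrom `json:"extractFrom"`
}

// EndpointOverrideExtractFrom names one source: a request header or a
// dynamic metadata key (assumed here to be mutually exclusive).
type EndpointOverrideExtractFrom struct {
	Header   *string      `json:"header,omitempty"`
	Metadata *MetadataKey `json:"metadata,omitempty"`
}

// MetadataKey selects a value from dynamic metadata: Key is the namespace
// (for example envoy.lb) and Path walks nested keys inside it.
type MetadataKey struct {
	Key  string                `json:"key"`
	Path []MetadataPathSegment `json:"path"`
}

// MetadataPathSegment is one step of the path, matching "- key: ..." above.
type MetadataPathSegment struct {
	Key string `json:"key"`
}

// Validate applies illustrative checks before translation.
func (e EndpointOverride) Validate() error {
	// Bounds taken from the MinItems=1 / MaxItems=10 markers in the PR diff.
	if n := len(e.ExtractFrom); n < 1 || n > 10 {
		return fmt.Errorf("extractFrom must have 1 to 10 entries, got %d", n)
	}
	for i, src := range e.ExtractFrom {
		hasHeader := src.Header != nil && *src.Header != ""
		hasMetadata := src.Metadata != nil
		// Assumption: each entry must name exactly one source.
		if hasHeader == hasMetadata {
			return fmt.Errorf("extractFrom[%d]: set exactly one of header or metadata", i)
		}
		if hasMetadata && (src.Metadata.Key == "" || len(src.Metadata.Path) == 0) {
			return fmt.Errorf("extractFrom[%d]: metadata needs a namespace key and a non-empty path", i)
		}
	}
	return nil
}
```

With the JSON tags in place, the extractFrom list from the BackendTrafficPolicy above would decode into these types and pass Validate.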

Xunzhuo

e2e passed locally.

codecov · 2025-07-03T10:08:10Z

Codecov Report

Attention: Patch coverage is 74.31907% with 66 lines in your changes missing coverage. Please review.

Project coverage is 70.63%. Comparing base (e8fefff) to head (3944377).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
internal/xds/translator/cluster.go	71.42%	57 Missing and 9 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6458      +/-   ##
==========================================
- Coverage   70.68%   70.63%   -0.05%     
==========================================
  Files         220      220              
  Lines       37701    37867     +166     
==========================================
+ Hits        26648    26747      +99     
- Misses       9490     9549      +59     
- Partials     1563     1571       +8


Xunzhuo · 2025-07-04T01:54:59Z

/retest

wbpcode · 2025-07-08T06:25:05Z

api/v1alpha1/loadbalancer_types.go

+	// FallbackPolicy defines the child LB policy to use in case neither header nor metadata with selected hosts is present.
+	// If not specified, defaults to LeastRequest.
+	//
+	// +optional
+	// +kubebuilder:default="LeastRequest"
+	FallbackPolicy *LoadBalancerType `json:"fallbackPolicy,omitempty"`


IMO, I would prefer HostOverrideSettings as a common enhancement to all existing LB policies. It needn't be treated as an independent LB policy in the control plane.

That is, we don't need the FallbackPolicy or the HostOverride policy. If the optional HostOverrideSettings is configured, the data plane's override LB policy is used, and the original LoadBalancerType, etc. can be used to construct the data plane's fallback policy. This should simplify our API and spare users from configuring a nested override host LB.

I am not an expert on the gateway/control plane; this is only a comment from the API's point of view.

cc @Xunzhuo @arkodg
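The proposal above (override as an add-on to any LB type, with the configured type reused as the data-plane fallback) can be sketched as a translation step. LBPolicy, LoadBalancer, Translate, and the lowercase policy names are hypothetical simplifications, not go-control-plane types, and the least-request default for an unset type is an assumption.

```go
package main

// LBPolicy is a simplified, hypothetical stand-in for the data-plane LB
// policy the translator emits; it is not a go-control-plane type.
type LBPolicy struct {
	Name     string    // e.g. "round_robin" or "override_host"
	Sources  []string  // override sources; only set for override_host
	Fallback *LBPolicy // child policy used when no override endpoint applies
}

// LoadBalancer is a reduced view of the user-facing settings.
type LoadBalancer struct {
	Type            string   // "RoundRobin", "LeastRequest", "Random", ...
	OverrideSources []string // empty means endpointOverride is not configured
}

// dataPlaneNames maps API LB types to illustrative data-plane policy names.
var dataPlaneNames = map[string]string{
	"RoundRobin":   "round_robin",
	"LeastRequest": "least_request",
	"Random":       "random",
}

// Translate emits the configured LB policy directly, or wraps it as the
// fallback of an override policy when override sources are present.
func Translate(lb LoadBalancer) LBPolicy {
	name, ok := dataPlaneNames[lb.Type]
	if !ok {
		// Assumption: unknown or empty types default to least request.
		name = "least_request"
	}
	base := LBPolicy{Name: name}
	if len(lb.OverrideSources) == 0 {
		return base
	}
	return LBPolicy{Name: "override_host", Sources: lb.OverrideSources, Fallback: &base}
}
```

Users only ever set one LB type; the nesting exists solely in the generated data-plane config.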

sure I like this approach

+1 for this

piggybacking off this

Option 1:

endpointOverride:
  - header: x-custom-host

or option 2:

endpointOverride:
  from:
  - header: x-custom-host

which is similar to

gateway/api/v1alpha1/api_key_auth_types.go

Line 24 in 1737078

ExtractFrom []*ExtractFrom `json:"extractFrom"`

gateway/api/v1alpha1/jwt_types.go

Line 86 in 1737078

ExtractFrom *JWTExtractor `json:"extractFrom,omitempty"`

we use endpoint instead of host in the API, so I also suggested that change

then i am ok with option 2

cool, let's use extractFrom to keep the API surface similar

Working on it

I want to make sure I understand the logic in Envoy, taking the picture above as an example: how does Envoy pick the endpoint? Does it go from index 0 to the end and stop at the first match?

cc @wbpcode @yanavlasov
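Whether Envoy stops at the first match is exactly what this comment asks, so the sketch below only encodes one plausible reading: sources are tried in the order listed, and the first address that is present and belongs to the cluster's current endpoint set wins; a present but unknown address is skipped and the search continues. Both behaviors are assumptions to check against Envoy's override_host documentation, and Source, Request, FromHeader, FromMetadata, and PickOverride are hypothetical names.

```go
package main

// Request models only what this sketch needs. Metadata is flattened to
// "namespace/key" for brevity (a simplification of dynamic metadata).
type Request struct {
	Headers  map[string]string
	Metadata map[string]string
}

// Source returns the endpoint address it carries and whether it was present.
type Source func(r Request) (string, bool)

// FromHeader reads the override address from a request header.
func FromHeader(name string) Source {
	return func(r Request) (string, bool) {
		v, ok := r.Headers[name]
		return v, ok && v != ""
	}
}

// FromMetadata reads the override address from a metadata namespace/key.
func FromMetadata(namespace, key string) Source {
	return func(r Request) (string, bool) {
		v, ok := r.Metadata[namespace+"/"+key]
		return v, ok && v != ""
	}
}

// PickOverride walks sources in declared order and returns the first address
// that is present AND is a current cluster endpoint (assumed semantics).
// ok == false means the caller should use the fallback LB policy.
func PickOverride(r Request, sources []Source, endpoints map[string]bool) (string, bool) {
	for _, src := range sources {
		if addr, present := src(r); present && endpoints[addr] {
			return addr, true
		}
	}
	return "", false
}
```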

Signed-off-by: bitliu <bitliu@tencent.com>

arkodg · 2025-07-10T15:56:08Z

api/v1alpha1/loadbalancer_types.go

+	// If set this field then it will take precedence over the header field.
+	//
+	// +optional
+	Metadata *EndpointOverrideMetadataKey `json:"metadata,omitempty"`


can we reuse

gateway/api/v1alpha1/ratelimit_types.go

Line 172 in f8054b1

type RateLimitCostMetadata struct {

should move it into shared types to reuse

Hmm, I checked that; they serve different cases: endpoint override builds a metadatav3.MetadataKey, while rate limit builds a routev3.RateLimit_HitsAddend, so they take inputs in different formats.

hey @wbpcode @mathetake curious why 2 different metadata types exist ?

and do we need to add this to the API?
Will ext proc write this value? If so, note that extProc's API in EG

gateway/api/v1alpha1/ext_proc_types.go

Line 113 in 64f3576

type ExtProcMetadata struct {

uses the term writableNamespaces

@zhaohuabing does path need to be a [] ?

hey @wbpcode @mathetake curious why 2 different metadata types exist ?

RateLimitCostMetadata is a control-plane abstraction used to construct the hits_addend. Envoy only provides a MetadataKey to access the metadata.

But as @zhaohuabing said, key is confusing: Envoy uses key for both the metadata namespace and the metadata key/path. So it sounds good to me to either reuse Envoy's MetadataKey or create a new shared variant for EG.

@zhaohuabing does path need to be a [] ?

As a shared MetadataKey, I think [] will provide more flexibility.

@zhaohuabing does path need to be a [] ?

Not necessarily, but it's semantically clearer than something like a:b:c:d.
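The path question can be illustrated with a small lookup. In this sketch, Key selects the namespace and each element of Path descends one level. Metadata is simplified to nested map[string]any in the spirit of protobuf Struct, and DynamicMetadata and Lookup are hypothetical names. Because segments are kept separate, a key that itself contains ':' needs no escaping, which a flat a:b:c:d string would require.

```go
package main

// DynamicMetadata is a simplified stand-in for Envoy dynamic metadata:
// namespace -> nested struct (map[string]any), similar to protobuf Struct.
type DynamicMetadata map[string]map[string]any

// Lookup resolves a MetadataKey-style selector: the namespace first, then
// one map level per path segment. It returns the leaf if it is a string.
func Lookup(md DynamicMetadata, namespace string, path []string) (string, bool) {
	root, ok := md[namespace]
	if !ok || len(path) == 0 {
		return "", false
	}
	var cur any = root
	for _, seg := range path {
		m, isMap := cur.(map[string]any)
		if !isMap {
			return "", false // path goes deeper than the data
		}
		if cur, ok = m[seg]; !ok {
			return "", false // segment not found
		}
	}
	s, isStr := cur.(string)
	return s, isStr
}
```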

Xunzhuo · 2025-07-11T02:06:55Z

/retest

Xunzhuo · 2025-07-11T06:10:08Z

Endpoint Picker: General Implementation Logic

This is how the EPP logic is generally implemented in Envoy-based API gateways, regardless of whether the control plane is Envoy Gateway, Istio, or KGateway:

Control Plane: tells Envoy how to route (host override LB policy or original dst cluster) and how to connect to the EPP ext-proc (HTTP ext-proc filter plus a route-level EPP ext-proc config override). If the ext-proc needs to read or write metadata, receiving_namespaces/forwarding_namespaces should also be set in the ext-proc config.
Data Plane: the EPP ext-proc selects the endpoint and adds it to metadata or a header. Envoy then routes to that endpoint according to the rules sent by the control plane.
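The metadata variant of this flow only works if ext-proc is allowed to write the namespace. This sketch models that gate under an assumed reading of Envoy's ext_proc receiving_namespaces option: dynamic metadata returned by the server is merged only for listed namespaces, so an EPP writing to envoy.lb has no effect unless envoy.lb is allowed. MetadataMutation and ApplyReceived are hypothetical names, and the real filter works on protobuf Structs rather than string maps.

```go
package main

// MetadataMutation is a hypothetical view of dynamic metadata an EPP
// ext-proc server returns: namespace -> key -> value.
type MetadataMutation map[string]map[string]string

// ApplyReceived merges only namespaces listed in receiving (modeled on
// ext_proc receiving_namespaces); all other namespaces are dropped.
func ApplyReceived(dst map[string]map[string]string, mut MetadataMutation, receiving []string) {
	allowed := make(map[string]bool, len(receiving))
	for _, ns := range receiving {
		allowed[ns] = true
	}
	for ns, kv := range mut {
		if !allowed[ns] {
			continue // not permitted: the write is silently ignored
		}
		if dst[ns] == nil {
			dst[ns] = map[string]string{}
		}
		for k, v := range kv {
			dst[ns][k] = v
		}
	}
}
```

A forgotten envoy.lb entry therefore does not produce an error in this model; the override source simply finds nothing and the fallback policy is used.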

Envoy Original Dst Cluster vs Host Override LbPolicy

Original Dst Cluster: easy to implement and does not need the real cluster endpoints. However, it does not support fallback: if endpoint selection fails, routing fails immediately.

Host Override LbPolicy: a bit more complex to implement than an original dst cluster. It requires the real cluster endpoints, and the selected endpoint must be among them; otherwise the request falls back to the fallback policy. So when a gateway implements InferencePool with the host override LB policy, it usually needs a real Service that selects the inference workload endpoints. The host override LB policy then operates on that Kubernetes Service, and the EndpointPicker's selection logic must pick from the same endpoint set (Istio creates a Service with the same label selectors as the InferencePool selectors).
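The fallback difference between the two approaches can be sketched as follows. OriginalDst, HostOverride, and RoundRobin are hypothetical names. The original dst side is reduced to "use the selected address or fail", and the host override side to "use the selected address only if it is a cluster endpoint, else ask the fallback policy"; health checks, retries, and multiple addresses are ignored, so this is a simplification of Envoy's real behavior.

```go
package main

import "errors"

// ErrNoHost stands in for a routing failure (no upstream host selected).
var ErrNoHost = errors.New("no upstream host")

// OriginalDst uses the selected address as-is: no endpoint list, no fallback.
func OriginalDst(selected string) (string, error) {
	if selected == "" {
		return "", ErrNoHost // selection failed, so routing fails immediately
	}
	return selected, nil
}

// HostOverride honors the selection only if it is a known cluster endpoint;
// otherwise the fallback policy picks from the same endpoints.
func HostOverride(selected string, endpoints []string, fallback func([]string) string) (string, error) {
	if len(endpoints) == 0 {
		return "", ErrNoHost
	}
	for _, ep := range endpoints {
		if ep == selected {
			return ep, nil
		}
	}
	return fallback(endpoints), nil
}

// RoundRobin is a minimal fallback picker used for illustration.
type RoundRobin struct{ next int }

// Pick returns endpoints in rotation.
func (r *RoundRobin) Pick(eps []string) string {
	ep := eps[r.next%len(eps)]
	r.next++
	return ep
}
```

This is also why the host override approach needs a real Service: without an endpoint list there is nothing to validate the selection against or to fall back to.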

How to implement the Endpoint Picker logics in Envoy Gateway?

Different AI gateways built on Envoy Gateway take different approaches to reach the above goal:

Envoy AI Gateway: use Envoy Gateway Extension Server.

It edits clusters, routes, and listeners to make this work. This is quite challenging because it is complex work that must interoperate with the existing config, and it is not suitable for adopters like AIBrix.

The default EPP implementation is GIE.

AIBrix Inference Gateway: use Envoy Gateway CRD configuration.

The EPP implementation is the AIBrix Gateway Plugin (similar to GIE, it provides an intelligent endpoint picker).

v1 (current): use an Envoy Gateway EnvoyPatchPolicy (to patch the original cluster config) + EnvoyExtensionPolicy (to add the EPP ext-proc config to the gateway). This is static and hard to maintain or orchestrate.
v2 (planned): use an Envoy Gateway BackendTrafficPolicy (to add the host override LB policy) + EnvoyExtensionPolicy (to add the EPP ext-proc config to the gateway, plus receiving/forwarding namespaces with 'envoy.lb' if needed). This can greatly improve UX and also adds fallback capability to the gateway.
v3 (after v2): use a controller to automatically do what is configured manually in v2, and support the InferencePool API. This simplifies UX and allows adopting the GIE conformance tests.

zhaohuabing · 2025-07-14T07:57:00Z

internal/ir/xds.go

+	// EndpointOverride defines the configuration for endpoint override.
+	// When specified, the load balancer will attempt to route requests to endpoints
+	// based on the override information extracted from request headers or metadata.
+	// If no valid override endpoint is found, the configured load balancer policy will be used as fallback.


Nit:

Suggested change

// If no valid override endpoint is found, the configured load balancer policy will be used as fallback.

// If the override endpoints are not available, the configured load balancer policy will be used as fallback.

zhaohuabing · 2025-07-14T08:01:49Z

api/v1alpha1/loadbalancer_types.go

+	// EndpointOverride defines the configuration for endpoint override.
+	// When specified, the load balancer will attempt to route requests to endpoints
+	// based on the override information extracted from request headers or metadata.
+	// If no valid override endpoint is found, the configured load balancer policy will be used as fallback.


Nit:

Suggested change

// If no valid override endpoint is found, the configured load balancer policy will be used as fallback.

// If the override endpoints are not available, the configured load balancer policy will be used as fallback.

wbpcode · 2025-07-17T09:07:16Z

api/v1alpha1/loadbalancer_types.go

+	//
+	// +kubebuilder:validation:MinItems=1
+	// +kubebuilder:validation:MaxItems=10
+	ExtractFrom []EndpointOverrideExtractFrom `json:"extractFrom"`


Would EndpointOverrideSources (field name) and EndpointOverrideSource (type name) be better? Then the naming would align with the data plane.

the word endpoint is repeated in endpointOverride.endpointOverrideSources, which shouldn't be required since the parent structure already has it

Xunzhuo marked this pull request as ready for review July 3, 2025 09:51

Xunzhuo requested a review from a team as a code owner July 3, 2025 09:51

Xunzhuo force-pushed the feat-host-override branch 3 times, most recently from 483e17a to a9e8062 July 3, 2025 09:59


Xunzhuo force-pushed the feat-host-override branch from a9e8062 to 364e8a8 July 3, 2025 13:49

Xunzhuo force-pushed the feat-host-override branch from 364e8a8 to 9fb2b51 July 4, 2025 14:26

wbpcode reviewed Jul 8, 2025


arkodg added this to the v1.5.0-rc.1 Release milestone Jul 9, 2025

Xunzhuo changed the title from "feat: support host override policy based routing" to "feat: support endpoint override policy based routing" Jul 10, 2025

Xunzhuo force-pushed the feat-host-override branch 4 times, most recently from 91de7b0 to be0b252 July 10, 2025 14:14

Xunzhuo requested a review from arkodg July 10, 2025 14:17

feat: support endpoint override policy based routing

3944377

Signed-off-by: bitliu <bitliu@tencent.com>

Xunzhuo force-pushed the feat-host-override branch from be0b252 to 3944377 July 10, 2025 14:25

arkodg reviewed Jul 10, 2025


Xunzhuo mentioned this pull request Jun 28, 2025

Add InferencePool Integration Support to Gateway Plugin vllm-project/aibrix#1233

Open

zhaohuabing reviewed Jul 14, 2025


wbpcode reviewed Jul 17, 2025
