-
Notifications
You must be signed in to change notification settings - Fork 518
feat: support endpoint override policy based routing #6458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
483e17a
to
a9e8062
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6458 +/- ##
==========================================
- Coverage 70.68% 70.63% -0.05%
==========================================
Files 220 220
Lines 37701 37867 +166
==========================================
+ Hits 26648 26747 +99
- Misses 9490 9549 +59
- Partials 1563 1571 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
a9e8062
to
364e8a8
Compare
/retest |
364e8a8
to
9fb2b51
Compare
api/v1alpha1/loadbalancer_types.go
Outdated
// FallbackPolicy defines the child LB policy to use in case neither header nor metadata with selected hosts is present. | ||
// If not specified, defaults to LeastRequest. | ||
// | ||
// +optional | ||
// +kubebuilder:default="LeastRequest" | ||
FallbackPolicy *LoadBalancerType `json:"fallbackPolicy,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, I will perfer the HostOverrideSettings as a common enhancement to all exist lb policy. It self needn't to be treated as an independent lb policy in the control plane.
That's say, we needn't the FallbackPolicy
and the HostOverride
policy. If the optional HostOverrideSettings
is configured, then we can think the override lb policy of data plane is used, and the original LoadBalancerType
, etc. could be used to construct the fallback policy of data plane. This should could simplify our API and avoid the users to configure nested override host lb.
I am not expert of the the gateway/control plane, only a comment from the point of the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure I like this approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
piggybacking off this
endpointOverride:
- header: x-custom-host
or
endpointOverride:
from:
- header: x-custom-host
which is similar to
ExtractFrom []*ExtractFrom `json:"extractFrom"` gateway/api/v1alpha1/jwt_types.go
Line 86 in 1737078
ExtractFrom *JWTExtractor `json:"extractFrom,omitempty"`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we use endpoint
instead of host
in the API, so also suggested that change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then i am ok with option 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool lets use extractFrom
to keep the API surface similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Working on it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
91de7b0
to
be0b252
Compare
Signed-off-by: bitliu <bitliu@tencent.com>
be0b252
to
3944377
Compare
// If set this field then it will take precedence over the header field. | ||
// | ||
// +optional | ||
Metadata *EndpointOverrideMetadataKey `json:"metadata,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we reuse
gateway/api/v1alpha1/ratelimit_types.go
Line 172 in f8054b1
type RateLimitCostMetadata struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should move it into shared types to reuse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emmm I checked that, they are in different cases: endpoint override is to build metadatav3.MetadataKey
Ratelimit is to build the routev3.RateLimit_HitsAddend, they have different format input
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @wbpcode @mathetake curious why 2 different metadata types exist ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and do we need to add this to the API ?
will ext proc write this value ? if so extProc's API in EG
gateway/api/v1alpha1/ext_proc_types.go
Line 113 in 64f3576
type ExtProcMetadata struct { |
uses the term writableNamespaces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhaohuabing does path
need to be a []
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @wbpcode @mathetake curious why 2 different metadata types exist ?
The RateLimitCostMetadata
should be a control plane abstractino that used to construct the hits_addend
. The Envoy only provide a MetadataKey
to access the Metadata
.
But as @zhaohuabing said, key
is confusing. The Envoy use key
for both metadata namespace and metadata key/path. So, sound good to me to reuse the Envoy MetadataKey
or create a new shared variant for EG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhaohuabing does
path
need to be a[]
?
As shared MetadataKey
. I think []
will provide more flexibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhaohuabing does path need to be a [] ?
Not necessarily, but it's semantically clearer than something like a:b:c:d
.
/retest |
Endpoint Picker General Implementation LogicsThis is how we generally implemented the EPP logics in Envoy based API Gateway, no matter the control plane is Envoy Gateway, Istio, or KGateway:
Envoy Original Dst Cluster vs Host Override LbPolicyOriginal Dst Cluster: It is easy to implement and don't need the real cluster endpoints. But it does not support fallback, which means if the selection is failed, the routing will fail immediately. Host Override LbPolicy: It is a bit complexer than original dst cluster to implement, it requires the real cluster endpoints, and the selected endpoint should be in the endpoints, otherwise it will fallback. So when Gateway implements the InferencePool with host override lbpolicy, we usually need a real service selects the inference workload endpoints, and the host override lbpolicy is working on the kubernetes service, and the endpoint selection logics in EndpointPicker should also select the endpoints in the same endpoints (Istio creates a service with the same labels selectors with the InferencePool Selectors) How to implement the Endpoint Picker logics in Envoy Gateway?Different AI Gateway based on Envoy Gateway has different approaches to reach the above goal: Envoy AI Gateway: use Envoy Gateway Extension Server.It edits cluster, route, listener to make this work, this is quite challenging since it is a complex work, which need to work well with the existing config. This is not suitable for adopters like AIBrix. The default EPP implementation is GIE. AIBrix Inference Gateway: use Envoy Gateway CRD configuration.The EPP imlementation is AIBrix Gateway Plugin.(Similar to GIE, it provides intelligently endpoint picker)
|
// EndpointOverride defines the configuration for endpoint override. | ||
// When specified, the load balancer will attempt to route requests to endpoints | ||
// based on the override information extracted from request headers or metadata. | ||
// If no valid override endpoint is found, the configured load balancer policy will be used as fallback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
// If no valid override endpoint is found, the configured load balancer policy will be used as fallback. | |
// If the override endpoints are not available, the configured load balancer policy will be used as fallback. |
// EndpointOverride defines the configuration for endpoint override. | ||
// When specified, the load balancer will attempt to route requests to endpoints | ||
// based on the override information extracted from request headers or metadata. | ||
// If no valid override endpoint is found, the configured load balancer policy will be used as fallback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
// If no valid override endpoint is found, the configured load balancer policy will be used as fallback. | |
// If the override endpoints are not available, the configured load balancer policy will be used as fallback. |
// | ||
// +kubebuilder:validation:MinItems=1 | ||
// +kubebuilder:validation:MaxItems=10 | ||
ExtractFrom []EndpointOverrideExtractFrom `json:"extractFrom"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will EndpointOverrideSources (field name) and EndpointOverrideSource
(type name) better? Then the naming will align with the data plane.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the word endpoint
is repeated endpointOverride.endpointOverrideSources
which shouldnt be required since the parent structure already has it
What type of PR is this?
feat: support host override policy based routing
What this PR does / why we need it:
Support host override policy based routing, a typical scenario is the LLM Endpoint Picker.
Which issue(s) this PR fixes:
Fixes #6456
Release Notes: Yes
Use in this way: