feat(router): Hive Console Usage Reporting #499
base: main
Summary of Changes

Hello @ardatan, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant new feature: the ability for the GraphQL router to report detailed usage metrics to the GraphQL Hive Console. This integration provides valuable insights into how the router is being used, including operation names, execution times, and error rates. The reporting mechanism is highly configurable, allowing users to control aspects like sampling rates, excluded operations, and reporting intervals, ensuring efficient and relevant data collection. This enhancement is crucial for monitoring and optimizing GraphQL API performance and adoption.
Code Review
This pull request introduces Hive Console Client integration for usage reporting in the Hive Router. It includes changes to Cargo.lock and Cargo.toml files to add the new dependency, modifications to bin/router/src/lib.rs and bin/router/src/pipeline/mod.rs to implement the usage reporting logic, and a new file bin/router/src/pipeline/usage.rs for sending usage reports. The shared state is also updated to include the usage agent. I have provided review comments to address potential issues related to error handling and code clarity.
```rust
}
let client_name = get_header_value(req, &usage_config.client_name_header);
let client_version = get_header_value(req, &usage_config.client_version_header);
let timestamp = SystemTime::now()
```
This can use `as_millis()` instead of `secs * 1000`.
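A minimal sketch of the suggested change, assuming the report only needs milliseconds since the Unix epoch; `unix_timestamp_millis` is a hypothetical helper, not code from this PR. Besides being shorter, `as_millis()` keeps the sub-second precision that `as_secs() * 1000` throws away.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical helper: milliseconds since the Unix epoch, taken directly from
/// `Duration::as_millis` rather than multiplying whole seconds by 1000.
fn unix_timestamp_millis(now: SystemTime) -> u128 {
    now.duration_since(UNIX_EPOCH)
        // A clock set before 1970 yields an error; treat it as the epoch itself.
        .unwrap_or_default()
        .as_millis()
}
```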
```rust
    })
}

pub fn send_usage_report(
```
I don't think the name is a good fit here, since the function actually collects the operation rather than sending it.
Maybe `collect_usage_report`?
Also, this function is called from the hot path, so just for the sake of micro-perf, let's `#[inline]` it.
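To illustrate both suggestions, here is a sketch that assumes the agent is shared behind an `Arc` (so it is only borrowed immutably) and buffers reports for a background flush. `OperationReport` and `ReportBuffer` are hypothetical stand-ins for the SDK types, which this excerpt does not show.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for one collected operation.
#[derive(Debug, Clone, PartialEq)]
pub struct OperationReport {
    pub operation_name: Option<String>,
    pub ok: bool,
}

/// Hypothetical shared buffer standing in for the usage agent.
#[derive(Default)]
pub struct ReportBuffer {
    pending: Mutex<Vec<OperationReport>>,
}

/// Named for what it does (it only collects; a background task flushes) and
/// marked `#[inline]` because it runs once per request. `#[inline]` is a hint
/// to the compiler, not a guarantee.
#[inline]
pub fn collect_usage_report(buffer: &ReportBuffer, report: OperationReport) {
    // A poisoned lock means another thread panicked mid-push; keep the data anyway.
    let mut pending = buffer
        .pending
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner());
    pending.push(report);
}
```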
```rust
}

fn get_header_value(req: &HttpRequest, header_name: &str) -> Option<String> {
    req.headers()
```
The header value doesn't have to be a `String` here. As long as you don't need ownership, it can stay a `&str` and be returned as such.
Even if the usage agent eventually clones it internally, I don't think that should happen here.
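A sketch of the borrowing signature. The `Request` type below is a hypothetical stand-in with a plain `HashMap` of headers (case-insensitive lookup is ignored). With an `http`-crate style header map, the body would instead be something like `req.headers().get(name).and_then(|v| v.to_str().ok())`.

```rust
use std::collections::HashMap;

/// Hypothetical minimal request type; the router's real `HttpRequest` comes
/// from its HTTP framework.
pub struct Request {
    headers: HashMap<String, String>,
}

/// Returns a view into the request's own header storage. The lifetime ties
/// the returned `&str` to `req`, so nothing is allocated here; the usage agent
/// can clone later if it needs an owned value.
fn get_header_value<'a>(req: &'a Request, header_name: &str) -> Option<&'a str> {
    req.headers.get(header_name).map(String::as_str)
}
```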
```rust
use crate::background_tasks::BackgroundTask;

pub fn from_config(router_config: &HiveRouterConfig) -> Option<UsageAgent> {
```
I think this should take only the part of the config that's relevant to it and then return a `UsageAgent`. The decision on whether to create the agent should happen in the caller.
Also, naming! `create_hive_usage_agent` is probably a better name here.
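A sketch of the split being proposed, with hypothetical, trimmed-down config types: `create_hive_usage_agent` only sees the usage-reporting section, and the caller (`init_usage_agent` here, also a hypothetical name) decides whether an agent is created at all.

```rust
/// Hypothetical slices of the router config; field names are illustrative only.
#[derive(Clone, Debug)]
pub struct UsageReportingConfig {
    pub access_token: String,
}

pub struct RouterConfig {
    pub usage_reporting: Option<UsageReportingConfig>,
}

/// Hypothetical agent handle.
#[derive(Debug)]
pub struct UsageAgent {
    token: String,
}

/// Builds an agent from just the usage-reporting section; it does not decide
/// whether reporting is enabled.
pub fn create_hive_usage_agent(config: &UsageReportingConfig) -> UsageAgent {
    UsageAgent { token: config.access_token.clone() }
}

/// The caller owns the decision of whether to create the agent.
pub fn init_usage_agent(router_config: &RouterConfig) -> Option<UsageAgent> {
    router_config
        .usage_reporting
        .as_ref()
        .map(create_hive_usage_agent)
}
```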
```rust
usage_config.request_timeout,
usage_config.accept_invalid_certs,
flush_interval,
"hive-router".to_string(),
```
I assume this is the user agent to use?
You can use the `ROUTER_VERSION` const and append it here, so we'd have something like `hive-router@VERSION` or `hive-router/VERSION`.
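A minimal sketch, assuming a `ROUTER_VERSION` string constant is in scope; the placeholder value below is hypothetical, since the real constant's definition isn't shown in this excerpt.

```rust
/// Placeholder for the router's `ROUTER_VERSION` constant (value is hypothetical).
const ROUTER_VERSION: &str = "0.0.0";

/// Builds a versioned user agent in the `hive-router/VERSION` form.
fn usage_agent_user_agent() -> String {
    format!("hive-router/{}", ROUTER_VERSION)
}
```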
```rust
pub fn send_usage_report(
    schema: Arc<Document<'static, String>>,
    start: Instant,
```
I feel like passing the `Instant` here over and over could be replaced by measuring the total time once and passing it here as a `Duration`?
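A sketch of the call-site change under hypothetical types: the pipeline measures elapsed time once with `Instant::elapsed` and hands the collector a `Duration`, so the collector no longer needs to know when the request started.

```rust
use std::time::{Duration, Instant};

/// Hypothetical report holding an already-measured duration.
pub struct ExecutionReport {
    pub duration: Duration,
    pub ok: bool,
}

/// The collector receives a `Duration` instead of the start `Instant`.
pub fn collect_usage_report(duration: Duration, ok: bool) -> ExecutionReport {
    ExecutionReport { duration, ok }
}

/// Call-site sketch: run the request, measure once, then pass the result down.
pub fn run_and_report<F: FnOnce() -> bool>(execute: F) -> ExecutionReport {
    let start = Instant::now();
    let ok = execute();
    let elapsed = start.elapsed(); // measured exactly once
    collect_usage_report(elapsed, ok)
}
```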
```rust
};
usage_agent
    .add_report(execution_report)
    .unwrap_or_else(|err| tracing::error!("Failed to send usage report: {}", err));
```
nit: I feel like `unwrap_or_else` here would be more readable as an `if let` or a `match` on the `Result`.
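A sketch of the `match` form. `UsageAgent`, `AgentError` and `record_usage` are hypothetical stand-ins (the excerpt only shows that `add_report` returns a `Result`), and `eprintln!` replaces `tracing::error!` so the example has no dependencies.

```rust
use std::fmt;

/// Hypothetical error returned when the agent cannot accept a report.
#[derive(Debug)]
pub struct AgentError(String);

impl fmt::Display for AgentError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Hypothetical agent whose `add_report` returns a `Result`.
pub struct UsageAgent {
    accepting: bool,
}

impl UsageAgent {
    pub fn add_report(&self, report: &str) -> Result<(), AgentError> {
        if self.accepting {
            Ok(())
        } else {
            Err(AgentError(format!("buffer is closed, dropping report for {report}")))
        }
    }
}

/// Logs and carries on when a report cannot be queued; returns whether it was
/// queued so the behavior is easy to check.
pub fn record_usage(agent: &UsageAgent, report: &str) -> bool {
    match agent.add_report(report) {
        Ok(()) => true,
        Err(err) => {
            eprintln!("Failed to send usage report: {err}");
            false
        }
    }
}
```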
```rust
#[async_trait]
impl BackgroundTask for UsageAgent {
    fn id(&self) -> &str {
        "usage_report_flush_interval"
```
hive_console_usage_report_task
🐋 This PR was built and pushed to the following Docker images:
Image names: ghcr.io/graphql-hive/router:pr-499, ghcr.io/graphql-hive/router:sha-d70cd8e
Image digest: sha256:9e394c83e383b910b4b5a777cf061ff458a6eaf195a27292d3874298c99d4d77
bin/router/src/lib.rs
```rust
    true => Some(JwtAuthRuntime::init(bg_tasks_manager, &router_config.jwt).await?),
    false => None,
};
let usage_agent = pipeline::usage_reporting::from_config(&router_config).map(Arc::new);
```
Replace this with the new function name I suggested above.
nit: use just the function name here, not the full import path.
✅
bin/router/src/schema_state.rs
```rust
pub metadata: SchemaMetadata,
pub planner: Planner,
pub subgraph_executor_map: SubgraphExecutorMap,
pub schema: Arc<Document<'static, String>>,
```
Rename this to `supergraph_schema`, as it's not clear whether this is the supergraph or the public API schema.
bin/router/src/shared_state.rs
```rust
pub override_labels_evaluator: OverrideLabelsEvaluator,
pub cors_runtime: Option<Cors>,
pub jwt_auth_runtime: Option<JwtAuthRuntime>,
pub usage_agent: Option<Arc<UsageAgent>>,
```
I think we should pick one path:
1. Call it `usage_agent`, and then abstract it a bit (like we did with supergraph loading).
2. Call it `hive_usage_agent` here.

I tend to go with 2 for now.
bin/router/src/shared_state.rs
```rust
CORSConfig(#[from] Box<CORSConfigError>),
#[error("invalid override labels config: {0}")]
OverrideLabelsCompile(#[from] Box<OverrideLabelsCompileError>),
#[error("error creating usage agent: {0}")]
```
Suggestion: `error creating hive usage agent: {0}`
```rust
#[serde(deny_unknown_fields)]
pub struct UsageReportingConfig {
    /// Your [Registry Access Token](https://the-guild.dev/graphql/hive/docs/management/targets#registry-access-tokens) with write permission.
    pub token: String,
```
I think this should be `access_token`, as that's explicitly what we call it everywhere in Console.
```rust
/// Configuration for usage reporting to GraphQL Hive.
#[serde(default)]
pub usage_reporting: Option<usage_reporting::UsageReportingConfig>,
```
Since we have an `enabled` field in `UsageReportingConfig`, this one should not be wrapped in `Option`.
It should be:
```rust
#[serde(default = "usage_reporting::UsageReportingConfig::default")]
pub usage_reporting: usage_reporting::UsageReportingConfig,
```
And `impl Default for UsageReportingConfig` should be implemented to configure it with `enabled: false`.
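A std-only sketch of the proposed `Default` impl. The field set is hypothetical: `access_token` is shown as `Option<String>` so a disabled default can exist without a token, and `sample_rate` keeps the documented default of 1.0. With serde, the `#[serde(default = ...)]` attribute suggested above would call this impl when the section is omitted.

```rust
/// Hypothetical shape of the config after the suggested change: always present,
/// with an explicit `enabled` switch. Only a few fields are shown.
#[derive(Debug, Clone, PartialEq)]
pub struct UsageReportingConfig {
    pub enabled: bool,
    pub access_token: Option<String>,
    pub sample_rate: f64,
}

impl Default for UsageReportingConfig {
    fn default() -> Self {
        Self {
            // Reporting stays off unless the user opts in.
            enabled: false,
            access_token: None,
            // Matches the documented default of 1.0 (100% of operations).
            sample_rate: 1.0,
        }
    }
}
```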
Actually we don't have an enabled flag in UsageReportingConfig.
Then I think we should add one. We need to be explicit about these; otherwise we might end up unable to enable/disable it via things like env vars.
If we introduce an `enabled` flag, users will need to set `enabled: true` even if they provide the env variables.
```rust
/// 1.0 = 100% chance of being sent
/// Default: 1.0
#[serde(default = "default_sample_rate")]
pub sample_rate: f64,
```
Should this be a more user-friendly value?
I mean, if we use humantime for durations and allow things like `10s`, why not let users write `10%` here instead of `0.1`?
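A sketch of what accepting both forms could look like. `parse_sample_rate` is hypothetical; in the config it could be wired in with serde's `deserialize_with` attribute. It accepts `10%` or `0.1` and rejects anything outside 0 to 100%.

```rust
/// Hypothetical parser: accepts a percentage ("10%") or a plain ratio ("0.1")
/// and returns a ratio in [0.0, 1.0].
pub fn parse_sample_rate(input: &str) -> Result<f64, String> {
    let trimmed = input.trim();
    let ratio = match trimmed.strip_suffix('%') {
        // "10%" -> 10.0 / 100.0
        Some(pct) => pct.trim().parse::<f64>().map_err(|e| e.to_string())? / 100.0,
        // "0.1" is taken as-is
        None => trimmed.parse::<f64>().map_err(|e| e.to_string())?,
    };
    // Also rejects NaN, since NaN is not contained in any range.
    if !(0.0..=1.0).contains(&ratio) {
        return Err(format!("sample rate must be between 0% and 100%, got {input}"));
    }
    Ok(ratio)
}
```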
```rust
/// Unit: seconds
/// Default: 15 (s)
#[serde(default = "default_request_timeout")]
pub request_timeout: u64,
```
This should use humantime; see the other plugins for an example.
```rust
/// Unit: seconds
/// Default: 5 (s)
#[serde(default = "default_connect_timeout")]
pub connect_timeout: u64,
```
This should use humantime; see the other plugins for an example.
```rust
/// Frequency of flushing the buffer to the server
/// Default: 5 seconds
#[serde(default = "default_flush_interval")]
pub flush_interval: u64,
```
This should use humantime; see the other plugins for an example.
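For illustration only: the fields as `Duration`s with the defaults from the doc comments (15 s, 5 s, 5 s), plus a deliberately tiny parser that accepts only `ms`, `s` and `m` suffixes. The parser is a stand-in, not a replacement; the real config would delegate to the `humantime` crate (for example through `humantime-serde`), which supports many more formats. `UsageTimings` and `parse_simple_duration` are hypothetical names.

```rust
use std::time::Duration;

/// Hypothetical: timeouts and intervals stored as `Duration` instead of raw seconds.
#[derive(Debug, Clone, PartialEq)]
pub struct UsageTimings {
    pub request_timeout: Duration,
    pub connect_timeout: Duration,
    pub flush_interval: Duration,
}

impl Default for UsageTimings {
    fn default() -> Self {
        Self {
            request_timeout: Duration::from_secs(15),
            connect_timeout: Duration::from_secs(5),
            flush_interval: Duration::from_secs(5),
        }
    }
}

/// Stand-in parser accepting only "<n>ms", "<n>s" and "<n>m".
pub fn parse_simple_duration(input: &str) -> Result<Duration, String> {
    let s = input.trim();
    // Split "15s" into the numeric part ("15") and the unit suffix ("s").
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (digits, unit) = s.split_at(split);
    let n: u64 = digits
        .parse()
        .map_err(|_| format!("invalid duration: {input}"))?;
    match unit {
        "ms" => Ok(Duration::from_millis(n)),
        "s" => Ok(Duration::from_secs(n)),
        "m" => n
            .checked_mul(60)
            .map(Duration::from_secs)
            .ok_or_else(|| format!("duration too large: {input}")),
        _ => Err(format!("unsupported unit in duration: {input}")),
    }
}
```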
```rust
/// Your [Registry Access Token](https://the-guild.dev/graphql/hive/docs/management/targets#registry-access-tokens) with write permission.
pub token: String,
/// A target ID, this can either be a slug following the format “$organizationSlug/$projectSlug/$targetSlug” (e.g “the-guild/graphql-hive/staging”) or an UUID (e.g. “a0f4c605-6541-4350-8cfe-b31f21a4bf80”). To be used when the token is configured with an organization access token.
pub target_id: Option<String>,
```
nit: this one can be validated during de-serialization as either {string}/{string}/{string} or uuid.
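A sketch of that validation, assuming only the two formats named in the doc comment. `is_valid_target_id` is hypothetical, and the slug check is intentionally loose (three non-empty, slash-separated segments) because the exact slug character set is not stated here. During deserialization it could back a newtype or a `deserialize_with` function.

```rust
/// Hypothetical validator: "org/project/target" slugs or a canonical UUID.
pub fn is_valid_target_id(value: &str) -> bool {
    is_uuid(value) || is_slug_path(value)
}

/// Three non-empty segments separated by '/'.
fn is_slug_path(value: &str) -> bool {
    let parts: Vec<&str> = value.split('/').collect();
    parts.len() == 3 && parts.iter().all(|p| !p.is_empty())
}

/// Canonical 8-4-4-4-12 hexadecimal UUID form.
fn is_uuid(value: &str) -> bool {
    let groups: Vec<&str> = value.split('-').collect();
    let expected = [8, 4, 4, 4, 12];
    groups.len() == expected.len()
        && groups
            .iter()
            .zip(expected)
            .all(|(g, len)| g.len() == len && g.chars().all(|c| c.is_ascii_hexdigit()))
}
```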
```diff
 [workspace.dependencies]
 graphql-tools = "0.4.0"
-graphql-parser = "0.4.1"
+graphql-parser = { version = "0.5.0", package = "graphql-parser-hive-fork" }
```
why use the custom one? we dropped it on purpose.
SDK uses this one.
So we need to fix that in the SDK. I don't think the SDK has any reason to still use it now.
The router shouldn't use this custom one.
One more thing: I think we might need to expose these by default using env vars. See `env_var_override` for an example of how to do it. We can align with how it looks in GW.
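A sketch of the idea under stated assumptions: `HIVE_ACCESS_TOKEN` and `HIVE_TARGET_ID` are placeholder variable names (the real ones should match the gateway), and the lookup is injected as a closure so the function can be tested without touching the process environment. In the router it would be called with `|k| std::env::var(k).ok()`. This does not reproduce the router's `env_var_override` mechanism.

```rust
/// Hypothetical usage-reporting section after env-var overrides are applied.
#[derive(Debug, Default, PartialEq)]
pub struct UsageReportingConfig {
    pub access_token: Option<String>,
    pub target_id: Option<String>,
}

/// Applies env-var overrides on top of the file config; a set variable wins.
pub fn apply_env_overrides<F>(mut config: UsageReportingConfig, lookup: F) -> UsageReportingConfig
where
    F: Fn(&str) -> Option<String>,
{
    // Placeholder variable names; align them with the gateway's.
    if let Some(token) = lookup("HIVE_ACCESS_TOKEN") {
        config.access_token = Some(token);
    }
    if let Some(target) = lookup("HIVE_TARGET_ID") {
        config.target_id = Some(target);
    }
    config
}
```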
Hive Console Client integration
Ref ROUTER-102
Blocked by graphql-hive/console#7143
Documentation -> graphql-hive/console#7171
TODOs:
hive-console-sdk and add it to Cargo.toml here