Conversation

Member

@ardatan ardatan commented Oct 21, 2025

Hive Console Client integration
Ref ROUTER-102
Blocked by graphql-hive/console#7143

Documentation -> graphql-hive/console#7171

TODOs:

@ardatan ardatan marked this pull request as draft October 21, 2025 14:50
@gemini-code-assist
Contributor

Summary of Changes

Hello @ardatan, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant new feature: the ability for the GraphQL router to report detailed usage metrics to the GraphQL Hive Console. This integration provides valuable insights into how the router is being used, including operation names, execution times, and error rates. The reporting mechanism is highly configurable, allowing users to control aspects like sampling rates, excluded operations, and reporting intervals, ensuring efficient and relevant data collection. This enhancement is crucial for monitoring and optimizing GraphQL API performance and adoption.

Highlights

  • Hive Console Usage Reporting Integration: The router now integrates with the hive-console-client to report GraphQL operation usage data, providing insights into router activity.
  • Configurable Usage Settings: A new UsageConfig struct has been introduced, allowing detailed configuration of reporting parameters such as API token, endpoint, sample rate, excluded operations, client headers, buffer size, and various timeouts.
  • Dynamic Usage Agent Initialization and Background Task: A UsageAgent is initialized based on the provided UsageConfig and registered as a background task, ensuring that usage reports are periodically flushed to the Hive Console.
  • GraphQL Pipeline Integration: The GraphQL execution pipeline has been modified to capture operation details, execution duration, and error counts, which are then conditionally sent to the UsageAgent for reporting.
  • Dependency Updates and Forked Parser: The project's dependencies have been updated, notably replacing the graphql-parser crate with graphql-parser-hive-fork across Cargo.lock and Cargo.toml files, and adding new crates like md5 and webpki-roots.
  • Error Counting in Execution Output: The PlanExecutionOutput now includes an error_count field, which tracks the number of errors encountered during query plan execution, providing crucial data for usage reporting.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces Hive Console Client integration for usage reporting in the Hive Router. It includes changes to Cargo.lock and Cargo.toml files to add the new dependency, modifications to bin/router/src/lib.rs and bin/router/src/pipeline/mod.rs to implement the usage reporting logic, and a new file bin/router/src/pipeline/usage.rs for sending usage reports. The shared state is also updated to include the usage agent. I have provided review comments to address potential issues related to error handling and code clarity.

@ardatan ardatan force-pushed the hive-usage-reporting branch 2 times, most recently from b11336d to 7c73c86 Compare October 23, 2025 14:56
@ardatan ardatan force-pushed the hive-usage-reporting branch 3 times, most recently from 25e7e44 to 5c9a3ac Compare October 28, 2025 12:50
@ardatan ardatan marked this pull request as ready for review October 29, 2025 14:03
@ardatan ardatan force-pushed the hive-usage-reporting branch from 25e2b93 to 61308f1 Compare October 29, 2025 14:03
}
let client_name = get_header_value(req, &usage_config.client_name_header);
let client_version = get_header_value(req, &usage_config.client_version_header);
let timestamp = SystemTime::now()
Member

This can use as_millis() instead of multiplying the seconds by 1000.
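
A minimal sketch of this suggestion, assuming the timestamp is meant to be milliseconds since the Unix epoch (the full original expression is not visible in this hunk). `unix_millis` is a hypothetical helper name, not something from the PR.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Milliseconds since the Unix epoch for the given point in time.
/// Hypothetical helper; the router's real code may look different.
fn unix_millis(at: SystemTime) -> u128 {
    at.duration_since(UNIX_EPOCH)
        // A clock set before 1970 makes this fail; fall back to zero in this sketch.
        .unwrap_or(Duration::ZERO)
        .as_millis()
}
```

Compared with taking whole seconds and multiplying by 1000, `as_millis()` also keeps the sub-second part of the timestamp.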

})
}

pub fn send_usage_report(
Member

I feel like the name here is not good, as the function really collects the operation.
Maybe collect_usage_report?

Member

Also, this function is called from the hot path, so just for the sake of micro-perf, let's #[inline] it.

}

fn get_header_value(req: &HttpRequest, header_name: &str) -> Option<String> {
req.headers()
Member

The value of the header doesn't have to be a String here. As long as you don't need ownership, it can remain a &str and be returned as such.
Even if it will eventually be cloned internally by the usage-agent, I don't think that should happen here.
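
A rough sketch of the borrowed variant. The `Headers` type below is a stand-in built on `HashMap` so the example needs no external crates; the real function would read from the HTTP library's own header map, which also handles case-insensitive names. The header names in any usage are hypothetical.

```rust
use std::collections::HashMap;

/// Stand-in for an HTTP request's header map (assumption: the real code
/// uses the HTTP library's header type instead of this wrapper).
struct Headers(HashMap<String, Vec<u8>>);

/// Borrows the header value as `&str` instead of allocating a `String`.
/// Returns `None` if the header is missing or is not valid UTF-8.
fn get_header_value<'a>(headers: &'a Headers, header_name: &str) -> Option<&'a str> {
    headers
        .0
        .get(header_name)
        .and_then(|raw| std::str::from_utf8(raw).ok())
}
```

The returned `&str` borrows from the request, so any allocation is deferred to whichever component actually needs an owned copy.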


use crate::background_tasks::BackgroundTask;

pub fn from_config(router_config: &HiveRouterConfig) -> Option<UsageAgent> {
Member

I think this should take only the part of the config that's relevant to it, and then return a UsageAgent. The condition/decision on making the agent should happen in the caller function.

Member

Also, naming! create_hive_usage_agent is probably a better name here.
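
One way the two suggestions above could fit together, as a sketch only. All types here are trimmed-down, hypothetical stand-ins (the real `UsageAgent` comes from the Hive SDK and takes more parameters), and `init_usage_agent` is an invented name for the caller.

```rust
use std::sync::Arc;

/// Hypothetical, trimmed-down config section for illustration only.
struct UsageReportingConfig {
    access_token: String,
}

/// Hypothetical router config holding the optional section.
struct RouterConfig {
    usage_reporting: Option<UsageReportingConfig>,
}

/// Stand-in for the real agent type.
struct UsageAgent {
    token: String,
}

/// Builds the agent from only the config section it needs.
fn create_hive_usage_agent(config: &UsageReportingConfig) -> UsageAgent {
    UsageAgent {
        token: config.access_token.clone(),
    }
}

/// The caller owns the decision of whether an agent exists at all.
fn init_usage_agent(router_config: &RouterConfig) -> Option<Arc<UsageAgent>> {
    router_config
        .usage_reporting
        .as_ref()
        .map(create_hive_usage_agent)
        .map(Arc::new)
}
```

With this split, the constructor never returns `Option`, and the "is reporting configured?" check lives in exactly one place.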

usage_config.request_timeout,
usage_config.accept_invalid_certs,
flush_interval,
"hive-router".to_string(),
Member

I assume this is the user-agent to use?
You can use ROUTER_VERSION const and append it here, so we'll have something like hive-router@VERSION or hive-router/VERSION.
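
For illustration, the `hive-router/VERSION` form could be built like this. The constant below is a placeholder standing in for the router's `ROUTER_VERSION`, whose actual definition is not shown in this conversation.

```rust
/// Placeholder standing in for the router's `ROUTER_VERSION` constant.
const ROUTER_VERSION: &str = "0.0.0";

/// Builds a `product/version` style user-agent string.
fn user_agent() -> String {
    format!("hive-router/{}", ROUTER_VERSION)
}
```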


pub fn send_usage_report(
schema: Arc<Document<'static, String>>,
start: Instant,
Member

I feel like passing the Instant around here over and over could be replaced by measuring the total time once and then just passing it here as a Duration?
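
A small sketch of that idea, under the assumption that only the elapsed time is needed downstream. `ExecutionReport` is reduced to a single field here, and `timed` is a hypothetical helper showing where the measurement would happen.

```rust
use std::time::{Duration, Instant};

/// Hypothetical report type; only the duration field matters in this sketch.
struct ExecutionReport {
    duration: Duration,
}

/// Takes an already-measured `Duration`, so it never touches the clock itself.
fn collect_usage_report(duration: Duration) -> ExecutionReport {
    ExecutionReport { duration }
}

/// Measures once at the call site and hands the result to the collector.
fn timed<T>(work: impl FnOnce() -> T) -> (T, ExecutionReport) {
    let start = Instant::now();
    let output = work();
    (output, collect_usage_report(start.elapsed()))
}
```

A side benefit is that the collector becomes easy to unit test, since a fixed `Duration` can be passed in directly.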

};
usage_agent
.add_report(execution_report)
.unwrap_or_else(|err| tracing::error!("Failed to send usage report: {}", err));
Member

nit: I feel like unwrap_or_else here could be more readable as an if let or a match on the Result.
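
A sketch of the `if let` form. The `UsageAgent` below is a simplified stand-in with an invented "buffer full" failure, and `eprintln!` replaces `tracing::error!` so the example stays std-only; the boolean return value exists only to make the sketch easy to check.

```rust
/// Stand-in for the agent; `add_report` fails when the buffer is full.
struct UsageAgent {
    buffered: Vec<String>,
    capacity: usize,
}

impl UsageAgent {
    fn add_report(&mut self, report: String) -> Result<(), String> {
        if self.buffered.len() >= self.capacity {
            return Err("buffer is full".to_string());
        }
        self.buffered.push(report);
        Ok(())
    }
}

/// Logs and swallows the error with `if let` rather than `unwrap_or_else`.
/// Returns whether the report was accepted.
fn try_add_report(agent: &mut UsageAgent, report: String) -> bool {
    if let Err(err) = agent.add_report(report) {
        // The real code logs with `tracing::error!`; `eprintln!` keeps this std-only.
        eprintln!("Failed to send usage report: {err}");
        return false;
    }
    true
}
```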

#[async_trait]
impl BackgroundTask for UsageAgent {
fn id(&self) -> &str {
"usage_report_flush_interval"
Member

hive_console_usage_report_task

@github-actions

github-actions bot commented Oct 29, 2025

🐋 This PR was built and pushed to the following Docker images:

Image Names: ghcr.io/graphql-hive/router

Platforms: linux/amd64,linux/arm64

Image Tags: ghcr.io/graphql-hive/router:pr-499 ghcr.io/graphql-hive/router:sha-d70cd8e

Docker metadata
{
"buildx.build.ref": "builder-d55a74dc-0f8c-4874-aa3e-31e9a10f42b5/builder-d55a74dc-0f8c-4874-aa3e-31e9a10f42b50/4mj7y985v31b2dp9mwel5oai2",
"containerimage.descriptor": {
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "digest": "sha256:9e394c83e383b910b4b5a777cf061ff458a6eaf195a27292d3874298c99d4d77",
  "size": 1609
},
"containerimage.digest": "sha256:9e394c83e383b910b4b5a777cf061ff458a6eaf195a27292d3874298c99d4d77",
"image.name": "ghcr.io/graphql-hive/router:pr-499,ghcr.io/graphql-hive/router:sha-d70cd8e"
}

true => Some(JwtAuthRuntime::init(bg_tasks_manager, &router_config.jwt).await?),
false => None,
};
let usage_agent = pipeline::usage_reporting::from_config(&router_config).map(Arc::new);
Member

Replace with the new fn name I suggested above.

nit: use just the fn name, not full import here.

@github-actions

github-actions bot commented Oct 29, 2025

k6-benchmark results

     ✓ response code was 200
     ✓ no graphql errors
     ✓ valid response structure

     █ setup

     checks.........................: 100.00% ✓ 219057      ✗ 0    
     data_received..................: 6.4 GB  213 MB/s
     data_sent......................: 86 MB   2.8 MB/s
     http_req_blocked...............: avg=3.38µs  min=672ns   med=1.69µs  max=5.6ms    p(90)=2.38µs  p(95)=2.71µs  
     http_req_connecting............: avg=779ns   min=0s      med=0s      max=2.12ms   p(90)=0s      p(95)=0s      
     http_req_duration..............: avg=20.07ms min=2.21ms  med=19.17ms max=104.75ms p(90)=27.32ms p(95)=30.45ms 
       { expected_response:true }...: avg=20.07ms min=2.21ms  med=19.17ms max=104.75ms p(90)=27.32ms p(95)=30.45ms 
     http_req_failed................: 0.00%   ✓ 0           ✗ 73039
     http_req_receiving.............: avg=151.3µs min=24.32µs med=39.58µs max=85.79ms  p(90)=84.16µs p(95)=385.55µs
     http_req_sending...............: avg=23.5µs  min=5.07µs  med=10.55µs max=17.52ms  p(90)=15.41µs p(95)=28.08µs 
     http_req_tls_handshaking.......: avg=0s      min=0s      med=0s      max=0s       p(90)=0s      p(95)=0s      
     http_req_waiting...............: avg=19.9ms  min=2.15ms  med=19.04ms max=55.82ms  p(90)=27.1ms  p(95)=30.13ms 
     http_reqs......................: 73039   2429.049096/s
     iteration_duration.............: avg=20.53ms min=6.09ms  med=19.51ms max=244.34ms p(90)=27.76ms p(95)=30.96ms 
     iterations.....................: 73019   2428.383958/s
     vus............................: 50      min=50        max=50 
     vus_max........................: 50      min=50        max=50 

pub metadata: SchemaMetadata,
pub planner: Planner,
pub subgraph_executor_map: SubgraphExecutorMap,
pub schema: Arc<Document<'static, String>>,
Member

Rename to supergraph_schema, as it's not clear whether that's a supergraph or a public-api schema.

pub override_labels_evaluator: OverrideLabelsEvaluator,
pub cors_runtime: Option<Cors>,
pub jwt_auth_runtime: Option<JwtAuthRuntime>,
pub usage_agent: Option<Arc<UsageAgent>>,
Member

I think we should pick one path:
1. Call it usage_agent, and then we need to abstract it a bit (like we did with supergraph loading).
2. Call it hive_usage_agent here.

I tend to go with 2 for now

CORSConfig(#[from] Box<CORSConfigError>),
#[error("invalid override labels config: {0}")]
OverrideLabelsCompile(#[from] Box<OverrideLabelsCompileError>),
#[error("error creating usage agent: {0}")]
Member

error creating hive usage agent:

#[serde(deny_unknown_fields)]
pub struct UsageReportingConfig {
/// Your [Registry Access Token](https://the-guild.dev/graphql/hive/docs/management/targets#registry-access-tokens) with write permission.
pub token: String,
Member

I think this should be access_token as this is explicitly how we call it everywhere in Console.


/// Configuration for usage reporting to GraphQL Hive.
#[serde(default)]
pub usage_reporting: Option<usage_reporting::UsageReportingConfig>,
Member

Since we have an enabled field in UsageReportingConfig, this one should not be wrapped with Option.
It should be:

#[serde(default = "usage_reporting::UsageReportingConfig::default")]
    pub usage_reporting: usage_reporting::UsageReportingConfig,

And the impl Default for UsageReportingConfig should be implemented to configure it with enabled: false

Member Author

Actually we don't have an enabled flag in UsageReportingConfig.

Member

Then I think we should. We need to be explicit about these, otherwise we might end up without the ability to enable/disable it via things like env vars.

Member Author

If we introduce an enabled flag, users will need to define enabled: true even if they provide the env variables.

/// 1.0 = 100% chance of being sent
/// Default: 1.0
#[serde(default = "default_sample_rate")]
pub sample_rate: f64,
Member

should this be a more user-friendly value?
I mean, if for durations we are using humantime and allow things like 10s, then why not allow user to write here 10% instead of 0.1?
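
To make the idea concrete, here is a sketch of a parser that accepts both spellings. `parse_sample_rate` is a hypothetical helper, not part of the PR, and how it would be wired into serde deserialization is left open.

```rust
/// Parses a sample rate written either as a percentage ("10%") or as a
/// fraction ("0.1"). Hypothetical helper, not part of the PR.
fn parse_sample_rate(input: &str) -> Result<f64, String> {
    let trimmed = input.trim();
    let rate = match trimmed.strip_suffix('%') {
        // "10%" -> 10.0 -> 0.1
        Some(percent) => percent.trim().parse::<f64>().map(|p| p / 100.0),
        // "0.1" is taken as-is
        None => trimmed.parse::<f64>(),
    }
    .map_err(|err| format!("invalid sample rate {input:?}: {err}"))?;

    // Rejects out-of-range values (and NaN, which is never inside a range).
    if (0.0..=1.0).contains(&rate) {
        Ok(rate)
    } else {
        Err(format!("sample rate {input:?} is outside 0%..=100%"))
    }
}
```

Accepting both forms would keep existing configs with `0.1` working while allowing the friendlier `10%`.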

/// Unit: seconds
/// Default: 15 (s)
#[serde(default = "default_request_timeout")]
pub request_timeout: u64,
Member

This should be humantime, see other plugins for example.
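
In the router this would presumably go through the humantime crate, as in the other plugins mentioned. To stay dependency-free, the sketch below is a hand-rolled stand-in that only illustrates the shape of the accepted values (`15s`, `500ms`, `2m`); it is not humantime's grammar, and `parse_duration` here is a hypothetical local function.

```rust
use std::time::Duration;

/// Simplified stand-in for a humantime-style parser.
/// Supports a single integer followed by `ms`, `s`, or `m`.
fn parse_duration(input: &str) -> Result<Duration, String> {
    let trimmed = input.trim();
    // Index of the first non-digit character, i.e. where the unit starts.
    let split = trimmed
        .find(|c: char| !c.is_ascii_digit())
        .ok_or_else(|| format!("missing unit in {input:?}"))?;
    let (digits, unit) = trimmed.split_at(split);
    let value: u64 = digits
        .parse()
        .map_err(|err| format!("invalid number in {input:?}: {err}"))?;
    match unit {
        "ms" => Ok(Duration::from_millis(value)),
        "s" => Ok(Duration::from_secs(value)),
        "m" => Ok(Duration::from_secs(value.saturating_mul(60))),
        other => Err(format!("unsupported unit {other:?} in {input:?}")),
    }
}
```

The point for the config is that `request_timeout: 15s` carries its unit with it, so the "Unit: seconds" doc comments become unnecessary.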

/// Unit: seconds
/// Default: 5 (s)
#[serde(default = "default_connect_timeout")]
pub connect_timeout: u64,
Member

This should be humantime, see other plugins for example.

/// Frequency of flushing the buffer to the server
/// Default: 5 seconds
#[serde(default = "default_flush_interval")]
pub flush_interval: u64,
Member

This should be humantime, see other plugins for example.

/// Your [Registry Access Token](https://the-guild.dev/graphql/hive/docs/management/targets#registry-access-tokens) with write permission.
pub token: String,
/// A target ID, this can either be a slug following the format “$organizationSlug/$projectSlug/$targetSlug” (e.g “the-guild/graphql-hive/staging”) or an UUID (e.g. “a0f4c605-6541-4350-8cfe-b31f21a4bf80”). To be used when the token is configured with an organization access token.
pub target_id: Option<String>,
Member

nit: this one can be validated during de-serialization as either {string}/{string}/{string} or uuid.
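
A sketch of what such a check might look like, using only the two formats named in the doc comment. The function names are hypothetical, the UUID check is purely syntactic (8-4-4-4-12 hex groups), and hooking it into deserialization is not shown.

```rust
/// Returns true for "org/project/target" slugs with three non-empty parts.
fn is_target_slug(value: &str) -> bool {
    let parts: Vec<&str> = value.split('/').collect();
    parts.len() == 3 && parts.iter().all(|part| !part.is_empty())
}

/// Returns true for the canonical 8-4-4-4-12 hexadecimal UUID layout.
fn is_uuid(value: &str) -> bool {
    let groups: Vec<&str> = value.split('-').collect();
    let expected_lengths = [8, 4, 4, 4, 12];
    groups.len() == expected_lengths.len()
        && groups
            .iter()
            .zip(expected_lengths)
            .all(|(group, len)| group.len() == len && group.chars().all(|c| c.is_ascii_hexdigit()))
}

/// Accepts either format and reports a readable error otherwise.
fn validate_target_id(value: &str) -> Result<(), String> {
    if is_target_slug(value) || is_uuid(value) {
        Ok(())
    } else {
        Err(format!("target_id {value:?} must be \"org/project/target\" or a UUID"))
    }
}
```

Failing at config load time would surface a malformed `target_id` immediately instead of as a rejected usage report later.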

[workspace.dependencies]
graphql-tools = "0.4.0"
graphql-parser = "0.4.1"
graphql-parser = { version = "0.5.0", package = "graphql-parser-hive-fork" }
Member

@dotansimha dotansimha Oct 29, 2025

why use the custom one? we dropped it on purpose.

Member Author

SDK uses this one.

Member

So we need to fix that in SDK. I don't think the SDK has any reason to still use it now.
Router shouldn't use this custom one

@dotansimha
Member

One more thing: I think we might need to expose these by default using env vars. See env_var_override for an example of how to do it. We can align with how it looks in GW.
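
As a rough illustration of the idea only: the sketch below layers environment values over file-based settings. It does not use the repository's `env_var_override` mechanism, whose API is not shown here, and the struct, function, and variable names (`HIVE_ACCESS_TOKEN`, `HIVE_TARGET`) are assumptions rather than the router's or gateway's real ones.

```rust
/// Hypothetical, trimmed-down settings for illustration.
#[derive(Debug, PartialEq)]
struct UsageReportingSettings {
    access_token: Option<String>,
    target_id: Option<String>,
}

/// Applies overrides from an environment lookup on top of file-based settings.
/// The variable names are assumptions, not the router's or gateway's real ones.
fn apply_env_overrides(
    mut settings: UsageReportingSettings,
    lookup: impl Fn(&str) -> Option<String>,
) -> UsageReportingSettings {
    if let Some(token) = lookup("HIVE_ACCESS_TOKEN") {
        settings.access_token = Some(token);
    }
    if let Some(target) = lookup("HIVE_TARGET") {
        settings.target_id = Some(target);
    }
    settings
}
```

In production the lookup would be something like `|name| std::env::var(name).ok()`; taking it as a parameter keeps the function testable without mutating the process environment.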


3 participants