Conversation

@nikhars
Contributor

@nikhars nikhars commented Nov 20, 2025

Implemented the basic version of the relay project config handler, which splits the project keys across multiple cells. Right now it does a round-robin split; this will change to a lookup into the locator later.

@nikhars nikhars requested a review from a team as a code owner November 20, 2025 19:52

Response::builder()
    .status(StatusCode::OK)
    .header(CONTENT_TYPE, "application/json")
Member

Do we need to copy any headers from the responses we receive from relay?

Contributor Author

Added code to handle the response headers. They are now copied from the first upstream that returns a successful response.

Comment on lines 321 to 322
#[serde(flatten)]
pub extra: HashMap<String, JsonValue>,
Member

I see no extra anywhere in this file

https://github.com/getsentry/sentry/blob/master/src/sentry/api/endpoints/relay/project_configs.py

Am I looking in the wrong place?

Contributor Author

The purpose of extra is to stay forward compatible with the protocol. Any field that the ingest-router does not care about (i.e., fields that should be treated as opaque) is stashed into extra. It's annotated with serde(flatten), so it is handled appropriately during both serialization and deserialization.
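For illustration, a minimal sketch of how the flattened map round-trips unknown fields; the ProjectConfig struct and its projectId field here are hypothetical, only the extra field mirrors the code under review:

use serde::{Deserialize, Serialize};
use serde_json::Value as JsonValue;
use std::collections::HashMap;

#[derive(Serialize, Deserialize)]
struct ProjectConfig {
    // Fields the ingest-router actually inspects would be listed here.
    #[serde(rename = "projectId")]
    project_id: u64,
    // Anything the router does not know about is captured on deserialization
    // and written back out verbatim on serialization.
    #[serde(flatten)]
    extra: HashMap<String, JsonValue>,
}

fn main() -> serde_json::Result<()> {
    // A payload containing a field the router has never heard of.
    let input = r#"{"projectId": 42, "newUpstreamField": {"nested": true}}"#;
    let config: ProjectConfig = serde_json::from_str(input)?;
    // The unknown field survives the round trip untouched.
    let output = serde_json::to_string(&config)?;
    assert!(output.contains("newUpstreamField"));
    Ok(())
}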

///
/// See module-level docs for complete protocol details, implementation strategy,
/// and request/response flow diagrams.
pub struct RelayProjectConfigsHandler {
Member

Since there will be a number of handlers that follow this pattern, could this be a trait to enforce that they all follow similar patterns?

e.g. something like

trait Handler<Req: Serialize + DeserializeOwned, Res: Serialize + DeserializeOwned> {
    fn split_requests(request: Req) -> Vec<Req>;
    fn merge_results(results: Vec<Res>) -> Res;
}

where Req and Res can be anything that implements Serialize + Deserialize. I think this forces the implementation of each endpoint to conform to some basic patterns and look the same, which is nice for readability. It could also enable more of the common parts - spawning multiple requests and waiting for them, timeouts, error handling/backpressure - to be done in a central place outside of the specific handler, as sketched below.
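To make that concrete, a rough sketch of what the shared driver could look like; the fan_out function and the Cell and Error types are hypothetical and not part of this PR, and the Handler trait just repeats the shape sketched above:

use serde::{de::DeserializeOwned, Serialize};

// Hypothetical placeholder types for the sketch; not part of this PR.
struct Cell;
struct Error;

trait Handler<Req: Serialize + DeserializeOwned, Res: Serialize + DeserializeOwned> {
    fn split_requests(request: Req) -> Vec<Req>;
    fn merge_results(results: Vec<Res>) -> Res;
}

// The common machinery (spawning one request per cell, timeouts, error
// handling) would then live once in a generic driver instead of in every handler.
async fn fan_out<Req, Res, H>(request: Req, _cells: &[Cell]) -> Result<Res, Error>
where
    Req: Serialize + DeserializeOwned,
    Res: Serialize + DeserializeOwned,
    H: Handler<Req, Res>,
{
    let sub_requests = H::split_requests(request);
    let mut responses: Vec<Res> = Vec::new();
    for _sub_request in sub_requests {
        // Send the sub-request to its cell, wait with the shared timeout
        // policy, and push the decoded response (elided in this sketch).
    }
    Ok(H::merge_results(responses))
}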

Contributor Author

I do not know of more use cases where this may be needed right now. If we see that use case pop up, then I can make this a generic trait and have multiple implementations. But I would rather do it when I see the need for it.

Member

@lynnagara lynnagara Nov 25, 2025

The other parts of the handler code look like they are already built with the assumption that there would be many handlers, though.

e.g. https://github.com/getsentry/synapse/blob/816dfb3b85e06b1ad1e2a2ffaa5479e615619950/ingest-router/src/config.rs#L50C10-L53 -- wouldn't there be many members on this enum, each of which would map 1:1 with a separate handler?

As for other use cases -- there is this list of endpoints in getsentry, each of which I would assume would be a separate handler implementation here:
https://github.com/getsentry/sentry/blob/c3e99bba9310b1b8385ac04dae75baaaadfe61fe/src/sentry/api/urls.py#L1087-L1128

Motivation for this: you can see in the diff of this PR the complexity of what you have implemented in the relay_project_config_handler module, as well as all the conversations here around timeout handling and the like. It seems easy to get wrong, and I don't think we'd want people to re-implement it for every endpoint that needs to be added to synapse now and in the future.

    mut handle,
    public_keys,
} = task_with_keys;
let task_result = timeout(Duration::from_secs(30), &mut handle).await;
Member

I sort of wonder if we need more than 1 timeout value for various scenarios. The specific cases I'm thinking about are:

  • I think a really long value like 30 secs is fine if we have no responses from any cell -- without any data, relay is making no progress anyway, so it's not any worse to wait longer for a response
  • If one cell returns in, say, 200 milliseconds, and the other cell is down, waiting for the full 30 seconds for the other cell may be a serious problem. Since the requests are chunked, this becomes a very large number of 30-second waits back to back. If we returned sooner, the cell that was functional would be able to make progress much sooner. I'm worried that such a long wait for an unresponsive cell may risk effectively taking down all other cells as well.

Contributor Author

I have added the capability to do an adaptive timeout. By default, the slower tasks get 5 more seconds than the fastest task to complete their work. That said, I would imagine that when we roll this out in production we would need a higher value, since we want to avoid the scenario of an empty cell always returning first and never letting the existing cells respond.
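A minimal sketch of that adaptive behaviour, assuming the per-cell tasks have already been spawned; the OVERALL and GRACE constants are illustrative values, not the actual configuration, and note that this still waits in cell order, which is the concern raised further down:

use std::time::{Duration, Instant};
use tokio::time::timeout;

const OVERALL: Duration = Duration::from_secs(30); // budget until the first task completes
const GRACE: Duration = Duration::from_secs(5);    // extra time granted to the slower tasks

async fn collect(handles: Vec<tokio::task::JoinHandle<String>>) -> Vec<String> {
    let mut results = Vec::new();
    let mut deadline: Option<Instant> = None;

    for handle in handles {
        // Until the first task completes, allow the full overall timeout;
        // afterwards, only wait until the adaptive deadline.
        let budget = match deadline {
            None => OVERALL,
            Some(d) => d.saturating_duration_since(Instant::now()),
        };
        if let Ok(Ok(value)) = timeout(budget, handle).await {
            results.push(value);
            // The first completed task establishes the deadline for the rest.
            deadline.get_or_insert_with(|| Instant::now() + GRACE);
        }
    }
    results
}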

Member

I sort of wonder if we need more than 1 timeout value for various scenarios.

Having multiple configuration points for different timeouts seems wise. Waiting here would be sequential, which isn't ideal. Perhaps we could combine a JoinSet, to wait on the task futures concurrently, with a timeout, so that we get as many responses as possible within the 30s budget?

Contributor Author

I have a naive implementation right now. Working on making it robust.

/// Failed keys are added to the pending array for retry.
///
/// Global config is selected from the highest priority cell based on cells.cell_list order.
async fn collect_and_merge(&self, tasks: Vec<TaskWithKeys>, cells: &Cells) -> MergedResults {
Member

If I'm reading correctly, doesn't this mean that we are waiting for each task and applying the modified timeout in the order that cells are defined?

  • So if cell 1 returns quickly and cell 2 is down -> we correctly apply the shorter timeout to the second request and return to relay faster.
  • If cell 2 returns quickly and cell 1 does not -> we still wait the full 30 seconds (the longer timeout) for cell 1, even though there was a cell 2 response much sooner.

let now = Instant::now();
if now >= deadline {
    // Deadline already passed, use minimal timeout
    Duration::from_millis(1)
Member

Is there something special about 1 millisecond here? Curious why this isn't Duration::ZERO.

})?;

// Split publicKeys across upstreams
let split_requests = self.split_keys_by_upstream(&request_data, cells);
Member

@lynnagara lynnagara Nov 25, 2025

Since it's not needed afterward, you could pass the owned request_data into this function and save yourself a clone.
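i.e., roughly this shape; RequestData, SplitRequest, and Cells are stand-ins for whatever the real types are, and the handler is reduced to a unit struct for the sketch:

struct RequestData;
struct SplitRequest;
struct Cells;
struct RelayProjectConfigsHandler;

impl RelayProjectConfigsHandler {
    // Taking request_data by value moves it into the function, so the caller
    // no longer needs a .clone() once it stops using the value afterwards.
    fn split_keys_by_upstream(&self, request_data: RequestData, cells: &Cells) -> Vec<SplitRequest> {
        let _ = (request_data, cells);
        Vec::new()
    }
}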

Comment on lines 451 to 453
for (name, value) in base_request.headers() {
    req_builder = req_builder.header(name, value);
}
Member

Were you planning to include any of the header transformations that the proxy does here? E.g. adding the Via header and stripping hop-by-hop headers.
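For reference, a hedged sketch of what those transformations usually look like with the http crate; the hop-by-hop set is the RFC 7230 list, the Via value is illustrative, and headers named in the incoming Connection header would also need to be dropped, which is elided here:

use http::header::{
    HeaderMap, HeaderName, HeaderValue, CONNECTION, PROXY_AUTHENTICATE, PROXY_AUTHORIZATION,
    TE, TRAILER, TRANSFER_ENCODING, UPGRADE, VIA,
};

// Sketch only: strip the RFC 7230 hop-by-hop headers and append a Via entry
// before forwarding a request upstream.
fn forward_headers(incoming: &HeaderMap) -> HeaderMap {
    let hop_by_hop = [
        CONNECTION,
        HeaderName::from_static("keep-alive"),
        PROXY_AUTHENTICATE,
        PROXY_AUTHORIZATION,
        TE,
        TRAILER,
        TRANSFER_ENCODING,
        UPGRADE,
    ];

    let mut out = incoming.clone();
    for name in &hop_by_hop {
        out.remove(name);
    }
    // The protocol version and pseudonym in the Via value are illustrative.
    out.append(VIA, HeaderValue::from_static("1.1 ingest-router"));
    out
}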

let mut headers_by_cell: HashMap<String, HeaderMap> = HashMap::new();
let mut additional_deadline: Option<Instant> = None;

for task_with_keys in tasks {
Member

How about using a tokio::JoinSet here?

It allows you to easily get the first result within the global timeout, regardless of the cell order, e.g.

let first = timeout(global_timeout, task_set.join_next()).await;

Then you can loop through and keep calling join_next with the smaller timeout

let next = timeout(reduced_timeout, task_set.join_next()).await;

This way the tasks return in the order they are completed
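Putting those two lines together, a sketch of the full drain loop; the function and variable names are illustrative, and error cases such as panicked tasks are simply skipped here:

use std::time::Duration;
use tokio::task::JoinSet;
use tokio::time::timeout;

// Drain a JoinSet in completion order: the first result gets the full global
// timeout, and every subsequent result only gets the reduced timeout.
async fn collect_in_completion_order(
    mut task_set: JoinSet<String>,
    global_timeout: Duration,
    reduced_timeout: Duration,
) -> Vec<String> {
    let mut results = Vec::new();

    // Wait up to the global timeout for whichever task finishes first.
    if let Ok(Some(Ok(first))) = timeout(global_timeout, task_set.join_next()).await {
        results.push(first);
    }

    // The remaining tasks share the smaller per-result timeout; they are still
    // returned in whatever order they actually complete, not cell order.
    while let Ok(Some(Ok(next))) = timeout(reduced_timeout, task_set.join_next()).await {
        results.push(next);
    }
    results
}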
