Skip to content

Conversation

@RobertJoonas
Copy link
Contributor

Changes

  • Use Stats.query instead of the legacy Stats.breakdown for fetching pages
  • Move metric labelling logic into the query API itself to keep the LiveView code dumb about that. This works with a query.include.dashboard_metric_labels flag that optionally writes a meta.metric_labels map into the response.
  • Consider user preferences in DashboardQueryParser

About user preferences

My initial idea was that an empty URL getting parsed into ParsedQueryParams, and then serialized back into a query string shouldn’t add stuff into the URL due to defaults or localStorage preferences. E.g.:

  1. user loads dashboard with empty query string
  2. DashboardQueryParser defaults input_date_range to {:last_n_days, 28} (or another valid period from localStorage)
  3. In the Top Pages report, every page entry will be a filterLink serializing ParsedQueryParams into a new, updated query string with the filter added. Clicking it will take the user to ?period=28d&f=… (period=28d param appears in the URL out of the blue - shouldn't happen).

In addition to the period param, there are two more query params that would get parsed into the "global" dashboard query state: comparison and match_day_of_week. So basically, if the user has the following preferences saved in localStorage:

  • comparison=previous_period
  • match_day_of_week=false
  • period=year

Then DashboardQueryParser.parse("") |> DashboardQuerySerializer.serialize() might even result in ?comparison=previous_period&match_day_of_week=false&period=year.

I wanted to avoid this, but I'm now leaning more towards accepting it because:

  1. (most importantly) although stuff will appear in the URL, it doesn't disrupt anyone's dashboard experience as long as the URL params are handled equivalently between React and LiveView.
  2. Parsing the global query and applying defaults happens in a single place. Otherwise, e.g. date-picker and pages would both have to apply user preferences on their own in order to know what's the real dashboard period.
  3. We would need to allow compare_match_day_of_week to be nil (instead of enforcing boolean) in query.include

Tests

  • Automated tests have been added

Changelog

  • This PR does not make a user-facing change

Documentation

  • This change does not need a documentation update

Dark mode

  • This PR does not change the UI

@RobertJoonas RobertJoonas requested a review from apata December 29, 2025 17:17
compare_match_day_of_week: false,
legacy_time_on_page_cutoff: nil
legacy_time_on_page_cutoff: nil,
dashboard_metric_labels: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion, non-blocking: Since metrics are ordered in the response and the dashboard knows what it requested (choose_metrics), I think we should provide / map the labels there. Then this struct will only contain fields that influence the actual database query.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anything wrong with keeping logic unrelated to the ClickHouse SQL query in Stats.query. There's also include.time_labels which has nothing to do with the DB query either.

However, I agree that the mapping part is unnecessary. I think you reviewed before I pushed this commit: 9f7695b. I've changed meta.metric_labels to be a returned as a list instead of map. That way the metric labels can just be used directly, like metric keys.

There's another edge case that will be a bit easier to handle this way. Revenue metrics are sometimes dropped silently (when mixing currencies in goal filters). So even though a LiveView report always asks for them, they might not be returned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it's actually weird that time labels don't go through the database. It's very database adjacent though.

I see your point about it making an edge case easier to handle, I disagree that it's worth the tradeoff (of making Query and its result more complex).

I suspect we'll be tempted to put a lot of configuration in QueryInclude. I think we should do so only for things that can't be easily handled another way. Metric labels can be. Still, it's not a blocker for merge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't say this adds much complexity into the query function. A bit more code and a new input/response field, sure, but it doesn't make it more difficult to reason about.

But I see your concern because the %Query{} struct is quite messy and it's quite terrifying to look at it. It needs some cleaning up for sure, and it will happen in this project. There are some legacy fields that are not used anymore (interval, sample_threshold, include_imported, legacy_breakdown, maybe more...). And we also have a bunch of stuff there that gets slapped onto the query object somewhere along the pipeline for internal use only (site_id, site_native_stats_start_at, consolidated_site_ids, sql_join_type, etc...). All of those fields could be nested under a query.assigns field.

Copy link
Contributor

@apata apata Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit more code and a new input/response field, sure, but it doesn't make it more difficult to reason about.

Whether something is easy or hard to reason about is quite hard to discuss.
I'd prefer we be specific.
Complexity is a bit better, though we can be even more specific (more/less code, more/less code paths, more/less fields).
If we add a feature/refactor, I suspect the question is not whether we want more complexity, it's where we want it.

My argument is that we don't want it in QueryInclude, here's why:

What's the responsibility of the Query module? As I see it, it's to represent a request for data, and must contain exactly what is needed to get that data from the database. QueryInclude is a set of options regarding that request for data. Much like filters, dimensions, metrics.

If we keep to this responsibility, we can say that as long as one doesn't touch Query and QueryInclude, there's no risk of regression regarding requests for data. This is the benefit that I'm trying to sell.

QueryResult serializes the data. Let's say its responsibility is also sometimes to get human readable names for the metrics queried (which IMO is debatable, maybe it should be a separate step before QueryResult). We don't have to pass this option through Query.include. It can be an argument of the function QueryResult.from.

Regarding cleanup, I'm not sure that assigns is the right way to go: that's a bit of an opaque misc drawer. The struct fields are at least explicit. I look at them this way: "these are the fields necessary to get data, they get populated somewhere, maybe, and when they do, they influence the result".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your point about separating responsibilities but I'm not seeing how this new include field could cause us any pain in the future. The benefit of not having to worry about metric mapping in every LiveView stats report module sounds far more real. That said though, the query refactoring work is still ahead and we can consider it further down the line.

Regarding cleanup, I'm not sure that assigns is the right way to go: that's a bit of an opaque misc drawer. The struct fields are at least explicit. I look at them this way: "these are the fields necessary to get data, they get populated somewhere, maybe, and when they do, they influence the result".

It could also be a nested struct with the fields explicitly defined with default values. The point would be to separate the input fields from those that are attached to the query automatically.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could also be a nested struct with the fields explicitly defined with default values. The point would be to separate the input fields from those that are attached to the query automatically.

@RobertJoonas, that overall point is very good!

I understand your point about separating responsibilities but I'm not seeing how this new include field could cause us any pain in the future.

Thanks for considering it! This one field is definitely innocent and good code. It just got me thinking on why we need to do it this way. I believe by doing it a little bit differently, we can have the benefits that both of us brought up (shared code between LiveView stats reports, more singular responsibility for QueryInclude). To illustrate what I had in mind, I put together this #5983.

@RobertJoonas RobertJoonas added this pull request to the merge queue Jan 5, 2026
Merged via the queue into master with commit bf1971e Jan 5, 2026
16 checks passed
@RobertJoonas RobertJoonas deleted the pages-live-cont branch January 5, 2026 09:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants