Skip to content

Conversation

@badboy
Copy link
Member

@badboy badboy commented Nov 4, 2025

BigQuery's integer type is a signed 64-bit value, while we collect unsigned ones.
Timestamps are in milliseconds, so 9223372036854775807 (i64::MAX) ms is about 4.5 billion years, or roughly 1/15 of the age of earth. I think we should be fine with that for a little while longer.

So any such value should not happen. But it does in 0.002% of pings and thus data ends up in additional_properties instead of where it belongs.
By clamping we might reduce that %age further. An event with that timestamp is most likely still not useful, but maybe the others are?

One more problem is: Should we report this as an issue? If so how? Right now we don't have the APIs to tie that back to a specific event other then the one we own. But then our error types don't express the right things.


this is a fun one, so I'm putting it up for discussion.

@badboy badboy requested a review from chutten November 4, 2025 16:02
@badboy badboy force-pushed the push-pqvtrrwxlszv branch 2 times, most recently from 5a6c0b6 to 6f0ca70 Compare November 4, 2025 16:04
@chutten
Copy link
Contributor

chutten commented Nov 4, 2025

To me enforcing a limit like this is analogous to the 1MB ping limit. It's us acting in concert with the rest of the ecosystem.

For error reporting, I don't like tying this to glean.restarted. I also think that tying it to the event (well, a synthetic copy we build from its record's category and name) could be incorrect because it's not an error we expect folks to be able to handle. I'd put it on some other metric collected into the "events" and "health" pings.

@badboy
Copy link
Member Author

badboy commented Nov 5, 2025

To me enforcing a limit like this is analogous to the 1MB ping limit. It's us acting in concert with the rest of the ecosystem.

For error reporting, I don't like tying this to glean.restarted. I also think that tying it to the event (well, a synthetic copy we build from its record's category and name) could be incorrect because it's not an error we expect folks to be able to handle. I'd put it on some other metric collected into the "events" and "health" pings.

Hm, yeah, I guess I can make up a metric.

@badboy badboy changed the title Clamp event timestamps to i64::MAX Bug 1873482 - Clamp event timestamps to i64::MAX Nov 7, 2025
@badboy badboy force-pushed the push-pqvtrrwxlszv branch from 6f0ca70 to ef9bac4 Compare November 7, 2025 11:18
BigQuery's integer type is a signed 64-bit value, while we collect
unsigned ones.
Timestamps are in milliseconds, so 9223372036854775807 (i64::MAX) ms is
about 4.5 billion years, or roughly 1/15 of the age of earth.
I think we should be fine with that for a little while longer.

So any such value _should_ not happen. But it does in 0.002% of pings
and thus data ends up in `additional_properties` instead of where it
belongs.
By clamping we might reduce that %age further. An event with that
timestamp is most likely still not useful, but maybe the others are?

One more problem is: Should we report this as an issue? If so how? Right
now we don't have the APIs to tie that back to a specific event other
then the one we own. But then our error types don't express the right
things.
@badboy badboy force-pushed the push-pqvtrrwxlszv branch from ef9bac4 to f403ed5 Compare November 7, 2025 11:19
@badboy badboy marked this pull request as ready for review November 7, 2025 11:19
@badboy badboy requested a review from a team as a code owner November 7, 2025 11:19
@badboy badboy added the needs:data-review PR is awaiting a data review label Nov 7, 2025
@badboy
Copy link
Member Author

badboy commented Nov 7, 2025

ERROR: METRIC_ON_EVENTS_LIFETIME: glean.error.event_timestamp_clamped: Non-event metrics sent on the Events ping should not have the ping lifetime.

Oh of course. I put that metric on the events ping. But it will only land on the next events ping after the ping with clamped timestamps, because we already assembled the metrics for that ping. Plus this might happen on events on another ping than the events ping.
So we can either (1) no_lint it, (2) move the error to health+metrics ping, (3) extend the code to report it in the same ping as the event we're doing this for, (3b) ensure that error metric goes onto the same payload

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs:data-review PR is awaiting a data review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants