Some improvements for Production deployments #71

Don't log 4xx responses as errors. Either log them at info level or not at all: they are client errors, not errors from our perspective.
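A minimal sketch of what this could look like, assuming a Python service using the standard `logging` module (the issue doesn't say which language or framework is in use). The names `log_level_for_status` and `log_response` are made up for illustration.

```python
import logging

logger = logging.getLogger("http")


def log_level_for_status(status: int) -> int:
    """Pick a log level from an HTTP status code."""
    if status >= 500:
        return logging.ERROR  # server-side failure: our problem
    if status >= 400:
        return logging.INFO   # client error: not an error from our side
    return logging.DEBUG      # success and redirects


def log_response(method: str, path: str, status: int) -> None:
    """Log one finished request at the level matching its status code."""
    logger.log(log_level_for_status(status), "%s %s -> %d", method, path, status)
```

If we'd rather not log 4xx at all, the same function could return `logging.DEBUG` for them and production would just run at info level.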

Don't expose internal errors to the client (but make sure they are properly logged). They are not helpful for users and might leak internal information. If you want traceability, you could generate a random ID, return that to the user instead, and also log it so we can grep the logs for a specific failed request.
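Roughly what I have in mind for the random ID, again as a Python sketch with only the standard library. `internal_error_response` and the response body fields are hypothetical, not the current API.

```python
import logging
import uuid

logger = logging.getLogger("api")


def internal_error_response(exc: Exception) -> tuple:
    """Log the real error with a random ID, return only the ID to the client."""
    error_id = uuid.uuid4().hex  # random 32-character hex string
    # Full details (message and traceback, if any) go to the log only.
    logger.error("internal error id=%s", error_id, exc_info=exc)
    # The client gets a generic message plus the ID to quote in a bug report.
    body = {"error": "Internal server error", "error_id": error_id}
    return 500, body
```

A user reporting `error_id` then lets us run something like `grep <id>` on the logs and find the exact failure.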

Count failed ClickHouse queries as a Prometheus metric so that we can easily add alerts for database issues (for now this is probably equivalent to counting all 500 errors, but the two might diverge in the future).
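The counting pattern could be sketched like this. To keep the example self-contained, `FailureCounter` is a stand-in with the same `inc()` shape as a Prometheus counter; in the real service it would be a counter from whatever Prometheus client library we use. All names here are hypothetical.

```python
from contextlib import contextmanager


class FailureCounter:
    """Stand-in for a Prometheus counter (assumption: real one exposes inc())."""

    def __init__(self) -> None:
        self.value = 0

    def inc(self) -> None:
        self.value += 1


clickhouse_query_failures = FailureCounter()


@contextmanager
def count_failures(counter):
    """Increment the counter if the wrapped block raises, then re-raise."""
    try:
        yield
    except Exception:
        counter.inc()
        raise
```

Wrapping only the database call (`with count_failures(clickhouse_query_failures): ...`) keeps this metric separate from generic 500s, so the two can diverge later without code changes.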

Use a Prometheus histogram to bucket query times instead of a counter. That way we can monitor query time percentiles, which is more useful than a global average of query times.
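To illustrate why buckets help, here is a small stdlib-only Python sketch of cumulative buckets and a rough percentile estimate. The bucket bounds are made up, and `estimate_quantile` is a simplification: it returns the upper bound of the matching bucket, whereas Prometheus's `histogram_quantile` interpolates within the bucket.

```python
import bisect

# Upper bounds in seconds (assumed values, tune to real query times).
BUCKETS = [0.05, 0.1, 0.25, 0.5, 1.0, 2.5, float("inf")]


def bucket_counts(durations):
    """Cumulative counts: counts[i] = number of observations <= BUCKETS[i]."""
    counts = [0] * len(BUCKETS)
    for d in durations:
        first = bisect.bisect_left(BUCKETS, d)  # first bucket with bound >= d
        for i in range(first, len(BUCKETS)):
            counts[i] += 1
    return counts


def estimate_quantile(q, counts):
    """Upper bound of the first bucket that contains the q-quantile."""
    rank = q * counts[-1]
    for bound, count in zip(BUCKETS, counts):
        if count >= rank:
            return bound


# 95 fast queries and 5 slow ones.
durations = [0.02] * 95 + [2.0] * 5
counts = bucket_counts(durations)
mean = sum(durations) / len(durations)
p50 = estimate_quantile(0.5, counts)
p99 = estimate_quantile(0.99, counts)
```

With this data the mean is about 0.12 s, which hides the slow tail; the bucketed view shows the median in the fastest bucket while the 99th percentile lands in the 2.5 s bucket.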
