Description
What would you like to be added:
Allow stateset metric-sets for CR metrics to optionally omit 0-valued series for inactive states, or add a new custom metric type that maps a payload field to a label and emits a constant value.
(Ab)using info typed metrics for this purpose is inappropriate and misleading. If the metric has the _info suffix, it misleads the reader into thinking the series should be constant for the lifetime of the resource when it won't be. If the suffix is omitted, it produces unwanted warnings about the metric name lacking the info suffix and could produce misleading type metadata.
The other existing alternative is to (ab)use a gauge metric by pointing its gauge.path at an always-present field, like [metadata, creationTimestamp], and adding labelsFromPath to expose the desired field as a label. This works, but the series value is something other than 1, which may be confusing. It also prevents such metrics from interoperating with queries that consume normal state-sets and filter for == 1 to find the active state.
A possible future alternative might be to use Prometheus's upcoming scrape-rules (if the feature is merged) from prometheus/prometheus#10529 to drop all series with value 0 before TSDB ingestion. But it's unclear if/when this feature may become available.
Why is this needed:
Greatly reduce series cardinality where the set of possible states is wide, by only emitting the 1-valued active state when exposing state-sets.
If I have 1000 resources with 20 possible states each, kube-state-metrics will emit 20,000 series for them. Of these, 19,000 are useless, and serve only to work around the difficulty of reliably querying for the most recent state in a related set of series with PromQL.
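The arithmetic above can be checked with a short sketch. The function name stateset_series is hypothetical; it only models the series count and is not part of kube-state-metrics.

```python
def stateset_series(resources: int, states: int, active_only: bool = False) -> int:
    """Number of series emitted for one state-set metric across all resources."""
    # With padding, every resource emits one series per possible state.
    # Without padding, only the single active (1-valued) state is emitted.
    return resources * (1 if active_only else states)


padded = stateset_series(1000, 20)
active = stateset_series(1000, 20, active_only=True)
print(padded, active, padded - active)  # 20000 1000 19000
```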
In prometheus/prometheus#17129 I've been working on adding a reliable means of querying a state-set that will work whether or not it has 0-valued placeholders for inactive states, and that will work correctly even across restarts of kube-state-metrics. If it proceeds, this will eliminate the need for such padding.
Even if this improvement doesn't end up in Prometheus proper, it is possible to query the most recent state, albeit with some chance of occasional duplicate states returned during state transitions, using last_over_time(series{}[$short_interval]) where $short_interval is generally between 1x and 2x the scrape interval of the target. This is a bit cumbersome because the querying application must embed its knowledge of the scrape interval, and it doesn't work as well for e.g. OTLP pushed series, but it's good enough for most cases most of the time, especially where the alternative is emitting tens or hundreds of thousands of unwanted series.
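To illustrate why a short window can briefly return two states during a transition, here is a toy model of the window selection. It is only a sketch: the function states_in_window and the sample data are hypothetical, the window is assumed to be left-open and right-closed, and staleness handling is ignored.

```python
from typing import Iterable, List, Tuple

Sample = Tuple[str, float]  # (phase label value, scrape timestamp in seconds)


def states_in_window(samples: Iterable[Sample], now: float, window: float) -> List[str]:
    """Return the phases that have at least one sample in (now - window, now]."""
    return sorted({phase for phase, ts in samples if now - window < ts <= now})


# 15s scrape interval; the resource moves from Pending to Bar before t=30.
samples = [("Pending", 0.0), ("Pending", 15.0), ("Bar", 30.0), ("Bar", 45.0)]

during_transition = states_in_window(samples, now=31.0, window=20.0)
after_transition = states_in_window(samples, now=46.0, window=20.0)
print(during_transition)  # ['Bar', 'Pending']
print(after_transition)   # ['Bar']
```

Just after the transition, the window still contains the last Pending sample, so both states are returned until that sample ages out.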
Describe the solution you'd like:
Only emit the 1-valued active state for CR metrics where the stateset has the new option emitActiveOnly: true.
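As a rough sketch of the intended behaviour, the following models how a state-set could be expanded with and without the proposed option. The function expand_stateset, its emit_active_only parameter, and the state name "Baz" are illustrative assumptions, not kube-state-metrics code.

```python
from typing import Dict, List


def expand_stateset(label_name: str, current: str, states: List[str],
                    emit_active_only: bool = False) -> Dict[str, int]:
    """Map each emitted label pair to its sample value (1 = active, 0 = inactive)."""
    series: Dict[str, int] = {}
    for state in states:
        value = 1 if state == current else 0
        if emit_active_only and value == 0:
            continue  # skip the 0-valued placeholder for an inactive state
        series[f'{label_name}="{state}"'] = value
    return series


states = ["Pending", "Bar", "Baz"]
padded_series = expand_stateset("phase", "Pending", states)
active_series = expand_stateset("phase", "Pending", states, emit_active_only=True)
print(padded_series)  # {'phase="Pending"': 1, 'phase="Bar"': 0, 'phase="Baz"': 0}
print(active_series)  # {'phase="Pending"': 1}
```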
or
Add support for a new custom series type that carries its value in a label and has a constant value, but unlike info does not enforce any naming requirements, and unlike gauge does not require the presence of a numeric value to expose. It should probably also declare its data type as UNTYPED in scrapes. This could be used to expose a .state or .phase that may not even have a known set of possible states at all.
Say it gets called type: Constant, it could be configured something like this:
# ...
metrics:
  - name: "status_phase"
    help: "Foo status_phase"
    each:
      type: Constant
      constant:
        value: 1
        labelsFromPath:
          # warning: it's important for .status.phase to not constantly change to new distinct values
          # and to have a reasonably short value length, or excessive memory and storage will be
          # consumed by monitoring systems like Prometheus.
          phase: [status, phase]
This would then emit series like
# TYPE kube_customresource_status_phase UNTYPED
kube_customresource_status_phase{customresource_group="myteam.io", customresource_kind="Foo", customresource_version="v1", customresource_name="res", phase="Pending"} 1
(without the 0-value series seen in the stateset example)
then in a subsequent scrape after a state change it might emit
# TYPE kube_customresource_status_phase UNTYPED
kube_customresource_status_phase{customresource_group="myteam.io", customresource_kind="Foo", customresource_version="v1", customresource_name="res", phase="Bar"} 1
again without 0-valued states.
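A minimal sketch of how such a type might turn a resource into a single series is shown below. The helpers get_path and render_constant are hypothetical, the group, kind and version labels are omitted for brevity, and label-value escaping is not handled. Emitting nothing when the field is absent is an assumption, not something the proposal specifies.

```python
from typing import Any, Dict, List, Optional


def get_path(obj: Any, path: List[str]) -> Optional[Any]:
    """Walk a nested dict along path; return None if any key is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def render_constant(name: str, resource: Dict[str, Any],
                    labels_from_path: Dict[str, List[str]],
                    value: int = 1) -> Optional[str]:
    """Render one exposition line with a constant value and labels read from the resource."""
    labels: Dict[str, str] = {}
    for label, path in labels_from_path.items():
        found = get_path(resource, path)
        if found is None:
            return None  # assumption: emit no series when a labelled field is absent
        labels[label] = str(found)
    body = ", ".join(f'{k}="{v}"' for k, v in labels.items())
    return f"{name}{{{body}}} {value}"


label_paths = {"customresource_name": ["metadata", "name"], "phase": ["status", "phase"]}
resource = {"metadata": {"name": "res"}, "status": {"phase": "Pending"}}
line = render_constant("kube_customresource_status_phase", resource, label_paths)
print(line)  # kube_customresource_status_phase{customresource_name="res", phase="Pending"} 1
```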
Assuming a 15s scrape interval this would be queried on current Prometheus with:
group by (customresource_name, phase) (
  last_over_time(kube_customresource_status_phase{customresource_group="myteam.io", customresource_kind="Foo", customresource_version="v1"}[15s])
)
or if the scrape interval is not known to the querying entity, but the query author can be certain there are no 0-valued placeholders emitted for this series:
group by (customresource_name, phase) (
  topk(1, timestamp(kube_customresource_status_phase{customresource_group="myteam.io", customresource_kind="Foo", customresource_version="v1"})) without (phase)
)
either of which will, after the 2nd sample above has been ingested, yield only the following (aggregation drops the metric name):
{customresource_name="res", phase="Bar"} 1
and omit phase="Pending".
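The selection step that the second query performs, keeping the sample with the newest timestamp per resource regardless of phase, can be modelled roughly as follows. The function latest_phase and the sample tuples are hypothetical, and the sketch ignores everything else PromQL does when evaluating the query.

```python
from typing import Dict, Iterable, Tuple

Sample = Tuple[str, str, float]  # (customresource_name, phase, timestamp in seconds)


def latest_phase(samples: Iterable[Sample]) -> Dict[str, str]:
    """Keep, per resource, the phase carried by the sample with the newest timestamp."""
    newest: Dict[str, Tuple[float, str]] = {}
    for name, phase, ts in samples:
        if name not in newest or ts > newest[name][0]:
            newest[name] = (ts, phase)
    return {name: phase for name, (_, phase) in newest.items()}


observed = [("res", "Pending", 0.0), ("res", "Bar", 15.0)]
result = latest_phase(observed)
print(result)  # {'res': 'Bar'}
```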
Additional context:
- Add a latest(...) aggregation function to keep the matching sample with the most recent timestamp [abandoned, with test case and ugly workaround] prometheus/prometheus#17129
- https://cloud-native.slack.com/archives/C01AUBA4PFE/p1757023802535249
- Possible alternative: use Prometheus's upcoming scrape-rules (if the feature is merged) from prometheus/prometheus#10529 ("Implement scrape-time rule evaluation") to drop all series with value 0 before TSDB ingestion.
I'm potentially interested in cooking up a patch for this. I lean toward the type: Constant approach at the moment.