Commit cc1b470

feat: prometheus redis detectors
1 parent 4d5fed6 commit cc1b470

21 files changed: +994 -1 lines changed

docs/severity.md

Lines changed: 12 additions & 1 deletion
@@ -87,6 +87,7 @@
 - [prometheus-exporter_kong](#prometheus-exporter_kong)
 - [prometheus-exporter_oracledb](#prometheus-exporter_oracledb)
 - [prometheus-exporter_postfix](#prometheus-exporter_postfix)
+- [prometheus-exporter_redis](#prometheus-exporter_redis)
 - [prometheus-exporter_squid](#prometheus-exporter_squid)
 - [prometheus-exporter_varnish](#prometheus-exporter_varnish)
 - [prometheus-exporter_wallix-bastion](#prometheus-exporter_wallix-bastion)
@@ -139,7 +140,6 @@
 |AWS CWAgent heartbeat|X|-|-|-|-|
 |AWS CWAgent memory used|X|X|-|-|-|
 |AWS CWAgent disk used|X|X|-|-|-|
-|AWS CWAgent cpu usage active|X|X|-|-|-|
 
 
 ## fame_azure-automation-updates
@@ -951,6 +951,17 @@
 |Postfix size postfix delivery delay|X|X|-|-|-|
 
 
+## prometheus-exporter_redis
+
+|Detector|Critical|Major|Minor|Warning|Info|
+|---|---|---|---|---|---|
+|Redis heartbeat|X|-|-|-|-|
+|Redis blocked over connected clients ratio|X|X|-|-|-|
+|Redis evicted keys change rate|X|X|-|-|-|
+|Redis expired keys change rate|X|X|-|-|-|
+|Redis rejected connections|X|X|-|-|-|
+
+
 ## prometheus-exporter_squid
 
 |Detector|Critical|Major|Minor|Warning|Info|
Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
# REDIS SignalFx detectors

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
:link: **Contents**

- [How to use this module?](#how-to-use-this-module)
- [What are the available detectors in this module?](#what-are-the-available-detectors-in-this-module)
- [How to collect required metrics?](#how-to-collect-required-metrics)
  - [Metrics](#metrics)
- [Related documentation](#related-documentation)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## How to use this module?

This directory defines a [Terraform](https://www.terraform.io/)
[module](https://www.terraform.io/language/modules/syntax) you can use in your
existing [stack](https://github.com/claranet/terraform-signalfx-detectors/wiki/Getting-started#stack) by adding a
`module` configuration and setting its `source` parameter to the URL of this folder:

```hcl
module "signalfx-detectors-prometheus-exporter-redis" {
  source = "github.com/claranet/terraform-signalfx-detectors.git//modules/prometheus-exporter_redis?ref={revision}"

  environment   = var.environment
  notifications = local.notifications
}
```

Note the following parameters:

* `source`: Use this parameter to specify the URL of the module. The double slash (`//`) is intentional and required.
  Terraform uses it to specify subfolders within a Git repo (see [module
  sources](https://www.terraform.io/language/modules/sources)). The `ref` parameter selects a specific Git tag in
  this repository. It is recommended to use the latest "pinned" version in place of `{revision}`. Avoid using a branch
  like `master` except for testing purposes. Note that all modules in this repository are available on the Terraform
  [registry](https://registry.terraform.io/modules/claranet/detectors/signalfx) and we recommend using it as the source
  instead of `git`, which is more flexible but less future-proof.

* `environment`: Use this parameter to specify the
  [environment](https://github.com/claranet/terraform-signalfx-detectors/wiki/Getting-started#environment) used by this
  instance of the module.
  Its value will be added to the `prefixes` list at the start of the [detector
  name](https://github.com/claranet/terraform-signalfx-detectors/wiki/Templating#example).
  In general, it will also be used in the `filtering` internal sub-module to [apply
  filters](https://github.com/claranet/terraform-signalfx-detectors/wiki/Guidance#filtering) based on our default
  [tagging convention](https://github.com/claranet/terraform-signalfx-detectors/wiki/Tagging-convention).

* `notifications`: Use this parameter to define where alerts should be sent depending on their severity. It consists
  of a Terraform [object](https://www.terraform.io/language/expressions/type-constraints#object) where each key represents an available
  [detector rule severity](https://docs.splunk.com/observability/alerts-detectors-notifications/create-detectors-for-alerts.html#severity)
  and its value is a list of recipients. Every recipient must respect the [detector notification
  format](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs/resources/detector#notification-format).
  Check the [notification binding](https://github.com/claranet/terraform-signalfx-detectors/wiki/Notifications-binding)
  documentation to understand the recommended role of each severity.

These 3 parameters, along with all variables defined in [common-variables.tf](common-variables.tf), are common to all
[modules](../) in this repository. Other variables, specific to this module, are available in
[variables-gen.tf](variables-gen.tf).
In general, the default configuration "works" but all of these Terraform
[variables](https://www.terraform.io/language/values/variables) make it possible to
customize the detectors' behavior to better fit your needs.

Most of them implement the usual tips and rules detailed in the
[guidance](https://github.com/claranet/terraform-signalfx-detectors/wiki/Guidance) documentation and listed in the
dedicated common [variables](https://github.com/claranet/terraform-signalfx-detectors/wiki/Variables) documentation.

Feel free to explore the [wiki](https://github.com/claranet/terraform-signalfx-detectors/wiki) for more information about
general usage of this repository.

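As a rough illustration of the `source` and `notifications` parameters described in this section, the sketch below consumes the module from the Terraform registry and binds one recipient list per severity. The registry submodule address, the `x.y.z` version placeholder, the lowercase severity keys (mirroring the severity columns of the detector table), and every recipient value (credential IDs, channel, email address) are assumptions made for illustration only; replace them with values from your own organization.

```hcl
variable "environment" {
  description = "Environment name, used as a detector name prefix and in default filters"
  type        = string
}

locals {
  # Hypothetical recipients. Each entry follows the SignalFx provider
  # notification format: "<type>,<argument>[,<argument>]".
  notifications = {
    critical = ["PagerDuty,<pagerduty-credential-id>"]
    major    = ["Slack,<slack-credential-id>,#alerts"]
    minor    = ["Email,team@example.com"]
    warning  = []
    info     = []
  }
}

module "signalfx-detectors-prometheus-exporter-redis" {
  # Assumed registry address for this submodule. "x.y.z" is a placeholder
  # for a pinned release, not a real version number.
  source  = "claranet/detectors/signalfx//modules/prometheus-exporter_redis"
  version = "x.y.z"

  environment   = var.environment
  notifications = local.notifications
}
```
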
## What are the available detectors in this module?

This module creates the following SignalFx detectors, each of which can contain one or more alerting rules:

|Detector|Critical|Major|Minor|Warning|Info|
|---|---|---|---|---|---|
|Redis heartbeat|X|-|-|-|-|
|Redis blocked over connected clients ratio|X|X|-|-|-|
|Redis evicted keys change rate|X|X|-|-|-|
|Redis expired keys change rate|X|X|-|-|-|
|Redis rejected connections|X|X|-|-|-|

## How to collect required metrics?

This module deploys detectors that use metrics collected by
scraping a server that follows the [OpenMetrics convention](https://openmetrics.io/), which is based on and compatible with [the Prometheus
exposition format](https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md#openmetrics-text-format).

Such servers are generally called `Prometheus Exporters`. They can be scraped by both the [SignalFx Smart Agent](https://github.com/signalfx/signalfx-agent),
thanks to its [prometheus exporter monitor](https://github.com/signalfx/signalfx-agent/blob/main/docs/monitors/prometheus-exporter.md), and the
[OpenTelemetry Collector](https://github.com/signalfx/splunk-otel-collector), using its [prometheus
receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/prometheusreceiver) or its derivatives.

These exporters can either be embedded directly in the tool you want to monitor (e.g. nginx ingress) or be installed next to it as
a separate program configured to connect to it, create metrics and expose them as a server.


Check the [Related documentation](#related-documentation) section for more detailed and specific information about this module's dependencies.

The detectors of this module use metrics from the [Prometheus Redis exporter](https://github.com/oliver006/redis_exporter).


### Metrics


Here is the list of required metrics for detectors in this module.

* `redis_blocked_clients`
* `redis_connected_clients`
* `redis_evicted_keys_total`
* `redis_expired_keys_total`
* `redis_memory_used_bytes`
* `redis_rejected_connections_total`




## Related documentation

* [Terraform SignalFx provider](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs)
* [Terraform SignalFx detector](https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs/resources/detector)
* [Splunk Observability integrations](https://docs.splunk.com/Observability/gdi/get-data-in/integrations.html)
* [Prometheus Exporter for Redis](https://github.com/oliver006/redis_exporter)
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
../../common/module/filters-otel-collector.tf
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
../../common/module/locals.tf
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
../../common/module/modules.tf
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
../../common/module/variables.tf
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
../../common/module/versions.tf
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
module: redis
name: heartbeat
aggregation: ".sum(by=['k8s.workload.name', 'k8s.namespace.name', 'k8s.cluster.name'], allow_missing=True)"

transformation: false
exclude_not_running_vm: true

signals:
  signal:
    metric: redis_memory_used_bytes
rules:
  critical:
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
module: redis
name: blocked over connected clients ratio
aggregation: ".sum(by=['k8s.workload.name', 'k8s.namespace.name', 'k8s.cluster.name'], allow_missing=True)"


value_unit: "%"

signals:
  A:
    metric: redis_blocked_clients
  B:
    metric: redis_connected_clients
  signal:
    formula: (A/B).scale(100)

rules:
  critical:
    threshold: 5
    comparator: '>'
    lasting_duration: 1h
    lasting_at_least: 0.5
  major:
    threshold: 0
    comparator: '>'
    lasting_duration: 1h
    lasting_at_least: 0.5
    dependency: critical
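
To make the definition above more concrete, here is a hedged sketch of a hand-written `signalfx_detector` resource that such a definition could correspond to. It is not the code generated by this commit: the resource name, the rule descriptions, the hypothetical email recipient, the hard-coded thresholds (the real module presumably exposes them as variables), and the way the `major` rule excludes the `critical` condition to model `dependency: critical` are all assumptions. Only the metrics, aggregation, formula, thresholds and durations are taken from the YAML.

```hcl
# Hypothetical hand-written equivalent of the detector definition above.
resource "signalfx_detector" "redis_blocked_over_connected_clients_ratio" {
  name = "Redis blocked over connected clients ratio"

  # SignalFlow program: percentage of blocked clients among connected clients.
  program_text = <<-EOF
    A = data('redis_blocked_clients').sum(by=['k8s.workload.name', 'k8s.namespace.name', 'k8s.cluster.name'], allow_missing=True)
    B = data('redis_connected_clients').sum(by=['k8s.workload.name', 'k8s.namespace.name', 'k8s.cluster.name'], allow_missing=True)
    signal = (A/B).scale(100).publish('signal')
    detect(when(signal > 5, lasting='1h', at_least=0.5)).publish('CRIT')
    detect(when(signal > 0, lasting='1h', at_least=0.5) and (not when(signal > 5, lasting='1h', at_least=0.5))).publish('MAJOR')
  EOF

  rule {
    description   = "Blocked clients ratio above 5 percent"
    severity      = "Critical"
    detect_label  = "CRIT"
    notifications = ["Email,team@example.com"] # hypothetical recipient
  }

  rule {
    description   = "Blocked clients ratio above 0 percent"
    severity      = "Major"
    detect_label  = "MAJOR"
    notifications = ["Email,team@example.com"] # hypothetical recipient
  }
}
```

With `lasting='1h'` and `at_least=0.5`, each condition has to hold for at least half of the data points in a one-hour window before the rule fires.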
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
module: redis
name: evicted keys change rate
aggregation: ".sum(by=['k8s.workload.name', 'k8s.namespace.name', 'k8s.cluster.name'], allow_missing=True)"

signals:
  A:
    metric: redis_evicted_keys_total
    rollup: delta
  signal:
    formula: A.rateofchange()

rules:
  critical:
    threshold: 50
    comparator: '>'
    lasting_duration: 15m
    lasting_at_least: 0.5
  major:
    threshold: 25
    comparator: '>'
    lasting_duration: 15m
    lasting_at_least: 0.5
    dependency: critical
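
For the evicted keys definition above, the part worth illustrating is the combination of the `delta` rollup with `rateofchange()`. The sketch below shows one way this could be written as a `signalfx_detector` resource. As before, this is an assumption-laden illustration and not the generated code: the resource name, descriptions, recipient, the placement of the aggregation directly on the `data()` stream, and the exclusion logic for the `major` rule are hypothetical.

```hcl
# Hypothetical hand-written equivalent of the evicted keys definition above.
resource "signalfx_detector" "redis_evicted_keys_change_rate" {
  name = "Redis evicted keys change rate"

  # rollup='delta' turns the cumulative counter into per-interval increments;
  # rateofchange() then tracks how those increments vary over time.
  program_text = <<-EOF
    A = data('redis_evicted_keys_total', rollup='delta').sum(by=['k8s.workload.name', 'k8s.namespace.name', 'k8s.cluster.name'], allow_missing=True)
    signal = A.rateofchange().publish('signal')
    detect(when(signal > 50, lasting='15m', at_least=0.5)).publish('CRIT')
    detect(when(signal > 25, lasting='15m', at_least=0.5) and (not when(signal > 50, lasting='15m', at_least=0.5))).publish('MAJOR')
  EOF

  rule {
    description   = "Evicted keys change rate above 50"
    severity      = "Critical"
    detect_label  = "CRIT"
    notifications = ["Email,team@example.com"] # hypothetical recipient
  }

  rule {
    description   = "Evicted keys change rate above 25"
    severity      = "Major"
    detect_label  = "MAJOR"
    notifications = ["Email,team@example.com"] # hypothetical recipient
  }
}
```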
