@@ -234,28 +234,27 @@ Provides metrics related to the backfill process, including On-Demand and Histor
 | Gauge | `backfill_status` | Current status of the backfill process (0=Idle, 1=Running, 2=Error, 3=On-Demand Error, 4=Unknown) |
 | Gauge | `backfill_pending_blocks` | Number of blocks pending to be backfilled |

-
 ## Alerting Recommendations
+
 Alerting rules can be created in Prometheus based on these metrics to notify the operations team of potential issues.
 Utilizing Low (L), Medium (M) and High (H) severity levels, some recommended alerting rules include:

-
 Node Status: High level alerts for overall node health

-| Severity | Metric | Alert Condition |
+| Severity | Metric | Alert Condition |
 | ----------| --------------------| ---------------------------|
 | M | `app_state_status` | If not equal to `RUNNING` |

 Publisher: Alerts related to publisher connections and performance

-| Severity | Metric | Alert Condition |
-| ----------| --------------------------------------------- | --------------------------------------------|
-| L | `publisher_open_connections` | If value exceeds 40 or configure as needed |
-| M | `publisher_receive_latency_ns` | If value exceeds 5 s |
+| Severity | Metric | Alert Condition |
+| ----------| --------------------------------| --------------------------------------------|
+| L | `publisher_open_connections` | If value exceeds 40 or configure as needed |
+| M | `publisher_receive_latency_ns` | If value exceeds 5 s |

 Failures: Alerts for various failure metrics

-| Severity | Metric | Alert Condition |
+| Severity | Metric | Alert Condition |
 | ----------| ----------------------------------------| -----------------------------------------------------------|
 | M | `verification_blocks_error` | If errors during verification exceed 3 in last 60 s |
 | M | `publisher_block_send_response_failed` | If value exceeds 3 in the last 60s or configure as needed |
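
Conditions such as "exceeds 3 in the last 60 s" compare how much a counter grew inside a time window, not its absolute value. In PromQL this kind of check is commonly written with `increase(<metric>[60s]) > 3`. The same idea can be sketched in Python, assuming the counter is available as time-ordered `(timestamp, value)` samples and never resets. The names `window_increase` and `should_alert` are hypothetical helpers for illustration only and are not part of the project or of Prometheus.

```python
from typing import Sequence, Tuple

# One scraped sample: (unix timestamp in seconds, counter value).
Sample = Tuple[float, float]


def window_increase(samples: Sequence[Sample], now: float, window_s: float = 60.0) -> float:
    """Return how much a counter grew within the last `window_s` seconds.

    Assumes `samples` is sorted by timestamp and the counter never resets.
    """
    in_window = [value for ts, value in samples if now - window_s <= ts <= now]
    if len(in_window) < 2:
        return 0.0  # fewer than two points: growth cannot be measured
    return in_window[-1] - in_window[0]


def should_alert(samples: Sequence[Sample], now: float,
                 threshold: float = 3, window_s: float = 60.0) -> bool:
    """True when the counter grew by more than `threshold` inside the window."""
    return window_increase(samples, now, window_s) > threshold


# Hypothetical samples of a failure counter.
samples = [(0.0, 10.0), (30.0, 11.0), (70.0, 12.0), (90.0, 16.0)]
print(should_alert(samples, now=90.0))
```

The threshold and window are parameters because the table marks several of these values as "configure as needed".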
@@ -264,14 +263,14 @@ Failures: Alerts for various failure metrics

 Messaging: Alerts for messaging service operations regarding block items and block notification

-| Severity | Metric | Alert Condition |
+| Severity | Metric | Alert Condition |
 | ----------| ---------------------------------------------| --------------------------------------------------|
 | L | `messaging_item_queue_percent_used` | If percentage exceeds 60% or configure as needed |
 | L | `messaging_notification_queue_percent_used` | If percentage exceeds 60% or configure as needed |

 Latency: Alerts for latency metrics in receiving, hashing, verifying, and persisting blocks

-| Severity | Metric | Alert Condition |
+| Severity | Metric | Alert Condition |
 | ----------| --------------------------------------------| ---------------------------------------------|
 | M | `publisher_receive_latency_ns` | If value exceeds 20s or configure as needed |
 | M | `hashing_block_time` | If value exceeds 2s or configure as needed |
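
The latency thresholds are stated in seconds, while the `_ns` suffix of `publisher_receive_latency_ns` suggests the metric is reported in nanoseconds. Assuming that is the case, the threshold has to be converted before it is compared with the raw value. The sketch below shows the conversion; `seconds_to_nanos` and `latency_exceeds` are hypothetical helpers, and the threshold is kept as a parameter because the tables list both 5 s and 20 s for this metric.

```python
NANOS_PER_SECOND = 1_000_000_000


def seconds_to_nanos(seconds: float) -> int:
    """Convert a threshold given in seconds to nanoseconds."""
    return int(seconds * NANOS_PER_SECOND)


def latency_exceeds(value_ns: int, threshold_s: float) -> bool:
    """True when a latency value in nanoseconds is above a threshold in seconds."""
    return value_ns > seconds_to_nanos(threshold_s)


# Build a comparison expression for the 20 s threshold from the table.
# The nanosecond unit is an assumption based on the metric name.
threshold_expr = f"publisher_receive_latency_ns > {seconds_to_nanos(20)}"
print(threshold_expr)  # publisher_receive_latency_ns > 20000000000
```

The unit of `hashing_block_time` is not stated in the table, so it is left out of this sketch.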