`docs/block-node/metrics.md` (46 additions, 0 deletions)
@@ -233,3 +233,49 @@ Provides metrics related to the backfill process, including On-Demand and Historical
| Counter | `backfill_retries` | Total number of retries attempted during backfill |
| Gauge | `backfill_status` | Current status of the backfill process (0=Idle, 1=Running, 2=Error, 3=On-Demand Error, 4=Unknown) |
| Gauge | `backfill_pending_blocks` | Number of blocks pending to be backfilled |

## Alerting Recommendations

Alerting rules based on these metrics can notify the operations team of potential issues.
Using Low (L), Medium (M), and High (H) severity levels, some recommended alerting rules to consider include:

Note: High severity alerts are intentionally left out during the beta 1 phase to reduce noise.
As the product matures through the beta and RC phases, high severity alerts will be added.

**Node Status**: Alerts for overall node health

| Severity | Metric | Alert Condition |
|----------|--------------------|---------------------------|
| M | `app_state_status` | If not equal to `RUNNING` |
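
As one way to act on a rule like this, here is a minimal sketch in Python that polls the metric and flags anything other than `RUNNING`, assuming the node's metrics are scraped by a Prometheus server; the endpoint URL and the numeric value standing in for `RUNNING` are illustrative assumptions, not something this document specifies:

```python
# Sketch: flag the node when `app_state_status` is anything other than RUNNING.
# PROMETHEUS_URL is a hypothetical query endpoint; RUNNING_VALUE is a
# placeholder because the numeric encoding of RUNNING is not specified here.
import requests

PROMETHEUS_URL = "http://localhost:9090"
RUNNING_VALUE = 1.0


def query_instant(expr: str) -> float | None:
    """Run an instant PromQL query and return the first sample's value, if any."""
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query", params={"query": expr}, timeout=10
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else None


status = query_instant("app_state_status")
if status is None or status != RUNNING_VALUE:
    print(f"ALERT (severity M): app_state_status={status}, expected RUNNING")
```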

**Publisher**: Alerts related to publisher connections and performance

| Severity | Metric | Alert Condition |
|----------|--------------------------------|-----------------------------------------------------|
| L | `publisher_open_connections` | If value exceeds 40; otherwise, configure as needed |
| M | `publisher_receive_latency_ns` | If value exceeds 5s |

**Contributor** (on `publisher_open_connections`): I think 40 is too much. I would recommend 20 or even 10. Are we expecting to have that many publishers connected to a single node in Mainnet or even spheres?

**Contributor:** In Hedera mainnet we may have all 30-ish nodes connected at once; that is definitely not ideal, but with only 3 LFH nodes (out of 5 total), it could reasonably happen.

**Contributor** (on `publisher_receive_latency_ns`): This is repeated below with a different value.

**Failures**: Alerts for various failure metrics

| Severity | Metric | Alert Condition |
|----------|----------------------------------------|--------------------------------------------------------------------|
| M | `verification_blocks_error` | If errors during verification exceed 3 in the last 60s |
| M | `publisher_block_send_response_failed` | If value exceeds 3 in the last 60s; otherwise, configure as needed |
| L | `backfill_fetch_errors` | If value exceeds 3 in the last 60s; otherwise, configure as needed |
| M | `publisher_stream_errors` | If value exceeds 3 in the last 60s; otherwise, configure as needed |
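
Because these are counters, "exceeds 3 in the last 60s" is most naturally checked as the growth of the counter over a 60-second window rather than its absolute value. A minimal sketch of that check, again assuming a Prometheus server at a hypothetical URL:

```python
# Sketch: alert when a failure counter grows by more than a threshold within
# the last 60 seconds, using PromQL's increase() over a 60s window.
# The Prometheus URL is hypothetical; metric names come from the table above.
import requests

PROMETHEUS_URL = "http://localhost:9090"

FAILURE_THRESHOLDS = {
    "verification_blocks_error": 3,
    "publisher_block_send_response_failed": 3,
    "backfill_fetch_errors": 3,
    "publisher_stream_errors": 3,
}


def increase_over(metric: str, window: str = "60s") -> float:
    """Return how much the counter increased over the window (0.0 if no data)."""
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query",
        params={"query": f"increase({metric}[{window}])"},
        timeout=10,
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0


for metric, threshold in FAILURE_THRESHOLDS.items():
    growth = increase_over(metric)
    if growth > threshold:
        print(f"ALERT: {metric} grew by {growth:.0f} in the last 60s (threshold {threshold})")
```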

**Messaging**: Alerts for messaging service operations regarding block items and block notifications

| Severity | Metric | Alert Condition |
|----------|---------------------------------------------|-----------------------------------------------------------|
| L | `messaging_item_queue_percent_used` | If percentage exceeds 60%; otherwise, configure as needed |
| L | `messaging_notification_queue_percent_used` | If percentage exceeds 60%; otherwise, configure as needed |

Comment on lines +271 to +272

**Contributor:** These are a bit too tight; we'd actually prefer to see utilization higher than 60% under load; over 80% might be worth alerting, depending on how large the queues are.

**Contributor:** I think that 60% utilization makes sense here, especially since it is an L severity.

**Contributor:** I'm more concerned that under moderate load we're planning to alert when the node is at its most healthy. That seems backwards. Utilization between 40% and 80% should be "green", with alerts only outside that range (on either end).
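
As a small sketch of the band idea above (the 40%/80% edges are the suggestion from this thread, not documented thresholds):

```python
# Sketch: treat 40%-80% queue utilization as healthy and alert outside that
# band, as suggested above. Input would come from the
# messaging_*_queue_percent_used gauges; the band edges are illustrative.
def queue_utilization_alert(percent_used: float, low: float = 40.0, high: float = 80.0) -> bool:
    """Return True when utilization falls outside the healthy band."""
    return not (low <= percent_used <= high)


assert queue_utilization_alert(60.0) is False  # mid-band: healthy under load
assert queue_utilization_alert(15.0) is True   # possibly starved or idle
assert queue_utilization_alert(92.0) is True   # approaching queue capacity
```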


**Latency**: Alerts for latency metrics in receiving, hashing, verifying, and persisting blocks

| Severity | Metric | Alert Condition |
|----------|--------------------------------------------|------------------------------------------------------|
| M | `publisher_receive_latency_ns` | If value exceeds 20s; otherwise, configure as needed |
| M | `hashing_block_time` | If value exceeds 2s; otherwise, configure as needed |
| M | `verification_block_time` | If value exceeds 20s; otherwise, configure as needed |
| M | `files_recent_persistence_time_latency_ns` | If value exceeds 20s; otherwise, configure as needed |

Comment on lines +278 to +281

**Contributor:** These are almost certainly far too long. In order, "healthy" numbers would be (<=) 0.5, 0.02, 0.5, and 0.05 (in seconds).

**Contributor:** I agree that these values are too large. These numbers will vary by network (mainnet/testnet vs. sphere); however, if we want to give recommendations for the current public networks, we should state that at the top.

My recommendation is as follows:

- `publisher_receive_latency_ns` --> 3 or 2.5 secs (here we are expecting at least 2 secs, since that is the time the CN takes to wrap up a block, plus some more due to transmission and block_proof generation)
- `hashing_block_time` --> 250 ms
- `verification_block_time` --> same as above, 250 ms
- `files_recent_persistence_time_latency_ns` --> 100 ms
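
One practical note when turning any of these numbers into alert rules: the metrics carrying a `_ns` suffix presumably report nanoseconds, while the thresholds in this discussion are quoted in seconds and milliseconds, so a conversion is needed. A small sketch, using the table's 20s value and some of the reviewer-suggested values as examples (the unit of the metrics without a `_ns` suffix is not stated here):

```python
# Sketch: convert second/millisecond thresholds into nanoseconds for comparison
# against the *_ns latency metrics. Example values are drawn from the table
# (20s) and the review comments above (500ms, 250ms, 100ms).
NS_PER_SECOND = 1_000_000_000


def to_nanoseconds(seconds: float) -> int:
    return round(seconds * NS_PER_SECOND)


print(to_nanoseconds(20))    # 20_000_000_000 ns, threshold as published in the table
print(to_nanoseconds(0.5))   #    500_000_000 ns, one reviewer's publisher receive latency
print(to_nanoseconds(0.25))  #    250_000_000 ns, suggested hashing/verification time
print(to_nanoseconds(0.1))   #    100_000_000 ns, suggested persistence latency
```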
