NETOBSERV-2365: Add the ability to create recording rules instead of alerts for the Network Health feature #2112
@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Skipping CI for Draft Pull Request.
Force-pushed from 0c4d303 to b1fae84.
Codecov Report ❌
Coverage diff, main vs. #2112:

             main     #2112    +/-
Coverage     73.24%   73.03%   -0.21%
Files        82       82
Lines        9339     9431     +92
Hits         6840     6888     +48
Misses       2075     2115     +40
Partials     424      428      +4
Force-pushed from 91777b1 to ee8bbc2.
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the affected files. The full list of commands accepted by this bot can be found here.
Codecov Report ❌
Coverage diff, main vs. #2112:

             main     #2112    +/-
Coverage     72.48%   72.29%   -0.19%
Files        88       88
Lines        9677     9747     +70
Hits         7014     7047     +33
Misses       2222     2250     +28
Partials     441      450      +9
- Template AlertTemplate `json:"template,omitempty"`
+ Template HealthRuleTemplate `json:"template,omitempty"`

  // Mode defines whether this health rule should be generated as an alert or a recording rule.
Perhaps add more information about what the difference is, such as: "Recording rule violations are visible in the Network Health dashboard without generating any Prometheus alert."
  // They are expressed as a percentage of errors above which the alert is triggered. They must be parsable as floats.
  // +required
  Thresholds AlertThresholds `json:"thresholds,omitempty"`
  // Required for alert mode, optional for recording mode.
Maybe something I don't get here: how can it be optional for recording mode, if we still want them to appear in the health dashboard, associated with a severity?
I originally thought of the recording rule as just a value, without the notion of a severity, but now that you mention it, we can display the value with severities. You're right. Thanks for the feedback.
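To make the point about severities concrete, here is a minimal sketch of how a recorded value could be matched against thresholds to get a severity for the dashboard. It is not the PR's code: the `Thresholds` struct, its `Info`/`Warning`/`Critical` fields, and `severityFor` are hypothetical names for illustration; the only detail taken from the diff is that thresholds are percentages stored as strings that must parse as floats.

```go
package main

import (
	"fmt"
	"strconv"
)

// Thresholds is a hypothetical stand-in for the operator's thresholds type:
// each level is a percentage kept as a string, empty meaning "not set".
type Thresholds struct {
	Info     string
	Warning  string
	Critical string
}

// severityFor returns the highest severity whose threshold is exceeded by
// value, or "" if none is. Levels are inspected from most to least severe,
// and an error is returned for the first inspected threshold that does not
// parse as a float.
func severityFor(t Thresholds, value float64) (string, error) {
	levels := []struct {
		name string
		raw  string
	}{
		{"critical", t.Critical},
		{"warning", t.Warning},
		{"info", t.Info},
	}
	for _, l := range levels {
		if l.raw == "" {
			continue // level not configured
		}
		limit, err := strconv.ParseFloat(l.raw, 64)
		if err != nil {
			return "", fmt.Errorf("threshold %s=%q is not a float: %w", l.name, l.raw, err)
		}
		if value > limit {
			return l.name, nil
		}
	}
	return "", nil
}

func main() {
	sev, err := severityFor(Thresholds{Info: "5", Warning: "10", Critical: "20"}, 12.5)
	fmt.Println(sev, err)
}
```

Under these assumptions, a recording rule still needs at least one threshold to be given a severity, which is the reviewer's point above.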
  // More information on health rules: https://github.com/netobserv/network-observability-operator/blob/main/docs/Alerts.md
  // +optional
- DisableAlerts []AlertTemplate `json:"disableAlerts"`
+ DisableHealthRules []HealthRuleTemplate `json:"disableHealthRules"`
I think for disabling, we should keep the existing API and continue to affect only alerts. Disabling mostly helps users who have too much alerting noise; it's not really needed for recording rules.
Also, we should not rename this field: it already existed before the TP feature, so renaming it would be a breaking change.
  HealthRuleNoFlows HealthRuleTemplate = "NetObservNoFlows"
  HealthRuleLokiError HealthRuleTemplate = "NetObservLokiError"
"NoFlows" and "LokiError" are different from the others, I think we should keep referring to them as alerts only; unlike the others, they are not health items for the dashboard, they're only for alerting.
We should probably add a validation check that they are not used in "recording" mode
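A minimal sketch of the validation check suggested above. The two template constants are copied from the diff; the `HealthRuleMode` type, its `"Alert"` and `"Recording"` values, and the `validateMode` helper are assumptions made for illustration and may not match what the operator ends up using.

```go
package main

import "fmt"

// HealthRuleTemplate mirrors the type name shown in the diff.
type HealthRuleTemplate string

// HealthRuleMode and its values are hypothetical; the real API may differ.
type HealthRuleMode string

const (
	HealthRuleNoFlows   HealthRuleTemplate = "NetObservNoFlows"
	HealthRuleLokiError HealthRuleTemplate = "NetObservLokiError"

	ModeAlert     HealthRuleMode = "Alert"     // assumed value
	ModeRecording HealthRuleMode = "Recording" // assumed value
)

// alertOnlyTemplates lists templates that only make sense as alerts,
// because they are not health items for the dashboard.
var alertOnlyTemplates = map[HealthRuleTemplate]bool{
	HealthRuleNoFlows:   true,
	HealthRuleLokiError: true,
}

// validateMode rejects recording mode for alert-only templates.
func validateMode(tpl HealthRuleTemplate, mode HealthRuleMode) error {
	if mode == ModeRecording && alertOnlyTemplates[tpl] {
		return fmt.Errorf("template %q is alert-only and cannot be used in recording mode", tpl)
	}
	return nil
}

func main() {
	fmt.Println(validateMode(HealthRuleNoFlows, ModeRecording))
	fmt.Println(validateMode(HealthRuleNoFlows, ModeAlert))
}
```

A check of this kind would typically run wherever the FlowCollector spec is validated, so that a misconfiguration is reported early rather than silently ignored.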
  func (rb *ruleBuilder) additionalDescription() string {
- return fmt.Sprintf("You can turn off this alert by adding '%s' to spec.processor.metrics.disableAlerts in FlowCollector, or reconfigure it via spec.processor.metrics.alerts.", rb.template)
+ return fmt.Sprintf("You can turn off this health rule by adding '%s' to spec.processor.metrics.disableHealthRules in FlowCollector, or reconfigure it via spec.processor.metrics.healthRules.", rb.template)
(just FYI: we will remove this added description, as it will be moved to the runbooks; I was about to say something about this message wrt disabled alerts, but we don't care)
  // buildRecordingRuleName builds recording rule name following the convention:
  // netobserv:health:<template>:<groupby>:<side>:rate2m
  func (rb *ruleBuilder) buildRecordingRuleName() string {
I think I would prefer the recording rule name to be passed to createRule directly from each alert (in alerts.go); maybe it's less smart, but it's more explicit and makes it easier if there's a specific name or something that we want to change for any reason.
Actually, this function can surely be kept; it's the acronyms and toSnakeCase that IMO can be replaced with an explicit string per alert?
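The suggestion above could look roughly like the following sketch: the builder keeps the `netobserv:health:<template>:<groupby>:<side>:rate2m` convention from the diff, but takes the template segment from an explicit per-template string instead of deriving it with acronym handling and snake-case conversion. The map contents, the template keys, and the standalone function signature are hypothetical; the PR's version is a method on `ruleBuilder`.

```go
package main

import (
	"fmt"
	"strings"
)

// recordingNames maps a template to the explicit name segment used in its
// recording rule. Entries here are illustrative only.
var recordingNames = map[string]string{
	"DNSErrors":           "dns_errors",
	"PacketDropsByKernel": "packet_drops_by_kernel",
}

// buildRecordingRuleName follows the convention
// netobserv:health:<template>:<groupby>:<side>:rate2m
// and fails if no explicit name was registered for the template.
func buildRecordingRuleName(template, groupBy, side string) (string, error) {
	short, ok := recordingNames[template]
	if !ok {
		return "", fmt.Errorf("no recording rule name registered for template %q", template)
	}
	parts := []string{"netobserv", "health", short, groupBy, side, "rate2m"}
	return strings.Join(parts, ":"), nil
}

func main() {
	name, err := buildRecordingRuleName("DNSErrors", "namespace", "src")
	fmt.Println(name, err)
}
```

The trade-off is the one described in the comment: one more string to maintain per template, in exchange for names that can be read and changed without tracing a conversion function.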
  recordName := rb.buildRecordingRuleName()
  return &monitoringv1.Rule{
  Record: recordName,
  // Note: Recording rules cannot have annotations in Prometheus
really? So we cannot pass any metadata?
Only labels: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/
That would be an option, although it is not as good as annotations for this kind of information.
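As a rough illustration of the labels-only option, the sketch below attaches metadata to a recording rule as labels. The `Rule` struct is a local stand-in with a string expression, not the `monitoringv1.Rule` type used in the PR, and the label keys, rule name, and PromQL expression are placeholders. Since labels end up on every recorded series, this only suits short, low-cardinality values, which is one reason it is weaker than annotations.

```go
package main

import "fmt"

// Rule is a local stand-in holding only the fields relevant to a recording
// rule; it is not the prometheus-operator type.
type Rule struct {
	Record string
	Expr   string
	Labels map[string]string
}

// newRecordingRule builds a recording rule whose metadata is carried as
// labels. The metadata map is copied so later changes by the caller do not
// leak into the rule.
func newRecordingRule(name, expr string, metadata map[string]string) Rule {
	labels := make(map[string]string, len(metadata))
	for k, v := range metadata {
		labels[k] = v
	}
	return Rule{Record: name, Expr: expr, Labels: labels}
}

func main() {
	r := newRecordingRule(
		"netobserv:health:dns_errors:namespace:src:rate2m", // placeholder name
		"sum(rate(example_errors_total[2m]))",              // placeholder expression
		map[string]string{"template": "DNSErrors"},         // hypothetical label
	)
	fmt.Printf("%+v\n", r)
}
```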
Force-pushed from e595bab to d0307be.
New images:
They will expire after two weeks. To deploy this build:

  # Direct deployment, from operator repo
  IMAGE=quay.io/netobserv/network-observability-operator:5330495 make deploy
  # Or using operator-sdk
  operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-5330495

Or as a Catalog Source:

  apiVersion: operators.coreos.com/v1alpha1
  kind: CatalogSource
  metadata:
    name: netobserv-dev
    namespace: openshift-marketplace
  spec:
    sourceType: grpc
    image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-5330495
    displayName: NetObserv development catalog
    publisher: Me
    updateStrategy:
      registryPoll:
        interval: 1m
Force-pushed from 775b5b7 to 5954079.
Force-pushed from 73968fc to b14a7ac.
@leandroberetta: The following test failed, say
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Description
Add the ability to create recording rules instead of alerts for the Network Health feature.
We can test this feature by applying the following command, which configures all default health rules (formerly alerts) in recording mode:
Let's generate DNS errors as an example:
UI:
Dependencies
netobserv/network-observability-console-plugin#1163
Checklist
If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.