NETOBSERV-2365: Add the ability to create recording rules instead of alerts for the Network Health feature #2112

leandroberetta · 2025-10-27T14:50:03Z

Description

Add the ability to create recording rules instead of alerts for the Network Health feature.

We can test this feature by running the following command, which configures all default health rules (formerly alerts) in recording mode:

kubectl patch flowcollector cluster --type=merge -n netobserv -p '
spec:
  agent:
    ebpf:
      privileged: true
      features:
      - PacketDrop
      - DNSTracking
      - FlowRTT
      - NetworkEvents
      - IPSec
  processor:
    metrics:
      healthRules:
      - template: PacketDropsByKernel
        mode: recording
        variants:
        - lowVolumeThreshold: "5"
          groupBy: Namespace
        - groupBy: Node
      - template: PacketDropsByDevice
        mode: recording
        variants:
        - groupBy: Node
      - template: IPsecErrors
        mode: recording
        variants:
        - {}
        - groupBy: Node
      - template: DNSErrors
        mode: recording
        variants:
        - {}
        - groupBy: Namespace
      - template: NetpolDenied
        mode: recording
        variants:
        - groupBy: Namespace
      - template: LatencyHighTrend
        mode: recording
        variants:
        - groupBy: Namespace
          trendOffset: 20m
          trendDuration: 20m
'

Let's generate DNS errors as an example:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: dns-error-generator
  namespace: default
spec:
  containers:
  - name: dns-errors
    image: busybox:latest
    command:
    - /bin/sh
    - -c
    - |
      while true; do
        nslookup nonexistent-domain-12345.invalid
        nslookup another-fake-domain-67890.invalid
        nslookup error-test-domain.notreal
        nslookup invalid-dns-query.fake
        sleep 2
      done
  restartPolicy: Always
EOF

UI:

Dependencies

netobserv/network-observability-console-plugin#1163

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

openshift-ci · 2025-10-27T14:50:07Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

codecov · 2025-11-28T15:35:22Z

Codecov Report

❌ Patch coverage is 58.30116% with 108 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.03%. Comparing base (8b929d7) to head (2090cbb).
⚠️ Report is 3 commits behind head on main.

Files with missing lines	Patch %	Lines
api/flowcollector/v1beta2/zz_generated.deepcopy.go	6.52%	42 Missing and 1 partial ⚠️
internal/pkg/metrics/alerts/builder.go	65.55%	27 Missing and 4 partials ⚠️
...lector/v1beta2/flowcollector_validation_webhook.go	58.13%	16 Missing and 2 partials ⚠️
internal/pkg/metrics/alerts/alerts.go	76.31%	8 Missing and 1 partial ⚠️
internal/pkg/metrics/alerts/promql.go	61.11%	6 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2112      +/-   ##
==========================================
- Coverage   73.24%   73.03%   -0.21%     
==========================================
  Files          82       82              
  Lines        9339     9431      +92     
==========================================
+ Hits         6840     6888      +48     
- Misses       2075     2115      +40     
- Partials      424      428       +4

Flag	Coverage Δ
unittests	`73.03% <58.30%> (-0.21%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
...flowcollector/v1beta2/flowcollector_alert_types.go	`97.46% <100.00%> (ø)`
api/flowcollector/v1beta2/flowcollector_types.go	`100.00% <ø> (ø)`
internal/pkg/metrics/alerts/promql.go	`84.50% <61.11%> (-8.48%)`	⬇️
internal/pkg/metrics/alerts/alerts.go	`90.68% <76.31%> (-3.88%)`	⬇️
...lector/v1beta2/flowcollector_validation_webhook.go	`72.69% <58.13%> (-2.48%)`	⬇️
internal/pkg/metrics/alerts/builder.go	`81.51% <65.55%> (-13.94%)`	⬇️
api/flowcollector/v1beta2/zz_generated.deepcopy.go	`38.09% <6.52%> (ø)`

... and 4 files with indirect coverage changes

openshift-ci · 2025-12-03T19:07:02Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign oliviercazade for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

DOWNSTREAM_OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

codecov · 2025-12-04T17:10:48Z

Codecov Report

❌ Patch coverage is 62.50000% with 78 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.29%. Comparing base (b3f03d0) to head (775b5b7).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
api/flowcollector/v1beta2/zz_generated.deepcopy.go	2.27%	42 Missing and 1 partial ⚠️
internal/pkg/metrics/alerts/builder.go	78.57%	12 Missing and 3 partials ⚠️
internal/pkg/metrics/alerts/alerts.go	78.04%	8 Missing and 1 partial ⚠️
internal/pkg/metrics/alerts/promql.go	61.11%	6 Missing and 1 partial ⚠️
...lector/v1beta2/flowcollector_validation_webhook.go	69.23%	3 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2112      +/-   ##
==========================================
- Coverage   72.48%   72.29%   -0.19%     
==========================================
  Files          88       88              
  Lines        9677     9747      +70     
==========================================
+ Hits         7014     7047      +33     
- Misses       2222     2250      +28     
- Partials      441      450       +9

Flag	Coverage Δ
unittests	`72.29% <62.50%> (-0.19%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines	Coverage Δ
...flowcollector/v1beta2/flowcollector_alert_types.go	`97.46% <100.00%> (ø)`
api/flowcollector/v1beta2/flowcollector_types.go	`100.00% <ø> (ø)`
...lector/v1beta2/flowcollector_validation_webhook.go	`73.42% <69.23%> (-1.75%)`	⬇️
internal/pkg/metrics/alerts/promql.go	`84.50% <61.11%> (-8.48%)`	⬇️
internal/pkg/metrics/alerts/alerts.go	`91.93% <78.04%> (-3.42%)`	⬇️
internal/pkg/metrics/alerts/builder.go	`86.00% <78.57%> (-3.88%)`	⬇️
api/flowcollector/v1beta2/zz_generated.deepcopy.go	`37.41% <2.27%> (ø)`

... and 2 files with indirect coverage changes

openshift-ci-robot · 2025-12-09T12:11:53Z

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Description

Add the ability to create recording rules instead of alerts for the Network Health feature.

We can test this feature by applying the next command. It will configure all default health rules (ex alerts) in recording mode:
kubectl patch flowcollector cluster --type=merge -p '
spec:
 processor:
   metrics:
     healthRules:
     - template: PacketDropsByKernel
       mode: recording
       variants:
       - lowVolumeThreshold: "5"
         groupBy: Namespace
       - groupBy: Node
     - template: PacketDropsByDevice
       mode: recording
       variants:
       - groupBy: Node
     - template: IPsecErrors
       mode: recording
       variants:
       - {}
       - groupBy: Node
     - template: DNSErrors
       mode: recording
       variants:
       - {}
       - groupBy: Namespace
     - template: NetpolDenied
       mode: recording
       variants:
       - groupBy: Namespace
     - template: LatencyHighTrend
       mode: recording
       variants:
       - groupBy: Namespace
         trendOffset: 20m
         trendDuration: 20m
'
Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).

Does this PR require product documentation?

If so, make sure the JIRA epic is labeled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.

Does this PR require a product release notes entry?

If so, fill in "Release Note Text" in the JIRA.

Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.

If so, make sure it is described in the JIRA ticket.

QE requirements (check 1 from the list):

Standard QE validation, with pre-merge tests unless stated otherwise.

Regression tests only (e.g. refactoring with no user-facing change).

No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

jotak · 2025-12-15T08:48:44Z

api/flowcollector/v1beta2/flowcollector_alert_types.go

-	Template AlertTemplate `json:"template,omitempty"`
+	Template HealthRuleTemplate `json:"template,omitempty"`
+
+	// Mode defines whether this health rule should be generated as an alert or a recording rule.


Perhaps add more information about what the difference is, such as: "Recording rule violations are visible in the Network Health dashboard without generating any Prometheus alert."

jotak · 2025-12-15T08:51:46Z

api/flowcollector/v1beta2/flowcollector_alert_types.go

 	// They are expressed as a percentage of errors above which the alert is triggered. They must be parsable as floats.
-	// +required
-	Thresholds AlertThresholds `json:"thresholds,omitempty"`
+	// Required for alert mode, optional for recording mode.


Maybe there is something I don't get here: how can it be optional for recording mode if we still want the rules to appear in the health dashboard, associated with a severity?

I originally thought of the recording rule as just a value, without the notion of severity, but now that you mention it, we can display the value with severities. You're right. Thanks for the feedback.
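
The point about severities can be illustrated with a minimal Go sketch of how a recorded value might be mapped to a severity using thresholds. The `Thresholds` struct, its field names, and the `Severity` function are hypothetical stand-ins, not the PR's types; the only facts taken from the thread are that thresholds are strings that must parse as floats and that a rule triggers when the value is above the threshold.

```go
package main

import (
	"fmt"
	"strconv"
)

// Thresholds is a hypothetical stand-in for the API's thresholds type.
// Values are strings that must parse as floats; empty means "not set".
type Thresholds struct {
	Info, Warning, Critical string
}

// Severity returns the highest severity whose threshold the value exceeds,
// or "" when no configured threshold is exceeded.
func Severity(value float64, t Thresholds) (string, error) {
	// Check from most to least severe so the first match wins.
	levels := []struct {
		name, raw string
	}{
		{"critical", t.Critical},
		{"warning", t.Warning},
		{"info", t.Info},
	}
	for _, l := range levels {
		if l.raw == "" {
			continue // threshold not configured for this level
		}
		limit, err := strconv.ParseFloat(l.raw, 64)
		if err != nil {
			return "", fmt.Errorf("threshold %s=%q is not a float: %w", l.name, l.raw, err)
		}
		if value > limit {
			return l.name, nil
		}
	}
	return "", nil
}

func main() {
	sev, err := Severity(12.5, Thresholds{Info: "5", Warning: "10", Critical: "20"})
	fmt.Println(sev, err)
}
```

With a mapping like this, a recording rule still needs at least one threshold for the dashboard to attach a severity to its value, which is the concern raised in the review.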

jotak · 2025-12-15T08:59:21Z

api/flowcollector/v1beta2/flowcollector_types.go

+	// More information on health rules: https://github.com/netobserv/network-observability-operator/blob/main/docs/Alerts.md
 	// +optional
-	DisableAlerts []AlertTemplate `json:"disableAlerts"`
+	DisableHealthRules []HealthRuleTemplate `json:"disableHealthRules"`


I think for disabling, we should keep the existing API and continue to affect only alerts. Disabling is mostly meant to help users when they have too much alerting noise, and it is not really needed for recording rules.
Also, we should not rename this field: it already existed before the TP feature, so renaming it would be a breaking change.

jotak · 2025-12-15T09:03:38Z

api/flowcollector/v1beta2/flowcollector_alert_types.go

+	HealthRuleNoFlows                  HealthRuleTemplate = "NetObservNoFlows"
+	HealthRuleLokiError                HealthRuleTemplate = "NetObservLokiError"


"NoFlows" and "LokiError" are different from the others, so I think we should keep referring to them as alerts only: unlike the others, they are not health items for the dashboard, they are only for alerting.
We should probably add a validation check that they are not used in "recording" mode.
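
A minimal sketch of the validation check suggested above could look like the following. The types are local stand-ins for the ones in `api/flowcollector/v1beta2`; the `ValidateMode` function, the `alertOnly` set, and the `"alert"` mode value are assumptions made for illustration, not code from the PR.

```go
package main

import "fmt"

// Local stand-ins for the API types discussed above; the real definitions
// live in api/flowcollector/v1beta2 and may differ.
type HealthRuleTemplate string
type HealthRuleMode string

const (
	ModeAlert     HealthRuleMode = "alert"     // assumed value
	ModeRecording HealthRuleMode = "recording" // value used in the PR description

	HealthRuleNoFlows   HealthRuleTemplate = "NetObservNoFlows"
	HealthRuleLokiError HealthRuleTemplate = "NetObservLokiError"
)

// alertOnly lists the templates that only make sense as alerts.
var alertOnly = map[HealthRuleTemplate]bool{
	HealthRuleNoFlows:   true,
	HealthRuleLokiError: true,
}

// ValidateMode rejects alert-only templates configured in recording mode.
func ValidateMode(tpl HealthRuleTemplate, mode HealthRuleMode) error {
	if mode == ModeRecording && alertOnly[tpl] {
		return fmt.Errorf("template %q is alert-only and cannot use mode %q", tpl, mode)
	}
	return nil
}

func main() {
	fmt.Println(ValidateMode(HealthRuleNoFlows, ModeRecording))
	fmt.Println(ValidateMode("DNSErrors", ModeRecording))
}
```

In the operator, a check of this kind would presumably sit in the FlowCollector validation webhook next to the other health rule checks.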

jotak · 2025-12-15T09:10:32Z

internal/pkg/metrics/alerts/builder.go


 func (rb *ruleBuilder) additionalDescription() string {
-	return fmt.Sprintf("You can turn off this alert by adding '%s' to spec.processor.metrics.disableAlerts in FlowCollector, or reconfigure it via spec.processor.metrics.alerts.", rb.template)
+	return fmt.Sprintf("You can turn off this health rule by adding '%s' to spec.processor.metrics.disableHealthRules in FlowCollector, or reconfigure it via spec.processor.metrics.healthRules.", rb.template)


(Just FYI: we will remove this added description, as it will be moved to the runbooks. I was about to say something about this message with respect to disabled alerts, but it no longer matters.)

jotak · 2025-12-15T09:34:52Z

internal/pkg/metrics/alerts/builder.go

+
+// buildRecordingRuleName builds recording rule name following the convention:
+// netobserv:health:<template>:<groupby>:<side>:rate2m
+func (rb *ruleBuilder) buildRecordingRuleName() string {


I think I would prefer the recording rule name to be passed to createRule directly from each alert (in alerts.go). Maybe it's less smart, but it's more explicit and makes it easier to change a specific name for any reason.

Actually, this function can surely be kept; it's the acronyms and toSnakeCase that, IMO, could be replaced with an explicit string per alert.
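
The suggestion above can be sketched as follows: each template carries an explicit, hand-written name fragment, and the builder only joins the segments of the `netobserv:health:<template>:<groupby>:<side>:rate2m` convention. The `recordingNames` map, its fragment values, and the `RecordingRuleName` function are hypothetical; omitting empty `groupBy` or `side` segments is also an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// recordingNames maps each template to an explicit name fragment.
// The fragments are illustrative, not the names used in the PR.
var recordingNames = map[string]string{
	"PacketDropsByKernel": "packet_drops_by_kernel",
	"DNSErrors":           "dns_errors",
}

// RecordingRuleName joins the segments of the naming convention.
// Empty groupBy or side segments are skipped (an assumption).
func RecordingRuleName(template, groupBy, side string) (string, error) {
	frag, ok := recordingNames[template]
	if !ok {
		return "", fmt.Errorf("no recording rule name registered for template %q", template)
	}
	parts := []string{"netobserv", "health", frag}
	if groupBy != "" {
		parts = append(parts, strings.ToLower(groupBy))
	}
	if side != "" {
		parts = append(parts, strings.ToLower(side))
	}
	parts = append(parts, "rate2m")
	return strings.Join(parts, ":"), nil
}

func main() {
	name, err := RecordingRuleName("DNSErrors", "Namespace", "")
	fmt.Println(name, err)
}
```

An explicit table trades a little repetition for names that can be changed one at a time, without touching acronym or case-conversion logic.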

jotak · 2025-12-15T09:43:42Z

internal/pkg/metrics/alerts/builder.go

+		recordName := rb.buildRecordingRuleName()
+		return &monitoringv1.Rule{
+			Record: recordName,
+			// Note: Recording rules cannot have annotations in Prometheus


really? So we cannot pass any metadata?

Only labels: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/

That would be an option, although it is not as good as annotations for this kind of information.
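
As a rough illustration of the labels option, the sketch below attaches metadata to a recording rule as labels and checks that each label name matches the Prometheus label name pattern. The `Rule` struct is a local stand-in that mirrors only the `record`, `expr`, and `labels` fields, not the `monitoringv1.Rule` type; the function name, label key, rule name, and expression are all hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Rule mirrors only the recording rule fields used here (record, expr,
// labels); it is a local stand-in, not the monitoringv1 type.
type Rule struct {
	Record string
	Expr   string
	Labels map[string]string
}

// labelNameRe is the Prometheus label name pattern.
var labelNameRe = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)

// NewRecordingRule copies metadata into labels after validating label names.
func NewRecordingRule(record, expr string, meta map[string]string) (Rule, error) {
	labels := make(map[string]string, len(meta))
	for k, v := range meta {
		if !labelNameRe.MatchString(k) {
			return Rule{}, fmt.Errorf("invalid label name %q", k)
		}
		labels[k] = v
	}
	return Rule{Record: record, Expr: expr, Labels: labels}, nil
}

func main() {
	r, err := NewRecordingRule(
		"netobserv:health:dns_errors:namespace:rate2m",       // hypothetical name
		"vector(0)",                                          // placeholder expression
		map[string]string{"netobserv_template": "DNSErrors"}, // hypothetical label
	)
	fmt.Println(r.Record, r.Labels, err)
}
```

Labels set on a recording rule are added to every series it produces, so they suit short identifiers better than the long descriptive text that annotations usually carry.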

github-actions · 2025-12-17T16:21:38Z

New images:

quay.io/netobserv/network-observability-operator:5330495
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-5330495
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-5330495

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:5330495 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-5330495

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-5330495
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

openshift-ci · 2025-12-22T22:46:42Z

@leandroberetta: The following test failed. Say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-operator | `dd36b0d` | link | false | `/test e2e-operator` |


leandroberetta self-assigned this Oct 27, 2025

openshift-ci-robot added the jira/valid-reference label Oct 27, 2025

openshift-ci bot added the do-not-merge/work-in-progress label Oct 27, 2025

leandroberetta force-pushed the netobserv-2365 branch from 0c4d303 to b1fae84 Compare November 28, 2025 14:34

leandroberetta force-pushed the netobserv-2365 branch from 91777b1 to ee8bbc2 Compare December 3, 2025 19:06

leandroberetta requested review from jotak and jpinsonneau December 4, 2025 00:47

leandroberetta marked this pull request as ready for review December 4, 2025 16:00

openshift-ci bot removed the do-not-merge/work-in-progress label Dec 4, 2025

jotak reviewed Dec 15, 2025

leandroberetta force-pushed the netobserv-2365 branch from e595bab to d0307be Compare December 17, 2025 14:31

leandroberetta added the ok-to-test label Dec 17, 2025

leandroberetta force-pushed the netobserv-2365 branch from 775b5b7 to 5954079 Compare December 18, 2025 13:50

github-actions bot removed the ok-to-test label Dec 18, 2025

leandroberetta added 13 commits December 19, 2025 10:32

refactor

c126806

improvements

188f238

more testing, some fixes

b31a2bc

fix cr

7f9e37f

bundle & tests

d91ccb6

include template name in recording rule as a label

933c5b6

Regenerate files

77d60fd

improvements

1e2f404

update bundle

2ef940e

fix

502d42e

fix test

6cf2f07

fixes

7814245

clean cross az

b14a7ac

leandroberetta force-pushed the netobserv-2365 branch from 73968fc to b14a7ac Compare December 19, 2025 13:55

leandroberetta added 4 commits December 19, 2025 11:26

fixes

ef06b04

bundle

9b0e49f

feedback

b6f0d8a

bundle

dd36b0d

		HealthRuleNoFlows HealthRuleTemplate = "NetObservNoFlows"
		HealthRuleLokiError HealthRuleTemplate = "NetObservLokiError"
