leandroberetta (Contributor) commented Oct 27, 2025

Description

Add the ability to create recording rules instead of alerts for the Network Health feature.

To test this feature, apply the following patch. It enables the listed eBPF agent features and configures all default health rules (formerly alerts) in recording mode:

kubectl patch flowcollector cluster --type=merge -n netobserv -p '
spec:
  agent:
    ebpf:
      privileged: true
      features:
      - PacketDrop
      - DNSTracking
      - FlowRTT
      - NetworkEvents
      - IPSec
  processor:
    metrics:
      healthRules:
      - template: PacketDropsByKernel
        mode: recording
        variants:
        - lowVolumeThreshold: "5"
          groupBy: Namespace
        - groupBy: Node
      - template: PacketDropsByDevice
        mode: recording
        variants:
        - groupBy: Node
      - template: IPsecErrors
        mode: recording
        variants:
        - {}
        - groupBy: Node
      - template: DNSErrors
        mode: recording
        variants:
        - {}
        - groupBy: Namespace
      - template: NetpolDenied
        mode: recording
        variants:
        - groupBy: Namespace
      - template: LatencyHighTrend
        mode: recording
        variants:
        - groupBy: Namespace
          trendOffset: 20m
          trendDuration: 20m
'
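
To confirm that the patch took effect, one option is to count `alert:` and `record:` entries in the generated PrometheusRule objects. This is only a sketch: `count_rule_kinds` is a hypothetical helper (not part of the operator), and it assumes the operator writes its rules to PrometheusRule objects in the `netobserv` namespace.

```shell
# Hypothetical helper: count alerting vs recording rules in PrometheusRule
# YAML read from stdin. It only looks for the standard `alert:` and `record:`
# rule fields at the start of a line (optionally after a list dash).
count_rule_kinds() {
  awk '
    /^[ \t]*(-[ \t]+)?alert:/  { alerts++ }
    /^[ \t]*(-[ \t]+)?record:/ { records++ }
    END { printf "alerts=%d recordings=%d\n", alerts, records }
  '
}

# Usage against a live cluster (the namespace is an assumption taken from
# the patch command above):
#   kubectl get prometheusrules -n netobserv -o yaml | count_rule_kinds
```

With every template set to recording mode, the recording count should be non-zero; the exact numbers depend on the rule names and variants the operator generates, which are not shown here.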

As an example, generate DNS errors with a pod that repeatedly looks up nonexistent domains:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: dns-error-generator
  namespace: default
spec:
  containers:
  - name: dns-errors
    image: busybox:latest
    command:
    - /bin/sh
    - -c
    - |
      while true; do
        nslookup nonexistent-domain-12345.invalid
        nslookup another-fake-domain-67890.invalid
        nslookup error-test-domain.notreal
        nslookup invalid-dns-query.fake
        sleep 2
      done
  restartPolicy: Always
EOF

UI:

[image: UI screenshot]

Dependencies

netobserv/network-observability-console-plugin#1163

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labeled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

openshift-ci-robot (Collaborator) commented Oct 27, 2025

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci bot commented Oct 27, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

codecov bot commented Nov 28, 2025

Codecov Report

❌ Patch coverage is 58.30116% with 108 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.03%. Comparing base (8b929d7) to head (2090cbb).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| api/flowcollector/v1beta2/zz_generated.deepcopy.go | 6.52% | 42 Missing and 1 partial ⚠️ |
| internal/pkg/metrics/alerts/builder.go | 65.55% | 27 Missing and 4 partials ⚠️ |
| ...lector/v1beta2/flowcollector_validation_webhook.go | 58.13% | 16 Missing and 2 partials ⚠️ |
| internal/pkg/metrics/alerts/alerts.go | 76.31% | 8 Missing and 1 partial ⚠️ |
| internal/pkg/metrics/alerts/promql.go | 61.11% | 6 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2112      +/-   ##
==========================================
- Coverage   73.24%   73.03%   -0.21%     
==========================================
  Files          82       82              
  Lines        9339     9431      +92     
==========================================
+ Hits         6840     6888      +48     
- Misses       2075     2115      +40     
- Partials      424      428       +4     
| Flag | Coverage Δ | |
| --- | --- | --- |
| unittests | 73.03% <58.30%> (-0.21%) | ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ | |
| --- | --- | --- |
| ...flowcollector/v1beta2/flowcollector_alert_types.go | 97.46% <100.00%> (ø) | |
| api/flowcollector/v1beta2/flowcollector_types.go | 100.00% <ø> (ø) | |
| internal/pkg/metrics/alerts/promql.go | 84.50% <61.11%> (-8.48%) | ⬇️ |
| internal/pkg/metrics/alerts/alerts.go | 90.68% <76.31%> (-3.88%) | ⬇️ |
| ...lector/v1beta2/flowcollector_validation_webhook.go | 72.69% <58.13%> (-2.48%) | ⬇️ |
| internal/pkg/metrics/alerts/builder.go | 81.51% <65.55%> (-13.94%) | ⬇️ |
| api/flowcollector/v1beta2/zz_generated.deepcopy.go | 38.09% <6.52%> (ø) | |

... and 4 files with indirect coverage changes

openshift-ci bot commented Dec 3, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign oliviercazade for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

leandroberetta marked this pull request as ready for review December 4, 2025 16:00

codecov bot commented Dec 4, 2025

Codecov Report

❌ Patch coverage is 62.50000% with 78 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.29%. Comparing base (b3f03d0) to head (775b5b7).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| api/flowcollector/v1beta2/zz_generated.deepcopy.go | 2.27% | 42 Missing and 1 partial ⚠️ |
| internal/pkg/metrics/alerts/builder.go | 78.57% | 12 Missing and 3 partials ⚠️ |
| internal/pkg/metrics/alerts/alerts.go | 78.04% | 8 Missing and 1 partial ⚠️ |
| internal/pkg/metrics/alerts/promql.go | 61.11% | 6 Missing and 1 partial ⚠️ |
| ...lector/v1beta2/flowcollector_validation_webhook.go | 69.23% | 3 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2112      +/-   ##
==========================================
- Coverage   72.48%   72.29%   -0.19%     
==========================================
  Files          88       88              
  Lines        9677     9747      +70     
==========================================
+ Hits         7014     7047      +33     
- Misses       2222     2250      +28     
- Partials      441      450       +9     
| Flag | Coverage Δ | |
| --- | --- | --- |
| unittests | 72.29% <62.50%> (-0.19%) | ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ | |
| --- | --- | --- |
| ...flowcollector/v1beta2/flowcollector_alert_types.go | 97.46% <100.00%> (ø) | |
| api/flowcollector/v1beta2/flowcollector_types.go | 100.00% <ø> (ø) | |
| ...lector/v1beta2/flowcollector_validation_webhook.go | 73.42% <69.23%> (-1.75%) | ⬇️ |
| internal/pkg/metrics/alerts/promql.go | 84.50% <61.11%> (-8.48%) | ⬇️ |
| internal/pkg/metrics/alerts/alerts.go | 91.93% <78.04%> (-3.42%) | ⬇️ |
| internal/pkg/metrics/alerts/builder.go | 86.00% <78.57%> (-3.88%) | ⬇️ |
| api/flowcollector/v1beta2/zz_generated.deepcopy.go | 37.41% <2.27%> (ø) | |

... and 2 files with indirect coverage changes

openshift-ci-robot (Collaborator) commented Dec 9, 2025

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

@openshift-ci-robot
Copy link
Collaborator

openshift-ci-robot commented Dec 9, 2025

@leandroberetta: This pull request references NETOBSERV-2365 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.



- Template AlertTemplate `json:"template,omitempty"`
+ Template HealthRuleTemplate `json:"template,omitempty"`

// Mode defines whether this health rule should be generated as an alert or a recording rule.
Member

Perhaps add more information about what the difference is, such as: "Recording rule violations are visible in the Network Health dashboard without generating any Prometheus alert."
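To illustrate the suggested wording, here is a minimal sketch of how the two modes could be documented next to their constants. The type name `HealthRuleMode`, the constant names, and the value `"alert"` are assumptions; only the value `recording` appears in the PR description.

```go
package main

import "fmt"

// HealthRuleMode is an assumed name for the type of the "mode" field.
type HealthRuleMode string

const (
	// ModeAlert generates a Prometheus alerting rule.
	// The value "alert" is an assumption, not taken from the PR.
	ModeAlert HealthRuleMode = "alert"
	// ModeRecording generates a Prometheus recording rule. Violations are visible
	// in the Network Health dashboard without generating any Prometheus alert.
	ModeRecording HealthRuleMode = "recording"
)

// firesAlert reports whether a rule in the given mode produces a Prometheus alert.
func firesAlert(m HealthRuleMode) bool {
	return m == ModeAlert
}

func main() {
	for _, m := range []HealthRuleMode{ModeAlert, ModeRecording} {
		fmt.Printf("mode=%s firesAlert=%t\n", m, firesAlert(m))
	}
}
```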

// They are expressed as a percentage of errors above which the alert is triggered. They must be parsable as floats.
// +required
Thresholds AlertThresholds `json:"thresholds,omitempty"`
// Required for alert mode, optional for recording mode.
Member

Maybe something I don't get here: how can it be optional for recording mode, if we still want them to appear in the health dashboard, associated with a severity?

Contributor Author

I originally thought of the recording rule as just a value, without the notion of a severity, but now that you mention it, we can display the value with severities. You're right. Thanks for the feedback.
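As a rough sketch of what "display the value with severities" could mean, the recorded value can be compared against the same thresholds used in alert mode. The field names `Info`, `Warning`, and `Critical`, the helper `severityFor`, and the strict "above" comparison are assumptions made for illustration; the only thing taken from the diff is that thresholds are percentages that must be parsable as floats.

```go
package main

import (
	"fmt"
	"strconv"
)

// thresholds mirrors the idea of the Thresholds field: percentages kept as strings.
// The field names are assumptions.
type thresholds struct {
	Info, Warning, Critical string
}

// severityFor returns the highest severity whose threshold is exceeded by value,
// or "" when no threshold is exceeded. Empty thresholds are skipped.
func severityFor(value float64, t thresholds) (string, error) {
	levels := []struct{ name, raw string }{
		{"critical", t.Critical},
		{"warning", t.Warning},
		{"info", t.Info},
	}
	for _, l := range levels {
		if l.raw == "" {
			continue
		}
		limit, err := strconv.ParseFloat(l.raw, 64)
		if err != nil {
			return "", fmt.Errorf("threshold %s=%q is not a float: %w", l.name, l.raw, err)
		}
		if value > limit {
			return l.name, nil
		}
	}
	return "", nil
}

func main() {
	sev, err := severityFor(12.5, thresholds{Info: "5", Warning: "10"})
	fmt.Println(sev, err)
}
```

With this approach the thresholds stay required in both modes, which matches the conclusion of the exchange above.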

// More information on health rules: https://github.com/netobserv/network-observability-operator/blob/main/docs/Alerts.md
// +optional
- DisableAlerts []AlertTemplate `json:"disableAlerts"`
+ DisableHealthRules []HealthRuleTemplate `json:"disableHealthRules"`
Member

I think for disabling, we should keep the existing API and continue to only affect alerts. Disabling is mostly to help users when they have too much alerting noise, and it's not really needed for recording rules.
Also, we should not rename this field: it already existed before the TP feature, so renaming it would be a breaking change.

Comment on lines 18 to 19
HealthRuleNoFlows HealthRuleTemplate = "NetObservNoFlows"
HealthRuleLokiError HealthRuleTemplate = "NetObservLokiError"
Member

"NoFlows" and "LokiError" are different from the others, I think we should keep referring to them as alerts only; unlike the others, they are not health items for the dashboard, they're only for alerting.
We should probably add a validation check that they are not used in "recording" mode.
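A minimal sketch of such a validation check, assuming the mode is available as a plain string. The two template constants come from the diff above; the `alertOnly` set and the `validateMode` helper are hypothetical names, not part of the operator.

```go
package main

import "fmt"

// HealthRuleTemplate and the two constants below are taken from the diff.
type HealthRuleTemplate string

const (
	HealthRuleNoFlows   HealthRuleTemplate = "NetObservNoFlows"
	HealthRuleLokiError HealthRuleTemplate = "NetObservLokiError"
)

// alertOnly lists templates that are meant for alerting only (hypothetical helper).
var alertOnly = map[HealthRuleTemplate]bool{
	HealthRuleNoFlows:   true,
	HealthRuleLokiError: true,
}

// validateMode rejects recording mode for alert-only templates.
func validateMode(tpl HealthRuleTemplate, mode string) error {
	if mode == "recording" && alertOnly[tpl] {
		return fmt.Errorf("template %q cannot be used in recording mode", tpl)
	}
	return nil
}

func main() {
	fmt.Println(validateMode(HealthRuleNoFlows, "recording"))
	fmt.Println(validateMode(HealthRuleTemplate("DNSErrors"), "recording"))
}
```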


func (rb *ruleBuilder) additionalDescription() string {
- return fmt.Sprintf("You can turn off this alert by adding '%s' to spec.processor.metrics.disableAlerts in FlowCollector, or reconfigure it via spec.processor.metrics.alerts.", rb.template)
+ return fmt.Sprintf("You can turn off this health rule by adding '%s' to spec.processor.metrics.disableHealthRules in FlowCollector, or reconfigure it via spec.processor.metrics.healthRules.", rb.template)
Member

(Just FYI: we will remove this added description, as it will be moved to the runbooks; I was about to say something about this message with regard to disabled alerts, but it no longer matters.)


// buildRecordingRuleName builds recording rule name following the convention:
// netobserv:health:<template>:<groupby>:<side>:rate2m
func (rb *ruleBuilder) buildRecordingRuleName() string {
Member

@jotak Dec 15, 2025

I think I would prefer the recording rule name to be passed to createRule directly from each alert (in alerts.go); maybe it's less smart, but it's more explicit and makes it easier to change a specific name if we ever need to.

Actually, this function can surely be kept; it's the acronyms and toSnakeCase that, IMO, could be replaced with an explicit string per alert.
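One way to read this suggestion: keep a builder for the `netobserv:health:<template>:<groupby>:<side>:rate2m` convention, but have each alert pass its own explicit template segment instead of deriving it with acronym handling and snake-casing. The sketch below is hypothetical: the function `recordingName`, the segment `dns_errors`, and the choice to skip empty segments are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// recordingName assembles a recording rule name following the convention
// netobserv:health:<template>:<groupby>:<side>:rate2m.
// The template segment is an explicit string chosen per alert; empty groupBy
// or side segments are skipped (an assumption about the convention).
func recordingName(template, groupBy, side string) string {
	parts := []string{"netobserv", "health", template}
	for _, p := range []string{groupBy, side} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	parts = append(parts, "rate2m")
	return strings.Join(parts, ":")
}

func main() {
	// "dns_errors" is a hypothetical explicit segment for the DNSErrors template.
	fmt.Println(recordingName("dns_errors", "namespace", ""))
	fmt.Println(recordingName("dns_errors", "namespace", "src"))
}
```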

recordName := rb.buildRecordingRuleName()
return &monitoringv1.Rule{
Record: recordName,
// Note: Recording rules cannot have annotations in Prometheus
Member

really? So we cannot pass any metadata?

Contributor Author

Only labels: https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/

That would be an option, although it is not as good as annotations for this kind of information.
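To make the labels-only option concrete, here is a small sketch using a local struct rather than the real `monitoringv1.Rule` type. The struct, the helper `newRecordingRule`, the label key `template`, and the metric in the expression are all hypothetical. Note that, unlike annotations, labels become part of the identity of the recorded series.

```go
package main

import "fmt"

// recordingRule is a local stand-in for a Prometheus recording rule,
// which carries a record name, an expression, and optional labels.
type recordingRule struct {
	Record string
	Expr   string
	Labels map[string]string
}

// newRecordingRule attaches metadata as labels, since recording rules
// have no annotations. The metadata map is copied to avoid aliasing.
func newRecordingRule(record, expr string, metadata map[string]string) recordingRule {
	labels := make(map[string]string, len(metadata))
	for k, v := range metadata {
		labels[k] = v
	}
	return recordingRule{Record: record, Expr: expr, Labels: labels}
}

func main() {
	r := newRecordingRule(
		"netobserv:health:dns_errors:namespace:rate2m",
		"sum(rate(example_dns_errors_total[2m]))", // hypothetical metric name
		map[string]string{"template": "DNSErrors"},
	)
	fmt.Printf("%+v\n", r)
}
```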

@leandroberetta leandroberetta added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Dec 17, 2025
@github-actions

New images:

  • quay.io/netobserv/network-observability-operator:5330495
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-5330495
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-5330495

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:5330495 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-5330495

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-5330495
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Dec 18, 2025

openshift-ci bot commented Dec 22, 2025

@leandroberetta: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-operator | dd36b0d | link | false | /test e2e-operator |

