Commit aabe03a

Merge pull request #1680 from qJkee/OCPEDGE-1102
OCPEDGE-1102: Revert high cpu usage alert description
2 parents c3adc9e + 88976d9 · commit aabe03a


bindata/assets/alerts/cpu-utilization.yaml

Lines changed: 5 additions & 8 deletions
@@ -10,18 +10,15 @@ spec:
     - alert: HighOverallControlPlaneCPU
       annotations:
         summary: >-
-          CPU utilization across all control plane nodes is more than 60% of the total available CPU. Control plane node outage may cause a cascading failure; increase available CPU.
+          CPU utilization across all three control plane nodes is higher than two control plane nodes can sustain; a single control plane node outage may
+          cause a cascading failure; increase available CPU.
         runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-apiserver-operator/ExtremelyHighIndividualControlPlaneCPU.md
         description: >-
-          On a multi-node cluster with three control plane nodes, the overall CPU utilization may only be about 2/3 of all available capacity.
+          Given three control plane nodes, the overall CPU utilization may only be about 2/3 of all available capacity.
           This is because if a single control plane node fails, the remaining two must handle the load of the cluster in order to be HA.
-          If the cluster is using more than 2/3 of all capacity, if one control plane node fails, the remaining two are likely to fail when they take the load.
+          If the cluster is using more than 2/3 of all capacity, if one control plane node fails, the remaining two are likely to
+          fail when they take the load.
           To fix this, increase the CPU and memory on your control plane nodes.
-
-          On a single node OpenShift (SNO) cluster, this alert will also fire if the 2/3 of the CPU cores of the node are in use by any workload. This level of CPU utlization
-          of an SNO cluster is probably not a problem under most circumstances, but high levels of utilization may result in degraded performance.
-          To manage this alert or silence it in case of false positives see the following link:
-          https://docs.openshift.com/container-platform/latest/monitoring/managing-alerts.html
       expr: |
         sum(
           100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)
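The hunk ends mid-expression. For context, the reverted description's "2/3 of all capacity" ceiling is simple arithmetic: three control plane nodes of capacity C give 3C total, and the two survivors of a node outage can carry at most 2C, so sustained usage must stay below 2C/3C = 2/3. Below is a minimal sketch of how a rule of this shape could complete; only the first line of the expression appears in this diff, so the role-label join, the 60% threshold, and the for/labels fields are assumptions for illustration, not taken from the commit.

    - alert: HighOverallControlPlaneCPU
      # annotations as in the diff above
      expr: |
        sum(
          # per-node CPU utilization (%) over the last minute
          100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)
          # assumption: restrict to control plane nodes by joining on the
          # node-role metric, relabelled so instance labels match
          and on (instance) label_replace(kube_node_role{role="master"}, "instance", "$1", "node", "(.+)")
        )
        # assumption: average across control plane nodes, then compare to a
        # fixed threshold (60%, roughly the 2/3 ceiling from the description)
        / count(kube_node_role{role="master"})
        > 60
      for: 10m              # assumption: require sustained utilization before firing
      labels:
        severity: warning   # assumption

In this sketch, the `and on (instance)` join keeps only the per-node utilization series that belong to control plane nodes, and dividing the sum by the node count turns it into a cluster-wide average the threshold comparison can fire on.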
