Commit aabe03a

Merge pull request #1680 from qJkee/OCPEDGE-1102
OCPEDGE-1102: Revert high cpu usage alert description
2 parents c3adc9e + 88976d9 · commit aabe03a


bindata/assets/alerts/cpu-utilization.yaml

Lines changed: 5 additions & 8 deletions
@@ -10,18 +10,15 @@ spec:
     - alert: HighOverallControlPlaneCPU
       annotations:
         summary: >-
-          CPU utilization across all control plane nodes is more than 60% of the total available CPU. Control plane node outage may cause a cascading failure; increase available CPU.
+          CPU utilization across all three control plane nodes is higher than two control plane nodes can sustain; a single control plane node outage may
+          cause a cascading failure; increase available CPU.
         runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-apiserver-operator/ExtremelyHighIndividualControlPlaneCPU.md
         description: >-
-          On a multi-node cluster with three control plane nodes, the overall CPU utilization may only be about 2/3 of all available capacity.
+          Given three control plane nodes, the overall CPU utilization may only be about 2/3 of all available capacity.
           This is because if a single control plane node fails, the remaining two must handle the load of the cluster in order to be HA.
-          If the cluster is using more than 2/3 of all capacity, if one control plane node fails, the remaining two are likely to fail when they take the load.
+          If the cluster is using more than 2/3 of all capacity, if one control plane node fails, the remaining two are likely to
+          fail when they take the load.
           To fix this, increase the CPU and memory on your control plane nodes.
-
-          On a single node OpenShift (SNO) cluster, this alert will also fire if the 2/3 of the CPU cores of the node are in use by any workload. This level of CPU utlization
-          of an SNO cluster is probably not a problem under most circumstances, but high levels of utilization may result in degraded performance.
-          To manage this alert or silence it in case of false positives see the following link:
-          https://docs.openshift.com/container-platform/latest/monitoring/managing-alerts.html
       expr: |
         sum(
           100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)
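The hunk ends mid-expression. For context, the reverted description's "2/3 of all capacity" ceiling is simple arithmetic: three control plane nodes of capacity C give 3C total, and the two survivors of a node outage can carry at most 2C, so sustained usage must stay below 2C/3C = 2/3. Below is a minimal sketch of how a rule of this shape could complete; only the first line of the expression appears in this diff, so the role-label join, the 60% threshold, and the for/labels fields are assumptions for illustration, not taken from the commit.

    - alert: HighOverallControlPlaneCPU
      # annotations as in the diff above
      expr: |
        sum(
          # per-node CPU utilization (%) over the last minute
          100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)
          # assumption: restrict to control plane nodes by joining on the
          # node-role metric, relabelled so instance labels match
          and on (instance) label_replace(kube_node_role{role="master"}, "instance", "$1", "node", "(.+)")
        )
        # assumption: average across control plane nodes, then compare to a
        # fixed threshold (60%, roughly the 2/3 ceiling from the description)
        / count(kube_node_role{role="master"})
        > 60
      for: 10m              # assumption: require sustained utilization before firing
      labels:
        severity: warning   # assumption

In this sketch, the `and on (instance)` join keeps only the per-node utilization series that belong to control plane nodes, and dividing the sum by the node count turns it into a cluster-wide average the threshold comparison can fire on.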
