Conversation

@ngopalak-redhat commented Nov 12, 2025

What I did

This PR enables system-reserved-compressible enforcement by default for all OpenShift 4.22+ clusters to allow better CPU allocation for system reserved processes through cgroup-based enforcement.

Template Changes:

  • Added systemReservedCgroup: /system.slice to default kubelet configuration for all node types (master, worker, arbiter)
  • Added system-reserved-compressible to enforceNodeAllocatable alongside pods in kubelet template files

Performance Profile Compatibility:
The kubelet cannot simultaneously enforce both systemReservedCgroup and --reserved-cpus (used by Performance Profiles in the Node Tuning Operator). To resolve this conflict, I added logic in the Kubelet Config Controller (pkg/controller/kubelet-config/helpers.go) to:

  • Detect when reservedSystemCPUs (--reserved-cpus) is set
  • Automatically clear systemReservedCgroup when reservedSystemCPUs is detected
  • Set enforceNodeAllocatable to ["pods"] only in this scenario
  • Preserve existing Performance Profile behavior without requiring any operator changes

This approach leverages the fact that --reserved-cpus already supersedes system-reserved, making systemReservedCgroup enforcement redundant in PerformanceProfile scenarios.
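
A minimal Go sketch of this precedence logic (the types here are simplified stand-ins, not the actual upstream KubeletConfiguration struct used in pkg/controller/kubelet-config/helpers.go):

    package main

    import "fmt"

    // kubeletConfig is a simplified stand-in for the fields of the upstream
    // KubeletConfiguration type that matter here.
    type kubeletConfig struct {
        ReservedSystemCPUs     string
        SystemReservedCgroup   string
        EnforceNodeAllocatable []string
    }

    // reconcileReservedCPUs clears systemReservedCgroup enforcement when
    // reserved-cpus is in use, since --reserved-cpus supersedes system-reserved.
    func reconcileReservedCPUs(cfg *kubeletConfig) {
        if cfg.ReservedSystemCPUs == "" {
            return // no Performance Profile: keep cgroup-based enforcement
        }
        cfg.SystemReservedCgroup = ""
        cfg.EnforceNodeAllocatable = []string{"pods"}
    }

    func main() {
        cfg := &kubeletConfig{
            ReservedSystemCPUs:     "0-3", // as set by a Performance Profile
            SystemReservedCgroup:   "/system.slice",
            EnforceNodeAllocatable: []string{"pods", "system-reserved-compressible"},
        }
        reconcileReservedCPUs(cfg)
        fmt.Printf("%+v\n", cfg) // cgroup cleared, only "pods" enforced
    }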

Validation:

  • Added validation to ensure systemReservedCgroup matches systemCgroups when both are user-specified

How to verify it

For New OCP 4.22+ Clusters:

  1. Deploy a new OCP 4.22+ cluster
  2. SSH into a node and verify kubelet configuration:
    grep -A2 systemReservedCgroup /etc/kubernetes/kubelet.conf
    grep -A3 enforceNodeAllocatable /etc/kubernetes/kubelet.conf
  3. Verify the output shows:
    systemReservedCgroup: /system.slice
    enforceNodeAllocatable:
    - pods
    - system-reserved-compressible

For Clusters with Performance Profiles:

  1. Create a Performance Profile with reservedSystemCPUs set (via Node Tuning Operator)
  2. Wait for the MachineConfig to be applied and nodes to reboot
  3. SSH into the affected node and check kubelet configuration:
    grep systemReservedCgroup /etc/kubernetes/kubelet.conf
    grep enforceNodeAllocatable /etc/kubernetes/kubelet.conf
  4. Verify that:
    - systemReservedCgroup is NOT present (empty/cleared)
    - enforceNodeAllocatable only contains ["pods"]
    - Kubelet starts successfully without errors
  5. Check kubelet logs to confirm no conflicts:
    journalctl -u kubelet | grep -iE "system-reserved|reserved-cpus"

Notes from testing

  • When setting an empty string for systemReservedCgroup, the following line in pkg/controller/kubelet-config/helpers.go:
    err = mergo.Merge(originalKubeConfig, specKubeletConfig, mergo.WithOverride)
    
    ignores it, because mergo.WithOverwriteWithEmptyValue is not used. Switching the merge to mergo.WithOverwriteWithEmptyValue could affect other keys in the kubeletconfig, so to reduce the blast radius a dedicated check was added (a minimal sketch of this merge behavior follows these notes):
     	if specKubeletConfig.SystemReservedCgroup == "" {
     		// explicitly honor the user's empty value (body elided in this excerpt)
     	}
    
  • Adding a new e2e test would significantly increase the overall test-suite duration, so an existing test case was enhanced instead. As discussed in https://redhat-internal.slack.com/archives/CK1AE4ZCK/p1765210654986779, I have only added a high-level kubeletconfig test. We are in the process of defining a new test suite for testing other capabilities.
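
A minimal, runnable sketch of the merge behavior described in the first note, using the mergo library directly (the struct is a stand-in for the real kubeletconfig type):

    package main

    import (
        "fmt"

        "github.com/imdario/mergo" // also published as dario.cat/mergo
    )

    type kubeletConfig struct {
        SystemReservedCgroup string
        CgroupDriver         string
    }

    func main() {
        original := kubeletConfig{SystemReservedCgroup: "/system.slice", CgroupDriver: "systemd"}
        spec := kubeletConfig{SystemReservedCgroup: ""} // user explicitly opts out

        // mergo.WithOverride skips zero values, so the empty string does NOT
        // overwrite the default inherited from the template.
        _ = mergo.Merge(&original, spec, mergo.WithOverride)
        fmt.Println(original.SystemReservedCgroup) // still "/system.slice"

        // Narrow workaround: honor the explicit opt-out without switching the
        // whole merge to mergo.WithOverwriteWithEmptyValue.
        if spec.SystemReservedCgroup == "" {
            original.SystemReservedCgroup = ""
        }
        fmt.Println(original.SystemReservedCgroup) // now cleared
    }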

Stress testing
Tests were conducted to validate how system.slice CPU weights behave, as opposed to hard limits, under various load conditions.

  1. Behavior without Contention

Observation: With system-reserved-compressible enabled (500m limit / weight 20), a process in system.slice consumed a full CPU core (1000m) when other slices were idle.

Conclusion: Validated that CPU weights are not hard limits. As per kernel documentation, slices can burst to use available CPU if there is no contention from other slices.

  2. Behavior with Contention (4-core Node)

Test: Simultaneous load applied to system.slice (3 processes) and kubepods.slice (4 processes).

Result: system.slice usage correctly adhered to the configured threshold (did not exceed 500m).

Conclusion: Confirmed that CPU weights correctly enforce proportional distribution when the CPU is under stress.

  3. Large Scale Behavior (192-core Node)

Test: Auto node sizing applied (2.35 cores reserved). Stressed with 200 processes on kubepods and 50 on system.

Result: Observed usage was ~3.27 cores (calculated weight ~92).

Conclusion: Performance is within an acceptable range of the target reservation.
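
For reference, the weight figures above follow from the kubelet's standard cgroup arithmetic: millicores are converted to cgroup v1 cpu.shares, and shares are then mapped onto cgroup v2 cpu.weight. A small Go sketch (the formulas mirror the kubelet's cgroup manager; treat the exact rounding as an assumption):

    package main

    import "fmt"

    // milliCPUToShares mirrors the kubelet's millicores -> cgroup v1 cpu.shares computation.
    func milliCPUToShares(milliCPU int64) int64 {
        return milliCPU * 1024 / 1000
    }

    // sharesToWeight mirrors the kubelet's mapping of cgroup v1 shares
    // [2, 262144] onto cgroup v2 cpu.weight [1, 10000].
    func sharesToWeight(shares int64) int64 {
        return 1 + ((shares-2)*9999)/262142
    }

    func main() {
        // 500m reserved -> weight 20, matching the 4-core test above.
        fmt.Println(sharesToWeight(milliCPUToShares(500))) // 20
        // 2.35 cores (2350m) reserved -> weight 92, matching the 192-core test.
        fmt.Println(sharesToWeight(milliCPUToShares(2350))) // 92
    }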

Documentation Update
The following note regarding the default behavior change should be added:

"By default in OpenShift 4.22 and later, system-reserved-compressible is enabled for all clusters that do not use the reserved CPU feature. This addresses previous issues where the system reserved CPU exceeded the desired limit. This default can be overridden by setting systemReservedCPU to "" in the kubelet configuration. Note: In rare cases where other slices are running CPU-intensive workloads, contention from slices other than system.slice and kubepods.slice may still impact overall CPU allocation."

Description for the changelog

Enable system-reserved-compressible enforcement by default in OCP 4.22+ clusters. The kubelet now enforces CPU limits on system daemons via systemReservedCgroup (/system.slice), improving CPU allocation for system reserved processes on nodes with high CPU counts. Automatically disables systemReservedCgroup enforcement when Performance Profiles with reserved-cpus are used to prevent conflicts.


Related:

Decision Update

  • As per the latest discussion, we plan to make this the default in OCP 4.22. Clusters upgraded from 4.20 will also have this enabled. The changes required to manage backward compatibility involve more than just a MachineConfig.

@ngopalak-redhat changed the title from "Implement system-reserved-compressible" to "WIP: Implement system-reserved-compressible" on Nov 12, 2025
openshift-ci bot added the do-not-merge/work-in-progress label on Nov 12, 2025
openshift-ci bot commented Nov 12, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ngopalak-redhat force-pushed the ngopalak/system-reserved-compressible-1 branch from ca28d80 to 00bb8e1 on November 17, 2025 03:53
@ngopalak-redhat changed the title from "WIP: Implement system-reserved-compressible" to "OCPNODE-3201: Default Enablement of system-reserved-compressible in OpenShift 4.21" on Nov 19, 2025
openshift-ci-robot commented Nov 19, 2025

@ngopalak-redhat: This pull request references OCPNODE-3201 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.


In response to this:

TODO: Before Review

  • Complete upgrade testing

What I did

This PR enables system-reserved-compressible enforcement by default for all new OpenShift 4.21+ clusters to allow better CPU allocation for system reserved processes through cgroup-based enforcement.

Template Changes:

  • Added systemReservedCgroup: /system.slice to default kubelet configuration for all node types (master, worker, arbiter)
  • Added system-reserved-compressible to enforceNodeAllocatable alongside pods in kubelet template files

Performance Profile Compatibility:
The kubelet cannot simultaneously enforce both systemReservedCgroup and --reserved-cpus (used by Performance Profiles in the Node Tuning Operator). To resolve this conflict, I added logic in the Kubelet Config Controller (pkg/controller/kubelet-config/helpers.go) to:

  • Detect when reservedSystemCPUs (--reserved-cpus) is set
  • Automatically clear systemReservedCgroup when reservedSystemCPUs is detected
  • Set enforceNodeAllocatable to ["pods"] only in this scenario
  • Preserve existing Performance Profile behavior without requiring any operator changes

This approach leverages the fact that --reserved-cpus already supersedes system-reserved, making systemReservedCgroup enforcement redundant in PerformanceProfile scenarios.

Validation:

  • Added validation to ensure systemReservedCgroup matches systemCgroups when both are user-specified

How to verify it

For New OCP 4.21+ Clusters:

  1. Deploy a new OCP 4.21+ cluster
  2. SSH into a node and verify kubelet configuration:
    grep -A2 systemReservedCgroup /etc/kubernetes/kubelet.conf
    grep -A3 enforceNodeAllocatable /etc/kubernetes/kubelet.conf
  3. Verify the output shows:
    systemReservedCgroup: /system.slice
    enforceNodeAllocatable:
    - pods
    - system-reserved-compressible

For Clusters with Performance Profiles:

  1. Create a Performance Profile with reservedSystemCPUs set (via Node Tuning Operator)
  2. Wait for the MachineConfig to be applied and nodes to reboot
  3. SSH into the affected node and check kubelet configuration:
    cat /etc/kubernetes/kubelet.conf | grep systemReservedCgroup
    cat /etc/kubernetes/kubelet.conf | grep enforceNodeAllocatable
  4. Verify that:
  • systemReservedCgroup is NOT present (empty/cleared)
  • enforceNodeAllocatable only contains ["pods"]
  • Kubelet starts successfully without errors
  5. Check kubelet logs to confirm no conflicts:
    journalctl -u kubelet | grep -iE "system-reserved|reserved-cpus"

For OCP 4.20 to 4.21 Upgrades:

  1. Verify that the migration MachineConfig from PR "WIP: [release-4.20] kubelet-config compressible patch" (#5412) is present and preserves the old behavior
  2. Confirm no unexpected node reboots occur during upgrade

Description for the changelog

Enable system-reserved-compressible enforcement by default in new OCP 4.21+ clusters. The kubelet now enforces CPU limits on system daemons via systemReservedCgroup (/system.slice), improving CPU allocation for system reserved processes on nodes with high CPU counts. Automatically disables systemReservedCgroup enforcement when Performance Profiles with reserved-cpus are used to prevent conflicts. Existing OCP 4.20 clusters upgrading to 4.21+ will preserve their current behavior via migration MachineConfig.


Related:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot added the jira/valid-reference label on Nov 19, 2025
openshift-ci-robot commented Nov 20, 2025

@ngopalak-redhat: This pull request references OCPNODE-3201 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.


@ngopalak-redhat marked this pull request as ready for review on November 20, 2025 00:48
openshift-ci bot removed the do-not-merge/work-in-progress label on Nov 20, 2025
@ngopalak-redhat

cc: @MarSik @ffromani

}
// Validate that systemReservedCgroup matches systemCgroups if both are set
if kcDecoded.SystemReservedCgroup != "" && kcDecoded.SystemCgroups != "" {
if kcDecoded.SystemReservedCgroup != kcDecoded.SystemCgroups {
Member:

Why should both the values of SystemReservedCgroup and SystemCgroups match?
I don't find such a condition in the kubelet configuration docs.

Author:

As per https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/

It is recommended that the OS system daemons are placed under a top level control group (system.slice on systemd machines for example).

If they are not the same, enforcement would happen on a different cgroup, while the values would be calculated using SystemCgroups.

Member:

Apologies, I'm still unclear on this.

Author:

I did some more digging into this. If they were different, the kubelet would move system processes into one cgroup (via SystemCgroups) but enforce the resource reservation on an empty or different cgroup (via SystemReservedCgroup), making the weights useless for those processes.

@ngopalak-redhat

@haircommander Please review

@ngopalak-redhat marked this pull request as draft on November 20, 2025 15:11
openshift-ci bot added the do-not-merge/work-in-progress label on Nov 20, 2025
@ngopalak-redhat changed the title from "OCPNODE-3201: Default Enablement of system-reserved-compressible in OpenShift 4.21" to "OCPNODE-3201: Default Enablement of system-reserved-compressible in OpenShift" on Nov 25, 2025
openshift-ci-robot commented Nov 25, 2025

@ngopalak-redhat: This pull request references OCPNODE-3201 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

@ngopalak-redhat

/hold until OCP 4.22

openshift-ci bot added the do-not-merge/hold label on Nov 25, 2025
@ngopalak-redhat marked this pull request as ready for review on November 25, 2025 00:40
openshift-ci bot removed the do-not-merge/work-in-progress label on Nov 25, 2025
@ngopalak-redhat

cc: @harche

@ngopalak-redhat

Keeping in draft state to add an e2e test

openshift-merge-robot removed the needs-rebase label on Dec 17, 2025
@ngopalak-redhat

/payload-job periodic-ci-openshift-machine-config-operator-release-4.21-periodics-e2e-aws-mco-disruptive-techpreview-1of2

openshift-ci bot commented Dec 17, 2025

@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-4.21-periodics-e2e-aws-mco-disruptive-techpreview-1of2

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5f3903c0-db3f-11f0-9304-6254ced58bff-0

@ngopalak-redhat

/payload-job periodic-ci-openshift-machine-config-operator-release-4.21-periodics-e2e-aws-mco-disruptive-techpreview-1of2

openshift-ci bot commented Dec 18, 2025

@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-4.21-periodics-e2e-aws-mco-disruptive-techpreview-1of2

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/de46c180-dbe8-11f0-9b16-f4843203486f-0

@ngopalak-redhat

/payload-job periodic-ci-openshift-machine-config-operator-release-4.21-periodics-e2e-aws-mco-disruptive-techpreview-1of2 openshift/origin#30644

openshift-ci bot commented Dec 29, 2025

@ngopalak-redhat: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

@ngopalak-redhat

/payload-job periodic-ci-openshift-machine-config-operator-release-4.21-periodics-e2e-aws-mco-disruptive-techpreview-1of2,openshift/origin#30644

openshift-ci bot commented Dec 29, 2025

@ngopalak-redhat: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

@ngopalak-redhat

/payload-job-with-prs periodic-ci-openshift-machine-config-operator-release-4.21-periodics-e2e-aws-mco-disruptive-techpreview-1of2 openshift/origin#30644

openshift-ci bot commented Dec 29, 2025

@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-4.21-periodics-e2e-aws-mco-disruptive-techpreview-1of2

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c86ee1f0-e497-11f0-8a0b-ff5ebaa0ea08-0

@ngopalak-redhat

/payload-job-with-prs periodic-ci-openshift-machine-config-operator-release-4.21-periodics-e2e-aws-mco-disruptive-techpreview-1of2 openshift/origin#30644

openshift-ci bot commented Jan 2, 2026

@ngopalak-redhat: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

@ngopalak-redhat

/payload-job-with-prs periodic-ci-openshift-machine-config-operator-release-4.21-periodics-e2e-aws-mco-disruptive-techpreview-1of2 openshift/origin#30644

openshift-ci bot commented Jan 2, 2026

@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-4.21-periodics-e2e-aws-mco-disruptive-techpreview-1of2

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/121ace70-e797-11f0-9238-0cc470394ced-0

@ngopalak-redhat

/payload-job periodic-ci-openshift-machine-config-operator-release-4.21-periodics-e2e-aws-mco-disruptive-techpreview-1of2

openshift-ci bot commented Jan 2, 2026

@ngopalak-redhat: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-4.21-periodics-e2e-aws-mco-disruptive-techpreview-1of2

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5fc5bd50-e7ca-11f0-9c72-75e7f7ff3b1b-0

@ngopalak-redhat

/test all

@ngopalak-redhat force-pushed the ngopalak/system-reserved-compressible-1 branch from ab9b00b to 308744e on January 2, 2026 11:05
@ngopalak-redhat

/test all

@ngopalak-redhat force-pushed the ngopalak/system-reserved-compressible-1 branch from 308744e to 6abd4be on January 5, 2026 01:56
@ngopalak-redhat

/test all

openshift-ci-robot commented Jan 5, 2026

@ngopalak-redhat: This pull request references OCPNODE-3201 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

@ngopalak-redhat marked this pull request as ready for review on January 5, 2026 06:17
openshift-ci bot removed the do-not-merge/work-in-progress label on Jan 5, 2026
openshift-ci bot requested a review from umohnani8 on January 5, 2026 06:17
openshift-ci bot commented Jan 5, 2026

@ngopalak-redhat: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                Commit   Details  Required  Rerun command
ci/prow/okd-scos-images  6abd4be  link     true      /test okd-scos-images
ci/prow/bootstrap-unit   6abd4be  link     false     /test bootstrap-unit


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
