efficient polling in waitForStepsToFinish
#8901
303cbfe to a1809ff
@afrittoli, @vdemeester, @AlanGreene, please help review this PR. Thank you! |
The following is the coverage report on the affected files.
|
a1809ff to b6b9bca
b6b9bca to c8420e5
c8420e5 to 2aa3e22
The current waitForStepsToFinish implementation is a classic busy-wait. It checks for file existence without any sleep, resulting in high CPU usage. A profile taken with a unit test shows that almost all time is spent in system calls, with a high total sample count. This led to excessive CPU usage by the sidecar even when just waiting. The function now sleeps 100ms between checks, drastically reducing the polling frequency. The sidecar now uses minimal CPU while waiting.
Signed-off-by: Priti Desai <pdesai@us.ibm.com>
2aa3e22 to e1bda58
/test check-pr-has-kind-label |
@afrittoli: No presubmit jobs available for tektoncd/pipeline@main
if intervalStr == "" {
	intervalStr = "100ms"
}
interval, err := time.ParseDuration(intervalStr)
NIT: Isn't the case of an empty string covered here as well?
NIT: Perhaps make a small function to use here and in LookForArtifacts?
> NIT: Isn't the case of an empty string covered here as well?
If we pass an empty string to `time.ParseDuration`, it is considered an invalid duration format, and the function will return an error.
> NIT: Perhaps make a small function to use here and in LookForArtifacts?
Sure, created a PR: #8909
Thanks, @pritidesai - I think this makes a lot of sense. I wonder if we should backport it to LTS; polling without a sleep at all is practically a bug. @vdemeester WDYT?
/approve
@afrittoli I think I agree on the backporting to LTS indeed. |
/meow
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: afrittoli, vdemeester
/lgtm |
@pritidesai, could you take care of the backporting? |
Sure, I will backport this to 0.65, 0.68, 1.0, and 1.2. Am I missing any other releases? |
/cherrypick release-v0.65.x |
@pritidesai: #8901 failed to apply on top of branch "release-v0.65.x":
/cherrypick release-v0.68.x |
@pritidesai: #8901 failed to apply on top of branch "release-v0.68.x":
/cherrypick release-v1.0.x |
@pritidesai: new pull request created: #8910
Changes
We discovered that `sidecar-tekton-log-results` is consuming significantly more CPU than the task steps. For example:
Analysis:
Profiling of the `sidecarlogresults` component revealed excessive CPU usage. The current `waitForStepsToFinish` implementation uses a classic busy-wait strategy: it continuously checks for file existence without any sleep interval, resulting in high CPU consumption.
Profiling with a unit test showed that nearly all CPU time was spent in system calls, with a high total sample count. This led to excessive CPU usage by the sidecar, even when it was simply waiting.
To address this, the function now sleeps for `100ms` between checks, significantly reducing the polling frequency. As a result, the sidecar now consumes minimal CPU while waiting.
Current profile:
Profile after adding some sleep:
Total samples dropped from 890ms to 70ms.
CPU consumption has also dropped significantly:
To run the profiling test locally:
Update:
The new key `default-sidecar-log-polling-interval` has been introduced to provide configurable control over how frequently the Tekton sidecar log results container polls for step completion files.
/kind bug
Submitter Checklist
As the author of this PR, please check off the items in this checklist:
`/kind <type>`. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
Release Notes