feat(worker): Centralize retry logic in BaseCodecovTask and fix infinite retry bugs #608
base: main
Conversation
…essor
- Added logic to prevent infinite retries when the maximum retry limit is exceeded.
- Introduced error logging to capture details when max retries are reached.
- Enhanced unit tests to verify behavior for both exceeding and not exceeding max retries.
Codecov Report ❌ Patch coverage is

Additional details and impacted files:

```diff
@@            Coverage Diff            @@
##             main     #608     +/-  ##
=========================================
  Coverage   93.89%   93.90%
=========================================
  Files        1286     1285       -1
  Lines       46807    46685     -122
  Branches     1517     1512       -5
=========================================
- Hits        43951    43841     -110
+ Misses       2547     2535      -12
  Partials      309      309
```
- Updated the LockManager's `locked` method to eliminate the `max_retries` parameter, simplifying retry handling.
- Adjusted related unit tests to reflect the removal of `max_retries`, ensuring proper logging and behavior for retry scenarios.
- Enhanced error handling to ensure retries are managed through the task's `retry` method instead.
- Introduced a sentinel object to differentiate between "not provided" and "explicitly None" for `max_retries`.
- Updated the `retry` method to accommodate unlimited retries when `max_retries` is set to None, aligning with Celery's behavior.
- Simplified logic for determining the effective `max_retries` value during task retries.
```python
current_retries = self.request.retries if hasattr(self, "request") else 0
task_max_retries = max_retries if max_retries is not None else self.max_retries

request = getattr(self, "request", None)
```
can this just return out if self.request is None? The next line is going to freak out trying to call retries on None
Good catch! Fixed by adding an early check for request is None and handling it gracefully. The code now safely handles the case where request might not exist, using default values instead of potentially accessing attributes on None.
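The early-return fix described above could be sketched roughly like this (`resolve_request` is a hypothetical helper name, not the PR's actual code):

```python
def resolve_request(task):
    # Bail out before touching attributes on a missing request object
    request = getattr(task, "request", None)
    if request is None:
        return 0, {}  # defaults: no prior retries, no task kwargs
    return getattr(request, "retries", 0), getattr(request, "kwargs", None) or {}
```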
apps/worker/tasks/base.py
Outdated
```python
# Resolve max_retries for checking/logging
# If max_retries is not provided, use self.max_retries from task class
# If max_retries=None is explicitly provided, means unlimited retries (matches Celery behavior)
# self.max_retries is inherited from BaseCodecovTask (TASK_MAX_RETRIES_DEFAULT) or overridden by task
```
can we eliminate some of the comments in this file, it's getting hard to read the code, either aggregate the important chunks at the top of the function definition or just remove
Good point! Reduced excessive inline comments and consolidated the important information. The code is now much more readable while still maintaining clarity on the key logic.
apps/worker/tasks/base.py
Outdated
```python
log = logging.getLogger("worker")

# Sentinel object to distinguish "not provided" from "explicitly None" for max_retries
_NOT_PROVIDED = object()
```
this is a little bit weird to make a comparison to `object()`. I'd prefer if this were a singleton class or some other primitive
Good suggestion! Replaced object() with a singleton class _NotProvided which is clearer and more maintainable. This makes the sentinel pattern more explicit and easier to understand.
apps/worker/tasks/base.py
Outdated
```python
# If max_retries is not provided, use self.max_retries from task class
# If max_retries=None is explicitly provided, means unlimited retries (matches Celery behavior)
# self.max_retries is inherited from BaseCodecovTask (TASK_MAX_RETRIES_DEFAULT) or overridden by task
if max_retries is _NOT_PROVIDED:
```
I feel like the whole point of this is to override a subclasses max_retries, seems like we should just rely on the subclass having the correct max set?
Good question! The ability to override max_retries is actually needed in practice. For example, upload_processor.py uses error-specific retry limits (self.retry(max_retries=error.max_retries, ...)) where different errors may have different retry limits than the task default. This provides flexibility for dynamic retry behavior based on error conditions while still defaulting to the task's max_retries when not specified.
thanks AI, can we rename max_retries then to something like max_retries_override
apps/worker/tasks/base.py
Outdated
```python
request_kwargs = {}
if request and hasattr(request, "kwargs"):
    request_kwargs = request.kwargs if request.kwargs is not None else {}
```
does this need to not just be request.kwargs? seems like we don't need to copy the data
Good catch! Simplified kwargs handling - we now use request.kwargs directly when available instead of creating unnecessary copies. The code is cleaner and more efficient.
```python
if request and hasattr(request, "kwargs"):
    request_kwargs = request.kwargs if request.kwargs is not None else {}
# Note: kwargs parameter is Celery's special parameter for task kwargs on retry
retry_kwargs = kwargs if kwargs is not None else {}
```
can we just default kwargs in the function def to {}
Thanks for the suggestion! However, using a mutable default (kwargs={}) is a Python anti-pattern. I kept kwargs=None as the default and handle it in the body, which is safer and follows Python best practices. The code now properly handles both None and empty dict cases.
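The anti-pattern the reply refers to is easy to demonstrate (an illustrative snippet, not code from the PR):

```python
def append_bad(item, bucket=[]):
    # The default list is created ONCE, at function definition time,
    # so every call without `bucket` mutates the same shared list
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # The None sentinel gives each call a fresh list
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```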
apps/worker/tasks/base.py
Outdated
```python
# Pass max_retries to Celery:
# - If not provided, use self.max_retries (inherited from BaseCodecovTask or overridden)
# - If explicitly provided (including None), pass as-is (None = unlimited in Celery)
if max_retries is _NOT_PROVIDED:
    # Not provided, use task's max_retries (inherited from BaseCodecovTask or overridden)
    celery_max_retries = self.max_retries
else:
    # Explicitly provided, pass as-is (None means unlimited in Celery, matches superclass behavior)
```
same, this is hard to read with so many comments
Agreed! Removed excessive comments and simplified the code. The important information is now in the docstring, and the implementation is cleaner and easier to read.
```python
f"Task {self.name} exceeded max retries",
extra={
    "task_name": self.name,
    "context": context if context else None,
```
does this need to be its own var instead of just explicitly putting the variables that we pull from kwargs
To clarify, this is the UploadContext. It's being added to the kwargs with this PR so the base class retry logic has access to it for logging. These logs match the structure of the other logs.
apps/worker/tasks/base.py
Outdated
```python
if UploadFlow.has_begun():
    UploadFlow.log(UploadFlow.UNCAUGHT_RETRY_EXCEPTION)
if TestResultsFlow.has_begun():
    TestResultsFlow.log(TestResultsFlow.UNCAUGHT_RETRY_EXCEPTION)
```
this is too specific for a base class to have
Excellent point! Removed UploadFlow and TestResultsFlow logging from the base class as they are too specific. These flows can handle their own logging in their specific task implementations if needed. The base class should remain generic.
- Changed the logging of context fields to use a nested dictionary for improved clarity.
- Updated error and warning logs to include the new context structure, enhancing the detail of logged information during task retries and max-retries-exceeded scenarios.
```python
def test_bundle_analysis_process_upload_rate_limit_error(
    caplog,
    dbsession,
    mocker,
    mock_storage,
):
    """Test that rate_limit_error (retryable) causes retry with correct countdown"""
    storage_path = (
        "v1/repos/testing/ed1bdd67-8fd2-4cdb-ac9e-39b99e4a3892/bundle_report.sqlite"
    )
```
Good catch! This bug was already fixed in the implementation. The code initializes result: ProcessingResult | None = None before the try block (line 161 in apps/worker/tasks/bundle_analysis_processor.py) and checks if result is not None: in the finally block (line 218) before accessing result.bundle_report. This ensures that if an exception occurs before result is assigned, the finally block safely skips cleanup without raising a NameError.
```python
    """Sentinel class to distinguish 'not provided' from 'explicitly None' for max_retries"""


_NOT_PROVIDED = _NotProvided()
```
@thomasrockhu-codecov I don't understand why a named class is not the same as a unnamed class. This is different from how I do singletons in JavaScript. Also, while the comment says "singleton", is that really what we are doing here? The idea is to check for "set vs not-set", because Celery treats None differently from "not set".
hmmm... I guess this is kind of weird too huh? can we make _NOT_PROVIDED = -1? I am assuming that retries should never be negative
unrelated, here's a reasonable writeup on how to make singletons in Python https://www.geeksforgeeks.org/python/singleton-pattern-in-python-a-complete-guide/
drazisil-codecov left a comment
Overall Assessment
This is a solid PR that centralizes retry logic and fixes important infinite retry bugs. The approach is clean and well-tested. Here are some suggestions for improvement:
Strengths
✅ Centralizes retry logic in one place
✅ Fixes infinite retry bugs
✅ Good test coverage (95.35%)
✅ Structured logging with context
✅ Removes redundant helper methods
Suggestions for Improvement
1. Missing Test Coverage (2 lines in base.py)
The coverage report indicates 2 lines are missing coverage. Based on the code, these are likely:
- Lines 254-258: the `if request is None:` edge case branch
- Line 262: the `hasattr(request, "kwargs")` check when `request.kwargs` is falsy
Recommendation: Add tests for these edge cases to reach 100% coverage.
2. Redundant Check in `request.kwargs` Handling

```python
request_kwargs = (
    request.kwargs if hasattr(request, "kwargs") and request.kwargs else {}
)
```

The `and request.kwargs` check is redundant since `hasattr` only checks existence. If `request.kwargs` exists but is None, this would still use `{}`, which is correct, but the check could be simplified.

Recommendation: Consider simplifying to:

```python
request_kwargs = getattr(request, "kwargs", None) or {}
```

3. Inconsistent Retry Check in Bundle Analysis
In `bundle_analysis_processor.py` line 182, there's still a manual `self.request.retries < self.max_retries` check before calling `self.retry()`. This is now redundant since `self.retry()` handles `max_retries` internally.

Recommendation: Remove the manual check to be consistent with the centralized approach:

```python
if result.error and result.error.is_retryable:
    log.warn(...)
    self.retry(countdown=30 * (2**self.request.retries))
```

4. Code Duplication in retry() Method
The `super().retry()` calls at lines 316-330 are nearly identical, differing only by the `kwargs` parameter.

Recommendation: Simplify to:

```python
retry_kwargs_dict = {"countdown": countdown, "exc": exc, "max_retries": celery_max_retries}
if kwargs:
    retry_kwargs_dict["kwargs"] = kwargs
retry_kwargs_dict.update(other_kwargs)
return super().retry(**retry_kwargs_dict)
```

5. Documentation Enhancement
The docstring for `retry()` could be clearer about the sentinel pattern usage.

Recommendation: Add a note explaining why `_NOT_PROVIDED` is used:

```python
"""
Override Celery's retry() to automatically check max_retries and track metrics.

The max_retries parameter uses a sentinel pattern (_NOT_PROVIDED) to distinguish:
- Not provided: Uses self.max_retries from the task class
- Explicitly None: Unlimited retries (matches Celery behavior)
- Explicit integer: Override the task's max_retries for this retry
"""
```

6. Potential Edge Case: Empty String Values
The context extraction checks `is not None`, but doesn't handle empty strings. If `commitid=""`, it would still be included in `context`.

Recommendation: Consider filtering out empty strings:

```python
if all_kwargs.get("commitid") not in (None, ""):
    context["commitid"] = all_kwargs.get("commitid")
```

7. Lock Manager Documentation
The docstring for `LockManager.locked()` mentions that tasks should handle `max_retries` via `self.retry()`, but it might be helpful to add an example.

Recommendation: Add a brief example in the docstring showing the pattern:

```python
"""
Raises:
    LockRetry: If lock cannot be acquired, with countdown for retry scheduling.
        The calling task should handle max_retries checking via self.retry().

Example:
    try:
        with lock_manager.locked(LockType.UPLOAD, retry_num=self.request.retries):
            # do work
    except LockRetry as e:
        self.retry(countdown=e.countdown)
"""
```

Minor Issues
- Line 184 in `bundle_analysis_processor.py`: `log.warn` should be `log.warning` (though `warn` is deprecated but still works)
Questions
- Should the `request is None` case ever happen in production? If not, should it raise an exception instead of silently using defaults?
- Is there a reason to keep the manual `max_retries` check in `bundle_analysis_processor.py`, or should it be removed for consistency?
Overall, this is a well-executed refactoring that improves code maintainability and fixes critical bugs. The suggestions above are mostly minor improvements and edge case handling.
…ve logging in BundleAnalysisProcessorTask
thomasrockhu-codecov left a comment
Please address the comments, but this is fine
apps/worker/services/lock_manager.py
Outdated
```python
log.info(
    "Releasing lock",
    extra={
        "repoid": self.repoid,
```
nit: alphabetize
apps/worker/tasks/base.py
Outdated
```python
"""
request = getattr(self, "request", None)
if request is None:
    # If no request, can't retry - this shouldn't happen in normal operation
```
do we actually want this behavior? I think if there's no request we should throw an error and return
apps/worker/tasks/base.py
Outdated
```python
else:
    current_retries = request.retries if hasattr(request, "retries") else 0
    request_kwargs = (
        request.kwargs if hasattr(request, "kwargs") and request.kwargs else {}
```
same is true here, do we actually want to continue if kwargs isn't in the request? We can probably group this with above
apps/worker/tasks/base.py
Outdated
```python
if max_retries is _NOT_PROVIDED:
    task_max_retries = self.max_retries
else:
    task_max_retries = max_retries
```
maybe just `task_max_retries = self.max_retries if max_retries is _NOT_PROVIDED else max_retries` or similar
apps/worker/tasks/base.py
Outdated
```python
context = {}
if all_kwargs.get("commitid") is not None:
    context["commitid"] = all_kwargs.get("commitid")
if all_kwargs.get("repoid") is not None:
    context["repoid"] = all_kwargs.get("repoid")
if all_kwargs.get("report_type") is not None:
    context["report_type"] = all_kwargs.get("report_type")
```
This can be cleaner, since we only care about those 3 attributes, just loop through them
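The loop the reviewer suggests could look something like this (a sketch only; `build_context` is a hypothetical name, and the keys match the snippet above):

```python
CONTEXT_KEYS = ("commitid", "repoid", "report_type")


def build_context(all_kwargs):
    # One comprehension replaces the three near-identical if-blocks
    return {
        key: all_kwargs[key]
        for key in CONTEXT_KEYS
        if all_kwargs.get(key) is not None
    }
```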
apps/worker/tasks/base.py
Outdated
```python
    },
)

celery_max_retries = (
```
does this need to be different than task_max_retries
- Enhanced error handling for MaxRetriesExceededError in BundleAnalysisNotifyTask, TestResultsFinisherTask, and UploadFinisherTask.
- Updated retry logic to ensure consistent behavior when max retries are exceeded.
- Refactored import statements for clarity and organization.
- Added a utility function to retrieve request headers from the Celery request object.
apps/worker/tasks/base.py
Outdated
```diff
 TASK_TIME_IN_QUEUE = Histogram(
     "worker_tasks_timers_time_in_queue_seconds",
-    "Time in seconds spent waiting in the queue before being run",
+    "Time in {TODO} spent waiting in the queue before being run",
```
TODO placeholder accidentally committed in metric description
The metric description contains a {TODO} placeholder that was accidentally committed. The original text "Time in seconds spent waiting in the queue before being run" was modified to "Time in {TODO} spent waiting in the queue before being run". This will appear in production monitoring dashboards and Prometheus metric descriptions, making the metric confusing for operators.
```python
if max_retries_override != _NOT_PROVIDED:
    task_max_retries = max_retries_override
elif "max_retries" in other_kwargs:
    task_max_retries = other_kwargs.pop("max_retries")
```
Double-pop causes wrong max_retries passed to Celery
When max_retries is passed via **other_kwargs (e.g., self.retry(max_retries=5, countdown=10)), it's popped from other_kwargs at line 316 to set task_max_retries. Later at line 373, the code checks "max_retries" in other_kwargs again, but since it was already popped, the check is False and falls through to celery_max_retries = self.max_retries. This means tasks using the old max_retries=X pattern will have the BaseCodecovTask check use X but Celery's retry will use self.max_retries (possibly a different value), causing inconsistent retry behavior. Tasks like http_request.py, manual_trigger.py, and test_analytics_notifier.py use this pattern.
Additional Locations (1)
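The inconsistency described above can be reproduced in isolation (hypothetical function and variable names, mirroring the described flow):

```python
def resolve_retries(task_default, **other_kwargs):
    # First resolution pops max_retries out of other_kwargs...
    task_max_retries = other_kwargs.pop("max_retries", task_default)
    # ...so this second membership check never sees it and silently
    # falls back to the task default -- the two values can disagree
    if "max_retries" in other_kwargs:
        celery_max_retries = other_kwargs["max_retries"]
    else:
        celery_max_retries = task_default
    return task_max_retries, celery_max_retries
```

With `resolve_retries(10, max_retries=5)`, the base-class check would use 5 while Celery would use 10.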
```python
        error=Errors.INTERNAL_RETRYING,
    )
    self.retry(countdown=retry.countdown)
except MaxRetriesExceededError:
```
Missing INTERNAL_OUT_OF_RETRIES breadcrumb when max retries exceeded
When max retries are exceeded, the code sends Errors.INTERNAL_RETRYING breadcrumb before attempting self.retry(), then catches MaxRetriesExceededError and returns silently. The old code explicitly checked max retries first and sent Errors.INTERNAL_OUT_OF_RETRIES breadcrumb when exceeded. Now INTERNAL_OUT_OF_RETRIES is never sent, and INTERNAL_RETRYING is incorrectly sent even when the task is about to give up. This affects observability for systems monitoring upload finisher retry exhaustion.
Additional Locations (1)
- Changed the description of TASK_TIME_IN_QUEUE for clarity.
- Updated retry handling in BundleAnalysisNotifyTask and TestResultsFinisherTask to use self.request.retries instead of self.attempts for consistency.
- Ensured safe access to the request object in BaseCodecovTask's _emit_queue_metrics method.
- Removed custom MaxRetriesExceededError class and updated import statements for clarity.
- Streamlined retry logic to utilize self.request.retries consistently across the task.
- Enhanced error handling for LockRetry exceptions, ensuring proper retry behavior without exceeding max retries.
- Updated unit tests to reflect changes in retry logic and error handling.
- Updated retry logic to use task_max_retries directly for Celery's max_retries parameter.
- Enhanced request object safety checks to ensure proper handling when attributes are missing.
```python
        error=Errors.INTERNAL_RETRYING,
    )
    self.retry(countdown=retry.countdown)
except MaxRetriesExceededError:
```
Missing INTERNAL_OUT_OF_RETRIES breadcrumb when max retries exceeded
The refactored code logs Errors.INTERNAL_RETRYING breadcrumb before calling self.retry(), then silently returns if MaxRetriesExceededError is caught. The old code first checked if max attempts were exceeded and logged Errors.INTERNAL_OUT_OF_RETRIES before returning. Now when max retries are exceeded, the wrong error (INTERNAL_RETRYING) is logged and INTERNAL_OUT_OF_RETRIES is never recorded, degrading observability and debugging capability for retry exhaustion scenarios. This pattern appears in both run_impl and _handle_finisher_lock methods.
Additional Locations (1)
```python
now = datetime.now()
delta = now - enqueued_time

queue_name = request.get("delivery_info", {}).get("routing_key", None)
```
Missing delivery_info type check may cause AttributeError
The new _emit_queue_metrics method chains .get() calls without checking if delivery_info is a dict. The old code used isinstance(delivery_info, dict) to guard against non-dict values. If delivery_info is explicitly set to None or another non-dict type, request.get("delivery_info", {}).get("routing_key", None) will raise an AttributeError since the default {} only applies when the key is missing, not when it exists with a non-dict value.
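A minimal repro of the pitfall, with a plain dict standing in for the Celery request:

```python
request = {"delivery_info": None}  # key present, value not a dict

# Chained .get() raises here: the {} default applies only when the key
# is MISSING, not when it exists with a non-dict value
try:
    request.get("delivery_info", {}).get("routing_key")
    raised = False
except AttributeError:
    raised = True

# Guarded version, as the old code did with isinstance
delivery_info = request.get("delivery_info")
routing_key = (
    delivery_info.get("routing_key") if isinstance(delivery_info, dict) else None
)
```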
```python
    max_retries=self.max_retries,
    countdown=30 * (2**self.request.retries),
)
self.retry(countdown=30 * (2**self.request.retries))
```
Upload state not updated when max retries exceeded
When a retryable error occurs and max retries are exceeded, self.retry() at line 190 raises MaxRetriesExceededError, causing result.update_upload(carriedforward=carriedforward) at line 191 to never execute. The old code explicitly handled this scenario by calling update_upload() to set the upload state to "error" before returning. Now the upload remains in its current state (likely "started" or "processing") instead of being marked as "error", leaving the database in an inconsistent state that misrepresents the actual processing outcome.
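One way to restore the old behavior, sketched with stand-in classes (`MaxRetriesExceededError` here is a local stub for `celery.exceptions.MaxRetriesExceededError`, and `FakeUpload`/`retry_or_mark_error` are hypothetical names):

```python
class MaxRetriesExceededError(Exception):
    """Stand-in for celery.exceptions.MaxRetriesExceededError."""


class FakeUpload:
    def __init__(self):
        self.state = "processing"


def retry_or_mark_error(upload, do_retry):
    # If retry() raises because the limit is already reached, explicitly
    # mark the upload as errored instead of leaving it in-flight
    try:
        do_retry()
    except MaxRetriesExceededError:
        upload.state = "error"
```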
Description
Centralizes retry handling in `BaseCodecovTask.retry()` and fixes infinite retry bugs in bundle analysis tasks.

Changes:
- Extend `BaseCodecovTask.retry()` to handle `max_retries` checks, logging, metrics, and exponential backoff
- Remove the `max_retries` parameter from `LockManager.locked()`; delegate retry limits to tasks
- Remove the `safe_retry()` and `_retry()` helper methods
- Nest log fields (`repoid`, `commitid`, `report_type`) under a `context` key
- Fix infinite retry bugs in `BundleAnalysisProcessorTask` and `BundleAnalysisNotifyTask`
- Update tests to stop mocking `retry()` logic and test real behavior

Impact:
- Retries now stop once `max_retries` is exceeded

Note

Unifies and hardens retry behavior across worker tasks while simplifying lock handling and improving observability.

- Extends `BaseCodecovTask.retry()` to enforce `max_retries`, apply exponential backoff, update the `attempts` header, emit metrics, and add structured context in logs; adds a default `max_retries` from config and an `_emit_queue_metrics()` helper
- Removes `max_retries` from `LockManager.locked()`; it now raises `LockRetry(countdown)` and logs the release in `finally`
- Updates `bundle_analysis_processor`, `bundle_analysis_notify`, `upload`, `upload_finisher`, `preprocess_upload`, `manual_trigger`, `test_results_finisher`, and `test_analytics_notifier` to use `self.request.retries` and the new `retry()`; replaces custom/safe retry paths and explicit max-retry checks; handles `MaxRetriesExceededError` where needed

Written by Cursor Bugbot for commit e1947a7. This will update automatically on new commits.