# UN-2865 [FIX] Remove premature COMPLETED status update in general worker after async file orchestration #1574
This fix addresses a critical bug where the general worker incorrectly marked workflow executions as COMPLETED immediately after orchestrating async file processing, while files were still being processed.

Changes:
- Removed the `WorkflowExecutionStatusUpdate` that set the status to COMPLETED
- Removed the incorrect `execution_time` update (it captured only orchestration time)
- Removed the incorrect `total_files` calculation
- Updated comments to clarify orchestration vs. execution completion
- Updated logging to reflect the async orchestration behavior

The callback worker now properly handles setting the final COMPLETED or ERROR status after all files finish processing, matching the pattern used by the API deployment worker.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
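The split the description enforces, between orchestration (general worker) and finalization (callback worker), can be sketched as below. All names here are illustrative stand-ins, not the actual helpers in `workers/general/tasks.py` or `workers/callback/tasks.py`:

```python
import time

EXECUTING, COMPLETED, ERROR = "EXECUTING", "COMPLETED", "ERROR"


class ExecutionRecord:
    """Stand-in for the persisted workflow-execution row."""

    def __init__(self):
        self.status = EXECUTING
        self.execution_time = None


def orchestrate(record, files, dispatch):
    """General-worker side: dispatch files asynchronously and return.

    Before the fix, this step also set record.status = COMPLETED and stored
    the elapsed time here -- measuring only orchestration (seconds), not file
    processing (potentially minutes). Both updates are removed; the record
    stays EXECUTING until the callback worker finalizes it.
    """
    start = time.time()
    for f in files:
        dispatch(f)  # fire-and-forget async dispatch (e.g. a Celery task)
    return time.time() - start  # orchestration time only


def finalize(record, file_results, total_elapsed):
    """Callback-worker side: runs after every file has reported back."""
    record.status = COMPLETED if all(file_results) else ERROR
    record.execution_time = total_elapsed  # true end-to-end time
```

With this shape, an execution observed mid-run always reads EXECUTING, and the displayed `execution_time` reflects actual processing rather than dispatch overhead.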
## Summary by CodeRabbit

**Walkthrough**

The general worker's orchestration flow was adjusted to no longer finalize workflow status. It now records orchestration execution time, logs accordingly, and defers status completion and cache cleanup to a callback worker. An import of `WorkflowExecutionStatusUpdate` was removed.
**Sequence Diagram**

```mermaid
sequenceDiagram
    autonumber
    participant T as Trigger
    participant GW as General Worker (async_execute_bin_general)
    participant AJ as Async File Processing
    participant CB as Callback Worker
    participant DS as Data Store
    T->>GW: Start workflow execution
    GW->>DS: Mark workflow status: EXECUTING
    GW->>AJ: Dispatch async file processing jobs
    Note right of GW: Record orchestration-only execution_time<br/>(not total runtime)
    GW-->>T: Return orchestration success (async processing ongoing)
    par For each file
        AJ-->>CB: Completion callback with file result
    end
    CB->>DS: Aggregate results and finalize status (e.g., COMPLETED/FAILED)
    CB->>DS: Perform cache cleanup
```
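The per-file fan-in shown in the `par` block above amounts to counting completion callbacks until every file has reported, then deriving a final status. A minimal sketch of that aggregation, with illustrative names rather than the actual callback-worker code:

```python
class CallbackAggregator:
    """Tracks per-file completion callbacks until the execution can be finalized."""

    def __init__(self, total_files: int):
        self.total_files = total_files
        self.succeeded = 0
        self.failed = 0

    def record(self, ok: bool) -> bool:
        """Record one file result; return True once all files have reported."""
        if ok:
            self.succeeded += 1
        else:
            self.failed += 1
        return self.done()

    def done(self) -> bool:
        return self.succeeded + self.failed == self.total_files

    def final_status(self) -> str:
        # Only meaningful once every file has reported back.
        assert self.done()
        return "COMPLETED" if self.failed == 0 else "ERROR"
```

The key property is that `final_status()` is only consulted once `done()` is true, which is exactly why the general worker must not write a terminal status at dispatch time.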
## What

- `workers/general/tasks.py`: removed the premature COMPLETED status update after orchestrating async file processing
- Removed the `WorkflowExecutionStatusUpdate` import and usage that incorrectly marked the execution as complete before files finished processing
- Removed the incorrect `total_files` calculation based on input parameters

## Why

- `execution_time` was only measuring orchestration time (~2 seconds), not actual file processing time (potentially minutes), leading to an incorrect execution-time display
- The `total_files` calculation was incorrect: it used the input parameter `hash_values_of_files` instead of the actual count of discovered source files

## How

- Removed the code block that imported `WorkflowExecutionStatusUpdate` and called `update_workflow_execution_status()` with COMPLETED status
- Removed `WorkflowExecutionStatusUpdate` from the imports (line 19), as it is no longer needed
- Updated the log message from "Successfully completed general workflow execution" to "Successfully orchestrated general workflow execution (files processing asynchronously)"
- This matches the `workers/api-deployment/tasks.py` behavior, which correctly leaves final status determination to the callback worker
- The callback worker (`workers/callback/tasks.py`) already handles setting the final COMPLETED or ERROR status using `_determine_execution_status_unified()` after all files complete

## Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

No, this PR will not break any existing features. In fact, it fixes broken behavior in `_execute_general_workflow()`.

## Database Migrations
## Env Config

## Relevant Docs

- `workers/WORKERS_ARCHITECTURE_CONFLUENCE.md`
- `workers/callback/tasks.py` (`_determine_execution_status_unified()`)

## Related Issues or PRs

- `workers/api-deployment/tasks.py` (correct implementation)

## Dependencies Versions
## Notes on Testing

- `_determine_execution_status_unified()` (callback worker)
- `_execute_general_workflow()` (general worker)
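One way to guard against this regression in tests is to assert that orchestration never emits a terminal status. The sketch below is hypothetical: the function and the status-update hook are illustrative stand-ins for the real worker code, not its actual API:

```python
def orchestrate_files(files, dispatch, update_status):
    """Illustrative orchestrator: marks EXECUTING, dispatches each file,
    and returns without finalizing -- the behavior this PR enforces."""
    update_status("EXECUTING")
    for f in files:
        dispatch(f)
    # Deliberately no update_status("COMPLETED") here; the callback
    # worker owns the terminal status.


def test_no_premature_completed():
    updates, dispatched = [], []
    orchestrate_files(["a.txt", "b.txt"], dispatched.append, updates.append)
    assert updates == ["EXECUTING"]  # COMPLETED is left to the callback worker
    assert dispatched == ["a.txt", "b.txt"]
```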
## Screenshots

N/A - Backend worker logic fix only
## Checklist

- [x] I have read and understood the Contribution Guidelines.