@schustmi schustmi commented Dec 1, 2025

Describe changes

This PR adds support for restarting the orchestration environment when running dynamic pipelines. To support this, an orchestrator must implement restarts of the orchestration container and ensure that the get_orchestrator_run_id() method returns the same value after a restart. This is currently only implemented for the Kubernetes orchestrator.
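
As a rough illustration of the run ID requirement, the sketch below derives the ID from a value that survives container restarts. The class name and the environment variable are hypothetical and chosen only for this example. This is not the actual Kubernetes orchestrator code; it only assumes that some job-level value is injected into every restarted container.

```python
import os

# Hypothetical variable name (an assumption for this sketch). It is assumed to
# be set once on the job spec, so every restarted container sees the same value.
RUN_ID_ENV_VAR = "EXAMPLE_ORCHESTRATOR_JOB_NAME"


class ExampleOrchestrator:
    """Illustrative stand-in, not the ZenML Kubernetes orchestrator."""

    def get_orchestrator_run_id(self) -> str:
        """Return an ID that stays the same across container restarts."""
        try:
            # Read a job-level value rather than anything tied to a single
            # container instance (such as a random UUID generated at startup).
            return os.environ[RUN_ID_ENV_VAR]
        except KeyError:
            raise RuntimeError(
                f"{RUN_ID_ENV_VAR} is not set; cannot determine a stable run ID."
            )
```

The point of the design is that the ID is looked up rather than generated: anything created at container startup would change after a restart and break the lookup of existing step runs.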

Technical implementation details:

  • When restarting the orchestration environment, we re-execute the pipeline function
  • When a step function is executed, we first check if a step run for the given invocation ID already exists
    • If a step run exists, we either return its results (if it has finished) or restart monitoring it (if it is still running). If the step was running in inline mode, we instead mark it as failed and potentially retry it.
    • If no step run exists, we run it as usual
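
The decision logic in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration using an in-memory dictionary in place of the server. All names (StepRun, STEP_RUNS, execute_step, monitor) are hypothetical and do not correspond to the ZenML implementation. Treating an already failed step run like a retry candidate is an assumption of this sketch and is not stated in the description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, Optional


class Status(Enum):
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class StepRun:
    status: Status
    inline: bool
    output: Any = None


# In-memory stand-in for stored step runs, keyed by invocation ID.
STEP_RUNS: Dict[str, StepRun] = {}


def run_step(invocation_id: str, step_fn: Callable[[], Any], inline: bool) -> Any:
    """Run the step as usual and record the result."""
    run = StepRun(status=Status.RUNNING, inline=inline)
    STEP_RUNS[invocation_id] = run
    run.output = step_fn()
    run.status = Status.COMPLETED
    return run.output


def execute_step(
    invocation_id: str,
    step_fn: Callable[[], Any],
    *,
    inline: bool = True,
    monitor: Optional[Callable[[StepRun], Any]] = None,
    retry: bool = True,
) -> Any:
    """Execute a step, reusing an existing step run after a restart."""
    existing = STEP_RUNS.get(invocation_id)

    if existing is None:
        # No step run exists: run it as usual.
        return run_step(invocation_id, step_fn, inline)

    if existing.status is Status.COMPLETED:
        # The step finished before the restart: return its results.
        return existing.output

    if existing.status is Status.RUNNING and not existing.inline:
        # The step runs outside the orchestration container and may still be
        # alive, so restart monitoring instead of launching it again.
        if monitor is None:
            raise ValueError("A monitor callable is required for non-inline steps.")
        return monitor(existing)

    # An inline step died together with the orchestration container: mark it
    # as failed. (Assumption: an already failed run is handled the same way.)
    existing.status = Status.FAILED
    if retry:
        return run_step(invocation_id, step_fn, inline)
    raise RuntimeError(f"Step '{invocation_id}' failed and retries are disabled.")
```

Because the pipeline function is re-executed from the top after a restart, every step call passes through this check again, which is why the lookup by invocation ID has to be cheap and deterministic.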

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • I have added tests to cover my changes.
  • I have based my new branch on develop and the open PR is targeting develop. If your branch wasn't based on develop, read the Contribution guide on rebasing your branch to develop.
  • IMPORTANT: I made sure that my changes are reflected properly in the following resources:
    • ZenML Docs
    • Dashboard: Needs to be communicated to the frontend team.
    • Templates: Might need adjustments (that are not reflected in the template tests) in case of non-breaking changes and deprecations.
    • Projects: Depending on the version dependencies, different projects might get affected.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

@github-actions github-actions bot added internal To filter out internal PRs and issues enhancement New feature or request labels Dec 1, 2025
@schustmi schustmi force-pushed the feature/dynamic-pipelines-orchestrator-retries branch 9 times, most recently from 957fa61 to ae87507 Compare December 4, 2025 04:51
@schustmi schustmi added the no-release-notes Release notes will NOT be attached and used publicly for this PR. label Dec 4, 2025
@schustmi schustmi force-pushed the feature/dynamic-pipelines-orchestrator-retries branch 5 times, most recently from 250bae6 to f601983 Compare December 5, 2025 04:28
@schustmi schustmi marked this pull request as ready for review December 5, 2025 04:36
@schustmi schustmi force-pushed the feature/dynamic-pipelines-orchestrator-retries branch 2 times, most recently from f963880 to 8d84908 Compare December 5, 2025 05:09
@schustmi schustmi force-pushed the feature/dynamic-pipelines-orchestrator-retries branch from 8d84908 to 01e37f9 Compare December 5, 2025 05:34
@schustmi schustmi changed the title from "Enable retries for dynamic pipeline function execution" to "Enable orchestration environment restarts for dynamic pipelines" Dec 5, 2025
@bcdurak bcdurak linked an issue Dec 5, 2025 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

Handle restarts during dynamic pipeline function execution

2 participants