-
Notifications
You must be signed in to change notification settings - Fork 559
Add pipeline run index attribute #4288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Documentation Link Check Results✅ Absolute links check passed |
cf5dea6 to
b0c1aa3
Compare
a9cb52f to
e6c8b05
Compare
1fa5cf3 to
8788126
Compare
8788126 to
cdc3723
Compare
src/zenml/zen_stores/migrations/versions/6e4eb89f632d_unique_run_index.py
Outdated
Show resolved
Hide resolved
src/zenml/zen_stores/migrations/versions/6e4eb89f632d_unique_run_index.py
Show resolved
Hide resolved
| index_within_pipeline = 0 | ||
| run_updates: list[dict[str, Any]] = [] | ||
| run_counts: dict[str, int] = {} | ||
| for row in result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally I would advocate to execute data population for optional values in a post-deployment or post-migration step. Let's think about the worst case scenario in terms of execution time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the index should be non-optional and the column type is changed at the end of the DB migration to reflect that, which is why we need some values for each existing row. I guess we could populate it with a placeholder and then later on replace by real values, not sure it's worth the effort here though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah let's no change the migration logic, let's just estimate the execution time for the backfill (worst case scenario). The general rule is migrations should complete quickly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally I would advocate to execute data population for optional values in a post-deployment or post-migration step
@Json-Andriopoulos I'm curious, why is it not acceptable to do these sort of changes in the alembic migration script ? what are the pros/cons ?
src/zenml/zen_stores/migrations/versions/6e4eb89f632d_unique_run_index.py
Outdated
Show resolved
Hide resolved
| session.execute( | ||
| update(PipelineSchema) | ||
| .where(col(PipelineSchema.id) == pipeline_id) | ||
| .values(run_count=col(PipelineSchema.run_count) + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good one! I think this should be ok with the default isolation level in MySQL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep I think so! Also in my tests it seemed to work just fine. I also didn't add a unique constraint on pipeline_id, index, so even if this for some reason fails to generate unique values, we will not fail to create a run but instead only have a duplicate index.
042c2f0 to
5fb6fcd
Compare
Json-Andriopoulos
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
5fb6fcd to
3394503
Compare
Describe changes
This PR adds an
indexattribute for each pipeline run, which specifies the index of the run within its pipeline.Additionally, the
pipelineattribute of thePipelineRunRequestwas removed as the information is already included in thesnapshotthat is part of the request.Limitations:
Pre-requisites
Please ensure you have done the following:
developand the open PR is targetingdevelop. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.Types of changes