Skip to content

Conversation

@svij-sc
Copy link
Collaborator

@svij-sc svij-sc commented Nov 26, 2025

Background Context: #405
Some guidance on stack of PRs: #405 (comment)

Scope of PR

  • Generally, updated github actions / workflows to make use of uv
    • Updated the actions and workflows so they can be more easily tested
  • Other issues had to fix
    • p100 support is not available in new pytorch version wheels anymore unless we build from source - had to update to use t4s
    • some config file changes had to be done to enable testing to work again due to directory changes
  • This PR also does extensive testing on PRs 1/6 - 5/6 using the updated github workflows

@svij-sc svij-sc changed the title Update all workflows and actions to use GLT [5/6 Deps Update] Update all workflows and actions to use GLT Nov 26, 2025
Copy link
Collaborator

@kmontemayor2-sc kmontemayor2-sc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW the name should be
[5/6 Deps Update] Update all workflows and actions to use UV right?

Comment on lines 43 to 56
- name: Checkout PR Branch (on-dispatch)
if: ${{ github.event_name == 'workflow_dispatch' }}
uses: snapchat/gigl/.github/actions/checkout-pr-branch@main
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
pr_number: ${{ inputs.pr_number }}
- name: Checkout repository (on-push)
if: ${{ github.event_name == 'push' }}
uses: actions/checkout@v4
- name: Setup Machine for building Docker images
uses: snapchat/gigl/.github/actions/setup-python-tools@main
uses: ./.github/actions/setup-python-tools
with:
setup_gcloud: "true"
try_cleaning_disk_space: "true"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to keep the two branches here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just introducing so this is easier to test in future; but I can remove if you prefer - I don't have strong feelings, just more work to test it as these will have to be re-introduced.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when do we run build-cuda-base-image and friends on-push?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whenever u make changes to the rule and want to test it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove it tbh if its easier ; seems its also confusing @yliu2-sc

Copy link
Collaborator Author

@svij-sc svij-sc Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do these prior to merge this PR btw; keeping them around to make testing easier for now.
cc @yliu2-sc


unit-test-python:
if: ${{ github.event.issue.pull_request && (contains(github.event.comment.body, '/unit_test_py') || endsWith(github.event.comment.body, '/unit_test')) }}
unit-test:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert these changes?

@svij-sc svij-sc changed the title [5/6 Deps Update] Update all workflows and actions to use GLT [5/6 Deps Update] Update all workflows and actions to use UV Nov 26, 2025
Comment on lines 10 to 11
# Uncomment strictly for testing purposes
# push:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is incomplete and shouldn't be here?

# Once before it gets into the merge queue and once when it is in the merge queue.
# Our tests take a long time to run, so this is not ideal.
if: github.event_name == 'merge_group'
# if: github.event_name == 'merge_group'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove if not needed

# Once before it gets into the merge queue and once when it is in the merge queue.
# Our tests take a long time to run, so this is not ideal.
if: github.event_name == 'merge_group'
# if: github.event_name == 'merge_group'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove if not needed

Comment on lines 68 to 70
build-cpu-base-image:
runs-on: ubuntu-latest
# runs-on: gigl-large-instances # x64 Ubuntu:latest w/ 8-cores, 32GB RAM, 300 GB SSD
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping :)

Copy link
Collaborator Author

@svij-sc svij-sc Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was running into issues building cuda image w/ updates - it was inflating right to 16GB of context space to build the images so it would cause failures > 50% of the time. Thus I updated the machines for gpu in the backend.

Subsequently, my investigation also yielded we dont need the bigger instances for building cpu and dataflow images so i downgraded them to use default instances.

Comment on lines 102 to 103
runs-on: ubuntu-latest
# runs-on: gigl-large-instances # x64 Ubuntu:latest w/ 8-cores, 32GB RAM, 300 GB SSD
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto on machine change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +172 to 174
- build-cpu-base-image
- build-dataflow-base-image
- build-builder-image
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason we need this change now?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#399

The older version of building Dataflow base image does not work anymore i.e. COPY --from=apache/beam_python3.9_sdk:2.53.0 /opt/apache/beam /opt/apache/beam with the updated python and sdk version creates runtime failures when running dataflow jobs. Thus, we are now just using apache/beam_python3.11_sdk:2.56.0 as base image instead of trying to copy from it.

Since we have updated building the docker image - it made sense to update this too.

tests:
cora_nalp_test:
task_config_uri: "gigl/src/mocking/configs/e2e_node_anchor_based_link_prediction_template_gbml_config.yaml"
task_config_uri: "python/gigl/src/mocking/configs/e2e_node_anchor_based_link_prediction_template_gbml_config.yaml"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this change happen?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the root directory has now changed to be root directory not gigl directory - as per pyproject.toml moving and all as well.

This should hopefully simplify all future path resolutions without worrying what dir you are in, etc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I'm surprised this was the only place that this caused issues! Well I guess we got lucky :)

Comment on lines 43 to 56
- name: Checkout PR Branch (on-dispatch)
if: ${{ github.event_name == 'workflow_dispatch' }}
uses: snapchat/gigl/.github/actions/checkout-pr-branch@main
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
pr_number: ${{ inputs.pr_number }}
- name: Checkout repository (on-push)
if: ${{ github.event_name == 'push' }}
uses: actions/checkout@v4
- name: Setup Machine for building Docker images
uses: snapchat/gigl/.github/actions/setup-python-tools@main
uses: ./.github/actions/setup-python-tools
with:
setup_gcloud: "true"
try_cleaning_disk_space: "true"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when do we run build-cuda-base-image and friends on-push?

pr_number: ${{ inputs.pr_number }}
- name: Setup Machine for building Docker images
uses: snapchat/gigl/.github/actions/setup-python-tools@main
uses: ./.github/actions/setup-python-tools
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are planning on reverting this right?

Copy link
Collaborator Author

@svij-sc svij-sc Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wont merge if I revert since it needs updated workflows to run and those workflows wont be in main when this is running CI.

Maybe post merging everything I can raise a PR.

tests:
cora_nalp_test:
task_config_uri: "gigl/src/mocking/configs/e2e_node_anchor_based_link_prediction_template_gbml_config.yaml"
task_config_uri: "python/gigl/src/mocking/configs/e2e_node_anchor_based_link_prediction_template_gbml_config.yaml"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I'm surprised this was the only place that this caused issues! Well I guess we got lucky :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. this change should be in a different pr right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ye.... whoops.

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@svij-sc svij-sc marked this pull request as ready for review December 12, 2025 18:30
@svij-sc svij-sc merged commit 5952343 into svij/update-docker-images-use-uv Dec 12, 2025
6 checks passed
@svij-sc svij-sc deleted the svij/update-git-CI-actions branch December 12, 2025 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants