Conversation

@svij-sc
Collaborator

@svij-sc svij-sc commented Nov 26, 2025

Background Context: #405
Some guidance on stack of PRs: #405 (comment)

Scope of PR

  • Updated requirements/install_py_deps.sh so that uv gets installed. Introduced a couple of new flags that allow finer-grained control when installing gigl deps, installing glt, etc. (see the sketch after this list). Also cleaned this script up quite a bit, since it was lacking love.
  • Migrated the Makefile to now use uv
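A rough sketch of the intended usage (only the --dev flag appears elsewhere in this PR; the other flag name below is hypothetical, purely for illustration):

# Install uv (if missing) plus the default gigl deps into the active environment.
bash ./requirements/install_py_deps.sh

# Finer-grained control over what gets installed:
bash ./requirements/install_py_deps.sh --dev        # also install dev deps
bash ./requirements/install_py_deps.sh --skip-glt   # hypothetical flag: skip installing glt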

@svij-sc svij-sc changed the title Install UV + update dev usage to UV [3/6 Deps Update] Install UV + update dev usage to UV Nov 26, 2025
Comment on lines 56 to 59
if result != 0:
raise RuntimeError(
f"Post-install script finished running, with return code: {result}"
)
Collaborator

Does this change need to be part of these PRs?

Collaborator Author

I can pull it out if you prefer.
This was failing at some point during testing and I wanted to ensure that it didn't silently fail.

Collaborator

I'd prefer to just address these unrelated things outside the scope of this uv+py3.11 change, as the core changes are already pretty massive.

e.g. #409

Collaborator Author

Sure, will raise this: #410

Comment on lines 125 to 130
if [[ "${UV_SYSTEM_PYTHON}" == "true" ]]
then
echo "Recognized using system python."
echo "Will use inexact match for dependencies so we don't override system packages."
flag_use_inexact_match="--inexact"
fi
Collaborator

seems risky? Is this ever true? I'd imagine we'd want to error out?

Or do we need this for prod?

Collaborator Author

Left a comment; also, as an example here:

# Already has python 3.11 installed - no need to install it again.
# We use system python since it has packages pre-installed for us.
ENV UV_SYSTEM_PYTHON=true
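For illustration, setting that env var makes uv target the interpreter already on PATH instead of requiring a virtualenv; the actual invocation lives in install_py_deps.sh, and the requirements path below is only illustrative:

UV_SYSTEM_PYTHON=true uv pip install -r requirements.txt
# ...behaves roughly the same as passing the flag explicitly:
uv pip install --system -r requirements.txt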

Collaborator

hmmm - should we be doing this? I feel like this makes our environment less hermetic and more susceptible to e.g. gcp changes.

Collaborator Author

Could you expand what you mean by gcp changes?

The alternative would be that we create a complicated docker image that is able to build / install torch and cuda libs, closely matching how pytorch generates their docker images: https://hub.docker.com/layers/pytorch/pytorch/2.9.1-cuda12.6-cudnn9-runtime/images/sha256-28fb54912f51b59d80f696da9e551ac1da601ce761195f5ff8dd59dccdcd1516

I'd say the preference is to leave the heavy lifting of creating optimized docker base images to the pytorch and cuda engineers, and to just use the existing installed python interpreter and the related cuda / pytorch libs.
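As a sketch of that direction (image tag taken from the link above; script flags illustrative), the Dockerfile would build FROM pytorch/pytorch:2.9.1-cuda12.6-cudnn9-runtime and the install step would then just be:

pip install uv
export UV_SYSTEM_PYTHON=true                   # reuse the interpreter (and torch/cuda libs) baked into the base image
bash ./requirements/install_py_deps.sh --dev   # flags illustrative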

Collaborator

Do the system python and non-system python executables have different binaries? Or is your point that torch/cuda libs are installed specially and just doing pip install torch is not going to give us the same behavior?

My concern here is that if torch/etc end up putting some other deps in their "system install" then we may run into conflicts. This is what I meant by "gcp changes" e.g. our upstream sources change and now we're broken for mysterious reasons.

But if there is in fact special (if not secret) sauce in the dep installation, it makes sense to reuse that.

Collaborator Author
@svij-sc svij-sc Dec 8, 2025

torch/cuda libs are installed specially and just doing pip install torch is not going to give us the same behavior?

exactly this.
Note that we have the same issue w/ the dataflow image. I need to use system python as it has apache beam installed. For whatever reason I am unable to just "install the right version of apache beam" and have it work for dataflow workers. See: #399 (comment)

The older way of building the Dataflow base image does not work anymore, i.e. COPY --from=apache/beam_python3.9_sdk:2.53.0 /opt/apache/beam /opt/apache/beam with the updated python and sdk versions creates runtime failures when running dataflow jobs. Thus, we are now just using apache/beam_python3.11_sdk:2.56.0 as the base image instead of trying to copy from it.

Re:

torch/cuda libs are installed specially and just doing pip install torch is not going to give us the same behavior?

Sure, but that would mean we update the docker source image - and this happens very rarely.
And when we do update that image, other things will likely also be deprecated and need fixing beyond dep resolution.

git diff --quiet || { echo Branch is dirty, please commit changes and ensure branch is clean; exit 1; }
$(eval GIT_COMMIT=$(shell git log -1 --pretty=format:"%H"))

initialize_environment:
Collaborator
@yliu2-sc yliu2-sc Dec 3, 2025

this is documented in a couple of .md files as well, and needs to be removed. My understanding is that with uv we don't need the conda virtualenv anymore, right?

Collaborator Author

good catch, updated
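For reference, a minimal conda-free setup under uv would look something like this (a sketch; not necessarily what the Makefile target ends up doing):

uv venv .venv --python 3.11    # create a local virtualenv with uv instead of conda
source .venv/bin/activate
bash ./requirements/install_py_deps.sh --dev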

COPY requirements gigl_deps/requirements
COPY python/gigl/scripts gigl_deps/python/gigl/scripts
RUN pip install --upgrade pip
RUN cd gigl_deps && bash ./requirements/install_py_deps.sh --no-pip-cache --dev
Collaborator

why remove this?

Collaborator Author

uv handles its cache differently from how the pip cache is handled.
The pip cache gave us issues in the past because some libs came preinstalled with the conda python interpreters.

This is not an issue anymore, as we are no longer using pip and have replaced it with uv.
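For illustration, the relevant uv cache controls (the requirements path is illustrative):

uv cache dir                                    # show where uv keeps its global cache
uv cache clean                                  # wipe it entirely
uv pip install --no-cache -r requirements.txt   # disable caching for a single invocation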


check_if_valid_env:
#@command -v docker >/dev/null 2>&1 || { echo >&2 "docker is required but it's not installed. Aborting."; exit 1; }
@command -v docker >/dev/null 2>&1 || { echo >&2 "docker is required but it's not installed. Aborting."; exit 1; }
Collaborator

Nit. Should probably go in a separate PR.

Collaborator Author

I created a PR for the other related comment.
Leaving this one in since it's a 1-line nit.

Collaborator
@kmontemayor2-sc kmontemayor2-sc left a comment

FYI there are some other references to pip install in our code base - let's update them too (at some point) https://github.com/search?q=repo%3ASnapchat%2FGiGL+%22pip+install%22&type=code
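For instance, the uv equivalent of the editable install referenced below would presumably be:

uv pip install -e ./python/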

pip install -e ./python/

# Can only be run on an arm64 mac, otherwise generated hashed req file will be wrong
generate_mac_arm64_cpu_hashed_requirements:

# 10.0 = Blackwell support i.e. B200 - CUDA 12.6 or later
# 12.0 = Blackwell support i.e. RTX6000 - CUDA 12.8 or later
# List of Nvidia GPUS: https://developer.nvidia.com/cuda-gpus
TORCH_CUDA_ARCH_LIST="7.5" WITH_CUDA="ON" python setup.py bdist_wheel
Collaborator

hmmm do we not need to bump this too if we want to use newer GPUs (more efficiently)? Or is that something we want to do later?

Collaborator Author

probably as a separate change.
Hopefully it does become more efficient?

But let me add that to the tracker - this is a good call.
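For the tracker item, bumping the arch list would presumably look something like this (the exact set of arches is TBD; 10.0 / 12.0 have the CUDA requirements noted in the comments above):

# Example: also build kernels for Ampere / Hopper / Blackwell in addition to Turing (7.5)
TORCH_CUDA_ARCH_LIST="7.5;8.0;9.0;10.0" WITH_CUDA="ON" python setup.py bdist_wheel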

@svij-sc svij-sc marked this pull request as ready for review December 12, 2025 22:05
@svij-sc svij-sc merged commit 5cbd507 into svij/update-deps-migrate-uv Dec 12, 2025
6 checks passed
@svij-sc svij-sc deleted the svij/install-uv branch December 12, 2025 22:05