[3/6 Deps Update] Install UV + update dev usage to UV #398
Conversation
python/gigl/scripts/post_install.py (Outdated)

    if result != 0:
        raise RuntimeError(
            f"Post-install script finished running, with return code: {result}"
        )
Does this change need to be part of these PRs?
I can pull it out if you prefer.
This was failing at some point during testing and I wanted to ensure that it didn't silently fail.
I'd prefer to just address these unrelated things out of the scope of this uv+py3.11 change as the core changes are already pretty massive.
e.g. #409
Sure, will raise this: #410
| if [[ "${UV_SYSTEM_PYTHON}" == "true" ]] | ||
| then | ||
| echo "Recognized using system python." | ||
| echo "Will use inexact match for dependencies so we don't override system packages." | ||
| flag_use_inexact_match="--inexact" | ||
| fi |
seems risky? Is this ever true? I'd imagine we'd want to error out?
Or do we need this for prod?
Left a comment; also, as an example here:
GiGL/containers/Dockerfile.cuda.base (lines 9 to 11 in b771633)

    # Already has python 3.11 installed - no need to install it again.
    # We use system python since it has packages pre-installed for us.
    ENV UV_SYSTEM_PYTHON=true
hmmm - should we be doing this? I feel like this makes our environment less hermetic and more susceptible to e.g. gcp changes.
Could you expand on what you mean by gcp changes?
The alternative would be for us to create a complicated docker image that is able to build/install torch and cuda libs, closely matching how pytorch generates their docker images: https://hub.docker.com/layers/pytorch/pytorch/2.9.1-cuda12.6-cudnn9-runtime/images/sha256-28fb54912f51b59d80f696da9e551ac1da601ce761195f5ff8dd59dccdcd1516
I'd say the preference is to leave the heavy lifting of creating optimized docker base images to the pytorch and cuda engineers, and just use the existing installed python interpreter and the related cuda/pytorch libs.
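A minimal sketch of what that looks like in practice, assuming deps are layered onto the base image's system interpreter with uv (the requirements path below is hypothetical, and the exact uv invocation in install_py_deps.sh may differ):

    # Sketch only: reuse the interpreter that ships with a prebuilt pytorch/cuda
    # base image instead of building our own toolchain. UV_SYSTEM_PYTHON=true
    # tells uv to target that interpreter; the install script additionally passes
    # --inexact so packages preinstalled in the image are left untouched.
    export UV_SYSTEM_PYTHON=true
    pip install uv                                   # uv itself is a plain PyPI package
    uv pip install --system -r requirements/dev.txt  # hypothetical requirements path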
Do the system python and non-system python executables have different binaries? Or is your point that torch/cuda libs are installed specially and just doing pip install torch is not going to give us the same behavior?
My concern here is that if torch/etc end up putting some other deps in their "system install" then we may run into conflicts. This is what I meant by "gcp changes" e.g. our upstream sources change and now we're broken for mysterious reasons.
But if there is in fact special (if not secret) sauce in the dep installation, it makes sense to reuse that.
torch/cuda libs are installed specially and just doing pip install torch is not going to give us the same behavior?
exactly this.
Note that we have the same issue w/ the dataflow image. I need to use system python as it has apache beam installed. For whatever reason I am unable to just "install the right version of apache beam" and have it work for dataflow workers. See: #399 (comment)
The older version of building Dataflow base image does not work anymore i.e. COPY --from=apache/beam_python3.9_sdk:2.53.0 /opt/apache/beam /opt/apache/beam with the updated python and sdk version creates runtime failures when running dataflow jobs. Thus, we are now just using apache/beam_python3.11_sdk:2.56.0 as base image instead of trying to copy from it.
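For reference, a rough sketch of the new approach the quote describes, i.e. building directly on the Beam SDK image rather than copying the SDK out of it (the image tag gigl-dataflow-base is made up for illustration; this is not the actual GiGL build file):

    # New approach per the quote above: build the Dataflow base image directly
    # FROM the Beam SDK image, which already has apache-beam installed against
    # its system python, and point uv at that interpreter.
    printf 'FROM apache/beam_python3.11_sdk:2.56.0\nENV UV_SYSTEM_PYTHON=true\n' \
      | docker build -t gigl-dataflow-base -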
Re:
torch/cuda libs are installed specially and just doing pip install torch is not going to give us the same behavior?
Sure, but that would only come up when we update the docker source image, which happens very rarely.
And when we do update that image, other things will likely also be deprecated and need to be fixed beyond dep resolution.
    git diff --quiet || { echo Branch is dirty, please commit changes and ensure branch is clean; exit 1; }
    $(eval GIT_COMMIT=$(shell git log -1 --pretty=format:"%H"))

initialize_environment:
This is documented in a couple of .md files as well, and those need to be removed. My understanding is that with uv we don't need a conda virtualenv anymore, right?
good catch, updated
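(For local setup, the conda steps can presumably be replaced by uv's built-in virtualenv support; a sketch, not necessarily the exact commands the updated docs use:)

    # Conda-free local environment using uv's built-in venv support. Python 3.11
    # matches this PR; the last line is an editable install of the gigl python package.
    uv venv --python 3.11          # creates .venv in the repo root
    source .venv/bin/activate
    uv pip install -e ./python/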
    COPY requirements gigl_deps/requirements
    COPY python/gigl/scripts gigl_deps/python/gigl/scripts
    RUN pip install --upgrade pip
    RUN cd gigl_deps && bash ./requirements/install_py_deps.sh --no-pip-cache --dev
why remove this?
uv's cache is handled differently from pip's cache.
The pip cache gave us issues in the past due to some libs being preinstalled w/ the conda python interpreters.
This is no longer an issue since we are not using pip anymore and have replaced it with uv.
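For anyone mapping the old flag mentally, the cache controls translate roughly as follows (these are standard pip/uv flags; whether install_py_deps.sh still exposes its own cache flag after this PR is not shown in this thread):

    pip install --no-cache-dir torch   # old: bypass pip's wheel cache
    uv pip install --no-cache torch    # uv: skip uv's cache for this invocation
    uv cache clean                     # wipe uv's cache entirely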
check_if_valid_env:
    #@command -v docker >/dev/null 2>&1 || { echo >&2 "docker is required but it's not installed. Aborting."; exit 1; }
    @command -v docker >/dev/null 2>&1 || { echo >&2 "docker is required but it's not installed. Aborting."; exit 1; }
Nit. Should probably go in a separate PR.
I created a PR for the other related comment.
Leaving this one in since it's a one-line nit.
kmontemayor2-sc left a comment:
FYI there are some other references to pip install in our code base - let's update them too (at some point) https://github.com/search?q=repo%3ASnapchat%2FGiGL+%22pip+install%22&type=code
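For most of those hits the change is presumably mechanical, since uv exposes a pip-compatible interface:

    pip install -e ./python/       # before
    uv pip install -e ./python/    # after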
    pip install -e ./python/

# Can only be run on an arm64 mac, otherwise generated hashed req file will be wrong
generate_mac_arm64_cpu_hashed_requirements:
should we add a make generate_deps or something and update this? http://github.com/Snapchat/GiGL/pull/397/files#diff-8cd8c57e79e3bb16584ae59589a0bb144d5f9ca8ccc32a972b659601f16b9e65R15
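A make generate_deps target could presumably just wrap uv's compile tooling; a hypothetical sketch of its body (target name and file paths are made up):

    # Compile a hash-pinned requirements file with uv instead of pip-tools.
    # --python-platform may also let this run off an arm64 mac, but that is untested here.
    uv pip compile requirements/dev.in \
        --generate-hashes \
        -o requirements/mac_arm64_cpu_hashed_requirements.txt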
| if [[ "${UV_SYSTEM_PYTHON}" == "true" ]] | ||
| then | ||
| echo "Recognized using system python." | ||
| echo "Will use inexact match for dependencies so we don't override system packages." | ||
| flag_use_inexact_match="--inexact" | ||
| fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the system python and non-system python executables have different binaries? Or is your point that torch/cuda libs are installed specially and just doing pip install torch is not going to give us the same behavior?
My concern here is that if torch/etc end up putting some other deps in their "system install" then we may run into conflicts. This is what I meant by "gcp changes" e.g. our upstream sources change and now we're broken for mysterious reasons.
But if there is in fact special (if not secret) sauce in the dep installation it makes sense to resuse that.
    # 10.0 = Blackwell support i.e. B200 - CUDA 12.6 or later
    # 12.0 = Blackwell support i.e. RTX6000 - CUDA 12.8 or later
    # List of Nvidia GPUS: https://developer.nvidia.com/cuda-gpus
    TORCH_CUDA_ARCH_LIST="7.5" WITH_CUDA="ON" python setup.py bdist_wheel
hmmm do we not need to bump this too if we want to use newer GPUs (more efficiently)? Or is that something we want to do later?
Probably as a separate change.
Hopefully it does become more efficient?
But let me add that to the tracker - this is a good call.
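If/when that follow-up lands, the bump itself is just a wider arch list, e.g. (7.5 = Turing/T4, 8.0 = A100, 9.0 = H100 per NVIDIA's compute capability table; which generations GiGL should actually target is the open question above, not settled here):

    TORCH_CUDA_ARCH_LIST="7.5;8.0;9.0" WITH_CUDA="ON" python setup.py bdist_wheel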
Background Context: #405
Some guidance on stack of PRs: #405 (comment)
Scope of PR
- requirements/install_py_deps.sh: updated so uv is installed. Introduced a couple of new flags that help with finer-grained control when installing gigl deps, installing glt, etc. Subsequently, also cleaned up this script quite a bit as it was lacking love.
- Makefile: updated to now use uv.
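For context, the resulting dev setup implied by this PR looks roughly like the following; the uv installer one-liner is uv's standard install command, and only the pre-existing --dev flag of the script is shown since the new flags are not named in this thread:

    # Install uv, then let the updated script install gigl's python deps with it.
    curl -LsSf https://astral.sh/uv/install.sh | sh
    bash ./requirements/install_py_deps.sh --dev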