
@rickeylev (Collaborator) commented Dec 14, 2025

When pkgutil-style namespace packages are used, multiple distributions provide
the same venv path (e.g. foo/__init__.py). The venv symlink logic then tries to
symlink the foo/ directory, because it looks like the highest linkable directory.
When the conflict merging logic runs later, it has to flatten a depset containing
all the files in the conflicting distributions.
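To make the cost concrete, here is a simplified Python model of the problem. It is not the rules_python implementation (which is Starlark), and it assumes each distribution symlinks its top-level directories wholesale; the distribution names `foo_bar` and `foo_baz` and the function `naive_conflicts` are hypothetical. Because both distributions ship foo/__init__.py, both claim foo/, and every file beneath it lands in the conflict set.

```python
from collections import defaultdict


def naive_conflicts(dists):
    """Model a planner that symlinks each top-level directory wholesale.

    dists: distribution name -> files it installs, relative to site-packages.
    Returns, for every top-level entry claimed by more than one distribution,
    the set of files that conflict merging would have to flatten.
    """
    owners = defaultdict(set)  # top-level entry -> distributions linking it
    files = defaultdict(set)   # top-level entry -> every file beneath it
    for dist, paths in dists.items():
        for path in paths:
            top = path.split("/", 1)[0]
            owners[top].add(dist)
            files[top].add(path)
    return {top: files[top] for top, who in owners.items() if len(who) > 1}


# Hypothetical distributions sharing the "foo" namespace.
dists = {
    "foo_bar": ["foo/__init__.py", "foo/bar/__init__.py", "foo/bar/core.py"],
    "foo_baz": ["foo/__init__.py", "foo/baz/__init__.py", "foo/baz/util.py"],
}
print(len(naive_conflicts(dists)["foo"]))  # → 5
```

Even with two tiny distributions, the conflict spans every file under foo/; with many large distributions sharing one namespace, the set to flatten grows with all of them.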

To fix this, have whl_library() heuristically detect when a file is a pkgutil namespace
package. The detected files are then passed on to py_library's venv building logic so it
can treat their directories as not directly linkable. A conflict still occurs, but it
contains only the single __init__.py file.
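A minimal sketch of the namespace-aware linking described above, again as a Python model under stated assumptions: `plan_links` and its arguments are hypothetical, and "highest linkable directory" is modeled as the first path prefix that is not a known namespace directory. Directories in `namespace_dirs` are never linked as a whole; the planner descends into them, so only files sitting directly inside them, such as foo/__init__.py, can still collide.

```python
from collections import defaultdict


def plan_links(dists, namespace_dirs):
    """Decide which venv path each distribution's files are linked through.

    dists: distribution name -> files it installs, relative to site-packages.
    namespace_dirs: directories that must never be symlinked as a whole.
    Returns (links, conflicts), both mapping a venv path to its providers.
    """
    links = defaultdict(set)
    for dist, paths in dists.items():
        for path in paths:
            parts = path.split("/")
            # Start at the top-level entry and descend while the current
            # prefix is a namespace directory that can't be linked directly.
            depth = 1
            while depth < len(parts) and "/".join(parts[:depth]) in namespace_dirs:
                depth += 1
            links["/".join(parts[:depth])].add(dist)
    conflicts = {p: owners for p, owners in links.items() if len(owners) > 1}
    return dict(links), conflicts


# Hypothetical distributions sharing the "foo" namespace.
dists = {
    "foo_bar": ["foo/__init__.py", "foo/bar/__init__.py", "foo/bar/core.py"],
    "foo_baz": ["foo/__init__.py", "foo/baz/__init__.py", "foo/baz/util.py"],
}
links, conflicts = plan_links(dists, namespace_dirs={"foo"})
print(sorted(conflicts))  # → ['foo/__init__.py']
```

Calling `plan_links(dists, namespace_dirs=set())` on the same input links everything through foo and reproduces the large conflict, so in this model the namespace hint alone separates the two outcomes.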

Along the way, special-case the "nvidia" package name and always treat it as a namespace
package. This is because nvidia packages aren't strictly correct: each has a blank
__init__.py file, which marks it as a regular package rather than a namespace package.
Special-casing like this is undesirable, but it greatly reduces the number of conflicts
when, e.g., torch is installed, and I couldn't find any other metadata indicating that
it's a namespace package.
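The nvidia special case might look like the following sketch. The set `ALWAYS_NAMESPACE` and the function `guess_namespace_dirs` are hypothetical names; only the choice of "nvidia" comes from the description above. It merges directories derived from already-detected pkgutil-style `__init__.py` files with the hard-coded name, since a blank `__init__.py` has no content to inspect.

```python
import posixpath

# Hypothetical: top-level names always treated as namespace packages.
# The PR description names only "nvidia".
ALWAYS_NAMESPACE = frozenset({"nvidia"})


def guess_namespace_dirs(paths, pkgutil_init_files=()):
    """Return directories a venv builder should not symlink wholesale.

    paths: files one distribution installs, relative to site-packages.
    pkgutil_init_files: __init__.py files already detected as pkgutil-style.
    """
    dirs = {posixpath.dirname(f) for f in pkgutil_init_files}
    for path in paths:
        top, sep, _ = path.partition("/")
        # Any __init__.py, even a blank one, makes a directory a regular
        # package, so nvidia can't be recognized from file contents; the
        # top-level name is used as the signal instead.
        if sep and top in ALWAYS_NAMESPACE:
            dirs.add(top)
    return dirs


print(sorted(guess_namespace_dirs(["nvidia/__init__.py", "nvidia/cublas/__init__.py"])))
# → ['nvidia']
```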

Additionally, add some hints to AGENTS.md so that coding agents understand repository rules better.

Fixes #3401

@gemini-code-assist (Contributor)

Summary of Changes

Hello @rickeylev, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the virtual environment (venv) creation process to improve efficiency and reduce build conflicts, particularly when dealing with Python namespace packages. By automatically detecting pkgutil-style namespace packages and special-casing the 'nvidia' package, the system avoids symlinking entire directories that are shared across multiple distributions. This streamlines the venv merging logic, yielding smaller, more manageable conflict sets and, ultimately, a more robust and performant build for Python dependencies.

Highlights

  • Optimized Venv Creation: Refactored the virtual environment (venv) creation logic to intelligently handle namespace packages, reducing conflict sets during venv merging.
  • Namespace Package Detection: The whl_library rule now automatically detects pkgutil-style namespace packages by inspecting __init__.py files for __path__ = __import__("pkgutil").extend_path(__path__, __name__).
  • New namespace_package_files Attribute: Introduced a namespace_package_files attribute to py_library to explicitly hint which files belong to namespace package directories, preventing their direct symlinking and optimizing venv construction.
  • Special Handling for Nvidia Packages: Added a special case to always treat the 'nvidia' package as a namespace package. Its blank __init__.py files technically make it a regular package even though it is used as a namespace package; this previously caused numerous conflicts, especially when torch was installed.
  • Documentation Updates: Updated AGENTS.md with new guidelines for writing Starlark code, including restrictions on recursion and while loops, and information on repository rules.
  • Enhanced Testing: Added new test infrastructure and comprehensive test cases to validate the optimized venv creation for both implicit and pkgutil-style namespace packages, including the 'nvidia' special case.
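The detection heuristic in the second highlight can be approximated with a regular expression. This is a hedged sketch in Python, not the code whl_library runs (repository rules are written in Starlark); `looks_like_pkgutil_namespace` is a hypothetical name, and also matching the `from pkgutil import extend_path` variant is an assumption beyond the line quoted above.

```python
import re

# Matches the idiom quoted in the PR:
#   __path__ = __import__("pkgutil").extend_path(__path__, __name__)
# and, as an assumption, the variant that imports extend_path first.
_PKGUTIL_RE = re.compile(
    r"""__path__\s*=\s*(?:__import__\(\s*['"]pkgutil['"]\s*\)\.)?"""
    r"""extend_path\(\s*__path__\s*,\s*__name__\s*\)"""
)


def looks_like_pkgutil_namespace(init_source: str) -> bool:
    """Guess whether an __init__.py's source declares a pkgutil namespace."""
    for line in init_source.splitlines():
        code = line.split("#", 1)[0]  # ignore commented-out text
        if _PKGUTIL_RE.search(code):
            return True
    return False


print(looks_like_pkgutil_namespace(
    '__path__ = __import__("pkgutil").extend_path(__path__, __name__)\n'
))  # → True
```

Matching source text is only a guess: an `__init__.py` that builds `__path__` some other way would be missed, which fits the description's framing of this as a heuristic.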

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable optimization for creating virtual environments, specifically targeting pkgutil-style namespace packages and the nvidia package family. By intelligently identifying these namespace packages, the changes reduce analysis-time overhead and conflicts, which is a great performance improvement. The addition of comprehensive tests to validate this new logic is also excellent. However, I've found a critical issue in the implementation that nullifies the optimization for pkgutil and nvidia packages. Please see the detailed comment.

@rickeylev rickeylev added the "do not merge" label Dec 14, 2025
@rickeylev rickeylev added and removed the "do not merge" label Dec 15, 2025
@rickeylev (Collaborator Author)

Doh, forgot to remove some debugging files...

@rickeylev rickeylev removed the "do not merge" label Dec 15, 2025
@rickeylev (Collaborator Author)

PTAL. Debug code removed and tests fixed.

@aignas (Collaborator) commented Dec 16, 2025

Should we close #3401 as part of this?

@rickeylev (Collaborator Author)

Yes, marked this as fixing #3401. I'm not sure what optimizations are left, action-count-wise.

@rickeylev rickeylev enabled auto-merge December 17, 2025 08:21
@rickeylev rickeylev added this pull request to the merge queue Dec 17, 2025
Merged via the queue into bazel-contrib:main with commit ca2c5b2 Dec 17, 2025
4 checks passed
@rickeylev rickeylev deleted the refactor.optimize.venv.pkgutil.namespace.packages branch December 17, 2025 08:40

Development

Successfully merging this pull request may close these issues.

venv site-packages building requires millions of actions

2 participants