Skip to content

Commit ca2c5b2

Browse files
refactor: optimize venv creation for nvidia and pkgutil style namespace packages (#3460)
When pkgutil style namespace packages are used, multiple distributions provide the same venv path (e.g. `foo/__init__.py`). The venv symlink logic then tries to symlink the `foo/` directory as it looks like the highest linkable directory. When conflict merging logic runs later, it then has to flatten a depset with all the files in the conflicting distributions. To fix, have whl_library() try to guess when a file is a pkgutil namespace package. These are then pass onto py_library's venv building logic so it can treat the directories as not directly linkable. A conflict still occurs, but it only contains the single `__init__.py` file. Along the way, special case the "nvidia" package name and always treat it as a namespace package. This is because nvidia packages aren't strictly correct: each has a blank `__init__.py` file (which marks it as a regular package, not namespace package). Special casing like this is undesirable, but it greatly reduces the number of conflicts if e.g. torch is installed, and I couldn't find any other metadata to indicate it's a namespace package. Along the way, add some hints to AGENTS.md so they understand repository rules better. Fixes #3401 --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
1 parent 10b28ef commit ca2c5b2

File tree

28 files changed

+458
-32
lines changed

28 files changed

+458
-32
lines changed

.bazelrc

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,8 @@ build --//python/config_settings:incompatible_default_to_explicit_init_py=True
1818

1919
# Ensure ongoing compatibility with this flag.
2020
common --incompatible_disallow_struct_provider_syntax
21+
# Makes Bazel 7 act more like Bazel 8
22+
common --incompatible_use_plus_in_repo_names
2123

2224
# Windows makes use of runfiles for some rules
2325
build --enable_runfiles

AGENTS.md

Lines changed: 62 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,68 @@ end of the documentation text.
2727
For doc strings, using triple quoted strings when the doc string is more than
2828
three lines. Do not use a trailing backslack (`\`) for the opening triple-quote.
2929

30+
### Starlark Code
31+
32+
Starlark does not support recursion. Use iterative algorithms instead.
33+
34+
Starlark does not support `while` loops. Use `for` loop with an appropriately
35+
sized iterable instead.
36+
37+
#### Starlark testing
38+
39+
For Starlark tests:
40+
41+
* Use `rules_testing`, not `bazel_skylib`.
42+
* See https://rules-testing.readthedocs.io/en/latest/analysis_tests.html for
43+
examples on using rules_testing.
44+
* See `tests/builders/builders_tests.bzl` for an example of using it in
45+
this project.
46+
47+
A test is defined in two parts:
48+
* A setup function, e.g. `def _test_foo(name)`. This defines targets
49+
and calls `analysis_test`.
50+
* An implementation function, e.g. `def _test_foo_impl(env, target)`. This
51+
contains asserts.
52+
53+
Example:
54+
55+
```
56+
# File: foo_tests.bzl
57+
58+
load("@rules_testing//lib:analysis_test.bzl", "analysis_test")
59+
load("@rules_testing//lib:test_suite.bzl", "test_suite")
60+
61+
_tests = []
62+
63+
def _test_foo(name):
64+
foo_library(
65+
name = name + "_subject",
66+
)
67+
analysis_test(
68+
name = name,
69+
impl = _test_foo_impl,
70+
target = name + "_subject",
71+
)
72+
_tests.append(_test_foo)
73+
74+
def _test_foo_impl(env, target):
75+
env.expect.that_whatever(target[SomeInfo].whatever).equals(expected)
76+
77+
def foo_test_suite(name):
78+
test_suite(name=name, tests=_tests)
79+
```
80+
81+
82+
#### Repository rules
83+
84+
The function argument `rctx` is a hint that the function is a repository rule,
85+
or used by a repository rule.
86+
87+
The function argument `mrctx` is a hint that the function can be used by a
88+
repository rule or module extension.
89+
90+
The `repository_ctx` API docs are at: https://bazel.build/rules/lib/builtins/repository_ctx
91+
3092
### bzl_library targets for bzl source files
3193

3294
* A `bzl_library` target should be defined for every `.bzl` file outside

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -118,6 +118,8 @@ END_UNRELEASED_TEMPLATE
118118
`RULES_PYTHON_ENABLE_PIPSTAR=1` by default. Users of `experimental_index_url` that perform
119119
cross-builds should add {obj}`target_platforms` to their `pip.parse` invocations, which will
120120
become mandatory if any cross-builds are required from the next release.
121+
* (py_library) Attribute {obj}`namespace_package_files` added. It is a hint for
122+
optimizing venv creation.
121123

122124
[20251031]: https://github.com/astral-sh/python-build-standalone/releases/tag/20251031
123125
[20251202]: https://github.com/astral-sh/python-build-standalone/releases/tag/20251202

MODULE.bazel

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -238,6 +238,8 @@ use_repo(
238238
"buildkite_config",
239239
"implicit_namespace_ns_sub1",
240240
"implicit_namespace_ns_sub2",
241+
"pkgutil_nspkg1",
242+
"pkgutil_nspkg2",
241243
"rules_python_runtime_env_tc_info",
242244
"somepkg_with_build_files",
243245
"whl_with_build_files",

python/private/BUILD.bazel

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -675,6 +675,7 @@ bzl_library(
675675
"//:__subpackages__",
676676
],
677677
deps = [
678+
":py_internal_bzl",
678679
"@bazel_skylib//lib:types",
679680
],
680681
)

python/private/internal_dev_deps.bzl

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -67,6 +67,30 @@ def _internal_dev_deps_impl(mctx):
6767
enable_implicit_namespace_pkgs = False,
6868
)
6969

70+
whl_from_dir_repo(
71+
name = "pkgutil_nspkg1_whl",
72+
root = "//tests/repos/pkgutil_nspkg1:BUILD.bazel",
73+
output = "pkgutil_nspkg1-1.0-any-none-any.whl",
74+
)
75+
whl_library(
76+
name = "pkgutil_nspkg1",
77+
whl_file = "@pkgutil_nspkg1_whl//:pkgutil_nspkg1-1.0-any-none-any.whl",
78+
requirement = "pkgutil_nspkg1",
79+
enable_implicit_namespace_pkgs = False,
80+
)
81+
82+
whl_from_dir_repo(
83+
name = "pkgutil_nspkg2_whl",
84+
root = "//tests/repos/pkgutil_nspkg2:BUILD.bazel",
85+
output = "pkgutil_nspkg2-1.0-any-none-any.whl",
86+
)
87+
whl_library(
88+
name = "pkgutil_nspkg2",
89+
whl_file = "@pkgutil_nspkg2_whl//:pkgutil_nspkg2-1.0-any-none-any.whl",
90+
requirement = "pkgutil_nspkg2",
91+
enable_implicit_namespace_pkgs = False,
92+
)
93+
7094
internal_dev_deps = module_extension(
7195
implementation = _internal_dev_deps_impl,
7296
doc = "This extension creates internal rules_python dev dependencies.",

python/private/py_library.bzl

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -99,6 +99,20 @@ The topological order has been removed and if 2 different versions of the same P
9999
package are observed, the behaviour has no guarantees except that it is deterministic
100100
and that only one package version will be included.
101101
:::
102+
""",
103+
),
104+
"namespace_package_files": lambda: attrb.LabelList(
105+
allow_empty = True,
106+
allow_files = True,
107+
doc = """
108+
Files whose directories are namespace packages.
109+
110+
When {obj}`--venv_site_packages=yes` is set, this helps inform which directories should be
111+
treated as namespace packages and expect files from other targets to be contributed.
112+
This allows optimizing the generation of symlinks to be cheaper at analysis time.
113+
114+
:::{versionadded} VERSION_NEXT_FEATURE
115+
:::
102116
""",
103117
),
104118
"_add_srcs_to_runfiles_flag": lambda: attrb.Label(
@@ -251,6 +265,7 @@ def _get_imports_and_venv_symlinks(ctx, semantics):
251265
package,
252266
version_str,
253267
site_packages_root = imports[0],
268+
namespace_package_files = ctx.files.namespace_package_files,
254269
)
255270
else:
256271
imports = collect_imports(ctx, semantics)

python/private/pypi/BUILD.bazel

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -431,16 +431,16 @@ bzl_library(
431431
":attrs_bzl",
432432
":deps_bzl",
433433
":generate_whl_library_build_bazel_bzl",
434+
":parse_whl_name_bzl",
434435
":patch_whl_bzl",
435-
":pep508_requirement_bzl",
436436
":pypi_repo_utils_bzl",
437437
":whl_metadata_bzl",
438438
":whl_target_platforms_bzl",
439439
"//python/private:auth_bzl",
440-
"//python/private:bzlmod_enabled_bzl",
441440
"//python/private:envsubst_bzl",
442441
"//python/private:is_standalone_interpreter_bzl",
443442
"//python/private:repo_utils_bzl",
443+
"//python/private:util_bzl",
444444
"@rules_python_internal//:rules_python_config_bzl",
445445
],
446446
)

python/private/pypi/pypi_repo_utils.bzl

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@
1616

1717
load("@bazel_skylib//lib:types.bzl", "types")
1818
load("//python/private:repo_utils.bzl", "repo_utils")
19+
load("//python/private:util.bzl", "is_importable_name")
1920

2021
def _get_python_interpreter_attr(mrctx, *, python_interpreter = None):
2122
"""A helper function for getting the `python_interpreter` attribute or it's default
@@ -161,9 +162,44 @@ def _execute_checked_stdout(mrctx, *, python, srcs, **kwargs):
161162
**_execute_prep(mrctx, python = python, srcs = srcs, **kwargs)
162163
)
163164

165+
def _find_namespace_package_files(rctx, install_dir):
166+
"""Finds all `__init__.py` files that belong to namespace packages.
167+
168+
A `__init__.py` file belongs to a namespace package if it contains `__path__ =`,
169+
`pkgutil`, and `extend_path(`.
170+
171+
Args:
172+
rctx (repository_ctx): The repository context.
173+
install_dir (path): The path to the install directory.
174+
175+
Returns:
176+
list[str]: A list of relative paths to `__init__.py` files that belong
177+
to namespace packages.
178+
"""
179+
180+
repo_root = str(rctx.path(".")) + "/"
181+
namespace_package_files = []
182+
for top_level_dir in install_dir.readdir():
183+
if not is_importable_name(top_level_dir.basename):
184+
continue
185+
init_py = top_level_dir.get_child("__init__.py")
186+
if not init_py.exists:
187+
continue
188+
content = rctx.read(init_py)
189+
190+
# Look for code resembling the pkgutil namespace setup code:
191+
# __path__ = __import__("pkgutil").extend_path(__path__, __name__)
192+
if ("__path__ =" in content and
193+
"pkgutil" in content and
194+
"extend_path(" in content):
195+
namespace_package_files.append(str(init_py).removeprefix(repo_root))
196+
197+
return namespace_package_files
198+
164199
pypi_repo_utils = struct(
165200
construct_pythonpath = _construct_pypath,
166201
execute_checked = _execute_checked,
167202
execute_checked_stdout = _execute_checked_stdout,
203+
find_namespace_package_files = _find_namespace_package_files,
168204
resolve_python_interpreter = _resolve_python_interpreter,
169205
)

python/private/pypi/whl_library.bzl

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -384,11 +384,13 @@ def _whl_library_impl(rctx):
384384
supports_whl_extraction = rp_config.supports_whl_extraction,
385385
)
386386

387+
install_dir_path = whl_path.dirname.get_child("site-packages")
387388
metadata = whl_metadata(
388-
install_dir = whl_path.dirname.get_child("site-packages"),
389+
install_dir = install_dir_path,
389390
read_fn = rctx.read,
390391
logger = logger,
391392
)
393+
namespace_package_files = pypi_repo_utils.find_namespace_package_files(rctx, install_dir_path)
392394

393395
# NOTE @aignas 2024-06-22: this has to live on until we stop supporting
394396
# passing `twine` as a `:pkg` library via the `WORKSPACE` builds.
@@ -432,6 +434,7 @@ def _whl_library_impl(rctx):
432434
data_exclude = rctx.attr.pip_data_exclude,
433435
group_deps = rctx.attr.group_deps,
434436
group_name = rctx.attr.group_name,
437+
namespace_package_files = namespace_package_files,
435438
)
436439
else:
437440
target_platforms = rctx.attr.experimental_target_platforms or []
@@ -491,6 +494,8 @@ def _whl_library_impl(rctx):
491494
)
492495
entry_points[entry_point_without_py] = entry_point_script_name
493496

497+
namespace_package_files = pypi_repo_utils.find_namespace_package_files(rctx, rctx.path("site-packages"))
498+
494499
build_file_contents = generate_whl_library_build_bazel(
495500
name = whl_path.basename,
496501
sdist_filename = sdist_filename,
@@ -509,6 +514,7 @@ def _whl_library_impl(rctx):
509514
"pypi_name={}".format(metadata["name"]),
510515
"pypi_version={}".format(metadata["version"]),
511516
],
517+
namespace_package_files = namespace_package_files,
512518
)
513519

514520
# Delete these in case the wheel had them. They generally don't cause

0 commit comments

Comments
 (0)