
Conversation

@hugovk (Member) commented Jan 13, 2026

This is the same as python/cpython#143660.


We can apply @henryiii's improvement to packaging from pypa/packaging#1030 (see also https://iscinumpy.dev/post/packaging-faster/) here as well, making Prepared.normalize ~3.7 times faster.
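
For context, Prepared.normalize is currently roughly equivalent to the following one-liner (paraphrased from the importlib_metadata source; the PR changes how this transformation is computed, not what it returns):

import re

# Reference semantics of Prepared.normalize (paraphrased):
# PEP 503 normalization, then dashes replaced with underscores.
def normalize(name: str) -> str:
    return re.sub(r"[-_.]+", "-", name).lower().replace("-", "_")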

Benchmark

Run Prepared.normalize(n) on every project name on PyPI:

# benchmark_names.py
import sqlite3
import timeit
from importlib_metadata import Prepared

# Get data with:
# curl -L https://github.com/pypi-data/pypi-json-data/releases/download/latest/pypi-data.sqlite.gz | gzip -d > pypi-data.sqlite
# Or use pre-cached files from:
# https://gist.github.com/hugovk/efdbee0620cc64df7b405b52cf0b6e42

CACHE_FILE = "/tmp/bench/names.txt"
DB_FILE = "/tmp/bench/pypi-data.sqlite"

try:
    with open(CACHE_FILE) as f:
        TEST_ALL_NAMES = [line.rstrip("\n") for line in f]
except FileNotFoundError:
    TEST_ALL_NAMES = []
    with sqlite3.connect(DB_FILE) as conn:
        with open(CACHE_FILE, "w") as cache:
            for (name,) in conn.execute("SELECT name FROM projects"):
                if name:
                    TEST_ALL_NAMES.append(name)
                    cache.write(name + "\n")


def bench():
    for n in TEST_ALL_NAMES:
        Prepared.normalize(n)


if __name__ == "__main__":
    print(f"Loaded {len(TEST_ALL_NAMES):,} names")
    t = timeit.timeit("bench()", globals=globals(), number=1)
    print(f"Time: {t:.4f} seconds")

Benchmark data can be found at https://gist.github.com/hugovk/efdbee0620cc64df7b405b52cf0b6e42

Before

python3.14 --version
Python 3.14.2
python3.14 benchmark_names.py
Loaded 8,344,947 names
Time: 4.8933 seconds

After

python3.14 benchmark_names.py
Loaded 8,344,947 names
Time: 1.3266 seconds

4.8933 s / 1.3266 s ≈ 3.7 times faster.
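
As a sanity check (not part of this PR), the optimized Prepared.normalize can be compared against the original regex expression on the same name list. The reference expression below is paraphrased from the pre-PR implementation, and names.txt is the cache written by benchmark_names.py above:

# sanity_check.py - hypothetical follow-up, not part of this PR
import re

from importlib_metadata import Prepared

CACHE_FILE = "/tmp/bench/names.txt"  # produced by benchmark_names.py
reference = re.compile(r"[-_.]+")

with open(CACHE_FILE) as f:
    names = [line.rstrip("\n") for line in f]

# Any name where the optimized normalize disagrees with the old expression
mismatches = [
    n
    for n in names
    if Prepared.normalize(n) != reference.sub("-", n).lower().replace("-", "_")
]
print(f"Checked {len(names):,} names, found {len(mismatches):,} mismatches")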
