gh-143658: importlib.metadata: Use str.translate to improve performance of importlib.metadata.Prepared.normalized
#143660
base: main
Conversation
Co-Authored-By: Henry Schreiner <henryschreineriii@gmail.com>
Use `str.translate` to improve performance of `canonicalize_name`
Misc/NEWS.d/next/Library/2026-01-10-15-40-57.gh-issue-143658.Ox6pE5.rst
Use `str.translate` to improve performance of `importlib.metadata.Prepared.normalized`
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
picnixz
left a comment
Do we actually have tests? If not, maybe it'd be good to add some.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
johnslavik
left a comment
Small ideas
Co-authored-by: Bartosz Sławecki <bartosz@ilikepython.com>
```python
    PEP 503 normalization plus dashes as underscores.
    """
    return re.sub(r"[-_.]+", "-", name).lower().replace('-', '_')
    # Emulates ``re.sub(r"[-_.]+", "-", name).lower()`` from PEP 503
```
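For context, the translate-based rewrite can be sketched roughly as below. This is a minimal illustration of the technique, not necessarily the exact code merged in the PR; the table and function names are assumptions:

```python
import re

# Map '-' and '.' to '_' in a single pass; '_' already maps to itself.
_TRANSLATION_TABLE = str.maketrans("-.", "__")

def normalized(name):
    """Equivalent to re.sub(r"[-_.]+", "-", name).lower().replace('-', '_')."""
    result = name.translate(_TRANSLATION_TABLE).lower()
    # Collapse runs of underscores, mirroring the "+" in the regex.
    while "__" in result:
        result = result.replace("__", "_")
    return result
```

Skipping the regex engine entirely is where the roughly 3x speedup reported in the benchmarks in this thread comes from.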
@hugovk I did a quick scan of the 8.34M package names, and 3.17M are purely lowercase with no separators. Given that, I tried to add a fast path check here before we normalize the table and found strong improvements in the benchmark. I think the most readable version of the fast path would be:
```python
if name.islower() and name.isalnum():
    return name
```
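As a quick sanity check (illustrative, not from the PR): whenever `name.islower() and name.isalnum()` holds, the full normalization is a no-op, so returning the name unchanged is safe:

```python
import re

def regex_normalized(name):
    # The pre-PR, regex-based normalization.
    return re.sub(r"[-_.]+", "-", name).lower().replace('-', '_')

for name in ["pillow", "numpy", "requests2", "Pillow", "typing-extensions"]:
    if name.islower() and name.isalnum():
        # isalnum() rules out '-', '_' and '.', and islower() rules out
        # uppercase letters, so every normalization step leaves name intact.
        assert regex_normalized(name) == name
```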
I don't think it's worth it. What Hugo suggested is readable enough.
That's not an unreasonable position. My reasoning was that a significant portion of packages (roughly 38%) are already alphanumeric and lowercase. This fast path allows skipping the translation and loop overhead for the most common case. I felt the performance gain for those users justified the small increase in complexity, but I'm happy to defer to your preference on the balance between speed and code footprint.
How much performance gain are we speaking about though?
They're very close. If anything, the "fast path" seems to be a bit slower :)
```
❯ # main
❯ ./python.exe -m timeit -s "from importlib.metadata import Prepared" "Prepared.normalize('pillow')"
1000000 loops, best of 5: 390 nsec per loop
❯ ./python.exe -m timeit -s "from importlib.metadata import Prepared" "Prepared.normalize('pillow')"
1000000 loops, best of 5: 393 nsec per loop

❯ # PR
❯ ./python.exe -m timeit -s "from importlib.metadata import Prepared" "Prepared.normalize('pillow')"
5000000 loops, best of 5: 95.8 nsec per loop
❯ ./python.exe -m timeit -s "from importlib.metadata import Prepared" "Prepared.normalize('pillow')"
5000000 loops, best of 5: 96 nsec per loop

❯ # fast path
❯ ./python.exe -m timeit -s "from importlib.metadata import Prepared" "Prepared.normalize('pillow')"
5000000 loops, best of 5: 94.3 nsec per loop
❯ ./python.exe -m timeit -s "from importlib.metadata import Prepared" "Prepared.normalize('pillow')"
5000000 loops, best of 5: 97.5 nsec per loop

❯ hyperfine --warmup 1 --runs 3 \
    --prepare "git checkout main" "./python.exe benchmark_names_stdlib.py # main" \
    --prepare "git checkout 3.15-importlib.metadata-canonicalize_name" "./python.exe benchmark_names_stdlib.py # PR" \
    --prepare "git checkout 3.15-importlib.metadata-canonicalize_name-fast-path" "./python.exe benchmark_names_stdlib.py # fast path"
Benchmark 1: ./python.exe benchmark_names_stdlib.py # main
  Time (mean ± σ):     5.633 s ±  0.046 s    [User: 5.491 s, System: 0.101 s]
  Range (min … max):   5.592 s …  5.683 s    3 runs

Benchmark 2: ./python.exe benchmark_names_stdlib.py # PR
  Time (mean ± σ):     1.879 s ±  0.026 s    [User: 1.783 s, System: 0.081 s]
  Range (min … max):   1.858 s …  1.907 s    3 runs

Benchmark 3: ./python.exe benchmark_names_stdlib.py # fast path
  Time (mean ± σ):     1.952 s ±  0.005 s    [User: 1.863 s, System: 0.080 s]
  Range (min … max):   1.947 s …  1.957 s    3 runs

Summary
  ./python.exe benchmark_names_stdlib.py # PR ran
    1.04 ± 0.01 times faster than ./python.exe benchmark_names_stdlib.py # fast path
    3.00 ± 0.05 times faster than ./python.exe benchmark_names_stdlib.py # main
```
Running a slightly modified version of the benchmark above (timeit, best of 3) on my MacBook (Apple Silicon), main branch, with a debug build of CPython:
- Current PR (translate + loop): 5.4756 s
- With fast path (isalnum): 4.4691 s

So for me, that's about an 18.4% reduction in total time (or a ~22% speedup) on the full PyPI benchmark.
Ah, we posted at about the same time. Interesting results on my end compared to yours, but I would consider yours canonical (especially since I'm on a debug build with the extra overhead), so please disregard my comments then @hugovk :)
We can apply @henryiii's improvement to `packaging` in pypa/packaging#1030 (see also https://iscinumpy.dev/post/packaging-faster/) to improve the performance of `canonicalize_name` and make it ~3.7 times faster.

Benchmark

Run `Prepared.normalize(n)` on every name in PyPI. Benchmark data can be found at https://gist.github.com/hugovk/efdbee0620cc64df7b405b52cf0b6e42
Before
With optimisations:
After
3.7 times faster.
`str.translate` to improve performance of `importlib.metadata.Prepared.normalized` #143658