
Conversation

@hugovk (Member) commented Jan 13, 2026

This is the same as python/cpython#143660.


We can apply @henryiii's improvement to packaging from pypa/packaging#1030 (see also https://iscinumpy.dev/post/packaging-faster/) here as well, making Prepared.normalize ~3.7 times faster.
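
For context, Prepared.normalize is currently roughly equivalent to the following one-liner (paraphrased from the importlib_metadata source; the PR changes how this transformation is computed, not what it returns):

import re

# Reference semantics of Prepared.normalize (paraphrased):
# PEP 503 normalization, then dashes replaced with underscores.
def normalize(name: str) -> str:
    return re.sub(r"[-_.]+", "-", name).lower().replace("-", "_")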

Benchmark

Run Prepared.normalize(n) on every project name on PyPI:

# benchmark_names.py
import sqlite3
import timeit
from importlib_metadata import Prepared

# Get data with:
# curl -L https://github.com/pypi-data/pypi-json-data/releases/download/latest/pypi-data.sqlite.gz | gzip -d > pypi-data.sqlite
# Or use pre-cached files from:
# https://gist.github.com/hugovk/efdbee0620cc64df7b405b52cf0b6e42

CACHE_FILE = "/tmp/bench/names.txt"
DB_FILE = "/tmp/bench/pypi-data.sqlite"

try:
    with open(CACHE_FILE) as f:
        TEST_ALL_NAMES = [line.rstrip("\n") for line in f]
except FileNotFoundError:
    TEST_ALL_NAMES = []
    with sqlite3.connect(DB_FILE) as conn:
        with open(CACHE_FILE, "w") as cache:
            for (name,) in conn.execute("SELECT name FROM projects"):
                if name:
                    TEST_ALL_NAMES.append(name)
                    cache.write(name + "\n")


def bench():
    for n in TEST_ALL_NAMES:
        Prepared.normalize(n)


if __name__ == "__main__":
    print(f"Loaded {len(TEST_ALL_NAMES):,} names")
    t = timeit.timeit("bench()", globals=globals(), number=1)
    print(f"Time: {t:.4f} seconds")

Benchmark data can be found at https://gist.github.com/hugovk/efdbee0620cc64df7b405b52cf0b6e42

Before

python3.14 --version
Python 3.14.2
python3.14 benchmark_names.py
Loaded 8,344,947 names
Time: 4.8933 seconds

After

python3.14 benchmark_names.py
Loaded 8,344,947 names
Time: 1.3266 seconds

4.8933 s / 1.3266 s ≈ 3.7 times faster.
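
As a sanity check (not part of this PR), the optimized Prepared.normalize can be compared against the original regex expression on the same name list. The reference expression below is paraphrased from the pre-PR implementation, and names.txt is the cache written by benchmark_names.py above:

# sanity_check.py - hypothetical follow-up, not part of this PR
import re

from importlib_metadata import Prepared

CACHE_FILE = "/tmp/bench/names.txt"  # produced by benchmark_names.py
reference = re.compile(r"[-_.]+")

with open(CACHE_FILE) as f:
    names = [line.rstrip("\n") for line in f]

# Any name where the optimized normalize disagrees with the old expression
mismatches = [
    n
    for n in names
    if Prepared.normalize(n) != reference.sub("-", n).lower().replace("-", "_")
]
print(f"Checked {len(names):,} names, found {len(mismatches):,} mismatches")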
