Describe the bug
If you import a module that runs datasets.map(f, num_proc=N) at the top-level, Python hangs.
Steps to reproduce the bug
- Create a file that runs datasets.map at the top-level:
cat <<EOF > import_me.py
import datasets
the_dataset = datasets.load_dataset("openai/openai_humaneval")
the_dataset = the_dataset.map(lambda item: item, num_proc=2)
EOF
- Start Python REPL:
uv run --python 3.12.3 --with "datasets==4.0.0" python3
Python 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
- Import the file:
import import_me
- Observe that the import hangs.
Expected behavior
Ideally the import would not hang, or map would fall back to num_proc=1 with a warning.
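To illustrate the fallback idea, here is a rough sketch of how a caller could detect that the module defining the mapped function is still mid-import and drop to a single process. The helper effective_num_proc is hypothetical and is not an existing datasets API. It also relies on the private CPython attribute `_initializing` on a module's `__spec__`, which is an implementation detail and may change between Python versions.

```python
import sys
import warnings


def effective_num_proc(requested: int, module_name: str) -> int:
    """Return 1 (with a warning) if `module_name` is still being imported."""
    module = sys.modules.get(module_name)
    spec = getattr(module, "__spec__", None)
    # Assumption: CPython sets spec._initializing to True while the module
    # body is executing. This is a private attribute, not a public API.
    if requested > 1 and getattr(spec, "_initializing", False):
        warnings.warn(
            f"{module_name} is still being imported; "
            "falling back to num_proc=1.",
            RuntimeWarning,
            stacklevel=2,
        )
        return 1
    return requested
```

A caller could pass the `__module__` of the mapped function as module_name, for example `effective_num_proc(2, f.__module__)`.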
Environment info
- datasets version: 4.0.0
- Platform: Linux-6.14.0-29-generic-x86_64-with-glibc2.39
- Python version: 3.12.3
- huggingface_hub version: 0.34.4
- PyArrow version: 21.0.0
- Pandas version: 2.3.2
- fsspec version: 2025.3.0