Commit 2e3219e

fix(deps): require datasets from git for Nifti.embed_storage bug fix (#4)
The stable release of HuggingFace datasets (4.4.x) has a critical bug where NIfTI files are uploaded as empty bytes (0 bytes) to Hub because Nifti.embed_storage was broken. The fix is only in the dev branch.

Changes:
- Add [tool.uv.sources] to override datasets to git version
- Bump datasets minimum to >=4.4.0 (Nifti support)
- Add huggingface-hub>=0.32.0 (XET storage support)
- Add Critical Dependency section to README with verification steps

See: huggingface/datasets#7815
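Because the failure described in this commit message is silent, one way to catch it is to scan rows for NIfTI payloads that are missing or zero bytes long. The following is a minimal sketch, not code from this repository: the helper `find_empty_payloads` and the column name `nifti` are hypothetical, and it assumes each row holds the file as a mapping with a `bytes` entry, which may not match how `datasets` represents the `Nifti` feature internally.

```python
from collections.abc import Iterable, Mapping
from typing import Any


def find_empty_payloads(rows: Iterable[Mapping[str, Any]], column: str) -> list[int]:
    """Return the indices of rows whose embedded payload is missing or 0 bytes."""
    empty = []
    for index, row in enumerate(rows):
        cell = row.get(column)
        # Assumption: an embedded file is a mapping with a "bytes" entry.
        payload = cell.get("bytes") if isinstance(cell, Mapping) else None
        if not payload:  # None and b"" both count as empty
            empty.append(index)
    return empty


# Hypothetical rows: the first carries data, the other two do not.
rows = [
    {"nifti": {"bytes": b"\x5c\x01\x00\x00", "path": "sub-01_T1w.nii.gz"}},
    {"nifti": {"bytes": b"", "path": "sub-02_T1w.nii.gz"}},
    {"nifti": {"bytes": None, "path": "sub-03_T1w.nii.gz"}},
]
print(find_empty_payloads(rows, "nifti"))  # [1, 2]
```

A row whose `bytes` entry is `None` may simply reference a file by path instead of embedding it, so flagged indices are candidates to inspect rather than proof of the bug.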
1 parent 1bb192a commit 2e3219e
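The verification step added to the README prints the installed version and leaves the comparison to the reader. A small sketch of automating that comparison is shown below, using only the standard library. The names `release_tuple` and `looks_fixed` are hypothetical, and the check assumes that any build numbered 4.4.2.dev0 or later carries the fix, which a version string alone cannot prove.

```python
import re

# Release segment of 4.4.2.dev0, the first version the README names as fixed.
FIXED_FROM = (4, 4, 2)


def release_tuple(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release segment, padded to three parts."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts + (0,) * (3 - len(parts))


def looks_fixed(version: str) -> bool:
    """True when the release segment is 4.4.2 or newer (dev builds included)."""
    return release_tuple(version) >= FIXED_FROM


for candidate in ("4.4.1", "4.4.2.dev0", "4.5.0"):
    print(candidate, looks_fixed(candidate))
```

In a real environment the argument would be `datasets.__version__`. Suffixes such as `.dev0` are ignored on purpose, since every 4.4.2 pre-release sorts at or after 4.4.2.dev0.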

2 files changed: +68 -1 lines changed

README.md

Lines changed: 58 additions & 0 deletions
@@ -2,6 +2,8 @@

 Template for converting BIDS neuroimaging datasets (e.g., ARC, SOOP) into Hugging Face Datasets with NIfTI + tabular features.

+> ⚠️ **CRITICAL DEPENDENCY WARNING**: This template requires installing `datasets` from GitHub (not PyPI). See [Critical Dependency Requirement](#critical-dependency-requirement-huggingface-datasets) below.
+
 ## Overview

 This repository provides a **reusable template** for:
@@ -43,6 +45,62 @@ uv run hf-bids-nifti --help

 > **Note:** ARC and SOOP commands are templates that will raise `NotImplementedError` until their file-table builders are implemented.

+## Critical Dependency Requirement: HuggingFace Datasets
+
+### The Problem
+
+The stable release of `datasets` (PyPI versions 3.x, 4.x including 4.4.1) has a **critical bug** where NIfTI files are uploaded as **empty bytes (0 bytes)** to HuggingFace Hub. This happens because `Nifti.embed_storage` was broken in stable releases.
+
+- **Silent failure**: `push_to_hub(embed_external_files=True)` uploads 0-byte files
+- **No error raised**: Your dataset appears to have data but all NIfTI files are empty
+- **Only visible when loading**: `load_dataset()` returns empty/corrupted images
+
+See [huggingface/datasets#7815](https://github.com/huggingface/datasets/pull/7815) for the original Nifti support PR and subsequent bug reports.
+
+### The Fix
+
+Install `datasets` from the GitHub main branch (dev version 4.4.2.dev0 or later). This template is pre-configured to do this via `[tool.uv.sources]` in `pyproject.toml`.
+
+### Verification
+
+After installation, verify the version:
+
+```python
+import datasets
+print(datasets.__version__) # Should show "4.4.2.dev0" or similar dev version
+```
+
+### Manual Installation (if not using this template)
+
+**For uv (pyproject.toml):**
+
+```toml
+[project]
+dependencies = [
+"datasets>=4.4.0", # Minimum version for Nifti support
+"huggingface-hub>=0.32.0", # Required for XET storage
+# ... other deps
+]
+
+[tool.uv.sources]
+# CRITICAL: Override datasets to use git version for Nifti.embed_storage fix
+datasets = { git = "https://github.com/huggingface/datasets.git" }
+```
+
+**For pip/requirements.txt:**
+
+```text
+datasets @ git+https://github.com/huggingface/datasets.git
+```
+
+**Direct uv command:**
+
+```bash
+uv add "datasets @ git+https://github.com/huggingface/datasets.git"
+```
+
+> **Note**: This requirement will change once the fix is merged into a stable release. Check [HuggingFace datasets releases](https://github.com/huggingface/datasets/releases) for updates.
+
 ## Project Structure

 ```

pyproject.toml

Lines changed: 10 additions & 1 deletion
@@ -22,13 +22,22 @@ classifiers = [
 ]

 dependencies = [
-"datasets>=3.4.0",
+"datasets>=4.4.0", # Minimum version with Nifti support (see [tool.uv.sources] for critical override)
+"huggingface-hub>=0.32.0", # Required for XET storage support
 "nibabel>=5.0.0",
 "pandas>=2.0.0",
 "typer>=0.12.0",
 "pydantic>=2.0.0",
 ]

+# CRITICAL: Override datasets to use git version for Nifti.embed_storage fix
+# The stable release (4.4.x) has a bug where NIfTI files are uploaded as empty bytes (0 bytes)
+# to HuggingFace Hub because Nifti.embed_storage was broken. The fix is in the dev branch only.
+# See: https://github.com/huggingface/datasets/pull/7815 (original Nifti support)
+# This requirement will change once the fix is merged into a stable release.
+[tool.uv.sources]
+datasets = { git = "https://github.com/huggingface/datasets.git" }
+
 [project.optional-dependencies]
 dev = [
 "pytest>=8.0.0",
