
Support Nifti feature type (bump datasets dependency) #1

@The-Obstacle-Is-The-Way


Problem

The Dataset Viewer fails on datasets that use the Nifti feature type with:

ValueError: Feature type 'Nifti' not found. Available feature types: ['Value', 'ClassLabel', 'Translation', 'TranslationVariableLanguages', 'LargeList', 'List', 'Array2D', 'Array3D', 'Array4D', 'Array5D', 'Audio', 'Image', 'Video', 'Pdf']

Affected dataset: hugging-science/arc-aphasia-bids

Root Cause

  1. huggingface/datasets added Nifti support in PR #7815 (merged 2025-10-24)
  2. dataset-viewer pins datasets==4.1.1 in libs/libcommon/pyproject.toml
  3. Nifti currently exists only on the datasets main branch; it will ship in the next PyPI release

When the Dataset Viewer parses a dataset card (README.md) whose YAML frontmatter declares a feature with dtype: nifti, it crashes because datasets 4.1.1 has no Nifti entry in its feature-type registry.
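To make the failure mode concrete, here is a simplified, hypothetical model of how a YAML dtype string is turned into a feature type name and looked up. The registry contents are copied from the 4.1.1 error message above; the helper names, the PRIMITIVE_DTYPES subset, and the control flow are assumptions for illustration, not the datasets source.

```python
# Simplified, hypothetical model of how a dataset card's YAML "dtype" is resolved
# to a feature type. It mirrors the behavior seen in the traceback; it is NOT the
# actual datasets implementation.

# Feature type names that datasets 4.1.1 reports in the error message above.
FEATURE_TYPES_4_1_1 = (
    "Value", "ClassLabel", "Translation", "TranslationVariableLanguages",
    "LargeList", "List", "Array2D", "Array3D", "Array4D", "Array5D",
    "Audio", "Image", "Video", "Pdf",
)

# Assumption: a small subset of primitive dtypes that map to Value.
PRIMITIVE_DTYPES = {"bool", "int32", "int64", "float32", "float64", "string"}


def snakecase_to_camelcase(name: str) -> str:
    """Turn e.g. 'nifti' into 'Nifti' or 'foo_bar' into 'FooBar'."""
    return "".join(part.capitalize() for part in name.split("_"))


def resolve_yaml_feature(spec: dict, registry) -> str:
    """Return the feature type name for a YAML spec such as {'dtype': 'image'}."""
    dtype = spec["dtype"]
    type_name = "Value" if dtype in PRIMITIVE_DTYPES else snakecase_to_camelcase(dtype)
    if type_name not in registry:
        # Same message shape as the ValueError in the traceback.
        raise ValueError(
            f"Feature type '{type_name}' not found. "
            f"Available feature types: {list(registry)}"
        )
    return type_name


print(resolve_yaml_feature({"dtype": "image"}, FEATURE_TYPES_4_1_1))  # Image
try:
    resolve_yaml_feature({"dtype": "nifti"}, FEATURE_TYPES_4_1_1)
except ValueError as err:
    print(err)  # the same "Feature type 'Nifti' not found" error the worker hits
```

In this model, adding "Nifti" to the registry is the only change needed for dtype: nifti to resolve, which is what upgrading datasets provides.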

Full Traceback

File "/src/services/worker/src/worker/job_runners/dataset/config_names.py", line 66, in compute_config_names_response
  config_names = get_dataset_config_names(
File "/usr/local/lib/python3.12/site-packages/datasets/inspect.py", line 161, in get_dataset_config_names
  dataset_module = dataset_module_factory(
File "/usr/local/lib/python3.12/site-packages/datasets/load.py", line 1031, in dataset_module_factory
  raise e1 from None
File "/usr/local/lib/python3.12/site-packages/datasets/load.py", line 605, in get_module
  dataset_infos = DatasetInfosDict.from_dataset_card_data(dataset_card_data)
File "/usr/local/lib/python3.12/site-packages/datasets/info.py", line 386, in from_dataset_card_data
  dataset_info = DatasetInfo._from_yaml_dict(dataset_card_data["dataset_info"])
File "/usr/local/lib/python3.12/site-packages/datasets/info.py", line 317, in _from_yaml_dict
  yaml_data["features"] = Features._from_yaml_list(yaml_data["features"])
File "/usr/local/lib/python3.12/site-packages/datasets/features/features.py", line 2031, in _from_yaml_list
  return cls.from_dict(from_yaml_inner(yaml_data))
File "/usr/local/lib/python3.12/site-packages/datasets/features/features.py", line 1876, in from_dict
  obj = generate_from_dict(dic)
File "/usr/local/lib/python3.12/site-packages/datasets/features/features.py", line 1463, in generate_from_dict
  return {key: generate_from_dict(value) for key, value in obj.items()}
File "/usr/local/lib/python3.12/site-packages/datasets/features/features.py", line 1469, in generate_from_dict
  raise ValueError(f"Feature type '{_type}' not found. Available feature types: {list(_FEATURE_TYPES.keys())}")
ValueError: Feature type 'Nifti' not found.

Proposed Fix

Bump the datasets dependency in libs/libcommon/pyproject.toml to a release that includes Nifti:

# Current
datasets = {version = "4.1.1", extras = ["vision"]}

# After the next datasets release (or point at the git main branch until then)
datasets = {version = ">=4.5.0", extras = ["vision"]}
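After bumping the pin, a quick pre-flight check can confirm that the installed datasets actually knows the Nifti type. This is a sketch, not part of dataset-viewer: the function names are hypothetical, and it relies on the private _FEATURE_TYPES registry that appears in the traceback, which could change between releases.

```python
import importlib.metadata
from typing import Optional


def installed_datasets_version() -> Optional[str]:
    """Version string of the installed datasets package, or None if absent."""
    try:
        return importlib.metadata.version("datasets")
    except importlib.metadata.PackageNotFoundError:
        return None


def supports_nifti() -> bool:
    """True if the installed datasets registers a 'Nifti' feature type."""
    try:
        # _FEATURE_TYPES is the private registry named in the traceback; being
        # private, it may be renamed in future datasets releases.
        from datasets.features.features import _FEATURE_TYPES
    except ImportError:
        return False
    return "Nifti" in _FEATURE_TYPES


if not supports_nifti():
    print(f"datasets {installed_datasets_version()} cannot parse 'dtype: nifti' cards")
```

With datasets 4.1.1 installed this check should report False, matching the ValueError above.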

Context

  • Nifti is the feature type for NIfTI neuroimaging files (.nii, .nii.gz), a standard format for MRI/fMRI data
  • The datasets library (main branch) now has full Nifti support, including:
    • Nifti() feature type (#7815)
    • embed_storage for uploads (#7853)
    • Visualization via niivue (#7878)
  • Together, these let ML workflows load, upload, and preview medical imaging datasets such as hugging-science/arc-aphasia-bids
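As background on the format itself, a NIfTI file can be recognized from its header rather than its extension. The sketch below assumes the published NIfTI-1/NIfTI-2 header layout (an int32 sizeof_hdr of 348 or 540 at offset 0, and a magic string at offset 344 or 4). The function name is hypothetical, and this is only an illustration; it is not how the datasets Nifti feature is implemented.

```python
import gzip
import struct
from pathlib import Path
from typing import Optional

# Magic strings from the NIfTI-1 and NIfTI-2 header layouts
# ("n+?" = single .nii file, "ni?" = separate .hdr/.img pair).
NIFTI1_MAGICS = (b"n+1\x00", b"ni1\x00")
NIFTI2_MAGICS = (b"n+2\x00\r\n\x1a\n", b"ni2\x00\r\n\x1a\n")


def nifti_version(path) -> Optional[int]:
    """Return 1 or 2 if the file has a NIfTI-1/NIfTI-2 header, else None."""
    p = Path(path)
    opener = gzip.open if p.name.endswith(".gz") else open  # .nii.gz is gzip-compressed
    with opener(p, "rb") as fh:
        header = fh.read(348)
    if len(header) < 12:
        return None
    # sizeof_hdr (int32 at offset 0) is 348 for NIfTI-1 and 540 for NIfTI-2;
    # the header may be stored in either byte order, so try both.
    for byte_order in ("<", ">"):
        (sizeof_hdr,) = struct.unpack(byte_order + "i", header[:4])
        if sizeof_hdr == 348 and header[344:348] in NIFTI1_MAGICS:
            return 1
        if sizeof_hdr == 540 and header[4:12] in NIFTI2_MAGICS:
            return 2
    return None
```

For a single-file NIfTI-1 scan, nifti_version returns 1 whether the file is stored as .nii or .nii.gz.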
