Open
Description
Describe the bug
I am getting persistent 401 Unauthorized errors when uploading datasets to the Hugging Face Hub with the datasets library. The error occurs specifically in the CAS (Content Addressable Storage) service during the upload. I tried setting HF_HUB_DISABLE_XET=1. Uploads seem to work for smaller files.
Exact error message:
Processing Files (0 / 0) : | | 0.00B / 0.00B
2025-09-10T09:44:35.657565Z ERROR Fatal Error: "cas::upload_xorb" api call failed (request id 01b[...]XXX): HTTP status client error (401 Unauthorized) for url (https://cas-server.xethub.hf.co/xorb/default/7f3abdc[...]XXX)
    at /home/runner/work/xet-core/xet-core/cas_client/src/retry_wrapper.rs:113
Processing Files (0 / 0) : 0%| | 0.00B / 184kB, 0.00B/s
New Data Upload : 0%| | 0.00B / 184kB, 0.00B/s
❌ Failed to push some_dataset: Data processing error: CAS service error : Reqwest Error: HTTP status client error (401 Unauthorized), domain: https://cas-server.xethub.hf.co/xorb/default/7f3abdc[...]XXX
Workaround Attempts
- Disabled XET: Set the HF_HUB_DISABLE_XET=1 environment variable
- Updated hf-xet: Used hf-xet==1.1.9 rather than the latest release
- Verified Authentication: Confirmed HF token is valid and has write permissions
- Tested with Smaller Datasets:
  - 100 samples: ✅ SUCCESS (uploaded successfully)
  - 10,000 samples: ❌ FAILS (401 Unauthorized)
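For anyone retrying the first workaround, here is a minimal sketch of setting the variable from Python. It assumes (this is my understanding, not something confirmed in this report) that huggingface_hub reads HF_HUB_DISABLE_XET when it is first imported, so the variable has to be set before datasets or huggingface_hub is loaded. The helper name `disable_xet` is hypothetical.

```python
import os
import sys


def disable_xet() -> bool:
    """Set HF_HUB_DISABLE_XET=1 for the current process.

    Returns True if huggingface_hub had not been imported yet, i.e. the
    setting was applied early enough to be picked up on import (assumption:
    the library reads the variable at import time). Returns False if the
    library was already loaded, in which case the process may need a restart.
    """
    already_imported = "huggingface_hub" in sys.modules
    os.environ["HF_HUB_DISABLE_XET"] = "1"
    return not already_imported
```

Calling `disable_xet()` at the very top of the upload script, before `from datasets import Dataset`, and checking the return value gives a quick signal of whether the workaround was actually in effect for a given run.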
Steps to reproduce the bug
from datasets import Dataset, DatasetDict
# Create dataset (example with 10,000 samples)
dataset = Dataset.from_dict({
"question": questions,
"answer": answers,
# ... other fields
})
# Split into train/test
dataset_dict = dataset.train_test_split(test_size=0.1)
# Upload to Hub
dataset_dict.push_to_hub("Org/some-dataset")
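The snippet above depends on `questions` and `answers`, which are not shown. A self-contained variant might look like the sketch below. The row contents are placeholder strings, not the real data, and `make_rows` and `push_synthetic` are hypothetical helper names. Synthetic rows this short may produce a much smaller upload than the real dataset, so this may not trigger the same failure.

```python
def make_rows(n: int) -> dict:
    """Build n placeholder question/answer pairs (not the real dataset)."""
    return {
        "question": [f"question {i}" for i in range(n)],
        "answer": [f"answer {i}" for i in range(n)],
    }


def push_synthetic(n: int, repo_id: str) -> None:
    """Create an n-row dataset, split it 90/10, and push it to the Hub."""
    # Imported lazily so make_rows can be used without datasets installed.
    from datasets import Dataset

    dataset = Dataset.from_dict(make_rows(n))
    dataset_dict = dataset.train_test_split(test_size=0.1)
    dataset_dict.push_to_hub(repo_id)
```

Running `push_synthetic(100, "Org/some-dataset")` and then `push_synthetic(10_000, "Org/some-dataset")` would mirror the two sizes tested in this report.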
Expected behavior
- Dataset should upload successfully to Hugging Face Hub
- Progress bars should complete without authentication errors
- Dataset should be accessible at the specified repository URL
Actual Behavior
- Upload fails consistently with 401 Unauthorized error
- Error occurs specifically during CAS service interaction
- No progress is made on the upload (0% completion)
- Dataset is created on Hugging Face Hub with no data folder
Environment info
- Platform: SageMaker (AWS)
- Python Version: 3.12
- Libraries:
  - datasets library (latest version)
  - hf-xet==1.1.9 (attempted fix)
- Authentication: Hugging Face token configured
- Dataset Size: ~10,000 samples, works for smaller sizes (e.g. 100)
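Since 100 samples succeed and 10,000 fail, it could help to narrow down where the failure starts. The sketch below is a plain binary search over the sample count. It assumes the failure is monotonic in size (every size above the threshold fails), which has not been verified here. `try_upload` is a hypothetical callable that attempts an upload of `n` samples and returns True on success.

```python
from typing import Callable


def smallest_failing_size(
    try_upload: Callable[[int], bool], ok: int, bad: int
) -> int:
    """Return the smallest sample count for which try_upload fails.

    Preconditions: try_upload(ok) succeeds, try_upload(bad) fails, and
    failure is monotonic in the sample count (an assumption).
    """
    while bad - ok > 1:
        mid = (ok + bad) // 2
        if try_upload(mid):
            ok = mid   # mid still uploads fine, threshold is above it
        else:
            bad = mid  # mid fails, threshold is at or below it
    return bad
```

Starting from `ok=100` and `bad=10_000`, this needs about 14 upload attempts. If the result lands near a file-size boundary rather than a row count, that would point at upload size rather than authentication as the trigger.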