-
Notifications
You must be signed in to change notification settings - Fork 73
Open
Description
Here is the stacktrace of run_pretrain_bart.sh
error:
[rank0]: IndexError: Caught IndexError in DataLoader worker process 0.
[rank0]: Original Traceback (most recent call last):
[rank0]: File "/home/jupyter-raptor/.local/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
[rank0]: data = fetcher.fetch(index) # type: ignore[possibly-undefined]
[rank0]: File "/home/jupyter-raptor/.local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
[rank0]: data = [self.dataset[idx] for idx in possibly_batched_index]
[rank0]: File "/home/jupyter-raptor/.local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
[rank0]: data = [self.dataset[idx] for idx in possibly_batched_index]
[rank0]: File "/home/jupyter-raptor/pretrain_tokenizer/megatron/data/blendable_dataset.py", line 83, in __getitem__
[rank0]: return self.datasets[dataset_idx][sample_idx]
[rank0]: File "/home/jupyter-raptor/pretrain_tokenizer/megatron/data/bart_dataset.py", line 106, in __getitem__
[rank0]: return self.build_training_sample(sample, self.max_seq_length, np_rng)
[rank0]: File "/home/jupyter-raptor/pretrain_tokenizer/megatron/data/bart_dataset.py", line 148, in build_training_sample
[rank0]: source = self.add_whole_word_mask(source, mask_ratio, replace_length)
[rank0]: File "/home/jupyter-raptor/pretrain_tokenizer/megatron/data/bart_dataset.py", line 360, in add_whole_word_mask
[rank0]: source[indices[mask_random]] = torch.randint(
[rank0]: IndexError: The shape of the mask [2] at index 0 does not match the shape of the indexed tensor [1] at index 0
How can I debug it? It seems the dataset is problematic, but the dataset was generated by the MEGATRON Pre-process scripts in the tools/
folder and it runs without an error. The .bin
and .idx
are generated properly (and I put it in the dataset/
folder).
Metadata
Metadata
Assignees
Labels
No labels