IndexError: Caught IndexError in DataLoader worker process 2 #120

@aboah1994

[2022-07-05 00:40:55,618 - train - INFO] - One GPU or CPU training mode start...
[2022-07-05 00:40:56,108 - train - INFO] - Dataloader instances created. Train datasets: 1850 samples Validation datasets: 1850 samples.
[2022-07-05 00:40:57,098 - train - INFO] - Model created, trainable parameters: 68567386.
[2022-07-05 00:40:57,099 - train - INFO] - Optimizer and lr_scheduler created.
[2022-07-05 00:40:57,100 - train - INFO] - Max_epochs: 100 Log_per_step: 10 Validation_per_step: 50.
[2022-07-05 00:40:57,100 - train - INFO] - Training start...
[2022-07-05 00:40:57,173 - trainer - WARNING] - Training is using GPU 0!
Traceback (most recent call last):
File "train.py", line 162, in <module>
entry_point(config)
File "train.py", line 126, in entry_point
main(config, local_master, logger if local_master else None)
File "train.py", line 74, in main
trainer.train()
File "/content/gdrive/MyDrive/hdr/PICK-pytorch/trainer/trainer.py", line 135, in train
result_dict = self._train_epoch(epoch)
File "/content/gdrive/MyDrive/hdr/PICK-pytorch/trainer/trainer.py", line 199, in _train_epoch
for step_idx, input_data_item in enumerate(self.data_loader):
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 838, in _next_data
return self._process_data(data)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 395, in reraise
raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 2.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/content/gdrive/MyDrive/hdr/PICK-pytorch/data_utils/pick_dataset.py", line 111, in __getitem__
boxes_and_transcripts_file = self.get_ann_file(Path(dataitem['file_name']).stem)
File "/content/gdrive/MyDrive/hdr/PICK-pytorch/data_utils/pick_dataset.py", line 100, in get_ann_file
filename = list(self.boxes_and_transcripts_folder.glob(f'**/{basename}.*'))[0]
IndexError: list index out of range

Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/pin_memory.py", line 25, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/usr/lib/python3.7/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/reductions.py", line 294, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.7/multiprocessing/resource_sharer.py", line 58, in detach
return reduction.recv_handle(conn)
File "/usr/lib/python3.7/multiprocessing/reduction.py", line 185, in recv_handle
return recvfds(s, 1)[0]
File "/usr/lib/python3.7/multiprocessing/reduction.py", line 153, in recvfds
msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_SPACE(bytes_size))
ConnectionResetError: [Errno 104] Connection reset by peer
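
The first traceback ends at pick_dataset.py line 100, where the glob for `**/{basename}.*` returned an empty list, so at least one sample seems to have no matching file in the boxes-and-transcripts folder. Below is a rough sketch of a check that could be run before training to list such samples. The function name `find_missing_annotations` and its arguments are hypothetical and not part of PICK-pytorch; the only thing taken from the traceback is the glob pattern.

```python
from pathlib import Path
from typing import Iterable, List


def find_missing_annotations(file_names: Iterable[str], ann_folder: Path) -> List[str]:
    """Return basenames that have no matching file anywhere under ann_folder.

    Hypothetical helper: file_names is assumed to be the list of sample
    file names the dataset iterates over, ann_folder the folder that
    get_ann_file searches.
    """
    missing = []
    for name in file_names:
        basename = Path(name).stem
        # Same pattern as the failing line in get_ann_file.
        first_match = next(ann_folder.glob(f'**/{basename}.*'), None)
        if first_match is None:
            missing.append(basename)
    return missing
```

Calling it with the sample file names and the annotation folder and printing the result should show which entries would trigger the `IndexError`. Typical causes would be a name in the samples list that differs from the annotation file name, or a wrong folder path in the config, but that is a guess from the traceback alone.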
