Description
Hi @qubvel, I was testing the DFine_finetuning_on_a_custom_dataset.ipynb notebook and got stuck on the error below. Restricting training to a single GPU works correctly; the error appears when running on multiple GPUs.
```
RuntimeError                              Traceback (most recent call last)
Cell In[16], line 13
      1 from transformers import Trainer
      3 trainer = Trainer(
      4     model=model,
      5     args=training_args,
   (...)
     10     compute_metrics=eval_compute_metrics_fn,
     11 )
---> 13 trainer.train()

File ~/miniconda3/envs/d-fine/lib/python3.12/site-packages/transformers/trainer.py:2239, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   2237     hf_hub_utils.enable_progress_bars()
   2238 else:
-> 2239     return inner_training_loop(
   2240         args=args,
   2241         resume_from_checkpoint=resume_from_checkpoint,
   2242         trial=trial,
   2243         ignore_keys_for_eval=ignore_keys_for_eval,
   2244     )

File ~/miniconda3/envs/d-fine/lib/python3.12/site-packages/transformers/trainer.py:2554, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   2547 context = (
   2548     functools.partial(self.accelerator.no_sync, model=model)
   2549     if i != len(batch_samples) - 1
   ...

File "/home/csi/miniconda3/envs/d-fine/lib/python3.12/site-packages/transformers/models/d_fine/modeling_d_fine.py", line 1647, in forward
    reference_points_unact = torch.concat([denoising_bbox_unact, reference_points_unact], 1)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 16 but got size 8 for tensor number 1 in the list.
```
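For context, the underlying PyTorch error is easy to reproduce outside the model: `torch.concat` requires every dimension except the concatenation dimension to match. A minimal sketch (the shapes here are illustrative, not the model's actual ones — but the 16 vs 8 batch dimension matches what the traceback reports):

```python
import torch

# torch.concat along dim=1 requires all other dimensions to agree.
a = torch.zeros(16, 300, 4)  # e.g. one tensor built from the full batch
b = torch.zeros(8, 300, 4)   # e.g. another built from a per-device split

try:
    torch.concat([a, b], 1)
except RuntimeError as e:
    # Same error class and message shape as in the traceback above.
    print(e)
```

This suggests `denoising_bbox_unact` and `reference_points_unact` are being built from different effective batch sizes when the batch is split across devices.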
Do you have an idea what causes this? Is there an argument in TrainingArguments that should be changed?
PS: the notebook uses output_dir="d-fine-m-cppe5-finetune-2" in TrainingArguments but 'd-fine-r50-cppe5-finetune' in the Inference section.
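As a stopgap (not a fix), a sketch of how I reproduce the working single-GPU run is to restrict the visible devices before CUDA is initialized; `CUDA_VISIBLE_DEVICES` is standard CUDA/PyTorch behavior, the device index `"0"` is just an example:

```python
import os

# Must be set before torch initializes CUDA (i.e. at the top of the
# notebook, before importing/using the model), otherwise it is ignored.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```

Equivalently, `CUDA_VISIBLE_DEVICES=0 jupyter notebook ...` on the command line.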