[BUG] RuntimeError: stack expects each tensor to be equal size, but got [] at entry 0 and [1] at entry 6 #3137

@GeekTemo

Description

When using TorchRL's SyncDataCollector with a custom environment, an error occurs whenever the environment's _step() method returns "done" as True:

/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/collectors/collectors.py:870: UserWarning: total_frames (1000) is not exactly divisible by frames_per_batch (30). This means 20 additional frames will be collected.To silence this message, set the environment variable RL_WARNINGS to False.
warnings.warn(
/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/collectors/collectors.py:1429: UserWarning: An output with one or more elements was resized since it had shape [], which does not match the required output shape [1]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/Resize.cpp:38.)
traj_ids = traj_ids.masked_scatter(traj_sop, new_traj)
Traceback (most recent call last):
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/collectors/collectors.py", line 1586, in rollout
result = torch.stack(
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/base.py", line 658, in __torch_function__
return TD_HANDLED_FUNCTIONS[func](*args, **kwargs)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_torch_func.py", line 737, in _stack
out.stack_onto(list_of_tensordicts, dim)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_td.py", line 2665, in stack_onto
new_dest = torch.stack(
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/base.py", line 658, in __torch_function__
return TD_HANDLED_FUNCTIONS[func](*args, **kwargs)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_torch_func.py", line 737, in _stack
out.stack_onto(list_of_tensordicts, dim)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_td.py", line 2665, in stack_onto
new_dest = torch.stack(
RuntimeError: stack expects each tensor to be equal size, but got [] at entry 0 and [1] at entry 6

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/xjf/Gaode_Projects/busi_scene_cover/assert_assembly/fine_tunning_model.py", line 240, in <module>
example2()
File "/Users/xjf/Gaode_Projects/busi_scene_cover/assert_assembly/fine_tunning_model.py", line 232, in example2
fine_tunning_model(ds, task_id, model_url_, url_,
File "/Users/xjf/Gaode_Projects/busi_scene_cover/assert_assembly/fine_tunning_model.py", line 213, in fine_tunning_model
fint_tunning(ft_data, model_path, json_tokenizer_file, save_dir)
File "/Users/xjf/Gaode_Projects/busi_scene_cover/assert_assembly/fine_tunning_with_rl_v2.py", line 480, in fint_tunning
for epoch, data in enumerate(collector):
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/collectors/collectors.py", line 341, in __iter__
yield from self.iterator()
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/collectors/collectors.py", line 1256, in iterator
tensordict_out = self.rollout()
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/_utils.py", line 661, in unpack_rref_and_invoke_function
return func(self, *args, **kwargs)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/collectors/collectors.py", line 1594, in rollout
result = torch.stack(
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/base.py", line 658, in __torch_function__
return TD_HANDLED_FUNCTIONS[func](*args, **kwargs)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_torch_func.py", line 737, in _stack
out.stack_onto(list_of_tensordicts, dim)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_td.py", line 2665, in stack_onto
new_dest = torch.stack(
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/base.py", line 658, in __torch_function__
return TD_HANDLED_FUNCTIONS[func](*args, **kwargs)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_torch_func.py", line 737, in _stack
out.stack_onto(list_of_tensordicts, dim)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_td.py", line 2665, in stack_onto
new_dest = torch.stack(
RuntimeError: stack expects each tensor to be equal size, but got [] at entry 0 and [1] at entry 6
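
The final RuntimeError can be reproduced outside TorchRL with plain torch.stack, which may make the message easier to read: every stacked entry must have the same shape. The sketch below is illustrative only. The list `traj_ids_per_step` is a hypothetical stand-in for the per-step values and does not reflect the collector's actual internals.

```python
import torch

# Hypothetical stand-in for per-step values: six scalar tensors (shape [])
# followed by one tensor of shape [1].
traj_ids_per_step = [torch.tensor(0) for _ in range(6)] + [torch.tensor([1])]

try:
    torch.stack(traj_ids_per_step)
    stack_error = None
except RuntimeError as err:
    # torch.stack refuses to combine entries of different shapes.
    stack_error = str(err)

print(stack_error)

# Once every entry has the same shape, stacking succeeds.
same_shape = [t.reshape(()) for t in traj_ids_per_step]
stacked = torch.stack(same_shape)
print(stacked.shape)
```

This only shows why the stack fails; it does not explain why one entry ended up with shape [1] in the collector.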

Through debugging, I found that the error is raised for the key "traj_ids" when the per-step TensorDicts are stacked. If "done" is False, everything works fine.
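
One thing that may be worth checking, as an assumption rather than a confirmed diagnosis of this issue, is the shape of the "done" entry written by _step(). TorchRL's convention is that done and reward entries have the environment's batch size plus a trailing dimension of 1. The helper `make_done` below is hypothetical and uses only torch.

```python
import torch


def make_done(flag, batch_size=()):
    """Return a bool tensor of shape (*batch_size, 1) built from `flag`.

    Hypothetical helper: `flag` may be a Python bool or a sequence/tensor
    holding one value per environment in the batch.
    """
    done = torch.as_tensor(flag, dtype=torch.bool)
    return done.reshape(*batch_size, 1)


# Unbatched environment (batch_size=()): done has shape [1].
done_single = make_done(True)

# Batched environment with 3 sub-environments: done has shape [3, 1].
done_batched = make_done([True, False, False], batch_size=(3,))

print(done_single.shape, done_batched.shape)
```

If TorchRL is installed, `check_env_specs` from `torchrl.envs.utils` can also be run on the environment to compare what reset and step return against the declared specs.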

Labels

bug: Something isn't working