Description
When using torchrl's SyncDataCollector with a custom environment, an error occurs as soon as the environment's _step() method returns a "done" value of True:
/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/collectors/collectors.py:870: UserWarning: total_frames (1000) is not exactly divisible by frames_per_batch (30). This means 20 additional frames will be collected.To silence this message, set the environment variable RL_WARNINGS to False.
warnings.warn(
/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/collectors/collectors.py:1429: UserWarning: An output with one or more elements was resized since it had shape [], which does not match the required output shape [1]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/Resize.cpp:38.)
traj_ids = traj_ids.masked_scatter(traj_sop, new_traj)
Traceback (most recent call last):
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/collectors/collectors.py", line 1586, in rollout
result = torch.stack(
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/base.py", line 658, in torch_function
return TD_HANDLED_FUNCTIONS[func](*args, **kwargs)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_torch_func.py", line 737, in _stack
out.stack_onto(list_of_tensordicts, dim)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_td.py", line 2665, in stack_onto
new_dest = torch.stack(
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/base.py", line 658, in torch_function
return TD_HANDLED_FUNCTIONS[func](*args, **kwargs)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_torch_func.py", line 737, in _stack
out.stack_onto(list_of_tensordicts, dim)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_td.py", line 2665, in stack_onto
new_dest = torch.stack(
RuntimeError: stack expects each tensor to be equal size, but got [] at entry 0 and [1] at entry 6
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/xjf/Gaode_Projects/busi_scene_cover/assert_assembly/fine_tunning_model.py", line 240, in
example2()
File "/Users/xjf/Gaode_Projects/busi_scene_cover/assert_assembly/fine_tunning_model.py", line 232, in example2
fine_tunning_model(ds, task_id, model_url_, url_,
File "/Users/xjf/Gaode_Projects/busi_scene_cover/assert_assembly/fine_tunning_model.py", line 213, in fine_tunning_model
fint_tunning(ft_data, model_path, json_tokenizer_file, save_dir)
File "/Users/xjf/Gaode_Projects/busi_scene_cover/assert_assembly/fine_tunning_with_rl_v2.py", line 480, in fint_tunning
for epoch, data in enumerate(collector):
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/collectors/collectors.py", line 341, in iter
yield from self.iterator()
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/collectors/collectors.py", line 1256, in iterator
tensordict_out = self.rollout()
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/_utils.py", line 661, in unpack_rref_and_invoke_function
return func(self, *args, **kwargs)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/torchrl/collectors/collectors.py", line 1594, in rollout
result = torch.stack(
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/base.py", line 658, in torch_function
return TD_HANDLED_FUNCTIONS[func](*args, **kwargs)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_torch_func.py", line 737, in _stack
out.stack_onto(list_of_tensordicts, dim)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_td.py", line 2665, in stack_onto
new_dest = torch.stack(
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/base.py", line 658, in torch_function
return TD_HANDLED_FUNCTIONS[func](*args, **kwargs)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_torch_func.py", line 737, in _stack
out.stack_onto(list_of_tensordicts, dim)
File "/Users/xjf/miniforge3/envs/drive-into-llm/lib/python3.10/site-packages/tensordict/_td.py", line 2665, in stack_onto
new_dest = torch.stack(
RuntimeError: stack expects each tensor to be equal size, but got [] at entry 0 and [1] at entry 6
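Through debugging, I found that the error is raised for the key "traj_ids" when the collector stacks the per-step TensorDicts: the entry at index 0 has shape [] while the entry at index 6 has shape [1]. If the "done" value is always False, everything works fine.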
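To make the shape mismatch concrete, here is a minimal sketch in plain PyTorch that produces the same kind of RuntimeError without torchrl. It does not use the collector or my environment: the list `steps` and its values are made up to mirror the shapes in the traceback (0-dim entries first, then a 1-element entry at index 6), and the `reshape` at the end only shows that the differing shape is what breaks the stack. It is not meant as a fix for the collector.

```python
import torch

# Illustrative stand-ins for the per-step "traj_ids" values (made-up data):
# entries 0-5 are 0-dim tensors (shape []), entry 6 has shape [1].
steps = [torch.tensor(0) for _ in range(6)] + [torch.tensor([1])]

try:
    torch.stack(steps)
    error_message = None
except RuntimeError as exc:
    # torch.stack refuses to combine tensors whose shapes differ
    error_message = str(exc)

print(error_message)

# Once every entry has the same shape, the stack succeeds.
fixed = torch.stack([t.reshape(()) for t in steps])
print(fixed.shape)
```

If this mirrors what happens inside the collector, the open question is why "traj_ids" changes from shape [] to shape [1] on the step where "done" becomes True.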