Problems about starting training task #103

@MrWiffer

When I start a training task with the following commands, it fails with "TypeError: h5py objects cannot be pickled":

export CUDA_VISIBLE_DEVICES=0
export MUJOCO_EGL_DEVICE_ID=0
python libero/lifelong/main.py seed=0 benchmark_name=LIBERO_10 policy=bc_rnn_policy lifelong=base

Following issue #19, I commented out part of the code in libero/lifelong/main.py as follows:

[Image: screenshot of the commented-out code in libero/lifelong/main.py]

Then I started the training task again, and the following error occurred:

Traceback (most recent call last):
  File "libero/lifelong/main.py", line 272, in <module>
    main()
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/hydra/main.py", line 90, in decorated_main
    _run_hydra(
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/hydra/_internal/utils.py", line 389, in _run_hydra
    _run_app(
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/hydra/_internal/utils.py", line 452, in _run_app
    run_and_report(
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/hydra/_internal/utils.py", line 216, in run_and_report
    raise ex
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "libero/lifelong/main.py", line 219, in main
    s_fwd, l_fwd = algo.learn_one_task(
  File "/root/workspace/LIBERO-master/libero/lifelong/algos/base.py", line 200, in learn_one_task
    success_rate = evaluate_one_task_success(
  File "/root/workspace/LIBERO-master/libero/lifelong/metric.py", line 112, in evaluate_one_task_success
    env.reset()
  File "/root/workspace/LIBERO-master/libero/libero/envs/venv.py", line 709, in reset
    ret_list = [self.workers[i].recv() for i in id]
  File "/root/workspace/LIBERO-master/libero/libero/envs/venv.py", line 709, in <listcomp>
    ret_list = [self.workers[i].recv() for i in id]
  File "/root/workspace/LIBERO-master/libero/libero/envs/venv.py", line 437, in recv
    result = self.parent_remote.recv()
  File "/root/miniforge/envs/libero/lib/python3.8/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/root/miniforge/envs/libero/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/root/miniforge/envs/libero/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

Following issue #3, I added code to libero/lifelong/metric.py:

[Image: screenshot of the code added to libero/lifelong/metric.py]

However, after I start training again, the original error (TypeError: h5py objects cannot be pickled) comes back, this time raised from evaluate_loss:

Traceback (most recent call last):
  File "libero/lifelong/main.py", line 272, in <module>
    main()
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/hydra/main.py", line 90, in decorated_main
    _run_hydra(
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/hydra/_internal/utils.py", line 389, in _run_hydra
    _run_app(
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/hydra/_internal/utils.py", line 452, in _run_app
    run_and_report(
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/hydra/_internal/utils.py", line 216, in run_and_report
    raise ex
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "libero/lifelong/main.py", line 228, in main
    L = evaluate_loss(cfg, algo, benchmark, datasets[: i + 1])
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/root/workspace/LIBERO-master/libero/lifelong/metric.py", line 216, in evaluate_loss
    for data in dataloader:
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 440, in __iter__
    return self._get_iterator()
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1038, in __init__
    w.start()
  File "/root/miniforge/envs/libero/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/root/miniforge/envs/libero/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/root/miniforge/envs/libero/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/root/miniforge/envs/libero/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/root/miniforge/envs/libero/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/root/miniforge/envs/libero/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/root/miniforge/envs/libero/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/root/miniforge/envs/libero/lib/python3.8/site-packages/h5py/_hl/base.py", line 370, in __getnewargs__
    raise TypeError("h5py objects cannot be pickled")
TypeError: h5py objects cannot be pickled

I'm curious to know where the problem is and what needs to be done to resolve it so that I can complete this training task.
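
For context on the last traceback: it goes through popen_spawn_posix.py, so the DataLoader workers are being started with the "spawn" method, which pickles the dataset object before sending it to each worker. If the dataset holds an open h5py file at that point (an assumption, the dataset code is not shown here), pickling fails with exactly this TypeError. Below is a minimal sketch of the usual workaround pattern, using a plain file handle as a stand-in for the h5py file. LazyFileDataset and roundtrip_demo are hypothetical names for illustration, not LIBERO code.

```python
import os
import pickle
import tempfile


class LazyFileDataset:
    """Hypothetical dataset that opens its file on first access, not in __init__."""

    def __init__(self, path):
        self.path = path      # plain string: always picklable
        self._handle = None   # opened lazily in whichever process uses it

    def _file(self):
        if self._handle is None:
            self._handle = open(self.path, "rb")
        return self._handle

    def __getitem__(self, index):
        f = self._file()
        f.seek(index)
        return f.read(1)

    def __getstate__(self):
        # Drop the open handle so that pickling (as done by the "spawn"
        # start method) only ships the path to the worker process.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None


def roundtrip_demo():
    # Write a tiny file to stand in for the real dataset file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"abc")
        path = tmp.name
    try:
        ds = LazyFileDataset(path)
        first = ds[0]                            # opens the handle in this process
        clone = pickle.loads(pickle.dumps(ds))   # works: the handle is not pickled
        second = clone[1]                        # the clone reopens the file on demand
        ds.close()
        clone.close()
        return first, second
    finally:
        os.remove(path)
```

Under the same assumption, running the DataLoader with num_workers=0 would avoid starting worker processes (and therefore pickling the dataset) at all, at the cost of slower data loading.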
