Hello,
I've been following this great book faithfully, but I have a problem. In chapter 6, the first time RLlib is used, I get the following behavior both in Colab and locally: Tune stays stuck on "PENDING" status indefinitely, with the following error:
(scheduler +3h2m14s) Error: No available node types can fulfill resource request {'CPU': 5.0}. Add suitable node types to this cluster to resolve this issue.
== Status ==
Current time: 2022-02-01 12:16:44 (running for 00:00:05.13)
Memory usage on this node: 2.6/12.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/2 CPUs, 0/0 GPUs, 0.0/6.28 GiB heap, 0.0/3.14 GiB objects
Result logdir: /root/ray_results/APEX_2022-02-01_12-16-39
Number of trials: 1/1 (1 PENDING)
Trial name status loc
APEX_CartPole-v0_cf36b_00000 PENDING
The code is as follows:
import pprint

import ray  # needed for the ray.shutdown() call below
from ray import tune
from ray.rllib.agents.dqn.apex import APEX_DEFAULT_CONFIG
from ray.rllib.agents.dqn.apex import ApexTrainer

# Stop any Ray instance left over from a previous run.
ray.shutdown()

if __name__ == "__main__":
    config = APEX_DEFAULT_CONFIG.copy()
    pp = pprint.PrettyPrinter(indent=4)
    config['env'] = "CartPole-v0"
    config['num_workers'] = 6
    config["num_gpus"] = 0
    #config['evaluation_num_workers'] = 1
    config['evaluation_interval'] = 1
    config['learning_starts'] = 50
    # Print the full config so every resource-related setting is visible.
    pp.pprint(config)
    tune.run(ApexTrainer, config=config)
I have no idea what to do. I already set num_workers = 1 to accommodate the lower CPU count, but whatever I do, I still get that error. I also do not understand what {'CPU': 5.0} means; I don't see anything in the config mentioning that 5 CPUs are required. Any thoughts?
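For reference, the two numbers in the output above can be compared directly. The sketch below only parses the error line and the status line quoted in this issue, so it needs no Ray installation. The helper names (`requested_cpus`, `available_cpus`, `head_bundle_cpus`) are hypothetical. The last helper encodes an assumption, not something the log confirms: that the first resource bundle of an APEX trial covers the driver (`num_cpus_for_driver`, assumed default 1) plus the replay buffer shards (`optimizer.num_replay_buffer_shards`, assumed default 4). If that assumption holds, it would explain a request of 5 CPUs that does not change with `num_workers`.

```python
import ast
import re

# Lines copied verbatim from the Tune output quoted in this issue.
error_line = (
    "Error: No available node types can fulfill resource request "
    "{'CPU': 5.0}. Add suitable node types to this cluster to resolve this issue."
)
status_line = (
    "Resources requested: 0/2 CPUs, 0/0 GPUs, "
    "0.0/6.28 GiB heap, 0.0/3.14 GiB objects"
)


def requested_cpus(line):
    """Extract the CPU count from the resource-request dict in the error line."""
    match = re.search(r"\{.*?\}", line)
    bundle = ast.literal_eval(match.group(0))
    return float(bundle.get("CPU", 0.0))


def available_cpus(line):
    """Extract the total CPU count from an 'X/Y CPUs' status entry."""
    match = re.search(r"(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?) CPUs", line)
    return float(match.group(2))


def head_bundle_cpus(num_cpus_for_driver=1, num_replay_buffer_shards=4):
    """ASSUMPTION: driver CPUs plus one CPU per replay buffer shard.

    Both default values are assumed here; check them against the
    config printed by pp.pprint(config).
    """
    return float(num_cpus_for_driver + num_replay_buffer_shards)


if __name__ == "__main__":
    wanted = requested_cpus(error_line)
    have = available_cpus(status_line)
    print(f"requested CPUs in one bundle: {wanted}")
    print(f"CPUs on this node:            {have}")
    print(f"fits on this node:            {wanted <= have}")
    print(f"assumed head bundle estimate: {head_bundle_cpus()}")
```

On the log shown here, the request is larger than the 2 CPUs the node reports, so the trial can never be scheduled regardless of how long it waits. Whether lowering the shard count in the printed config is the right remedy depends on the assumption above being correct for the installed Ray version.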