Policy Gradient, SAC doesn't learn #65

Hi! I have a few more questions about the code that I don't quite get. 

First, I was wondering what pybullet_envs is for. I installed the library but got errors when I tried to import it. I also don't see where it's being used.

Second, I was getting really bad scores when I ran the code. I cloned the code from your git repository and changed a few things. First, I changed the environment to `env = gym.make("InvertedPendulum-v4")`, and as a result I also changed the reset and step calls to `obs, _ = env.reset()` and `obs_, reward, done, *_ = env.step(action)`. Finally, I commented out the lines in sac_torch.py that use `reparameterize=True`, since I ran into NaN tensors when calling `rsample()`.
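For reference, here is a minimal sketch of the episode loop these changes imply, assuming the five-value step API of gym 0.26 and later, where `env.step` returns `(obs, reward, terminated, truncated, info)`. `DummyEnv` and `run_episode` are hypothetical names used only for illustration so the snippet runs without MuJoCo; they are not from the repository. Note that `obs_, reward, done, *_ = env.step(action)` binds `done` to `terminated` only, so a time-limit truncation is ignored.

```python
from typing import Dict, Tuple


class DummyEnv:
    """Hypothetical stand-in that mimics the five-value step API."""

    def __init__(self, max_steps: int = 5) -> None:
        self.max_steps = max_steps
        self.t = 0

    def reset(self) -> Tuple[float, Dict]:
        self.t = 0
        return 0.0, {}

    def step(self, action: float) -> Tuple[float, float, bool, bool, Dict]:
        self.t += 1
        obs = float(self.t)
        reward = 1.0
        terminated = False                    # this toy env never fails
        truncated = self.t >= self.max_steps  # time limit reached
        return obs, reward, terminated, truncated, {}


def run_episode(env) -> float:
    """Run one episode and return the total reward."""
    obs, _ = env.reset()
    score = 0.0
    done = False
    while not done:
        action = 0.0  # placeholder for the agent's action selection
        obs_, reward, terminated, truncated, _ = env.step(action)
        # End the episode on either signal, not just on termination.
        done = terminated or truncated
        score += reward
        obs = obs_
    return score


score = run_episode(DummyEnv(max_steps=5))
```

In this toy environment, the `done, *_` unpacking would never end the episode, because `terminated` is always `False`. Whether that difference matters for the scores reported here is not established; it is only one thing worth ruling out.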

That's all I've changed, and when I run the code, the score actually decreases (oddly enough). It starts at approximately 10, like a random agent, and drops to 3 or 4 after 250 episodes.

Would you have any idea of why this is happening? It would be so greatly appreciated!

Thanks a lot for your time
