Policy Gradient, SAC doesn't learn #65

@Ling01234

Hi! I have a few more questions about the code that I don't quite get.

First, I was wondering what `pybullet_envs` is for. I installed the library but got errors when I tried to import it. I also don't see where it's being used.

Second, I was getting really bad scores when I ran the code. I cloned the code from your repository and changed a few things. First, I changed the environment to `env = gym.make("InvertedPendulum-v4")`, and as a result I also changed the reset and step calls to `obs, _ = env.reset()` and `obs_, reward, done, *_ = env.step(action)`. Finally, I commented out the lines in `sac_torch.py` that use `reparameterize=True`, since I ran into NaN tensors when calling `rsample()`.
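For reference, here is a minimal sketch of the modified interaction loop. It assumes the Gym 0.26+ API, where `reset()` returns `(obs, info)` and `step()` returns `(obs, reward, terminated, truncated, info)`. `CountdownEnv` and `run_episode` are hypothetical names used only for illustration; they are not part of the repository, and the stub stands in for `InvertedPendulum-v4` so the snippet runs without MuJoCo. Note that `done, *_ = ...` keeps only the `terminated` flag, so an episode that ends by time-limit truncation would not stop the loop; the sketch combines both flags.

```python
class CountdownEnv:
    """Hypothetical stand-in env that follows the Gym >= 0.26 API (not from the repo)."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        terminated = False                 # this stub never reaches a failure state
        truncated = self.t >= self.horizon  # it only ends via the time limit
        return float(self.t), 1.0, terminated, truncated, {}


def run_episode(env, policy):
    """Run one episode and return the undiscounted score."""
    obs, _info = env.reset()
    score, done = 0.0, False
    while not done:
        action = policy(obs)
        obs_, reward, terminated, truncated, _info = env.step(action)
        # Treat either flag as the end of the episode.
        done = terminated or truncated
        score += reward
        obs = obs_
    return score


score = run_episode(CountdownEnv(horizon=5), policy=lambda obs: 0.0)
print(score)  # 5.0
```

With a real environment, `CountdownEnv(horizon=5)` would be replaced by `gym.make("InvertedPendulum-v4")` and the lambda by the agent's action selection.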

That's all I've changed, and when I run the code, the score actually decreases (oddly enough). It starts at approximately 10, like a random agent, and drops to 3 or 4 after 250 episodes.
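As background for the `reparameterize=True` change described above, the sketch below shows the general difference between `rsample()` and `sample()` on a PyTorch `Normal` distribution. It is not the code from `sac_torch.py`, and the variable names are made up for illustration. Only the reparameterized sample stays attached to the autograd graph, which may be relevant if the actor update relies on gradients flowing through the sampled action.

```python
import torch
from torch.distributions import Normal

# Hypothetical policy outputs; in an actor network these would come from the model.
mu = torch.zeros(3, requires_grad=True)
sigma = torch.ones(3)
dist = Normal(mu, sigma)

reparam_action = dist.rsample()  # computed as mu + sigma * eps, differentiable w.r.t. mu
plain_action = dist.sample()     # drawn without gradient tracking

print(reparam_action.requires_grad)  # True
print(plain_action.requires_grad)    # False

# Gradients reach mu only through the reparameterized sample.
reparam_action.sum().backward()
print(mu.grad)  # tensor([1., 1., 1.])
```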

Would you have any idea of why this is happening? It would be so greatly appreciated!

Thanks a lot for your time!
