Running the halfcheetah_ars.jl example, I expected to see policy behavior similar to what is shown in the docs. Instead, ARS gets a mean reward of around -23, and the resulting policy tends to move backward. Is this the expected behavior?
I'm using Julia 1.8 on Ubuntu 20.04, with the main branch of Dojo.jl.