Replies: 2 comments 3 replies
I simply moved the implementation from ReinforcementLearningEnvironmentClassicControl.jl.
Another possible issue: refer to the outer constructor of `PendulumEnv`. The `_step!` function begins with

```julia
function _step!(env::PendulumEnv, a)
    env.t += 1
    th, thdot = env.state # two state variables
```

Also in the outer constructor itself:

```julia
env = PendulumEnv(
    PendulumEnvParams(max_speed, max_torque, g, m, l, dt, max_steps),
    action_space,
    Space(ClosedInterval{T}.(-high, high)),
    zeros(T, 2), # two state variables
```

Did I miss anything? Fortunately, in experiments like JuliaRL_DDPG_Pendulum, the number of state variables is obtained by […]. It seems that most RL algorithms do not touch […].
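As a hedged illustration of the "two state variables" remark above (not the package source): a Gym-style pendulum keeps two internal state variables, θ and θ̇, while the observation handed to the agent is usually the three-element vector [cos θ, sin θ, θ̇], which is what a three-entry bound like `high` would describe. The `MiniPendulum` type and `observe` function below are hypothetical names used only for this sketch.

```julia
# Hypothetical sketch of the internal-state vs. observation distinction;
# the type and function names here are illustrative, not the package API.
struct MiniPendulum{T}
    state::Vector{T}   # [θ, θ̇] — the two internal state variables
end

MiniPendulum{T}() where {T} = MiniPendulum{T}(zeros(T, 2))

# Gym-style observation derived from the internal state: [cos θ, sin θ, θ̇].
observe(env::MiniPendulum) = [cos(env.state[1]), sin(env.state[1]), env.state[2]]

env = MiniPendulum{Float64}()
@show length(env.state)     # 2 — size of the internal state vector
@show length(observe(env))  # 3 — size of what an agent actually sees
```

An algorithm that infers its input dimension from the observation it actually receives never needs to look at the internal two-element state vector, which would explain why the mismatch goes unnoticed in practice.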
This may not be a serious issue, so I decided to put it in the Discussions part. Currently, I am trying to implement a new environment and have referred to PendulumEnv.jl as an example. (I have a background mainly in control theory, which is closely related to RL, though.)

The `_step!` function seems interesting to me. The basic interaction in RL is `(s, a) --> (s', r)`, where `r` denotes the reward we get by taking action `a` at state `s`. However, in `_step!`, the reward (i.e., `-costs` therein) is calculated before `a` is applied: it depends on the old state `s` and the action `a`.

In my opinion, this seems to be an improper choice, and the reward should instead depend on `s'` and `a` (i.e., after `a` updates the environment). You can imagine a specific scenario to see the rationale. In the line computing `costs`, the sign of `a` does not matter, which means that even if you apply a reverse force (torque), you still get the same reward, which is not what we would expect. (This tutorial explains the dynamics of a simple pendulum.)

I can make a PR, possibly next week, if you think the above statement is reasonable.
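For concreteness, here is a minimal sketch of the ordering in question, written against the usual Gym-style pendulum dynamics. It is not a copy of the package's `_step!`; the standalone function name `step_sketch!`, the parameter defaults, and the exact cost expression are assumptions made for this illustration only.

```julia
# Illustrative sketch (not the package source): the reward is computed from the
# *old* state together with `a`, before the dynamics update — the ordering
# discussed above. Parameter defaults are the common Gym values (assumed).
angle_normalize(x) = mod(x + π, 2π) - π   # wrap the angle into (-π, π]

function step_sketch!(state::Vector{Float64}, a;
                      g = 10.0, m = 1.0, l = 1.0, dt = 0.05, max_torque = 2.0)
    th, thdot = state                      # old state (θ, θ̇)
    a = clamp(a, -max_torque, max_torque)

    # Cost term: uses the old θ and θ̇ together with `a`; the 0.001*a^2 term
    # is sign-independent, which is the behaviour questioned above.
    costs = angle_normalize(th)^2 + 0.1 * thdot^2 + 0.001 * a^2

    # Dynamics update happens only afterwards.
    newthdot = thdot + (-3 * g / (2 * l) * sin(th + π) + 3 * a / (m * l^2)) * dt
    newth = th + newthdot * dt

    state[1], state[2] = newth, newthdot
    return -costs                          # reward attached to (s, a), not (s', a)
end
```

Moving the `costs` line below the dynamics update (and computing it from `newth` and `newthdot`) would implement the change proposed above, i.e., a reward that depends on `s'` and `a`.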