Open
Description
Once REINFORCE is implemented, redesign the learning process and neural network architecture to apply ensemble techniques such as bagging, stacking, and boosting in an RL setting.
The REINFORCE algorithm will provide a strong baseline in the LunarLander environment. From there, we can experiment with ensembles of experts that learn using a novel policy gradient algorithm.
Extensive testing and alterations will occur in this issue's associated branch.
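
As a starting point for discussion, here is a minimal sketch of how a REINFORCE update and a bagging-style ensemble of experts could fit together. It is not the planned algorithm. The names `LinearSoftmaxPolicy`, `ensemble_probs`, and `discounted_returns` are hypothetical, the linear softmax policy stands in for whatever network is eventually used, and the dimensions in the usage note assume the discrete LunarLander setup with 8 observation values and 4 actions.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


class LinearSoftmaxPolicy:
    """Hypothetical expert: a linear layer followed by a softmax over actions."""

    def __init__(self, obs_dim, n_actions, rng):
        self.W = rng.normal(scale=0.1, size=(n_actions, obs_dim))

    def probs(self, obs):
        return softmax(self.W @ obs)

    def reinforce_update(self, observations, actions, returns, lr=0.01):
        """One REINFORCE step: ascend G_t * grad log pi(a_t | s_t)."""
        for obs, action, g in zip(observations, actions, returns):
            p = self.probs(obs)
            # Gradient of log pi(a|s) w.r.t. the logits is onehot(a) - p.
            grad_logits = -p
            grad_logits[action] += 1.0
            self.W += lr * g * np.outer(grad_logits, obs)


def ensemble_probs(policies, obs):
    """Bagging-style combination: average the experts' action probabilities."""
    return np.mean([policy.probs(obs) for policy in policies], axis=0)
```

For example, `policies = [LinearSoftmaxPolicy(8, 4, np.random.default_rng(i)) for i in range(3)]` builds three experts, and `ensemble_probs(policies, obs)` returns a single action distribution to sample from. Stacking or boosting would replace the plain average with a learned combiner or with sequentially reweighted experts.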