Multi-Agent Constrained Policy Optimisation (MACPO; MAPPO-L).
Implementation of a deep reinforcement learning algorithm, Proximal Policy Optimization (PPO), on a continuous action space OpenAI Gym environment (Box2D/CarRacing-v0)
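For reference, a minimal sketch of PPO's clipped surrogate objective, the core loss behind the PPO implementations listed here; the tensor names and the default clip range are illustrative assumptions, not code taken from any particular repository.

```python
import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO (Schulman et al., 2017).

    new_log_probs / old_log_probs: log pi(a|s) under the current and old policies.
    advantages: advantage estimates (e.g. from GAE), assumed detached from the graph.
    """
    ratio = torch.exp(new_log_probs - old_log_probs)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (elementwise minimum) surrogate and negate it for gradient descent.
    return -torch.min(unclipped, clipped).mean()
```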
Policy Optimization with Penalized Point Probability Distance: an Alternative to Proximal Policy Optimization
Mirror Descent Policy Optimization
Model-based Policy Gradients
Codebase to fully reproduce the results of "No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO" (Moalla et al. 2024). Uses TorchRL and provides extensive tools for studying representation dynamics in policy optimization.
Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data".
This repository contains the code for the paper "Local policy search with Bayesian optimization".
A hands-on guide to implementing reinforcement learning (RL) algorithms, from Markov Decision Processes (MDPs) to advanced methods like PPO and DDPG. Build agents, learn the math behind policies, and experiment with real-world applications.
Code for Policy Optimization as Online Learning with Mediator Feedback
An implementation of reinforcement learning for CartPole-v0 via policy optimization.
A collection of Jupyter notebooks implementing core reinforcement learning algorithms: Q-Learning, SARSA, and PPO.
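As a quick illustration of the tabular methods covered by notebooks like the one above, here is a minimal Q-learning update sketch; the `Q` table layout and the hyperparameter defaults are assumptions for illustration only.

```python
import numpy as np

def q_learning_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    Q is assumed to be a NumPy array indexed as Q[state, action].
    SARSA would instead bootstrap from the action actually taken, Q[next_state, next_action].
    """
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
    return Q
```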
Implementation of a GRPO (Gradient Regularized Policy Optimization) training and evaluation pipeline. Includes utilities for dataset preparation, model training, and evaluation on structured tasks. Designed for experimenting with policy optimization techniques in reinforcement learning and generative AI settings.
This repository contains the code for the NeurIPS 2021 submission "Local policy search with Bayesian optimization".
This repo implements the REINFORCE algorithm for solving the CartPole-v1 environment from the Gymnasium library, using Python 3.8 and PyTorch 2.0.1.
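A minimal REINFORCE sketch in the spirit of that entry, using the Gymnasium and PyTorch libraries it names; the network architecture, learning rate, episode count, and return normalization are assumptions for illustration, not the repository's actual code.

```python
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))  # assumed architecture
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
gamma = 0.99

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        obs, reward, terminated, truncated, _ = env.step(action.item())
        done = terminated or truncated
        log_probs.append(dist.log_prob(action))
        rewards.append(reward)

    # Discounted returns G_t, computed backwards over the episode.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns = torch.tensor(list(reversed(returns)))
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # simple variance reduction

    # REINFORCE loss: maximize sum_t log pi(a_t|s_t) * G_t.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```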
This project implements a mini LLM alignment pipeline using Reinforcement Learning from Human Feedback (RLHF). It includes training a reward model from human-annotated preference data, fine-tuning the language model via policy optimization, and performing ablation studies to evaluate robustness, fairness, and alignment trade-offs.
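For context, a hedged sketch of the pairwise (Bradley-Terry) loss commonly used to train a reward model from preference data in RLHF pipelines like the one described above; the `reward_model` callable and the input format are hypothetical, not that project's API.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen_inputs, rejected_inputs):
    """Pairwise Bradley-Terry loss: push r(chosen) above r(rejected).

    reward_model is assumed to map a batch of (tokenized) responses to one scalar
    reward per example; chosen_inputs / rejected_inputs are the preferred and
    dispreferred responses from the human-annotated comparison data.
    """
    r_chosen = reward_model(chosen_inputs)      # shape: (batch,)
    r_rejected = reward_model(rejected_inputs)  # shape: (batch,)
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```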