
policy-optimization

Here are 17 public repositories matching this topic...

A hands-on guide to Reinforcement Learning (RL): this repository covers RL from Markov Decision Processes (MDPs) through advanced algorithms such as PPO and DDPG. Build agents, learn the math behind policies, and experiment with real-world applications.

  • Updated Oct 5, 2025
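Because the first entry starts from Markov Decision Processes, the sketch below shows value iteration on a tiny MDP. It is a minimal stdlib-only sketch, not code from the repository: the two-state MDP, its rewards, and the names `value_iteration` and `q_value` are hypothetical illustrations. Each sweep applies the Bellman optimality backup, and the greedy policy is read off the converged values.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP encoding: transitions[state][action] -> list of (probability, next_state, reward)
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def q_value(outcomes: List[Tuple[float, str, float]], values: Dict[str, float], gamma: float) -> float:
    """Expected one-step return of an action: sum of p * (r + gamma * V(s'))."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions: Transitions, gamma: float = 0.9, tol: float = 1e-8):
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # Bellman optimality backup, updated in place (Gauss-Seidel style).
            best = max(q_value(outcomes, values, gamma) for outcomes in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged state values.
    policy = {
        s: max(actions, key=lambda a, acts=actions: q_value(acts[a], values, gamma))
        for s, actions in transitions.items()
    }
    return values, policy


# Hypothetical two-state MDP: staying in "B" pays 2 per step forever.
mdp: Transitions = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "back": [(1.0, "A", 0.0)]},
}
values, policy = value_iteration(mdp)
print(values, policy)
```

With gamma = 0.9, V(B) approaches 2 / (1 - 0.9) = 20 and V(A) approaches 1 + 0.9 * 20 = 19, so the greedy policy moves from A to B and then stays.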

This repo implements the REINFORCE algorithm for solving the CartPole-v1 environment from the Gymnasium library, using Python 3.8 and PyTorch 2.0.1.

  • Updated Mar 19, 2024
  • Python
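The core of REINFORCE is scaling the gradient of log pi(a|s) by the episode return. The sketch below is a stdlib-only illustration, not the repository's PyTorch/Gymnasium code: it swaps CartPole for a hypothetical one-step two-armed bandit with a tabular softmax policy, and the names `discounted_returns` and `reinforce_bandit` are hypothetical. `discounted_returns` shows the reward-to-go that multi-step episodes such as CartPole would use.

```python
import math
import random
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Reward-to-go G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_rewards: Sequence[float], episodes: int = 2000,
                     lr: float = 0.1, seed: int = 0) -> List[float]:
    """REINFORCE on a hypothetical one-step bandit with a tabular softmax policy."""
    rng = random.Random(seed)
    logits = [0.0] * len(arm_rewards)
    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices(range(len(logits)), weights=probs)[0]
        # One-step episode, so the return G_0 is just the single reward.
        g = discounted_returns([arm_rewards[action]])[0]
        for k in range(len(logits)):
            # d log pi(action) / d logit_k = 1[k == action] - pi(k)
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * g * grad_log_pi  # gradient ascent on G * log pi
    return softmax(logits)


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
print(reinforce_bandit([0.0, 1.0]))  # probability mass should concentrate on arm 1
```

In a PyTorch version the same update is usually obtained by minimizing `-(G * log_prob).sum()` and letting autograd compute the gradient; the manual softmax gradient here just makes the estimator explicit.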

This project implements a mini LLM alignment pipeline using Reinforcement Learning from Human Feedback (RLHF). It includes training a reward model from human-annotated preference data, fine-tuning the language model via policy optimization, and performing ablation studies to evaluate robustness, fairness, and alignment trade-offs.

  • Updated Oct 5, 2025
  • Python
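A common way to train a reward model from preference pairs is the Bradley-Terry pairwise loss, -log sigmoid(r_chosen - r_rejected). The description above does not say which loss the project uses, so this is only a minimal stdlib sketch under that assumption: the linear reward model, the two-dimensional feature vectors, and the names `pairwise_loss` and `train_linear_reward_model` are all hypothetical stand-ins for a learned reward head on top of a language model.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry preference loss: -log sigmoid(r_chosen - r_rejected)."""
    margin = r_chosen - r_rejected
    # -log sigmoid(m) = log(1 + exp(-m)), written stably for both signs of m
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_reward_model(pairs: List[Tuple[Sequence[float], Sequence[float]]],
                              dim: int, lr: float = 0.5, epochs: int = 200) -> List[float]:
    """Fit a hypothetical linear reward r(x) = w . x with SGD on the pairwise loss."""
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = dot(w, chosen) - dot(w, rejected)
            # dLoss/dmargin = -sigmoid(-margin), so step along +sigmoid(-margin) * (chosen - rejected)
            coeff = sigmoid(-margin)
            for i in range(dim):
                w[i] += lr * coeff * (chosen[i] - rejected[i])
    return w


# Hypothetical (chosen, rejected) feature pairs from human preference annotations.
pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.8, 0.2], [0.3, 0.9])]
w = train_linear_reward_model(pairs, dim=2)
print(w)
```

The loss depends only on the reward difference, so the model learns a ranking rather than absolute scores; the learned reward then serves as the signal for the policy-optimization step.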
