Implementation of a GRPO (Group Relative Policy Optimization) training and evaluation pipeline. Includes utilities for dataset preparation, model training, and evaluation on structured tasks. Designed for experimenting with policy optimization techniques in reinforcement learning and generative AI settings.

Rohityalavarthy/grpo-codegen

GRPO LLM Code Assistant

Fine-tune a small GPT-2 model with GRPO so that it learns to generate correct Python functions for MBPP coding problems, using a reward based on unit tests.
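
As a rough illustration of the idea behind GRPO: several completions are sampled for the same prompt, each one is scored, and a completion's advantage is its reward relative to the rest of its group. The sketch below is not taken from this repo. The function name, the `eps` value, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of one prompt's completions against their own group.

    `rewards` holds one score per sampled completion of the SAME prompt.
    `eps` (an assumed value) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one MBPP problem: two pass the unit tests, two fail.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that pass the tests get a positive advantage and failing ones a negative advantage. If every completion in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal.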

Goals

  • Demonstrate GRPO on LLM fine-tuning.
  • Use the MBPP dataset (from Hugging Face) as real-world data.
  • Run on a laptop or Google Colab (small model / LoRA / quantization recommended).

Quickstart (Colab / laptop)

  1. Clone repo
  2. Create virtualenv and install requirements:
     python -m venv venv && source venv/bin/activate
     pip install -r requirements.txt
  3. Run data prep:
     python data_prep.py --out_dir data
  4. Run training (example):
     python train_grpo.py --train_data data/mbpp_train.jsonl --model_name gpt2 --output_dir outputs/grpo_run
  5. Evaluate:
     python eval.py --model outputs/grpo_run/checkpoint --eval_data data/mbpp_valid.jsonl
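
To give a feel for what the data prep step might produce, here is a minimal sketch that turns MBPP-style examples into JSONL lines. The record schema (`task_id`, `prompt`, `tests`), the prompt wording, and the helper names are hypothetical; the actual format written by data_prep.py may differ. The input fields `task_id`, `text`, and `test_list` are assumed to follow the MBPP dataset as published on Hugging Face.

```python
import json


def to_record(example):
    """Map one MBPP-style example to a prompt/tests record (hypothetical schema)."""
    prompt = (
        "Write a Python function for the following task.\n"
        f"Task: {example['text']}\n"
        f"Your code should pass this test: {example['test_list'][0]}\n"
    )
    return {
        "task_id": example["task_id"],
        "prompt": prompt,
        "tests": example["test_list"],  # kept so a reward function can run them
    }


def to_jsonl_lines(examples):
    """Serialize each example as one JSON object per line."""
    return [json.dumps(to_record(ex)) for ex in examples]
```

Writing the result of `"\n".join(to_jsonl_lines(examples))` to a file such as data/mbpp_train.jsonl would give one training record per line.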

Notes

  • Training code uses TRL's GRPOTrainer. If you have a GPU, set --device cuda.
  • The reward function executes generated code against MBPP unit tests in a safe subprocess with timeouts and limited I/O.
  • For larger models, enable LoRA/QLoRA options to reduce memory use.

License: MIT
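
The unit-test reward described in the Notes could look roughly like the sketch below. It is an illustration under stated assumptions, not the repo's actual reward function. The name `unit_test_reward`, the binary 0/1 scoring, and the 5-second default timeout are all hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def unit_test_reward(code: str, tests: list[str], timeout_s: float = 5.0) -> float:
    """Return 1.0 if `code` passes every assert in `tests`, else 0.0 (assumed scoring)."""
    program = code + "\n\n" + "\n".join(tests) + "\n"
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "candidate.py")
        with open(script, "w", encoding="utf-8") as f:
            f.write(program)
        try:
            result = subprocess.run(
                [sys.executable, "-I", script],  # -I: isolated mode, ignores user site and env vars
                cwd=workdir,                     # run inside the throwaway directory
                stdin=subprocess.DEVNULL,        # no input available to the candidate
                stdout=subprocess.DEVNULL,       # discard anything it prints
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,               # the child is killed on timeout
            )
        except subprocess.TimeoutExpired:
            return 0.0  # infinite loops and very slow solutions earn nothing
    # A failed assert, syntax error, or crash gives a non-zero exit code.
    return 1.0 if result.returncode == 0 else 0.0
```

A timeout and closed I/O streams limit hangs and noisy output, but they do not stop generated code from touching the file system or network, so stronger isolation (for example a container) is worth considering when running untrusted completions.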

Feel free to reach out if you face any issues.
