Fine-tune a small GPT-2 model with GRPO so it learns to generate correct Python functions for MBPP coding problems (unit-test-based reward).
- Demonstrate GRPO on LLM fine-tuning.
- Use the MBPP dataset (Hugging Face) as real-world data.
- Run on a laptop or Google Colab (small model / LoRA / quantization recommended).
- Clone the repo.
- Create a virtualenv and install requirements:
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
- Run data prep:
python data_prep.py --out_dir data
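
  `data_prep.py` in the repo is the source of truth; as a rough illustration of what the prep step does, here is a minimal sketch that pulls MBPP from the Hugging Face Hub and writes one JSON object per line. The exact fields and output filenames kept by the real script are assumptions here:

  ```python
  # Sketch of an MBPP data-prep step (assumed behavior of data_prep.py):
  # download MBPP from the Hugging Face Hub and dump each split to JSONL.
  import json
  from pathlib import Path

  from datasets import load_dataset

  out_dir = Path("data")
  out_dir.mkdir(parents=True, exist_ok=True)

  ds = load_dataset("mbpp")  # splits: train / validation / test / prompt
  for split, fname in [("train", "mbpp_train.jsonl"), ("validation", "mbpp_valid.jsonl")]:
      with open(out_dir / fname, "w") as f:
          for ex in ds[split]:
              # keep the problem statement, reference solution, and unit tests
              f.write(json.dumps({
                  "task_id": ex["task_id"],
                  "prompt": ex["text"],
                  "code": ex["code"],
                  "test_list": ex["test_list"],
              }) + "\n")
  ```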
- Run training (example):
python train_grpo.py --train_data data/mbpp_train.jsonl --model_name gpt2 \
    --output_dir outputs/grpo_run
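
  `train_grpo.py` wraps TRL's `GRPOTrainer`; its core looks roughly like the sketch below. The `unit_test_reward` helper (see the reward sketch further down) and the hyperparameter values are illustrative assumptions, not the repo's exact code:

  ```python
  # Rough sketch of the GRPO training loop using TRL (not the exact train_grpo.py).
  from datasets import load_dataset
  from trl import GRPOConfig, GRPOTrainer

  train_ds = load_dataset("json", data_files="data/mbpp_train.jsonl", split="train")

  def unit_test_reward(completions, test_list, **kwargs):
      # Placeholder: score each sampled completion by running its MBPP unit
      # tests in a sandbox (run_in_sandbox is sketched below); 1.0 = all pass.
      return [run_in_sandbox(code, tests) for code, tests in zip(completions, test_list)]

  config = GRPOConfig(
      output_dir="outputs/grpo_run",
      num_generations=8,          # group size: completions sampled per prompt
      max_completion_length=256,
  )
  trainer = GRPOTrainer(
      model="gpt2",
      reward_funcs=unit_test_reward,
      args=config,
      train_dataset=train_ds,     # needs a "prompt" column
  )
  trainer.train()
  ```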
- Evaluate:
python eval.py --model outputs/grpo_run/checkpoint --eval_data data/mbpp_valid.jsonl
- Training code uses the TRL `GRPOTrainer` (a sketch of the trainer call is shown above). If you have a GPU, set `--device cuda`.
- The reward function executes generated code against the MBPP unit tests in a safe subprocess with timeouts and limited I/O, roughly as sketched below.
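
  The repo's sandbox is the authoritative version; a minimal sketch of the same idea (hard timeout only, I/O limiting not shown) might look like:

  ```python
  # Sketch of a unit-test reward: run the generated code plus the MBPP asserts
  # in a fresh Python subprocess with a hard timeout. This simplifies the
  # repo's sandbox (no I/O or resource limits shown here).
  import subprocess
  import sys

  def run_in_sandbox(code: str, tests: list[str], timeout: float = 5.0) -> float:
      program = code + "\n" + "\n".join(tests)  # MBPP tests are assert statements
      try:
          result = subprocess.run(
              [sys.executable, "-c", program],
              capture_output=True,
              timeout=timeout,
          )
      except subprocess.TimeoutExpired:
          return 0.0  # infinite loops and overly slow solutions score zero
      return 1.0 if result.returncode == 0 else 0.0
  ```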
- For larger models, enable the LoRA/QLoRA options to reduce memory use (sketched below).
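
  As a sketch of how LoRA plugs in: TRL's `GRPOTrainer` accepts a `peft_config`. The target module names below are GPT-2-specific, the values are illustrative, and `config`, `train_ds`, and `unit_test_reward` reuse the names from the training sketch above:

  ```python
  # Sketch: wrap the policy in LoRA adapters via peft to cut trainable parameters.
  from peft import LoraConfig
  from trl import GRPOTrainer

  peft_config = LoraConfig(
      r=16,
      lora_alpha=32,
      lora_dropout=0.05,
      target_modules=["c_attn", "c_proj"],  # GPT-2 attention projection layers
      task_type="CAUSAL_LM",
  )

  trainer = GRPOTrainer(
      model="gpt2",
      reward_funcs=unit_test_reward,   # from the training sketch above
      args=config,
      train_dataset=train_ds,
      peft_config=peft_config,         # TRL applies the adapters internally
  )
  ```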
License: MIT

Feel free to open an issue if you run into problems.