Aligning Large Language Models with Human Feedback: Mathematical Foundations and Algorithm Design

Abstract

This article provides an introduction to the mathematical foundations and algorithmic frameworks used to align Large Language Models (LLMs) with human intentions, preferences, and values. We discuss standard alignment techniques, such as supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and direct preference optimization (DPO). We also explore the theoretical underpinnings of learning from human preferences, drawing connections to inverse reinforcement learning (IRL) and discrete choice models. We present state-of-the-art algorithms in a tutorial style, discuss their advantages and limitations, and offer insights into practical implementation. Our exposition is intended to serve as a comprehensive resource for researchers and practitioners, providing both a foundational understanding of alignment methodologies and a framework for developing more robust and scalable alignment techniques.
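For quick reference, the objects these methods optimize can be written compactly. The forms below are the standard ones from the preference-learning literature, stated in our own notation (the article's symbols may differ). Under the Bradley-Terry discrete choice model, the probability that a labeler prefers response $y_1$ over $y_2$ for prompt $x$ is

$$P(y_1 \succ y_2 \mid x) = \sigma\big(r(x, y_1) - r(x, y_2)\big),$$

where $r$ is a latent reward and $\sigma$ is the logistic function. RLHF fits $r$ to preference data and then maximizes it under a KL penalty toward a reference policy,

$$\max_{\pi_\theta}\; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}\!\big[r(x, y)\big] \;-\; \beta\, \mathbb{D}_{\mathrm{KL}}\!\big[\pi_\theta(\cdot \mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot \mid x)\big],$$

while DPO eliminates the explicit reward model and directly minimizes

$$\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right],$$

where $y_w$ and $y_l$ denote the preferred and dispreferred responses.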

Alignment Framework Illustration

Figure: Overview of the alignment framework and methodologies discussed in this survey.

Key Topics

  1. Learning from Human Preferences and Reinforcement Learning from Human Feedback (RLHF)
  2. Learning from Demonstrations and Inverse Reinforcement Learning (IRL)
  3. Alignment Methodologies
  4. Algorithm Design and Implementation (a minimal loss sketch follows this list)
  5. Practical Considerations and Limitations
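
As a companion to topic 4, here is a minimal sketch of the DPO objective above in PyTorch. It is illustrative only and not code from this repository: the function name and arguments are our own, and the inputs are assumed to be per-response log-probabilities (summed over response tokens) under the trainable policy and a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss (hypothetical helper, not from this repo).

    Each argument is a batch of per-response log-probabilities,
    i.e. log pi(y|x) summed over the response tokens. The reference
    model's values are assumed precomputed from a frozen model.
    """
    # Implicit per-response rewards: log-ratios of policy to reference.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Negative log-sigmoid of the scaled margin between chosen and rejected.
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()


# Toy usage with random log-probabilities (batch of 4 preference pairs).
if __name__ == "__main__":
    b = 4
    loss = dpo_loss(torch.randn(b), torch.randn(b),
                    torch.randn(b), torch.randn(b))
    print(f"DPO loss: {loss.item():.4f}")
```

Note that only the policy's log-probabilities carry gradients here; in a full training loop the reference values would be computed under `torch.no_grad()`.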

Citation

@article{zeng2025aligning,
  title={Aligning Large Language Models with Human Feedback: Mathematical Foundations and Algorithm Design},
  author={Zeng, Siliang and Viano, Luca and Li, Chenliang and Li, Jiaxiang and Cevher, Volkan and Wulfmeier, Markus and Ermon, Stefano and Garcia, Alfredo and Hong, Mingyi},
  year={2025}
}
