
PSO-Merging: Deep Model Fusion Based on Particle Swarm Optimization

📖 Introduction

PSO-Merging is a deep model fusion method that uses a particle swarm optimization (PSO) algorithm to automatically find optimal model fusion weights. Built on the FusionBench framework, this project provides a complete model fusion pipeline.

🚀 Key Features

  • Intelligent Optimization: Uses a PSO algorithm to optimize model fusion weights automatically, with no gradient computation required
  • Data-Driven: Uses a small amount of data as guidance to improve fusion quality
  • Multi-Model Support: Works with various pre-trained models (LLaMA, T5, Flan-T5, Mistral, etc.)
  • Multi-Task Evaluation: Supports GLUE, mathematical reasoning, code generation, and other tasks
  • Flexible Configuration: Modular configuration system based on Hydra
  • Fast Convergence: PSO converges quickly, reaching satisfactory results within a limited number of iterations
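
The weight search follows the standard PSO update rule. As a rough illustration (a generic textbook PSO sketch, not this repository's implementation — the toy quadratic objective and function name are invented for the example), using the same parameter names as the Configuration section:

```python
import numpy as np

def pso_minimize(f, dim, num_particles=50, max_iterations=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over [0, 1]^dim with a basic particle swarm."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, (num_particles, dim))   # positions
    v = np.zeros_like(x)                              # velocities
    pbest = x.copy()                                  # per-particle best positions
    pbest_val = np.array([f(p) for p in x])
    g = pbest[pbest_val.argmin()].copy()              # global best position
    for _ in range(max_iterations):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # Inertia + pull toward personal best + pull toward global best
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, 0.0, 1.0)
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[pbest_val.argmin()].copy()
    return g, pbest_val.min()

# Toy objective: optimal "fusion weights" sit at 0.25 in each dimension.
best, val = pso_minimize(lambda wts: ((wts - 0.25) ** 2).sum(), dim=4)
```

Note that each fitness evaluation only requires forward passes on the guidance data, which is why no gradients are needed.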

🛠️ Installation

Requirements

  • Python >= 3.8
  • PyTorch >= 2.0.0
  • CUDA >= 11.0 (recommended)
  • Ollama (for AlpacaEval evaluation)
  • Sufficient disk space (~150GB for models, ~1GB for xFinder, ~40GB for llama3.1:70b-instruct)
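
Before downloading anything, it can help to verify the disk-space estimate above. A minimal sketch (the helper name and the ~191 GB default, summing the three figures above, are assumptions for illustration):

```python
import shutil

def check_disk_space(path=".", required_gb=191):
    """Return (free_gb, ok) for `path`, where ok means free space covers
    the ~150GB (models) + ~1GB (xFinder) + ~40GB (llama3.1:70b) estimate."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb, free_gb >= required_gb
```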

Installation Steps

  1. Clone the repository
git clone https://github.com/ictnlp/PSO-Merging
cd PSO-Merging
  2. Install the package
pip install -e .
  3. Install third-party dependencies
# Install AlpacaEval
cd thirdpart/alpaca_eval
pip install -e .
cd ../..

# Install xFinder
cd thirdpart/xFinder
pip install -e .
cd ../..
  4. Download xFinder model
# Download xFinder-qwen1505 model from Hugging Face
git lfs install
git clone https://huggingface.co/IAAR-Shanghai/xFinder-qwen1505
  5. Install and configure Ollama
# Install Ollama (Linux/macOS)
curl -fsSL https://ollama.ai/install.sh | sh

# Start the Ollama server on port 8000
# (ollama serve has no -p flag; the port is set via OLLAMA_HOST)
OLLAMA_HOST=127.0.0.1:8000 ollama serve

# In another terminal, download the required model
ollama pull llama3.1:70b-instruct-q4_K_M

Note:

  • The xFinder model is required for evaluation tasks. Make sure you have sufficient disk space (~1GB for the model).
  • Ollama server must be running on port 8000 for AlpacaEval evaluation. The llama3.1:70b-instruct-q4_K_M model is required for evaluation tasks.

🚀 Quick Start

1. Prepare Environment

Make sure you have completed the installation steps above, including:

  • Installing third-party dependencies (AlpacaEval and xFinder)
  • Downloading the xFinder-qwen1505 model
  • Installing Ollama and downloading the llama3.1:70b-instruct-q4_K_M model

Important: Before running experiments, make sure the Ollama server is running:

# Start the Ollama server on port 8000 (if not already running);
# the port is set via OLLAMA_HOST
OLLAMA_HOST=127.0.0.1:8000 ollama serve

2. Download Models

Before running experiments, you need to download the pre-trained models:

# Create models directory
mkdir -p initial_models

# Download models from Hugging Face
# Language Model
git lfs install
git clone https://huggingface.co/dp66/llama-8b-lm initial_models/llama_8b_lm

# Code Generation Model  
git clone https://huggingface.co/dp66/llama-8b-code initial_models/llama_8b_code

# Science Question Answering Model
git clone https://huggingface.co/dp66/llama-8b-sciq initial_models/llama_8b_sciq

# Mathematical Reasoning Model
git clone https://huggingface.co/dp66/llama-8b-math initial_models/llama_8b_math

# Meta-Llama-3-8B Model
git clone https://huggingface.co/meta-llama/Meta-Llama-3-8B initial_models/Meta-Llama-3-8B
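
After the clones finish, a quick sanity check confirms all five model directories are in place. A small sketch (the helper name is invented; the directory names come from the commands above):

```python
from pathlib import Path

# Target directory names used in the clone commands above.
EXPECTED_MODELS = [
    "llama_8b_lm", "llama_8b_code", "llama_8b_sciq",
    "llama_8b_math", "Meta-Llama-3-8B",
]

def missing_models(root="initial_models"):
    """Return the expected model directories not yet present under `root`."""
    base = Path(root)
    return [m for m in EXPECTED_MODELS if not (base / m).is_dir()]
```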

Model Information

| Model | Task | Size | Hugging Face Link | Description |
| --- | --- | --- | --- | --- |
| llama_8b_lm | Language Modeling | ~30GB | dp66/llama-8b-lm | Fine-tuned for general language modeling tasks |
| llama_8b_code | Code Generation | ~30GB | dp66/llama-8b-code | Specialized for code generation and programming tasks |
| llama_8b_sciq | Science QA | ~30GB | dp66/llama-8b-sciq | Optimized for science question answering |
| llama_8b_math | Math Reasoning | ~30GB | dp66/llama-8b-math | Fine-tuned for mathematical reasoning and problem solving |
| Meta-Llama-3-8B | General Purpose | ~30GB | meta-llama/Meta-Llama-3-8B | Meta's Llama 3 8B model for general tasks |

Note: Each model is approximately 30GB in size. Download time depends on your internet connection. Make sure you have sufficient disk space (~150GB total for all models).

3. Run PSO Fusion Experiments

bash run.sh

4. Run Baseline Methods

bash run_baselines.sh

⚙️ Configuration

Main Configuration Parameters

# PSO-related parameters
method:
  pso_merging:
    num_particles: 50          # Number of particles
    max_iterations: 100        # Maximum iterations
    w: 0.7                     # Inertia weight
    c1: 1.5                    # Individual learning factor
    c2: 1.5                    # Social learning factor
    
# Model configuration
modelpool:
  models:
    - name: "llama-7b"
      path: "/path/to/llama-7b"
    - name: "llama-13b" 
      path: "/path/to/llama-13b"
      
# Task configuration
tasks:
  - "alpaca_eval"
  - "gsm8k"
  - "mbpp"
  - "human_eval"

Custom Configuration

Create custom configuration files:

# Copy existing configuration
cp config/GMA_llama_pso.yaml config/my_experiment.yaml

# Edit configuration
vim config/my_experiment.yaml

# Run experiment
fusion_bench --config-name my_experiment

Ollama Configuration

For AlpacaEval evaluation, ensure Ollama is properly configured:

# Check if Ollama is running
curl http://localhost:8000/api/tags

# Check if the required model is available
ollama list | grep llama3.1:70b-instruct-q4_K_M

The evaluation will use the Ollama server at http://localhost:8000 for model inference.
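
The same check can be scripted. A small sketch (the helper name is invented; the `/api/tags` endpoint is Ollama's standard model-listing API, returning JSON with a `models` list):

```python
import json
from urllib.request import urlopen

def model_available(tags_json, name="llama3.1:70b-instruct-q4_K_M"):
    """Check an /api/tags response body for the required model."""
    models = json.loads(tags_json).get("models", [])
    return any(m.get("name") == name for m in models)

# Live check (assumes the Ollama server configured above is running):
# with urlopen("http://localhost:8000/api/tags") as r:
#     print(model_available(r.read()))
```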

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • FusionBench - Model fusion benchmark framework
  • MergeLM - Model merging framework and evaluation code
  • AlpacaEval - Automatic evaluator for instruction-following language models
  • xFinder - Large Language Models as Automated Evaluators for Reliable Evaluation
  • xFinder-qwen1505 - Pre-trained xFinder model for key answer extraction
  • Ollama - Local large language model server for evaluation

📚 Citation

If you use this project in your research, please cite our paper:

@article{pso-merging-2024,
  title={PSO-Merging: Particle Swarm Optimization for Deep Model Fusion},
  author={Zhang, Kehao and others},
  journal={arXiv preprint},
  year={2024}
}

⭐ If this project helps you, please give us a star!
