PSO-Merging is a deep model fusion method that uses the particle swarm optimization (PSO) algorithm to automatically search for optimal model fusion weights. This project is built on the FusionBench framework and provides a complete model fusion pipeline.
- Intelligent Optimization: Uses PSO algorithm to automatically optimize model fusion weights, avoiding gradient computation
- Data-Driven: Uses a small amount of data to guide the search and improve fusion quality
- Multi-Model Support: Supports various pre-trained models (LLaMA, T5, Flan-T5, Mistral, etc.)
- Multi-Task Evaluation: Supports GLUE, mathematical reasoning, code generation, and other tasks
- Flexible Configuration: Modular configuration system based on Hydra
- Fast Convergence: The PSO algorithm converges quickly, reaching satisfactory results within a limited number of iterations
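To make the gradient-free search concrete, the loop below sketches a standard PSO update over fusion-weight vectors. This is an illustrative toy, not the project's actual implementation: the fitness function, particle count, and iteration budget are placeholders, and only the inertia/cognitive/social update rule mirrors the configuration parameters used later in this README.

```python
import random

def pso_optimize(fitness, dim, num_particles=10, max_iters=50,
                 w=0.7, c1=1.5, c2=1.5):
    """Minimal PSO over fusion-weight vectors (illustrative sketch)."""
    # Particle positions are candidate fusion-weight vectors in [0, 1]^dim
    pos = [[random.random() for _ in range(dim)] for _ in range(num_particles)]
    vel = [[0.0] * dim for _ in range(num_particles)]
    pbest = [p[:] for p in pos]              # personal best positions
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(num_particles), key=lambda i: pbest_fit[i])
    gbest_pos, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(max_iters):
        for i in range(num_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Velocity = inertia + cognitive pull + social pull
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(pos[i])
            if f > pbest_fit[i]:             # maximize fitness
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest_pos, gbest_fit = pos[i][:], f
    return gbest_pos, gbest_fit
```

In PSO-Merging the fitness would be a task score of the merged model on the guidance data; here any callable returning a scalar works, e.g. `pso_optimize(lambda ws: -sum((x - 0.5) ** 2 for x in ws), dim=2)`.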
- Python >= 3.8
- PyTorch >= 2.0.0
- CUDA >= 11.0 (recommended)
- Ollama (for AlpacaEval evaluation)
- Sufficient disk space (~150GB for models, ~1GB for xFinder, ~40GB for llama3.1:70b-instruct)
- Clone the repository
git clone https://github.com/ictnlp/PSO-Merging
cd PSO-Merging
- Install the package
pip install -e .
- Install third-party dependencies
# Install AlpacaEval
cd thirdpart/alpaca_eval
pip install -e .
cd ../..
# Install xFinder
cd thirdpart/xFinder
pip install -e .
cd ../..
- Download xFinder model
# Download xFinder-qwen1505 model from Hugging Face
git lfs install
git clone https://huggingface.co/IAAR-Shanghai/xFinder-qwen1505
- Install and configure Ollama
# Install Ollama (Linux/macOS)
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama server on port 8000 (ollama serve has no -p flag; set the port via OLLAMA_HOST)
OLLAMA_HOST=127.0.0.1:8000 ollama serve
# In another terminal, download the required model
ollama pull llama3.1:70b-instruct-q4_K_M
Note:
- The xFinder model is required for evaluation tasks. Make sure you have sufficient disk space (~1GB for the model).
- Ollama server must be running on port 8000 for AlpacaEval evaluation. The llama3.1:70b-instruct-q4_K_M model is required for evaluation tasks.
Make sure you have completed the installation steps above, including:
- Installing third-party dependencies (AlpacaEval and xFinder)
- Downloading the xFinder-qwen1505 model
- Installing Ollama and downloading the llama3.1:70b-instruct-q4_K_M model
Important: Before running experiments, make sure the Ollama server is running:
# Start Ollama server on port 8000, if not already running (set the port via OLLAMA_HOST)
OLLAMA_HOST=127.0.0.1:8000 ollama serve
Before running experiments, you need to download the pre-trained models:
# Create models directory
mkdir -p initial_models
# Download models from Hugging Face
# Language Model
git lfs install
git clone https://huggingface.co/dp66/llama-8b-lm initial_models/llama_8b_lm
# Code Generation Model
git clone https://huggingface.co/dp66/llama-8b-code initial_models/llama_8b_code
# Science Question Answering Model
git clone https://huggingface.co/dp66/llama-8b-sciq initial_models/llama_8b_sciq
# Mathematical Reasoning Model
git clone https://huggingface.co/dp66/llama-8b-math initial_models/llama_8b_math
# Meta-Llama-3-8B Model
git clone https://huggingface.co/meta-llama/Meta-Llama-3-8B initial_models/Meta-Llama-3-8B
| Model | Task | Size | Hugging Face Link | Description |
|---|---|---|---|---|
| llama_8b_lm | Language Modeling | ~30GB | dp66/llama-8b-lm | Fine-tuned for general language modeling tasks |
| llama_8b_code | Code Generation | ~30GB | dp66/llama-8b-code | Specialized for code generation and programming tasks |
| llama_8b_sciq | Science QA | ~30GB | dp66/llama-8b-sciq | Optimized for science question answering |
| llama_8b_math | Math Reasoning | ~30GB | dp66/llama-8b-math | Fine-tuned for mathematical reasoning and problem solving |
| Meta-Llama-3-8B | General Purpose | ~30GB | meta-llama/Meta-Llama-3-8B | Meta's Llama 3 8B base model for general tasks |
Note: Each model is approximately 30GB in size. Download time depends on your internet connection. Make sure you have sufficient disk space (~150GB total for all models).
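Conceptually, each PSO particle is a vector of per-model fusion weights, and the merged model is the weighted combination of the expert models' parameters. A minimal sketch, using plain Python dicts of scalars in place of real tensor state dicts (model roles and parameter names below are illustrative, not the repository's actual code):

```python
def merge_state_dicts(state_dicts, weights):
    """Weighted average of per-model parameters (toy stand-in for tensors)."""
    assert len(state_dicts) == len(weights)
    total = sum(weights)
    norm = [w / total for w in weights]      # normalize fusion weights
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name] for w, sd in zip(norm, state_dicts))
    return merged

# Toy example with scalar "parameters"
experts = [
    {"layer.weight": 1.0},   # e.g. math expert
    {"layer.weight": 3.0},   # e.g. code expert
]
merged = merge_state_dicts(experts, weights=[0.25, 0.75])
# merged["layer.weight"] == 2.5
```

With real checkpoints the same averaging would run per tensor (e.g. over `torch` state dicts), with PSO proposing the `weights` vector for each candidate.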
bash run.sh
bash run_baselines.sh
# PSO-related parameters
method:
  pso_merging:
    num_particles: 50    # Number of particles
    max_iterations: 100  # Maximum iterations
    w: 0.7               # Inertia weight
    c1: 1.5              # Individual (cognitive) learning factor
    c2: 1.5              # Social learning factor

# Model configuration
modelpool:
  models:
    - name: "llama-7b"
      path: "/path/to/llama-7b"
    - name: "llama-13b"
      path: "/path/to/llama-13b"

# Task configuration
tasks:
  - "alpaca_eval"
  - "gsm8k"
  - "mbpp"
  - "human_eval"
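Hydra composes these YAML files into a nested config at runtime. As a standalone illustration of the structure (a plain-dict mirror of the example above, not FusionBench's or Hydra's actual loading code), a few sanity checks before launching an experiment might look like:

```python
# Plain-dict mirror of the YAML example above (illustrative only)
config = {
    "method": {
        "pso_merging": {
            "num_particles": 50,
            "max_iterations": 100,
            "w": 0.7, "c1": 1.5, "c2": 1.5,
        }
    },
    "modelpool": {"models": [
        {"name": "llama-7b", "path": "/path/to/llama-7b"},
        {"name": "llama-13b", "path": "/path/to/llama-13b"},
    ]},
    "tasks": ["alpaca_eval", "gsm8k", "mbpp", "human_eval"],
}

def validate(cfg):
    """Basic sanity checks on a PSO-Merging experiment config."""
    pso = cfg["method"]["pso_merging"]
    assert pso["num_particles"] > 0 and pso["max_iterations"] > 0
    assert 0.0 < pso["w"] < 1.0, "inertia weight usually lies in (0, 1)"
    assert len(cfg["modelpool"]["models"]) >= 2, "fusion needs >= 2 models"
    assert cfg["tasks"], "at least one evaluation task is required"
    return True
```

Running such checks early catches misconfigured experiments before any ~30GB model is loaded.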
Create custom configuration files:
# Copy existing configuration
cp config/GMA_llama_pso.yaml config/my_experiment.yaml
# Edit configuration
vim config/my_experiment.yaml
# Run experiment
fusion_bench --config-name my_experiment
For AlpacaEval evaluation, ensure Ollama is properly configured:
# Check if Ollama is running
curl http://localhost:8000/api/tags
# Check if the required model is available
ollama list | grep llama3.1:70b-instruct-q4_K_M
The evaluation will use the Ollama server at http://localhost:8000 for model inference.
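The `curl` check above can also be scripted. The helper below parses an Ollama `/api/tags` response (which lists installed models under a `"models"` key) and checks that the required tag is present; the sample response string is abridged and illustrative:

```python
import json

def model_available(tags_json: str, model_name: str) -> bool:
    """Check an Ollama /api/tags response for a required model tag."""
    data = json.loads(tags_json)
    return any(m.get("name") == model_name for m in data.get("models", []))

# Abridged example of the /api/tags response shape
sample = '{"models": [{"name": "llama3.1:70b-instruct-q4_K_M"}]}'
```

In practice the JSON would come from the running server, e.g. `urllib.request.urlopen("http://localhost:8000/api/tags").read()`, and a missing model means `ollama pull` still needs to run.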
This project is licensed under the MIT License - see the LICENSE file for details.
- FusionBench - Model fusion benchmark framework
- MergeLM - Model merging framework and evaluation code
- AlpacaEval - Automatic evaluator for instruction-following language models
- xFinder - Large Language Models as Automated Evaluators for Reliable Evaluation
- xFinder-qwen1505 - Pre-trained xFinder model for key answer extraction
- Ollama - Local large language model server for evaluation
If you use this project in your research, please cite our paper:
@article{pso-merging-2024,
  title={PSO-Merging: Particle Swarm Optimization for Deep Model Fusion},
  author={Zhang, Kehao and others},
  journal={arXiv preprint},
  year={2024}
}
⭐ If this project helps you, please give us a star!