This repository provides a unified evaluation pipeline for comparing various portrait animation models under consistent inference and evaluation conditions.
```
portrait-eval-pipeline/
├── checkpoint/            # Model checkpoints (manually downloaded)
├── data/                  # Evaluation data (expected structure: data/test/{video_name}/{frame}.png)
├── dataset/               # Dataset preprocessing and loader
├── eval/                  # Generated results (see Output Format)
├── models/
│   └── <model_name>/
│       ├── config.yaml
│       ├── runner.py
│       └── model/         # Model definition
├── pretrained_model/      # HuggingFace checkpoints
├── scripts/
│   ├── inference/
│   │   ├── gt.py
│   │   └── run.py
│   ├── metrics/
│   │   ├── reconstruction_eval.py
│   │   └── animation_eval.py
│   └── vis/               # Optional visualization tools
└── README.md
```
| Model | Setup Method |
|---|---|
| FOMM | Unified (via `requirements.txt`) |
| LIA | Unified (via `requirements.txt`) |
| Portrait Stage 1–3 (ours) | Unified (via `requirements.txt`) |
| X-Portrait | Follow official repo |
| LivePortrait | Follow official repo |
| Follow-Your-Emoji | Follow official repo |
Models marked "Unified" run under the same conda environment for reproducible results.
Others must be executed in their original environments.
To run the unified models (FOMM, LIA, Portrait Stage 1–3 (ours)), set up the environment:
```bash
conda create -n evaluation python=3.11
conda activate evaluation
```
This project uses PyTorch 2.5.1.
Install the build appropriate for your system (CPU or a specific CUDA version) following the official instructions:
👉 https://pytorch.org/get-started/previous-versions/
Example (CUDA 12.1):
```bash
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.1 -c pytorch -c nvidia
```
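To verify that the intended PyTorch build was installed (a minimal sanity check, not part of the pipeline):

```python
import torch

# Print the installed PyTorch version and the CUDA runtime it was built against.
print(f"PyTorch: {torch.__version__}")          # expected: 2.5.1 (e.g. 2.5.1+cu121 for the CUDA 12.1 build)
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA build: {torch.version.cuda}")  # e.g. 12.1
    print(f"Device: {torch.cuda.get_device_name(0)}")
```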
Then install the remaining dependencies and required assets:
```bash
pip install -r requirements.txt

# Download MediaPipe model (used for AED/APD metrics)
wget -q -P checkpoint/ https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task

# Download HuggingFace checkpoints
python huggingface_download.py
```
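For reference, a HuggingFace download script of this kind usually reduces to `snapshot_download` calls from `huggingface_hub`. The sketch below is illustrative only; the repository IDs and target folders are assumptions rather than what `huggingface_download.py` actually uses:

```python
from huggingface_hub import snapshot_download

# Illustrative only: these repo IDs and target folders are assumptions,
# not necessarily what huggingface_download.py fetches.
REPOS = {
    "KwaiVGI/LivePortrait": "pretrained_model/liveportrait",
    "YueMafighting/FollowYourEmoji": "pretrained_model/follow_your_emoji",
}

for repo_id, local_dir in REPOS.items():
    snapshot_download(repo_id=repo_id, local_dir=local_dir)
    print(f"Downloaded {repo_id} -> {local_dir}")
```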
For the implementation and full research results of the Portrait Stage 1–3 (ours) model, see this repository.
| Model | Checkpoint File | Download Link |
|---|---|---|
| FOMM | `vox-cpk.pth.tar` | Download |
| LIA | `vox.pt` | Download |
| X-Portrait | `model_state-415001.pth` | Download |
| LivePortrait | — | Run `huggingface_download.py` |
| Follow-Your-Emoji | — | Run `huggingface_download.py` |
| Portrait Stage 1–3 (ours) | — | — |
Place all downloaded files into the `checkpoint/` folder.
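A quick sanity check that the manually downloaded files are in place (a minimal sketch using the filenames from the table above; the LivePortrait and Follow-Your-Emoji assets fetched by `huggingface_download.py` live under `pretrained_model/` instead):

```python
from pathlib import Path

# Checkpoints expected in checkpoint/ (see the table above).
EXPECTED = [
    "vox-cpk.pth.tar",         # FOMM
    "vox.pt",                  # LIA
    "model_state-415001.pth",  # X-Portrait
    "face_landmarker.task",    # MediaPipe, used for AED/APD
]

missing = [name for name in EXPECTED if not (Path("checkpoint") / name).exists()]
if missing:
    print("Missing checkpoints:", ", ".join(missing))
else:
    print("All manually downloaded checkpoints are in place.")
```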
Prepare ground-truth sequences for evaluation:
```bash
python -m scripts.inference.gt --mode reconstruction
python -m scripts.inference.gt --mode animation
```
Then run inference for each model:
```bash
# Reconstruction mode
python -m scripts.inference.run --mode reconstruction --model fomm
python -m scripts.inference.run --mode reconstruction --model portrait --tag stage1

# Animation mode
python -m scripts.inference.run --mode animation --model portrait --tag stage2
```
You can edit or add configs in `models/<model_name>/config.yaml`.
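Conceptually, `--model` and `--tag` select settings under `models/<model_name>/config.yaml`. The snippet below is a hypothetical illustration of that resolution; the field names and the per-tag override layout are assumptions, not the actual loader:

```python
from pathlib import Path

import yaml  # PyYAML


def load_model_config(model: str, tag: str | None = None) -> dict:
    """Hypothetical config resolution: models/<model_name>/config.yaml,
    with an assumed optional per-tag override section."""
    cfg_path = Path("models") / model / "config.yaml"
    with cfg_path.open() as f:
        cfg = yaml.safe_load(f)
    # Assumed convention: tag-specific settings stored under a "tags" key.
    if tag is not None and "tags" in cfg:
        cfg.update(cfg["tags"].get(tag, {}))
    return cfg


# e.g. load_model_config("portrait", tag="stage1")
```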
```bash
python -m scripts.metrics.reconstruction_eval --gen_dirs fomm lia portrait_stage1 portrait_stage2
python -m scripts.metrics.animation_eval --gen_dirs fomm lia portrait_stage1 portrait_stage2
```
Metrics:

- Reconstruction: L1, SSIM, LPIPS, FVD (5-sample average)
- Animation: ID-SIM, AED, APD, FVD

Backbones: ID-SIM uses ArcFace, AED/APD use MediaPipe landmarks, FVD uses I3D features.
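For intuition about what the reconstruction metrics measure, here is a minimal sketch of per-frame L1 / SSIM / LPIPS over matching ground-truth and generated frame folders. It assumes the `lpips` and `scikit-image` packages and identically named, same-sized PNG frames; FVD is omitted because it requires an I3D network, and the actual `reconstruction_eval` script may differ:

```python
from pathlib import Path

import numpy as np
import torch
import lpips                                    # pip install lpips
from PIL import Image
from skimage.metrics import structural_similarity as ssim


def to_lpips_tensor(img: np.ndarray) -> torch.Tensor:
    # LPIPS expects NCHW tensors scaled to [-1, 1].
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0) * 2 - 1


def frame_metrics(gt_dir: str, gen_dir: str) -> dict:
    """Average per-frame L1 / SSIM / LPIPS over identically named PNG frames."""
    loss_fn = lpips.LPIPS(net="alex")           # LPIPS with an AlexNet backbone
    l1_vals, ssim_vals, lpips_vals = [], [], []

    for gt_path in sorted(Path(gt_dir).glob("*.png")):
        gen_path = Path(gen_dir) / gt_path.name
        gt = np.asarray(Image.open(gt_path).convert("RGB"), dtype=np.float32) / 255.0
        gen = np.asarray(Image.open(gen_path).convert("RGB"), dtype=np.float32) / 255.0

        l1_vals.append(np.abs(gt - gen).mean())
        ssim_vals.append(ssim(gt, gen, channel_axis=2, data_range=1.0))
        with torch.no_grad():
            lpips_vals.append(loss_fn(to_lpips_tensor(gt), to_lpips_tensor(gen)).item())

    return {"L1": float(np.mean(l1_vals)),
            "SSIM": float(np.mean(ssim_vals)),
            "LPIPS": float(np.mean(lpips_vals))}
```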
After inference, you can optionally generate comparison grids:
```bash
python -m scripts.vis.make_grid \
    --mode reconstruction \
    --frame_range 10 11 12 13 14 \
    --label_frame_idx 15 \
    --ids id123 id456 ...
```
This produces:

- comparison grids
- labeled grids
- per-frame outputs (under `eval/{mode}/selected/frames/`)
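For intuition, a comparison grid is simply the same selected frames laid out side by side, one row per model. The helper below is a hypothetical sketch, not the actual `make_grid` implementation:

```python
from pathlib import Path

from PIL import Image


# Hypothetical helper: one row per model, one column per selected frame.
def make_comparison_grid(model_dirs: list[str], frame_names: list[str], out_path: str) -> None:
    rows = []
    for model_dir in model_dirs:
        frames = [Image.open(Path(model_dir) / name) for name in frame_names]
        w, h = frames[0].size
        row = Image.new("RGB", (w * len(frames), h))
        for i, frame in enumerate(frames):
            row.paste(frame, (i * w, 0))
        rows.append(row)

    grid = Image.new("RGB", (rows[0].width, sum(r.height for r in rows)))
    y = 0
    for row in rows:
        grid.paste(row, (0, y))
        y += row.height
    grid.save(out_path)


# e.g. make_comparison_grid(
#     ["eval/reconstruction/selected/id123/gt", "eval/reconstruction/selected/id123/fomm"],
#     ["10.png", "11.png", "12.png"],
#     "grid_id123.png",
# )
```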
All generated results are written under `eval/` in the following layout:

```
eval/
├── reconstruction/
│   ├── gt/
│   ├── fomm/
│   ├── portrait/
│   │   └── stage1/
│   ├── selected/              # gathered for visualization
│   │   └── <id>/
│   │       ├── gt/
│   │       ├── fomm/
│   │       └── portrait_stage1/
│   └── metrics.json
├── animation/
│   ├── gt/
│   │   ├── driving/           # Driving video frames
│   │   └── source/            # Source image frames
│   ├── fomm/
│   ├── portrait/
│   │   └── stage2/
│   ├── selected/
│   │   └── <id>/
│   │       ├── gt_driving/
│   │       ├── gt_source/
│   │       ├── fomm/
│   │       └── portrait_stage2/
│   └── metrics.json
```
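Because each mode writes a `metrics.json`, results can be collected with a few lines of Python. The schema is assumed here to be a flat model-to-metrics mapping, which may differ from what the evaluation scripts actually write:

```python
import json
from pathlib import Path

# Assumption: metrics.json maps model names to {metric: value} dicts.
for mode in ("reconstruction", "animation"):
    path = Path("eval") / mode / "metrics.json"
    if not path.exists():
        continue
    metrics = json.loads(path.read_text())
    print(f"== {mode} ==")
    for model, scores in metrics.items():
        formatted = ", ".join(f"{k}={v:.4f}" for k, v in scores.items())
        print(f"{model}: {formatted}")
```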
- This repo is not a full benchmarking toolkit.
- Model training, fine-tuning, or custom setups (e.g., X-Portrait) are external to this pipeline.
- It is built for controlled evaluation and reproducible comparison across models.
- A. Siarohin et al., “First Order Motion Model for Image Animation,” NeurIPS, 2019. [paper] [code]
- Y. Wang et al., “Latent Image Animator: Learning to Animate Images via Latent Space Navigation,” ICLR, 2022. [paper] [code]
- J. Guo et al., “LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control,” arXiv, 2024. [paper] [code]
- Y. Xie et al., “X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention,” SIGGRAPH, 2024. [paper] [code]
- Y. Ma et al., “Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation,” SIGGRAPH Asia, 2024. [paper] [code]