awesome-talking-head-generation

A collection of papers and released code for talking head generation.

For any additions or corrections concerning talking head generation, please open an issue, submit a pull request, or e-mail me at fhongac@cse.ust.hk. If you are working on talking head generation, you can add my Discord account, Fa-Ting Hong#6563, for easier communication and collaboration.

🔥I am currently seeking a job or postdoctoral position. If you are interested in my qualifications and experience, please feel free to contact me. 🔥

🔥 We released a new work: ACTalker, which can generate portrait videos driven by both audio and expression simultaneously. Please view HERE 🔥

Related Group

Datasets

VoxCeleb1 [Download link].
VoxCeleb2 [Download link].
FaceForensics++ [Download link].
CelebV [Download link].
TalkingHead-1KH [Download link].
LRW (Lip Reading in the Wild) [Download link].
MEAD [Download link].
CelebV-HQ [Download link].
CHDTF [Download link].
MultiTalk [Download link].
VFHQ [Download link].
Hallo3 [Download link].
AVSpeech [Download link].

Image-driven

2025

[HunyuanPortrait] HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation, CVPR 2025. [Code] [Project]

2024

[X-Portrait] X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention, arXiv 2024.
[LivePortrait] LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control [Code] [Project]
[EMOPortraits] EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars, CVPR 2024. [Code], [Project]
[SMA] Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation, CVPR 2024. [Project]

2023

[AVFR-GAN] Audio-Visual Face Reenactment, WACV 2023. [Code], [Project]
[TS-Net] Cross-identity Video Motion Retargeting with Joint Transformation and Synthesis, WACV 2023. [Code]
[MCNET] Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation, ICCV 2023. [Project] [Code]

2022

[DaGAN] Depth-Aware Generative Adversarial Network for Talking Head Video Generation, CVPR 2022. [Code], [Project]
[TPSM] Thin-Plate Spline Motion Model for Image Animation, CVPR 2022. [Code]
[StyleHEAT] StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pretrained StyleGAN, ECCV 2022. [Code], [Project]
[MegaPortraits] MegaPortraits: One-shot Megapixel Neural Head Avatars, ACM MM 2022. [Project]
[DAM] Structure-Aware Motion Transfer with Deformable Anchor Model, CVPR 2022. [Code]
[StyleMask] StyleMask: Disentangling the Style Space of StyleGAN2 for Neural Face Reenactment, FG 2023. [Code]
[CoRF] Controllable Radiance Fields for Dynamic Face Synthesis, arXiv 2022.
[AniFaceGAN] Animatable 3D-Aware Face Image Generation for Video Avatars, NeurIPS 2022. [Project]
[IW] Implicit Warping for Animation with Image Sets, NeurIPS 2022. [Project]
[HifiHead] HifiHead: One-Shot High Fidelity Neural Head Synthesis with 3D Control, IJCAI 2022.
Face Animation with Multiple Source Images, arXiv 2022.
[MetaPortrait] MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation, arXiv 2022.
Compressing Video Calls using Synthetic Talking Heads, BMVC 2022. [Project]
Finding Directions in GAN’s Latent Space for Neural Face Reenactment, BMVC 2022. [Project] [Code]
[LIA] Latent Image Animator: Learning to Animate Images via Latent Space Navigation, ICLR 2022. [Project] [Code]

2021

[face-vid2vid] One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing, CVPR 2021 Oral. [Project]
[S2D] Sparse to Dense Motion Transfer for Face Image Animation, ICCV 2021.
[SAFA] SAFA: Structure Aware Face Animation, 3DV 2021. [Code]
[SAA] Self-appearance-aided Differential Evolution for Motion Transfer, arXiv 2021.
[PIRenderer] PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering, ICCV 2021. [Code]
[FaceGAN] FACEGAN: Facial Attribute Controllable rEenactment GAN, WACV 2021.
[F^3A-GAN] F3A-GAN: Facial Flow for Face Animation With Generative Adversarial Networks, IEEE TIP 2021.
[FACIAL] FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning, ICCV 2021.
[MRAA] Motion Representations for Articulated Animation, CVPR 2021. [Code]
[HeadGAN] HeadGAN: One-shot Neural Head Synthesis and Editing, ICCV 2021. [Project]

2020

[MeshG] Mesh Guided One-shot Face Reenactment Using Graph Convolutional Networks, ACM Multimedia 2020. [Code]
[MarioNETte] MarioNETte: Few-shot Face Reenactment Preserving Identity of Unseen Targets, AAAI 2020. [Project]
[CrossID-GAN] Learning Identity-Invariant Motion Representations for Cross-ID Face Reenactment, CVPR 2020.

2019

[FOMM] First order motion model for image animation, NeurIPS 2019. [Code]
[NeuralHead] Few-Shot Adversarial Learning of Realistic Neural Talking Head models, ICCV 2019. [Code]
[Monkey-Net] Animating Arbitrary Objects via Deep Motion Transfer, CVPR 2019 Oral. [Code], [Project]
[fs-vid2vid] Few-shot Video-to-Video Synthesis, NeurIPS 2019. [Code], [Project]

2018

[ReenactGAN] ReenactGAN: Learning to Reenact Faces via Boundary Transfer, ECCV 2018. [Code]
[X2Face] X2Face: A network for controlling face generation by using images, audio, and pose codes, ECCV 2018. [Code], [Project]

2016

[Face2face] Face2Face: Real-time face capture and reenactment of RGB videos, CVPR 2016.

Audio-driven

2025

[OmniHuman-1] OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models, arXiv 2025. [Project]
[ACTalker] Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modelling for Natural Talking Head Generation, arXiv 2025. [Project]
[OmniAvatar] OmniAvatar: Efficient Audio-Driven Avatar Video Generation with Adaptive Body Animation, arXiv 2025. [Code] [Project]

2024

[Real3DPortrait] Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis, ICLR 2024. [Project] [Code]
[EMO] Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions, arXiv 2024. [Project] [Code]
[Style2Talker] Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style, AAAI 2024.
[SaaS] Say Anything with Any Style, AAAI 2024.
[MuseTalk] Real-Time High Quality Lip Synchronization with Latent Space Inpainting, [Code].
[VASA-1] VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time, arXiv 2024. [Project]
[THQA] THQA: A Perceptual Quality Assessment Database for Talking Heads, arXiv 2024. [Code]
[Talk3D] Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior, arXiv 2024. [Code] [Project]
[EDTalk] EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis, arXiv 2024. [Code] [Project]
[AniPortrait] AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations, arXiv 2024. [Code]
[FlowVQTalker] FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization, arXiv 2024.
[FaceChain-ImagineID] FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio, arXiv 2024. [Code]
[Hallo] Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation, arXiv 2024. [Code]
[EchoMimic] EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions, arXiv 2024. [Code], [Project]
[RealTalk] RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network, arXiv 2024.
[Emotional Conversation] Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation, arXiv 2024.
[Make Your Actor Talk] Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement, arXiv 2024.
[FD2Talk] FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model, arXiv 2024.
[ReSyncer] ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer, arXiv 2024.
[StyleSync] Style-Preserving Lip Sync via Audio-Aware Style Reference, arXiv 2024.
[Loopy] Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency, arXiv 2024. [Project]
[DAWN] DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation, arXiv 2024. [Project], [Code]
[EchoMimicV2] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation, arXiv 2024. [Code], [Project]
[LetsTalk] Latent Diffusion Transformer for Talking Video Synthesis, arXiv 2024. [Code], [Project]
[IF-MDM] Implicit Face Motion Diffusion Model for High-Fidelity Realtime Talking Head Generation, arXiv 2024. [Project]
[INFP] Audio-Driven Interactive Head Generation in Dyadic Conversations, arXiv 2024. [Project]
[MEMO] Memory-Guided Diffusion for Expressive Talking Video Generation, arXiv 2024. [Project], [Code]
[FLOAT] Generative Motion Latent Flow Matching for Audio-driven Talking Portrait, arXiv 2024. [Project]
[Hallo3] Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks, arXiv 2024.
[VQTalker] VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization, arXiv 2024.
[PortraitTalk] Towards Customizable One-Shot Audio-to-Talking Face Generation, arXiv 2024.
[LatentSync] LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync, arXiv 2024. [Code]

2023

[Diffused Heads] Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation, arXiv 2023. [Project] 🔥Diffusion🔥
[DiffTalk] DiffTalk: Crafting Diffusion Models for Generalized Talking Head Synthesis, arXiv 2023. [Project] [Code] 🔥Diffusion🔥
[READ] READ Avatars: Realistic Emotion-controllable Audio Driven Avatars, arXiv 2023.
[DAE-Talker] DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder, arXiv 2023. 🔥Diffusion🔥
[EmoGen] Emotionally Enhanced Talking Face Generation, arXiv 2023. [Code]
[TalkLip] Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert, CVPR 2023. [Code]
[StyleSync] StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator, CVPR 2023. [Project] [Code]
[GeneFace++] GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation, arXiv 2023. [Project] [Code]
[MODA] MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions, ICCV 2023.
[VividTalk] VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior, arXiv 2023. [Project] [Code]
[IP_LAP] IP_LAP: Identity-Preserving Talking Face Generation with Landmark and Appearance Priors, CVPR 2023. [Code]
[HyperLips] HyperLips: Hyper Control Lips with High Resolution Decoder for Talking Face Generation, CVPR 2023. [Code]
[EAT] Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation, ICCV 2023. [Project] [Code]
[SadTalker] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Talking Head Animation, CVPR 2023. [Project] [Code]

2022

[GC-AVT] Expressive Talking Head Generation with Granular Audio-Visual Control, CVPR 2022.
Talking Face Generation with Multilingual TTS, CVPR 2022. [Demo Track]
[EAMM] EAMM: One-Shot Emotional Talking Face via Audio-Based Emotion-Aware Motion Model, SIGGRAPH 2022.
[SPACEx] SPACEx 🚀: Speech-driven Portrait Animation with Controllable Expression, arXiv 2022. [Project] CVPR 2023
[AV-CAT] Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers, SIGGRAPH Asia 2022.
[MemFace] Memories are One-to-Many Mapping Alleviators in Talking Face Generation, arXiv 2022.

2021

[PC-AVS] Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation, CVPR 2021. [Code], [Project]
[IATS] Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis, ACM Multimedia 2021.
[EVP] Audio-Driven Emotional Video Portraits, CVPR 2021. [Code]
[FAU] Talking Head Generation with Audio and Speech Related Facial Action Units, arXiv 2021.
[Speech2Talking-Face] Speech2Talking-Face: Inferring and Driving a Face with Synchronized Audio-Visual Representation, IJCAI 2021.
[LSP] Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation, ACM TOG 2021. [Code]
[Audio2head] Audio2head: Audio-driven one-shot talking-head generation with natural head motion, arXiv 2021.

2020

[Wav2Lip] A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild, ACM Multimedia 2020. [Code], [Project]
[RhythmicHead] Talking-head Generation with Rhythmic Head Motion, ECCV 2020. [Code]
[MakeItTalk] MakeItTalk: Speaker-Aware Talking-Head Animation, SIGGRAPH Asia 2020. [Code], [Project]
[Neural Voice Puppetry] Neural Voice Puppetry: Audio-driven Facial Reenactment, ECCV 2020. [Code], [Project]
[MEAD] MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation, ECCV 2020. [Code], [Project]
Realistic Speech-Driven Facial Animation with GANs, IJCV 2020.

2019

[DAVS] Talking Face Generation by Adversarially Disentangled Audio-Visual Representation, AAAI 2019. [Code]
[ATVGnet] Hierarchical Cross-modal Talking Face Generation with Dynamic Pixel-wise Loss, CVPR 2019. [Code]

2018

Lip Movements Generation at a Glance, ECCV 2018. [Code]
[VisemeNet] VisemeNet: Audio-Driven Animator-Centric Speech Animation, SIGGRAPH 2018.

2017

[Synthesizing-Obama] Synthesizing Obama: Learning Lip Sync From Audio, SIGGRAPH 2017. [Project]
[You-Said-That?] You Said That?: Synthesising Talking Faces From Audio, IJCV 2019. [Code]
Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion, SIGGRAPH 2017.
A Deep Learning Approach for Generalized Speech Animation, SIGGRAPH 2017.

2016

[LRW] Lip Reading in the Wild, ACCV 2016.

NeRF & 3D

2024

[CVTHead] CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer, WACV 2024. [Code].
[Head3D] 3D-Aware Talking-Head Video Motion Transfer, WACV 2024.

2022

[SSP-NeRF] Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation, arXiv 2022.
[HeadNeRF] HeadNeRF: A Real-time NeRF-based Parametric Head Model, CVPR 2022. [Code], [Project]
[IMavatar] I M Avatar: Implicit Morphable Head Avatars from Videos, CVPR 2022. [Code]
[ROME] Realistic One-shot Mesh-based Head Avatars, ECCV 2022.
[FNeVR] FNeVR: Neural Volume Rendering for Face Animation, arXiv 2022. [Code]
[3DFaceShop] 3DFaceShop: Explicitly Controllable 3D-Aware Portrait Generation, arXiv 2022. [Code], [Project]
[Next3D] Generative Neural Texture Rasterization for 3D-Aware Head Avatars, arXiv 2022. [Project]
[NeRFInvertor] NeRFInvertor: High Fidelity NeRF-GAN Inversion for Single-shot Real Image Animation, arXiv 2022.
[DFRF] Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis, ECCV 2022. [Code]

2021

[DFA-NeRF] DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering, arXiv 2021.
[NerFACE] NerFACE: Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction, CVPR 2021 Oral. [Code], [Project]
[AD-NeRF] AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis, ICCV 2021. [Code], [Code]

2020

[DiscoFaceGAN] Disentangled and Controllable Face Image Generation via 3D Imitative-Contrastive Learning, CVPR 2020 Oral. [Code]

Survey

2024

A Comparative Study of Perceptual Quality Metrics for Audio-driven Talking Head Videos [Code]

2020

What comprises a good talking-head video generation?: A Survey and Benchmark

Star History