Adaptive Super Resolution For One-Shot Talking-Head Generation

This is the repository for the ICASSP 2024 paper "Adaptive Super Resolution For One-Shot Talking-Head Generation" (AdaSR TalkingHead).

Abstract

One-shot talking-head generation learns to synthesize a talking-head video from a single source portrait image, driven by a video of the same or a different identity. These methods usually require plane-based pixel transformations via Jacobian matrices, or facial image warps, to generate novel poses. The constraints of using a single source image and pixel displacements often compromise the clarity of the synthesized images. Some methods try to improve the quality of the synthesized videos by introducing additional super-resolution modules, but this increases the computational cost and distorts the original data distribution. In this work, we propose an adaptive high-quality talking-head video generation method that synthesizes high-resolution video without additional pre-trained modules. Specifically, inspired by existing super-resolution methods, we down-sample the one-shot source image and then adaptively reconstruct high-frequency details via an encoder-decoder module, resulting in enhanced video clarity. Our method consistently improves the quality of generated videos through a straightforward yet effective strategy, as substantiated by quantitative and qualitative evaluations. The code and demo video are available at: https://github.com/Songluchuan/AdaSR-TalkingHead/
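As a toy illustration of the down-sample-then-reconstruct idea in the abstract, the sketch below measures the high-frequency detail that a 2x down-sampling step discards. It is not the paper's encoder-decoder: plain Python lists stand in for an image, the function names are hypothetical, and the residual is computed directly rather than learned. It only shows what kind of signal a reconstruction module would have to recover.

```python
# Toy illustration only: nested lists stand in for a grayscale image, and the
# lost detail is computed directly instead of being predicted by a network.

def downsample2x(img):
    """Average each non-overlapping 2x2 block (height and width must be even)."""
    h, w = len(img), len(img[0])
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]

def upsample2x(img):
    """Nearest-neighbour upsampling: repeat every pixel as a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def high_frequency_residual(img):
    """Detail lost by the down-sample / up-sample round trip."""
    coarse = upsample2x(downsample2x(img))
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(img, coarse)]

if __name__ == "__main__":
    image = [
        [0.0, 1.0, 0.0, 1.0],   # checkerboard rows: pure high-frequency detail
        [1.0, 0.0, 1.0, 0.0],
        [0.5, 0.5, 0.5, 0.5],   # flat rows: no detail to lose
        [0.5, 0.5, 0.5, 0.5],
    ]
    print(downsample2x(image))
    print(high_frequency_residual(image))
```

In this example the checkerboard rows average out to a flat 0.5 after down-sampling, so all of their structure ends up in the residual, while the flat rows leave a zero residual.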

Updates

[03/2024] Inference code and pretrained model are released.
[03/2024] arXiv link: https://arxiv.org/abs/2403.15944.
[COMING] Super-resolution model (based on StyleGANEX and ESRGAN).
[COMING] Training code and processed datasets.

Installation

Clone this repo:

git clone git@github.com:Songluchuan/AdaSR-TalkingHead.git
cd AdaSR-TalkingHead

Dependencies:

We have tested on:

CUDA 11.3-11.6
PyTorch 1.10.1
Matplotlib 3.4.3; Matplotlib 3.4.2; opencv-python 4.7.0; scikit-learn 1.0; tqdm 4.62.3
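To compare a local environment against the tested versions listed above, a small check along these lines may help. It is a sketch under stated assumptions: the distribution names are the usual PyPI names (for example torch for PyTorch), Matplotlib is matched on 3.4 because both listed versions are 3.4.x, and the helper names are hypothetical. The CUDA version is not checked.

```python
from importlib import metadata

# Versions from "We have tested on". The keys are assumed PyPI distribution
# names; adjust them if your packages were installed under different names.
TESTED = {
    "torch": "1.10.1",
    "matplotlib": "3.4",
    "opencv-python": "4.7.0",
    "scikit-learn": "1.0",
    "tqdm": "4.62.3",
}

def version_matches(found, wanted):
    """True if `found` starts with the components of `wanted`.

    A local suffix such as "+cu113" is ignored, so "1.10.1+cu113" matches "1.10.1".
    """
    core = found.split("+")[0].split(".")
    want = wanted.split(".")
    return core[:len(want)] == want

def check_environment(tested=TESTED):
    """Return {name: (tested_version, installed_version_or_None, status)}."""
    report = {}
    for name, wanted in tested.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (wanted, None, "missing")
            continue
        status = "ok" if version_matches(found, wanted) else "differs"
        report[name] = (wanted, found, status)
    return report

if __name__ == "__main__":
    for name, (wanted, found, status) in check_environment().items():
        print(f"{name:15s} tested={wanted:8s} installed={found} [{status}]")
```

A "differs" status does not necessarily mean the code will fail; it only flags a version outside the tested set.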

Inference Code

Download the pretrained model (trained on the HDTF dataset) from Google Drive: https://drive.google.com/file/d/1g58uuAyZFdny9_twvbv0AHxB9-03koko/view?usp=sharing and put it under checkpoints/
The demo video and reference image are under DEMO/
The inference code is in run_demo.sh. Run it with:

bash run_demo.sh

You can set a different source image and driving video in run_demo.sh:

--source_image DEMO/demo_img_3.jpg

and

--driving_video DEMO/demo_video_1.mp4
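Before running bash run_demo.sh, it can be useful to confirm that the checkpoint and demo inputs described above are in place. The following is a minimal sketch, assuming the script is run from the repository root and that the paths match the ones shown above; the function name missing_inputs is hypothetical and is not part of the repository.

```python
from pathlib import Path

def missing_inputs(repo_root, source_image, driving_video):
    """Return a list of human-readable problems; an empty list means ready."""
    root = Path(repo_root)
    problems = []

    # The README asks for the pretrained model to be placed under checkpoints/.
    ckpt_dir = root / "checkpoints"
    if not ckpt_dir.is_dir() or not any(ckpt_dir.iterdir()):
        problems.append("checkpoints/ is missing or empty")

    # Paths given to --source_image and --driving_video in run_demo.sh.
    for label, rel in (("source image", source_image),
                       ("driving video", driving_video)):
        if not (root / rel).is_file():
            problems.append(f"{label} not found: {rel}")

    if not (root / "run_demo.sh").is_file():
        problems.append("run_demo.sh not found")
    return problems

if __name__ == "__main__":
    issues = missing_inputs(".", "DEMO/demo_img_3.jpg", "DEMO/demo_video_1.mp4")
    if issues:
        print("Not ready to run the demo:")
        for issue in issues:
            print("  -", issue)
    else:
        print("Inputs found; run: bash run_demo.sh")
```

The check only looks for any file under checkpoints/ and does not verify the checkpoint's name or contents.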

Video

Citation

@inproceedings{song2024adaptive,
  title={Adaptive Super Resolution for One-Shot Talking Head Generation},
  author={Song, Luchuan and Liu, Pinxin and Yin, Guojun and Xu, Chenliang},
  year={2024},
  organization={IEEE International Conference on Acoustics, Speech, and Signal Processing}
}

Acknowledgments

The code is mainly based on StyleGANEX, ESRGAN and unofficial face2vid. Thanks to the authors for their contributions.
