- Inference Code
- Pretrained Models
- Web Demo
- Training Code
conda create -n gesturelsm python=3.10  # PyTorch 2.1.2 conda builds require Python <= 3.11
conda activate gesturelsm
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
bash demo/install_mfa.sh
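Optionally, a quick sanity check (not part of the repo) can confirm that the new environment sees PyTorch and the GPU before you download any models:

```python
# Optional sanity check for the freshly created environment (not part of the repo).
import torch

print("PyTorch:", torch.__version__)            # expected: 2.1.2
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```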
# Download the pretrained models (Shortcut, Diffusion, and RVQ-VAEs)
gdown https://drive.google.com/drive/folders/1OfYWWJbaXal6q7LttQlYKWAy0KTwkPRw?usp=drive_link -O ./ckpt --folder
# Download the SMPL model
gdown https://drive.google.com/drive/folders/1MCks7CMNBtAzU2XihYezNmiGT_6pWex8?usp=drive_link -O ./datasets/hub --folder
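If the downloads succeed, the checkpoints should sit under `./ckpt` and the SMPL files under `./datasets/hub` (the `-O` targets above). A small optional sketch to confirm the folders are not empty:

```python
# Quick check that the gdown downloads produced non-empty folders.
# Paths are taken from the -O targets above; adjust if you changed them.
from pathlib import Path

for folder in ("ckpt", "datasets/hub"):
    entries = list(Path(folder).rglob("*"))
    print(f"{folder}: {len(entries)} files/dirs")
    assert entries, f"{folder} is empty; re-run the matching gdown command"
```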
Required for evaluation and training; not needed for running the web demo or inference.
- Download the original raw data
bash preprocess/bash_raw_cospeech_download.sh
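The raw download is large, so a rough check of how much data landed on disk can be useful. The `datasets/` location below is an assumption based on the paths used earlier; the actual output folder is whatever `preprocess/bash_raw_cospeech_download.sh` writes to:

```python
# Rough sanity check after the raw-data download (location assumed, not guaranteed).
from pathlib import Path

total = sum(p.stat().st_size for p in Path("datasets").rglob("*") if p.is_file())
print(f"datasets/ currently holds {total / 1e9:.1f} GB")
```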
Requires the dataset downloaded above.
python test.py -c configs/shortcut_rvqvae_128.yaml
python demo.py -c configs/shortcut_rvqvae_128_hf.yaml
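Both scripts take a YAML config via `-c`. If you want to inspect or tweak a config before running them, a minimal sketch using PyYAML is shown below; the key name `test_ckpt` is a placeholder for illustration, not the repo's actual schema:

```python
# Illustrative only: inspect or modify a config such as configs/shortcut_rvqvae_128.yaml
# before passing it to test.py / demo.py. Key names here are hypothetical.
import yaml

with open("configs/shortcut_rvqvae_128.yaml") as f:
    cfg = yaml.safe_load(f)

print(sorted(cfg.keys()))   # see which options the config exposes

# cfg["test_ckpt"] = "ckpt/my_model.bin"          # hypothetical override
# with open("configs/my_config.yaml", "w") as f:
#     yaml.safe_dump(cfg, f)
```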
Our code partially borrows from SynTalker, EMAGE, and DiffuseStyleGesture. Thanks to their authors; please check out these useful repos.
If you find our code or paper helpful, please consider citing:
@misc{liu2025gesturelsmlatentshortcutbased,
  title={GestureLSM: Latent Shortcut based Co-Speech Gesture Generation with Spatial-Temporal Modeling},
  author={Pinxin Liu and Luchuan Song and Junhua Huang and Chenliang Xu},
  year={2025},
  eprint={2501.18898},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2501.18898},
}