Minh Tran*
·
Di Chang*
·
Maksim Siniukov
·
Mohammad Soleymani
University of Southern California
*Equal Contribution
We propose Dyadic Interaction Modeling (DIM), a pre-training strategy that jointly models speakers' and listeners' motions and learns representations that capture the dyadic context. We then use the pre-trained weights and feed multimodal inputs from the speaker into DIM-Listener, which can generate photorealistic videos of the listener's motion.
Clone repo:
git clone https://github.com/Boese0601/Dyadic-Interaction-Modeling.git
cd Dyadic-Interaction-Modeling
The code is tested with Python == 3.12.3, PyTorch == 2.3.1 and CUDA == 12.5 on 2 x NVIDIA L40S GPUs. We recommend using Anaconda to manage dependencies. You may need to change the torch and CUDA versions in requirements.txt to match your machine.
conda create -n dim python=3.12.3
conda activate dim
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
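After installation, it can help to confirm that the interpreter and the PyTorch build match the versions listed above. The following is a minimal sketch and not part of the repository: `describe_environment` is a hypothetical helper name, and it leaves the PyTorch fields at their defaults if `torch` cannot be found.

```python
import importlib.util
import platform


def describe_environment():
    """Collect Python and (if installed) PyTorch/CUDA version information."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
        "gpu_count": 0,
    }
    # Import torch only if it is installed, so the check itself never crashes.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        # CUDA version this PyTorch build was compiled against (None for CPU builds).
        info["cuda_build"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
        info["gpu_count"] = torch.cuda.device_count()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False` on a GPU machine, the installed PyTorch build and the local driver probably do not match, and the versions in the commands above may need adjusting.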
Download the CANDOR Corpus dataset from its official website. Extract it and put it under the folder 'data/candor_processed/'.
Download the LM_Listener dataset from its official website. Extract it and put it under the folder 'data/lm_listener_data/'.
Download the BIWI dataset from its official website. Extract it and put it under the folder 'data/BIWI_data/'.
Download the ViCo dataset from its official website. Extract it and put it under the folder 'data/vico_processed/', and place 'RLD_data.csv' at 'data/RLD_data.csv'. Use the following script to preprocess the ViCo dataset.
python vico_preprocessing.py
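Before launching preprocessing or training, the folder layout described above can be checked with a short script. This is a sketch under the assumption that it is run from the repository root; `EXPECTED_PATHS` and `find_missing` are hypothetical names, and the script only checks that the paths exist, not that their contents are complete.

```python
from pathlib import Path

# Paths taken from the download instructions above, relative to the repo root.
EXPECTED_PATHS = [
    "data/candor_processed",
    "data/lm_listener_data",
    "data/BIWI_data",
    "data/vico_processed",
    "data/RLD_data.csv",
]


def find_missing(root="."):
    """Return the expected dataset paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing dataset paths:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected dataset paths are present.")
```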
- Launch the following line to train VQ-VAE.
python train_vq.py
- Launch the following line to pretrain the model on the CANDOR dataset.
python seq2seq_pretrain.py
- Launch the following line to train the model on the ViCo dataset.
python train_s2s.py
- Launch the following line to train the converter model on BIWI.
python train_converter.py
- (Optional) Launch the following line to finetune the model on a specific dataset.
python finetune_s2s_pretrain.py
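The training stages above can also be chained from a single driver script. The sketch below is not part of the repository: the script names come from the steps above, but running them strictly in the listed order, and the helper names `build_commands` and `run_all`, are assumptions. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

# Script names come from the steps above; running them in this order is an assumption.
TRAINING_STEPS = [
    "train_vq.py",
    "seq2seq_pretrain.py",
    "train_s2s.py",
    "train_converter.py",
]


def build_commands(include_finetune=False):
    """Return one command (as an argument list) per training stage."""
    steps = list(TRAINING_STEPS)
    if include_finetune:
        # Optional finetuning stage, appended last.
        steps.append("finetune_s2s_pretrain.py")
    # Use the current interpreter so the active conda environment is respected.
    return [[sys.executable, script] for script in steps]


def run_all(include_finetune=False, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for command in build_commands(include_finetune):
        print("Running:", " ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing stage.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` from the repository root would execute the stages one after another and stop at the first failure.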
Launch the following lines to evaluate the model on each of the datasets.
python test_s2s_pretrain.py
python test_biwi.py
python test_l2l.py
python test_s2s.py
- [2024.7.04] Instructions on training and inference are released.
- [2024.6.23] Code is fully released. Instructions on training and inference coming soon.
- [2024.6.23] Release Dyadic Interaction Modeling project page.
- [2024.3.27] Release Dyadic Interaction Modeling paper.
If you find our work useful, please consider citing:
@article{tran2024dyadic,
title={Dyadic Interaction Modeling for Social Behavior Generation},
author={Tran, Minh and Chang, Di and Siniukov, Maksim and Soleymani, Mohammad},
journal={arXiv preprint arXiv:2403.09069},
year={2024}
}
Our code is distributed under the USC research license. See LICENSE.txt for more information.
This work is supported by the National Science Foundation under Grant No. 2211550. The work was also sponsored by the Army Research Office and was accomplished under Cooperative Agreement Number W911NF-20-2-0053. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.
We thank Haiwen Feng, Quankai Gao, and Hongyi Xu for their suggestions and discussions.