Audio Driven Speech Mimicry

This repository contains code to manipulate a source video of a speaking person so that it mimics the audio from a given target video.

This code has been tested on Ubuntu with Python 3.7.

Requirements (first-time setup only)

  1. First, create a virtual environment, e.g. using venv:

python -m venv venv

Activate the virtual environment with:

source venv/bin/activate

  2. With the virtual environment active, install the requirements:

pip install --upgrade pip
pip install -r requirements.txt
  3. Create a directory named "Documents" inside the repository folder; it will hold the files needed to run the pipeline. Inside it, create a subdirectory named "pipeline_files". Download FLAME 2020 and move "generic_model.pkl" to Documents/pipeline_files/flame_files. Download "landmark_embedding.npy" from DECA and move it to "flame_files" as well. Download the trained DeepSpeech model and move "output_graph.pb" to Documents/pipeline_files/trained_models. Finally, download the trained tracker model and the audio2exp model, and move "tracker.pt" and "a2e.pt" to "trained_models" as well. The resulting layout is shown below.
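Putting the steps above together, the downloaded files should end up in the following layout (the "video" folder used in the Run section is included for completeness):

Documents/
├── pipeline_files/
│   ├── flame_files/
│   │   ├── generic_model.pkl        (FLAME 2020)
│   │   └── landmark_embedding.npy   (DECA)
│   └── trained_models/
│       ├── output_graph.pb          (DeepSpeech)
│       ├── tracker.pt               (tracker model)
│       └── a2e.pt                   (audio2exp model)
└── video/                           (source and target videos)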

Run

  1. Training the source
    To use a video as the source (i.e. the actor), the model must first be trained on that video. Start training for a source video by running the following inside the repository directory:

source venv/bin/activate
./train_source.sh SOURCE_NAME

where SOURCE_NAME is the name (without extension) of the source video. The source video should be located in the "Documents/video" folder.
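For example, if the source video is "aa.mp4", you would run:

./train_source.sh aa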

  2. Creating the fake video (inference)
    Generate the fake video by running the following inside the repository directory:

source venv/bin/activate
./create_fake_video.sh SOURCE_NAME TARGET_NAME

where SOURCE_NAME and TARGET_NAME are the names (without extension) of the source and target videos, respectively. Both should be located in the "Documents/video" folder. For example, if the source video is "aa.mp4" and the target video is "bb.mp4", you should run:

./create_fake_video.sh aa bb

The target can also be an audio file: the pipeline extracts the speech from the target video or audio and manipulates the source video to match it.
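For example, assuming the target audio file is named "cc.wav" and is placed in the same "Documents/video" folder (the set of supported audio extensions is not documented here), the call looks the same:

./create_fake_video.sh aa cc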

When the run is complete, the resulting fake video will be inside the "Documents/video" folder.

Tuning

In preprocess/ds_to_flame_params.py there are two parameters, "jaw_gain" and "jaw_closure". jaw_gain determines how far the mouth opens for a given speech signal: a higher gain leads to more mouth movement. jaw_closure determines how firmly the mouth closes during silence; by default the mouth keeps a slight offset open during silence, and this parameter compensates for it. The current values seem to work well, but you can experiment with other values to see their effect.
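As a rough intuition only, here is a minimal sketch of how a gain and a closure offset of this kind might act on predicted jaw opening angles; the function and variable names below are illustrative and do not reproduce the repository's actual code:

import numpy as np

jaw_gain = 1.0      # scales speech-driven mouth opening (higher = wider movement)
jaw_closure = 0.05  # offset subtracted so the mouth fully closes in silence

def adjust_jaw(predicted_jaw: np.ndarray) -> np.ndarray:
    """Apply gain and closure offset to per-frame jaw opening angles (radians)."""
    adjusted = jaw_gain * predicted_jaw - jaw_closure
    # The jaw cannot open by a negative amount, so clamp at zero.
    return np.clip(adjusted, 0.0, None)

# Example: near-silent frames (small angles) are pulled shut, speech frames stay open.
print(adjust_jaw(np.array([0.0, 0.02, 0.1])))  # -> [0.   0.   0.05]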
