This repository contains the code to manipulate a source video of a speaking person so that it mimics the speech from a given target video.
This code has been tested on Ubuntu with Python 3.7.
- First, create a virtual environment, e.g. using venv:
python -m venv venv
Activate the virtual environment with:
source venv/bin/activate
- After activating the virtual environment, run the following to install the requirements:
pip install --upgrade pip
pip install -r requirements.txt
- Create a directory named "Documents" inside the repository folder; it will contain the files needed to run the pipeline. Inside it, create a subdirectory named "pipeline_files". Download FLAME 2020 and move "generic_model.pkl" to "Documents/pipeline_files/flame_files". Download "landmark_embedding.npy" from DECA and move it to "flame_files" as well. Download the trained DeepSpeech model and move "output_graph.pb" to "Documents/pipeline_files/trained_models". Finally, download the trained tracker and audio2exp models and move "tracker.pt" and "a2e.pt" to "trained_models" as well.
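After these downloads, the layout under "Documents" should look like this (the "video" folder is where the source and target videos go, as described below):
Documents/
    pipeline_files/
        flame_files/
            generic_model.pkl
            landmark_embedding.npy
        trained_models/
            output_graph.pb
            tracker.pt
            a2e.pt
    video/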
- Training the source
To use a video as the source (i.e. the actor), the model must first be trained for that video. Training on the source video is initiated by running the following inside the repository directory.
source venv/bin/activate
./train_source.sh SOURCE_NAME
where SOURCE_NAME is the name (without extension) of the source video. The source video should be located in the "Documents/video" folder.
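For example, if the source video is "Documents/video/aa.mp4", run:
./train_source.sh aa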
- Creating the fake video (inference)
Generation of the fake video is initiated by running the following inside the repository directory.
source venv/bin/activate
./create_fake_video.sh SOURCE_NAME TARGET_NAME
where SOURCE_NAME and TARGET_NAME are the names (without extension) of the source and target videos, respectively. The videos should be located in the "Documents/video" folder. For example, if the source video is "aa.mp4" and the target video is "bb.mp4", you should run:
./create_fake_video.sh aa bb
The target can also be an audio file. In either case, the speech is extracted from the target video/audio and used to manipulate the source video.
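For example, assuming the target speech is provided as an audio file such as "bb.wav" placed in the "Documents/video" folder (the exact set of supported audio extensions is not listed here), the call is the same as for a video target:
./create_fake_video.sh aa bb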
When the run is complete, the resulting fake video will be inside the "Documents/video" folder.
In preprocess/ds_to_flame_params.py, there are two parameters, "jaw_gain" and "jaw_closure". jaw_gain determines how much the mouth opens for a given speech input; a higher gain leads to more mouth movement. jaw_closure determines how much the mouth should be closed during silence: by default the mouth keeps a slight offset (stays slightly open) during silence, and jaw_closure compensates for this. The current values seem to work well, but you can experiment with other values to see their effect.
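As a rough illustration of how these two parameters interact (a hypothetical sketch, not the actual code in preprocess/ds_to_flame_params.py; the variable names, example values, and formula are assumptions):
jaw_gain = 1.0      # assumed example value: scales how far the mouth opens with speech
jaw_closure = 0.05  # assumed example value: offset that pulls the mouth shut during silence

def adjust_jaw_opening(predicted_opening):
    # Scale the speech-driven jaw opening, subtract the closure offset,
    # and clamp at zero so the jaw never gets a negative opening.
    return max(0.0, jaw_gain * predicted_opening - jaw_closure)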