Audio-Visual Speech Enhancement with Score-Based Generative Models

This repository contains the inference code for the paper "Audio-Visual Speech Enhancement with Score-Based Generative Models".

Setup

The requirements.txt file was generated with pip freeze from the virtual environment, so it may list packages that are not strictly needed for inference. To install all of them, run:

pip install -r requirements.txt

Inference

To run the inference code, use the following command:

python src/eval.py --noisy_dir $noisy_dir --video_roi_dir $video_roi_dir --out_dir $out_dir

where $noisy_dir is the directory containing the noisy audio files and $video_roi_dir is the directory containing the video ROI files. The output will be saved in the directory $out_dir.

Make sure that each noisy audio file and its corresponding video file follow the same naming scheme, e.g., audio_0001.wav and video_0001.mp4.
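
Before running inference, it can help to check that every audio file has a video counterpart. The following is a minimal sketch, not part of this repository: the helper names file_index and find_unpaired are hypothetical, and it assumes that files are paired by the trailing digits of the file name (as the example above suggests, although the pairing rule used by src/eval.py is not confirmed here) and that the files use the .wav and .mp4 extensions.

```python
import re
from pathlib import Path


def file_index(path):
    """Return the trailing digit run of a file stem, e.g. 'audio_0001' -> '0001'.

    Returns None when the stem does not end in digits.
    """
    match = re.search(r"(\d+)$", Path(path).stem)
    return match.group(1) if match else None


def find_unpaired(noisy_dir, video_roi_dir):
    """Return (audio-only indices, video-only indices) as two sorted lists.

    Assumption: an audio file and a video file belong together when the
    trailing digits of their names are identical.
    """
    audio = {file_index(p) for p in Path(noisy_dir).glob("*.wav")}
    video = {file_index(p) for p in Path(video_roi_dir).glob("*.mp4")}
    # Files without a trailing index cannot be matched under this assumption.
    audio.discard(None)
    video.discard(None)
    return sorted(audio - video), sorted(video - audio)
```

Calling find_unpaired with the same two directories that are passed as --noisy_dir and --video_roi_dir returns two lists; if both are empty, every indexed audio file has a matching video file under the assumption above.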

Citation

If you find this code useful, please consider citing the following paper:

@inproceedings{richter2023audio,
  title={Audio-visual speech enhancement with score-based generative models},
  author={Richter, Julius and Frintrop, Simone and Gerkmann, Timo},
  booktitle={Proceedings of ITG Conference on Speech Communication},
  pages={275--279},
  doi={10.30420/456164054},
  year={2023}
}
