A repository of code for generating noisy speech data from clean data in the frequency domain using deep learning methods.
We explore two architectures: one uses a style-transfer method, and the other uses an image-to-image translation model.
The code makes use of the official SinGAN implementation to generate noisy spectrograms of audio data. Specifically, we use SinGAN's Paint2Image task.
This repo houses a modified version of CUT (Contrastive Unpaired Translation), a GAN we use to learn a mapping from clean to noisy spectrograms. We have tuned the model to work on spectrograms and produce reconstructable audio. The code is heavily derived from the official CUT implementation.
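Since both architectures operate on spectrograms rather than raw waveforms, a clean-to-noisy pipeline needs an audio-to-spectrogram step and an inverse step to recover audio. Below is a minimal sketch of that round trip using SciPy's STFT; the sample rate and window length are placeholder assumptions, not the repo's actual settings, and note that if a model outputs magnitude-only spectrograms, a phase-reconstruction method such as Griffin-Lim would be needed instead of a direct inverse.

```python
import numpy as np
from scipy.signal import stft, istft

# Placeholder parameters -- the repo's actual STFT settings are not stated here.
SR = 16000       # sample rate in Hz (assumption)
NPERSEG = 512    # STFT window length (assumption)

def to_spectrogram(audio):
    """Compute the complex STFT of a 1-D audio signal."""
    _, _, Z = stft(audio, fs=SR, nperseg=NPERSEG)
    return Z

def from_spectrogram(Z):
    """Invert a complex STFT back to a time-domain waveform."""
    _, audio = istft(Z, fs=SR, nperseg=NPERSEG)
    return audio

if __name__ == "__main__":
    # One second of a 440 Hz tone as a stand-in for clean speech.
    t = np.arange(SR) / SR
    clean = np.sin(2 * np.pi * 440.0 * t)
    Z = to_spectrogram(clean)
    recon = from_spectrogram(Z)[: len(clean)]
    # With the full complex STFT, the round trip is near-lossless.
    print(float(np.max(np.abs(clean - recon))))
```

Keeping the full complex STFT makes inversion trivial; the harder part in practice is reconstructing audio from model-generated magnitude spectrograms, which is why the CUT variant above had to be tuned for reconstructability.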
Refer to the directories for the two architectures to learn more and try them out for yourselves!