
Multispeaker and new neural voice creation #88

Open
kafan1986 opened this issue Nov 7, 2022 · 12 comments

Comments

@kafan1986

I used the FastPitch model to generate TTS for a known speaker. Can I extend this model to multiple speakers using speaker embeddings? If yes, can that solution then be fine-tuned to mimic a new voice on limited audio data? Has anyone experimented along this path?

@cschaefer26

Hi, just to let you know I am currently working on a multispeaker implementation that will be live soon. Fine-tuning is possible with about 5 minutes of fresh data.

@kafan1986
Author

@cschaefer26 I can see you are actively developing the multi-speaker implementation in one of the branches. Is it at a stage where I can experiment with it, or should I wait some more?

@cschaefer26

cschaefer26 commented Jan 5, 2023

Hi, yeah, I am currently implementing it in this branch:

https://github.com/as-ideas/ForwardTacotron/tree/feature/multispeaker

It's probably going to be ready in two weeks or so. I am currently testing it on the VCTK dataset and cannot guarantee it is working properly, but it could be worth a try if you like: training is implemented, inference will come soon. Use the multispeaker.yaml config; it supports VCTK and a variant of the LJSpeech format (set via preprocessing.audio_format). For the LJSpeech format it expects rows as: id|speaker_id|text
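A minimal sketch of reading the LJSpeech-variant metadata format described above (each row is `id|speaker_id|text`). This is illustrative only, not the repo's actual preprocessing code; the function name and return shape are assumptions:

```python
def parse_metadata(lines):
    """Parse ljspeech-variant rows of the form id|speaker_id|text.

    Returns a list of (file_id, speaker_id, text) tuples.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first two pipes only, in case the text itself contains '|'.
        file_id, speaker_id, text = line.split('|', 2)
        entries.append((file_id, speaker_id, text))
    return entries

rows = [
    'LJ001-0001|speaker_0|Printing, in the only sense with which we are concerned.',
    'VCTK-p225-001|speaker_1|Please call Stella.',
]
print(parse_metadata(rows)[0][1])  # -> speaker_0
```

Splitting with `split('|', 2)` keeps any pipes inside the transcript text intact, which plain `split('|')` would break.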

@kafan1986
Author

@cschaefer26 Thanks for the update. I will wait another two weeks before experimenting with it, since GPU time is expensive on my end. From what I can tell, you have only made the multi-speaker TTS work with ForwardTacotron, not with FastPitch. Is that so? In my previous experiments FastPitch gave slightly better output quality than ForwardTacotron, so could we get a FastPitch version as well? Thanks again for all your work.

@cschaefer26

Hi, yeah, I'm going to implement both (ForwardTaco first, then FastPitch). In my experience ForwardTaco actually performs better, but it may depend on the dataset...

@debasish-mihup

@cschaefer26 I can see you are still experimenting across multiple branches. Could you add a provision for passing emotion as a parameter, so that apart from providing the speaker embedding during training I could also provide the emotion type of each audio segment? Where this emotion information is not available, it could be assumed to be "neutral".

@kafan1986
Author

@cschaefer26 Is the multispeaker branch ready for testing? Also, can you create a branch with FastPitch?

@cschaefer26

cschaefer26 commented Feb 5, 2023

Hi, multispeaker is merged and ready for testing. I tested it on a custom dataset, but as always with such large merges there may be bugs - please let me know if you find anything fishy. My colleague @alexteua will work on implementing FastPitch from next week.

@debasish-mihup Currently there is no plan to support emotion conditioning in the vanilla version, but it should be easy to add in a branch if you like. Hint: you can simply concatenate it to the speaker embedding.
I would be curious whether you are experimenting with an annotated dataset?

@rmcpantoja

Hi @cschaefer26, congratulations on completing the multispeaker work.
I would like to try this new multispeaker ForwardTacotron to make a pretrained model with more than 15 Spanish speakers. Each speaker has their own dataset, so merging them all into one seems like a good idea. The datasets range from a minimum of 10 minutes to a maximum of one hour and 30 minutes each. How many hours of audio are needed, at a minimum, to make a decent model?

@kafan1986
Author

kafan1986 commented Feb 18, 2023

@cschaefer26 @alexteua Thanks for the multispeaker variant. Is there any progress on the FastPitch version? I could not find a working branch for it. Also, if I want to train the model so it works decently for unseen speakers, what would be the usual number of speakers (of both genders) in the training data, and how many hours per speaker? Any idea based on your experiments?

@alexteua

Hi @kafan1986, the FastPitch version is coming in the next few days.

@alexteua

alexteua commented Mar 23, 2023

@kafan1986 multispeaker fastpitch is ready to use ( #95 )
