add google tts for all voice families #73

Open
wants to merge 1 commit into main
Conversation

mobarski

I've added support for Google's TTS voices (tested: Studio, Journey, WaveNet, Neural, Standard).
The new method also shows how to render each speaker as a separate audio track and how to combine both tracks into a single output. The tracks are also saved separately to facilitate workflows involving animated AI avatars.
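For readers who want the general idea of the track handling, here is a minimal, hypothetical sketch using pydub (not the PR's actual code; the file names and the simple back-to-back concatenation are assumptions, the real method may interleave or overlay turns):

```python
from pydub import AudioSegment

# Hypothetical per-speaker tracks rendered by the TTS backend.
speaker1 = AudioSegment.from_file("speaker1.mp3")
speaker2 = AudioSegment.from_file("speaker2.mp3")

# Keep the separate tracks for avatar-animation workflows...
speaker1.export("track_speaker1.mp3", format="mp3")
speaker2.export("track_speaker2.mp3", format="mp3")

# ...and also combine them into a single output. Here the tracks are
# simply appended back to back for illustration.
combined = speaker1 + speaker2
combined.export("combined.mp3", format="mp3")
```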

I haven't touched pyproject.toml (or requirements.txt), as Poetry had issues with the pyproject file.
Please feel free to adjust the code as needed to merge it, as I might have less time for FOSS projects in the next few days.

@souzatharsis
Owner
Please add a section to the config doc on how to set up Google Cloud in order to run the Google TTS model. I would benefit from that too, in order to test the PR :)

https://github.com/souzatharsis/podcastfy/blob/main/usage/config.md

@mobarski
Author

Configuring Google's TTS is a pain in the a**. I've tried to capture all the required steps here, but there is a nonzero chance that I've missed something. When you replicate these steps, please take notes; they may be needed to create proper documentation.

Enable the Text-to-Speech API

  1. Go to https://console.cloud.google.com/apis/dashboard
  2. Select your project (or create one by clicking on the project list and then on "New project")
  3. Click "+ ENABLE APIS AND SERVICES" at the top of the screen
  4. Enter "text-to-speech" into the search box
  5. Click on "Cloud Text-to-Speech API" and then on "ENABLE"
  6. You should end up here: https://console.cloud.google.com/apis/library/texttospeech.googleapis.com?project=...

Configure a billing account

  1. Click "..." to the left of the profile picture (top-right corner)
  2. Select "Payment method" and add a payment method
  3. Click "..." near the profile picture again
  4. Select "Billing account management"
  5. Enable billing for your project

Configure the API client

  1. Open the terminal
  2. Install the Google Cloud CLI tools; on Ubuntu it's `sudo snap install google-cloud-cli`
  3. Create the application credentials file by running `gcloud auth application-default login`
  4. The browser will open; log into your account
  5. In the terminal you should see where the credentials were saved, e.g. `Credentials saved to file: /home/mobarski/.config/gcloud/application_default_credentials.json`
  6. Set the environment variable `GOOGLE_APPLICATION_CREDENTIALS` to the path of that file
  7. Install the Python package: `pip3 install google-cloud-texttospeech` (a quick way to verify the setup is sketched below)
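Once the credentials file and the environment variable are in place, a minimal synthesis call is an easy sanity check. This is only a hedged sketch; the voice name and output path are arbitrary choices, not anything required by the PR:

```python
from google.cloud import texttospeech

# Relies on GOOGLE_APPLICATION_CREDENTIALS pointing at the file created by
# `gcloud auth application-default login` (steps 5 and 6 above).
client = texttospeech.TextToSpeechClient()

response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(text="Hello from Cloud Text-to-Speech."),
    voice=texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Wavenet-D",  # any supported voice works here
    ),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    ),
)

# Write the returned audio bytes to disk.
with open("hello.mp3", "wb") as f:
    f.write(response.audio_content)
```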

@souzatharsis
Owner

Thank you so much for the detailed instructions!

@evandempsey

> Configuring Google's TTS is a pain in the a**. […]

You can get around all this by just enabling the Cloud Text-to-Speech API on the API key you are already using for Gemini and passing it in when you instantiate the client.

client = texttospeech.TextToSpeechClient(client_options={'api_key': os.environ['GOOGLE_API_KEY']})

@souzatharsis
Owner

souzatharsis commented Oct 24, 2024 via email

@souzatharsis
Owner

@evandempsey It didn't work...

"Requests to this API texttospeech.googleapis.com method google.cloud.texttospeech.v1.TextToSpeech.SynthesizeSpeech are blocked."

Where did you import it from?

from google.cloud import texttospeech

@evandempsey

evandempsey commented Nov 5, 2024

@souzatharsis Yes, that's the one.

You probably need to add the API permission to the key you're using on the Google Cloud console.

Go to https://console.cloud.google.com/apis/credentials, click on whatever key you're using for Gemini, then go down to API Restrictions and add the Cloud Text-to-Speech API.
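Putting the two comments together, a minimal sketch of the API-key route might look like the following. This assumes the key already allows the Cloud Text-to-Speech API under its restrictions; the voice and output names are arbitrary:

```python
import os

from google.cloud import texttospeech

# Reuse the Gemini API key instead of application-default credentials.
client = texttospeech.TextToSpeechClient(
    client_options={"api_key": os.environ["GOOGLE_API_KEY"]}
)

response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(text="Testing the API-key route."),
    voice=texttospeech.VoiceSelectionParams(
        language_code="en-US", name="en-US-Neural2-C"
    ),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    ),
)

with open("api_key_test.mp3", "wb") as f:
    f.write(response.audio_content)
```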

@souzatharsis
Owner

souzatharsis commented Nov 5, 2024 via email

@souzatharsis
Owner

I've managed to integrate podcastfy with Google's multispeaker model and I think we've found what NotebookLM is using...

I am curious about your feedback before we merge into main. @brumar @mobarski @evandempsey @lfnovo

Should we make this the default TTS model?
Only a Gemini API Key would be required for running it end-to-end.

  1. Transformer paper PDF: https://www.veed.io/view/eb65150f-ef2a-447c-8cb9-43674453ca8f?panel=share
  2. Website (www.open-notebook.ai): https://www.veed.io/view/4c514532-9311-41a6-8af6-9053e14f7a5b?panel=share
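For anyone who wants to try the multi-speaker model themselves, a rough sketch based on Google's multi-speaker dialogue documentation (the feature lives in the v1beta1 client) looks like this. It is not podcastfy's actual integration code, and the class names should be treated as assumptions if the beta API has changed:

```python
from google.cloud import texttospeech_v1beta1 as texttospeech

client = texttospeech.TextToSpeechClient()

# Two speaker turns; the multi-speaker voice distinguishes speakers by label.
markup = texttospeech.MultiSpeakerMarkup(
    turns=[
        texttospeech.MultiSpeakerMarkup.Turn(text="Have you read the paper?", speaker="R"),
        texttospeech.MultiSpeakerMarkup.Turn(text="I have, and it's fascinating.", speaker="S"),
    ]
)

response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(multi_speaker_markup=markup),
    voice=texttospeech.VoiceSelectionParams(
        language_code="en-US", name="en-US-Studio-MultiSpeaker"
    ),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    ),
)

with open("dialogue.mp3", "wb") as f:
    f.write(response.audio_content)
```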

@evandempsey

Ah, you think they're using this? https://cloud.google.com/text-to-speech/docs/create-dialogue-with-multispeakers

It sounds great, and a bit more natural than what I was able to achieve in my experiments by burning through ElevenLabs credits.

My concern about setting it as the default is the rather irritating GCloud setup you'd then be forcing on people. But it should definitely be an option.

Have you found out the maximum length of audio you can synthesize with this? It doesn't seem to be documented.

@souzatharsis
Owner

souzatharsis commented Nov 6, 2024 via email

@souzatharsis
Owner

OK, done.
I've added setup instructions: https://github.com/souzatharsis/podcastfy/blob/main/usage/config.md
I'd welcome feedback.
Thanks!

@souzatharsis
Owner

Thank you so much for the feedback!

Support for Google's Multispeaker and Journey models has been released in v0.4.0.
(It was quite a bit of trouble to get them working, since they have several limitations: 5000 bytes max per input and 1500 bytes max per turn.)
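To illustrate the kind of constraint this imposes, here is a rough, hypothetical helper for keeping each turn under the byte budget. It is not podcastfy's actual implementation; the limits are simply the ones mentioned above:

```python
MAX_REQUEST_BYTES = 5000  # max total input per synthesis request (as noted above)
MAX_TURN_BYTES = 1500     # max bytes per speaker turn (as noted above)

def split_turn(text: str, limit: int = MAX_TURN_BYTES) -> list[str]:
    """Split a turn into chunks whose UTF-8 size stays under the limit."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate.encode("utf-8")) > limit:
            if current:
                chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```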

All sample audio in the README has been updated to use the new TTS model. I've added some longform podcasts too.

I've updated the Python notebook describing the longform podcast + new Google TTS model work:

https://github.com/souzatharsis/podcastfy/blob/main/podcastfy.ipynb

Would love your feedback!
Do you think it's closer to NotebookLM's?
