Sound Studio

noco-ai edited this page Feb 17, 2024 · 11 revisions

Text to Speech

The text-to-speech UI lets you experiment directly with any TTS model that Spell Book is running. Each user can store generated sound files for later review and download. xTTS can also use voice samples labeled in the ASR UI as the voice for generation.

image

  • #1 Text Prompt: Text to turn into speech
  • #2 Selected Model: Dropdown to select the TTS generation model
  • #3 Current Waveform: Waveform of the currently loaded sound file
  • #4 Current WAV Data: Information and controls for the currently loaded file
  • #5 Delete WAV: Delete the WAV file from the server
  • #6 Download WAV: Download the generated WAV file from the server
  • #7 Load WAV: Load and play the WAV file
  • #8 Advanced Settings: Update the voice used for generation by each model
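
Once a file has been downloaded with the Download WAV control, it can be inspected outside the UI. The following is a minimal sketch using only Python's standard `wave` module, assuming the downloaded file is uncompressed PCM WAV (the page does not state the exact encoding). The function names `wav_info` and `make_test_tone` are hypothetical helpers for illustration and are not part of Spell Book; the test tone only stands in for a real download so the sketch runs on its own.

```python
import io
import math
import struct
import wave


def wav_info(source):
    """Return basic metadata for a PCM WAV file (a path or a file-like object)."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


def make_test_tone(seconds=1.0, rate=16000, freq=440.0):
    """Build a mono 16-bit sine tone in memory (stand-in for a downloaded file)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        samples = (
            int(32767 * 0.5 * math.sin(2 * math.pi * freq * n / rate))
            for n in range(int(seconds * rate))
        )
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))
    buf.seek(0)
    return buf


# Replace make_test_tone() with the path of a file downloaded from the UI.
info = wav_info(make_test_tone())
print(info)
```

To check a real download, pass its path instead, for example `wav_info("my_download.wav")` (a placeholder file name).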

Speech Recognition

The ASR UI allows you to test speech recognition models and assign labels for TTS generation. Each user can upload files or make recordings with a microphone connected to their PC.

image

  • #1 File Upload: Upload WAV or other sound media for ASR processing
  • #2 Microphone Waveform: Waveform for sound detected from microphone
  • #3 Selected Model: ASR model selected to detect voice
  • #4 Input Microphone: Dropdown selector for active microphone
  • #5 Mic Control: Start/stop recording
  • #7 Loaded File Controls: Controls playing currently loaded file and displays detected speech
  • #8 Delete WAV: Delete the WAV file and data
  • #9 Download WAV: Download the speech file from the server
  • #10 Assign Label: Assign the speech file and label to be used in TTS generation

Music Generation

The music generation UI lets you create sound files from text prompts using Meta's MusicGen models.

image

  • #1 Music Prompt: Text string describing the music or sound effect you want generated
  • #2 Selected Model: Dropdown to select which model to use for generation. Models are loaded in Skills Configuration
  • #3 Advanced Options: Advanced generation options like file length and guidance scale
  • #4 Current Waveform: Waveform of the currently loaded music file
  • #5 File Control: Controls playback of the loaded music file and shows info about how it was generated
  • #6 Delete WAV: Delete the WAV file and data
  • #7 Download WAV: Download the music file from the server
  • #8 Play WAV: Load the WAV file and play it