Sound Studio

noco-ai edited this page Feb 17, 2024 · 11 revisions

Text to Speech

The text to speech UI lets you experiment directly with the TTS models that Spell Book is running. Each user can store generated sound files for later review and download. xTTS can use voice samples from the ASR UI as the voice for generation.

image

  • #1 Text Prompt: Text to turn into speech
  • #2 Selected Model: Dropdown to select the TTS generation model
  • #3 Current waveform: Waveform of the currently loaded sound file
  • #4 Current WAV data: Information and controls for the currently loaded file
  • #5 Delete WAV: Delete the WAV file from the server
  • #6 Download WAV: Download the generated WAV file from the server
  • #7 Load WAV: Load and play the WAV file
  • #8 Advanced Settings: Change the generation voice used for each model
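
After downloading a generated file with the Download WAV control, it can be useful to check its basic properties before using it elsewhere. The following is a minimal sketch using only Python's standard `wave` module. The helper names `describe_wav` and `write_test_tone` and the file name `demo.wav` are hypothetical and not part of Spell Book; the test tone only stands in for a downloaded file. It assumes the file is plain PCM, since the `wave` module cannot read other encodings.

```python
import math
import struct
import tempfile
import wave
from pathlib import Path


def describe_wav(path):
    """Return basic properties of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def write_test_tone(path, seconds=1.0, rate=16000, freq=440.0):
    """Write a mono 16-bit sine tone, standing in for a downloaded file."""
    n = int(seconds * rate)
    samples = (
        int(32767 * 0.3 * math.sin(2 * math.pi * freq * i / rate))
        for i in range(n)
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo.wav"  # hypothetical file name
        write_test_tone(demo)
        print(describe_wav(demo))
```

To inspect a real download, pass its path to `describe_wav` instead of the generated tone.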

Speech Recognition

The ASR UI allows you to test speech recognition models, assign labels to voice samples for TTS generation, and more. Each user can upload files or make recordings with a microphone connected to their PC.

image

  • #1 File Upload: Upload WAV or other sound media for ASR processing
  • #2 Microphone Waveform: Waveform of the sound detected from the microphone
  • #3 Selected Model: ASR model selected to detect speech
  • #4 Input Microphone: Dropdown selector for the active microphone
  • #5 Mic Control: Start/stop recording
  • #7 Loaded File Controls: Controls playback of the currently loaded file and displays the detected speech
  • #8 Delete WAV: Delete the WAV file and its data
  • #9 Download WAV: Download the speech file from the server
  • #10 Assign Label: Assign a label to the speech file so it can be used in TTS generation

Music Generation

image
