Local audio transcription using Python and a neural network to turn spoken words from an audio file into text.

Goal: Transcribe audio to text using local processing without sending sensitive data to the cloud.

Key Features:

  • Accurate: OpenAI Whisper-powered transcription.
  • Robust: Error handling for common issues.
  • Flexible: Model choice for speed vs. accuracy.
  • Simple: Terminal input for the audio file path.
  • Automatic: Saves the transcript to a .txt file.
  • Versatile: Wide audio format support (via FFmpeg).

Quick Setup:

  1. Prerequisites:
  • Python: Install Python 3.7+ from python.org.
  • FFmpeg: Crucial for audio processing. See "FFmpeg Install (Windows)" below if needed.
  2. Installation:
  • Open Terminal/Command Prompt.
  • Run:
pip install openai-whisper ffmpeg-python torchaudio
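
Optional sanity check (assumes the pip install above succeeded; run it in a Python shell, it is not part of the main script):

import whisper
import torch

# List the model sizes Whisper knows about and check whether a GPU is visible
print("Available Whisper models:", whisper.available_models())
print("CUDA GPU available:", torch.cuda.is_available())  # GPU is optional; CPU works, just slower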

FFmpeg Install (Windows - If needed):

  • Download: From the FFmpeg website (Gyan.dev builds recommended). Get the static ZIP for Windows.
  • Extract: To a folder (e.g., C:\ffmpeg).
  • Add to PATH:
  • Win + R, type sysdm.cpl, Enter.
  • Advanced tab -> Environment Variables.
  • Under "System variables," find and edit "Path."
  • Add a "New" entry: Path to your ffmpeg\bin folder (e.g., C:\ffmpeg\bin).
  • OK all windows to save.
  • Verify: Open new terminal, run ffmpeg -version. Should show version info.
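
Optionally, you can also confirm from Python that FFmpeg is visible on PATH; a minimal check, separate from the main script:

import shutil

# shutil.which returns the full path of the executable, or None if it is not on PATH
ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path:
    print("FFmpeg found at:", ffmpeg_path)
else:
    print("FFmpeg not found on PATH - Whisper will fail to decode most audio formats.")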
  3. Usage:
  • Save script: Save the Python code as speech_to_text.py.
import os
import whisper

# Function to load the model and handle errors
def load_model(model_name="base"):
    try:
        # Change "base" to "tiny", "small", "medium", or "large" for a different speed/accuracy trade-off
        model = whisper.load_model(model_name)
        return model
    except Exception as e:
        print(f"Error loading the model: {e}")
        return None

# Function to transcribe audio
def transcribe_audio(model, audio_path):
    if not os.path.exists(audio_path):
        print("Error: Audio file does not exist.")
        return None

    try:
        # Perform transcription (Whisper decodes the audio via FFmpeg, so most formats work)
        result = model.transcribe(audio_path)
        return result["text"]
    except Exception as e:
        print(f"Error transcribing audio: {e}")
        return None

# Function to save the transcript next to the audio file as <audio_name>.txt
def save_transcript(audio_path, text):
    base, _ = os.path.splitext(audio_path)
    output_path = base + ".txt"
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(text)
    return output_path

# Main code execution
def main():
    # Load the Whisper model
    model = load_model()
    if model is None:
        print("Failed to load Whisper model.")
        return

    # Ask for the audio file path in the terminal
    # (e.g., C:\Users\YourUsername\Downloads\voice_note.wav)
    audio_path = input("Enter the path to your audio file: ").strip().strip('"')

    # Transcribe the audio
    transcription = transcribe_audio(model, audio_path)

    if transcription:
        print("Transcription:", transcription)
        output_path = save_transcript(audio_path, transcription)
        print(f"Transcript saved to: {output_path}")
    else:
        print("Transcription failed.")

# Run the main function
if __name__ == "__main__":
    main()
  • Open a terminal and navigate to the script folder (cd ...).
  • Run
python speech_to_text.py
  • Enter audio path: When prompted, paste/type your audio file path and press Enter.
  • Output: Transcription text in terminal and saved as audio_file_name.txt in the same folder as audio.
  4. Example:
  • Audio file: voice_note.wav in Downloads folder.
  • Run script, enter path (adjust username): C:\Users\YourUsername\Downloads\voice_note.wav
  • Transcript: voice_note.txt in Downloads folder.
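
Whisper reads .wav, .mp3, .m4a, and most other common formats directly through FFmpeg. If you ever want a plain .wav anyway, the ffmpeg-python package installed earlier can convert it; a minimal sketch with hypothetical file names:

import ffmpeg

# Convert an .m4a voice note to .wav (overwrites the output file if it already exists)
ffmpeg.input("voice_note.m4a").output("voice_note.wav").run(overwrite_output=True)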
  5. Model Options (edit the script to change):
  • "tiny": Fastest, least accurate.
  • "base": Default, balanced speed/accuracy.
  • "small": Higher accuracy, slower.
  • "medium": Even higher accuracy, slower.
  • "large": Highest accuracy, slowest, needs more resources.
  • To change: Edit speech_to_text.py and change the "base" default of model_name in load_model to your desired model (e.g., "small"); a short sketch follows below.
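
A minimal sketch of switching models (the call mirrors load_model in the script above; "small" and the file name are just example choices):

import whisper

model = whisper.load_model("small")          # or "tiny", "base", "medium", "large"
result = model.transcribe("voice_note.wav")  # hypothetical audio file
print(result["text"])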
  6. Troubleshooting (Common Issues):
  • FFmpeg errors: Re-install FFmpeg, check the PATH setup, restart the terminal, and verify with ffmpeg -version.
  • FileNotFoundError: Check the audio path spelling and capitalization, confirm the file exists at that path, and use absolute paths.
  • Model loading errors: An internet connection is needed for the first model download; check your connection and free disk space.
  • Poor accuracy: Improve audio quality (less background noise), try a larger Whisper model, and keep language/technical-vocabulary limitations in mind; see the sketch below.
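
For the poor-accuracy case, a hedged sketch: passing a language hint skips auto-detection, and fp16=False avoids the FP16 warning on CPU-only machines (the file name is hypothetical):

import whisper

model = whisper.load_model("medium")  # larger model: better accuracy, slower
result = model.transcribe(
    "noisy_recording.wav",
    language="en",   # tell Whisper the spoken language up front
    fp16=False,      # use FP32 on CPU
)
print(result["text"])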
