
Open Transcribe

A high-performance audio recording and transcription tool using OpenAI's Whisper model, built with Rust. Open Transcribe provides both server and client functionality in a single unified CLI application.

Features

Server Mode

  • Instance Reuse: Single Whisper model instance shared across all requests for optimal performance
  • Thread-Safe: Concurrent request handling with mutex-protected model access
  • Memory Efficient: Optimized audio processing and resampling
  • Fast Startup: Model loaded once at application startup
  • RESTful API: HTTP endpoints for health checks and transcription

Client Mode

  • File Transcription: Process existing audio files
  • Live Recording: Record audio directly from your microphone and transcribe
  • Cross-platform Audio: Works on Linux, Windows, and macOS via the cpal library
  • Flexible Audio Settings: Configurable sample rate, channels, and bit depth
  • Real-time Feedback: Recording countdown and progress indicators

Model Management

  • Easy Downloads: Built-in model downloader for Whisper models
  • Multiple Model Sizes: Support for tiny, base, small, medium, large-v3 models
  • Flexible Storage: Choose where to store downloaded models

Configuration

Configure Whisper behavior using environment variables:

export WHISPER_MODEL_PATH="./models/ggml-base.en.bin"
export WHISPER_USE_GPU="true"
export WHISPER_LANGUAGE="en"
export WHISPER_AUDIO_CONTEXT="768"
export WHISPER_NO_SPEECH_THRESHOLD="0.6"
export WHISPER_NUM_THREADS="4"

Configuration Options:

  • WHISPER_MODEL_PATH: Path to the Whisper model file (default: ./models/ggml-base.en.bin)
  • WHISPER_USE_GPU: Enable GPU acceleration if available (default: true)
  • WHISPER_LANGUAGE: Target language code (default: en)
  • WHISPER_AUDIO_CONTEXT: Audio context window size (default: 768)
  • WHISPER_NO_SPEECH_THRESHOLD: Threshold for detecting speech vs silence (default: 0.6)
  • WHISPER_NUM_THREADS: Number of threads to use (default: auto-detected)
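
Environment variables can also be set inline for a one-off run. For example, to force CPU-only inference with an explicit thread count:

WHISPER_USE_GPU=false WHISPER_NUM_THREADS=8 open-transcribe serve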

Installation

  1. Build from source:
git clone https://github.com/ThilinaTLM/open-transcribe
cd open-transcribe
cargo build --release
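
# The release binary lands in target/release/open-transcribe; add it to
# PATH so the commands below resolve (or invoke it via the full path)
export PATH="$PWD/target/release:$PATH"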
  2. Download a Whisper model:
# Download base English model to ./models directory
open-transcribe download base

# Download to specific directory
open-transcribe download base ./my-models

# Available models: tiny, base, small, medium, large-v3
open-transcribe download large-v3

Usage

Open Transcribe provides a unified CLI with multiple subcommands:

Start the Server

# Start server on default host/port (127.0.0.1:8080)
open-transcribe serve

# Custom host and port
open-transcribe serve --host 0.0.0.0 --port 9000

Download Models

# Download base model to the default ./models directory
open-transcribe download base

# Download to specific directory
open-transcribe download large-v3 ./models

# Available models: tiny, base, small, medium, large-v3

Transcribe Audio Files

# Transcribe existing audio file (server must be running)
open-transcribe file audio.wav

# Use custom server URL
open-transcribe file audio.wav --server-url http://192.168.1.100:8080

# Specify audio format details
open-transcribe file audio.wav --sample-rate 44100 --channels 2 --bit-depth 24
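
To transcribe several files in one pass, a plain shell loop over the file subcommand works:

for f in *.wav; do
  open-transcribe file "$f"
done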

Record and Transcribe

# Record 5 seconds (default) and transcribe
open-transcribe record

# Record for 10 seconds
open-transcribe record --duration 10

# Record with high-quality settings
open-transcribe record --duration 15 --sample-rate 44100 --channels 2 --bit-depth 24

# Use custom server
open-transcribe record --server-url http://my-server:8080

API Endpoints

When running in server mode, the following HTTP endpoints are available:

Health Check

GET /api/v1/health
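
You can verify the server is up with curl; the exact response body isn't documented here, so treat the payload as implementation-defined:

curl http://127.0.0.1:8080/api/v1/health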

Transcribe Audio

POST /api/v1/transcribe

Multipart Form Data:

  • audio (required): Raw audio data
  • sample_rate (optional): Audio sample rate (default: 16000)
  • channels (optional): Number of channels (default: 1)
  • bit_depth (optional): Bit depth - 16, 24, or 32 (default: 16)

Response:

{
  "text": "Complete transcription text",
  "segments": [
    {
      "start": 0,
      "end": 1000,
      "text": "Hello world",
      "confidence": 0.95
    }
  ]
}

Example using curl:

curl -X POST http://localhost:8080/api/v1/transcribe \
  -F "audio=@audio.wav" \
  -F "sample_rate=16000" \
  -F "channels=1" \
  -F "bit_depth=16"
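
To pull just the transcribed text out of the JSON response, pipe the output through jq (assuming jq is installed):

curl -s -X POST http://localhost:8080/api/v1/transcribe \
  -F "audio=@audio.wav" | jq -r '.text'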

Complete Workflow Example

Here's a complete example from setup to transcription:

# 1. Build the application
cargo build --release

# 2. Download a model
open-transcribe download base

# 3. Start the server in one terminal
open-transcribe serve

# 4. In another terminal, record and transcribe audio
open-transcribe record --duration 10

# Example output:
# 🎵 Open Transcribe
# ==================
# 🎤 Recording for 10 seconds...
# 🔴 Recording starting in...
#    3... 2... 1... 🎙️  GO!
#    10 seconds remaining...
#    5 seconds remaining...
#    Recording complete!
#
# ✅ Transcription completed!
# 📝 Result:
# {
#   "text": "Hello, this is a test of the audio recording feature.",
#   "segments": [
#     {
#       "start": 0,
#       "end": 3500,
#       "text": "Hello, this is a test of the audio recording feature.",
#       "confidence": 0.92
#     }
#   ]
# }

Audio Format Support

  • Sample Rates: Any rate supported by your audio device (commonly 8kHz to 192kHz)
  • Channels: Mono (1) or Stereo (2)
  • Bit Depths: 16-bit, 24-bit, or 32-bit PCM
  • Input Devices: Automatic detection of default microphone
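
Note that the transcribe endpoint defaults to 16 kHz mono 16-bit (see API Endpoints above), which matches the format Whisper expects internally, so recording in that format should minimize resampling work on the server:

# Record using the server-side default format
open-transcribe record --sample-rate 16000 --channels 1 --bit-depth 16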

Performance Optimizations

This implementation includes several key optimizations:

  1. Singleton Pattern: The Whisper model is loaded once at startup and reused for all requests
  2. Thread Safety: Uses Arc<Mutex<>> to safely share the model across concurrent requests
  3. Comprehensive Error Handling: Proper error messages with detailed error propagation
  4. Optimized Audio Processing: Efficient sample conversion and resampling
  5. Resource Management: Proper cleanup and memory management
  6. Environment-based Configuration: Flexible configuration through environment variables

Requirements

  • Rust: 1.85+ (2024 edition)
  • Audio System: ALSA (Linux), WASAPI (Windows), CoreAudio (macOS)
  • CUDA Toolkit: Optional, for GPU acceleration
  • Whisper Model: Download using the built-in download command

Troubleshooting

Recording Issues

  • "No input device available": Check that your microphone is connected and recognized by the system
  • Permission errors: On some systems, microphone access may require additional permissions
  • Audio quality issues: Try adjusting sample rate and bit depth settings for your hardware

Server Issues

  • "Cannot connect to server": Ensure the server is running with open-transcribe serve
  • Model loading errors: Verify the model path exists and is accessible
  • GPU issues: Disable GPU with WHISPER_USE_GPU=false if experiencing CUDA problems

Model Issues

  • Download failures: Check internet connection and try different model sizes
  • Model path errors: Ensure the downloaded model path matches WHISPER_MODEL_PATH

Development

To contribute to Open Transcribe:

# Clone and build
git clone https://github.com/ThilinaTLM/open-transcribe
cd open-transcribe
cargo build

# Run tests
cargo test

# Run with logging
RUST_LOG=debug cargo run -- serve

License

MIT
