
Transcribe is a real-time transcription, conversation, and language learning platform. It provides live transcripts from the microphone and speaker. It generates a suggested conversation response using OpenAI's GPT API. It reads out the responses, simulating a live conversation in English or another language.


vivekuppal/transcribe


We are all here to help. File an issue for any problem and we will work to resolve it.

Promo


Kiwi

  • The First Advanced KVM over USB 3 - Streamline Your Multi-Device Workflow Effortlessly.

  • We use the product and love it. It makes cross-platform development seamless with a Windows laptop and a Raspberry Pi / Linux mini PC / Mac mini.

  • Use promo code KIWI-25765463-P5 for a 5% discount

Source Code Install Video

Thanks to Fahd Mirza for the installation video for Transcribe. Subscribe to his YouTube channel and read his blog.

Watch the video

👂🏻️ Transcribe ✍🏼️

Join the community

Send an email to receive an invite to the community channel, or share your email address in an issue.

Transcribe provides real-time transcription for microphone and speaker output. It generates a suggested conversation response, relevant to the current conversation, using OpenAI's ChatGPT (or an OpenAI API compatible provider).

Why Transcribe over other Speech to Text apps

  • Use most of the functionality for FREE
  • Multilingual support
  • Choose between GPT 4o, 4, 3.5 or other inference models from OpenAI, or a plethora of inference models from Together
  • Streaming LLM responses instead of waiting for a complete response
  • Up to date with the latest OpenAI libraries
  • Get LLM responses for selected text
  • Install and use without Python or other dependencies
  • Security Features
  • Choose Audio Inputs (Speaker or Mic or Both)
  • Speech to Text
    • Offline - FREE
    • Online - paid
      • OpenAI Whisper - (Encouraged)
      • Deepgram
  • Chat Inference Engines
    • OpenAI
    • Together
    • Perplexity
    • Azure hosted OpenAI - Some users have reported requiring code changes to make Azure work. Feedback appreciated.
  • Conversation Summary
  • Prompt customization
  • Save chat history
  • Response Audio

Response Generation

Response generation requires a paid account with an API key for an OpenAI API compatible provider such as OpenAI (encouraged), Deepgram ($200 free credits), Together ($25 free credits), or Azure.

Based on user feedback, the OpenAI gpt-4o model provides the best response generation. Earlier models work acceptably, but can give inaccurate answers when there is not enough conversation content at the beginning. Together provides a large selection of inference models. Any of these can be used by changing the override.yaml file.

When using OpenAI without an OpenAI API key, continuous response or any other action that requires the online LLM gives an error similar to the one below:

Error when attempting to get a response from LLM.
Error code: 401 - {'error': {'message': 'Incorrect API key provided: API_KEY. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

With a valid OpenAI key but no available credits, continuous response gives an error similar to the one below:

Error when attempting to get a response from LLM. Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}} 
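The two errors above can be told apart by their HTTP status code and `code` field. The following is a minimal sketch of that triage, not part of Transcribe: the function name `explain_llm_error` and the hint texts are hypothetical, and only the status codes (401, 429) and error codes (`invalid_api_key`, `insufficient_quota`) come from the messages quoted above.

```python
def explain_llm_error(status_code, error_code=None):
    """Map an LLM error to a suggested next step (hypothetical helper)."""
    # 401 with invalid_api_key: the key in override.yaml was rejected.
    if status_code == 401 or error_code == "invalid_api_key":
        return "API key rejected: check the api_key value in override.yaml."
    # insufficient_quota: the key is valid but the account has no credits.
    if error_code == "insufficient_quota":
        return "No credits available: check the plan and billing details of the account."
    # Any other 429 is a generic "too many requests" response.
    if status_code == 429:
        return "Request limit reached: wait and retry, or check the account quota."
    return f"Unrecognized LLM error (status {status_code})."


if __name__ == "__main__":
    print(explain_llm_error(401, "invalid_api_key"))
    print(explain_llm_error(429, "insufficient_quota"))
```

The error code is checked before the bare 429 status because a 429 can also mean ordinary rate limiting, which calls for a different fix than adding credits.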

On Demand Features

We develop mutually beneficial features on demand.

Create an issue in the repo to request mutually beneficial on demand features.

Connect on LinkedIn to discuss further.

Features

Security

  • Secret scanning: Continuous Integration with GitGuardian
  • Static Code Analysis: Regular static code scans with Bandit
  • Static Code Analysis: Snyk static analysis of code on every check-in
  • Secure Transmission: Secure communication for all network traffic
  • Dependency Security: The strictest security features are enabled in the GitHub repo

Developer Guide

Software Installation

Note that installation files are generated every few weeks. Generated binaries will almost always trail the latest codebase available in the repo.

Latest Binary

  • Generated: 2024-06-02
  • Git version: 3b3502d
  1. Install ffmpeg

First, install Chocolatey, a package manager for Windows.

Open PowerShell as Administrator and run the following command:

Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

Once Chocolatey is installed, install FFmpeg by running the following command in PowerShell:

choco install ffmpeg

Run these commands in a PowerShell window with administrator privileges. For any issues during the installation, visit the official Chocolatey and FFmpeg websites for troubleshooting.

  2. Download the zip file from https://drive.google.com/file/d/1kcgGbTKxZqgbJOShL0bc3Do34lLouYxF/view?usp=drive_link


Using a GPU gives 2-3 times faster response times, depending on the processing power of the GPU.

  3. Unzip the files into a folder.

  4. (Optional) Add the OpenAI API key to the override.yaml file in the transcribe directory:

    Create an OpenAI account or an account with another provider.

    Add the OpenAI API key to the override.yaml file manually. Open the file in a text editor and add these lines:

OpenAI:
   api_key: 'API_KEY'

Replace "API_KEY" with the actual OpenAI API key. Save the file.

  5. Execute the file transcribe\transcribe.exe\transcribe.exe

🆕 Best Performance with GPU 🥇

The application performs best with GPU support.

If you have a GPU, make sure the CUDA libraries are installed: https://developer.nvidia.com/cuda-downloads

The application automatically detects and uses the GPU once the CUDA libraries are installed.
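To confirm that the NVIDIA tooling is visible before launching, a quick check along these lines may help. It is a sketch under stated assumptions and does not show how Transcribe itself detects the GPU: the helper name `find_cuda_tools` is hypothetical, and it only looks for the `nvidia-smi` (driver) and `nvcc` (CUDA toolkit) executables on PATH.

```python
import shutil


def find_cuda_tools(which=shutil.which):
    """Return the PATH location (or None) of common NVIDIA command-line tools."""
    # nvidia-smi ships with the NVIDIA driver, nvcc with the CUDA toolkit.
    tools = ("nvidia-smi", "nvcc")
    return {name: which(name) for name in tools}


if __name__ == "__main__":
    for name, path in find_cuda_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

A missing entry does not prove that CUDA is absent, only that the tool is not on PATH in the current shell.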

🆕 Getting Started 🥇

Follow the steps below to run Transcribe on your local machine.

📋 Prerequisites

  • Python >=3.11.0
  • (Optional) An OpenAI API key (set up a paid OpenAI account)
  • Windows OS (Not tested on others)
  • FFmpeg

Steps to install FFmpeg on your system.

First, install Chocolatey, a package manager for Windows.

Open PowerShell as Administrator and run the following command:

Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

Once Chocolatey is installed, install FFmpeg by running the following command in PowerShell:

choco install ffmpeg

Run these commands in a PowerShell window with administrator privileges. For any issues during the installation, visit the official Chocolatey and FFmpeg websites for troubleshooting.
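The prerequisites listed above (Python 3.11 or later, and FFmpeg) can be verified with a short script before running the setup. This is a minimal sketch, not part of the repository: `check_prerequisites` is a hypothetical helper, and it assumes FFmpeg is installed as an `ffmpeg` executable on PATH, as the Chocolatey package normally arranges.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    # Compare only (major, minor) against the documented minimum.
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```

After installing FFmpeg with Chocolatey, a new PowerShell window may be needed before `ffmpeg` appears on PATH.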

🔧 Code Installation

  1. Clone transcribe repository:

    git clone https://github.com/vivekuppal/transcribe
    
  2. Run setup file

    setup.bat
    
  3. (Optional) Provide the OpenAI API key in the override.yaml file in the transcribe directory:

    Create the following section in the override.yaml file:

    OpenAI:
      api_key: 'API_KEY'

    Alter the line:

      api_key: 'API_KEY'
    

    Replace "API_KEY" with the actual OpenAI API key. Save the file.
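A common mistake is to leave the 'API_KEY' placeholder in place, which leads to the 401 error described earlier. The sketch below checks for that case. It is an illustration only: `read_openai_api_key` and `key_is_configured` are hypothetical helpers, and the line-based reader handles just the two-line `OpenAI:` / `api_key:` layout shown above rather than YAML in general.

```python
import re

PLACEHOLDER = "API_KEY"


def read_openai_api_key(yaml_text):
    """Return the api_key value under the top-level OpenAI: section, or None."""
    in_section = False
    for line in yaml_text.splitlines():
        # Skip blank lines and comments.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if not line[0].isspace():
            # A non-indented line starts a new top-level section.
            in_section = line.strip() == "OpenAI:"
            continue
        if in_section:
            match = re.match(r"\s+api_key:\s*['\"]?([^'\"#\s]*)['\"]?", line)
            if match:
                return match.group(1)
    return None


def key_is_configured(yaml_text):
    """True when a key is present and is not the unedited placeholder."""
    key = read_openai_api_key(yaml_text)
    return bool(key) and key != PLACEHOLDER


if __name__ == "__main__":
    with open("override.yaml", encoding="utf-8") as handle:
        print("OpenAI key configured:", key_is_configured(handle.read()))
```

For anything beyond this quick check, a real YAML parser is the safer choice.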

🎬 Running Transcribe

Run the main script from the app\transcribe\ folder:

python main.py

Once started, Transcribe begins transcribing microphone input and speaker output in real time, optionally generating a suggested response based on the conversation. We suggest enabling the continuous response feature after 1-2 minutes, once the transcription window holds enough content to give the LLM sufficient context.
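The warm-up advice above can be pictured as a simple time gate. This is a sketch of the idea only and does not describe how Transcribe implements continuous response: the class `ContinuousResponseGate` is hypothetical, and the 90-second default is an assumed midpoint of the suggested 1-2 minutes.

```python
import time


class ContinuousResponseGate:
    """Report when a warm-up period has elapsed since transcription started."""

    def __init__(self, warmup_seconds=90.0, clock=time.monotonic):
        # The clock is injectable so the gate can be exercised without waiting.
        self._clock = clock
        self._warmup_seconds = warmup_seconds
        self._started_at = clock()

    def ready(self):
        """True once at least warmup_seconds have passed since construction."""
        return self._clock() - self._started_at >= self._warmup_seconds


if __name__ == "__main__":
    gate = ContinuousResponseGate(warmup_seconds=2.0)
    print("ready immediately:", gate.ready())
    time.sleep(2.0)
    print("ready after warm-up:", gate.ready())
```

`time.monotonic` is used instead of wall-clock time so that system clock adjustments do not affect the gate.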

👤 License 📖

This project is licensed under the MIT License - see the LICENSE file for details.

🀝 Contributions 🀝

Contributions are welcome! Open issues or submit pull requests to improve Transcribe.

Videos

Acknowledgements

This project started out as a fork of ecoute. It has diverged significantly from the original implementation so we decided to remove the link to ecoute.
