whisper_realtime_server

A realtime speech to text grpc server made with faster-whisper

Warning

Much of the information in this README is a work in progress: whisper_realtime_server is under active development.

Installation

Building with Docker

Prerequisites

Make sure Docker is installed. Follow the official Docker Installation Guide if needed.

Clone the repository:

git clone https://github.com/dariopellegrino00/whisper_realtime_server.git
cd whisper_realtime_server

Before building with Docker

  • Once built, you can test the server with your microphone or with a simulation of realtime audio streaming from audio files
  • Two example audio files are already in the resources folder. If you want to add new ones, add them to whisper_realtime_server/resources BEFORE the next steps
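Whisper pipelines typically work on 16 kHz mono 16-bit WAV input; the exact format this server expects is not documented here, so it is worth inspecting a file before dropping it into resources. A small sketch using only the Python standard library (the path in the usage example is illustrative):

```python
import wave

def describe_wav(path):
    """Return the basic stream parameters of a WAV file."""
    with wave.open(path, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_rate": w.getframerate(),
            "sample_width_bytes": w.getsampwidth(),
            "duration_s": w.getnframes() / w.getframerate(),
        }
```

For example, `describe_wav("../resources/sample1.wav")` returns a dict such as `{"channels": 1, "sample_rate": 16000, ...}`.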

Steps to Build and Run the Docker Image

  1. Navigate to the project root directory:

    cd whisper_realtime_server
  2. Build the Docker image:

    docker build -t whisper_realtime_server .
  3. Run the Docker container with GPU support and port mapping:

    docker run --gpus all -p 50051:50051 --name whisper_server whisper_realtime_server
    • You can change the port mapping 50051:50051 if needed. Remember to change the port in whisper_server.py and grpcclient.py as well.
    • The server is now running and ready to accept connections. You can reach it on port 50051 using the grpcclient.py script.
  4. To stop the Docker container:

    docker stop whisper_server
  5. To restart the Docker container:

    docker start whisper_server

Running the test client

If you want to run the client directly inside the Docker container, follow these steps:

  1. Ensure the container is running:

    docker ps 

    If you see whisper_server listed, you are good to go; otherwise start the container:

    docker start whisper_server
  2. Open a terminal in the container

    docker exec -it whisper_server /bin/bash 

    Now you should see a prompt like:

    root@<imageid>:/app/src# 
  3. Run the grpc client

    • Run the client using your system microphone:

      python3 grpcclient.py 
    • Run a realtime simulation using an audio file:

      python3 grpcclient.py --simulate ../resources/sample1.wav 

      You can also try a new interactive mode (WORK IN PROGRESS):

      python3 grpcclient.py --simulate ../resources/sample1.wav --interactive

      This prints results as whole sentences instead of timestamped segments on the client. Standard output:

      0 600 Hi my names
      1000 2300 is Dario, nice 
      3000 4500 to meet you.
      5000 7000 How are you?
      

      Interactive output:

      Hi my names is Dario, nice to meet you. 
      How are you? 
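The interactive mode above is essentially a post-processing step: join the text of the timestamped segments, then re-split on sentence-ending punctuation. A rough sketch of that idea (the actual logic in grpcclient.py may differ):

```python
import re

def segments_to_sentences(lines):
    """Turn "start_ms end_ms text" lines into whole sentences."""
    texts = []
    for line in lines:
        # Each line: "<start_ms> <end_ms> <text...>"
        _start, _end, text = line.split(maxsplit=2)
        texts.append(text.strip())
    joined = " ".join(texts)
    # Split on . ! ? while keeping the punctuation with its sentence.
    sentences = re.findall(r"[^.!?]+[.!?]?", joined)
    return [s.strip() for s in sentences if s.strip()]
```

Feeding it the four segment lines shown above yields the two sentences of the interactive output.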
      

Custom Environment

Important

TODO

Whisper Server Config File JSON Tutorial

Caution

For now, avoid modifying the config.json file. If you need to experiment, it is advisable to only adjust the model size parameter.
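If you do experiment, a minimal edit might look like the fragment below. The key name model_size and its value are assumptions based on faster-whisper's model naming (tiny, base, small, medium, large-v3); check the actual keys in config.json before editing:

```json
{
  "model_size": "small"
}
```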

Num Workers and Token Confirmation Threshold

Important

TODO: tweaking tutorial and explanation

Nvidia Developer Kit

The Nvidia Developer Kit is required for GPU support. The server has been tested with CUDA 12.X and cuDNN 9, as specified in the Dockerfile. The Whisper Streaming project has been tested with CUDA 11.7 and cuDNN 8.5.0, so it is recommended to use at least CUDA 11.7 and cuDNN 8.5.0.

Documentation

Important

TODO: Add more documentation

Before setting up your own client, it's important to understand the server architecture. The client first connects to the gRPC server on the default port (50051). After connecting, the server assigns a transcription service to the client; the client then streams audio data over this connection and receives real-time transcriptions.
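On the streaming side, a custom client typically reads raw PCM audio and sends it in fixed-size chunks, each wrapped in the request message generated from the project's .proto file. A sketch of the chunking step (the message and stub names in the comment are hypothetical, not taken from this repository):

```python
def pcm_chunks(pcm: bytes, chunk_ms: int = 100,
               sample_rate: int = 16000, sample_width: int = 2):
    """Yield fixed-duration chunks of mono PCM audio.

    With 16 kHz / 16-bit mono and chunk_ms=100, each chunk is
    16000 * 0.1 * 2 = 3200 bytes.
    """
    chunk_bytes = sample_rate * chunk_ms // 1000 * sample_width
    for i in range(0, len(pcm), chunk_bytes):
        yield pcm[i:i + chunk_bytes]

# A real client would wrap each chunk in a request message, e.g.:
# for chunk in pcm_chunks(audio):
#     yield AudioRequest(data=chunk)   # hypothetical message type
```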

Testing the server locally

Install all dependencies:

  • I suggest using Python environments: Python Environments
  • Check requirements.txt for the pip packages to install
  • Check the Dockerfile for additional OS packages you may be missing
  • A proper tutorial for local installation is on the TODO list
  1. Navigate to the src directory:

    From the repository folder, run:

    cd src
  2. Run the server directly with Python:

    python3 whisper_server.py
  3. To use a microphone for audio input:

    python3 grpcclient.py
  4. To simulate audio streaming from a file:

    python3 grpcclient.py --simulate <file-audio-path> 
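The --simulate mode presumably paces the file's audio at realtime speed instead of sending it all at once. A minimal sketch of such a pacing loop (not the actual grpcclient.py implementation):

```python
import time

def stream_realtime(chunks, chunk_ms=100, sleep=time.sleep):
    """Yield audio chunks paced at realtime speed.

    Sleeping chunk_ms between chunks delivers a prerecorded file at
    the same rate a live microphone would produce it.
    """
    for chunk in chunks:
        yield chunk
        sleep(chunk_ms / 1000.0)
```

The `sleep` parameter is injectable so the loop can be tested without real delays.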

Credits

  • This project uses parts of the Whisper Streaming project. Other projects involved in whisper streaming are credited in its repo; check it out: whisper streaming
  • Credits also to: faster whisper

Contributing

This project is still in an early stage of development, and there may be significant bugs or issues. All contributions are welcome and greatly appreciated! If you'd like to contribute, here's how you can help:

  • Fork the repository.
  • Create a new branch for your feature or bug fix.
  • Submit a pull request with a clear description of your changes.

For major changes, please open an issue first to discuss what you'd like to change. Thank you for helping improve this project and making it better for everyone!

TODO

  • Rapidfuzz token confirmation
  • grpc implementation
  • Secure grpc connections
  • Custom environment setup
  • Remove unused packages in the Dockerfile and requirements.txt

FIXED

  • Server failing to always return independent ports on concurrent requests is now fixed
  • Send back the last confirmed token when the client sends silent audio (or audio with no human speech) for a prolonged time
  • Rarely, one client's words could end up in another client's buffer
  • MultiProcessingFasterWhisperASR and the gRPC speech-to-text services could get stuck with a high number of concurrently active streams (10 to 20)

KNOWN BUGS - UNKNOWN CAUSE

  • Random words like "ok" or "thank you" are transcribed when the client stays silent
