Warning
Most of the information in this README is a work in progress; whisper_realtime_server is under development.
Make sure Docker is installed. Follow the official Docker Installation Guide if needed.
Clone the repository:
git clone https://github.com/dariopellegrino00/whisper_realtime_server.git
cd whisper_realtime_server
- Once built, you will be able to test the server with your microphone or with a simulation of realtime audio streaming using audio files.
- There are already two audio examples in the resources folder. If you want to add new ones, then BEFORE the next steps add the audio files to whisper_realtime_server/resources (a quick format check is sketched right below).
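The bundled samples are the best reference for what the simulation expects. If you are unsure about a new recording, a quick way to compare its basic parameters against a bundled sample is Python's standard wave module (run from the repository root; my_new_clip.wav is a hypothetical file name used only for illustration):

```python
import wave

# Compare a candidate recording against a bundled sample before adding it to resources.
for path in ("resources/sample1.wav", "resources/my_new_clip.wav"):
    with wave.open(path, "rb") as wav:
        print(f"{path}: {wav.getframerate()} Hz, "
              f"{wav.getnchannels()} channel(s), {wav.getsampwidth() * 8}-bit")
```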
- Navigate to the project root directory:
  cd whisper_realtime_server
- Build the Docker image:
  docker build -t whisper_realtime_server .
- Run the Docker container with GPU support and port mapping:
  docker run --gpus all -p 50051:50051 --name whisper_server whisper_realtime_server
- You can change the port mapping 50051:50051 if needed. Remember to also change the port in whisper_server.py and grpcclient.py.
- The server is now running and ready to accept connections. You can access it on port 50051 using the grpcclient.py script.
- To stop the Docker container:
  docker stop whisper_server
- To restart the Docker container:
  docker start whisper_server
If you want to run the client directly in the Docker container, follow these steps:
- Ensure the container is running:
  docker ps
  If you see whisper_server listed, you are good to go; otherwise start the container:
  docker start whisper_server
- Open a terminal in the container:
  docker exec -it whisper_server /bin/bash
  Now you should see something like:
  root@<imageid>:/app/src#
- Run the grpc client:
  - Run the client using your system microphone:
    python3 grpcclient.py
  - Run a realtime simulation using an audio file:
    python3 grpcclient.py --simulate ../resources/sample1.wav
You can also try a new interactive mode (WORK IN PROGRESS):
python3 grpcclient.py --simulate ../resources/sample1.wav --interactive
This will print results on the client as sentences instead of timestamps and segments. Standard output:
0 600 Hi my names
1000 2300 is Dario, nice
3000 4500 to meet you.
5000 7000 How are you?
Interactive output:
Hi my names is Dario, nice to meet you. How are you?
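Each line of the standard output is a start and end timestamp in milliseconds followed by the transcribed text. A tiny illustrative parser (not part of the repository) could split such lines like this:

```python
def parse_segment(line: str) -> tuple[int, int, str]:
    """Split a 'start_ms end_ms text' line into its three parts."""
    start_ms, end_ms, text = line.split(" ", 2)
    return int(start_ms), int(end_ms), text

for raw in ("0 600 Hi my names", "1000 2300 is Dario, nice"):
    print(parse_segment(raw))
```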
Important
TODO
Caution
For now, avoid modifying the config.json file. If you need to experiment, it is advisable to only adjust the model size parameter.
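If you want to see what is configurable before touching anything, a read-only inspection is the safest first step (this assumes config.json sits next to the scripts in src; adjust the path to your checkout):

```python
import json

# Read-only look at the current configuration; do not write changes back for now.
with open("config.json") as f:
    config = json.load(f)
print(json.dumps(config, indent=2))
```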
Important
TODO: tweaking tutorial and explanation
The Nvidia Developer Kit is required for GPU support. The server has been tested with CUDA 12.X and cuDNN 9, as specified in the Dockerfile. The Whisper Streaming project has been tested with CUDA 11.7 and cuDNN 8.5.0, so it is recommended to use at least CUDA 11.7 and cuDNN 8.5.0.
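To quickly confirm that a CUDA-capable GPU is actually visible (for example inside the running container), calling nvidia-smi is usually enough; it is typically available in the container when --gpus all is working:

```python
import shutil
import subprocess

# Sanity check: is the NVIDIA driver visible from this environment?
if shutil.which("nvidia-smi") is None:
    print("nvidia-smi not found: GPU support is probably not available here")
else:
    subprocess.run(["nvidia-smi"], check=False)  # prints driver, CUDA version and GPUs
```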
Important
TODO: Add more documentation
Before setting up your own client, it's important to understand the server architecture. The client first connects to a GRPC server on the default port (50051). After connecting, the GRPC server assigns a service to the client. Then the client streams audio data to this port, and receives real-time transcriptions.
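As a minimal illustration of that first step, the snippet below only checks that the gRPC endpoint on the default port is reachable; the actual streaming service and its message types are defined by this repository's proto files and used by grpcclient.py:

```python
import grpc

# Connectivity check against the default server port.
channel = grpc.insecure_channel("localhost:50051")
try:
    grpc.channel_ready_future(channel).result(timeout=5)
    print("Reached whisper_realtime_server on port 50051")
except grpc.FutureTimeoutError:
    print("Could not reach the server on port 50051")
finally:
    channel.close()
```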
Install all dependencies:
- I suggest using Python environments: Python Environments
- Check requirements.txt for pip package installation
- Check the Dockerfile for additional OS packages you may be missing
- An actual tutorial for local installation is on the TODO list
- Navigate to the src directory. Inside the repository folder, run:
  cd src
- Run the server directly with Python:
  python3 whisper_server.py
- To use a microphone for audio input:
  python3 grpcclient.py
- To simulate audio streaming from a file:
  python3 grpcclient.py --simulate <file-audio-path>
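Conceptually, the simulation reads the file in small chunks and paces them to mimic a live microphone. The sketch below only illustrates that idea: the chunk length and the send_to_server placeholder are assumptions, and the real logic lives in grpcclient.py:

```python
import time
import wave

CHUNK_SECONDS = 0.25  # arbitrary chunk length, chosen only for illustration

def send_to_server(frames: bytes) -> None:
    """Placeholder for the client's gRPC streaming call."""
    pass

with wave.open("../resources/sample1.wav", "rb") as wav:
    frames_per_chunk = int(wav.getframerate() * CHUNK_SECONDS)
    while True:
        frames = wav.readframes(frames_per_chunk)
        if not frames:
            break
        send_to_server(frames)     # stream this chunk to the server
        time.sleep(CHUNK_SECONDS)  # pace sends like real-time capture
```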
- This project uses parts of the Whisper Streaming project. Other projects involved in whisper streaming are credited in its repo; check it out: whisper streaming
- Credits also to: faster whisper
This project is still in an early stage of development, and there may be significant bugs or issues. All contributions are welcome and greatly appreciated! If you'd like to contribute, here's how you can help:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Submit a pull request with a clear description of your changes.
For major changes, please open an issue first to discuss what you'd like to change. Thank you for helping improve this project and making it better for everyone!
- Rapidfuzz token confirmation
- grpc implementation
- Secure grpc connections
- Custom environment setup
- remove unused packages in Dockerfile and requirements
- Server failing to always return independent ports on concurrent requests, now fixed
- Send back the last confirmed token when the client sends silent audio for a prolonged time (or audio with no human speech)
- Rarely, words from one client can end up in another client's buffer
- MultiProcessingFasterWhisperASR and the Grpc speech-to-text services can get stuck with a high number of streams active concurrently (10 to 20)
- Random words like ok or thank you are transcribed when the client stays silent