Warning
Most of the information in this README is a work in progress; whisper_realtime_server is under development.
Make sure Docker is installed. Follow the official Docker Installation Guide if needed.
Clone the repository:
git clone https://github.com/dariopellegrino00/whisper_realtime_server.git
cd whisper_realtime_server
- Once built, you will be able to test the server with your microphone or with a simulation of realtime audio streaming using audio files.
- There are already two audio examples in the resources folder. If you want to add new ones, then BEFORE the next steps add the audio files to whisper_realtime_server/resources (a quick format check is sketched right below).
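The bundled samples are the best reference for what the simulation expects. If you are unsure about a new recording, a quick way to compare its basic parameters against a bundled sample is Python's standard wave module (run from the repository root; my_new_clip.wav is a hypothetical file name used only for illustration):

```python
import wave

# Compare a candidate recording against a bundled sample before adding it to resources.
for path in ("resources/sample1.wav", "resources/my_new_clip.wav"):
    with wave.open(path, "rb") as wav:
        print(f"{path}: {wav.getframerate()} Hz, "
              f"{wav.getnchannels()} channel(s), {wav.getsampwidth() * 8}-bit")
```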
- Navigate to the project root directory:
  cd whisper_realtime_server
- Build the Docker image:
  docker build -t whisper_realtime_server .
- Run the Docker container with GPU support and port mapping:
  docker run --gpus all -p 50051:50051 --name whisper_server whisper_realtime_server
- You can change the port mapping 50051:50051 if needed. Remember to also change the port in whisper_server.py and grpcclient.py.
- The server is now running and ready to accept connections. You can access it on port 50051 using the grpcclient.py script.
- To stop the Docker container:
  docker stop whisper_server
- To restart the Docker container:
  docker start whisper_server
If you want to run the client directly in the Docker container, follow these steps:
- Ensure the container is running:
  docker ps
  If you see whisper_server listed, you are good to go; otherwise start the container:
  docker start whisper_server
- Open a terminal in the container:
  docker exec -it whisper_server /bin/bash
  Now you should see something like:
  root@<imageid>:/app/src#
- Run the grpc client:
  - Run the client using your system microphone:
    python3 grpcclient.py
  - Run a realtime simulation using an audio file:
    python3 grpcclient.py --simulate ../resources/sample1.wav
You can also try a new interactive mode (WORK IN PROGRESS):
python3 grpcclient.py --simulate ../resources/sample1.wav --interactive
This will print results on the client as sentences instead of timestamps and segments. Standard output:
0 600 Hi my names
1000 2300 is Dario, nice
3000 4500 to meet you.
5000 7000 How are you?
Interactive output:
Hi my names is Dario, nice to meet you. How are you?
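Each line of the standard output is a start and end timestamp in milliseconds followed by the transcribed text. A tiny illustrative parser (not part of the repository) could split such lines like this:

```python
def parse_segment(line: str) -> tuple[int, int, str]:
    """Split a 'start_ms end_ms text' line into its three parts."""
    start_ms, end_ms, text = line.split(" ", 2)
    return int(start_ms), int(end_ms), text

for raw in ("0 600 Hi my names", "1000 2300 is Dario, nice"):
    print(parse_segment(raw))
```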
Important
TODO
Caution
For now, avoid modifying the config.json file. If you need to experiment, it is advisable to only adjust the model size parameter.
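If you want to see what is configurable before touching anything, a read-only inspection is the safest first step (this assumes config.json sits next to the scripts in src; adjust the path to your checkout):

```python
import json

# Read-only look at the current configuration; do not write changes back for now.
with open("config.json") as f:
    config = json.load(f)
print(json.dumps(config, indent=2))
```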
Important
TODO: tweaking tutorial and explanation
The Nvidia Developer Kit is required for GPU support. The server has been tested with CUDA 12.X and cuDNN 9, as specified in the Dockerfile. The Whisper Streaming project has been tested with CUDA 11.7 and cuDNN 8.5.0, so it is recommended to use at least CUDA 11.7 and cuDNN 8.5.0.
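To quickly confirm that a CUDA-capable GPU is actually visible (for example inside the running container), calling nvidia-smi is usually enough; it is typically available in the container when --gpus all is working:

```python
import shutil
import subprocess

# Sanity check: is the NVIDIA driver visible from this environment?
if shutil.which("nvidia-smi") is None:
    print("nvidia-smi not found: GPU support is probably not available here")
else:
    subprocess.run(["nvidia-smi"], check=False)  # prints driver, CUDA version and GPUs
```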
Important
TODO: Add more documentation
Before setting up your own client, it's important to understand the server architecture. The client first connects to a GRPC server on the default port (50051). After connecting, the GRPC server assigns a service to the client. Then the client streams audio data to this port, and receives real-time transcriptions.
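As a minimal illustration of that first step, the snippet below only checks that the gRPC endpoint on the default port is reachable; the actual streaming service and its message types are defined by this repository's proto files and used by grpcclient.py:

```python
import grpc

# Connectivity check against the default server port.
channel = grpc.insecure_channel("localhost:50051")
try:
    grpc.channel_ready_future(channel).result(timeout=5)
    print("Reached whisper_realtime_server on port 50051")
except grpc.FutureTimeoutError:
    print("Could not reach the server on port 50051")
finally:
    channel.close()
```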
Install all dependencies:
- I suggest using Python environments: Python Environments
- Check requirements.txt for pip package installation
- Check the Dockerfile for additional OS packages you may be missing
- An actual tutorial for local installation is on the TODO list
- Navigate to the src directory. Inside the repository folder, run:
  cd src
- Run the server directly with Python:
  python3 whisper_server.py
- To use a microphone for audio input:
  python3 grpcclient.py
- To simulate audio streaming from a file:
  python3 grpcclient.py --simulate <file-audio-path>
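Conceptually, the simulation reads the file in small chunks and paces them to mimic a live microphone. The sketch below only illustrates that idea: the chunk length and the send_to_server placeholder are assumptions, and the real logic lives in grpcclient.py:

```python
import time
import wave

CHUNK_SECONDS = 0.25  # arbitrary chunk length, chosen only for illustration

def send_to_server(frames: bytes) -> None:
    """Placeholder for the client's gRPC streaming call."""
    pass

with wave.open("../resources/sample1.wav", "rb") as wav:
    frames_per_chunk = int(wav.getframerate() * CHUNK_SECONDS)
    while True:
        frames = wav.readframes(frames_per_chunk)
        if not frames:
            break
        send_to_server(frames)     # stream this chunk to the server
        time.sleep(CHUNK_SECONDS)  # pace sends like real-time capture
```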
- This project uses parts of the Whisper Streaming project. Other projects involved in whisper streaming are credited in its repo; check it out: whisper streaming
- Credits also to: faster whisper
This project is still in an early stage of development, and there may be significant bugs or issues. All contributions are welcome and greatly appreciated! If you'd like to contribute, here's how you can help:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Submit a pull request with a clear description of your changes.
For major changes, please open an issue first to discuss what you'd like to change. Thank you for helping improve this project and making it better for everyone!
- Rapidfuzz token confirmation
- grpc implementation
- Secure grpc connections
- Custom environment setup
- remove unused packages in Dockerfile and requirements
- Server failing to always return independent ports on concurrent requests, now fixed
- Send back the last confirmed token when the client sends silent audio for a prolonged time (or audio with no human speech)
- Rarely, words from one client can end up in another client's buffer
- MultiProcessingFasterWhisperASR and the Grpc speech-to-text services can get stuck with a high number of streams active concurrently (10 to 20)
- Random words like ok or thank you are transcribed when the client stays silent