
Containerized REST API for interacting with Hugging Face Faster Whisper models.

doppeltilde/automatic_speech_recognition


Automatic Speech Recognition utilizing Faster Whisper.


Installation

  • For ease of use, it's recommended to use the provided docker-compose.yml.

CPU Support: Use the latest tag.
services:
  automatic_speech_recognition:
    image: ghcr.io/doppeltilde/automatic_speech_recognition:latest
    ports:
      - "8000:8000"
    volumes:
      - models:/root/.cache/huggingface/hub:rw
    environment:
      - DEFAULT_ASR_MODEL_NAME
      - COMPUTE_TYPE
      - USE_API_KEYS
      - API_KEYS
    restart: unless-stopped

volumes:
  models:

NVIDIA GPU Support: Use the latest-cuda tag.

services:
  automatic_speech_recognition_cuda:
    image: ghcr.io/doppeltilde/automatic_speech_recognition:latest-cuda
    ports:
      - "8000:8000"
    volumes:
      - models:/root/.cache/huggingface/hub:rw
    environment:
      - DEFAULT_ASR_MODEL_NAME
      - COMPUTE_TYPE
      - USE_API_KEYS
      - API_KEYS
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [ gpu ]

volumes:
  models:
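With either compose file in place, the stack can be started as usual. A minimal sketch (the service name below matches the CPU compose file above; substitute automatic_speech_recognition_cuda for the GPU variant):

```shell
# Start the container in the background; the image is pulled on first run.
docker compose up -d

# Follow the logs to watch the first-start model download.
docker compose logs -f automatic_speech_recognition
```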

  • Create a .env file and set the preferred values.
DEFAULT_ASR_MODEL_NAME=base
COMPUTE_TYPE=float16

# False == Public Access
# True == Access Only with API Key
USE_API_KEYS=False

# Comma-separated API keys
API_KEYS=abc,123,xyz
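The API-key settings behave like a simple allow-list: USE_API_KEYS toggles the check, and API_KEYS holds the accepted keys. As an illustration only (not the service's actual implementation), the values above could be interpreted like this:

```python
import os

# Example values matching the .env above (normally injected by docker compose).
os.environ["USE_API_KEYS"] = "True"
os.environ["API_KEYS"] = "abc,123,xyz"

# "True" enables key checking; anything else leaves the API public.
use_api_keys = os.environ.get("USE_API_KEYS", "False").lower() == "true"

# The comma-separated value becomes an allow-list of keys.
api_keys = [k.strip() for k in os.environ.get("API_KEYS", "").split(",") if k.strip()]

def is_authorized(key):
    """Hypothetical check: public when keys are disabled, otherwise the key must match."""
    return (not use_api_keys) or key in api_keys

print(is_authorized("abc"))   # a listed key is accepted
print(is_authorized("nope"))  # an unknown key is rejected
```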

Models

Any model designed for and compatible with faster-whisper should work.

Usage

Note

Please be aware that the initial request may take some time, as the model is downloaded first.

Tip

Interactive API documentation can be found at: http://localhost:8000/docs


Notice: This project was initially created for in-house use; as such, development is first and foremost aligned with internal requirements.