
an optimized, production-ready implementation of active speaker detection

AlexJrDevs/fast-asd

fast-asd-updated

Run fast-asd locally on a GPU or CPU (it automatically checks whether a CUDA-capable GPU is available) and use it as a Python library.

Example Usage:

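# VideoTalkingTracker must first be imported from this repository (the module path is not given here)
file = "path/to/your/file"  # path to the input video
videotracker = VideoTalkingTracker()
data = videotracker.process(file)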
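fast-asd is described as automatically checking for a CUDA GPU. How the repository performs that check is not shown here; the following is a minimal sketch of such a check, assuming PyTorch is the backend (the original TalkNet is a PyTorch model). `pick_device` is a hypothetical helper name, not part of fast-asd:

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, otherwise "cpu".

    Hypothetical helper: assumes PyTorch is the backend; not taken from fast-asd.
    """
    # Fall back to CPU when PyTorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    # torch.cuda.is_available() is False on CPU-only builds or machines without a GPU.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Running on: {pick_device()}")
```

The returned string can be passed to PyTorch's `.to(device)` calls so the same code runs unchanged on both kinds of machines.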

fast-asd

This repository is an optimized, production-ready implementation of active speaker detection. Read more about the research area here.

It consists of two parts:

  • The open-source implementation of the active speaker detection application that runs on the Sieve platform.
  • The standalone, optimized implementation of TalkNet, a leading model for active speaker detection.

The TalkNet implementation significantly improves on the original, primarily in terms of performance: the pre-processing and post-processing steps are faster, and it supports variable frame-rate videos (not just 25 FPS like the original). The active speaker detection implementation is a further productionized version that runs TalkNet and a separate standalone face detection model in parallel to provide faster, higher-quality speaker tracking and detection results.
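One way to support variable frame-rate input, sketched here under assumptions, is to align audio and video by timestamp instead of by a fixed ratio. The original TalkNet demo pairs 25 FPS video with audio features computed every 10 ms (100 per second, so 4 audio frames per video frame); when frames are unevenly spaced, each frame's presentation time can instead be mapped to the nearest audio-feature index. `audio_indices_for_frames` is a hypothetical helper, not this repository's code:

```python
from typing import List, Optional, Sequence


def audio_indices_for_frames(
    frame_times: Sequence[float],
    audio_rate_hz: float = 100.0,
    n_audio_frames: Optional[int] = None,
) -> List[int]:
    """Map each video-frame timestamp (in seconds) to the nearest audio-feature index.

    Hypothetical helper: assumes audio features are sampled at `audio_rate_hz`
    (100 Hz, i.e. a 10 ms hop, as in the original TalkNet).
    """
    indices = []
    for t in frame_times:
        i = int(round(t * audio_rate_hz))
        # Clamp to the valid range when the audio feature length is known.
        if n_audio_frames is not None:
            i = min(max(i, 0), n_audio_frames - 1)
        indices.append(i)
    return indices


# Constant 25 FPS reproduces the fixed 4:1 ratio ...
print(audio_indices_for_frames([0.0, 0.04, 0.08]))  # [0, 4, 8]
# ... while unevenly spaced frames still land on the matching audio features.
print(audio_indices_for_frames([0.0, 0.033, 0.1]))  # [0, 3, 10]
```

The frame timestamps would come from the container (for example, each decoded frame's presentation time) rather than from `index / fps`, which is exactly the assumption that breaks for variable frame-rate video.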

Usage

TalkNet

If you plan to just use the standalone implementation of TalkNet, follow the steps below:

  1. Go to the talknet directory
  2. Run pip install -r requirements.txt
  3. Run python main.py

You can change the input video file being used by modifying the main function in main.py.
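Instead of editing main.py for each new video, the path could be read from the command line with the standard argparse module. This is a hypothetical variant, not the repository's main.py: `run_talknet` is a placeholder for whatever the existing main function does with the file.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run TalkNet on a single video.")
    parser.add_argument("video", help="path to the input video file")
    return parser.parse_args(argv)


def run_talknet(video_path: str) -> None:
    # Placeholder: call the existing TalkNet processing code here.
    print(f"Processing {video_path}")


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    run_talknet(args.video)


if __name__ == "__main__":
    main()
```

With this change the script would be run as python main.py path/to/video.mp4.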

Active Speaker Detection

The easiest way to run active speaker detection is to use the version already deployed on the Sieve platform available here.

While the core application can be run locally, it still calls public functions available on Sieve, such as the YOLO object detection model, so you will need to sign up for a free account and get an API key. You can do so here.

After you've signed up and run sieve login, you can run main.py from the root directory of this repository to run the active speaker detection application.
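The application is described above as running TalkNet and a separate face detection model in parallel. The sketch below only illustrates that pattern with Python's concurrent.futures and is not the application's code: `detect_faces` and `score_speakers` are hypothetical stand-ins returning toy data, and treating a score above 0 as "speaking" is an assumed threshold that this README does not specify.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def detect_faces(video_path: str) -> Dict[int, List[Box]]:
    # Hypothetical stand-in for the face detection model: frame index -> face boxes.
    return {0: [(10, 10, 50, 50)], 1: [(12, 10, 52, 50)], 2: []}


def score_speakers(video_path: str) -> Dict[int, float]:
    # Hypothetical stand-in for TalkNet: frame index -> speaking score.
    return {0: 0.9, 1: -0.4}


def run_pipeline(video_path: str, threshold: float = 0.0) -> Dict[int, dict]:
    # Submit both models at once, then block until both results are ready.
    with ThreadPoolExecutor(max_workers=2) as pool:
        faces_future = pool.submit(detect_faces, video_path)
        scores_future = pool.submit(score_speakers, video_path)
        faces = faces_future.result()
        scores = scores_future.result()

    # Merge per frame; frames without a score are treated as "not speaking".
    return {
        frame: {"boxes": boxes, "speaking": scores.get(frame, float("-inf")) > threshold}
        for frame, boxes in faces.items()
    }


if __name__ == "__main__":
    for frame, info in run_pipeline("video.mp4").items():
        print(frame, info)
```

Threads are enough when the heavy work happens in GPU kernels or remote calls that release the GIL; CPU-bound pure-Python stages would need a process pool instead.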


Languages

  • Python 100.0%