Run fast-asd locally on GPU or CPU (it automatically checks whether your GPU supports CUDA), and use it as a Python library.
Example usage:

```python
file = "path/to/your/file"
videotracker = VideoTalkingTracker()
data = videotracker.process(file)
```
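The GPU check mentioned above typically reduces to a single PyTorch query; here is a minimal sketch (whether `VideoTalkingTracker` does exactly this internally is an assumption):

```python
import torch

# Prefer CUDA when a compatible GPU is present, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running on {device}")
```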
This repository is an optimized, production-ready implementation of active speaker detection. Read more about the research area here.
It consists of two parts:
- The open-source implementation of the active speaker detection application that runs on the Sieve platform.
- The standalone, optimized implementation of TalkNet, a leading model for active speaker detection.
The TalkNet implementation significantly improves on the original, primarily in performance: the pre-processing and post-processing steps are faster, and it supports variable frame-rate videos (not just 25 FPS like the original). The active speaker detection implementation is a further productionized version that parallelizes processing across TalkNet and a separate standalone face detection model to deliver faster, higher-quality speaker tracking and detection results.
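For instance, probing a video's native frame rate (rather than assuming 25 FPS) is straightforward with OpenCV; this is a minimal sketch, not the repository's actual pre-processing code:

```python
import cv2

# Read the native frame rate and frame count instead of hard-coding 25 FPS.
cap = cv2.VideoCapture("path/to/your/video.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
cap.release()

print(f"{fps:.2f} FPS, {frames} frames")
# Downstream audio-visual alignment can use `fps` to map frame indices
# to timestamps on any frame-rate grid.
```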
If you plan to just use the standalone implementation of TalkNet, follow the steps below:
- Go to the `talknet` directory.
- Run `pip install -r requirements.txt`.
- Run `python main.py`.

You can change the input video file being used by modifying the `main` function in `main.py`, as sketched below.
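As a rough illustration of that change (the entry point below is a hypothetical sketch; the real `main` in `main.py` may take different arguments):

```python
# Hypothetical shape of talknet/main.py's entry point -- check the
# actual file for the real signature and defaults.
def main(video_path: str = "demo.mp4") -> None:
    print(f"Running TalkNet on {video_path}")  # placeholder for the real pipeline

if __name__ == "__main__":
    main(video_path="path/to/your/video.mp4")  # swap in your own file here
```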
The easiest way to run active speaker detection is to use the version already deployed on the Sieve platform available here.
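If you use the hosted version, calling it from Python with Sieve's client looks roughly like the sketch below (the function slug is a placeholder; take the exact name from the deployment page linked above):

```python
import sieve

# Placeholder slug -- copy the exact function name from the Sieve dashboard.
asd = sieve.function.get("sieve/active_speaker_detection")

# Run the hosted app on a local video file and print the result.
output = asd.run(sieve.File(path="path/to/your/video.mp4"))
print(output)
```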
While the core application can be run locally, it still calls public functions available on Sieve, such as the YOLO object detection model, so you will need to sign up for a free account and get an API key. You can do so here.
After you've signed up and run `sieve login`, you can run `main.py` from the root directory of this repository to start the active speaker detection application.
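Putting the whole flow together (the `sievedata` package name is an assumption for Sieve's Python client; adjust if the docs say otherwise):

```sh
pip install sievedata   # Sieve Python client + CLI (assumed package name)
sieve login             # paste your API key when prompted
python main.py          # start the active speaker detection app
```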