Visually Assisted Self-supervised Audio Speaker Localization and Tracking

Prerequisites

Python 3.6, PyTorch 1.7

Dataset

AV16.3 dataset: https://zenodo.org/record/4449274#.YrQ6v-yZPJ8

Feature Extraction

DSFD: https://github.com/Tencent/FaceDetection-DSFD

pytorch-segmentation: https://github.com/yassouali/pytorch-segmentation

GCC-PHAT computation: https://github.com/smartcameras/AV3T/tree/master/gcf
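The GCC-PHAT features come from the AV3T code linked above. For readers unfamiliar with the technique, the following is a minimal NumPy sketch of GCC-PHAT for a single microphone pair. It is an illustrative assumption, not the AV3T implementation or this repository's gccphat.py, and the function name gcc_phat and its parameters are hypothetical.

```python
import numpy as np

def gcc_phat(sig, ref, fs=1.0, max_tau=None, interp=1):
    """Hypothetical helper: estimate the delay of `sig` relative to `ref` (seconds)
    using the generalized cross-correlation with phase transform (GCC-PHAT)."""
    n = sig.shape[0] + ref.shape[0]          # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)                   # cross-power spectrum
    R /= np.abs(R) + 1e-12                   # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(R, n=interp * n)       # generalized cross-correlation
    max_shift = int(interp * n / 2)
    if max_tau is not None:                  # optionally limit to physically possible lags
        max_shift = min(int(interp * fs * max_tau), max_shift)
    # Reorder so index i corresponds to lag (i - max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    tau = shift / float(interp * fs)         # positive tau: `sig` lags `ref`
    return tau, cc
```

For a real microphone pair, max_tau would typically be set to the microphone spacing divided by the speed of sound, which restricts the peak search to delays that can physically occur.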

Data Preparation

Make sure segmentation results have been obtained for every image, then compute the GCC-PHAT features:

python gccphat.py
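Before running gccphat.py, it can help to confirm that no frame is missing its segmentation result. The sketch below is a hypothetical check: the directory layout (a segmentation file mirroring each image's relative path), the .jpg/.png extensions, and the function name missing_segmentations are all assumptions, not part of this repository.

```python
from pathlib import Path

def missing_segmentations(image_dir, seg_dir, image_ext=".jpg", seg_ext=".png"):
    """Hypothetical helper: list images (relative paths, no extension) that have
    no segmentation file at the mirrored location under `seg_dir`."""
    image_dir, seg_dir = Path(image_dir), Path(seg_dir)
    missing = []
    for img in sorted(image_dir.rglob(f"*{image_ext}")):
        # Assumed layout: seg_dir/<same relative path>.<seg_ext>
        rel = img.relative_to(image_dir).with_suffix(seg_ext)
        if not (seg_dir / rel).exists():
            missing.append(str(rel.with_suffix("")))
    return missing
```

An empty return value means every image found under image_dir has a matching segmentation file under the assumed layout.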

Training and Evaluation

python train.py