This repository contains code and evaluation tools for the paper:
"Privacy-Preserving Voice Activity Detection: Evaluating AI Model Performance on Domestic Audio"
Gabriel Bibbo, Arshdeep Singh, Mark D. Plumbley
Centre for Vision Speech & Signal Processing (CVSSP), University of Surrey, UK
2026 IEEE International Conference on Acoustics, Speech, and Signal Processing
- Python 3.10 (conda recommended)
- PyTorch installed for your platform (CPU by default) from the official PyTorch index
- Remaining dependencies via `requirements.txt`

PyTorch is installed from its official index (CPU/CUDA/ROCm). We do not pin PyTorch in `requirements.txt` to avoid cross-platform incompatibilities.
```bash
git clone https://github.com/gbibbo/vad_benchmark.git
cd vad_benchmark
chmod +x install.sh
./install.sh            # creates the py310 environment and installs deps (does not download datasets)
source activate_vad.sh  # activates the environment and exports PYTHONPATH
python test_installation.py
```

If you already have large folders (e.g. `datasets/chime` or `models/`), you can reuse them to avoid downloads:
```bash
# from the root of the new repo
ln -s /path/to/your/other/repo/models models                  # reuse model weights
mkdir -p datasets
ln -s /path/to/your/other/repo/datasets/chime datasets/chime  # reuse CHiME (~3.9GB)
```

Install PyTorch according to your platform (official guide):
```bash
pip install --index-url https://download.pytorch.org/whl/cpu torch torchaudio torchvision
pip install -r requirements.txt
```

Note: If `soundfile` gives an error about `libsndfile`, check the official documentation; on some distributions you may need to install `libsndfile` from your system package manager.
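A quick way to confirm the audio stack works is to round-trip a short file through `soundfile` (a minimal sketch, not part of the repo's test suite; it fails fast if libsndfile is missing):

```python
import numpy as np
import soundfile as sf

# Write one second of silence at 16 kHz, then read it back.
sf.write("sf_check.wav", np.zeros(16000, dtype="float32"), 16000)
audio, sr = sf.read("sf_check.wav")
print(audio.shape, sr)  # expected: (16000,) 16000
```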
```bash
python test_installation.py
```

```bash
# short demo
python scripts/run_evaluation.py --config configs/config_demo.yaml

# paper scenarios (CMF / CMFV)
python scripts/run_evaluation.py --config configs/config_chime_cmf.yaml
python scripts/run_evaluation.py --config configs/config_chime_cmfv.yaml
```

To reproduce the exact paper results, you need the CHiME-Home dataset:
Get the CHiME-Home dataset from the CHiME Challenge website:
```bash
chmod +x download_chime.sh
./download_chime.sh
```

Or manually:
```bash
# Create dataset directory
mkdir -p datasets/chime/chunks

# Download CHiME-Home dataset
# Visit: https://www.chimehome.org/
# Or use the direct download link from CHiME organizers

# Extract audio files to: datasets/chime/chunks/
# Expected structure:
# datasets/chime/chunks/
# ├── CR_lounge_220110_0731.s0_chunk0.wav
# ├── CR_lounge_220110_0731.s0_chunk1.wav
# ├── ...
# └── [additional 4-second audio chunks at 16kHz]
```

If available, you can use the provided download script:
```bash
# Make download script executable
chmod +x download_chime.sh

# Download dataset automatically
./download_chime.sh

# Verify dataset structure
ls -la datasets/chime/chunks/ | head -10
```

- Format: WAV files, 16kHz sample rate
- Duration: 4-second chunks
- Size: ~1946 files for full evaluation
- Scenarios: CMF (Child, Male, Female) and CMFV (+ Television)
- Ground Truth: Files in `ground_truth/chime/cmf.csv` and `ground_truth/chime/cmfv.csv`
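To sanity-check a downloaded copy against these specs, a short script along these lines (assuming the default `datasets/chime/chunks/` layout) counts the chunks and probes one file's format:

```python
import glob
import soundfile as sf

# Count the chunks and inspect the first one's sample rate and duration.
chunks = sorted(glob.glob("datasets/chime/chunks/*.wav"))
print(len(chunks), "chunks found (full evaluation expects ~1946)")

info = sf.info(chunks[0])
print(f"{info.samplerate} Hz, {info.frames / info.samplerate:.1f} s")  # expect 16000 Hz, 4.0 s
```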
```bash
# Human speech detection (CMF scenario) - table results from the paper
python scripts/run_evaluation.py --config configs/config_chime_cmf.yaml

# Broad vocal content detection (CMFV scenario)
python scripts/run_evaluation.py --config configs/config_chime_cmfv.yaml

# Run all models on both scenarios
python scripts/run_all_scenarios.py --config configs/config_paper_full.yaml
```

- Individual metrics: `results/metrics_[model].json`
- Comparison plots: `results/comparison_all_models.png`
- Logs: `results/evaluation_[timestamp].log`
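To gather the per-model scores programmatically after a run, something like the following works (a sketch; the key names inside each JSON file are not documented here, so inspect one file first):

```python
import glob
import json

# List each per-model metrics file and the top-level keys it exposes.
for path in sorted(glob.glob("results/metrics_*.json")):
    with open(path) as f:
        metrics = json.load(f)
    keys = sorted(metrics) if isinstance(metrics, dict) else type(metrics).__name__
    print(path, "->", keys)
```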
Ground truth annotations are in `ground_truth/chime/`.
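The annotation files are plain CSV; this sketch makes no assumptions about their schema and simply prints the header and a sample row:

```python
import csv

# Peek at the CMF ground-truth file without assuming its column layout.
with open("ground_truth/chime/cmf.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    rows = list(reader)

print("columns:", header)
print("rows:", len(rows))
print("first row:", rows[0])
```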
The evaluation results show clear patterns in VAD model behavior:
What the results tell us:
- CMF Scenario (detecting human speech): PaSST and AST models work best (F1 = 0.86)
- CMFV Scenario (detecting any vocal content): Most models reach F1 = 0.97, making this task easier
- ROC curves show each model's trade-off between catching true speech and avoiding false alarms
- Threshold sensitivity varies greatly between models (see the sketch below)
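To make threshold sensitivity concrete, here is a toy sweep over synthetic per-chunk speech scores (illustrative only; real scores come from the model wrappers in `src/wrappers/`):

```python
import numpy as np
from sklearn.metrics import f1_score

# Synthetic stand-ins for per-chunk labels and model speech probabilities.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, size=200), 0, 1)

# F1 at several decision thresholds; the spread illustrates sensitivity.
for t in (0.3, 0.5, 0.7):
    f1 = f1_score(y_true, (y_score >= t).astype(int))
    print(f"threshold={t:.1f}  F1={f1:.3f}")
```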
Model efficiency patterns:
- Small models (Silero, WebRTC) offer good value: decent F1 scores with tiny memory footprint
- Large models (80M+ parameters) give the best F1 scores but cost much more memory
- Sweet spot appears around 24M parameters (EPANNs) for balanced efficiency
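Parameter counts like these can be reproduced for any PyTorch-based model with a generic helper (a sketch; loading the actual models depends on the wrapper code in `src/wrappers/`):

```python
import torch

def count_parameters(model: torch.nn.Module) -> float:
    """Total parameter count, in millions."""
    return sum(p.numel() for p in model.parameters()) / 1e6

# Stand-in module; substitute a VAD model loaded via src/wrappers/.
toy = torch.nn.Linear(1024, 527)
print(f"{count_parameters(toy):.2f}M parameters")
```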
The framework tests 8 VAD models across 4 families:
| Family | Models | CMF F1-Scores (in model order) |
|---|---|---|
| Lightweight VAD | Silero, WebRTC | 0.806, 0.708 |
| AudioSet Pre-trained | PANNs, EPANNs, AST, PaSST | 0.848, 0.847, 0.860, 0.861 |
| Speech Recognition | Whisper-Tiny, Whisper-Small | 0.668, 0.654 |
Results for CMF scenario (human speech detection)
This repository includes scripts for in-depth evaluation:
```bash
# Go to test scripts
cd analysis/scripts/

# Run complete VAD tests
python analyze_vad_results.py

# Run parameter count vs F1 tests
python analyze_vad_parameters.py

# Compare ground truth versions (if needed)
python compare_gt_old_new.py
```

The test scripts create publication-ready figures and metrics:
```
analysis/data/Figures/
├── f1_vs_threshold_comparison.png         # F1 score comparisons
├── accuracy_vs_threshold_comparison.png   # Accuracy comparisons
├── roc_curves_comparison.png              # ROC curve tests
├── pr_curves_comparison.png               # Precision-Recall curves
├── performance_vs_speed_comparison.png    # F1 vs RTF scatter plots
├── parameter_count_vs_performance_*.png   # Model size vs F1 score
├── performance_summary_cmf.csv            # CMF scenario metrics
├── performance_summary_cmfv.csv           # CMFV scenario metrics
└── parameter_count_analysis.csv           # Efficiency tests
```
- Side-by-side comparisons: CMF vs CMFV scenario results
- Speed tests: Real-Time Factor (RTF) vs F1-score relationships (see the RTF sketch after this list)
- Efficiency tests: Parameter count vs F1 score trade-offs
- Threshold tests: How models behave across different VAD thresholds
- ROC/PR Curves: Detailed classification metrics
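RTF is processing time divided by audio duration, so values below 1 mean faster than real time. A minimal way to measure it (a sketch; `run_vad` is a stand-in for any wrapper's inference call, not a function from this repo):

```python
import time
import numpy as np

def real_time_factor(run_vad, audio: np.ndarray, sr: int = 16000) -> float:
    """Processing time / audio duration; < 1 means faster than real time."""
    start = time.perf_counter()
    run_vad(audio, sr)
    return (time.perf_counter() - start) / (len(audio) / sr)

# Stand-in "model" timed on a 4-second silent chunk.
rtf = real_time_factor(lambda a, s: float(a.mean()), np.zeros(4 * 16000, dtype=np.float32))
print(f"RTF = {rtf:.6f}")
```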
- `webrtcvad` is installed as `webrtcvad-wheels` to use precompiled wheels (no C compilation needed).
- `soundfile` relies on libsndfile; on some systems this OS library may need to be installed separately.
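The `webrtcvad-wheels` package is imported under the usual `webrtcvad` name; a minimal smoke test (assuming 16 kHz, 16-bit mono PCM, which WebRTC VAD expects) looks like this:

```python
import numpy as np
import webrtcvad  # provided by the webrtcvad-wheels package

# Aggressiveness 0-3; higher modes filter out non-speech more aggressively.
vad = webrtcvad.Vad(2)

# One 30 ms frame of silence at 16 kHz (480 samples of 16-bit mono PCM).
frame = np.zeros(480, dtype=np.int16).tobytes()
print(vad.is_speech(frame, 16000))  # silence should come back False
```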
```
vad_benchmark/
├── install.sh                  # Automatic installer
├── configs/                    # Evaluation setups
│   ├── config_demo.yaml        # Demo with test data
│   ├── config_chime_cmf.yaml   # Paper: Human speech scenario
│   └── config_chime_cmfv.yaml  # Paper: Broad vocal content
├── analysis/                   # Test suite
│   ├── scripts/                # Test scripts
│   ├── data/                   # Results and ground truth data
│   └── figures/                # Generated plots and figures
├── ground_truth/               # Paper ground truth annotations
│   └── chime/                  # CHiME-Home labels (CMF/CMFV)
├── datasets/                   # Dataset directory
│   └── chime/chunks/           # CHiME-Home audio files (download required)
├── src/wrappers/               # VAD model code
├── scripts/                    # Evaluation scripts
├── models/                     # Downloaded model weights
└── results/                    # Output metrics and plots
```
- Python: 3.10+
- Storage: 2GB (models + dependencies)
- Memory: 4GB RAM recommended
- OS: Linux, macOS, Windows (WSL supported)
The installer handles all dependencies including PyTorch (CPU version for stability).
- Check the installation test: `python test_installation.py`
- Verify dataset structure: `ls datasets/chime/chunks/ | wc -l` (should show ~1946 files)
- Review evaluation logs in `results/evaluation_*.log`
```bibtex
@inproceedings{bibbo2025privacy,
  title={Privacy-Preserving Voice Activity Detection: Evaluating AI Model Performance on Domestic Audio},
  author={Bibbo, Gabriel and Singh, Arshdeep and Plumbley, Mark D.},
  booktitle={2026 IEEE International Conference on Acoustics, Speech, and Signal Processing},
  year={2026},
  address={Barcelona, Spain}
}
```

MIT License - see LICENSE file for details.
Repository: https://github.com/gbibbo/vad_benchmark
Paper: 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026)

