This repository contains the Python implementation of the paper "Uncertainty Estimation for Sound Source Localization with Deep Learning".
- Source signals: LibriSpeech
- Noise signals: Noise92X
- The real-world dataset: LOCATA
The datasets listed above can be downloaded from this OneDrive link.
The data directory is structured as follows:
.
|---data
    |---LibriSpeech
    |   |---dev-clean
    |   |---test-clean
    |   |---train-clean-100
    |---NoiSig
        |---test
        |---train
        |---dev
Note: The data/ directory does not have to be inside your project; you can place it anywhere you like. Please remember to set the correct data path in config/tcrnn.yaml.
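Before generating data, it can help to confirm that the data directory matches the layout above. The following is a minimal sketch, not part of the repository: `missing_data_dirs` is a hypothetical helper, and `/path/to/data` is a placeholder for whatever path you set in config/tcrnn.yaml.

```python
from pathlib import Path

# Expected sub-directories, taken from the directory tree in this README.
EXPECTED_SUBDIRS = [
    "LibriSpeech/dev-clean",
    "LibriSpeech/test-clean",
    "LibriSpeech/train-clean-100",
    "NoiSig/test",
    "NoiSig/train",
    "NoiSig/dev",
]


def missing_data_dirs(data_root):
    """Return the expected sub-directories that are absent under data_root."""
    root = Path(data_root)
    return [sub for sub in EXPECTED_SUBDIRS if not (root / sub).is_dir()]


if __name__ == "__main__":
    # Placeholder path (an assumption): replace it with your own data location.
    missing = missing_data_dirs("/path/to/data")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

An empty result means every directory shown in the tree was found.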
We strongly recommend using VSCode and Docker for this project; it can save you a lot of time😁! The related configuration is already provided in .devcontainer. Detailed information can be found in this Tutorial_for_Vscode&Dokcer.
The environment:
- cuda: 11.8.0
- cudnn: 8
- python: 3.10
- pytorch: 2.2.0
- pytorch lightning: 2.2
The related configurations are all saved in config/.
- data_simu.yaml is used to configure the data generation.
- tcrnn.yaml is used to configure the dataloader, model training, and testing.
You can change the values of these items as needed.
Note: Do not forget to install gpuRIR and webrtcvad.
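A quick way to confirm that both packages are available is to check whether Python can locate them. This is a minimal sketch: `find_missing_modules` is a hypothetical helper, and it assumes the import names are the same as the package names in the note above.

```python
import importlib.util

# Assumption: the import names match the package names mentioned in the note.
REQUIRED_MODULES = ["gpuRIR", "webrtcvad"]


def find_missing_modules(module_names):
    """Return the module names that cannot be located in this environment."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Please install:", ", ".join(missing))
    else:
        print("gpuRIR and webrtcvad are available.")
```

The check only looks the modules up; it does not import them, so it does not need a GPU.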
- Inference
We provide a checkpoint (ckpt download) to help you reproduce the results presented in the paper.
- Data Generation
Generate the training data:
python data_simu.py DATA_SIMU.TRAIN=True DATA_SIMU.TRAIN_NUM=10000
In the same way, you can generate the validation and test datasets by changing DATA_SIMU.TRAIN=True to DATA_SIMU.DEV=True or DATA_SIMU.TEST=True.
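The three generation commands can also be assembled in one place. The sketch below only builds the command lines from the flags named above; `build_simu_command` is a hypothetical helper, and it assumes the script is launched from the repository root with the current Python interpreter.

```python
import subprocess
import sys

# Split flags as named in this README.
SPLIT_FLAGS = {
    "train": "DATA_SIMU.TRAIN",
    "dev": "DATA_SIMU.DEV",
    "test": "DATA_SIMU.TEST",
}


def build_simu_command(split, extra_overrides=()):
    """Build the data_simu.py command line for one split."""
    if split not in SPLIT_FLAGS:
        raise ValueError(f"unknown split: {split!r}")
    flag = f"{SPLIT_FLAGS[split]}=True"
    return [sys.executable, "data_simu.py", flag, *extra_overrides]


if __name__ == "__main__":
    commands = [
        build_simu_command("train", ["DATA_SIMU.TRAIN_NUM=10000"]),
        build_simu_command("dev"),
        build_simu_command("test"),
    ]
    for cmd in commands:
        print(" ".join(cmd))
        # Uncomment to actually run the generation (assumes the repo root as cwd):
        # subprocess.run(cmd, check=True)
```

Keeping the call to `subprocess.run` commented out lets you inspect the commands before launching a long simulation.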
- Model Training
python main_crnn.py fit --config /workspaces/TCRNN/config/tcrnn.yaml
The --config argument should point to your config file.
- Model Evaluation
  - Change the ckpt_path in config/tcrnn.yaml to the trained model weights.
  - Use multiple GPUs or a single GPU to test the model performance:
python main_crnn.py test --config /workspaces/TCRNN/config/tcrnn.yaml
To evaluate the model on a single GPU, change the value of devices from "0,1" to "0," in config/tcrnn.yaml.
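The trailing comma in "0," is easy to overlook. The sketch below illustrates why it matters, assuming the devices string is interpreted the way PyTorch Lightning commonly treats it: a string containing a comma is read as a list of GPU indices, while a string without one is read as a device count. `interpret_devices` is a hypothetical helper written for illustration; it is not Lightning's own code.

```python
def interpret_devices(value):
    """Illustrative parser (an assumption, not Lightning's implementation).

    A comma means "list of GPU indices"; no comma means "number of devices".
    """
    text = value.strip()
    if "," in text:
        # Empty pieces (e.g. after a trailing comma) are skipped.
        return [int(part) for part in text.split(",") if part.strip()]
    return int(text)


if __name__ == "__main__":
    for setting in ('0,1', '0,', '2'):
        print(repr(setting), "->", interpret_devices(setting))
```

Under this assumption, "0,1" selects GPUs 0 and 1, and "0," selects only GPU 0, whereas a plain "0" would be read as a count of zero devices rather than GPU index 0.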
If you find our work useful in your research, please consider citing:
@article{pi2025uncertainty,
author={Pi, Rendong and Yu, Xiang},
journal={IEEE Transactions on Instrumentation and Measurement},
title={Uncertainty Estimation for Sound Source Localization With Deep Learning},
year={2025},
volume={74},
number={},
pages={1-12},
doi={10.1109/TIM.2024.3522632}
}
@inproceedings{pi2024tssl,
title={TSSL: Trusted Sound Source Localization},
author={Pi, Rendong and Song, Yang and Li, Linfeng and Yu, Xiang and Cheng, Li},
booktitle={INTER-NOISE and NOISE-CON Congress and Conference Proceedings},
volume={270},
number={11},
pages={941--949},
year={2024},
organization={Institute of Noise Control Engineering}
}
This repository adapts and integrates code from several wonderful works, listed as follows: