Dataset Structure

Folder Structure

.
└── multi/
    ├── audio/
    │   ├── train/
    │   │   └── {clip-id}/
    │   │       ├── speech.flac
    │   │       ├── music.flac
    │   │       ├── sfx.flac
    │   │       ├── sfx_fg.flac
    │   │       ├── sfx_bg.flac
    │   │       └── mixture.flac
    │   ├── val/
    │   │   └── {clip-id}/
    │   │       └── ...
    │   └── test/
    │       └── {clip-id}/
    │           └── ...
    ├── manifest/
    │   ├── train/
    │   │   └── {clip-id}/
    │   │       ├── speech.csv
    │   │       ├── music.csv
    │   │       ├── sfx_fg.csv
    │   │       └── sfx_bg.csv
    │   ├── val/
    │   │   └── {clip-id}/
    │   │       └── ...
    │   └── test/
    │       └── {clip-id}/
    │           └── ...
    └── audio_metadata/
        ├── train/
        │   └── {clip-id}.csv
        ├── val/
        │   └── {clip-id}.csv
        └── test/
            └── {clip-id}.csv
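
The layout above can be turned into a small path helper. This is a minimal sketch, not part of the dataset tooling: the function name `clip_paths`, the `root` argument, and the clip id used in the usage line are hypothetical, and the stem lists are copied from the tree shown for the `train` split on the assumption that `val` and `test` follow the same pattern.

```python
from pathlib import Path

# Stem names as listed in the folder tree above (train split).
AUDIO_STEMS = ("speech", "music", "sfx", "sfx_fg", "sfx_bg", "mixture")
MANIFEST_STEMS = ("speech", "music", "sfx_fg", "sfx_bg")


def clip_paths(root, split, clip_id):
    """Return the expected file locations for one clip.

    `root` is the directory that contains `multi/` (hypothetical argument).
    Nothing is checked against the file system; paths are only constructed.
    """
    base = Path(root) / "multi"
    return {
        "audio": {
            stem: base / "audio" / split / clip_id / f"{stem}.flac"
            for stem in AUDIO_STEMS
        },
        "manifest": {
            stem: base / "manifest" / split / clip_id / f"{stem}.csv"
            for stem in MANIFEST_STEMS
        },
        "audio_metadata": base / "audio_metadata" / split / f"{clip_id}.csv",
    }


# Usage with a made-up clip id:
paths = clip_paths("data", "train", "0001")
print(paths["audio"]["mixture"].as_posix())
```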

Audio Files

The audio files are mono and 60 seconds in duration. All files are sampled at 48 kHz with a bit depth of 24 bits. The audio files are provided in lossless FLAC format to reduce the archive size. You can use

ffmpeg -i input.flac -c:a pcm_s24le output.wav

to convert a file back to 24-bit PCM WAV.
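
To convert a whole directory tree rather than a single file, the same ffmpeg command can be driven from a script. The sketch below assumes `ffmpeg` is on the `PATH`; the function names `ffmpeg_command` and `convert_tree` and the directory names in the usage line are hypothetical. It defaults to a dry run that only returns the commands it would execute.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(src, dst):
    """Build the same ffmpeg invocation as above for one file."""
    return ["ffmpeg", "-i", str(src), "-c:a", "pcm_s24le", str(dst)]


def convert_tree(src_root, dst_root, dry_run=True):
    """Mirror every .flac under src_root as a .wav under dst_root.

    With dry_run=True (the default) nothing is executed; the list of
    commands is returned so it can be inspected first.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    commands = []
    for flac in sorted(src_root.rglob("*.flac")):
        wav = (dst_root / flac.relative_to(src_root)).with_suffix(".wav")
        cmd = ffmpeg_command(flac, wav)
        commands.append(cmd)
        if not dry_run:
            wav.parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # raises if ffmpeg fails
    return commands


# Usage (hypothetical directories): inspect first, then rerun with dry_run=False.
for cmd in convert_tree("multi/audio", "multi_wav/audio"):
    print(" ".join(cmd))
```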

Manifests

The manifest files are CSV files in which each row represents one sound event. Each CSV contains the following columns:

file: path to the raw audio event
start_sample, start_seconds: start time relative to the track
length_sample, length_seconds: duration relative to the track
end_seconds: end time relative to the track
segment_start_sample: start time relative to the raw file
lufs: nominal event loudness in LKFS
submix_lufs: Actual track loudness in LKFS (same across all rows)
submix_lufs_target: Nominal track loudness in LKFS (same across all rows)

Example

file,start_sample,length_sample,segment_start_sample,start_seconds,length_seconds,end_seconds,lufs,submix_lufs,submix_lufs_target
speech-kazakh-slr140/audio/full/48k/test/878_188.wav,0,374976,0,0.0,7.812,7.812,-20.631814741589345,-20.3484730207178,-20.3484730207178
speech-yoruba-slr86-google/audio/full/48k/test/yom_02484_01663235147.wav,364685,184320,0,7.597604166666667,3.84,11.437604166666667,-28.772182172953954,-20.3484730207178,-20.3484730207178
speech-indic-slr-google/audio/full/48k/test/ban_02194_00413042161.wav,678539,192512,0,14.136229166666666,4.010666666666666,18.146895833333332,-26.709850320179136,-20.3484730207178,-20.3484730207178
speech-english-slr12-librispeech-hq/audio/clean-100h/48k/test/7021/79730/7021-79730-0007.wav,1005221,598080,0,20.942104166666667,12.46,33.40210416666667,-31.115058841609176,-20.3484730207178,-20.3484730207178
speech-chinese-slr93-aishell3/audio/full/48k/test/SSB08170448.wav,1732212,145445,0,36.08775,3.030104166666667,39.11785416666667,-24.0928698524041,-20.3484730207178,-20.3484730207178
speech-english-slr83-google-british-isles/audio/full/48k/test/nom_07508_01121578934.wav,1871093,323584,0,38.98110416666667,6.741333333333333,45.7224375,-27.721156680891227,-20.3484730207178,-20.3484730207178
speech-chinese-slr93-aishell3/audio/full/48k/test/SSB13400390.wav,2160552,242309,0,45.0115,5.048104166666667,50.05960416666667,-13.796885474819632,-20.3484730207178,-20.3484730207178
speech-indic-slr-google/audio/full/48k/test/mrt_04310_01923290054.wav,2383229,417792,0,49.65060416666667,8.704,58.35460416666667,-21.24941089227363,-20.3484730207178,-20.3484730207178
speech-chinese-slr93-aishell3/audio/full/48k/test/SSB07360485.wav,2795026,73561,0,58.229708333333335,1.5325208333333333,59.762229166666664,-30.01494917037914,-20.3484730207178,-20.3484730207178
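
The sample and second columns are redundant, so a manifest can be checked for internal consistency when it is loaded. The sketch below uses only the Python standard library and the first two rows of the example above. The function names `read_manifest` and `is_consistent` are hypothetical, and the conversion assumes the 48 kHz sample rate stated in the Audio Files section.

```python
import csv
import io

SAMPLE_RATE = 48_000  # from the Audio Files section

INT_COLUMNS = ("start_sample", "length_sample", "segment_start_sample")
FLOAT_COLUMNS = (
    "start_seconds", "length_seconds", "end_seconds",
    "lufs", "submix_lufs", "submix_lufs_target",
)


def read_manifest(fileobj):
    """Parse a manifest CSV into a list of dicts with typed values."""
    rows = []
    for raw in csv.DictReader(fileobj):
        row = {"file": raw["file"]}
        row.update({name: int(raw[name]) for name in INT_COLUMNS})
        row.update({name: float(raw[name]) for name in FLOAT_COLUMNS})
        rows.append(row)
    return rows


def is_consistent(row, tol=1e-6):
    """Check that sample counts and seconds agree at SAMPLE_RATE."""
    return (
        abs(row["start_sample"] / SAMPLE_RATE - row["start_seconds"]) < tol
        and abs(row["length_sample"] / SAMPLE_RATE - row["length_seconds"]) < tol
        and abs(row["start_seconds"] + row["length_seconds"] - row["end_seconds"]) < tol
    )


# First two rows of the example manifest, copied verbatim.
EXAMPLE = """\
file,start_sample,length_sample,segment_start_sample,start_seconds,length_seconds,end_seconds,lufs,submix_lufs,submix_lufs_target
speech-kazakh-slr140/audio/full/48k/test/878_188.wav,0,374976,0,0.0,7.812,7.812,-20.631814741589345,-20.3484730207178,-20.3484730207178
speech-yoruba-slr86-google/audio/full/48k/test/yom_02484_01663235147.wav,364685,184320,0,7.597604166666667,3.84,11.437604166666667,-28.772182172953954,-20.3484730207178,-20.3484730207178
"""

rows = read_manifest(io.StringIO(EXAMPLE))
for row in rows:
    print(row["file"], is_consistent(row))
```

To read a manifest from disk, pass `open(path, newline="")` instead of the `io.StringIO` object.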

Audio Metadata

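The audio metadata file lists the integrated loudness and peak information (true peak and naive peak) for each stem and for the mixture, one row per stem. The first column, which has no header, holds the stem name.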

Example

,loudness_integrated,true_peak,naive_peak
speech,-25.499287264933606,-5.192297023017089,-5.186877826519533
music,-36.3792485121433,-18.407167845133067,-18.407241622817892
sfx_fg,-30.47502611345025,-1.9941356820023546,-1.9990573851479834
sfx_bg,-44.26378922558933,-14.588409210045402,-14.586635164569977
mixture,-25.359276162635503,-1.1463566178057358,-1.230612943333371
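
Because the first column has an empty header, it is convenient to read this file with a plain `csv.reader` and key the result by stem name. The sketch below is one way to do that with the standard library, using the example above as input. The function name `read_audio_metadata` is hypothetical, and the units of the three columns are not stated on this page, so the values are kept as plain floats.

```python
import csv
import io


def read_audio_metadata(fileobj):
    """Return {stem: {column: value}} from an audio metadata CSV.

    The first header cell is empty; the first cell of each row is the stem.
    """
    reader = csv.reader(fileobj)
    header = next(reader)
    fields = header[1:]  # skip the unnamed stem column
    table = {}
    for line in reader:
        if not line:
            continue  # tolerate a trailing blank line
        table[line[0]] = {
            name: float(value) for name, value in zip(fields, line[1:])
        }
    return table


# The example file above, copied verbatim.
EXAMPLE = """\
,loudness_integrated,true_peak,naive_peak
speech,-25.499287264933606,-5.192297023017089,-5.186877826519533
music,-36.3792485121433,-18.407167845133067,-18.407241622817892
sfx_fg,-30.47502611345025,-1.9941356820023546,-1.9990573851479834
sfx_bg,-44.26378922558933,-14.588409210045402,-14.586635164569977
mixture,-25.359276162635503,-1.1463566178057358,-1.230612943333371
"""

table = read_audio_metadata(io.StringIO(EXAMPLE))
# Stem with the highest integrated loudness in this example file.
loudest = max(table, key=lambda stem: table[stem]["loudness_integrated"])
print(loudest, table[loudest]["loudness_integrated"])
```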
