From af92caa4f1870c25283c822b05acdb1ec8055988 Mon Sep 17 00:00:00 2001
From: Labbeti
Date: Sat, 23 Mar 2024 10:49:54 +0100
Subject: [PATCH] Mod: Update doc.

---
 docs/data_subsets.rst | 47 +++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 45 insertions(+), 2 deletions(-)

diff --git a/docs/data_subsets.rst b/docs/data_subsets.rst
index 63548ad..e2b8019 100644
--- a/docs/data_subsets.rst
+++ b/docs/data_subsets.rst
@@ -5,9 +5,9 @@ AudioCaps
 ########################
 The original AudioCaps dataset contains only 3 subsets : `train`, `val` and `test`.
 
-A fourth subset named `train_v2` is another version of the train subset where captions has been manually corrected or deleted. For more details, see paper "CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding".
+A fourth subset named `train_v2` is another version of the train subset where captions have been manually corrected or deleted. For more details, see the paper `"CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding" `_.
 
-Clotho
+Clotho
 ########################
 Clotho contains 7 subsets:
 
@@ -19,6 +19,23 @@ Clotho contains 7 subsets:
 - `dcase_t2a_audio` : contains 1K audio files without captions used in the DCASE challenge task 6b (Text-to-Audio retrieval),
 - `dcase_t2a_captions` : contains 1K captions (queries) without audios files used in the DCASE challenge task 6b (Text-to-Audio retrieval).
 
+In the DCASE challenge for Audio Captioning, organizers followed a `different convention `_ for the subset names.
+
+.. list-table:: Clotho subset names
+   :header-rows: 1
+
+   * - Clotho convention
+     - DCASE convention
+   * - dev
+     - development-training
+   * - val
+     - development-validation
+   * - eval
+     - development-testing
+   * - dcase_aac_test
+     - evaluation (-testing)
+   * - dcase_aac_analysis
+     - analysis
 
 MACS
 ########################
@@ -37,3 +54,29 @@ WavCaps contains 6 subsets:
 
 Since WavCaps does not contains validation or testing subsets, all of their data is used as additional training data.
 The subsets as_noac and `freesound_no_clotho` are provided to avoid biases when evaluating on AudioCaps or Clotho datasets.
+
+Dataset overlaps
+########################
+Audio-Text datasets typically come from other audio classification datasets or similar websites, which might lead to overlaps that can create data leaks in your training.
+Here is a list of known overlaps between different sound event datasets that you should be aware of:
+
+.. list-table:: Known dataset overlaps
+   :header-rows: 1
+
+   * - Dataset A
+     - Dataset B
+     - Proportion of A in B (%)
+   * - AudioCaps
+     - AudioSet-train
+     - 100
+   * - Clotho
+     - FSD50K
+     - 5.4
+   * - AudioCaps
+     - WavCaps
+     - 17.6
+   * - Clotho
+     - WavCaps
+     - 89.0
+
+If you do not take these overlaps into account, you might overestimate the results of your AAC model.
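
The overlap figures and the leak warning above can be illustrated with a minimal sketch. It assumes each item carries a source identifier that is shared across datasets; the function names `overlap_percent` and `remove_overlaps` and the toy identifiers are hypothetical and are not part of any dataset API.

```python
from typing import Iterable, List


def overlap_percent(ids_a: Iterable[str], ids_b: Iterable[str]) -> float:
    """Return the percentage of items of A that also appear in B."""
    set_a = set(ids_a)
    if not set_a:
        return 0.0
    set_b = set(ids_b)
    return 100.0 * len(set_a & set_b) / len(set_a)


def remove_overlaps(train_ids: Iterable[str], eval_ids: Iterable[str]) -> List[str]:
    """Keep only the training items whose identifier is absent from the evaluation set."""
    blocked = set(eval_ids)
    return [item_id for item_id in train_ids if item_id not in blocked]


# Toy identifiers (hypothetical, not taken from any real dataset).
eval_ids = ["clip_1", "clip_2", "clip_3", "clip_4"]
extra_train_ids = ["clip_2", "clip_3", "clip_9"]

# Share of the evaluation items that leak into the extra training data.
print(overlap_percent(eval_ids, extra_train_ids))
# Extra training data with the leaking items removed.
print(remove_overlaps(extra_train_ids, eval_ids))
```

Filtering by identifier only catches exact re-use of the same source clip. Whether that is sufficient depends on how the datasets were built, which this sketch does not address.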