Mod: Update doc.

Labbeti · Mar 23, 2024 · af92caa · af92caa
1 parent 10367cb
commit af92caa
Showing 1 changed file with 45 additions and 2 deletions.
diff --git a/docs/data_subsets.rst b/docs/data_subsets.rst
@@ -5,9 +5,9 @@ AudioCaps
 ########################
 The original AudioCaps dataset contains only 3 subsets : `train`, `val` and `test`.
 
-A fourth subset named `train_v2` is another version of the train subset where captions has been manually corrected or deleted. For more details, see paper "CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding".
+A fourth subset named `train_v2` is another version of the train subset where captions have been manually corrected or deleted. For more details, see the paper `"CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding" <https://arxiv.org/abs/2309.00454>`_.
 
-Clotho 
+Clotho
 ########################
 Clotho contains 7 subsets:
 
@@ -19,6 +19,23 @@ Clotho contains 7 subsets:
 - `dcase_t2a_audio` : contains 1K audio files without captions used in the DCASE challenge task 6b (Text-to-Audio retrieval),
 - `dcase_t2a_captions` : contains 1K captions (queries) without audio files used in the DCASE challenge task 6b (Text-to-Audio retrieval).
 
+In the DCASE challenge for Audio Captioning, the organizers followed a `different convention <https://dcase.community/challenge2022/task-automatic-audio-captioning#development-validation-and-evaluation-datasets-of-clotho>`_ for subset names.
+
+.. list-table:: Clotho subset names
+   :header-rows: 1
+
+   * - Clotho convention
+     - DCASE convention
+   * - dev
+     - development-training
+   * - val
+     - development-validation
+   * - eval
+     - development-testing
+   * - dcase_aac_test
+     - evaluation (-testing)
+   * - dcase_aac_analysis
+     - analysis
 
 MACS
 ########################
@@ -37,3 +54,29 @@ WavCaps contains 6 subsets:
 
 Since WavCaps does not contain validation or testing subsets, all of its data is used as additional training data.
 The subsets `as_noac` and `freesound_no_clotho` are provided to avoid biases when evaluating on AudioCaps or Clotho datasets.
+
+Dataset overlaps
+########################
+Audio-text datasets typically come from other audio classification datasets or similar websites, which might lead to overlaps that can create data leaks in your training.
+Here is a list of known overlaps between different sound event datasets that you should be aware of:
+
+.. list-table:: Known overlaps between datasets
+   :header-rows: 1
+
+   * - Dataset A
+     - Dataset B
+     - Proportion of A in B (%)
+   * - AudioCaps
+     - AudioSet-train
+     - 100
+   * - Clotho
+     - FSD50K
+     - 5.4
+   * - AudioCaps
+     - WavCaps
+     - 17.6
+   * - Clotho
+     - WavCaps
+     - 89.0
+
+If you do not take these overlaps into account, you might overestimate the results of your AAC model.
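
To make the "Proportion of A in B" column and the data-leak warning concrete, here is a minimal sketch in plain Python. It assumes each audio clip can be identified by a shared identifier, such as a source ID or file name. The functions `overlap_percent` and `remove_overlap` and the clip identifiers are hypothetical names used for illustration. They are not part of any dataset package mentioned in this commit.

```python
def overlap_percent(ids_a, ids_b):
    """Return the percentage of items of A that also appear in B."""
    set_a, set_b = set(ids_a), set(ids_b)
    if not set_a:
        return 0.0
    return 100.0 * len(set_a & set_b) / len(set_a)


def remove_overlap(train_ids, eval_ids):
    """Keep the training items absent from the evaluation data, preserving order."""
    eval_set = set(eval_ids)
    return [item for item in train_ids if item not in eval_set]


# Hypothetical identifiers: real datasets would expose their own IDs.
eval_ids = ["clip_01", "clip_02", "clip_03", "clip_04"]
train_ids = ["clip_02", "clip_03", "clip_10", "clip_11", "clip_12"]

# Two of the four evaluation clips also appear in the training data.
print(overlap_percent(eval_ids, train_ids))  # 50.0

# Training list with the leaked clips removed.
clean_train_ids = remove_overlap(train_ids, eval_ids)
print(clean_train_ids)  # ['clip_10', 'clip_11', 'clip_12']
```

Filtering the training side rather than the evaluation side keeps the evaluation subset unchanged, so scores stay comparable with other published results. This is a design choice of the sketch, not something the commit prescribes.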