Mod: Update doc.
Labbeti committed Mar 23, 2024
1 parent 10367cb commit af92caa
Showing 1 changed file with 45 additions and 2 deletions.
47 changes: 45 additions & 2 deletions docs/data_subsets.rst
@@ -5,9 +5,9 @@ AudioCaps
########################
The original AudioCaps dataset contains only 3 subsets: `train`, `val` and `test`.

A fourth subset named `train_v2` is another version of the train subset in which captions have been manually corrected or deleted. For more details, see the paper `"CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding" <https://arxiv.org/abs/2309.00454>`_.

Clotho
########################
Clotho contains 7 subsets:

@@ -19,6 +19,23 @@ Clotho contains 7 subsets:
- `dcase_t2a_audio` : contains 1K audio files without captions used in the DCASE challenge task 6b (Text-to-Audio retrieval),
- `dcase_t2a_captions` : contains 1K captions (queries) without audio files used in the DCASE challenge task 6b (Text-to-Audio retrieval).

In the DCASE challenge for Audio Captioning, the organizers follow a `different convention <https://dcase.community/challenge2022/task-automatic-audio-captioning#development-validation-and-evaluation-datasets-of-clotho>`_ for the subset names.

.. list-table:: Clotho subset names
   :header-rows: 1

   * - Clotho convention
     - DCASE convention
   * - dev
     - development-training
   * - val
     - development-validation
   * - eval
     - development-testing
   * - dcase_aac_test
     - evaluation (-testing)
   * - dcase_aac_analysis
     - analysis
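
The correspondence in the table above can be written down as a small lookup, which is a minimal sketch and not part of any library API. The names `CLOTHO_TO_DCASE`, `to_dcase_name` and `to_clotho_name` are hypothetical, and mapping `dcase_aac_test` to `evaluation` while accepting `evaluation-testing` as an alias is an assumption about how to read the "evaluation (-testing)" entry.

```python
# Clotho subset name -> DCASE subset name, copied from the table above.
CLOTHO_TO_DCASE = {
    "dev": "development-training",
    "val": "development-validation",
    "eval": "development-testing",
    "dcase_aac_test": "evaluation",
    "dcase_aac_analysis": "analysis",
}

# Reverse lookup, DCASE subset name -> Clotho subset name.
DCASE_TO_CLOTHO = {dcase: clotho for clotho, dcase in CLOTHO_TO_DCASE.items()}
# Assumption: "evaluation (-testing)" means both spellings are in use.
DCASE_TO_CLOTHO["evaluation-testing"] = "dcase_aac_test"


def to_dcase_name(clotho_subset: str) -> str:
    """Return the DCASE name of a Clotho subset (hypothetical helper)."""
    try:
        return CLOTHO_TO_DCASE[clotho_subset]
    except KeyError:
        raise ValueError(f"No DCASE name known for subset {clotho_subset!r}")


def to_clotho_name(dcase_subset: str) -> str:
    """Return the Clotho name of a DCASE subset (hypothetical helper)."""
    try:
        return DCASE_TO_CLOTHO[dcase_subset]
    except KeyError:
        raise ValueError(f"No Clotho name known for subset {dcase_subset!r}")
```

The `dcase_t2a_*` subsets are left out on purpose, since the table does not give a DCASE name for them.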

MACS
########################
@@ -37,3 +54,29 @@ WavCaps contains 6 subsets:

Since WavCaps does not contain validation or testing subsets, all of its data is used as additional training data.
The subsets `as_noac` and `freesound_no_clotho` are provided to avoid biases when evaluating on the AudioCaps or Clotho datasets.
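
As a rough illustration of the paragraph above, the filtered subset can be chosen from the evaluation dataset. This is only a sketch: the helper `wavcaps_subset_for` is hypothetical, and pairing `as_noac` with AudioCaps and `freesound_no_clotho` with Clotho is an assumption inferred from the subset names.

```python
# Assumption: pairing inferred from the subset names, not confirmed by the text.
FILTERED_WAVCAPS_SUBSET = {
    "audiocaps": "as_noac",
    "clotho": "freesound_no_clotho",
}


def wavcaps_subset_for(eval_dataset: str) -> str:
    """Return the WavCaps subset filtered against `eval_dataset` (hypothetical helper)."""
    key = eval_dataset.lower()
    if key not in FILTERED_WAVCAPS_SUBSET:
        raise ValueError(f"No filtered WavCaps subset known for {eval_dataset!r}")
    return FILTERED_WAVCAPS_SUBSET[key]
```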

Dataset overlaps
########################
Audio-text datasets are typically built from existing audio classification datasets or from the same websites, so they can overlap with each other and leak evaluation data into your training set.
Here is a list of known overlaps between different sound event datasets that you should be aware of:

.. list-table:: Known overlaps between datasets
   :header-rows: 1

   * - Dataset A
     - Dataset B
     - Proportion of A in B (%)
   * - AudioCaps
     - AudioSet-train
     - 100
   * - Clotho
     - FSD50K
     - 5.4
   * - AudioCaps
     - WavCaps
     - 17.6
   * - Clotho
     - WavCaps
     - 89.0

If you do not take these overlaps into account, you might overestimate the results of your AAC model.
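
One way to check for such overlaps is to compare source identifiers between datasets. The sketch below assumes that each item carries a comparable identifier (for example a YouTube or Freesound id); the function names and the toy ids are hypothetical and are not taken from any dataset.

```python
from typing import Iterable, List


def overlap_percent(ids_a: Iterable[str], ids_b: Iterable[str]) -> float:
    """Percentage of the unique ids of A that also appear in B."""
    set_a, set_b = set(ids_a), set(ids_b)
    if not set_a:
        return 0.0
    return 100.0 * len(set_a & set_b) / len(set_a)


def remove_overlaps(train_ids: Iterable[str], eval_ids: Iterable[str]) -> List[str]:
    """Keep only the training ids that never appear in the evaluation ids."""
    eval_set = set(eval_ids)
    return [item_id for item_id in train_ids if item_id not in eval_set]


# Toy identifiers, for illustration only.
eval_ids = ["b", "d", "e", "f"]
extra_train_ids = ["a", "b", "c", "d"]

leak = overlap_percent(eval_ids, extra_train_ids)
clean_train_ids = remove_overlaps(extra_train_ids, eval_ids)
```

Here `leak` is the share of the evaluation ids found in the extra training data, which corresponds to the "Proportion of A in B (%)" column with the evaluation set as dataset A.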
