Version 0.4.0

Labbeti · Sep 25, 2023 · a467d9c
1 parent 5213354
commit a467d9c

Showing 49 changed files with 114,155 additions and 1,790 deletions.
diff --git a/.github/workflows/python-package-pip.yaml b/.github/workflows/python-package-pip.yaml
@@ -8,6 +8,14 @@ on:
   pull_request:
     branches: [ main, dev ]
 
+env:
+  CACHE_NUMBER: 0  # increase to reset cache manually
+
+# Cancel workflow if a new push occurs
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true
+
 jobs:
   build:
     runs-on: ${{ matrix.os }}
@@ -23,21 +31,28 @@ jobs:
       uses: actions/checkout@v2
 
     - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v2
+      uses: actions/setup-python@v4
       with:
         python-version: ${{ matrix.python-version }}
         cache: 'pip'
 
     - name: Install package
+      shell: bash
+      # note: ${GITHUB_REF##*/} gives the branch name
+      # note 2: dev is NOT the branch here, but the dev dependencies
       run: |
-        python -m pip install -e .[dev]
+        python -m pip install "aac-datasets[dev] @ git+https://github.com/Labbeti/aac-datasets@${GITHUB_REF##*/}"
 
     - name: Install soundfile for torchaudio
       run: |
         # For soundfile dep
         sudo apt-get install libsndfile1
 
     # --- TESTS ---
+    - name: Compile python files
+      run: |
+        python -m compileall src
+
     - name: Lint with flake8
       run: |
         python -m flake8 --config .flake8 --exit-zero --show-source --statistics src
@@ -49,3 +64,7 @@ jobs:
     - name: Print install info
       run: |
         aac-datasets-info
+  
+    - name: Test with pytest
+      run: |
+        python -m pytest -v
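The new install step relies on the bash expansion `${GITHUB_REF##*/}`, which removes the longest prefix ending in `/`. The sketch below mimics that expansion in Python to show what it yields for a few ref shapes. The helper name `branch_from_ref` and the sample refs are illustrative assumptions, not part of the workflow.

```python
def branch_from_ref(ref: str) -> str:
    """Mimic bash ${ref##*/}: keep only what follows the last '/'."""
    return ref.rsplit("/", 1)[-1]


# Sample refs (assumed for illustration): plain branches, a branch whose
# name contains a slash, and the merge ref used by pull_request events.
SAMPLE_REFS = (
    "refs/heads/main",
    "refs/heads/dev",
    "refs/heads/feature/wavcaps",
    "refs/pull/42/merge",
)

if __name__ == "__main__":
    for ref in SAMPLE_REFS:
        print(f"{ref} -> {branch_from_ref(ref)}")
```

Two cases look worth checking: a branch name containing a slash is truncated to its last component, and on `pull_request` events the ref has the form `refs/pull/<number>/merge`, so the expansion gives `merge` rather than a branch name.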
diff --git a/.gitignore b/.gitignore
@@ -132,3 +132,5 @@ dmypy.json
 .vscode/
 
 examples/CLOTHO_v2.1
+core-python*
+core-srun*
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,19 @@
 
 All notable changes to this project will be documented in this file.
 
+## [0.4.0] 2023-09-25
+### Added
+- First experimental implementation of **WavCaps** dataset.
+- Subsets `dcase_t2a_audio` and `dcase_t2a_captions` from the DCASE Challenge task 6b, in Clotho dataset.
+- Subset `train_v2` for AudioCaps dataset.
+- Dataset cards as separate dataclasses for each dataset.
+- Get and set global user paths for root, ffmpeg and ytdl.
+- Base class for all datasets to simplify manipulation of loaded data.
+
+### Changed
+- Rename `test` subset to `dcase_aac_test`, `analysis` subset to `dcase_aac_analysis` from the DCASE Challenge task 6a, in Clotho dataset.
+- Function `get_install_info` now returns `package_path`.
+
 ## [0.3.3] 2023-05-11
 ### Added
 - Script check.py now checks whether the audio files exist.
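Code written against 0.3.x that requests the Clotho `test` or `analysis` subsets needs the new names from the "Changed" entry above. A minimal sketch of a compatibility shim follows. The mapping comes from the changelog, while the helper `migrate_clotho_subset` is hypothetical and not part of the package.

```python
from typing import Dict

# Renames listed in the 0.4.0 changelog for the Clotho dataset.
CLOTHO_SUBSET_RENAMES: Dict[str, str] = {
    "test": "dcase_aac_test",
    "analysis": "dcase_aac_analysis",
}


def migrate_clotho_subset(subset: str) -> str:
    """Return the 0.4.0 name for a Clotho subset (hypothetical helper).

    Names that were not renamed are returned unchanged.
    """
    return CLOTHO_SUBSET_RENAMES.get(subset, subset)


if __name__ == "__main__":
    for old_name in ("dev", "val", "eval", "test", "analysis"):
        print(f"{old_name} -> {migrate_clotho_subset(old_name)}")
    # With aac-datasets >= 0.4.0 installed, the result could be passed on:
    # from aac_datasets import Clotho
    # dataset = Clotho(root=".", subset=migrate_clotho_subset("test"))
```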

diff --git a/CITATION.cff b/CITATION.cff
@@ -22,5 +22,5 @@ keywords:
   - captioning
   - audio-captioning
 license: MIT
-version: 0.3.3
-date-released: '2023-05-11'
+version: 0.4.0
+date-released: '2023-09-25'
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -2,3 +2,5 @@
 recursive-include src *.py
 global-exclude *.pyc
 global-exclude __pycache__
+
+recursive-include data *.csv
diff --git a/README.md b/README.md
@@ -12,7 +12,7 @@
     <img src='https://readthedocs.org/projects/aac-datasets/badge/?version=stable&style=for-the-badge' alt='Documentation Status' />
 </a>
 
-Audio Captioning unofficial datasets source code for **AudioCaps** [[1]](#audiocaps), **Clotho** [[2]](#clotho), and **MACS** [[3]](#macs), designed for PyTorch.
+Audio Captioning unofficial datasets source code for **AudioCaps** [[1]](#audiocaps), **Clotho** [[2]](#clotho), **MACS** [[3]](#macs), and **WavCaps** [[4]](#wavcaps), designed for PyTorch.
 
 </div>
 
@@ -21,6 +21,11 @@ Audio Captioning unofficial datasets source code for **AudioCaps** [[1]](#audioc
 pip install aac-datasets
 ```
 
+To check that the package is installed and see its version, run:
+```bash
+aac-datasets-info
+```
+
 ## Examples
 
 ### Create Clotho dataset
@@ -54,16 +59,16 @@ for batch in dataloader:
 ## Datasets stats
 Here are the statistics for each dataset:
 
-| | AudioCaps | Clotho | MACS |
-|:---:|:---:|:---:|:---:|
-| Subsets | train, val, test | dev, val, eval, test, analysis | full |
-| Sample rate (Hz) | 32,000 | 44,100 | 48,000 |
-| Estimated size (GB) | 43 | 27 | 13 |
-| Audio source | AudioSet (YouTube) | FreeSound | TAU Urban Acoustic Scenes 2019 |
+| | AudioCaps | Clotho | MACS | WavCaps |
+|:---:|:---:|:---:|:---:|:---:|
+| Subsets | train, val, test | dev, val, eval, dcase_aac_test, dcase_aac_analysis, dcase_t2a_audio, dcase_t2a_captions | full | as, as_noac, bbc, fsd, fsd_nocl, sb |
+| Sample rate (kHz) | 32 | 44.1 | 48 | 32 |
+| Estimated size (GB) | 43 | 53 | 13 | 941 |
+| Audio source | AudioSet | FreeSound | TAU Urban Acoustic Scenes 2019 | AudioSet, BBC Sound Effects, FreeSound, SoundBible |
 
-For Clotho, the dev subset should be used for training, val for validation and eval for testing. The test and analysis subsets contains only audio files without labels from the DCASE challenge.
+For Clotho, the dev subset should be used for training, val for validation and eval for testing.
 
-Here is the **train** subset statistics for each dataset :
+Here are the **train** subset statistics for the AudioCaps, Clotho and MACS datasets:
 
 | | AudioCaps/train | Clotho/dev | MACS/full |
 |:---:|:---:|:---:|:---:|
@@ -80,41 +85,41 @@ Here is the **train** subset statistics for each dataset :
 <sup>2</sup> The sentences are cleaned (lowercase+remove punctuation) and tokenized using the spacy tokenizer to count the words.
 
 ## Requirements
+
+This package has been developed for Ubuntu 20.04, and it is expected to work on most Linux distributions.
 ### Python packages
 
-The requirements are automatically installed when using pip on this repository.
+Python requirements are automatically installed when using pip on this repository.
 ```
 torch >= 1.10.1
 torchaudio >= 0.10.1
 py7zr >= 0.17.2
 pyyaml >= 6.0
 tqdm >= 4.64.0
+huggingface-hub >= 0.15.1
+numpy >= 1.21.2
 ```
 
 ### External requirements (AudioCaps only)
 
-The external requirements needed to download **AudioCaps** are **ffmpeg** and **youtube-dl**.
+The external requirements needed to download **AudioCaps** are **ffmpeg** and **youtube-dl** (yt-dlp should work too).
 These two programs can be installed on Ubuntu using `sudo apt install ffmpeg youtube-dl`.
 
 You can also override their paths for AudioCaps:
 ```python
 from aac_datasets import AudioCaps
-AudioCaps.FFMPEG_PATH = "/my/path/to/ffmpeg"
-AudioCaps.YOUTUBE_DL_PATH = "/my/path/to/youtube_dl"
-dataset = AudioCaps(root=".", download=True)
+dataset = AudioCaps(
+    download=True,
+    ffmpeg_path="/my/path/to/ffmpeg",
+    ytdl_path="/my/path/to/youtube_dl",
+)
 ```
 
 ## Download datasets
 To download a dataset, you can use `download` argument in dataset construction :
 ```python
 dataset = Clotho(root=".", subset="dev", download=True)
 ```
-Or use the corresponding function in the code :
-```python
-from aac_datasets.download import download_clotho
-
-download_clotho(root=".", subsets=["dev"])
-```
 However, if you want to download datasets from a script, you can also use the following command :
 ```bash
 aac-datasets-download --root "." clotho --subsets "dev"
@@ -130,18 +135,21 @@ aac-datasets-download --root "." clotho --subsets "dev"
 #### MACS
 [3] F. Font, A. Mesaros, D. P. W. Ellis, E. Fonseca, M. Fuentes, and B. Elizalde, Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2021). Barcelona, Spain: Music Technology Group - Universitat Pompeu Fabra, Nov. 2021. Available: https://doi.org/10.5281/zenodo.5770113
 
+#### WavCaps
+[4] X. Mei et al., “WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research,” arXiv preprint arXiv:2303.17395, 2023. Available: https://arxiv.org/pdf/2303.17395.pdf
+
 ## Cite the aac-datasets package
 If you use this software, please consider citing it as below:
 
 ```
 @software{
-    Labbe_aac-datasets_2022,
+    Labbe_aac_datasets_2022,
     author = {Labbé, Etienne},
     license = {MIT},
-    month = {05},
+    month = {09},
     title = {{aac-datasets}},
     url = {https://github.com/Labbeti/aac-datasets/},
-    version = {0.3.3},
+    version = {0.4.0},
     year = {2023}
 }
 ```
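The README hunk above replaces the `AudioCaps.FFMPEG_PATH` and `AudioCaps.YOUTUBE_DL_PATH` class attributes with the constructor arguments `ffmpeg_path` and `ytdl_path`. The sketch below shows one way to locate the two tools on `PATH` before building those arguments. The helper names are hypothetical, and the `yt-dlp` fallback rests on the README's remark that it "should work too". The `AudioCaps` call itself is left as a comment so the block runs without the package installed.

```python
import shutil
from typing import Dict, Optional, Sequence


def find_executable(candidates: Sequence[str]) -> Optional[str]:
    """Return the path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


def build_audiocaps_kwargs() -> Dict[str, object]:
    """Collect keyword arguments for AudioCaps (hypothetical helper).

    Tools that are not found are left out, so the package's own default
    lookup applies for them.
    """
    kwargs: Dict[str, object] = {"download": True}
    ffmpeg = find_executable(["ffmpeg"])
    ytdl = find_executable(["youtube-dl", "yt-dlp"])
    if ffmpeg is not None:
        kwargs["ffmpeg_path"] = ffmpeg
    if ytdl is not None:
        kwargs["ytdl_path"] = ytdl
    return kwargs


if __name__ == "__main__":
    kwargs = build_audiocaps_kwargs()
    print(kwargs)
    # With aac-datasets >= 0.4.0 installed:
    # from aac_datasets import AudioCaps
    # dataset = AudioCaps(**kwargs)
```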