Version 0.4.0
Labbeti committed Sep 25, 2023
1 parent 5213354 commit a467d9c
Showing 49 changed files with 114,155 additions and 1,790 deletions.
23 changes: 21 additions & 2 deletions .github/workflows/python-package-pip.yaml
@@ -8,6 +8,14 @@ on:
pull_request:
branches: [ main, dev ]

env:
CACHE_NUMBER: 0 # increase to reset cache manually

# Cancel workflow if a new push occurs
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true

jobs:
build:
runs-on: ${{ matrix.os }}
@@ -23,21 +31,28 @@ jobs:
uses: actions/checkout@v2

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'

- name: Install package
shell: bash
# note: ${GITHUB_REF##*/} gives the branch name
# note 2: dev is NOT the branch here, but the dev dependencies
run: |
python -m pip install -e .[dev]
python -m pip install "aac-datasets[dev] @ git+https://github.com/Labbeti/aac-datasets@${GITHUB_REF##*/}"
- name: Install soundfile for torchaudio
run: |
# For soundfile dep
sudo apt-get install libsndfile1
# --- TESTS ---
- name: Compile python files
run: |
python -m compileall src
- name: Lint with flake8
run: |
python -m flake8 --config .flake8 --exit-zero --show-source --statistics src
@@ -49,3 +64,7 @@ jobs:
- name: Print install info
run: |
aac-datasets-info
- name: Test with pytest
run: |
python -m pytest -v
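
The note in the install step relies on the shell expansion `${GITHUB_REF##*/}`, which drops everything up to and including the last `/`. The Python sketch below mimics that expansion to show what the install step would receive for a few refs. The helper name `branch_from_ref` is hypothetical and is not part of the workflow. Two caveats follow from it: a branch name that itself contains a slash is truncated, and on `pull_request` events, where GitHub sets the ref to `refs/pull/<number>/merge`, the result is `merge` rather than a branch name.

```python
def branch_from_ref(github_ref: str) -> str:
    """Mimic the shell expansion ${GITHUB_REF##*/}.

    The expansion removes the longest prefix ending in '/', so only the
    text after the last slash is kept. A ref without any slash is
    returned unchanged.
    """
    return github_ref.rsplit("/", 1)[-1]


# Illustrative refs (hypothetical values, not taken from the workflow runs).
for ref in ("refs/heads/main", "refs/heads/feature/wavcaps", "refs/pull/12/merge"):
    print(f"{ref} -> {branch_from_ref(ref)}")
```

If the workflow must also install correctly on pull requests, a ref source that carries the real branch name may be a safer input than the last path component of `GITHUB_REF`.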
2 changes: 2 additions & 0 deletions .gitignore
@@ -132,3 +132,5 @@ dmypy.json
.vscode/

examples/CLOTHO_v2.1
core-python*
core-srun*
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,19 @@

All notable changes to this project will be documented in this file.

## [0.4.0] 2023-09-25
### Added
- First experimental implementation of **WavCaps** dataset.
- Subsets `dcase_t2a_audio` and `dcase_t2a_captions` from the DCASE Challenge task 6b, in Clotho dataset.
- Subset `train_v2` for AudioCaps dataset.
- Dataset cards as separate dataclasses for each dataset.
- Get and set global user paths for root, ffmpeg and ytdl.
- Base class for all datasets to simplify manipulation of loaded data.

### Changed
- Rename `test` subset to `dcase_aac_test`, `analysis` subset to `dcase_aac_analysis` from the DCASE Challenge task 6a, in Clotho dataset.
- Function `get_install_info` now returns `package_path`.

## [0.3.3] 2023-05-11
### Added
- Script check.py now checks if the audio files exist.
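
The 0.4.0 "Changed" entry renames two Clotho subsets, so code written against 0.3.x that passes `test` or `analysis` would need updating. A minimal sketch of a migration helper follows. The mapping is copied from the changelog entry, while the names `CLOTHO_SUBSET_RENAMES` and `migrate_clotho_subset` are hypothetical and do not exist in the package.

```python
# Old -> new Clotho subset names, as listed in the 0.4.0 changelog entry.
CLOTHO_SUBSET_RENAMES = {
    "test": "dcase_aac_test",
    "analysis": "dcase_aac_analysis",
}


def migrate_clotho_subset(subset: str) -> str:
    """Return the 0.4.0 name for a Clotho subset.

    Names that were not renamed (for example "dev", "val" or "eval")
    are returned unchanged.
    """
    return CLOTHO_SUBSET_RENAMES.get(subset, subset)


for old_name in ("dev", "test", "analysis"):
    print(f"{old_name} -> {migrate_clotho_subset(old_name)}")
```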
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -22,5 +22,5 @@ keywords:
- captioning
- audio-captioning
license: MIT
version: 0.3.3
date-released: '2023-05-11'
version: 0.4.0
date-released: '2023-09-25'
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -2,3 +2,5 @@
recursive-include src *.py
global-exclude *.pyc
global-exclude __pycache__

recursive-include data *.csv
54 changes: 31 additions & 23 deletions README.md
@@ -12,7 +12,7 @@
<img src='https://readthedocs.org/projects/aac-datasets/badge/?version=stable&style=for-the-badge' alt='Documentation Status' />
</a>

Audio Captioning unofficial datasets source code for **AudioCaps** [[1]](#audiocaps), **Clotho** [[2]](#clotho), and **MACS** [[3]](#macs), designed for PyTorch.
Audio Captioning unofficial datasets source code for **AudioCaps** [[1]](#audiocaps), **Clotho** [[2]](#clotho), **MACS** [[3]](#macs), and **WavCaps** [[4]](#wavcaps), designed for PyTorch.

</div>

@@ -21,6 +21,11 @@ Audio Captioning unofficial datasets source code for **AudioCaps** [[1]](#audioc
pip install aac-datasets
```

To check that the package is installed and see its version, you can use this command:
```bash
aac-datasets-info
```
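
The same check can be scripted, for example at the start of a training job. The sketch below only assumes that the `aac-datasets-info` command shown above is on `PATH` once the package is installed. The helper name `run_install_info` is hypothetical, and the format of the command's output is not assumed.

```python
import shutil
import subprocess
from typing import Optional


def run_install_info(command: str = "aac-datasets-info") -> Optional[str]:
    """Run the info command and return its standard output.

    Returns None when the command cannot be found on PATH, which usually
    means the package is not installed in the active environment.
    """
    executable = shutil.which(command)
    if executable is None:
        return None
    result = subprocess.run([executable], capture_output=True, text=True, check=False)
    return result.stdout


info = run_install_info()
if info is None:
    print("aac-datasets-info not found; is the package installed?")
else:
    print(info)
```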

## Examples

### Create Clotho dataset
@@ -54,16 +59,16 @@ for batch in dataloader:
## Datasets stats
Here are the statistics for each dataset:

| | AudioCaps | Clotho | MACS |
|:---:|:---:|:---:|:---:|
| Subsets | train, val, test | dev, val, eval, test, analysis | full |
| Sample rate (Hz) | 32,000 | 44,100 | 48,000 |
| Estimated size (GB) | 43 | 27 | 13 |
| Audio source | AudioSet (YouTube) | FreeSound | TAU Urban Acoustic Scenes 2019 |
| | AudioCaps | Clotho | MACS | WavCaps |
|:---:|:---:|:---:|:---:|:---:|
| Subsets | train, val, test | dev, val, eval, dcase_aac_test, dcase_aac_analysis, dcase_t2a_audio, dcase_t2a_captions | full | as, as_noac, bbc, fsd, fsd_nocl, sb |
| Sample rate (kHz) | 32 | 44.1 | 48 | 32 |
| Estimated size (GB) | 43 | 53 | 13 | 941 |
| Audio source | AudioSet | FreeSound | TAU Urban Acoustic Scenes 2019 | AudioSet, BBC Sound Effects, FreeSound, SoundBible |

For Clotho, the dev subset should be used for training, val for validation and eval for testing. The test and analysis subsets contains only audio files without labels from the DCASE challenge.
For Clotho, the dev subset should be used for training, val for validation and eval for testing.
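
With WavCaps listed, the estimated sizes in the updated table add up to roughly 1 TB, so it may be worth checking free disk space before downloading. The sketch below copies the sizes from that table. The names `ESTIMATED_SIZE_GB`, `required_gb` and `has_room_for` are hypothetical, and treating 1 GB as 10^9 bytes is an assumption, since the table does not state the unit convention.

```python
import shutil
from typing import Iterable

# Estimated sizes in GB, copied from the updated dataset table.
ESTIMATED_SIZE_GB = {"AudioCaps": 43, "Clotho": 53, "MACS": 13, "WavCaps": 941}


def required_gb(datasets: Iterable[str]) -> int:
    """Sum the estimated sizes of the requested datasets."""
    return sum(ESTIMATED_SIZE_GB[name] for name in datasets)


def has_room_for(datasets: Iterable[str], root: str = ".") -> bool:
    """Check whether the filesystem holding `root` has enough free space.

    Assumes 1 GB = 10**9 bytes; the estimates are approximate, so a
    safety margin on top of this check is advisable.
    """
    free_gb = shutil.disk_usage(root).free / 1e9
    return free_gb >= required_gb(datasets)


print("All four datasets need about", required_gb(ESTIMATED_SIZE_GB), "GB")
print("Enough room for Clotho here:", has_room_for(["Clotho"]))
```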

Here are the **train** subset statistics for each dataset:
Here are the **train** subset statistics for the AudioCaps, Clotho and MACS datasets:

| | AudioCaps/train | Clotho/dev | MACS/full |
|:---:|:---:|:---:|:---:|
@@ -80,41 +85,41 @@ Here is the **train** subset statistics for each dataset :
<sup>2</sup> The sentences are cleaned (lowercase+remove punctuation) and tokenized using the spacy tokenizer to count the words.

## Requirements

This package has been developed for Ubuntu 20.04, and it is expected to work on most Linux distributions.
### Python packages

The requirements are automatically installed when using pip on this repository.
Python requirements are automatically installed when using pip on this repository.
```
torch >= 1.10.1
torchaudio >= 0.10.1
py7zr >= 0.17.2
pyyaml >= 6.0
tqdm >= 4.64.0
huggingface-hub >= 0.15.1
numpy >= 1.21.2
```

### External requirements (AudioCaps only)

The external requirements needed to download **AudioCaps** are **ffmpeg** and **youtube-dl**.
The external requirements needed to download **AudioCaps** are **ffmpeg** and **youtube-dl** (yt-dlp should work too).
These two programs can be installed on Ubuntu using `sudo apt install ffmpeg youtube-dl`.

You can also override their paths for AudioCaps:
```python
from aac_datasets import AudioCaps
AudioCaps.FFMPEG_PATH = "/my/path/to/ffmpeg"
AudioCaps.YOUTUBE_DL_PATH = "/my/path/to/youtube_dl"
dataset = AudioCaps(root=".", download=True)
dataset = AudioCaps(
download=True,
ffmpeg_path="/my/path/to/ffmpeg",
ytdl_path="/my/path/to/youtube_dl",
)
```
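
Rather than hard-coding the paths, the executables can be located on `PATH` first. The sketch below uses only the constructor arguments shown above (`download`, `ffmpeg_path`, `ytdl_path`). The helpers `find_executable` and `build_audiocaps` are hypothetical, and falling back to `yt-dlp` relies on the README's remark that it should work too.

```python
import shutil
from typing import Iterable, Optional


def find_executable(candidates: Iterable[str]) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


ffmpeg_path = find_executable(["ffmpeg"])
# Prefer youtube-dl, then fall back to yt-dlp if only that one is installed.
ytdl_path = find_executable(["youtube-dl", "yt-dlp"])


def build_audiocaps():
    """Create the AudioCaps dataset with the discovered tool paths."""
    # Imported lazily so the path lookup above works without the package.
    from aac_datasets import AudioCaps

    if ffmpeg_path is None or ytdl_path is None:
        raise RuntimeError("ffmpeg and youtube-dl (or yt-dlp) must be installed first")
    return AudioCaps(download=True, ffmpeg_path=ffmpeg_path, ytdl_path=ytdl_path)


print("ffmpeg:", ffmpeg_path)
print("youtube-dl / yt-dlp:", ytdl_path)
```

Calling `build_audiocaps()` starts the download, so it is left out of the module-level code on purpose.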

## Download datasets
To download a dataset, you can use the `download` argument when constructing the dataset:
```python
dataset = Clotho(root=".", subset="dev", download=True)
```
Or use the corresponding function in the code :
```python
from aac_datasets.download import download_clotho

download_clotho(root=".", subsets=["dev"])
```
However, if you want to download datasets from a script, you can also use the following command :
```bash
aac-datasets-download --root "." clotho --subsets "dev"
Expand All @@ -130,18 +135,21 @@ aac-datasets-download --root "." clotho --subsets "dev"
#### MACS
[3] F. Font, A. Mesaros, D. P. W. Ellis, E. Fonseca, M. Fuentes, and B. Elizalde, Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2021). Barcelona, Spain: Music Technology Group - Universitat Pompeu Fabra, Nov. 2021. Available: https://doi.org/10.5281/zenodo.5770113

#### WavCaps
[4] X. Mei et al., “WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research,” arXiv preprint arXiv:2303.17395, 2023, [Online]. Available: https://arxiv.org/pdf/2303.17395.pdf

## Cite the aac-datasets package
If you use this software, please consider citing it as below:

```
@software{
Labbe_aac-datasets_2022,
Labbe_aac_datasets_2022,
author = {Labbé, Etienne},
license = {MIT},
month = {05},
month = {09},
title = {{aac-datasets}},
url = {https://github.com/Labbeti/aac-datasets/},
version = {0.3.3},
version = {0.4.0},
year = {2023}
}
```