FSD50K Speech Model Fine-tuning Tutorial #201

Closed
Changes shown from 6 commits (24 commits total):
- 3488fb2: Upload whole project. (FlorentMeyer, Oct 22, 2022)
- 92410db: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Oct 22, 2022)
- 1e3fb82: Clean Colab notebook (FlorentMeyer, Oct 26, 2022)
- f8bce98: Remove useless brackets, add gdown requirement (FlorentMeyer, Oct 26, 2022)
- f62fea4: Merge branch 'fsd50K_speech_model_finetuning' of github.com:FlorentMe… (FlorentMeyer, Oct 26, 2022)
- b5e9f1e: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Oct 26, 2022)
- 5482d08: Apply suggestions from code review (Borda, Nov 4, 2022)
- ee62851: Remove torch.cuda.device_count check (FlorentMeyer, Nov 5, 2022)
- 78a9672: Replace sklearn.metrics with torchmetrics (FlorentMeyer, Nov 7, 2022)
- 344c7ab: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Nov 7, 2022)
- ee68dfc: Remove cells associated with torch.cuda.device_count check (FlorentMeyer, Nov 7, 2022)
- 5a0b5a0: Replace average_precision with multilabel_average_precision (FlorentMeyer, Nov 7, 2022)
- e63873c: Merge branch 'fsd50K_speech_model_finetuning' of github.com:FlorentMe… (FlorentMeyer, Nov 7, 2022)
- cce4022: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Nov 7, 2022)
- a6b0791: Merge branch 'main' into fsd50K_speech_model_finetuning (Borda, Jan 1, 2023)
- b79986f: Apply suggestions from code review (Borda, Jan 1, 2023)
- 223d4ce: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Jan 1, 2023)
- c8f74fc: Merge branch 'main' into fsd50K_speech_model_finetuning (Borda, Jan 4, 2023)
- 9794cfc: Merge branch 'main' into fsd50K_speech_model_finetuning (Borda, Aug 14, 2023)
- 1577e9e: Merge branch 'main' into fsd50K_speech_model_finetuning (Borda, Jul 23, 2024)
- 22000ef: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Jul 23, 2024)
- 7aab773: Merge branch 'main' into fsd50K_speech_model_finetuning (Borda, Jul 26, 2024)
- 3f5e630: requirements (Borda, Jul 26, 2024)
- df23aa3: [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Jul 26, 2024)
lightning_examples/fsd50K-speech-model-finetuning/.meta.yaml (21 additions, 0 deletions)
@@ -0,0 +1,21 @@
title: FSD50K Speech Model Fine-tuning Tutorial
author: Florent Meyer ([email protected])
created: 2022-10-22
license: CC BY-SA
description: |
  This notebook will walk you through the fine-tuning
  of a Transformer-based speech embedder (e.g. wav2vec 2.0 [https://arxiv.org/abs/2006.11477])
  on a 500-element subset of the original FSD50K dataset [https://zenodo.org/record/4060432],
  using the following repository: [https://github.com/FlorentMeyer/fsd50k_speech_model_finetuning].
  The FSD50K dataset is licensed under CC BY 4.0,
  and the subset used here only includes CC0 1.0 audio samples.
requirements:
  - git+https://github.com/FlorentMeyer/fsd50k_speech_model_finetuning
  - gdown
  - numpy
  - pandas
  - scikit-learn
  - torchaudio
  - transformers
accelerator:
  - GPU
@@ -0,0 +1,271 @@
# ---
# jupyter:
# jupytext:
# formats: ipynb,py:percent
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.14.1
# kernelspec:
# display_name: Python 3
# name: python3
# ---
Borda marked this conversation as resolved.

# %% [markdown] id="CI0JECKA9AnY"
Borda (Member) commented: let's remove all the IDs.

# # README
#
# A minimal working example (MWE) of fine-tuning a Transformer-based speech embedder (e.g. [wav2vec 2.0](https://arxiv.org/abs/2006.11477)) on a subset of FSD50K using `pytorch_lightning` and HuggingFace `transformers`.
#
# Please refer to this executable [Colab notebook](https://colab.research.google.com/drive/1NddRCV1BtwgK6tvnylkLHY8d7t4OhAEw?usp=sharing), which imports the code from this repo as well as a 500-element subset of the original FSD50K dataset, for a concrete train+test example.
#
# Note: this is intended as an editable starting point for jumping into FSD50K and the PyTorch Lightning + HuggingFace framework, and as a showcase for an end-of-studies project; choices have been made and some logic has been altered to (greatly) reduce the size of the original code.
#
Borda marked this conversation as resolved.
# Attribution and licenses:
# - [The FSD50K dataset](https://zenodo.org/record/4060432) is licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
# - The 500-element subset used here only includes [CC0 1.0](http://creativecommons.org/publicdomain/zero/1.0/) audio samples

# %% [markdown] id="Y3ptRruWL2OP"
# # Check GPU availability

# %% id="L-wt4ld74fjq"
import torch

# %% id="iLzcZH-DL3Sz"
if torch.cuda.device_count() < 1:
    raise ValueError("Please run this notebook inside a GPU environment.")
FlorentMeyer marked this conversation as resolved.

# %% [markdown] id="NxjeSJxGw_9S"
# # Init

# %% id="e1LjqtE67h6a"
# !pip install pytorch_lightning
# !pip install transformers

# %% id="XzeAP33RilCh"
# !pip install git+https://github.com/FlorentMeyer/fsd50k_speech_model_finetuning

# %% id="kjazzR__WBGh"
import os
import os.path as osp

import pytorch_lightning as pl
from fsd50k_speech_model_finetuning.data_preparation_inspection import (
    CollatorVariableLengths,
    FSD50KDataDict,
    FSD50KDataModule,
    gather_preds,
    get_preds_fpaths,
    get_preds_max_logits_indices,
    inspect_data,
    sort_highest_logits,
    tokens_to_names,
)
from fsd50k_speech_model_finetuning.model_architecture import (
    Classifier,
    EmbedderClassifier,
    EmbedderHF,
    EmbeddingsMerger,
    Unfreeze,
)
from sklearn.metrics import average_precision_score
from torch import nn
from torch.optim import Adam

# %% id="vz8PAflMXFam"
from transformers import Wav2Vec2Model, logging

logging.set_verbosity_error()

# %% [markdown] id="_3-Aacr3KFOv"
# # Configure whole pipeline

# %% [markdown] id="RaTWmuW1WAGt"
# ## Check embedder layers names to unfreeze

# %% [markdown] id="z_XRcQmFWX47"
# (The chosen layers go in `FULL_CONFIG["unfreeze"]`, along with the epoch at which to unfreeze them.)

# %% id="Wf6FpgQiKc_W"
wav2vec2 = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
for n, _ in wav2vec2.named_parameters():
    print(n)
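
# %% [markdown]
# A small convenience sketch (not from the companion repo; it assumes the HuggingFace
# wav2vec2-base parameter naming seen in the printout above, e.g.
# `encoder.layers.11.attention.k_proj.weight`): collect the distinct
# `encoder.layers.<i>` prefixes, from which one can be picked for `FULL_CONFIG["unfreeze"]`.

# %%
layer_prefixes = sorted(
    {".".join(n.split(".")[:3]) for n, _ in wav2vec2.named_parameters() if n.startswith("encoder.layers.")},
    key=lambda p: int(p.split(".")[-1]),
)
print(layer_prefixes)  # expected: ['encoder.layers.0', ..., 'encoder.layers.11']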

# %% [markdown] id="aU9jI2MkV6Ay"
# ## Define the configuration dict

# %% id="VMpkjV0lKHxA"
LOG_INTERVAL_SAMPLES = 10

FULL_CONFIG = {
    "seed": 42,
    "datamodule_config": {
        "datadict_prm": {
            "dpath_data": osp.join(os.getcwd(), "dataset"),
        },
        "batch_size": 4,
        "collate_cls": CollatorVariableLengths,
        "shuffle": True,
        "drop_last": True,
        "dataset_prm": {
            "orig_sr": 44_100,  # FSD50K's native sampling rate (Hz)
            "goal_sr": 16_000,  # sampling rate expected by wav2vec 2.0 (Hz)
        },
        "pin_memory": True,
        "num_workers": 10,
    },
    "model_config": {
        "embedder_cls": EmbedderHF,
        "embedder_prm": {
            "model_name": Wav2Vec2Model,
            "hubpath_weights": "facebook/wav2vec2-base-960h",
        },
        "embeddings_merger_cls": EmbeddingsMerger,
        "embeddings_merger_prm": {
            "red_T": "mean",
            "red_L": "mean",
            "Ls": list(range(12)),  # merge embeddings from all 12 encoder layers
        },
        "classifier_cls": Classifier,
        "classifier_prm": {
            "in_size": 768,
            "activation": nn.ReLU,
            "hidden_size": 512,
            "normalization": nn.BatchNorm1d,
        },
        "loss_cls": nn.BCEWithLogitsLoss,
        "loss_prm": {},
        "optimizer_cls": Adam,
        "optimizer_prm": {
            "lr": 1e-5,
        },
        # Layer-name prefix -> epoch at which to unfreeze it
        "unfreeze": {
            "encoder.layers.11": 1,
        },
    },
    "trainer_config": {
        "max_epochs": 3,
        "auto_select_gpus": True,
        "accelerator": "gpu",
        "devices": 1,
        "check_val_every_n_epoch": 1,
        "precision": 16,
        "callbacks": [
            pl.callbacks.ModelCheckpoint(
                filename="{epoch}-{val_loss:.5f}",
                save_top_k=-1,
                monitor="val_loss",
                mode="min",
            ),
            Unfreeze(),
        ],
        "logger": pl.loggers.TensorBoardLogger(
            save_dir="tb_logs",
        ),
    },
}
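
# %% [markdown]
# `Unfreeze` is implemented in the companion repo. As a rough mental model only, a
# minimal epoch-based unfreezing callback could look like the sketch below; the
# `unfreeze_config` attribute is an assumption mirroring
# `FULL_CONFIG["model_config"]["unfreeze"]`, not the repo's actual interface.

# %%
class UnfreezeSketch(pl.Callback):
    """Hypothetical illustration: re-enable gradients for parameters whose
    names contain a configured prefix once the configured epoch is reached."""

    def on_train_epoch_start(self, trainer, pl_module):
        # e.g. {"encoder.layers.11": 1} -> unfreeze layer 11 from epoch 1 onwards
        for prefix, epoch in getattr(pl_module, "unfreeze_config", {}).items():
            if trainer.current_epoch >= epoch:
                for name, param in pl_module.named_parameters():
                    if prefix in name:
                        param.requires_grad = True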

# %% [markdown] id="dUGV1hdbVsUV"
# ## Set seed and add params deduced from user's configuration

# %% id="uNNvu1Jd5kIH"
pl.utilities.seed.seed_everything(FULL_CONFIG["seed"])
Borda marked this conversation as resolved.
FULL_CONFIG["trainer_config"]["log_every_n_steps"] = (
    # e.g. 10 samples // batch size 4 -> log every 2 steps
    LOG_INTERVAL_SAMPLES // FULL_CONFIG["datamodule_config"]["batch_size"]
)

# %% [markdown] id="daXR-HHz4ze7"
# # Prepare (reduced) dataset

# %% [markdown] id="KZ9vf9o4UWEg"
# ## Download and unzip

# %% id="pqQlayizo8YX"
# !gdown 1cOcOEK56p6k2RNbM-10QFHOD4jenqHym -O /content/
# !unzip './FSD50K_500.zip' -d './dataset'
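
# %% [markdown]
# The datamodule resamples audio internally according to the `orig_sr`/`goal_sr`
# pair in the configuration. For intuition only, a standalone resampling step from
# FSD50K's 44.1 kHz to the 16 kHz expected by wav2vec 2.0 can be sketched with
# torchaudio (synthetic waveform, not part of the pipeline):

# %%
import torchaudio

resampler = torchaudio.transforms.Resample(orig_freq=44_100, new_freq=16_000)
fake_second = torch.randn(1, 44_100)  # one second of random "audio" at 44.1 kHz
print(resampler(fake_second).shape)  # torch.Size([1, 16000])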

# %% [markdown] id="W-vcoLY1RqxH"
# ## Inspect data

# %% id="S89t3R0zi0nj"
fsd50k_datadict = FSD50KDataDict(**FULL_CONFIG["datamodule_config"]["datadict_prm"])

# %% id="F1BlO17ViNew"
train_datadict = fsd50k_datadict.get_dict("train")

# %% [markdown] id="N4SIRhKjgqw8"
# Add a dict entry containing the labels as strings for inspection.

# %% id="LiuSKBUThw1O"
train_datadict["ys_true_names"] = tokens_to_names(train_datadict["ys_true"], fsd50k_datadict.token_to_name)
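
# %% [markdown]
# For intuition, `tokens_to_names` maps integer label tokens back to class names,
# roughly as in this toy example (made-up mapping and targets, not the repo's data):

# %%
toy_token_to_name = {0: "Bark", 1: "Meow", 2: "Applause"}
toy_ys_true = [[0], [0, 2]]  # multi-label targets as token lists
print([[toy_token_to_name[t] for t in ys] for ys in toy_ys_true])  # [['Bark'], ['Bark', 'Applause']]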

# %% id="UwDU12Ewj-fA"
inspect_data(
    datadict=train_datadict,
    show_keys=["paths", "ys_true_names"],
    samples_indices=range(5),
)

# %% [markdown] id="wVe_yrk54Axz"
# # Launch fine-tuning

# %% id="BP7HBa7C87IY"
# Inside an interactive environment, training logs can be monitored with TensorBoard:
# # %load_ext tensorboard
# # %tensorboard --logdir ./tb_logs

# %% id="dZl2TdlTxUkF"
model = EmbedderClassifier(**FULL_CONFIG["model_config"])

datamodule = FSD50KDataModule(**FULL_CONFIG["datamodule_config"])

trainer = pl.Trainer(**FULL_CONFIG["trainer_config"])

trainer.fit(model, datamodule=datamodule)

# %% [markdown] id="jzVXKW-L4E2R"
# # Evaluate performance

# %% [markdown] id="bPbyVevnZXTX"
# ## Predict on test data

# %% id="6VTZHQDw2y5p"
preds = trainer.predict(ckpt_path="best", datamodule=datamodule)

# %% id="FM0fc3UbSp-2"
preds = gather_preds(preds)
rohitgr7 marked this conversation as resolved.

# %% [markdown] id="cD556ynvAbYa"
# ## Compute metrics

# %% id="zlTooqqp8FWk"
mAP_micro = average_precision_score(
    preds["ys_true"],
    preds["logits"],
    average="micro",
)

Review thread on this cell:

FlorentMeyer (Author): At the time I wrote the code, torchmetrics.functional.average_precision's target took "integer labels", therefore not accepting multi-hot labels. Just let me check whether this was fixed and whether I get the same results as with scikit-learn!

FlorentMeyer (Author): OK, the new implementation of multilabel_average_precision gives the same results as scikit-learn.

Contributor: let's use that :)

Contributor: Also, it looks like you have the true values for the preds; I'd recommend using test_step instead to show the metrics.

FlorentMeyer (Author): What do you mean? Something like this? With on_step=False, on_epoch=True to only log at the end of the epoch, according to https://pytorch-lightning.readthedocs.io/en/stable/extensions/logging.html#logging-from-a-lightningmodule ("The above config for validation applies for test hooks as well."):

Suggested change:
mAP_micro = average_precision_score(
# In __init__:
self.mAP = torchmetrics.classification.MultilabelAveragePrecision()
# In test_step:
self.mAP(preds, y)
self.log('mAP', self.mAP)

FlorentMeyer (Author): Then the class version of torchmetrics should be preferred to the functional one, I'd say?

Contributor: Okay, it's fine; let's use functional metrics here, since you already have all the targets and predictions. Modular metrics are useful when you are aggregating the metrics, say at step level.

# %% id="D7dpSPpf9uKS"
print("mAP_micro:", mAP_micro)
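
# %% [markdown]
# As a follow-up to the review thread above, a self-contained sanity check
# (synthetic multi-hot targets and logits, assumed shapes) comparing the
# scikit-learn metric with torchmetrics' functional `multilabel_average_precision`:

# %%
from torchmetrics.functional.classification import multilabel_average_precision

toy_ys_true = torch.tensor([[1, 0, 1], [0, 1, 0], [1, 1, 0]])  # multi-hot targets
toy_logits = torch.tensor([[2.3, -1.1, 0.4], [-0.5, 1.7, -2.0], [0.9, 0.2, -0.7]])

sk_map = average_precision_score(toy_ys_true.numpy(), toy_logits.numpy(), average="micro")
tm_map = multilabel_average_precision(toy_logits, toy_ys_true, num_labels=3, average="micro")
print(sk_map, tm_map.item())  # the two values should agree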

# %% [markdown] id="oqwX4z1RT8Uw"
# ## Explore samples with highest prediction scores

# %% [markdown] id="SYH5XclifZpJ"
# Retrieve audio file paths from their IDs.
#
# For each sample on which a prediction was made, rank the highest-confidence logits (for example the first 4, hence the `"logits_4_highest"` key) together with their corresponding class names.
#
# To choose a few samples among the 100 in the test set, rank them according to the highest logit they contain, so as to inspect the audio clips for which the model seemed most confident; alternatively, one could pick a handful of samples at random. (A toy sketch of this ranking follows the next cell.)

Borda marked this conversation as resolved.
# %% id="N3Dk23p_O-38"
preds = get_preds_fpaths(preds)
preds = sort_highest_logits(preds, fsd50k_datadict.token_to_name, num_classes=4)
preds_max_logits_indices = get_preds_max_logits_indices(preds)
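
# %% [markdown]
# For intuition, ranking samples by the single highest logit they contain (as
# `get_preds_max_logits_indices` is described to do) can be sketched with toy
# values (not the repo's implementation):

# %%
toy_logits = torch.tensor([[0.2, 3.1], [1.0, 0.5], [2.2, 2.5]])
order = torch.argsort(toy_logits.max(dim=1).values, descending=True)
print(order)  # tensor([0, 2, 1]): most "confident" samples first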

# %% id="Vnyr_f7tmNYJ"
inspect_data(
    datadict=preds,
    show_keys=["paths", "logits_4_highest", "ys_true_names"],
    samples_indices=preds_max_logits_indices[:5],
)
rohitgr7 marked this conversation as resolved.