diff --git a/notebooks/en/_toctree.yml b/notebooks/en/_toctree.yml index e46efc44..4f84f2a1 100644 --- a/notebooks/en/_toctree.yml +++ b/notebooks/en/_toctree.yml @@ -82,6 +82,8 @@ title: HuatuoGPT-o1 Medical RAG and Reasoning - local: fine_tune_chatbot_docs_synthetic title: Documentation Chatbot with Meta Synthetic Data Kit + - local: optuna_hpo_with_transformers + title: Hyperparameter Optimization with Optuna and Transformers diff --git a/notebooks/en/optuna_hpo_with_transformers.ipynb b/notebooks/en/optuna_hpo_with_transformers.ipynb new file mode 100644 index 00000000..773a84a5 --- /dev/null +++ b/notebooks/en/optuna_hpo_with_transformers.ipynb @@ -0,0 +1,1709 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "08092aa8", + "metadata": {}, + "source": [ + "### Recipe: Hyperparameter Optimization with Optuna and Transformers\n", + "\n", + "_Authored by: [Parag Ekbote](https://github.com/ParagEkbote)_\n", + "\n", + "**Problem:** \n", + "Find the best hyperparameters to fine-tune a lightweight BERT model for text classification on a subset of the IMDB dataset.\n", + "\n", + "**Overview:**\n", + "This recipe demonstrates how to systematically optimize hyperparameters for transformer-based text classification models using automated search techniques. You'll learn to implement HPO using Optuna to find optimal learning rates and weight decay values for fine-tuning BERT on sentiment analysis tasks.\n", + "\n", + "**When to Use This Recipe:**\n", + "\n", + "* You need to fine-tune pre-trained language models for classification tasks.\n", + "\n", + "* Your model performance is plateauing and requires parameter refinement.\n", + "\n", + "* You want to implement systematic, reproducible hyperparameter optimization.\n", + "\n", + "#### Notes\n", + "\n", + "* For detailed guidance on hyperparameter search with Transformers, refer to the [Hugging Face HPO documentation](https://huggingface.co/docs/transformers/en/hpo_train)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a309e1a0", + "metadata": {}, + "outputs": [], + "source": [ + "!pip install -q datasets evaluate transformers optuna wandb scikit-learn nbformat" + ] + }, + { + "cell_type": "markdown", + "id": "eff9ccd6", + "metadata": {}, + "source": [ + "### Prepare Dataset and Set Model\n", + "\n", + "Before you can train and evaluate a sentiment analysis model, you’ll need to prepare the dataset and load the model. This section structures the data and sets up a pretrained BERT model for fine-tuning.\n", + "\n", + "\n", + "1. **Load the IMDB Dataset** \n", + " Begin by selecting a dataset focused on sentiment classification. IMDB is a well-known benchmark that features movie reviews labeled as either positive or negative.\n", + "\n", + "2. **Select Input and Output Columns** \n", + " Focus only on the essentials: \n", + " - The `text` column serves as the input (review content) \n", + " - The `label` column serves as the target (0 for negative, 1 for positive sentiment)\n", + "\n", + "3. **Define the Train/Validation Split** \n", + " Choose a consistent sampling strategy by selecting: \n", + " - 2000 examples for training \n", + " - 1000 examples for validation \n", + " Use a fixed random seed when shuffling to ensure reproducibility across sessions.\n", + "\n", + "4. **Tokenize the Dataset** \n", + " Apply a tokenizer compatible with the model you're planning to use. Tokenization converts raw text into token IDs that the model can consume. Use batch processing to make this step efficient.\n", + "\n", + "5. **Load an Evaluation Metric** \n", + " Choose “accuracy” as the primary evaluation metric; it is simple and effective for binary classification tasks like this. It will later help gauge how well your model is learning the difference between positive and negative sentiment.\n", + "\n", + "6. 
**Initialize a Pretrained BERT Model** \n", + " Select a pretrained BERT-based model tailored for sequence classification tasks. Set the number of output classes to 2 (positive and negative) to align with your sentiment labels. This model will serve as the learner throughout the training process." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2cfb9d5e", + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8ee77ecb1a0f41c2a715840e76cd5727", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "config.json: 0%| | 0.00/285 [00:00" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Run data is saved locally in /teamspace/studios/this_studio/cookbook/notebooks/en/wandb/run-20250625_151029-ivr2ci8c" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Syncing run transformers_optuna_study to Weights & Biases (docs)
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View project at https://wandb.ai/ai_novice2005/hf-optuna" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View run at https://wandb.ai/ai_novice2005/hf-optuna/runs/ivr2ci8c" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "2a4b6835e125481597ce8871f368aeeb", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "pytorch_model.bin: 0%| | 0.00/17.8M [00:00" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View run transformers_optuna_study at: https://wandb.ai/ai_novice2005/hf-optuna/runs/ivr2ci8c
View project at: https://wandb.ai/ai_novice2005/hf-optuna
Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Find logs at: ./wandb/run-20250625_151029-ivr2ci8c/logs" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Tracking run with wandb version 0.20.1" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Run data is saved locally in /teamspace/studios/this_studio/cookbook/notebooks/en/wandb/run-20250625_151042-up8j8xgb" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Syncing run leafy-breeze-5 to Weights & Biases (docs)
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View project at https://wandb.ai/ai_novice2005/huggingface" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View run at https://wandb.ai/ai_novice2005/huggingface/runs/up8j8xgb" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [471/471 00:24, Epoch 3/3]\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Epoch | Training Loss | Validation Loss | Accuracy
1 | 0.649200 | 0.605374 | 0.682000
2 | 0.529900 | 0.528273 | 0.751000
3 | 0.440700 | 0.509003 | 0.764000

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[I 2025-06-25 15:11:09,298] Trial 0 finished with value: 0.764 and parameters: {'learning_rate': 7.23655165533393e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.013798094328723032}. Best is trial 0 with value: 0.764.\n", + "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at prajjwal1/bert-tiny and are newly initialized: ['classifier.bias', 'classifier.weight']\n", + "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n" + ] + }, + { + "data": { + "text/html": [], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "

Run history:
eval/accuracy | ▁▇█
eval/loss | █▂▁
eval/runtime | █▇▁
eval/samples_per_second | ▁▂█
eval/steps_per_second | ▁▂█
train/epoch | ▁▁▅▅███
train/global_step | ▁▁▄▄███
train/grad_norm | ▁█▂
train/learning_rate | █▅▁
train/loss | █▄▁

Run summary:
eval/accuracy | 0.764
eval/loss | 0.509
eval/runtime | 1.0937
eval/samples_per_second | 914.299
eval/steps_per_second | 114.287
total_flos | 9528652800000.0
train/epoch | 3
train/global_step | 471
train/grad_norm | 13.57101
train/learning_rate | 0.0
train/loss | 0.4407
train_loss | 0.53993
train_runtime | 26.9493
train_samples_per_second | 278.3
train_steps_per_second | 17.477

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View run leafy-breeze-5 at: https://wandb.ai/ai_novice2005/huggingface/runs/up8j8xgb
View project at: https://wandb.ai/ai_novice2005/huggingface
Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Find logs at: ./wandb/run-20250625_151042-up8j8xgb/logs" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Tracking run with wandb version 0.20.1" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Run data is saved locally in /teamspace/studios/this_studio/cookbook/notebooks/en/wandb/run-20250625_151110-1dgqb1s1" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Syncing run radiant-sound-6 to Weights & Biases (docs)
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View project at https://wandb.ai/ai_novice2005/huggingface" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View run at https://wandb.ai/ai_novice2005/huggingface/runs/1dgqb1s1" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [60/60 00:18, Epoch 3/3]\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Epoch | Training Loss | Validation Loss | Accuracy
1 | 0.683100 | 0.677468 | 0.613000
2 | 0.673100 | 0.669755 | 0.639000
3 | 0.669500 | 0.667655 | 0.630000

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[I 2025-06-25 15:11:29,907] Trial 1 finished with value: 0.63 and parameters: {'learning_rate': 2.756288216246014e-05, 'per_device_train_batch_size': 128, 'weight_decay': 0.28503663896216014}. Best is trial 0 with value: 0.764.\n", + "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at prajjwal1/bert-tiny and are newly initialized: ['classifier.bias', 'classifier.weight']\n", + "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n" + ] + }, + { + "data": { + "text/html": [], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "

Run history:
eval/accuracy | ▁█▆
eval/loss | █▂▁
eval/runtime | ▁█▂
eval/samples_per_second | █▁▇
eval/steps_per_second | █▁▇
train/epoch | ▁▁▅▅███
train/global_step | ▁▁▅▅███
train/grad_norm | ▅█▁
train/learning_rate | █▄▁
train/loss | █▃▁

Run summary:
eval/accuracy | 0.63
eval/loss | 0.66765
eval/runtime | 1.111
eval/samples_per_second | 900.116
eval/steps_per_second | 112.515
total_flos | 9528652800000.0
train/epoch | 3
train/global_step | 60
train/grad_norm | 0.66353
train/learning_rate | 0.0
train/loss | 0.6695
train_loss | 0.67521
train_runtime | 19.5595
train_samples_per_second | 383.445
train_steps_per_second | 3.068

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View run radiant-sound-6 at: https://wandb.ai/ai_novice2005/huggingface/runs/1dgqb1s1
View project at: https://wandb.ai/ai_novice2005/huggingface
Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Find logs at: ./wandb/run-20250625_151110-1dgqb1s1/logs" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Tracking run with wandb version 0.20.1" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Run data is saved locally in /teamspace/studios/this_studio/cookbook/notebooks/en/wandb/run-20250625_151130-jt5mavd3" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Syncing run ancient-dream-7 to Weights & Biases (docs)
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View project at https://wandb.ai/ai_novice2005/huggingface" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View run at https://wandb.ai/ai_novice2005/huggingface/runs/jt5mavd3" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [120/120 00:20, Epoch 3/3]\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Epoch | Training Loss | Validation Loss | Accuracy
1 | 0.690300 | 0.688425 | 0.553000
2 | 0.689100 | 0.687775 | 0.562000
3 | 0.689100 | 0.687576 | 0.570000

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[I 2025-06-25 15:11:52,797] Trial 2 finished with value: 0.57 and parameters: {'learning_rate': 1.2177346043359053e-06, 'per_device_train_batch_size': 64, 'weight_decay': 0.02906341093983704}. Best is trial 0 with value: 0.764.\n", + "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at prajjwal1/bert-tiny and are newly initialized: ['classifier.bias', 'classifier.weight']\n", + "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n" + ] + }, + { + "data": { + "text/html": [], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "

Run history:
eval/accuracy | ▁▅█
eval/loss | █▃▁
eval/runtime | ▁██
eval/samples_per_second | █▁▁
eval/steps_per_second | █▁▁
train/epoch | ▁▁▅▅███
train/global_step | ▁▁▅▅███
train/grad_norm | ▆█▁
train/learning_rate | █▅▁
train/loss | █▁▁

Run summary:
eval/accuracy | 0.57
eval/loss | 0.68758
eval/runtime | 1.0959
eval/samples_per_second | 912.502
eval/steps_per_second | 114.063
total_flos | 9528652800000.0
train/epoch | 3
train/global_step | 120
train/grad_norm | 2.34479
train/learning_rate | 0.0
train/loss | 0.6891
train_loss | 0.68947
train_runtime | 21.6288
train_samples_per_second | 346.76
train_steps_per_second | 5.548

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View run ancient-dream-7 at: https://wandb.ai/ai_novice2005/huggingface/runs/jt5mavd3
View project at: https://wandb.ai/ai_novice2005/huggingface
Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Find logs at: ./wandb/run-20250625_151130-jt5mavd3/logs" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Tracking run with wandb version 0.20.1" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Run data is saved locally in /teamspace/studios/this_studio/cookbook/notebooks/en/wandb/run-20250625_151153-6eexo8uv" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Syncing run grateful-sound-8 to Weights & Biases (docs)
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View project at https://wandb.ai/ai_novice2005/huggingface" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View run at https://wandb.ai/ai_novice2005/huggingface/runs/6eexo8uv" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [120/120 00:16, Epoch 3/3]\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Epoch | Training Loss | Validation Loss | Accuracy
1 | 0.689400 | 0.686730 | 0.570000
2 | 0.687000 | 0.685327 | 0.581000
3 | 0.686700 | 0.684904 | 0.581000

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[I 2025-06-25 15:12:12,894] Trial 3 finished with value: 0.581 and parameters: {'learning_rate': 2.973185825213819e-06, 'per_device_train_batch_size': 64, 'weight_decay': 0.09102292466460353}. Best is trial 0 with value: 0.764.\n", + "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at prajjwal1/bert-tiny and are newly initialized: ['classifier.bias', 'classifier.weight']\n", + "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n" + ] + }, + { + "data": { + "text/html": [], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "

Run history:
eval/accuracy | ▁██
eval/loss | █▃▁
eval/runtime | ▁█▃
eval/samples_per_second | █▁▆
eval/steps_per_second | █▁▆
train/epoch | ▁▁▅▅███
train/global_step | ▁▁▅▅███
train/grad_norm | ▆█▁
train/learning_rate | █▅▁
train/loss | █▂▁

Run summary:
eval/accuracy | 0.581
eval/loss | 0.6849
eval/runtime | 1.0808
eval/samples_per_second | 925.219
eval/steps_per_second | 115.652
total_flos | 9528652800000.0
train/epoch | 3
train/global_step | 120
train/grad_norm | 2.30065
train/learning_rate | 0.0
train/loss | 0.6867
train_loss | 0.68768
train_runtime | 18.9103
train_samples_per_second | 396.61
train_steps_per_second | 6.346

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View run grateful-sound-8 at: https://wandb.ai/ai_novice2005/huggingface/runs/6eexo8uv
View project at: https://wandb.ai/ai_novice2005/huggingface
Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Find logs at: ./wandb/run-20250625_151153-6eexo8uv/logs" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Tracking run with wandb version 0.20.1" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Run data is saved locally in /teamspace/studios/this_studio/cookbook/notebooks/en/wandb/run-20250625_151213-5w18j6iv" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "Syncing run hopeful-moon-9 to Weights & Biases (docs)
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View project at https://wandb.ai/ai_novice2005/huggingface" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + " View run at https://wandb.ai/ai_novice2005/huggingface/runs/5w18j6iv" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [120/120 00:17, Epoch 3/3]\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Epoch | Training Loss | Validation Loss | Accuracy
1 | 0.689000 | 0.686028 | 0.573000
2 | 0.686100 | 0.684337 | 0.589000
3 | 0.685700 | 0.683833 | 0.597000

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[I 2025-06-25 15:12:32,824] Trial 4 finished with value: 0.597 and parameters: {'learning_rate': 3.763988365260261e-06, 'per_device_train_batch_size': 64, 'weight_decay': 0.1502192542358606}. Best is trial 0 with value: 0.764.\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "BestRun(run_id='0', objective=0.764, hyperparameters={'learning_rate': 7.23655165533393e-05, 'per_device_train_batch_size': 16, 'weight_decay': 0.013798094328723032}, run_summary=None)\n" + ] + } + ], + "source": [ + "def optuna_hp_space(trial):\n", + " return {\n", + " \"learning_rate\": trial.suggest_float(\"learning_rate\", 1e-6, 1e-4, log=True),\n", + " \"per_device_train_batch_size\": trial.suggest_categorical(\n", + " \"per_device_train_batch_size\", [16, 32, 64, 128]\n", + " ),\n", + " \"weight_decay\": trial.suggest_float(\"weight_decay\", 0.0, 0.3),\n", + " }\n", + "\n", + "\n", + "best_run = trainer.hyperparameter_search(\n", + " direction=\"maximize\",\n", + " backend=\"optuna\",\n", + " hp_space=optuna_hp_space,\n", + " n_trials=5,\n", + " compute_objective=compute_objective,\n", + " study_name=\"transformers_optuna_study\",\n", + " storage=\"sqlite:///optuna_trials.db\",\n", + " load_if_exists=True\n", + ")\n", + "\n", + "print(best_run)" + ] + }, + { + "cell_type": "markdown", + "id": "26a95ef3", + "metadata": {}, + "source": [ + "### Visualize Results\n", + "\n", + "Once your Optuna study completes its trials, it’s time to peel back the layers and interpret what happened. Visualization brings clarity to how hyperparameters shaped the outcome and uncovers patterns that might otherwise stay buried in raw data.\n", + "\n", + "1. **Track Optimization Progress** \n", + " Use the optimization history to see how objective scores evolved over trials. 
This helps you understand whether performance steadily improved, plateaued, or oscillated. It’s your window into the pace and trajectory of the search process.\n", + "\n", + "2. **Inspect Training Behavior via Intermediate Values** \n", + " If your model reports evaluation metrics during training (like per epoch), intermediate value plots let you monitor how each trial performed in real time. This is especially valuable for early-stopping decisions and assessing learning stability.\n", + "\n", + "3. **Reveal Key Hyperparameters through Importance Rankings** \n", + " Parameter importance plots uncover which hyperparameters actually mattered—did tweaking the learning rate move the needle, or was batch size the star? Understanding this lets you simplify or refine your future search space." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a8f14007", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/tmp/ipykernel_16014/3851300317.py:18: ExperimentalWarning: plot_optimization_history is experimental (supported from v2.2.0). The interface can change in the future.\n", + " ax1 = plot_optimization_history(study)\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "

" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/tmp/ipykernel_16014/3851300317.py:23: ExperimentalWarning: plot_intermediate_values is experimental (supported from v2.2.0). The interface can change in the future.\n", + " ax2 = plot_intermediate_values(study)\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/tmp/ipykernel_16014/3851300317.py:28: ExperimentalWarning: plot_param_importances is experimental (supported from v2.2.0). The interface can change in the future.\n", + " ax3 = plot_param_importances(study)\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import optuna\n", + "from optuna.visualization.matplotlib import (\n", + " plot_optimization_history,\n", + " plot_intermediate_values,\n", + " plot_param_importances\n", + ")\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Load the study from RDB storage\n", + "storage = optuna.storages.RDBStorage(\"sqlite:///optuna_trials.db\")\n", + "\n", + "study = optuna.load_study(\n", + " study_name=\"transformers_optuna_study\",\n", + " storage=storage\n", + ")\n", + "\n", + "# Plot optimization history\n", + "ax1 = plot_optimization_history(study)\n", + "plt.show()\n", + "ax1.figure.savefig(\"optimization_history.png\")\n", + "\n", + "# Plot intermediate values (if using pruning and intermediate reports)\n", + "ax2 = plot_intermediate_values(study)\n", + "plt.show()\n", + "ax2.figure.savefig(\"intermediate_values.png\")\n", + "\n", + "# Plot parameter importances\n", + "ax3 = plot_param_importances(study)\n", + "plt.show()\n", + "ax3.figure.savefig(\"param_importances.png\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "ae8def79", + "metadata": {}, + "source": [ + "### Perform Final Training\n", + "\n", + "Once you've completed hyperparameter optimization with Optuna, it’s time to capitalize on your best findings and carry out the final round of training. \n", + "\n", + "1. **Retrieve Your Ingredients** \n", + " Access the best set of hyperparameters identified during the tuning process. \n", + "\n", + "2. **Configure Training Parameters** \n", + " Plug those hyperparameter values into your training setup. This might include adjustments to learning rate, batch size, number of epochs, dropout rate, and other model-specific knobs that influence training behavior.\n", + "\n", + "3. **Incorporate into Model Setup** \n", + " Apply the optimized values to initialize and configure your model. 
This ensures your final training run is guided by the most effective settings discovered through trial and error.\n", + "\n", + "4. **Fine-Tune Your Training Pipeline** \n", + " Set up your optimizer, loss function, and data loaders using the best parameters. Everything from how fast your model learns to how much data it sees at once should reflect your refined configuration.\n", + "\n", + "5. **Run Full Training** \n", + " Begin training your model using the entire training dataset (or at least the train/validation split you used during HPO). This pass should reflect your best shot at learning the patterns in the data without exploratory variation." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "6cd2f800", + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "8493882fcaf841f7bf64606edcf2b28e", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Map: 0%| | 0/25000 [00:00\n", + " \n", + " \n", + " [375/375 09:21, Epoch 3/3]\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Epoch | Training Loss | Validation Loss | Accuracy
1 | 0.456700 | 0.337511 | 0.856000
2 | 0.215200 | 0.438220 | 0.876000
3 | 0.084600 | 0.499159 | 0.888000

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer\n", + "\n", + "# Define the model\n", + "model = AutoModelForSequenceClassification.from_pretrained(\"bert-base-uncased\", num_labels=2)\n", + "\n", + "# Load best hyperparameters (already defined earlier as best_hparams)\n", + "training_args = TrainingArguments(\n", + " output_dir=\"./final_model\",\n", + " learning_rate=best_hparams[\"learning_rate\"],\n", + " per_device_train_batch_size=best_hparams[\"per_device_train_batch_size\"],\n", + " weight_decay=best_hparams[\"weight_decay\"],\n", + " eval_strategy=\"epoch\",\n", + " save_strategy=\"epoch\",\n", + " load_best_model_at_end=True,\n", + " logging_strategy=\"epoch\",\n", + " num_train_epochs=3,\n", + " report_to=\"wandb\",\n", + " run_name=\"final_run_with_best_hparams\"\n", + ")\n", + "\n", + "# Create Trainer\n", + "trainer = Trainer(\n", + " model=model,\n", + " args=training_args,\n", + " train_dataset=train_dataset,\n", + " eval_dataset=valid_dataset,\n", + " tokenizer=tokenizer, # passed as `tokenizer` here; newer transformers releases name this argument `processing_class`\n", + " compute_metrics=lambda eval_pred: {\n", + " \"accuracy\": (eval_pred.predictions.argmax(-1) == eval_pred.label_ids).mean()\n", + " }\n", + ")\n", + "\n", + "# Train\n", + "trainer.train()\n", + "\n", + "# Save the model\n", + "trainer.save_model(\"./final_model\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "de469553", + "metadata": {}, + "source": [ + "### Uploading to Hugging Face Hub\n", + "\n", + "Now that you've trained an optimized model, it's time to share it. Sharing your model on the Hugging Face Hub not only makes it reusable and accessible for inference, but also contributes to the open-source community.\n", + "\n", + "1. 
**Celebrate the Optimization Payoff** \n", + " After tuning and final training, your model is ready for real-world sentiment analysis tasks, such as classifying movie reviews to inform content recommendations.\n", + "\n", + "2. **Save Your Work Locally** \n", + " Before sharing, save the trained model, including the weights, configuration, tokenizer (if applicable), and training artifacts, on your local system. This step ensures that your model setup is reproducible and ready to be uploaded.\n", + "\n", + "3. **Authenticate with Hugging Face Hub** \n", + " To upload your model, you’ll need to log in to the Hugging Face Hub. Whether through a terminal or notebook interface, authentication links your environment to your personal or organizational space on the platform, enabling push access.\n", + "\n", + "4. **Upload and Share** \n", + " Push your saved model to the Hugging Face Hub. You can make the model public or keep it private; a public model lets others load, use, and fine-tune it. You’ll also create a model card to explain what the model does, its intended use cases, and performance benchmarks.\n", + "\n", + "##### 📌 Why It Matters:\n", + "- Centralized model storage encourages versioning, reproducibility, and transparency.\n", + "- The Hub simplifies integration for downstream tasks through `transformers`-compatible APIs.\n", + "- Sharing models builds your profile and supports collaboration within the machine learning community." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "e524424b", + "metadata": {}, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "cb7eabc38cd845929c74f7f52ae98aba", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "README.md: 0%| | 0.00/31.0 [00:00