Commit

Included metric_prob_input in config (#211)
* enabled two more parameters to GATE model

* temp commit

* reverted binary/multiclass metrics

* reverted adding two params to GATE
Will add in a later PR

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bugfix: metric_prob_input type

* Added new parameter `metrics_prob_input`
updated all tests
updated documentation
updated examples and notebooks

* updated pre-commit version

* downgraded precommit version

* removed pre-commit dependency

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
manujosephv and pre-commit-ci[bot] authored May 21, 2023
1 parent f932230 commit 0612db5
Showing 23 changed files with 186 additions and 43 deletions.
2 changes: 1 addition & 1 deletion docs/contributing.md
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@ git clone git@github.com:your_name_here/pytorch_tabular.git

If you are adding a new feature, please add a test for it.

* When you are done making changes and all test cases are passing, crun `pre-commit` to make sure all the linting and formatting is done correctly.
* When you are done making changes and all test cases are passing, run `pre-commit` to make sure all the linting and formatting is done correctly.

```bash
pre-commit run --all-files
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
Expand Down Expand Up @@ -68,6 +69,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
Expand Down Expand Up @@ -124,6 +126,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
Expand Down Expand Up @@ -184,6 +187,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
Expand Down Expand Up @@ -258,7 +262,8 @@
" head_config = head_config, # Linear Head Config\n",
" learning_rate = 1e-3,\n",
" metrics=[\"accuracy\", \"f1_score\"],\n",
" metrics_params=[{},{\"average\":\"micro\"}]\n",
" metrics_params=[{},{\"average\":\"micro\"}],\n",
" metrics_prob_input=[False, True]\n",
")\n",
"tabular_model = TabularModel(\n",
" data_config=data_config,\n",
Expand Down Expand Up @@ -1163,6 +1168,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
Expand Down Expand Up @@ -1442,6 +1448,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
Expand Down
10 changes: 9 additions & 1 deletion docs/tutorials/06-Imbalanced Classification.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
Expand Down Expand Up @@ -73,6 +74,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
Expand All @@ -97,6 +99,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
Expand Down Expand Up @@ -129,6 +132,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
Expand Down Expand Up @@ -189,11 +193,13 @@
" head_config = head_config, # Linear Head Config\n",
" learning_rate = 1e-3,\n",
" metrics=[\"f1_score\",\"accuracy\"], \n",
" metrics_params=[{\"num_classes\":2},{}]\n",
" metrics_params=[{\"num_classes\":2},{}],\n",
" metrics_prob_input=[True, False]\n",
")\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
Expand Down Expand Up @@ -1830,6 +1836,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
Expand Down Expand Up @@ -3464,6 +3471,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
Expand Down
7 changes: 6 additions & 1 deletion docs/tutorials/10-Test Time Augmentation.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
Expand Down Expand Up @@ -76,6 +77,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
Expand All @@ -100,6 +102,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
Expand Down Expand Up @@ -128,6 +131,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"Collapsed": "false"
Expand Down Expand Up @@ -533,6 +537,7 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
Expand All @@ -556,7 +561,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0]"
"version": "3.10.6"
},
"vscode": {
"interpreter": {
Expand Down
11 changes: 7 additions & 4 deletions examples/__only_for_dev__/adhoc_scaffold.py
Original file line number Diff line number Diff line change
Expand Up @@ -63,17 +63,20 @@ def print_metrics(y_true, y_pred, tag):
)
trainer_config = TrainerConfig(
auto_lr_find=True, # Runs the LRFinder to automatically derive a learning rate
batch_size=1024,
batch_size=32,
max_epochs=10,
fast_dev_run=True,
)
optimizer_config = OptimizerConfig()
model_config = CategoryEmbeddingModelConfig(
task="classification",
layers="4096-4096-512", # Number of nodes in each layer
activation="LeakyReLU", # Activation between each layer
# gflu_stages=3,
# tree_depth=2,
# layers="4096-4096-512", # Number of nodes in each layer
# activation="LeakyReLU", # Activation between each layer
learning_rate=1e-3,
metrics=["accuracy"],
metrics=["auroc"],
metrics_prob_input=[True],
)
tabular_model = TabularModel(
data_config=data_config,
Expand Down
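A note on why `auroc` is paired with `metrics_prob_input=[True]` in the scaffold above: AUROC is a ranking metric, so it needs scores rather than hard class labels. The sketch below is a minimal pure-Python illustration of that point. `binary_auroc` is a hypothetical helper written for this note, not the `torchmetrics` function the library calls, and the data is made up.

```python
from typing import Sequence


def binary_auroc(scores: Sequence[float], target: Sequence[int]) -> float:
    """Hypothetical helper: AUROC as the fraction of (positive, negative)
    pairs where the positive sample is scored higher (ties count as half)."""
    pos = [s for s, t in zip(scores, target) if t == 1]
    neg = [s for s, t in zip(scores, target) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


target = [1, 1, 0, 0]
positive_probs = [0.9, 0.4, 0.3, 0.2]  # made-up probabilities of class 1

# Thresholding at 0.5 throws away the ordering among samples on the same side.
hard_labels = [1 if p >= 0.5 else 0 for p in positive_probs]

auroc_from_probs = binary_auroc(positive_probs, target)
auroc_from_labels = binary_auroc(hard_labels, target)
print(auroc_from_probs, auroc_from_labels)
```

With probabilities every positive outranks every negative. After thresholding, the second positive ties with both negatives and the score drops, which is the situation the flag is meant to avoid.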
10 changes: 9 additions & 1 deletion examples/__only_for_dev__/to_test_classification.py
Original file line number Diff line number Diff line change
Expand Up @@ -111,14 +111,19 @@
normalize_continuous_features=False,
)
# model_config = CategoryEmbeddingModelConfig(
# task="classification", metrics=["f1","accuracy"], metrics_params=[{"num_classes":num_classes},{}])
# task="classification",
# metrics=["f1", "accuracy"],
# metrics_params=[{"num_classes": num_classes}, {}],
# metrics_prob_input=[False, False],
# )
# model_config = NodeConfig(
# task="classification",
# depth=4,
# num_trees=1024,
# input_dropout=0.0,
# metrics=["f1", "accuracy"],
# metrics_params=[{"num_classes": num_classes, "average": "macro"}, {}],
# metrics_prob_input=[False,False]
# )
# model_config = TabTransformerConfig(
# task="classification",
Expand All @@ -127,6 +132,7 @@
# share_embedding_strategy="add",
# shared_embedding_fraction=0.25,
# metrics_params=[{"num_classes": num_classes, "average": "macro"}, {}],
# metrics_prob_input=[False,False]
# )
# model_config = FTTransformerConfig(
# task="classification",
Expand All @@ -137,6 +143,7 @@
# share_embedding_strategy="fraction",
# shared_embedding_fraction=0.25,
# metrics_params=[{"num_classes": num_classes, "average": "macro"}, {}],
# metrics_prob_input=[False,False]
# )
# model_config_params = dict(
# task="regression",
Expand All @@ -148,6 +155,7 @@
task="classification",
metrics=["f1_score", "accuracy"],
metrics_params=[{"num_classes": num_classes, "average": "macro"}, {}],
metrics_prob_input=[False, False],
)
trainer_config = TrainerConfig(auto_select_gpus=True, fast_dev_run=False, max_epochs=5, batch_size=512)
# experiment_config = ExperimentConfig(project_name="PyTorch Tabular Example",
Expand Down
1 change: 1 addition & 0 deletions examples/__only_for_dev__/to_test_regression.py
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,7 @@ def fake_metric(y_hat, y):
train=train,
test=test,
metrics=[fake_metric],
metrics_prob_inputs=[False],
target_transform=tr,
loss=torch.nn.L1Loss(),
optimizer=torch.optim.Adagrad,
Expand Down
18 changes: 3 additions & 15 deletions examples/__only_for_dev__/to_test_regression_custom_models.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,6 @@
from typing import Dict, List, Optional

import pandas as pd
import pytorch_lightning as pl
import torch
import torch.nn as nn
from omegaconf import DictConfig
Expand Down Expand Up @@ -252,21 +251,10 @@ def calculate_loss(self, y, classification_logits, y_hat, tag):
)
return computed_loss

# Skip metric calculation, because the default calculation would fail and would not make sense
# for this kind of combined classification and regression task
def calculate_metrics(self, y, y_hat, tag):
for metric, metric_str, metric_params in zip(self.metrics, self.hparams.metrics, self.hparams.metrics_params):
if metric.__name__ == pl.metrics.functional.mean_squared_log_error.__name__:
# MSLE should only be used in strictly positive targets. It is undefined otherwise
metric_ = metric(torch.clamp(y_hat, min=0), torch.clamp(y[:, 1], min=0), **metric_params)
else:
metric_ = metric(y_hat, y[:, 1], **metric_params)
self.log(
f"{tag}_{metric_str}",
metric_,
on_epoch=True,
on_step=False,
logger=True,
prog_bar=True,
)
pass


dataset = fetch_california_housing(data_home="data", as_frame=True)
Expand Down
5 changes: 4 additions & 1 deletion examples/covertype_classification.py
Original file line number Diff line number Diff line change
Expand Up @@ -96,7 +96,10 @@
layers="", dropout=0.1, initialization="kaiming" # No additional layer in head, just a mapping layer to output_dim
).__dict__ # Convert to dict to pass to the model config (OmegaConf doesn't accept objects)
model_config = CategoryEmbeddingModelConfig(
task="classification", metrics=["f1_score", "accuracy"], metrics_params=[{"num_classes": num_classes}, {}]
task="classification",
metrics=["f1_score", "accuracy"],
metrics_params=[{"num_classes": num_classes}, {}],
metrics_prob_input=[True, False],
)
trainer_config = TrainerConfig(auto_lr_find=True, fast_dev_run=False, max_epochs=5, batch_size=512)
optimizer_config = OptimizerConfig()
Expand Down
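The example above pairs each metric with a flag (`f1_score` with `True`, `accuracy` with `False`). As a rough sketch of what such a flag could control, assuming `True` means "pass probabilities" and `False` means "pass predicted classes": `select_metric_input` and the toy `accuracy` below are hypothetical helpers on plain lists, not pytorch_tabular or `torchmetrics` code.

```python
from typing import List, Sequence, Union

Probabilities = Sequence[Sequence[float]]


def select_metric_input(
    probs: Probabilities, prob_input: bool
) -> Union[List[List[float]], List[int]]:
    """Hypothetical helper: return the probabilities unchanged when
    prob_input is True, otherwise the argmax class index per row."""
    if prob_input:
        return [list(row) for row in probs]
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


def accuracy(pred_classes: Sequence[int], target: Sequence[int]) -> float:
    """Toy accuracy on hard class labels."""
    correct = sum(int(p == t) for p, t in zip(pred_classes, target))
    return correct / len(target)


probs = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]  # made-up class probabilities
target = [1, 0, 0]

class_input = select_metric_input(probs, prob_input=False)
prob_passthrough = select_metric_input(probs, prob_input=True)
acc = accuracy(class_input, target)
print(class_input, acc)
```

Keeping the flag per metric, rather than one global switch, lets a ranking metric and a label-based metric be tracked in the same run.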
31 changes: 27 additions & 4 deletions src/pytorch_tabular/config/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -776,7 +776,13 @@ class ModelConfig:
should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
accuracy if classification and mean_squared_error for regression
metrics_params (Optional[List]): The parameters to be passed to the metrics function
metrics_prob_input (Optional[List[bool]]): Is a mandatory parameter for classification metrics defined in
the config. This defines whether the input to the metric function is the probability (True) or the class (False).
Length should be the same as the number of metrics. Defaults to None.
metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
only using `multiclass`.
target_range (Optional[List]): The range in which we should limit the output variable. Currently
ignored for multi-target regression. Typically used for Regression problems. If left empty, will
Expand Down Expand Up @@ -843,13 +849,26 @@ class ModelConfig:
default=None,
metadata={
"help": "the list of metrics you need to track during training. The metrics should be one "
"of the functional metrics implemented in ``torchmetrics``. By default, "
"it is accuracy if classification and mean_squared_error for regression"
"of the functional metrics implemented in ``torchmetrics``. To use your own metric, please "
"use the `metrics` param in the `fit` method. By default, it is accuracy if classification "
"and mean_squared_error for regression"
},
)
metrics_prob_input: Optional[List[bool]] = field(
default=None,
metadata={
"help": "Is a mandatory parameter for classification metrics defined in the config. This defines "
"whether the input to the metric function is the probability or the class. Length should be the same "
"as the number of metrics. Defaults to None."
},
)
metrics_params: Optional[List] = field(
default=None,
metadata={"help": "The parameters to be passed to the metrics function"},
metadata={
"help": "The parameters to be passed to the metrics function. `task` is forced to be `multiclass` "
"because the multiclass version can handle binary as well and for simplicity we are only using "
"`multiclass`."
},
)
target_range: Optional[List] = field(
default=None,
Expand All @@ -874,10 +893,14 @@ def __post_init__(self):
self.loss = self.loss or "MSELoss"
self.metrics = self.metrics or ["mean_squared_error"]
self.metrics_params = [{} for _ in self.metrics] if self.metrics_params is None else self.metrics_params
self.metrics_prob_input = [False for _ in self.metrics] # not used in Regression. just for compatibility
elif self.task == "classification":
self.loss = self.loss or "CrossEntropyLoss"
self.metrics = self.metrics or ["accuracy"]
self.metrics_params = [{} for _ in self.metrics] if self.metrics_params is None else self.metrics_params
self.metrics_prob_input = (
[False for _ in self.metrics] if self.metrics_prob_input is None else self.metrics_prob_input
)
elif self.task == "backbone":
self.loss = None
self.metrics = None
Expand Down
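The defaulting behaviour added to `__post_init__` in the hunk above can be tried in isolation. The sketch below is a stripped-down stand-in under stated assumptions: `MiniModelConfig` is a hypothetical class that keeps only three fields and mirrors the branches shown in the diff; it is not the real `ModelConfig`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MiniModelConfig:
    """Hypothetical stand-in for ModelConfig with only the fields discussed here."""

    task: str
    metrics: Optional[List[str]] = None
    metrics_prob_input: Optional[List[bool]] = None

    def __post_init__(self):
        if self.task == "regression":
            self.metrics = self.metrics or ["mean_squared_error"]
            # Regression does not use the flag, so any user value is overwritten.
            self.metrics_prob_input = [False for _ in self.metrics]
        elif self.task == "classification":
            self.metrics = self.metrics or ["accuracy"]
            # Only fill in a default when the user supplied nothing.
            if self.metrics_prob_input is None:
                self.metrics_prob_input = [False for _ in self.metrics]


default_clf = MiniModelConfig(task="classification")
explicit_clf = MiniModelConfig(
    task="classification",
    metrics=["f1_score", "accuracy"],
    metrics_prob_input=[True, False],
)
regression = MiniModelConfig(task="regression", metrics_prob_input=[True])
print(default_clf.metrics_prob_input, explicit_clf.metrics_prob_input, regression.metrics_prob_input)
```

Note the asymmetry: for classification a user-supplied list is kept as-is, while for regression it is always replaced with `False` entries.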
4 changes: 4 additions & 0 deletions src/pytorch_tabular/models/autoint/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,10 @@ class AutoIntConfig(ModelConfig):
metrics_params (Optional[List]): The parameters to be passed to the metrics function
metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
This defines whether the input to the metric function is the probability or the class. Length should be
the same as the number of metrics. Defaults to None.
target_range (Optional[List]): The range in which we should limit the output variable. Currently
ignored for multi-target regression. Typically used for Regression problems. If left empty, will
not apply any restrictions
Expand Down
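The docstring states that the length of `metrics_prob_input` should match the number of metrics, but the hunks shown here do not include a check for it. A minimal sketch of what such a guard might look like follows; `check_metrics_prob_input` is a hypothetical function written for this note and is not claimed to exist in pytorch_tabular.

```python
from typing import List, Optional, Sequence


def check_metrics_prob_input(
    metrics: Sequence[str], metrics_prob_input: Optional[Sequence[bool]]
) -> List[bool]:
    """Hypothetical guard: default to all-False, otherwise require one flag per metric."""
    if metrics_prob_input is None:
        return [False] * len(metrics)
    if len(metrics_prob_input) != len(metrics):
        raise ValueError(
            f"metrics_prob_input has {len(metrics_prob_input)} entries "
            f"but {len(metrics)} metrics were given"
        )
    return list(metrics_prob_input)


print(check_metrics_prob_input(["f1_score", "accuracy"], [True, False]))
print(check_metrics_prob_input(["accuracy"], None))
```

Failing early in the config keeps a mismatched list from surfacing later as a confusing error inside the training loop.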