updated README and docs

jrzaurin · Mar 10, 2022 · 108cebc
1 parent 23b9331
commit 108cebc

Showing 4 changed files with 159 additions and 177 deletions.
diff --git a/README.md b/README.md
@@ -15,14 +15,14 @@
 
 # pytorch-widedeep
 
-A flexible package to use Deep Learning with tabular data, text and images
-using wide and deep models.
+A flexible package for multimodal deep learning, combining tabular data with
+text and images using Wide and Deep models in PyTorch.
 
 **Documentation:** [https://pytorch-widedeep.readthedocs.io](https://pytorch-widedeep.readthedocs.io/en/latest/index.html)
 
 **Companion posts and tutorials:** [infinitoml](https://jrzaurin.github.io/infinitoml/)
 
-**Experiments and comparisson with `LightGBM`**: [TabularDL vs LightGBM](https://github.com/jrzaurin/tabulardl-benchmark)
+**Experiments and comparison with `LightGBM`**: [TabularDL vs LightGBM](https://github.com/jrzaurin/tabulardl-benchmark)
 
 The content of this document is organized as follows:
 
@@ -33,7 +33,8 @@ The content of this document is organized as follows:
 
 ### Introduction
 
-``pytorch-widedeep`` is based on Google's [Wide and Deep Algorithm](https://arxiv.org/abs/1606.07792)
+``pytorch-widedeep`` is based on Google's [Wide and Deep Algorithm](https://arxiv.org/abs/1606.07792),
+adjusted for multi-modal datasets.
 
 In general terms, `pytorch-widedeep` is a package to use deep learning with
 tabular data. In particular, is intended to facilitate the combination of text
@@ -89,15 +90,11 @@ into:
   <img width="300" src="docs/figures/architecture_2_math.png">
 </p>
 
-I recommend using the ``wide`` and ``deeptabular`` models in
-``pytorch-widedeep``. However it is very likely that users will want to use
-their own models for the ``deeptext`` and ``deepimage`` components. That is
-perfectly possible as long as the the custom models have an attribute called
+It is perfectly possible to use custom models (not necessarily those in the
+library) as long as the custom models have an attribute called
 ``output_dim`` with the size of the last layer of activations, so that
-``WideDeep`` can be constructed. Again, examples on how to use custom
-components can be found in the Examples folder. Just in case
-``pytorch-widedeep`` includes standard text (stack of LSTMs) and image
-(pre-trained ResNets or stack of CNNs) models.
+``WideDeep`` can be constructed. Examples on how to use custom components can
+be found in the Examples folder.
 
 ### The ``deeptabular`` component
 
@@ -110,15 +107,17 @@ its own, i.e. what one might normally refer as Deep Learning for Tabular
 Data. Currently, ``pytorch-widedeep`` offers the following different models
 for that component:
 
-
+0. **Wide**: a simple linear model where the nonlinearities are captured via
+cross-product transformations, as explained before.
 1. **TabMlp**: a simple MLP that receives embeddings representing the
-categorical features, concatenated with the continuous features.
+categorical features, concatenated with the continuous features, which can
+also be embedded.
 2. **TabResnet**: similar to the previous model but the embeddings are
 passed through a series of ResNet blocks built with dense layers.
 3. **TabNet**: details on TabNet can be found in
 [TabNet: Attentive Interpretable Tabular Learning](https://arxiv.org/abs/1908.07442)
 
-And the ``Tabformer`` family, i.e. Transformers for Tabular data:
+The ``Tabformer`` family, i.e. Transformers for Tabular data:
 
 4. **TabTransformer**: details on the TabTransformer can be found in
 [TabTransformer: Tabular Data Modeling Using Contextual Embeddings](https://arxiv.org/pdf/2012.06678.pdf).
@@ -133,12 +132,19 @@ on the Fasformer can be found in
 the Perceiver can be found in
 [Perceiver: General Perception with Iterative Attention](https://arxiv.org/abs/2103.03206)
 
+And probabilistic DL models for tabular data based on
+[Weight Uncertainty in Neural Networks](https://arxiv.org/abs/1505.05424):
+
+9. **BayesianWide**: Probabilistic adaptation of the `Wide` model.
+10. **BayesianTabMlp**: Probabilistic adaptation of the `TabMlp` model.
+
 Note that while there are scientific publications for the TabTransformer,
 SAINT and FT-Transformer, the TabFasfFormer and TabPerceiver are our own
 adaptation of those algorithms for tabular data.
 
-For details on these models and their options please see the examples in the
-Examples folder and the documentation.
+For details on these models (and all the other models in the library for the
+different data modes) and their corresponding options, please see the
+examples in the Examples folder and the documentation.
 
 ###  Installation
 
@@ -165,13 +171,6 @@ cd pytorch-widedeep
 pip install -e .
 ```
 
-**Important note for Mac users**: Since `python
-3.8`, [the `multiprocessing` library start method changed from `'fork'` to`'spawn'`](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods) which affects the data-loaders.
-For the time being, `pytorch-widedeep` sets the `num_workers` to 0 when using
-Mac and python version 3.8+.
-
-Note that this issue does not affect Linux users.
-
 ### Quick start
 
 Binary classification with the [adult
@@ -181,7 +180,6 @@ using `Wide` and `DeepDense` and defaults settings.
 Building a wide (linear) and deep model with ``pytorch-widedeep``:
 
 ```python
-
 import pandas as pd
 import numpy as np
 import torch
@@ -191,16 +189,15 @@ from pytorch_widedeep import Trainer
 from pytorch_widedeep.preprocessing import WidePreprocessor, TabPreprocessor
 from pytorch_widedeep.models import Wide, TabMlp, WideDeep
 from pytorch_widedeep.metrics import Accuracy
+from pytorch_widedeep.datasets import load_adult
+
 
-# the following 4 lines are not directly related to ``pytorch-widedeep``. I
-# assume you have downloaded the dataset and place it in a dir called
-# data/adult/
-df = pd.read_csv("data/adult/adult.csv.zip")
+df = load_adult(as_frame=True)
 df["income_label"] = (df["income"].apply(lambda x: ">50K" in x)).astype(int)
 df.drop("income", axis=1, inplace=True)
 df_train, df_test = train_test_split(df, test_size=0.2, stratify=df.income_label)
 
-# prepare wide, crossed, embedding and continuous columns
+# Define the column setup
 wide_cols = [
     "education",
     "relationship",
@@ -209,49 +206,53 @@ wide_cols = [
     "native-country",
     "gender",
 ]
-cross_cols = [("education", "occupation"), ("native-country", "occupation")]
-embed_cols = [
-    ("education", 16),
-    ("workclass", 16),
-    ("occupation", 16),
-    ("native-country", 32),
-]
-cont_cols = ["age", "hours-per-week"]
-target_col = "income_label"
+crossed_cols = [("education", "occupation"), ("native-country", "occupation")]
 
-# target
-target = df_train[target_col].values
+cat_embed_cols = [
+    "workclass",
+    "education",
+    "marital-status",
+    "occupation",
+    "relationship",
+    "race",
+    "gender",
+    "capital-gain",
+    "capital-loss",
+    "native-country",
+]
+continuous_cols = ["age", "hours-per-week"]
+target_col = "income_label"
+target = df_train[target_col].values
 
-# wide
-wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=cross_cols)
+# prepare the data
+wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)
 X_wide = wide_preprocessor.fit_transform(df_train)
-wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)
 
-# deeptabular
-tab_preprocessor = TabPreprocessor(cat_embed_cols=embed_cols, continuous_cols=cont_cols)
+tab_preprocessor = TabPreprocessor(
+    cat_embed_cols=cat_embed_cols, continuous_cols=continuous_cols
+)
 X_tab = tab_preprocessor.fit_transform(df_train)
-deeptabular = TabMlp(
-    mlp_hidden_dims=[64, 32],
+
+# build the model
+wide = Wide(input_dim=np.unique(X_wide).shape[0], pred_dim=1)
+tab_mlp = TabMlp(
     column_idx=tab_preprocessor.column_idx,
-    embed_input=tab_preprocessor.cat_embed_input,
-    continuous_cols=cont_cols,
+    cat_embed_input=tab_preprocessor.cat_embed_input,
+    continuous_cols=continuous_cols,
 )
+model = WideDeep(wide=wide, deeptabular=tab_mlp)
 
-# wide and deep
-model = WideDeep(wide=wide, deeptabular=deeptabular)
-
-# train the model
+# train the model (no validation split is passed)
 trainer = Trainer(model, objective="binary", metrics=[Accuracy])
 trainer.fit(
     X_wide=X_wide,
     X_tab=X_tab,
     target=target,
     n_epochs=5,
     batch_size=256,
-    val_split=0.1,
 )
 
-# predict
+# predict on test
 X_wide_te = wide_preprocessor.transform(df_test)
 X_tab_te = tab_preprocessor.transform(df_test)
 preds = trainer.predict(X_wide=X_wide_te, X_tab=X_tab_te)
@@ -268,14 +269,11 @@ torch.save(model.state_dict(), "model_weights/wd_model.pt")
 # From here in advance, Option 1 or 2 are the same. I assume the user has
 # prepared the data and defined the new model components:
 # 1. Build the model
-model_new = WideDeep(wide=wide, deeptabular=deeptabular)
+model_new = WideDeep(wide=wide, deeptabular=tab_mlp)
 model_new.load_state_dict(torch.load("model_weights/wd_model.pt"))
 
 # 2. Instantiate the trainer
-trainer_new = Trainer(
-    model_new,
-    objective="binary",
-)
+trainer_new = Trainer(model_new, objective="binary")
 
 # 3. Either start the fit or directly predict
 preds = trainer_new.predict(X_wide=X_wide, X_tab=X_tab)

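The quick start in this commit passes `crossed_cols` to `WidePreprocessor` and sizes `Wide` with `np.unique(X_wide).shape[0]`. That sizing makes sense if every wide and crossed column is encoded into one shared integer vocabulary, so that the linear model holds one weight per distinct index. The sketch below illustrates that idea in plain Python under exactly that assumption. It is not `WidePreprocessor`'s implementation. The helper names, the `_`/`-` naming of crossed columns and the mapping of unseen values to 0 are hypothetical choices made here for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def add_crossed_columns(
    rows: List[Row], crossed_cols: Sequence[Tuple[str, str]]
) -> List[Row]:
    """Add one column per crossed pair, e.g. ("education", "occupation")
    becomes "education_occupation" with values like "Bachelors-Sales"."""
    out = []
    for row in rows:
        new_row = dict(row)
        for a, b in crossed_cols:
            new_row[f"{a}_{b}"] = f"{row[a]}-{row[b]}"
        out.append(new_row)
    return out


def fit_shared_vocab(
    rows: List[Row], cols: Sequence[str]
) -> Dict[Tuple[str, str], int]:
    """Give every (column, value) pair its own index in ONE shared space.
    Indices start at 1 so that 0 is free for unseen values."""
    vocab: Dict[Tuple[str, str], int] = {}
    for row in rows:
        for col in cols:
            key = (col, row[col])
            if key not in vocab:
                vocab[key] = len(vocab) + 1
    return vocab


def transform(
    rows: List[Row], cols: Sequence[str], vocab: Dict[Tuple[str, str], int]
) -> List[List[int]]:
    # Values not seen at fit time map to 0 (an assumption of this sketch)
    return [[vocab.get((col, row[col]), 0) for col in cols] for row in rows]


def wide_logit(x_row: List[int], weights: List[float], bias: float = 0.0) -> float:
    # A linear model over one-hot features reduces to summing one weight per
    # active index; index 0 (unseen value) reads weights[0], kept at zero.
    return bias + sum(weights[i] for i in x_row)


rows = [
    {"education": "Bachelors", "occupation": "Sales"},
    {"education": "HS-grad", "occupation": "Sales"},
    {"education": "Bachelors", "occupation": "Tech"},
]
crossed_cols = [("education", "occupation")]
rows_x = add_crossed_columns(rows, crossed_cols)
cols = ["education", "occupation", "education_occupation"]
vocab = fit_shared_vocab(rows_x, cols)
X_wide = transform(rows_x, cols, vocab)
print(X_wide)  # [[1, 2, 3], [4, 2, 5], [1, 6, 7]]
```

Here the 7 distinct indices in `X_wide` equal the vocabulary size, which is the quantity `np.unique(X_wide).shape[0]` counts on a training matrix. Keeping a single index space, rather than restarting at 1 for each column, is what lets one flat weight vector (as in `wide_logit`) act as the linear model.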
diff --git a/docs/index.rst b/docs/index.rst
@@ -31,7 +31,8 @@ Documentation
 Introduction
 ------------
 ``pytorch-widedeep`` is based on Google's `Wide and Deep Algorithm
-<https://arxiv.org/abs/1606.07792>`_.
+<https://arxiv.org/abs/1606.07792>`_, adjusted for multi-modal datasets.
+
 
 In general terms, ``pytorch-widedeep`` is a package to use deep learning with
 tabular and multimodal data. In particular, is intended to facilitate the
@@ -97,17 +98,20 @@ own, i.e. what one might normally refer as Deep Learning for Tabular Data.
 Currently, ``pytorch-widedeep`` offers the following different models for
 that component:
 
+0. **Wide**: a simple linear model where the nonlinearities are captured via
+cross-product transformations, as explained before.
 
 1. **TabMlp**: a simple MLP that receives embeddings representing the
-categorical features, concatenated with the continuous features.
+categorical features, concatenated with the continuous features, which can
+also be embedded.
 
 2. **TabResnet**: similar to the previous model but the embeddings are
 passed through a series of ResNet blocks built with dense layers.
 
 3. **TabNet**: details on TabNet can be found in `TabNet: Attentive
 Interpretable Tabular Learning <https://arxiv.org/abs/1908.07442>`_
 
-And the ``Tabformer`` family, i.e. Transformers for Tabular data:
+The ``Tabformer`` family, i.e. Transformers for Tabular data:
 
 4. **TabTransformer**: details on the TabTransformer can be found in
 `TabTransformer: Tabular Data Modeling Using Contextual Embeddings
@@ -130,22 +134,24 @@ Models for Natural Language Understanding
 the Perceiver can be found in `Perceiver: General Perception with Iterative
 Attention <https://arxiv.org/abs/2103.03206>`_
 
+And probabilistic DL models for tabular data based on
+`Weight Uncertainty in Neural Networks <https://arxiv.org/abs/1505.05424>`_:
+
+9. **BayesianWide**: Probabilistic adaptation of the `Wide` model.
+
+10. **BayesianTabMlp**: Probabilistic adaptation of the `TabMlp` model.
+
 Note that while there are scientific publications for the TabTransformer,
 SAINT and FT-Transformer, the TabFasfFormer and TabPerceiver are our own
-adaptation of those algorithms for tabular data.
-
-For details on these models and their options please see the examples in the
-Examples folder and the documentation.
-
-Finally, while I recommend using the ``wide`` and ``deeptabular`` models in
-``pytorch-widedeep`` it is very likely that users will want to use their own
-models for the ``deeptext`` and ``deepimage`` components. That is perfectly
-possible as long as the the custom models have an attribute called
-``output_dim`` with the size of the last layer of activations, so that
-``WideDeep`` can be constructed. Again, examples on how to use custom
-components can be found in the Examples folder. Just in case
-``pytorch-widedeep`` includes standard text (stack of LSTMs or GRUs) and
-image(pre-trained ResNets or stack of CNNs) models.
+adaptation of those algorithms for tabular data. For details on these models
+and their options, please see the examples in the Examples folder and the
+documentation.
+
+Finally, it is perfectly possible to use custom models as long as they have
+an attribute called ``output_dim`` with the size of the last layer of
+activations, so that ``WideDeep`` can be constructed. Again, examples of
+how to use custom components can be found in the Examples
+folder.
 
 Indices and tables
 ==================
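Both files in this commit state that a custom model only needs an `output_dim` attribute holding the size of its last layer of activations. The following is a minimal sketch of such a component in plain PyTorch. It assumes a text model that receives padded sequences of token indices. The class name `MyTextGRU`, its hyperparameters and the GRU architecture are hypothetical choices; only the `output_dim` requirement comes from the documentation.

```python
import torch
from torch import nn


class MyTextGRU(nn.Module):
    """Hypothetical custom text component: embeddings + a single-layer GRU."""

    def __init__(self, vocab_size: int, embed_dim: int = 32, hidden_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        # Required by the docs: size of the last layer of activations
        self.output_dim = hidden_dim

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        # X: (batch, seq_len) token indices -> (batch, seq_len, embed_dim)
        embedded = self.embed(X.long())
        _, h_n = self.rnn(embedded)  # h_n: (num_layers, batch, hidden_dim)
        return h_n[-1]  # final hidden state of the last layer: (batch, output_dim)


model = MyTextGRU(vocab_size=100)
X_text = torch.randint(1, 100, (4, 10))  # 4 sequences of 10 token ids
out = model(X_text)
print(out.shape)  # torch.Size([4, 64])
```

In principle, an instance like this would then be passed to `WideDeep` through its `deeptext` argument alongside `wide` and `deeptabular`. That wiring and the matching text preprocessing are not shown here and should be checked against the library documentation.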