Merge pull request #20 from jrzaurin/wide_embedding
Wide embedding
jrzaurin committed Aug 9, 2020
2 parents 65465a4 + e40a088 commit 627caf4
Showing 37 changed files with 686 additions and 385 deletions.
38 changes: 19 additions & 19 deletions README.md
@@ -5,6 +5,12 @@

[![Build Status](https://travis-ci.org/jrzaurin/pytorch-widedeep.svg?branch=master)](https://travis-ci.org/jrzaurin/pytorch-widedeep)
[![Documentation Status](https://readthedocs.org/projects/pytorch-widedeep/badge/?version=latest)](https://pytorch-widedeep.readthedocs.io/en/latest/?badge=latest)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/jrzaurin/pytorch-widedeep/graphs/commit-activity)

Platform | Version Support
---------|:---------------
OSX | [![Python 3.6 3.7](https://img.shields.io/badge/python-3.6%20%203.7-blue.svg)](https://www.python.org/)
Linux | [![Python 3.6 3.7 3.8](https://img.shields.io/badge/python-3.6%20%203.7%203.8-blue.svg)](https://www.python.org/)

# pytorch-widedeep

@@ -34,11 +40,11 @@ few lines of code.
<img width="600" src="docs/figures/architecture_1.png">
</p>

Architecture 1 combines the `Wide`, one-hot encoded features with the outputs
from the `DeepDense`, `DeepText` and `DeepImage` components connected to a
final output neuron or neurons, depending on whether we are performing a
binary classification or regression, or a multi-class classification. The
components within the faded-pink rectangles are concatenated.
Architecture 1 combines the `Wide` linear model with the outputs from the
`DeepDense`, `DeepText` and `DeepImage` components connected to a final output
neuron or neurons, depending on whether we are performing a binary
classification or regression, or a multi-class classification. The components
within the faded-pink rectangles are concatenated.
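
As a minimal sketch (not taken verbatim from this diff; the component objects
are assumed to have been created beforehand, e.g. as in the Quick start below),
Architecture 1 corresponds to passing the components to `WideDeep` without a
custom head:

```python
from pytorch_widedeep.models import WideDeep

# Sketch only: without a custom `deephead`, WideDeep concatenates the deep
# components and adds the Wide (linear) output to form the final neuron(s).
# `wide`, `deepdense`, `deeptext` and `deepimage` are assumed to exist already.
model = WideDeep(
    wide=wide,
    deepdense=deepdense,
    deeptext=deeptext,
    deepimage=deepimage,
)
```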

In math terms, and following the notation in the
[paper](https://arxiv.org/abs/1606.07792), Architecture 1 can be formulated
@@ -65,10 +71,10 @@ otherwise".*
<img width="600" src="docs/figures/architecture_2.png">
</p>

Architecture 2 combines the `Wide` one-hot encoded features with the Deep
components of the model connected to the output neuron(s), after the different
Deep components have been themselves combined through a FC-Head (that I refer
as `deephead`).
Architecture 2 combines the `Wide` linear model with the Deep components of
the model connected to the output neuron(s), after the different Deep
components have themselves been combined through an FC-Head (which I refer to
as `deephead`).
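
As a hedged sketch (the exact `WideDeep` signature is not shown in this diff,
and the layer sizes below are made up), Architecture 2 amounts to also passing
a custom FC-Head to `WideDeep` via its `deephead` argument:

```python
import torch.nn as nn

from pytorch_widedeep.models import WideDeep

# Illustrative FC-Head; its input size must match the concatenated output
# dimensions of the deep components (128 is an assumed value).
deephead = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))

# With a `deephead`, the deep components are combined through it before
# reaching the output neuron(s), i.e. Architecture 2. The other components
# are assumed to exist already.
model = WideDeep(wide=wide, deepdense=deepdense, deeptext=deeptext, deephead=deephead)
```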

In math terms, and following the notation in the
[paper](https://arxiv.org/abs/1606.07792), Architecture 2 can be formulated
@@ -84,7 +90,8 @@ and `DeepImage` are optional. `pytorch-widedeep` includes standard text (stack
of LSTMs) and image (pre-trained ResNets or stack of CNNs) models. However,
the user can use any custom model as long as it has an attribute called
`output_dim` with the size of the last layer of activations, so that
`WideDeep` can be constructed. See the examples folder for more information.
`WideDeep` can be constructed. See the examples folder or the docs for more
information.
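
For illustration, here is a minimal sketch of a custom text component. Nothing
below comes from the library itself beyond what the paragraph above states:
any `nn.Module` can be used as long as it exposes an `output_dim` attribute.
The class name and sizes are made up:

```python
import torch
import torch.nn as nn


class MyCustomTextModel(nn.Module):
    """Hypothetical custom text component for WideDeep."""

    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden_dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.output_dim = hidden_dim  # size of the last layer of activations

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        embeds = self.embed(X.long())
        _, h = self.rnn(embeds)
        return h[-1]  # (batch_size, output_dim)
```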


### Installation
@@ -112,14 +119,6 @@ cd pytorch-widedeep
pip install -e .
```

### Examples

There are a number of notebooks in the `examples` folder plus some additional
files. These notebooks cover most of the utilities of this package and can
also act as documentation. In the case that github does not render the
notebooks, or it renders them missing some parts, they are saved as markdown
files in the `docs` folder.

### Quick start

Binary classification with the [adult
@@ -128,6 +127,7 @@ using `Wide` and `DeepDense` and default settings.

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

from pytorch_widedeep.preprocessing import WidePreprocessor, DensePreprocessor
@@ -166,7 +166,7 @@ target = df_train[target_col].values
# wide
preprocess_wide = WidePreprocessor(wide_cols=wide_cols, crossed_cols=cross_cols)
X_wide = preprocess_wide.fit_transform(df_train)
wide = Wide(wide_dim=X_wide.shape[1], pred_dim=1)
wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)

# deepdense
preprocess_deep = DensePreprocessor(embed_cols=embed_cols, continuous_cols=cont_cols)
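# The rest of this snippet is collapsed in the diff. As a rough, hedged sketch
# of how it continues in this era of the library (argument names are assumed
# from the 0.4.x API and may differ), the deep component and the combined
# model would be built and trained along these lines:
from pytorch_widedeep.models import DeepDense, WideDeep  # actual import is collapsed above

X_deep = preprocess_deep.fit_transform(df_train)
deepdense = DeepDense(
    hidden_layers=[64, 32],
    deep_column_idx=preprocess_deep.deep_column_idx,
    embed_input=preprocess_deep.embeddings_input,
    continuous_cols=cont_cols,
)

# build, compile and fit
model = WideDeep(wide=wide, deepdense=deepdense)
model.compile(method="binary")
model.fit(X_wide=X_wide, X_deep=X_deep, target=target, n_epochs=5, batch_size=256)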
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.4.2
0.4.5
File renamed without changes.
Binary file modified docs/figures/architecture_1.png
Binary file modified docs/figures/architecture_2.png
24 changes: 13 additions & 11 deletions docs/index.rst
@@ -18,7 +18,10 @@ Documentation
Utilities <utils/index>
Preprocessing <preprocessing>
Model Components <model_components>
Wide and Deep Models <wide_deep/index>
Metrics <metrics>
Callbacks <callbacks>
Focal Loss <losses>
Wide and Deep Models <wide_deep>
Examples <examples>


@@ -45,12 +48,11 @@ Architectures
:width: 600px
:align: center

Architecture 1 combines the ``Wide``, one-hot encoded features with the
outputs from the ``DeepDense``, ``DeepText`` and ``DeepImage`` components
connected to a final output neuron or neurons, depending on whether we are
performing a binary classification or regression, or a multi-class
classification. The components within the faded-pink rectangles are
concatenated.
Architecture 1 combines the ``Wide`` linear model with the outputs from the
``DeepDense``, ``DeepText`` and ``DeepImage`` components connected to a final
output neuron or neurons, depending on whether we are performing a binary
classification or regression, or a multi-class classification. The components
within the faded-pink rectangles are concatenated.

In math terms, and following the notation in the `paper
<https://arxiv.org/abs/1606.07792>`_, Architecture 1 can be formulated as:
@@ -76,10 +78,10 @@ is the activation function.
:width: 600px
:align: center

Architecture 2 combines the ``Wide`` one-hot encoded features with the Deep
components of the model connected to the output neuron(s), after the different
Deep components have been themselves combined through a FC-Head (referred as
as ``deephead``).
Architecture 2 combines the ``Wide`` linear model with the Deep components of
the model connected to the output neuron(s), after the different Deep
components have themselves been combined through an FC-Head (which I refer to
as ``deephead``).

In math terms, and following the notation in the `paper
<https://arxiv.org/abs/1606.07792>`_, Architecture 2 can be formulated as:
File renamed without changes.
File renamed without changes.
3 changes: 1 addition & 2 deletions docs/model_components.rst
@@ -1,10 +1,9 @@
The ``models`` module
=====================
======================

This module contains the four main Wide and Deep model components. These are:
``Wide``, ``DeepDense``, ``DeepText`` and ``DeepImage``.


.. autoclass:: pytorch_widedeep.models.wide.Wide
:members:
:undoc-members:
4 changes: 3 additions & 1 deletion docs/quick_start.rst
@@ -15,6 +15,7 @@ The following code snippet is not directly related to ``pytorch-widedeep``.
.. code-block:: python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
df = pd.read_csv("data/adult/adult.csv.zip")
@@ -23,6 +24,7 @@ The following code snippet is not directly related to ``pytorch-widedeep``.
df_train, df_test = train_test_split(df, test_size=0.2, stratify=df.income_label)
Prepare the wide and deep columns
---------------------------------

@@ -63,7 +65,7 @@ Preprocessing and model components definition
# wide
preprocess_wide = WidePreprocessor(wide_cols=wide_cols, crossed_cols=cross_cols)
X_wide = preprocess_wide.fit_transform(df_train)
wide = Wide(wide_dim=X_wide.shape[1], pred_dim=1)
wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)
# deepdense
preprocess_deep = DensePreprocessor(embed_cols=embed_cols, continuous_cols=cont_cols)
3 changes: 3 additions & 0 deletions docs/wide_deep/wide_deep.rst → docs/wide_deep.rst
@@ -1,6 +1,9 @@
Building Wide and Deep Models
=============================

Here is the documentation on how to build the two architectures, and the
different options available in ``pytorch-widedeep`` as one builds the model.

:class:`pytorch_widedeep.models.wide_deep.WideDeep` is the main class. It will
collect all model components and build one of the two possible architectures
with a series of optional parameters.
15 changes: 0 additions & 15 deletions docs/wide_deep/index.rst

This file was deleted.

94 changes: 77 additions & 17 deletions examples/01_Preprocessors_and_utils.ipynb
@@ -50,7 +50,9 @@
"source": [
"## 1. WidePreprocessor\n",
"\n",
"This class simply takes a dataset and one-hot encodes it, with a few additional rings and bells. "
"The Wide component of the model is a linear model that in principle, could be implemented as a linear layer receiving the result of on one-hot encoding categorical columns. However, this is not memory efficient. Therefore, we implement a liner layer as an Embedding layer plus a bias. I will explain in a bit more detail later. \n",
"\n",
"With that in mind, `WidePreprocessor` simply encodes the categories numerically so that they are the indexes of the lookup table that is an Embedding layer."
]
},
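(Editorial aside, not part of the notebook: the equivalence the cell above describes can be sketched in plain PyTorch. The snippet below illustrates the general idea rather than the library's actual `Wide` code: a linear layer applied to one-hot encoded categories computes the same result as an embedding lookup of per-category weights plus a bias.)

```python
import torch
import torch.nn as nn

n_categories, pred_dim = 5, 1

# One-hot route: a Linear layer over a (batch, n_categories) one-hot matrix.
linear = nn.Linear(n_categories, pred_dim)

# Embedding route: one weight row per category plus a shared bias.
embedding = nn.Embedding(n_categories, pred_dim)
bias = nn.Parameter(torch.zeros(pred_dim))

# Tie the parameters so both routes use identical weights for the comparison.
with torch.no_grad():
    embedding.weight.copy_(linear.weight.t())
    bias.copy_(linear.bias)

idx = torch.tensor([0, 3, 4])            # integer-encoded categories
one_hot = torch.eye(n_categories)[idx]   # their one-hot representation

out_linear = linear(one_hot)             # (3, pred_dim)
out_embed = embedding(idx) + bias        # same result, no one-hot matrix needed

assert torch.allclose(out_linear, out_embed)
```

With several wide columns, the lookups for each column would be summed before adding the bias, which is why the integer encoding produced by `WidePreprocessor` is enough.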
{
@@ -284,13 +286,13 @@
{
"data": {
"text/plain": [
"array([[0., 1., 0., ..., 0., 0., 0.],\n",
" [0., 0., 0., ..., 0., 0., 0.],\n",
" [0., 0., 0., ..., 0., 0., 0.],\n",
"array([[ 1, 17, 23, ..., 89, 91, 316],\n",
" [ 2, 18, 23, ..., 89, 92, 317],\n",
" [ 3, 18, 24, ..., 89, 93, 318],\n",
" ...,\n",
" [0., 0., 0., ..., 0., 0., 0.],\n",
" [0., 0., 0., ..., 0., 0., 0.],\n",
" [0., 0., 0., ..., 0., 0., 0.]])"
" [ 2, 20, 23, ..., 90, 103, 323],\n",
" [ 2, 17, 23, ..., 89, 103, 323],\n",
" [ 2, 21, 29, ..., 90, 115, 324]])"
]
},
"execution_count": 6,
@@ -306,45 +308,103 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"or sparse"
"Let's take from example the first entry"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"array([ 1, 17, 23, 32, 47, 89, 91, 316])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wide_preprocessor_sparse = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols, sparse=True)\n",
"X_wide_sparse = wide_preprocessor_sparse.fit_transform(df)"
"X_wide[0]"
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>education</th>\n",
" <th>relationship</th>\n",
" <th>workclass</th>\n",
" <th>occupation</th>\n",
" <th>native-country</th>\n",
" <th>gender</th>\n",
" <th>education_occupation</th>\n",
" <th>native-country_occupation</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>11th</td>\n",
" <td>Own-child</td>\n",
" <td>Private</td>\n",
" <td>Machine-op-inspct</td>\n",
" <td>United-States</td>\n",
" <td>Male</td>\n",
" <td>11th-Machine-op-inspct</td>\n",
" <td>United-States-Machine-op-inspct</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"<48842x796 sparse matrix of type '<class 'numpy.float64'>'\n",
"\twith 390736 stored elements in Compressed Sparse Row format>"
" education relationship workclass occupation native-country gender \\\n",
"0 11th Own-child Private Machine-op-inspct United-States Male \n",
"\n",
" education_occupation native-country_occupation \n",
"0 11th-Machine-op-inspct United-States-Machine-op-inspct "
]
},
"execution_count": 8,
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_wide_sparse"
"wide_preprocessor.inverse_transform(X_wide[:1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that while this will save memory on disk, due to the batch generation process for `WideDeep` the running time will be notably slow. See [here](https://github.com/jrzaurin/pytorch-widedeep/blob/bfbe6e5d2309857db0dcc5cf3282dfa60504aa52/pytorch_widedeep/models/_wd_dataset.py#L47) for more details."
"As we can see, `wide_preprocessor` numerically encodes the `wide_cols` and the `crossed_cols`, which can be recovered using the method `inverse_transform`."
]
},
{