diff --git a/README.md b/README.md index 4c6f1d7e..adf85e35 100644 --- a/README.md +++ b/README.md @@ -5,6 +5,12 @@ [![Build Status](https://travis-ci.org/jrzaurin/pytorch-widedeep.svg?branch=master)](https://travis-ci.org/jrzaurin/pytorch-widedeep) [![Documentation Status](https://readthedocs.org/projects/pytorch-widedeep/badge/?version=latest)](https://pytorch-widedeep.readthedocs.io/en/latest/?badge=latest) +[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/jrzaurin/pytorch-widedeep/graphs/commit-activity) + +Platform | Version Support +---------|:--------------- +OSX | [![Python 3.6 3.7](https://img.shields.io/badge/python-3.6%20%203.7-blue.svg)](https://www.python.org/) +Linux | [![Python 3.6 3.7 3.8](https://img.shields.io/badge/python-3.6%20%203.7%203.8-blue.svg)](https://www.python.org/) # pytorch-widedeep @@ -34,11 +40,11 @@ few lines of code.

-Architecture 1 combines the `Wide`, one-hot encoded features with the outputs -from the `DeepDense`, `DeepText` and `DeepImage` components connected to a -final output neuron or neurons, depending on whether we are performing a -binary classification or regression, or a multi-class classification. The -components within the faded-pink rectangles are concatenated. +Architecture 1 combines the `Wide`, Linear model with the outputs from the +`DeepDense`, `DeepText` and `DeepImage` components connected to a final output +neuron or neurons, depending on whether we are performing a binary +classification or regression, or a multi-class classification. The components +within the faded-pink rectangles are concatenated. In math terms, and following the notation in the [paper](https://arxiv.org/abs/1606.07792), Architecture 1 can be formulated @@ -65,10 +71,10 @@ otherwise".*

-Architecture 2 combines the `Wide` one-hot encoded features with the Deep -components of the model connected to the output neuron(s), after the different -Deep components have been themselves combined through a FC-Head (that I refer -as `deephead`). +Architecture 2 combines the `Wide`, Linear model with the Deep components of +the model connected to the output neuron(s), after the different Deep +components have been themselves combined through a FC-Head (that I refer as +`deephead`). In math terms, and following the notation in the [paper](https://arxiv.org/abs/1606.07792), Architecture 2 can be formulated @@ -84,7 +90,8 @@ and `DeepImage` are optional. `pytorch-widedeep` includes standard text (stack of LSTMs) and image (pre-trained ResNets or stack of CNNs) models. However, the user can use any custom model as long as it has an attribute called `output_dim` with the size of the last layer of activations, so that -`WideDeep` can be constructed. See the examples folder for more information. +`WideDeep` can be constructed. See the examples folder or the docs for more +information. ### Installation @@ -112,14 +119,6 @@ cd pytorch-widedeep pip install -e . ``` -### Examples - -There are a number of notebooks in the `examples` folder plus some additional -files. These notebooks cover most of the utilities of this package and can -also act as documentation. In the case that github does not render the -notebooks, or it renders them missing some parts, they are saved as markdown -files in the `docs` folder. - ### Quick start Binary classification with the [adult @@ -128,6 +127,7 @@ using `Wide` and `DeepDense` and defaults settings. 
```python import pandas as pd +import numpy as np from sklearn.model_selection import train_test_split from pytorch_widedeep.preprocessing import WidePreprocessor, DensePreprocessor @@ -166,7 +166,7 @@ target = df_train[target_col].values # wide preprocess_wide = WidePreprocessor(wide_cols=wide_cols, crossed_cols=cross_cols) X_wide = preprocess_wide.fit_transform(df_train) -wide = Wide(wide_dim=X_wide.shape[1], pred_dim=1) +wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1) # deepdense preprocess_deep = DensePreprocessor(embed_cols=embed_cols, continuous_cols=cont_cols) diff --git a/VERSION b/VERSION index f7abe273..c8a5397f 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.4.2 \ No newline at end of file +0.4.5 \ No newline at end of file diff --git a/docs/wide_deep/callbacks.rst b/docs/callbacks.rst similarity index 100% rename from docs/wide_deep/callbacks.rst rename to docs/callbacks.rst diff --git a/docs/figures/architecture_1.png b/docs/figures/architecture_1.png index a9fb25df..5ffa7c98 100644 Binary files a/docs/figures/architecture_1.png and b/docs/figures/architecture_1.png differ diff --git a/docs/figures/architecture_2.png b/docs/figures/architecture_2.png index 074af49d..7c0068bc 100644 Binary files a/docs/figures/architecture_2.png and b/docs/figures/architecture_2.png differ diff --git a/docs/index.rst b/docs/index.rst index 8dbe21e6..02f4291a 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -18,7 +18,10 @@ Documentation Utilities Preprocessing Model Components - Wide and Deep Models + Metrics + Callbacks + Focal Loss + Wide and Deep Models Examples @@ -45,12 +48,11 @@ Architectures :width: 600px :align: center -Architecture 1 combines the ``Wide``, one-hot encoded features with the -outputs from the ``DeepDense``, ``DeepText`` and ``DeepImage`` components -connected to a final output neuron or neurons, depending on whether we are -performing a binary classification or regression, or a multi-class -classification. 
The components within the faded-pink rectangles are -concatenated. +Architecture 1 combines the `Wide`, Linear model with the outputs from the +`DeepDense`, `DeepText` and `DeepImage` components connected to a final output +neuron or neurons, depending on whether we are performing a binary +classification or regression, or a multi-class classification. The components +within the faded-pink rectangles are concatenated. In math terms, and following the notation in the `paper `_, Architecture 1 can be formulated as: @@ -76,10 +78,10 @@ is the activation function. :width: 600px :align: center -Architecture 2 combines the ``Wide`` one-hot encoded features with the Deep -components of the model connected to the output neuron(s), after the different -Deep components have been themselves combined through a FC-Head (referred as -as ``deephead``). +Architecture 2 combines the `Wide`, Linear model with the Deep components of +the model connected to the output neuron(s), after the different Deep +components have been themselves combined through a FC-Head (that I refer as +`deephead`). In math terms, and following the notation in the `paper `_, Architecture 2 can be formulated as: diff --git a/docs/wide_deep/losses.rst b/docs/losses.rst similarity index 100% rename from docs/wide_deep/losses.rst rename to docs/losses.rst diff --git a/docs/wide_deep/metrics.rst b/docs/metrics.rst similarity index 100% rename from docs/wide_deep/metrics.rst rename to docs/metrics.rst diff --git a/docs/model_components.rst b/docs/model_components.rst index b308d21c..cca672c6 100644 --- a/docs/model_components.rst +++ b/docs/model_components.rst @@ -1,10 +1,9 @@ The ``models`` module -===================== +====================== This module contains the four main Wide and Deep model component. These are: ``Wide``, ``DeepDense``, ``DeepText`` and ``DeepImage``. - .. 
autoclass:: pytorch_widedeep.models.wide.Wide :members: :undoc-members: diff --git a/docs/quick_start.rst b/docs/quick_start.rst index 2fc305a5..b37fff65 100644 --- a/docs/quick_start.rst +++ b/docs/quick_start.rst @@ -15,6 +15,7 @@ The following code snippet is not directly related to ``pytorch-widedeep``. .. code-block:: python import pandas as pd + import numpy as np from sklearn.model_selection import train_test_split df = pd.read_csv("data/adult/adult.csv.zip") @@ -23,6 +24,7 @@ The following code snippet is not directly related to ``pytorch-widedeep``. df_train, df_test = train_test_split(df, test_size=0.2, stratify=df.income_label) + Prepare the wide and deep columns --------------------------------- @@ -63,7 +65,7 @@ Preprocessing and model components definition # wide preprocess_wide = WidePreprocessor(wide_cols=wide_cols, crossed_cols=cross_cols) X_wide = preprocess_wide.fit_transform(df_train) - wide = Wide(wide_dim=X_wide.shape[1], pred_dim=1) + wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1) # deepdense preprocess_deep = DensePreprocessor(embed_cols=embed_cols, continuous_cols=cont_cols) diff --git a/docs/wide_deep/wide_deep.rst b/docs/wide_deep.rst similarity index 72% rename from docs/wide_deep/wide_deep.rst rename to docs/wide_deep.rst index 82f909e7..0675afef 100644 --- a/docs/wide_deep/wide_deep.rst +++ b/docs/wide_deep.rst @@ -1,6 +1,9 @@ Building Wide and Deep Models ============================= +Here is the documentation to build the two architectures, and the different +options available in ``pytorch-widedeep`` as one builds the model. + :class:`pytorch_widedeep.models.wide_deep.WideDeep` is the main class. It will collect all model components and build one of the two possible architectures with a series of optional parameters. 
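The quick-start change above sizes the wide component with `np.unique(X_wide).shape[0]` rather than `X_wide.shape[1]`. A minimal, dependency-free sketch of why that count is the relevant one, and of why summing embedding rows plus a bias matches a linear layer applied to one-hot input, could look as follows. The toy rows, the `vocab` dictionary and the helper names are illustrative assumptions and not part of `pytorch-widedeep`; the random weights stand in for learned parameters.

```python
import random

# Hypothetical toy data: two categorical columns with three distinct values each.
columns = ["color", "size"]
rows = [("r", "s"), ("b", "n"), ("g", "l")]

# One global vocabulary over all columns; index 0 is reserved for unseen values.
vocab = {}
for col_idx, col in enumerate(columns):
    for row in rows:
        key = f"{col}_{row[col_idx]}"
        if key not in vocab:
            vocab[key] = len(vocab) + 1


def label_encode(row):
    """Map each (column, value) pair to its integer id, or 0 if unseen."""
    return [vocab.get(f"{col}_{value}", 0) for col, value in zip(columns, row)]


def one_hot(row):
    """Dense one-hot vector with one slot per vocabulary id (slot 0 = padding)."""
    vec = [0.0] * (len(vocab) + 1)
    for idx in label_encode(row):
        vec[idx] = 1.0
    return vec


random.seed(0)
# One weight per vocabulary id; the padding slot is fixed at zero.
weights = [0.0] + [random.uniform(-1, 1) for _ in vocab]
bias = 0.5


def linear_on_one_hot(row):
    """Linear layer: dot product of the weights with the one-hot vector, plus bias."""
    return sum(w * x for w, x in zip(weights, one_hot(row))) + bias


def embedding_sum(row):
    """Embedding lookup: pick the rows indexed by the ids, sum them, add the bias."""
    return sum(weights[idx] for idx in label_encode(row)) + bias


print(label_encode(("r", "s")))  # [1, 4]
print(label_encode(("x", "s")))  # [0, 4], since 'x' was never seen
for row in rows:
    assert abs(linear_on_one_hot(row) - embedding_sum(row)) < 1e-12
```

In this sketch `len(vocab)` plays the role of `wide_dim`, and the extra slot at index 0 corresponds to the `wide_dim + 1` rows that `Wide` allocates with `padding_idx=0`.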
diff --git a/docs/wide_deep/index.rst b/docs/wide_deep/index.rst deleted file mode 100644 index c95cbf17..00000000 --- a/docs/wide_deep/index.rst +++ /dev/null @@ -1,15 +0,0 @@ -Wide and Deep Models -===================== - -Here is the documentation to build the two architectures, and the different -options available in ``pytorch-widedeep`` as one builds the model. - -Objects -------- - -.. toctree:: - - metrics - callbacks - losses - wide_deep \ No newline at end of file diff --git a/examples/01_Preprocessors_and_utils.ipynb b/examples/01_Preprocessors_and_utils.ipynb index b0fbdb48..8457bb6c 100644 --- a/examples/01_Preprocessors_and_utils.ipynb +++ b/examples/01_Preprocessors_and_utils.ipynb @@ -50,7 +50,9 @@ "source": [ "## 1. WidePreprocessor\n", "\n", - "This class simply takes a dataset and one-hot encodes it, with a few additional rings and bells. " + "The Wide component of the model is a linear model that, in principle, could be implemented as a linear layer receiving the result of one-hot encoding the categorical columns. However, this is not memory efficient. Therefore, we implement the linear layer as an Embedding layer plus a bias. I will explain this in a bit more detail later. \n", + "\n", + "With that in mind, `WidePreprocessor` simply encodes the categories numerically so that they are the indexes of the lookup table that is an Embedding layer." 
] }, { @@ -284,13 +286,13 @@ { "data": { "text/plain": [ - "array([[0., 1., 0., ..., 0., 0., 0.],\n", - " [0., 0., 0., ..., 0., 0., 0.],\n", - " [0., 0., 0., ..., 0., 0., 0.],\n", + "array([[ 1, 17, 23, ..., 89, 91, 316],\n", + " [ 2, 18, 23, ..., 89, 92, 317],\n", + " [ 3, 18, 24, ..., 89, 93, 318],\n", " ...,\n", - " [0., 0., 0., ..., 0., 0., 0.],\n", - " [0., 0., 0., ..., 0., 0., 0.],\n", - " [0., 0., 0., ..., 0., 0., 0.]])" + " [ 2, 20, 23, ..., 90, 103, 323],\n", + " [ 2, 17, 23, ..., 89, 103, 323],\n", + " [ 2, 21, 29, ..., 90, 115, 324]])" ] }, "execution_count": 6, @@ -306,45 +308,103 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "or sparse" + "Let's take as an example the first entry" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 1, 17, 23, 32, 47, 89, 91, 316])" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "wide_preprocessor_sparse = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols, sparse=True)\n", - "X_wide_sparse = wide_preprocessor_sparse.fit_transform(df)" + "X_wide[0]" ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 10, "metadata": {}, "outputs": [ { "data": { + "text/html": [ + "
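<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>education</th>\n", + " <th>relationship</th>\n", + " <th>workclass</th>\n", + " <th>occupation</th>\n", + " <th>native-country</th>\n", + " <th>gender</th>\n", + " <th>education_occupation</th>\n", + " <th>native-country_occupation</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>0</th>\n", + " <td>11th</td>\n", + " <td>Own-child</td>\n", + " <td>Private</td>\n", + " <td>Machine-op-inspct</td>\n", + " <td>United-States</td>\n", + " <td>Male</td>\n", + " <td>11th-Machine-op-inspct</td>\n", + " <td>United-States-Machine-op-inspct</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "</div>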
" + ], "text/plain": [ - "<48842x796 sparse matrix of type ''\n", - "\twith 390736 stored elements in Compressed Sparse Row format>" + " education relationship workclass occupation native-country gender \\\n", + "0 11th Own-child Private Machine-op-inspct United-States Male \n", + "\n", + " education_occupation native-country_occupation \n", + "0 11th-Machine-op-inspct United-States-Machine-op-inspct " ] }, - "execution_count": 8, + "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "X_wide_sparse" + "wide_preprocessor.inverse_transform(X_wide[:1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Note that while this will save memory on disk, due to the batch generation process for `WideDeep` the running time will be notably slow. See [here](https://github.com/jrzaurin/pytorch-widedeep/blob/bfbe6e5d2309857db0dcc5cf3282dfa60504aa52/pytorch_widedeep/models/_wd_dataset.py#L47) for more details." + "As we can see, `wide_preprocessor` numerically encodes the `wide_cols` and the `crossed_cols`, which can be recovered using the method `inverse_transform`." ] }, { diff --git a/examples/02_Model_Components.ipynb b/examples/02_Model_Components.ipynb index 8e4fe64a..d5ea250d 100644 --- a/examples/02_Model_Components.ipynb +++ b/examples/02_Model_Components.ipynb @@ -23,7 +23,11 @@ "source": [ "### 1. Wide\n", "\n", - "The wide component is simply a Linear layer \"plugged\" into the output neuron(s)" + "The wide component is a Linear layer \"plugged\" into the output neuron(s)\n", + "\n", + "The only particularity of our implementation is that we have implemented the linear layer via an Embedding layer plus a bias. While the implementations are equivalent, the latter is faster and far more memory efficient, since we do not need to one hot encode the categorical features. 
\n", + "\n", + "Let's assume we have the following dataset:" ] }, { @@ -31,13 +35,199 @@ "execution_count": 1, "metadata": {}, "outputs": [], + "source": [ + "import torch\n", + "import pandas as pd\n", + "import numpy as np\n", + "\n", + "from torch import nn" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
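<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n", + " <th></th>\n", + " <th>color</th>\n", + " <th>size</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr>\n", + " <th>0</th>\n", + " <td>r</td>\n", + " <td>s</td>\n", + " </tr>\n", + " <tr>\n", + " <th>1</th>\n", + " <td>b</td>\n", + " <td>n</td>\n", + " </tr>\n", + " <tr>\n", + " <th>2</th>\n", + " <td>g</td>\n", + " <td>l</td>\n", + " </tr>\n", + " </tbody>\n", + "</table>\n", + "</div>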
" + ], + "text/plain": [ + " color size\n", + "0 r s\n", + "1 b n\n", + "2 g l" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = pd.DataFrame({'color': ['r', 'b', 'g'], 'size': ['s', 'n', 'l']})\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "one hot encoded, the first observation would be" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "obs_0_oh = (np.array([1., 0., 0., 1., 0., 0.])).astype('float32')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "if we simply numerically encode (label encode or `le`) the values, starting from 1 (we will save 0 for padding, i.e. unseen values)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "obs_0_le = (np.array([0, 3])).astype('int64')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "now, let's see if the two implementations are equivalent" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "# we have 6 different values. 
Let's assume we are performing a regression, so pred_dim = 1\n", + "lin = nn.Linear(6, 1)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "emb = nn.Embedding(6, 1) \n", + "emb.weight = nn.Parameter(lin.weight.reshape_as(emb.weight))" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "tensor([-0.9452], grad_fn=)" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "lin(torch.tensor(obs_0_oh))" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "tensor([-0.9452], grad_fn=)" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "emb(torch.tensor(obs_0_le)).sum() + lin.bias" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And this is precisely how the linear component `Wide` is implemented" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], "source": [ "from pytorch_widedeep.models import Wide" ] }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 92, "metadata": {}, "outputs": [], "source": [ @@ -46,27 +236,34 @@ }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Wide(\n", - " (wide_linear): Linear(in_features=100, out_features=1, bias=True)\n", + " (wide_linear): Embedding(11, 1, padding_idx=0)\n", ")" ] }, - "execution_count": 2, + "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "wide = Wide(100, 1)\n", + "wide = Wide(wide_dim=10, pred_dim=1)\n", "wide" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that even though the input dim is 10, the Embedding layer has 11 weights. 
This is because we save 0 for padding, which is used for unseen values during the encoding process" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -78,12 +275,10 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 96, "metadata": {}, "outputs": [], "source": [ - "import torch\n", - "\n", "from pytorch_widedeep.models import DeepDense" ] }, diff --git a/examples/03_Binary_Classification_with_Defaults.ipynb b/examples/03_Binary_Classification_with_Defaults.ipynb index 1d97d8fa..c645333c 100644 --- a/examples/03_Binary_Classification_with_Defaults.ipynb +++ b/examples/03_Binary_Classification_with_Defaults.ipynb @@ -419,14 +419,14 @@ "name": "stdout", "output_type": "stream", "text": [ - "[[0. 1. 0. ... 0. 0. 0.]\n", - " [0. 0. 0. ... 0. 0. 0.]\n", - " [0. 0. 0. ... 0. 0. 0.]\n", + "[[ 1 17 23 ... 89 91 316]\n", + " [ 2 18 23 ... 89 92 317]\n", + " [ 3 18 24 ... 89 93 318]\n", " ...\n", - " [0. 0. 0. ... 0. 0. 0.]\n", - " [0. 0. 0. ... 0. 0. 0.]\n", - " [0. 0. 0. ... 0. 0. 0.]]\n", - "(48842, 796)\n" + " [ 2 20 23 ... 90 103 323]\n", + " [ 2 17 23 ... 89 103 323]\n", + " [ 2 21 29 ... 
90 115 324]]\n", + "(48842, 8)\n" ] } ], @@ -479,7 +479,7 @@ "metadata": {}, "outputs": [], "source": [ - "wide = Wide(wide_dim=X_wide.shape[1], pred_dim=1)\n", + "wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)\n", "deepdense = DeepDense(hidden_layers=[64,32], \n", " deep_column_idx=preprocess_deep.deep_column_idx,\n", " embed_input=preprocess_deep.embeddings_input,\n", @@ -497,7 +497,7 @@ "text/plain": [ "WideDeep(\n", " (wide): Wide(\n", - " (wide_linear): Linear(in_features=796, out_features=1, bias=True)\n", + " (wide_linear): Embedding(797, 1, padding_idx=0)\n", " )\n", " (deepdense): Sequential(\n", " (0): DeepDense(\n", @@ -577,7 +577,7 @@ "output_type": "stream", "text": [ "\r", - " 0%| | 0/153 [00:00 Tuple[int, int]: r""" diff --git a/pytorch_widedeep/models/_wd_dataset.py b/pytorch_widedeep/models/_wd_dataset.py index fafde273..aa5dc6e4 100644 --- a/pytorch_widedeep/models/_wd_dataset.py +++ b/pytorch_widedeep/models/_wd_dataset.py @@ -11,12 +11,8 @@ class WideDeepDataset(Dataset): Parameters ---------- - X_wide: np.ndarray, scipy csr sparse matrix. - wide input.Note that if a sparse matrix is passed to the - WideDeepDataset class, the loading process will be notably slow since - the transformation to a dense matrix is done on an index basis 'on the - fly'. At the moment this is the best option given the current support - offered for sparse tensors for pytorch. + X_wide: np.ndarray + wide input X_deep: np.ndarray deepdense input X_text: np.ndarray @@ -24,13 +20,14 @@ class WideDeepDataset(Dataset): X_img: np.ndarray deepimage input target: np.ndarray - transforms: MultipleTransforms() object (which is in itself a torchvision - Compose). See in models/_multiple_transforms.py + target array + transforms: :obj:`MultipleTransforms` + torchvision Compose object. 
See models/_multiple_transforms.py """ def __init__( self, - X_wide: Union[np.ndarray, sparse_matrix], + X_wide: np.ndarray, X_deep: np.ndarray, target: Optional[np.ndarray] = None, X_text: Optional[np.ndarray] = None, @@ -53,10 +50,7 @@ def __init__( def __getitem__(self, idx: int): # X_wide and X_deep are assumed to be *always* present - if isinstance(self.X_wide, sparse_matrix): - X = Bunch(wide=np.array(self.X_wide[idx].todense()).squeeze()) - else: - X = Bunch(wide=self.X_wide[idx]) + X = Bunch(wide=self.X_wide[idx]) X.deepdense = self.X_deep[idx] if self.X_text is not None: X.deeptext = self.X_text[idx] diff --git a/pytorch_widedeep/models/wide.py b/pytorch_widedeep/models/wide.py index 10cc7906..eaf4c0f3 100644 --- a/pytorch_widedeep/models/wide.py +++ b/pytorch_widedeep/models/wide.py @@ -1,16 +1,24 @@ +import math + +import torch from torch import nn from ..wdtypes import * class Wide(nn.Module): - r"""Simple linear layer that will receive the one-hot encoded `'wide'` - input and connect it to the output neuron(s). + r"""Wide component + + Linear model implemented via an Embedding layer connected to the output + neuron(s). Parameters ----------- wide_dim: int - size of the input tensor + size of the Embedding layer. `wide_dim` is the summation of all the + individual values for all the features that go through the wide + component. 
For example, if the wide component receives 2 features with + 5 individual values each, `wide_dim = 10` pred_dim: int size of the ouput tensor containing the predictions @@ -23,21 +31,34 @@ class Wide(nn.Module): -------- >>> import torch >>> from pytorch_widedeep.models import Wide - >>> X = torch.empty(4, 4).random_(2) - >>> wide = Wide(wide_dim=X.size(0), pred_dim=1) + >>> X = torch.empty(4, 4).random_(6) + >>> wide = Wide(wide_dim=X.unique().size(0), pred_dim=1) >>> wide(X) - tensor([[-0.8841], - [-0.8633], - [-1.2713], - [-0.4762]], grad_fn=) + tensor([[-0.1138], + [ 0.4603], + [ 1.0762], + [ 0.8160]], grad_fn=) """ def __init__(self, wide_dim: int, pred_dim: int = 1): super(Wide, self).__init__() - self.wide_linear = nn.Linear(wide_dim, pred_dim) + self.wide_linear = nn.Embedding(wide_dim + 1, pred_dim, padding_idx=0) + # (Sum(Embedding) + bias) is equivalent to (OneHotVector + Linear) + self.bias = nn.Parameter(torch.zeros(pred_dim)) + self._reset_parameters() + + def _reset_parameters(self) -> None: + r"""initialize Embedding and bias like nn.Linear. See `original + implementation + `_. + """ + nn.init.kaiming_uniform_(self.wide_linear.weight, a=math.sqrt(5)) + fan_in, _ = nn.init._calculate_fan_in_and_fan_out(self.wide_linear.weight) + bound = 1 / math.sqrt(fan_in) + nn.init.uniform_(self.bias, -bound, bound) def forward(self, X: Tensor) -> Tensor: # type: ignore - r"""Forward pass. Simply connecting the one-hot encoded input with the - ouput neuron(s) """ - out = self.wide_linear(X.float()) + r"""Forward pass. 
Simply connecting the Embedding layer with the ouput + neuron(s)""" + out = self.wide_linear(X.long()).sum(dim=1) + self.bias return out diff --git a/pytorch_widedeep/models/wide_deep.py b/pytorch_widedeep/models/wide_deep.py index 0430af08..06b4e396 100644 --- a/pytorch_widedeep/models/wide_deep.py +++ b/pytorch_widedeep/models/wide_deep.py @@ -232,7 +232,15 @@ def compile( Parameters ---------- method: str - One of `regression`, `binary` or `multiclass` + One of `regression`, `binary` or `multiclass`. The default when + performing a `regression`, a `binary` classification or a + `multiclass` classification is the `mean squared error + `_ + (MSE), `Binary Cross Entropy + `_ + (BCE) and `Cross Entropy + `_ + (CE) respectively. optimizers: Union[Optimizer, Dict[str, Optimizer]], Optional, Default=AdamW - An instance of ``pytorch``'s ``Optimizer`` object (e.g. :obj:`torch.optim.Adam()`) or - a dictionary where there keys are the model components (i.e. @@ -594,7 +602,7 @@ def fit( loss=train_loss, ) else: - t.set_postfix(loss=np.sqrt(train_loss)) + t.set_postfix(loss=train_loss) if self.lr_scheduler: self._lr_scheduler_step(step_location="on_batch_end") self.callback_container.on_batch_end(batch=batch_idx) @@ -626,7 +634,7 @@ def fit( loss=val_loss, ) else: - v.set_postfix(loss=np.sqrt(val_loss)) + v.set_postfix(loss=val_loss) epoch_logs["val_loss"] = val_loss if score is not None: for k, v in score.items(): diff --git a/pytorch_widedeep/preprocessing/_preprocessors.py b/pytorch_widedeep/preprocessing/_preprocessors.py index 0c04f79a..564c0d67 100644 --- a/pytorch_widedeep/preprocessing/_preprocessors.py +++ b/pytorch_widedeep/preprocessing/_preprocessors.py @@ -42,24 +42,28 @@ def fit_transform(self, df: pd.DataFrame): class WidePreprocessor(BasePreprocessor): r"""Preprocessor to prepare the wide input dataset + This Preprocessor prepares the data for the wide, linear component. 
This + linear model is implemented via an Embedding layer that is connected to + the output neuron. ``WidePreprocessor`` simply numerically encodes all the + unique values of all categorical columns ``wide_cols + crossed_cols``. See + the Example below. + Parameters ---------- wide_cols: List[str] - List with the name of the columns that will be one-hot encoded and - passed through the Wide model + List with the name of the columns that will be label encoded and passed + through the Wide model crossed_cols: List[Tuple[str, str]] List of Tuples with the name of the columns that will be `'crossed'` - and then one-hot encoded. e.g. [('education', 'occupation'), ...] - already_dummies: List[str] - List of columns that are already dummies/one-hot encoded, and - therefore do not need to be processed + and then label encoded. e.g. [('education', 'occupation'), ...] Attributes ---------- - one_hot_enc: :obj:`OneHotEncoder` - an instance of :class:`sklearn.preprocessing.OneHotEncoder` wide_crossed_cols: :obj:`List` - List with the names of all columns that will be one-hot encoded + List with the names of all columns that will be label encoded + feature_dict: :obj:`Dict` + Dictionary where the keys are the result of pasting `colname + '_' + + column value` and the values are the corresponding mapped integer. 
Examples -------- @@ -69,67 +73,93 @@ class WidePreprocessor(BasePreprocessor): >>> wide_cols = ['color'] >>> crossed_cols = [('color', 'size')] >>> wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols) - >>> wide_preprocessor.fit_transform(df) - array([[0., 0., 1., 0., 0., 1.], - [1., 0., 0., 1., 0., 0.], - [0., 1., 0., 0., 1., 0.]]) + >>> X_wide = wide_preprocessor.fit_transform(df) + >>> X_wide + array([[1, 4], + [2, 5], + [3, 6]]) + >>> wide_preprocessor.feature_dict + {'color_r': 1, + 'color_b': 2, + 'color_g': 3, + 'color_size_r-s': 4, + 'color_size_b-n': 5, + 'color_size_g-l': 6} + >>> wide_preprocessor.inverse_transform(X_wide) + color color_size + 0 r r-s + 1 b b-n + 2 g g-l """ def __init__( - self, - wide_cols: List[str], - crossed_cols=None, - already_dummies: Optional[List[str]] = None, - sparse=False, - handle_unknown="ignore", + self, wide_cols: List[str], crossed_cols=None, ): super(WidePreprocessor, self).__init__() self.wide_cols = wide_cols self.crossed_cols = crossed_cols - self.already_dummies = already_dummies - self.one_hot_enc = OneHotEncoder(sparse=sparse, handle_unknown=handle_unknown) def fit(self, df: pd.DataFrame) -> BasePreprocessor: """Fits the Preprocessor and creates required attributes """ df_wide = self._prepare_wide(df) self.wide_crossed_cols = df_wide.columns.tolist() - if self.already_dummies: - dummy_cols = [ - c for c in self.wide_crossed_cols if c not in self.already_dummies - ] - self.one_hot_enc.fit(df_wide[dummy_cols]) - else: - self.one_hot_enc.fit(df_wide[self.wide_crossed_cols]) + vocab = self._make_global_feature_list(df_wide[self.wide_crossed_cols]) + # leave 0 as padding index + self.feature_dict = {v: i + 1 for i, v in enumerate(vocab)} return self - def transform(self, df: pd.DataFrame) -> Union[sparse_matrix, np.ndarray]: - """Returns the processed dataframe as a one hot encoded dense or - sparse matrix + def transform(self, df: pd.DataFrame) -> np.array: + r"""Returns the processed 
dataframe """ try: - self.one_hot_enc.categories_ + self.feature_dict except: raise NotFittedError( "This WidePreprocessor instance is not fitted yet. " "Call 'fit' with appropriate arguments before using this estimator." ) df_wide = self._prepare_wide(df) - if self.already_dummies: - X_oh_1 = df_wide[self.already_dummies].values - dummy_cols = [ - c for c in self.wide_crossed_cols if c not in self.already_dummies - ] - X_oh_2 = self.one_hot_enc.transform(df_wide[dummy_cols]) - return np.hstack((X_oh_1, X_oh_2)) - else: - return self.one_hot_enc.transform(df_wide[self.wide_crossed_cols]) + encoded = np.zeros([len(df_wide), len(self.wide_crossed_cols)], dtype=np.long) + for col_i, col in enumerate(self.wide_crossed_cols): + encoded[:, col_i] = df_wide[col].apply( + lambda x: self.feature_dict[col + "_" + str(x)] + if col + "_" + str(x) in self.feature_dict + else 0 + ) + return encoded.astype("int64") + + def inverse_transform(self, encoded: np.ndarray) -> pd.DataFrame: + r"""Takes as input the output from the ``transform`` method and it will + return the original values. 
- def fit_transform(self, df: pd.DataFrame) -> Union[sparse_matrix, np.ndarray]: + Parameters + ---------- + encoded: np.ndarray + array with the output of the ``transform`` method + """ + decoded = pd.DataFrame(encoded, columns=self.wide_crossed_cols) + inverse_dict = {k: v for v, k in self.feature_dict.items()} + decoded = decoded.applymap(lambda x: inverse_dict[x]) + for col in decoded.columns: + rm_str = "".join([col, "_"]) + decoded[col] = decoded[col].apply(lambda x: x.replace(rm_str, "")) + return decoded + + def fit_transform(self, df: pd.DataFrame) -> np.ndarray: """Combines ``fit`` and ``transform`` """ return self.fit(df).transform(df) + def _make_global_feature_list(self, df: pd.DataFrame) -> List: + vocab = [] + for column in df.columns: + vocab += self._make_column_feature_list(df[column]) + return vocab + + def _make_column_feature_list(self, s: pd.Series) -> List: + return [s.name + "_" + str(x) for x in s.unique()] + def _cross_cols(self, df: pd.DataFrame): df_cc = df.copy() crossed_colnames = [] diff --git a/pytorch_widedeep/version.py b/pytorch_widedeep/version.py index df124332..98a433b3 100644 --- a/pytorch_widedeep/version.py +++ b/pytorch_widedeep/version.py @@ -1 +1 @@ -__version__ = "0.4.2" +__version__ = "0.4.5" diff --git a/pytorch_widedeep/wdtypes.py b/pytorch_widedeep/wdtypes.py index 232e7d83..ed46ddc3 100644 --- a/pytorch_widedeep/wdtypes.py +++ b/pytorch_widedeep/wdtypes.py @@ -18,7 +18,6 @@ from torch import Tensor from torch.nn import Module -from scipy.sparse.csr import csr_matrix as sparse_matrix from torch.optim.optimizer import Optimizer from torchvision.transforms import ( Pad, diff --git a/tests/test_data_utils/test_du_wide.py b/tests/test_data_utils/test_du_wide.py index ec250eb3..ed02d8ae 100644 --- a/tests/test_data_utils/test_du_wide.py +++ b/tests/test_data_utils/test_du_wide.py @@ -39,7 +39,7 @@ def create_test_dataset(input_type, with_crossed=True): ) def test_preprocessor1(input_df, expected_shape): wide_mtx = 
preprocessor1.fit_transform(input_df)
-    assert wide_mtx.shape[1] == expected_shape
+    assert np.unique(wide_mtx).shape[0] == expected_shape
 ###############################################################################
@@ -63,4 +63,4 @@ def test_preprocessor1(input_df, expected_shape):
 )
 def test_prepare_wide_wo_crossed(input_df, expected_shape):
     wide_mtx = preprocessor2.fit_transform(input_df)
-    assert wide_mtx.shape[1] == expected_shape
+    assert np.unique(wide_mtx).shape[0] == expected_shape
diff --git a/tests/test_model_functioning/test_callbacks.py b/tests/test_model_functioning/test_callbacks.py
index 9c866815..6dcb6ff7 100644
--- a/tests/test_model_functioning/test_callbacks.py
+++ b/tests/test_model_functioning/test_callbacks.py
@@ -16,7 +16,7 @@
 )
 # Wide array
-X_wide = np.random.choice(2, (100, 100), p=[0.8, 0.2])
+X_wide = np.random.choice(50, (100, 10))
 # Deep Array
 colnames = list(string.ascii_lowercase)[:10]
@@ -38,7 +38,7 @@
 ###############################################################################
 # Test that history saves the information adequately
 ###############################################################################
-wide = Wide(100, 1)
+wide = Wide(np.unique(X_wide).shape[0], 1)
 deepdense = DeepDense(
     hidden_layers=[32, 16],
     dropout=[0.5, 0.5],
@@ -92,7 +92,7 @@ def test_history_callback(optimizers, schedulers, len_loss_output, len_lr_output
 # Test that EarlyStopping stops as expected
 ###############################################################################
 def test_early_stop():
-    wide = Wide(100, 1)
+    wide = Wide(np.unique(X_wide).shape[0], 1)
     deepdense = DeepDense(
         hidden_layers=[32, 16],
         dropout=[0.5, 0.5],
@@ -105,7 +105,7 @@ def test_early_stop():
         method="binary",
         callbacks=[
             EarlyStopping(
-                min_delta=0.1, patience=3, restore_best_weights=True, verbose=1
+                min_delta=5.0, patience=3, restore_best_weights=True, verbose=1
             )
         ],
         verbose=1,
@@ -122,7 +122,7 @@ def test_early_stop():
     "save_best_only, max_save, n_files",
[(True, 2, 2), (False, 2, 2), (False, 0, 5)]
 )
 def test_model_checkpoint(save_best_only, max_save, n_files):
-    wide = Wide(100, 1)
+    wide = Wide(np.unique(X_wide).shape[0], 1)
     deepdense = DeepDense(
         hidden_layers=[32, 16],
         dropout=[0.5, 0.5],
diff --git a/tests/test_model_functioning/test_data_inputs.py b/tests/test_model_functioning/test_data_inputs.py
index 7189c2a0..1819fb8b 100644
--- a/tests/test_model_functioning/test_data_inputs.py
+++ b/tests/test_model_functioning/test_data_inputs.py
@@ -14,7 +14,7 @@
 )
 # Wide array
-X_wide = np.random.choice(2, (100, 100), p=[0.8, 0.2])
+X_wide = np.random.choice(50, (100, 100))
 # Deep Array
 colnames = list(string.ascii_lowercase)[:10]
@@ -50,7 +50,7 @@
 ) = train_test_split(X_wide, X_deep, X_text, X_img, target)
 # build model components
-wide = Wide(100, 1)
+wide = Wide(np.unique(X_wide).shape[0], 1)
 deepdense = DeepDense(
     hidden_layers=[32, 16],
     dropout=[0.5, 0.5],
diff --git a/tests/test_model_functioning/test_fit_methods.py b/tests/test_model_functioning/test_fit_methods.py
index d41042c1..e3ec4134 100644
--- a/tests/test_model_functioning/test_fit_methods.py
+++ b/tests/test_model_functioning/test_fit_methods.py
@@ -6,7 +6,7 @@
 from pytorch_widedeep.models import Wide, WideDeep, DeepDense
 # Wide array
-X_wide = np.random.choice(2, (100, 100), p=[0.8, 0.2])
+X_wide = np.random.choice(50, (100, 100))
 # Deep Array
 colnames = list(string.ascii_lowercase)[:10]
@@ -51,7 +51,7 @@ def test_fit_methods(
     pred_dim,
     probs_dim,
 ):
-    wide = Wide(100, pred_dim)
+    wide = Wide(np.unique(X_wide).shape[0], pred_dim)
     deepdense = DeepDense(
         hidden_layers=[32, 16],
         dropout=[0.5, 0.5],
diff --git a/tests/test_model_functioning/test_focal_loss.py b/tests/test_model_functioning/test_focal_loss.py
index 2009e67a..82bb33d5 100644
--- a/tests/test_model_functioning/test_focal_loss.py
+++ b/tests/test_model_functioning/test_focal_loss.py
@@ -6,7 +6,7 @@
 from pytorch_widedeep.models import Wide, WideDeep, DeepDense
 # Wide array
-X_wide =
np.random.choice(2, (100, 100), p=[0.8, 0.2])
+X_wide = np.random.choice(50, (100, 10))
 # Deep Array
 colnames = list(string.ascii_lowercase)[:10]
@@ -32,7 +32,7 @@
     ],
 )
 def test_focal_loss(X_wide, X_deep, target, method, pred_dim, probs_dim):
-    wide = Wide(100, pred_dim)
+    wide = Wide(np.unique(X_wide).shape[0], pred_dim)
     deepdense = DeepDense(
         hidden_layers=[32, 16],
         dropout=[0.5, 0.5],
diff --git a/tests/test_model_functioning/test_initializers.py b/tests/test_model_functioning/test_initializers.py
index e6129544..d97a6d79 100644
--- a/tests/test_model_functioning/test_initializers.py
+++ b/tests/test_model_functioning/test_initializers.py
@@ -19,7 +19,7 @@
 )
 # Wide array
-X_wide = np.random.choice(2, (100, 100), p=[0.8, 0.2])
+X_wide = np.random.choice(50, (100, 100))
 # Deep Array
 colnames = list(string.ascii_lowercase)[:10]
@@ -58,7 +58,7 @@
 def test_initializers_1():
-    wide = Wide(100, 1)
+    wide = Wide(np.unique(X_wide).shape[0], 1)
     deepdense = DeepDense(
         hidden_layers=[32, 16],
         dropout=[0.5, 0.5],
diff --git a/tests/test_warm_up/test_warm_up_routines.py b/tests/test_warm_up/test_warm_up_routines.py
index 8fc2164e..4cd2dbdf 100644
--- a/tests/test_warm_up/test_warm_up_routines.py
+++ b/tests/test_warm_up/test_warm_up_routines.py
@@ -87,7 +87,7 @@ def loss_fn(y_pred, y_true):
 target = torch.empty(100, 1).random_(0, 2)
 # wide
-X_wide = torch.empty(100, 10).random_(0, 2)
+X_wide = torch.empty(100, 4).random_(1, 20)
 # deep
 colnames = list(string.ascii_lowercase)[:10]
@@ -107,7 +107,7 @@ def loss_fn(y_pred, y_true):
 # Define the model components
 # wide
-wide = Wide(10, 1)
+wide = Wide(X_wide.unique().size(0), 1)
 if use_cuda:
     wide.cuda()
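
Reviewer note: the `WidePreprocessor` hunks above replace one-hot encoding with integer label encoding, where each `"<column>_<value>"` string maps to an index and unseen values map to 0. The sketch below illustrates that scheme in isolation. It is not the library's code: `build_feature_dict`, `encode` and `decode` are hypothetical helpers, and starting indices at 1 (so 0 stays free for unseen values) is an assumption, since the construction of `feature_dict` is not visible in this part of the diff.

```python
import numpy as np
import pandas as pd


def build_feature_dict(df, cols):
    # Hypothetical helper: one index per "<column>_<value>" token.
    # Assumption: indices start at 1 so that 0 can mean "unseen value".
    vocab = [f"{c}_{v}" for c in cols for v in df[c].unique()]
    return {token: i + 1 for i, token in enumerate(vocab)}


def encode(df, cols, feature_dict):
    # One integer per cell instead of one one-hot column per value.
    encoded = np.zeros((len(df), len(cols)), dtype="int64")
    for j, c in enumerate(cols):
        tokens = c + "_" + df[c].astype(str)
        encoded[:, j] = tokens.map(lambda t: feature_dict.get(t, 0)).to_numpy()
    return encoded


def decode(encoded, cols, feature_dict):
    # Invert the mapping, then strip the "<column>_" prefix.
    # Like the diff's inverse_transform, this raises KeyError on index 0.
    inverse = {i: token for token, i in feature_dict.items()}
    decoded = pd.DataFrame(encoded, columns=cols)
    for c in cols:
        decoded[c] = decoded[c].map(lambda i: inverse[i][len(c) + 1:])
    return decoded


df = pd.DataFrame({"color": ["red", "blue", "red"], "size": ["S", "M", "S"]})
cols = ["color", "size"]
feature_dict = build_feature_dict(df, cols)
X_wide = encode(df, cols, feature_dict)
```

With this encoding the wide input has one column per feature rather than one per category, which is why the updated tests size the wide component from the number of distinct indices (`np.unique(X_wide).shape[0]`) instead of the number of columns.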