diff --git a/README.md b/README.md index dad4f128..a200e1ca 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@

- +

[![Build Status](https://travis-ci.org/jrzaurin/pytorch-widedeep.svg?branch=master)](https://travis-ci.org/jrzaurin/pytorch-widedeep) @@ -9,11 +9,7 @@ [![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://github.com/jrzaurin/pytorch-widedeep/graphs/commit-activity) [![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/jrzaurin/pytorch-widedeep/issues) [![codecov](https://codecov.io/gh/jrzaurin/pytorch-widedeep/branch/master/graph/badge.svg)](https://codecov.io/gh/jrzaurin/pytorch-widedeep) - -Platform | Version Support ----------|:--------------- -OSX | [![Python 3.6 3.7](https://img.shields.io/badge/python-3.6%20%7C%203.7-blue.svg)](https://www.python.org/) -Linux | [![Python 3.6 3.7 3.8](https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8-blue.svg)](https://www.python.org/) +[![Python 3.6 3.7 3.8](https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8-blue.svg)](https://www.python.org/) # pytorch-widedeep @@ -88,15 +84,23 @@ as:

-When using `pytorch-widedeep`, the assumption is that the so called `Wide` and -`deep dense` (this can be either `DeepDense` or `DeepDenseResnet`. See the -documentation and examples folder for more details) components in the figures -are **always** present, while `DeepText text` and `DeepImage` are optional. +Note that each individual component, `wide`, `deepdense` (either `DeepDense` +or `DeepDenseResnet`), `deeptext` and `deepimage`, can be used independently +and in isolation. For example, one could use only `wide`, which is simply a +linear model. + +On the other hand, while I recommend using the `Wide` and `DeepDense` (or +`DeepDenseResnet`) classes in `pytorch-widedeep` to build the `wide` and +`deepdense` components, it is very likely that users will want to use their own +models in the case of the `deeptext` and `deepimage` components. That is +perfectly possible as long as the custom models have an attribute called +`output_dim` with the size of the last layer of activations, so that +`WideDeep` can be constructed. + `pytorch-widedeep` includes standard text (stack of LSTMs) and image -(pre-trained ResNets or stack of CNNs) models. However, the user can use any -custom model as long as it has an attribute called `output_dim` with the size -of the last layer of activations, so that `WideDeep` can be constructed. See -the examples folder or the docs for more information. +(pre-trained ResNets or stack of CNNs) models. + +See the examples folder or the docs for more information. ### Installation @@ -124,6 +128,28 @@ cd pytorch-widedeep pip install -e . ``` +**Important note for Mac users**: at the time of writing (Dec-2020) the latest +`torch` release is `1.7`. This release has some +[issues](https://stackoverflow.com/questions/64772335/pytorch-w-parallelnative-cpp206) +when running on Mac and the data-loaders will not run in parallel. 
In +addition, since `python 3.8`, [the `multiprocessing` library start method +changed from `'fork'` to +`'spawn'`](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods). +This also affects the data-loaders (for any `torch` version) and they will not +run in parallel. Therefore, for Mac users I recommend using `python 3.6` or +`3.7` and `torch <= 1.6` (with the corresponding, consistent version of +`torchvision`, e.g. `0.7.0` for `torch 1.6`). I do not want to force this +versioning in the `setup.py` file since I expect that all these issues will be +fixed in the future. Therefore, after installing `pytorch-widedeep` via pip or +directly from GitHub, downgrade `torch` and `torchvision` manually: + +```bash +pip install pytorch-widedeep +pip install torch==1.6.0 torchvision==0.7.0 +``` + +None of these issues affect Linux users. + ### Quick start Binary classification with the [adult diff --git a/VERSION b/VERSION index c0a1ac19..5546bd2c 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.4.6 \ No newline at end of file +0.4.7 \ No newline at end of file diff --git a/docs/figures/widedeep_logo.png b/docs/figures/widedeep_logo.png index a444feff..2c703fc6 100644 Binary files a/docs/figures/widedeep_logo.png and b/docs/figures/widedeep_logo.png differ diff --git a/docs/figures/widedeep_logo_old.png b/docs/figures/widedeep_logo_old.png new file mode 100644 index 00000000..a444feff Binary files /dev/null and b/docs/figures/widedeep_logo_old.png differ diff --git a/examples/02_Model_Components.ipynb b/examples/02_Model_Components.ipynb index 5374900c..81caeca0 100644 --- a/examples/02_Model_Components.ipynb +++ b/examples/02_Model_Components.ipynb @@ -130,7 +130,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "if we simply numerically encode (label encode or `le`) the values, starting from 1 (we will save 0 for padding, i.e. 
unseen values)" + "if we simply numerically encode (label encode or `le`) the values:" ] }, { @@ -146,7 +146,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "now, let's see if the two implementations are equivalent" + "Note that in the functioning implementation of the package we start from 1, saving 0 for padding, i.e. unseen values. \n", + "\n", + "Now, let's see if the two implementations are equivalent" ] }, { @@ -261,7 +263,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Note that even though the input dim is 10, the Embedding layer has 11 weights. This is because we save 0 for padding, which is used for unseen values during the encoding process" + "Note that even though the input dim is 10, the Embedding layer has 11 weights. Again, this is because we save 0 for padding, which is used for unseen values during the encoding process" ] }, { diff --git a/examples/03_Binary_Classification_with_Defaults.ipynb b/examples/03_Binary_Classification_with_Defaults.ipynb index 5f7bf029..ced88c6b 100644 --- a/examples/03_Binary_Classification_with_Defaults.ipynb +++ b/examples/03_Binary_Classification_with_Defaults.ipynb @@ -591,16 +591,16 @@ "name": "stderr", "output_type": "stream", "text": [ - "epoch 1: 100%|██████████| 611/611 [00:05<00:00, 115.33it/s, loss=0.743, metrics={'acc': 0.6205, 'prec': 0.2817}]\n", - "valid: 100%|██████████| 153/153 [00:00<00:00, 168.06it/s, loss=0.545, metrics={'acc': 0.6452, 'prec': 0.3014}]\n", - "epoch 2: 100%|██████████| 611/611 [00:04<00:00, 122.57it/s, loss=0.486, metrics={'acc': 0.7765, 'prec': 0.5517}]\n", - "valid: 100%|██████████| 153/153 [00:00<00:00, 158.84it/s, loss=0.44, metrics={'acc': 0.783, 'prec': 0.573}] \n", - "epoch 3: 100%|██████████| 611/611 [00:04<00:00, 124.89it/s, loss=0.419, metrics={'acc': 0.8129, 'prec': 0.6753}]\n", - "valid: 100%|██████████| 153/153 [00:00<00:00, 158.10it/s, loss=0.402, metrics={'acc': 0.815, 'prec': 0.6816}] \n", - "epoch 4: 100%|██████████| 611/611 [00:04<00:00, 
126.35it/s, loss=0.393, metrics={'acc': 0.8228, 'prec': 0.7047}]\n", - "valid: 100%|██████████| 153/153 [00:00<00:00, 160.72it/s, loss=0.385, metrics={'acc': 0.8233, 'prec': 0.7024}]\n", - "epoch 5: 100%|██████████| 611/611 [00:04<00:00, 124.33it/s, loss=0.38, metrics={'acc': 0.826, 'prec': 0.702}] \n", - "valid: 100%|██████████| 153/153 [00:00<00:00, 163.43it/s, loss=0.376, metrics={'acc': 0.8264, 'prec': 0.7}] \n" + "epoch 1: 100%|██████████| 611/611 [00:06<00:00, 101.71it/s, loss=0.448, metrics={'acc': 0.792, 'prec': 0.5728}] \n", + "valid: 100%|██████████| 153/153 [00:00<00:00, 171.00it/s, loss=0.366, metrics={'acc': 0.7991, 'prec': 0.5907}]\n", + "epoch 2: 100%|██████████| 611/611 [00:06<00:00, 101.69it/s, loss=0.361, metrics={'acc': 0.8324, 'prec': 0.6817}]\n", + "valid: 100%|██████████| 153/153 [00:00<00:00, 169.36it/s, loss=0.357, metrics={'acc': 0.8328, 'prec': 0.6807}]\n", + "epoch 3: 100%|██████████| 611/611 [00:05<00:00, 102.65it/s, loss=0.352, metrics={'acc': 0.8366, 'prec': 0.691}] \n", + "valid: 100%|██████████| 153/153 [00:00<00:00, 171.49it/s, loss=0.352, metrics={'acc': 0.8361, 'prec': 0.6867}]\n", + "epoch 4: 100%|██████████| 611/611 [00:06<00:00, 101.52it/s, loss=0.347, metrics={'acc': 0.8389, 'prec': 0.6956}]\n", + "valid: 100%|██████████| 153/153 [00:00<00:00, 163.49it/s, loss=0.349, metrics={'acc': 0.8383, 'prec': 0.6906}]\n", + "epoch 5: 100%|██████████| 611/611 [00:07<00:00, 84.91it/s, loss=0.343, metrics={'acc': 0.8405, 'prec': 0.6987}] \n", + "valid: 100%|██████████| 153/153 [00:01<00:00, 142.83it/s, loss=0.347, metrics={'acc': 0.8399, 'prec': 0.6946}]\n" ] } ], @@ -664,22 +664,88 @@ "name": "stderr", "output_type": "stream", "text": [ - "epoch 1: 100%|██████████| 611/611 [00:05<00:00, 108.62it/s, loss=0.894, metrics={'acc': 0.5182, 'prec': 0.2037}]\n", - "valid: 100%|██████████| 153/153 [00:00<00:00, 154.44it/s, loss=0.604, metrics={'acc': 0.5542, 'prec': 0.2135}]\n", - "epoch 2: 100%|██████████| 611/611 [00:05<00:00, 106.49it/s, 
loss=0.51, metrics={'acc': 0.751, 'prec': 0.4614}] \n", - "valid: 100%|██████████| 153/153 [00:00<00:00, 157.79it/s, loss=0.452, metrics={'acc': 0.7581, 'prec': 0.4898}]\n", - "epoch 3: 100%|██████████| 611/611 [00:05<00:00, 106.66it/s, loss=0.425, metrics={'acc': 0.8031, 'prec': 0.6618}]\n", - "valid: 100%|██████████| 153/153 [00:00<00:00, 160.73it/s, loss=0.405, metrics={'acc': 0.806, 'prec': 0.6686}] \n", - "epoch 4: 100%|██████████| 611/611 [00:05<00:00, 106.58it/s, loss=0.394, metrics={'acc': 0.8185, 'prec': 0.6966}]\n", - "valid: 100%|██████████| 153/153 [00:00<00:00, 155.55it/s, loss=0.385, metrics={'acc': 0.8196, 'prec': 0.6994}]\n", - "epoch 5: 100%|██████████| 611/611 [00:05<00:00, 107.28it/s, loss=0.38, metrics={'acc': 0.8236, 'prec': 0.7004}] \n", - "valid: 100%|██████████| 153/153 [00:00<00:00, 155.37it/s, loss=0.375, metrics={'acc': 0.8244, 'prec': 0.7017}]\n" + "epoch 1: 100%|██████████| 611/611 [00:07<00:00, 77.46it/s, loss=0.387, metrics={'acc': 0.8192, 'prec': 0.6576}]\n", + "valid: 100%|██████████| 153/153 [00:01<00:00, 147.78it/s, loss=0.36, metrics={'acc': 0.8216, 'prec': 0.6617}] \n", + "epoch 2: 100%|██████████| 611/611 [00:08<00:00, 74.99it/s, loss=0.358, metrics={'acc': 0.8313, 'prec': 0.6836}]\n", + "valid: 100%|██████████| 153/153 [00:00<00:00, 158.26it/s, loss=0.355, metrics={'acc': 0.8321, 'prec': 0.6848}]\n", + "epoch 3: 100%|██████████| 611/611 [00:08<00:00, 76.28it/s, loss=0.351, metrics={'acc': 0.8345, 'prec': 0.6889}]\n", + "valid: 100%|██████████| 153/153 [00:00<00:00, 154.84it/s, loss=0.354, metrics={'acc': 0.8347, 'prec': 0.6887}]\n", + "epoch 4: 100%|██████████| 611/611 [00:07<00:00, 76.71it/s, loss=0.346, metrics={'acc': 0.8374, 'prec': 0.6946}]\n", + "valid: 100%|██████████| 153/153 [00:00<00:00, 157.80it/s, loss=0.353, metrics={'acc': 0.8369, 'prec': 0.6935}]\n", + "epoch 5: 100%|██████████| 611/611 [00:08<00:00, 73.25it/s, loss=0.343, metrics={'acc': 0.8386, 'prec': 0.6966}]\n", + "valid: 100%|██████████| 153/153 
[00:00<00:00, 157.05it/s, loss=0.352, metrics={'acc': 0.8382, 'prec': 0.6961}]\n" ] } ], "source": [ "model.fit(X_wide=X_wide, X_deep=X_deep, target=target, n_epochs=5, batch_size=64, val_split=0.2)" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is also worth mentioning that one could build a model with the individual components independently. For example, a model comprising only the `wide` component would simply be a linear model. This can be done with:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [], + "source": [ + "model = WideDeep(wide=wide)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "model.compile(method='binary', metrics=[Accuracy, Precision])" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "\r", + " 0%| | 0/611 [00:00 transpose axis # and normalize if necessary if not self.transforms or "ToTensor" not in self.transforms_names: + if xdi.ndim == 2: + xdi = xdi[:, :, None] xdi = xdi.transpose(2, 0, 1) if "int" in str(xdi.dtype): xdi = (xdi / xdi.max()).astype("float32") @@ -87,4 +91,11 @@ def __getitem__(self, idx: int): return X def __len__(self): - return len(self.X_deep) + if self.X_wide is not None: + return len(self.X_wide) + if self.X_deep is not None: + return len(self.X_deep) + if self.X_text is not None: + return len(self.X_text) + if self.X_img is not None: + return len(self.X_img) diff --git a/pytorch_widedeep/models/wide_deep.py b/pytorch_widedeep/models/wide_deep.py index 70ce529b..730bb365 100644 --- a/pytorch_widedeep/models/wide_deep.py +++ b/pytorch_widedeep/models/wide_deep.py @@ -1,5 +1,4 @@ import os -import warnings import numpy as np import torch @@ -22,7 +21,9 @@ from ._multiple_lr_scheduler import MultipleLRScheduler n_cpus = os.cpu_count() + use_cuda = torch.cuda.is_available() 
+device = torch.device("cuda" if use_cuda else "cpu") class WideDeep(nn.Module): @@ -104,37 +105,31 @@ class WideDeep(nn.Module): """ - def __init__( + def __init__( # noqa: C901 self, - wide: nn.Module, - deepdense: nn.Module, - pred_dim: int = 1, + wide: Optional[nn.Module] = None, + deepdense: Optional[nn.Module] = None, deeptext: Optional[nn.Module] = None, deepimage: Optional[nn.Module] = None, deephead: Optional[nn.Module] = None, head_layers: Optional[List[int]] = None, head_dropout: Optional[List] = None, head_batchnorm: Optional[bool] = None, + pred_dim: int = 1, ): super(WideDeep, self).__init__() - # check that model components have the required output_dim attribute - if not hasattr(deepdense, "output_dim"): - raise AttributeError( - "deepdense model must have an 'output_dim' attribute. " - "See pytorch-widedeep.models.deep_dense.DeepDense" - ) - if deeptext is not None and not hasattr(deeptext, "output_dim"): - raise AttributeError( - "deeptext model must have an 'output_dim' attribute. " - "See pytorch-widedeep.models.deep_dense.DeepText" - ) - if deepimage is not None and not hasattr(deepimage, "output_dim"): - raise AttributeError( - "deepimage model must have an 'output_dim' attribute. " - "See pytorch-widedeep.models.deep_dense.DeepText" - ) + self._check_model_components( + wide, + deepdense, + deeptext, + deepimage, + deephead, + head_layers, + head_dropout, + pred_dim, + ) # required as attribute just in case we pass a deephead self.pred_dim = pred_dim @@ -146,17 +141,11 @@ def __init__( self.deepimage = deepimage self.deephead = deephead - if deephead is not None and head_layers is not None: - warnings.simplefilter("module") - warnings.warn( - "both 'deephead' and 'head_layers' are not None." 
- "'deephead' takes priority and will be used", - UserWarning, - ) - if self.deephead is None: if head_layers is not None: - input_dim: int = self.deepdense.output_dim # type:ignore + input_dim = 0 + if self.deepdense is not None: + input_dim += self.deepdense.output_dim # type:ignore if self.deeptext is not None: input_dim += self.deeptext.output_dim # type:ignore if self.deepimage is not None: @@ -179,9 +168,10 @@ def __init__( "head_out", nn.Linear(head_layers[-1], pred_dim) ) else: - self.deepdense = nn.Sequential( - self.deepdense, nn.Linear(self.deepdense.output_dim, pred_dim) # type: ignore - ) + if self.deepdense is not None: + self.deepdense = nn.Sequential( + self.deepdense, nn.Linear(self.deepdense.output_dim, pred_dim) # type: ignore + ) if self.deeptext is not None: self.deeptext = nn.Sequential( self.deeptext, nn.Linear(self.deeptext.output_dim, pred_dim) # type: ignore @@ -190,34 +180,42 @@ def __init__( self.deepimage = nn.Sequential( self.deepimage, nn.Linear(self.deepimage.output_dim, pred_dim) # type: ignore ) - else: - self.deephead + # else: + # self.deephead - def forward(self, X: Dict[str, Tensor]) -> Tensor: # type: ignore + def forward(self, X: Dict[str, Tensor]) -> Tensor: # type: ignore # noqa: C901 # Wide output: direct connection to the output neuron(s) - out = self.wide(X["wide"]) + if self.wide is not None: + out = self.wide(X["wide"]) + else: + batch_size = X[list(X.keys())[0]].size(0) + out = torch.zeros(batch_size, self.pred_dim).to(device) # Deep output: either connected directly to the output neuron(s) or # passed through a head first if self.deephead: - deepside = self.deepdense(X["deepdense"]) + if self.deepdense is not None: + deepside = self.deepdense(X["deepdense"]) + else: + deepside = torch.FloatTensor().to(device) if self.deeptext is not None: deepside = torch.cat([deepside, self.deeptext(X["deeptext"])], axis=1) # type: ignore if self.deepimage is not None: deepside = torch.cat([deepside, self.deepimage(X["deepimage"])], 
axis=1) # type: ignore deephead_out = self.deephead(deepside) - deepside_out = nn.Linear(deephead_out.size(1), self.pred_dim)(deephead_out) - return out.add(deepside_out) + deepside_linear = nn.Linear(deephead_out.size(1), self.pred_dim).to(device) + return out.add_(deepside_linear(deephead_out)) else: - out.add(self.deepdense(X["deepdense"])) + if self.deepdense is not None: + out.add_(self.deepdense(X["deepdense"])) if self.deeptext is not None: - out.add(self.deeptext(X["deeptext"])) + out.add_(self.deeptext(X["deeptext"])) if self.deepimage is not None: - out.add(self.deepimage(X["deepimage"])) + out.add_(self.deepimage(X["deepimage"])) return out - def compile( + def compile( # noqa: C901 self, method: str, optimizers: Optional[Union[Optimizer, Dict[str, Optimizer]]] = None, @@ -345,9 +343,9 @@ def compile( if isinstance(optimizers, Dict) and not isinstance(lr_schedulers, Dict): raise ValueError( - "'parameters 'optimizers' and 'lr_schedulers' must have consistent type. " - "(Optimizer, LRScheduler) or (Dict[str, Optimizer], Dict[str, LRScheduler]) " - "Please, read the Documentation for more details" + "'optimizers' and 'lr_schedulers' must have consistent type: " + "(Optimizer and LRScheduler) or (Dict[str, Optimizer] and Dict[str, LRScheduler]). " + "Please, read the documentation or see the examples for more details" ) self.verbose = verbose @@ -372,14 +370,7 @@ def compile( if optimizers is not None: if isinstance(optimizers, Optimizer): self.optimizer: Union[Optimizer, MultipleOptimizer] = optimizers - elif isinstance(optimizers, Dict) and len(optimizers) == 1: - raise ValueError( - "The dictionary of optimizers must contain one item per model component, " - "i.e. at least two for the 'wide' and 'deepdense' components. Otherwise " - "pass one Optimizer object that will be used for all components" - "i.e. 
optimizers = torch.optim.Adam(model.parameters())" - ) - elif len(optimizers) > 1: + elif isinstance(optimizers, Dict): opt_names = list(optimizers.keys()) mod_names = [n for n, c in self.named_children()] for mn in mod_names: @@ -427,10 +418,9 @@ def compile( self.callback_container = CallbackContainer(self.callbacks) self.callback_container.set_model(self) - if use_cuda: - self.cuda() + self.to(device) - def fit( + def fit( # noqa: C901 self, X_wide: Optional[np.ndarray] = None, X_deep: Optional[np.ndarray] = None, @@ -582,21 +572,8 @@ def fit( >>> # X_val = {'X_wide': X_wide_val, 'X_deep': X_deep_val, 'target': y_val} >>> # model.fit(X_train=X_train, X_val=X_val n_epochs=10, batch_size=256) - .. note:: :obj:`WideDeep` assumes that `X_wide`, `X_deep` and `target` ALWAYS exist, while - `X_text` and `X_img` are optional - - .. note:: Either `X_train` or the three `X_wide`, `X_deep` and `target` must be passed to the - fit method - """ - if X_train is None and (X_wide is None or X_deep is None or target is None): - raise ValueError( - "Training data is missing. Either a dictionary (X_train) with " - "the training dataset or at least 3 arrays (X_wide, X_deep, " - "target) must be passed to the fit method" - ) - self.batch_size = batch_size train_set, eval_set = self._train_val_split( X_wide, X_deep, X_text, X_img, X_train, X_val, val_split, target @@ -689,8 +666,8 @@ def fit( def predict( self, - X_wide: np.ndarray, - X_deep: np.ndarray, + X_wide: Optional[np.ndarray] = None, + X_deep: Optional[np.ndarray] = None, X_text: Optional[np.ndarray] = None, X_img: Optional[np.ndarray] = None, X_test: Optional[Dict[str, np.ndarray]] = None, @@ -716,10 +693,6 @@ def predict( `'X_wide'`, `'X_deep'`, `'X_text'`, `'X_img'` and `'target'` the values are the corresponding matrices. - - .. note:: WideDeep assumes that `X_wide`, `X_deep` and `target` ALWAYS exist, - while `X_text` and `X_img` are optional. 
- """ preds_l = self._predict(X_wide, X_deep, X_text, X_img, X_test) if self.method == "regression": @@ -733,8 +706,8 @@ def predict( def predict_proba( self, - X_wide: np.ndarray, - X_deep: np.ndarray, + X_wide: Optional[np.ndarray] = None, + X_deep: Optional[np.ndarray] = None, X_text: Optional[np.ndarray] = None, X_img: Optional[np.ndarray] = None, X_test: Optional[Dict[str, np.ndarray]] = None, @@ -807,7 +780,7 @@ def _loss_fn(self, y_pred: Tensor, y_true: Tensor) -> Tensor: # type: ignore if self.method == "multiclass": return F.cross_entropy(y_pred, y_true, weight=self.class_weight) - def _train_val_split( + def _train_val_split( # noqa: C901 self, X_wide: Optional[np.ndarray] = None, X_deep: Optional[np.ndarray] = None, @@ -835,100 +808,51 @@ def _train_val_split( :obj:`torch.utils.data.DataLoader`. See :class:`pytorch_widedeep.models._wd_dataset` """ - #  Without validation - if X_val is None and val_split is None: - # if a train dictionary is passed, check if text and image datasets - # are present and instantiate the WideDeepDataset class - if X_train is not None: - X_wide, X_deep, target = ( - X_train["X_wide"], - X_train["X_deep"], - X_train["target"], - ) - if "X_text" in X_train.keys(): - X_text = X_train["X_text"] - if "X_img" in X_train.keys(): - X_img = X_train["X_img"] - X_train = {"X_wide": X_wide, "X_deep": X_deep, "target": target} - try: - X_train.update({"X_text": X_text}) - except: - pass - try: - X_train.update({"X_img": X_img}) - except: - pass + + if X_val is not None: + assert ( + X_train is not None + ), "if the validation set is passed as a dictionary, the training set must also be a dictionary" train_set = WideDeepDataset(**X_train, transforms=self.transforms) # type: ignore - eval_set = None - #  With validation - else: - if X_val is not None: - # if a validation dictionary is passed, then if not train - # dictionary is passed we build it with the input arrays - # (either the dictionary or the arrays must be passed) - if X_train is 
None: - X_train = {"X_wide": X_wide, "X_deep": X_deep, "target": target} - if X_text is not None: - X_train.update({"X_text": X_text}) - if X_img is not None: - X_train.update({"X_img": X_img}) - else: - # if a train dictionary is passed, check if text and image - # datasets are present. The train/val split using val_split - if X_train is not None: - X_wide, X_deep, target = ( - X_train["X_wide"], - X_train["X_deep"], - X_train["target"], - ) - if "X_text" in X_train.keys(): - X_text = X_train["X_text"] - if "X_img" in X_train.keys(): - X_img = X_train["X_img"] - ( - X_tr_wide, - X_val_wide, - X_tr_deep, - X_val_deep, - y_tr, - y_val, - ) = train_test_split( - X_wide, - X_deep, - target, - test_size=val_split, - random_state=self.seed, - stratify=target if self.method != "regression" else None, + eval_set = WideDeepDataset(**X_val, transforms=self.transforms) # type: ignore + elif val_split is not None: + if not X_train: + X_train = self._build_train_dict(X_wide, X_deep, X_text, X_img, target) + y_tr, y_val, idx_tr, idx_val = train_test_split( + X_train["target"], + np.arange(len(X_train["target"])), + test_size=val_split, + stratify=X_train["target"] if self.method != "regression" else None, + ) + X_tr, X_val = {"target": y_tr}, {"target": y_val} + if "X_wide" in X_train.keys(): + X_tr["X_wide"], X_val["X_wide"] = ( + X_train["X_wide"][idx_tr], + X_train["X_wide"][idx_val], ) - X_train = {"X_wide": X_tr_wide, "X_deep": X_tr_deep, "target": y_tr} - X_val = {"X_wide": X_val_wide, "X_deep": X_val_deep, "target": y_val} - try: - X_tr_text, X_val_text = train_test_split( - X_text, - test_size=val_split, - random_state=self.seed, - stratify=target if self.method != "regression" else None, - ) - X_train.update({"X_text": X_tr_text}), X_val.update( - {"X_text": X_val_text} - ) - except: - pass - try: - X_tr_img, X_val_img = train_test_split( - X_img, - test_size=val_split, - random_state=self.seed, - stratify=target if self.method != "regression" else None, - ) - 
X_train.update({"X_img": X_tr_img}), X_val.update( - {"X_img": X_val_img} - ) - except: - pass - # At this point the X_train and X_val dictionaries have been built - train_set = WideDeepDataset(**X_train, transforms=self.transforms) # type: ignore + if "X_deep" in X_train.keys(): + X_tr["X_deep"], X_val["X_deep"] = ( + X_train["X_deep"][idx_tr], + X_train["X_deep"][idx_val], + ) + if "X_text" in X_train.keys(): + X_tr["X_text"], X_val["X_text"] = ( + X_train["X_text"][idx_tr], + X_train["X_text"][idx_val], + ) + if "X_img" in X_train.keys(): + X_tr["X_img"], X_val["X_img"] = ( + X_train["X_img"][idx_tr], + X_train["X_img"][idx_val], + ) + train_set = WideDeepDataset(**X_tr, transforms=self.transforms) # type: ignore eval_set = WideDeepDataset(**X_val, transforms=self.transforms) # type: ignore + else: + if not X_train: + X_train = self._build_train_dict(X_wide, X_deep, X_text, X_img, target) + train_set = WideDeepDataset(**X_train, transforms=self.transforms) # type: ignore + eval_set = None + return train_set, eval_set def _warm_up( @@ -981,7 +905,7 @@ def _warm_up( else: warmer.warm_all(self.deepimage, "deepimage", loader, n_epochs, max_lr) - def _lr_scheduler_step(self, step_location: str): + def _lr_scheduler_step(self, step_location: str): # noqa: C901 r""" Function to execute the learning rate schedulers steps. If the lr_scheduler is Cyclic (i.e. 
CyclicLR or OneCycleLR), the step @@ -1025,7 +949,7 @@ def _training_step(self, data: Dict[str, Tensor], target: Tensor, batch_idx: int self.train() X = {k: v.cuda() for k, v in data.items()} if use_cuda else data y = target.float() if self.method != "multiclass" else target - y = y.cuda() if use_cuda else y + y = y.to(device) self.optimizer.zero_grad() y_pred = self.forward(X) @@ -1051,7 +975,7 @@ def _validation_step(self, data: Dict[str, Tensor], target: Tensor, batch_idx: i with torch.no_grad(): X = {k: v.cuda() for k, v in data.items()} if use_cuda else data y = target.float() if self.method != "multiclass" else target - y = y.cuda() if use_cuda else y + y = y.to(device) y_pred = self.forward(X) loss = self._loss_fn(y_pred, y) @@ -1069,8 +993,8 @@ def _validation_step(self, data: Dict[str, Tensor], target: Tensor, batch_idx: i def _predict( self, - X_wide: np.ndarray, - X_deep: np.ndarray, + X_wide: Optional[np.ndarray] = None, + X_deep: Optional[np.ndarray] = None, X_text: Optional[np.ndarray] = None, X_img: Optional[np.ndarray] = None, X_test: Optional[Dict[str, np.ndarray]] = None, @@ -1082,7 +1006,11 @@ def _predict( if X_test is not None: test_set = WideDeepDataset(**X_test) else: - load_dict = {"X_wide": X_wide, "X_deep": X_deep} + load_dict = {} + if X_wide is not None: + load_dict = {"X_wide": X_wide} + if X_deep is not None: + load_dict.update({"X_deep": X_deep}) if X_text is not None: load_dict.update({"X_text": X_text}) if X_img is not None: @@ -1095,7 +1023,7 @@ def _predict( num_workers=n_cpus, shuffle=False, ) - test_steps = (len(test_loader.dataset) // test_loader.batch_size) + 1 + test_steps = (len(test_loader.dataset) // test_loader.batch_size) + 1 # type: ignore[arg-type] self.eval() preds_l = [] @@ -1113,3 +1041,78 @@ def _predict( preds_l.append(preds) self.train() return preds_l + + @staticmethod + def _build_train_dict(X_wide, X_deep, X_text, X_img, target): + X_train = {"target": target} + if X_wide is not None: + X_train["X_wide"] = 
X_wide + if X_deep is not None: + X_train["X_deep"] = X_deep + if X_text is not None: + X_train["X_text"] = X_text + if X_img is not None: + X_train["X_img"] = X_img + return X_train + + @staticmethod # noqa: C901 + def _check_model_components( + wide, + deepdense, + deeptext, + deepimage, + deephead, + head_layers, + head_dropout, + pred_dim, + ): + + if wide is not None: + assert wide.wide_linear.weight.size(1) == pred_dim, ( + "the 'pred_dim' of the wide component ({}) must be equal to the 'pred_dim' " + "of the deep component and the overall model itself ({})".format( + wide.wide_linear.weight.size(1), pred_dim + ) + ) + if deepdense is not None and not hasattr(deepdense, "output_dim"): + raise AttributeError( + "deepdense model must have an 'output_dim' attribute. " + "See pytorch-widedeep.models.deep_dense.DeepDense" + ) + if deeptext is not None and not hasattr(deeptext, "output_dim"): + raise AttributeError( + "deeptext model must have an 'output_dim' attribute. " + "See pytorch-widedeep.models.deep_dense.DeepText" + ) + if deepimage is not None and not hasattr(deepimage, "output_dim"): + raise AttributeError( + "deepimage model must have an 'output_dim' attribute. " + "See pytorch-widedeep.models.deep_dense.DeepText" + ) + if deephead is not None and head_layers is not None: + raise ValueError( + "both 'deephead' and 'head_layers' are not None. 
Use one or the other, but not both" + ) + if head_layers is not None and not deepdense and not deeptext and not deepimage: + raise ValueError( + "if 'head_layers' is not None, at least one deep component must be used" + ) + if head_layers is not None and head_dropout is not None: + assert len(head_layers) == len( + head_dropout + ), "'head_layers' and 'head_dropout' must have the same length" + if deephead is not None: + deephead_inp_feat = next(deephead.parameters()).size(1) + output_dim = 0 + if deepdense is not None: + output_dim += deepdense.output_dim + if deeptext is not None: + output_dim += deeptext.output_dim + if deepimage is not None: + output_dim += deepimage.output_dim + assert deephead_inp_feat == output_dim, ( + "if a custom 'deephead' is used its input features ({}) must be equal to " + "the output features of the deep component ({})".format( + deephead_inp_feat, output_dim + ) + ) diff --git a/pytorch_widedeep/version.py b/pytorch_widedeep/version.py index 3dd3d2d5..a34b2f6b 100644 --- a/pytorch_widedeep/version.py +++ b/pytorch_widedeep/version.py @@ -1 +1 @@ -__version__ = "0.4.6" +__version__ = "0.4.7" diff --git a/setup.py b/setup.py index 9f9d8702..8283192a 100644 --- a/setup.py +++ b/setup.py @@ -33,9 +33,10 @@ ] extras["quality"] = [ "black", - "isort @ git+git://github.com/timothycrosley/isort.git@e63ae06ec7d70b06df9e528357650281a3d3ec22#egg=isort", + "isort", "flake8", ] +extras["all"] = extras["test"] + extras["docs"] + extras["quality"] # main setup kw args setup_kwargs = { @@ -62,7 +63,7 @@ "torch", "torchvision", ], - "extra_requires": extras, + "extras_require": extras, "python_requires": ">=3.6.0", "classifiers": [ dev_status[majorminor], diff --git a/tests/test_model_components/test_wide_deep.py b/tests/test_model_components/test_wide_deep.py index 5a6fc249..1e822862 100644 --- a/tests/test_model_components/test_wide_deep.py +++ b/tests/test_model_components/test_wide_deep.py @@ -55,7 +55,7 @@ def 
test_history_callback(deepcomponent, component_name): def test_deephead_and_head_layers(): deephead = nn.Sequential(nn.Linear(32, 16), nn.Linear(16, 8)) - with pytest.warns(UserWarning): + with pytest.raises(ValueError): model = WideDeep( # noqa: F841 wide=wide, deepdense=deepdense, head_layers=[16, 8], deephead=deephead ) diff --git a/tests/test_model_functioning/test_data_inputs.py b/tests/test_model_functioning/test_data_inputs.py index da484fff..483a8670 100644 --- a/tests/test_model_functioning/test_data_inputs.py +++ b/tests/test_model_functioning/test_data_inputs.py @@ -2,6 +2,7 @@ import numpy as np import pytest +from torch import nn from torchvision.transforms import ToTensor, Normalize from sklearn.model_selection import train_test_split @@ -67,11 +68,16 @@ transforms1 = [ToTensor, Normalize(mean=mean, std=std)] transforms2 = [Normalize(mean=mean, std=std)] +deephead_ds = nn.Sequential(nn.Linear(16, 8), nn.Linear(8, 4)) +deephead_dt = nn.Sequential(nn.Linear(64, 8), nn.Linear(8, 4)) +deephead_di = nn.Sequential(nn.Linear(512, 8), nn.Linear(8, 4)) -############################################################################## +# ############################################################################# # Test many possible scenarios of data inputs I can think off. 
Surely users # will input something unexpected -############################################################################## +# ############################################################################# + + @pytest.mark.parametrize( "X_wide, X_deep, X_text, X_img, X_train, X_val, target, val_split, transforms, nepoch, null", [ @@ -266,3 +272,141 @@ def test_widedeep_inputs( model.history.epoch[0] == nepoch and model.history._history["train_loss"] is not null ) + + +@pytest.mark.parametrize( + "X_wide, X_deep, X_text, X_img, X_train, X_val, target", + [ + ( + X_wide, + X_deep, + X_text, + X_img, + None, + { + "X_wide": X_wide_val, + "X_deep": X_deep_val, + "X_text": X_text_val, + "X_img": X_img_val, + "target": y_val, + }, + target, + ), + ], +) +def test_xtrain_xval_assertion( + X_wide, + X_deep, + X_text, + X_img, + X_train, + X_val, + target, +): + model = WideDeep( + wide=wide, deepdense=deepdense, deeptext=deeptext, deepimage=deepimage + ) + model.compile(method="binary", verbose=0) + with pytest.raises(AssertionError): + model.fit( + X_wide=X_wide, + X_deep=X_deep, + X_text=X_text, + X_img=X_img, + X_train=X_train, + X_val=X_val, + target=target, + batch_size=16, + ) + + +@pytest.mark.parametrize( + "wide, deepdense, deeptext, deepimage, X_wide, X_deep, X_text, X_img, target", + [ + (wide, None, None, None, X_wide, None, None, None, target), + (None, deepdense, None, None, None, X_deep, None, None, target), + (None, None, deeptext, None, None, None, X_text, None, target), + (None, None, None, deepimage, None, None, None, X_img, target), + ], +) +def test_individual_inputs( + wide, deepdense, deeptext, deepimage, X_wide, X_deep, X_text, X_img, target +): + model = WideDeep( + wide=wide, deepdense=deepdense, deeptext=deeptext, deepimage=deepimage + ) + model.compile(method="binary", verbose=0) + model.fit( + X_wide=X_wide, + X_deep=X_deep, + X_text=X_text, + X_img=X_img, + target=target, + batch_size=16, + ) + # check it has run successfully + assert 
len(model.history._history) == 1 + + +############################################################################### +#  test deephead is not None and individual components +############################################################################### + + +@pytest.mark.parametrize( + "deepdense, deeptext, deepimage, X_deep, X_text, X_img, deephead, target", + [ + (deepdense, None, None, X_deep, None, None, deephead_ds, target), + (None, deeptext, None, None, X_text, None, deephead_dt, target), + (None, None, deepimage, None, None, X_img, deephead_di, target), + ], +) +def test_deephead_individual_components( + deepdense, deeptext, deepimage, X_deep, X_text, X_img, deephead, target +): + model = WideDeep( + deepdense=deepdense, deeptext=deeptext, deepimage=deepimage, deephead=deephead + ) # noqa: F841 + model.compile(method="binary", verbose=0) + model.fit( + X_wide=X_wide, + X_deep=X_deep, + X_text=X_text, + X_img=X_img, + target=target, + batch_size=16, + ) + # check it has run successfully + assert len(model.history._history) == 1 + + +############################################################################### +#  test deephead is None and head_layers is not None and individual components +############################################################################### + + +@pytest.mark.parametrize( + "deepdense, deeptext, deepimage, X_deep, X_text, X_img, target", + [ + (deepdense, None, None, X_deep, None, None, target), + (None, deeptext, None, None, X_text, None, target), + (None, None, deepimage, None, None, X_img, target), + ], +) +def test_head_layers_individual_components( + deepdense, deeptext, deepimage, X_deep, X_text, X_img, target +): + model = WideDeep( + deepdense=deepdense, deeptext=deeptext, deepimage=deepimage, head_layers=[8, 4] + ) # noqa: F841 + model.compile(method="binary", verbose=0) + model.fit( + X_wide=X_wide, + X_deep=X_deep, + X_text=X_text, + X_img=X_img, + target=target, + batch_size=16, + ) + # check it has run successfully + 
assert len(model.history._history) == 1 diff --git a/tests/test_model_functioning/test_miscellaneous.py b/tests/test_model_functioning/test_miscellaneous.py new file mode 100644 index 00000000..140d5a76 --- /dev/null +++ b/tests/test_model_functioning/test_miscellaneous.py @@ -0,0 +1,196 @@ +import string + +import numpy as np +import torch +import pytest +from sklearn.model_selection import train_test_split + +from pytorch_widedeep.models import ( + Wide, + DeepText, + WideDeep, + DeepDense, + DeepImage, +) +from pytorch_widedeep.metrics import Accuracy, Precision +from pytorch_widedeep.callbacks import EarlyStopping + +# Wide array +X_wide = np.random.choice(50, (32, 10)) + +# Deep Array +colnames = list(string.ascii_lowercase)[:10] +embed_cols = [np.random.choice(np.arange(5), 32) for _ in range(5)] +embed_input = [(u, i, j) for u, i, j in zip(colnames[:5], [5] * 5, [16] * 5)] +cont_cols = [np.random.rand(32) for _ in range(5)] +X_deep = np.vstack(embed_cols + cont_cols).transpose() + +#  Text Array +padded_sequences = np.random.choice(np.arange(1, 100), (32, 48)) +X_text = np.hstack((np.repeat(np.array([[0, 0]]), 32, axis=0), padded_sequences)) +vocab_size = 100 + +#  Image Array +X_img = np.random.choice(256, (32, 224, 224, 3)) +X_img_norm = X_img / 255.0 + +# Target +target = np.random.choice(2, 32) +target_multi = np.random.choice(3, 32) + +# train/validation split +( + X_wide_tr, + X_wide_val, + X_deep_tr, + X_deep_val, + X_text_tr, + X_text_val, + X_img_tr, + X_img_val, + y_train, + y_val, +) = train_test_split(X_wide, X_deep, X_text, X_img, target) + +# build model components +wide = Wide(np.unique(X_wide).shape[0], 1) +deepdense = DeepDense( + hidden_layers=[32, 16], + dropout=[0.5, 0.5], + deep_column_idx={k: v for v, k in enumerate(colnames)}, + embed_input=embed_input, + continuous_cols=colnames[-5:], +) +deeptext = DeepText(vocab_size=vocab_size, embed_dim=32, padding_idx=0) +deepimage = DeepImage(pretrained=True) + 
+############################################################################### +#  test consistency between optimizers and lr_schedulers format +############################################################################### + + +def test_optimizer_scheduler_format(): + model = WideDeep(deepdense=deepdense) + optimizers = {"deepdense": torch.optim.Adam(model.deepdense.parameters(), lr=0.01)} + schedulers = torch.optim.lr_scheduler.StepLR(optimizers["deepdense"], step_size=3) + with pytest.raises(ValueError): + model.compile( + method="binary", + optimizers=optimizers, + lr_schedulers=schedulers, + ) + + +############################################################################### +#  test that callbacks are properly initialised internally +############################################################################### + + +def test_non_instantiated_callbacks(): + model = WideDeep(wide=wide, deepdense=deepdense) + callbacks = [EarlyStopping] + model.compile(method="binary", callbacks=callbacks) + assert model.callbacks[1].__class__.__name__ == "EarlyStopping" + + +############################################################################### +#  test that multiple metrics are properly constructed internally +############################################################################### + + +def test_multiple_metrics(): + model = WideDeep(wide=wide, deepdense=deepdense) + metrics = [Accuracy, Precision] + model.compile(method="binary", metrics=metrics) + assert ( + model.metric._metrics[0].__class__.__name__ == "Accuracy" + and model.metric._metrics[1].__class__.__name__ == "Precision" + ) + + +############################################################################### +#  test the train step with metrics runs well for a binary prediction +############################################################################### + + +def test_basic_run_with_metrics_binary(): + model = WideDeep(wide=wide, deepdense=deepdense) + model.compile(method="binary", 
metrics=[Accuracy], verbose=False) + model.fit( + X_wide=X_wide, + X_deep=X_deep, + target=target, + n_epochs=1, + batch_size=16, + val_split=0.2, + ) + assert ( + "train_loss" in model.history._history.keys() + and "train_acc" in model.history._history.keys() + ) + + +############################################################################### +#  test the train step with metrics runs well for a multiclass prediction +############################################################################### + + +def test_basic_run_with_metrics_multiclass(): + wide = Wide(np.unique(X_wide).shape[0], 3) + deepdense = DeepDense( + hidden_layers=[32, 16], + dropout=[0.5, 0.5], + deep_column_idx={k: v for v, k in enumerate(colnames)}, + embed_input=embed_input, + continuous_cols=colnames[-5:], + ) + model = WideDeep(wide=wide, deepdense=deepdense, pred_dim=3) + model.compile(method="multiclass", metrics=[Accuracy], verbose=False) + model.fit( + X_wide=X_wide, + X_deep=X_deep, + target=target_multi, + n_epochs=1, + batch_size=16, + val_split=0.2, + ) + assert ( + "train_loss" in model.history._history.keys() + and "train_acc" in model.history._history.keys() + ) + + +############################################################################### +#  test predict method for individual components +############################################################################### + + +@pytest.mark.parametrize( + "wide, deepdense, deeptext, deepimage, X_wide, X_deep, X_text, X_img, target", + [ + (wide, None, None, None, X_wide, None, None, None, target), + (None, deepdense, None, None, None, X_deep, None, None, target), + (None, None, deeptext, None, None, None, X_text, None, target), + (None, None, None, deepimage, None, None, None, X_img, target), + ], +) +def test_predict_with_individual_component( + wide, deepdense, deeptext, deepimage, X_wide, X_deep, X_text, X_img, target +): + + model = WideDeep( + wide=wide, deepdense=deepdense, deeptext=deeptext, deepimage=deepimage + ) + 
model.compile(method="binary", verbose=0) + model.fit( + X_wide=X_wide, + X_deep=X_deep, + X_text=X_text, + X_img=X_img, + target=target, + batch_size=16, + ) + # simply check that it runs and produces outputs + preds = model.predict(X_wide=X_wide, X_deep=X_deep, X_text=X_text, X_img=X_img) + + assert preds.shape[0] == 32 and "train_loss" in model.history._history diff --git a/tests/test_warm_up/test_warm_up_routines.py b/tests/test_warm_up/test_warm_up_routines.py index c5611d77..2fd1c951 100644 --- a/tests/test_warm_up/test_warm_up_routines.py +++ b/tests/test_warm_up/test_warm_up_routines.py @@ -161,7 +161,7 @@ def test_warm_all(model, modelname, loader, n_epochs, max_lr): has_run = True try: warmer.warm_all(model, modelname, loader, n_epochs, max_lr) - except: + except Exception: has_run = False assert has_run @@ -182,6 +182,6 @@ def test_warm_gradual(model, modelname, loader, max_lr, layers, routine): has_run = True try: warmer.warm_gradual(model, modelname, loader, max_lr, layers, routine) - except: + except Exception: has_run = False assert has_run