Warning during save_hyperparameter() gives misleading advice? #13615

hogru · 2022-07-12T09:48:33Z

hogru
Jul 12, 2022

I try to understand / rectify a warning about saving my hyper parameters and would need some assistance please.

I build a model this way:

from pytorch_lightning.core.mixins import HyperparametersMixin

class MyModel(nn.Module, HyperparametersMixin):
    def __init__(...):
        super().__init__()
        self.save_hyperparameters()  # Logs to self.hparams only, not to the logger (since there isn't any yet)

class MyModule(pl.LightningModule):
    def __init__(model: nn.Module):
        super().__init__()
        self.save_hyperparameters("model", logger=False)
        self.save_hyperparameters()
        self.save_hyperparameters(model.hparams)

model = MyModel(...)
module = MyModule(model)

This works, and I can load a checkpoint with model = MyModule.load_from_checkpoint(model_path_to_load_from)

But I also get this warning during the initialization of MyModule: Attribute 'model' is an instance of 'nn.Module' and is already saved during checkpointing. It is recommended to ignore them using 'self.save_hyperparameters(ignore=['model'])'.

So I change the corresponding code in MyModule to:

self.save_hyperparameters(ignore=["model"])
self.save_hyperparameters(model.hparams)

The created checkpoint is marginally reduced by ~3KB, the checkpoint size is ~1MB.

But when I want to load the checkpoint I get this error:

File "/Users/stephan/Library/Caches/pypoetry/virtualenvs/molgen-6oMP0hTK-py3.9/lib/python3.9/site-packages/pytorch_lightning/core/saving.py", line 161, in load_from_checkpoint
    model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
  File "/Users/stephan/Library/Caches/pypoetry/virtualenvs/molgen-6oMP0hTK-py3.9/lib/python3.9/site-packages/pytorch_lightning/core/saving.py", line 203, in _load_model_state
    model = cls(**_cls_kwargs)
TypeError: __init__() missing 1 required keyword-only argument: 'model'

Which seems to indicate that I need to save model as a hyper parameter.

What am I missing? What is the correct way to save the hyper parameters / model in the pl.LightningModule?

Answered by rohitgr7

Jul 12, 2022

the attributes that are not saved as hparams need to be passed explicitly. Considering you are using load_from_checkpoint API, you can use model = MyModule.load_from_checkpoint(ckpt_path, model=model).

If you include it in the hparams, your checkpoints will be unnecessarily big and can create issues if you have large models.

By

is already saved during checkpointing.

it means the model weights are already saved in the checkpoint and are loaded using PyTorch API, not as hparams.

View full answer

rohitgr7 · 2022-07-12T12:56:54Z

rohitgr7
Jul 12, 2022

the attributes that are not saved as hparams need to be passed explicitly. Considering you are using load_from_checkpoint API, you can use model = MyModule.load_from_checkpoint(ckpt_path, model=model).

If you include it in the hparams, your checkpoints will be unnecessarily big and can create issues if you have large models.

By

is already saved during checkpointing.

it means the model weights are already saved in the checkpoint and are loaded using PyTorch API, not as hparams.

0 replies

hogru · 2022-07-13T09:52:31Z

hogru
Jul 13, 2022
Author

Hi @rohitgr7,

Thank you very much for your swift response. Understood, I think.

Would you mind checking my understanding? I have 2 concerns

(1) I changed the code to load the checkpoint as follows, please refer to the assumption in the comment at the end of the code snippet.

    # Load checkpoint and move it to CPU
    checkpoint = torch.load(model_path_to_load_from, map_location=torch.device("cpu"))

    # Leverage the logged hyper params to build an (untrained) model
    # The function build_model returns an instance of nn.Module
    hparams = checkpoint["hyper_parameters"]
    model_config = {
        "vocab_size": hparams["vocab_size"],
        "embed_dim": hparams["embed_dim"],
        ...
    }
    model = build_model(
        hparams["architecture"], hparams["variant"], **model_config
    )

    # ***** I think I do not need to manually load the state_dict, True? *****
    # If I need to, the issue is that the state_dict contains keys such as "model.linear.bias",
    # but load_state_dict() expects "linear.bias"
    # model.load_state_dict(checkpoint["state_dict"])

   # Finally, load the pl module
    pl_model = MoleculeGenerator.load_from_checkpoint(model_path_to_load_from, model=model)

(2) Besides the model, I pass a loss function to the pl Module, which is also an nn.Module. Do I need to / do you recommend treating it as the model, i.e., load / configure it manually from the checkpoint? I guess the question is, should I exclude nn.Module just as a precaution if it gets too large (and the loss function parameters probably don't) or is it sort of “forbidden” to log nn.Modules via save_hyperparameters()?

0 replies

rohitgr7 · 2022-07-13T10:18:48Z

rohitgr7
Jul 13, 2022

I changed the code to load the checkpoint as follows, please refer to the assumption in the comment at the end of the code snippet.

I'd not recommend this way. Since you are loading the actual model using the hparams, you should load it from within the LightningModule's init

class MyModule(LightningModule):
    def __init__(...):
        super().__init__()
        self.save_hyperparameters(...)
        model = build_model(self.hparams...)

just wondering, how did you load the model without the checkpoint if you need hparams to load them.

Besides the model, I pass a loss function to the pl Module, which is also an nn.Module. Do I need to / do you recommend treating it as the model, i.e., load / configure it manually from the checkpoint? I guess the question is, should I exclude nn.Module just as a precaution if it gets too large (and the loss function parameters probably don't) or is it sort of “forbidden” to log nn.Modules via save_hyperparameters()?

it's actually hard for us to determine which one is a loss module or which one is a model. So we check if an object is an instance of nn.Module and raise a warning accordingly. Personally, I'd exclude an instance of the Loss function from the hparam since it's not an hparam (rather the type of loss or loss name should be considered as an hparam), but again it's optional and we are raising a warning, not an error, since loss functions don't have any params, it should be fine too, but again, I won't recommend it.

0 replies

hogru · 2022-07-13T12:17:18Z

hogru
Jul 13, 2022
Author

I hope you still have some time/energy to stay with me, I really would like to do it properly.

That's getting trickier than I thought. For context: I am doing experiments and need to combine different datasets, models, encodings, … to assess the overall/combined performance. So, I am trying to make a generic wrapper that I can plug those components into. Which is the background for my comment:

Since you are loading the actual model using the hparams, you should load it from within the LightningModule's init

I think I understand your reasoning, but since I have different model architectures, I have a different set of hparams for each architecture. So, I would need to pass the superset of those hparams to the LightningModule and I am/was trying to avoid that. My build_model has a bunch of if architecture == ... statements, and the LightningModule (until now) is not aware of the model specifics. I just have to make sure that the models' input and output shapes stay the same (or at least can be dynamically assessed).

just wondering, how did you load the model without the checkpoint if you need hparams to load them.

Maybe not correctly until now? Until now, I saved the nn.Modules with save_hyperparameters() and loaded the LightningModule with load_from_checkpoint(). But since I am still building the pipeline, I did not do predictions with the loaded model.

As a summary of my understanding, I have the following options:

Save the nn.Modules as hparams and be done with a load_checkpoint(). Easy but not recommended due to potential (model parameter) size issues
Build the model (and the loss function) during __init__() of the LightningModule. Recommended way, but "blows up" the parameter/arguments to __init__()
Load the checkpoint with torch.load(), build the model and loss function "outside" of the LightningModule and pass them as arguments to the LightningModules __init__(). Not recommended (but not sure why) and I probably need to "transcribe" the model's state_dict from e.g. model.linear.bias to linear.bias

If you can think of a fourth "best of all worlds" I am happy to hear it of course. Thanks for your insights.

0 replies

rohitgr7 · 2022-07-13T17:55:53Z

rohitgr7
Jul 13, 2022

personally I'd recommend this.

Build the model (and the loss function) during init() of the LightningModule. Recommended way, but "blows up" the parameter/arguments to init()

if you are concerned with the huge number of arguments, you can use namespaces.
https://pytorch-lightning.readthedocs.io/en/latest/common/hyperparameters.html#argparser-best-practices

because in all other cases, you might have to reload the checkpoint manually to initialize the model using the saved hparams.

1 reply

hogru Jul 14, 2022
Author

Ok, will (try to) follow your recommendation and have already started. I will document it here, at least for me that was not straightforward and maybe somebody else finds this helpful. Here's what I did so far (might edit this later):

Remove "traces" of pytorch lightning in my nn.Modules, i.e., delete the HyperparametersMixin and save_hyperparameters(). I like the nn.Modules to be "plain" better anyway
I do the config with Hydra/Omegeaconf and my config is in a single dataclass with sub dataclasses, Enums (for validation); 90 items and counting. This dataclass is an instance of my Hydra config schema (not an OmegaConf)
In order not to pass the whole config to the Lightning Module I settled on model_config = SimpleNamespace(**asdict(cfg.model_config)) with cfg being the whole Hydra/Omegaconf config dataclass
To make the Lightning Module independent of the Enums I copied/amended a custom StrEnum class that allows for comparison as such: Enum == str (__eq__() override)
in the Lightning Module I do the following:

    def __init__(
        self,
        model_config: SimpleNamespace,
        loss_fn_config: SimpleNamespace,
        optimizer_config: SimpleNamespace,
        metadata: SimpleNamespace,
    ) -> None:
        super().__init__()
        self.save_hyperparameters(logger=False)  # otherwise this logs the looong namespace variable to wandb (unusable in the UI)
        # to me it seems that the namespace is not "translated" to a dict, which wandb seems to expect; so do it myself
        # this still logs the namespace to wandb, but I can hide it in the UI
        # Not pretty, but seems to work so far
        self.save_hyperparameters(vars(model_config) | vars(loss_fn_config) |
                                  vars(optimizer_config) | vars(model_metadata), logger=True)
        self.model = ...  # use model_config  # this way, the model is not part of self.hparams
        self.loss_fn = ...  # use loss_fn_config
        ...

Hope that works...

BrunoBelucci · 2023-03-02T21:00:01Z

BrunoBelucci
Mar 2, 2023

Hello,
I know it has already been answered, but I came across the same question and your discussion was really helpful. I just wanted to share my solution in case anyone else is looking for a way to save a model as a hyperparameter. What I did was pass the model class as a parameter to the init inside the LightningModule together with a **model_hyperparameters.

class AbstractModel(pl.LightningModule):
    def __init__(self, model_class, **model_hyperparameters):
        super().__init__()
        self.save_hyperparameters()
        self.model = model_class(**model_hyperparameters)
...

In this way, each checkpoint saves the hyperparameters needed for each model and I only instantiate the model inside the LightningModule.

0 replies

jiajiexiao · 2023-08-14T01:12:18Z

jiajiexiao
Aug 14, 2023

I also encountered the same issue..The argument of torch.nn.Module type seems recommended to be excluded in LightningModule.save_hyperparameters() according to the warning message and discussion here. However, the Lightning documentation (version 2.0.6) shows an example of accessing the input modules from the checkpoint (link) as seen below:

class Encoder(nn.Module):
    ...


class Decoder(nn.Module):
    ...


class Autoencoder(pl.LightningModule):
    def __init__(self, encoder, decoder, *args, **kwargs):
        ...


autoencoder = Autoencoder(Encoder(), Decoder())

checkpoint = torch.load(CKPT_PATH)
encoder_weights = checkpoint["encoder"]
decoder_weights = checkpoint["decoder"]

This is not consistent with the recommendation from the warning message and causes confusion.

2 replies

LukasK13 Sep 15, 2023

I stumbled upon this as well. I think either the documentation or the warning message should be updated to make this consistent.

SuryaThiru Sep 25, 2023

I have a similar scenario where I am trying to load a generic feature extractor (initialized externally) into a larger model. I would prefer this backbone be part of the checkpoint instead of passing it in while loading the checkpoint as it would make my experiments harder to track and maintain.

What is the best way to go about this?

senarvi · 2023-10-24T10:10:05Z

senarvi
Oct 24, 2023

I also use Hydra for configuration. My model configurations may contain nested nn.Modules (such as backbone, encoder, etc.). It's easy to instantiate the LightningModule using a single Hydra call. However, then it's not possible to save the hyperparameters.

In my opinion the solution by @BrunoBelucci is the cleanest, but unfortunately it doesn't work with nested nn.Modules (when, in his example, one of the arguments in model_hyperparameters is an nn.Module).

I tried lots of different solutions. A solution similar to the one described by @hogru was promising - a main LightningModule that instantiates the model from an OmegaConf configuration. This gets quite ugly because one would like to keep the LightningModule methods (training_step() etc.) together with the model. So the LightningModule would have to call the corresponding method from the model class.

Finally I found a really simple solution. Let's consider an example where the model configuration includes a backbone.

@dataclass
class MyBackboneConfig:
    _target_: str = "backbones.MyBackbone"
    depth: int = 8

@dataclass
class MyModelConfig:
    _target_: str = "models.MyModel"
    backbone: MyBackboneConfig = MyBackboneConfig()

When I instantiate the model, I don't instantiate the arguments recursively (see the Hydra option _recursive_).

from hydra.utils import instantiate

model = instantiate(model_cfg, _recursive_=False)

The model constructor instantiates any OmegaConf arguments recursively.

class MyModel(LightningModule):
    def __init__(self, backbone):
        super().__init__()
        self.save_hyperparameters()
        self.backbone = instantiate(backbone)

When loading a checkpoint, it's possible to use a Hydra configuration and not load hyperparameters from the checkpoint like this:

from hydra.utils import call

model_cfg._target_ += ".load_from_checkpoint"
model = call(model_cfg, checkpoint_path, _recursive_=False)

It's also possible to use the hyperparameters from the checkpoint and not override anything like this:

from hydra._internal.utils import _locate

model_class = _locate(model_cfg._target_)
model = model_class.load_from_checkpoint(checkpoint_path)

One should just be aware that it's not possible to load the hyperparameters from a checkpoint and override some of the nested hyperparameters (e.g. override only the depth of the backbone in the above example). _load_from_checkpoint() will use update() to replace any hyperparameters that are provided in **kwargs. So if backbone is provided, it will completely replace the backbone in the hyperparameters.

2 replies

e-yi Jan 17, 2024

I'm also a hydra user, and the following is my solution in the end. Since all the hyperparemeters are actually saved in a hydra config file, I decided that I only need to resotore the state_dict.

from hydra import compose, initialize
from hydra.utils import instantiate

with initialize(version_base="1.3", config_path="../configs",):
    cfg = compose(config_name="xxx.yaml", overrides=["experiment=xxx"])

m  = instantiate(cfg.model)
ckpt= torch.load('xxx.ckpt')
m.load_state_dict(ckpt['state_dict'])

fses91 Aug 15, 2024

This is the solution, I was searching for.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Warning during save_hyperparameter() gives misleading advice? #13615

{{title}}

Replies: 8 comments 5 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

Select a reply

Warning during save_hyperparameter() gives misleading advice? #13615

Replies: 8 comments · 5 replies

hogru Jul 13, 2022 Author

hogru Jul 13, 2022 Author

hogru Jul 14, 2022 Author

Replies: 8 comments 5 replies

hogru
Jul 13, 2022
Author

hogru
Jul 13, 2022
Author

hogru Jul 14, 2022
Author