Add version hyperparameter to the checkpoints. #288

Open
wants to merge 8 commits into base: main
Conversation

@RaulPPelaez (Collaborator) commented Feb 20, 2024

Check version when loading a model.

Print a warning if the checkpoint is loaded with a version different from the one used to create it.

Closes #291
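
A minimal sketch of the intended behavior, assuming the version is stored in the checkpoint hyperparameters (the function and key names below are illustrative, not the actual torchmd-net API):

# Illustrative sketch only; check_checkpoint_version and the "version" key are hypothetical.
import warnings

def check_checkpoint_version(ckpt_hparams: dict, current_version: str) -> None:
    """Warn if a checkpoint was created with a different torchmd-net version."""
    ckpt_version = ckpt_hparams.get("version")
    if ckpt_version is None:
        warnings.warn("Checkpoint does not store a version; it probably predates this feature.")
    elif ckpt_version != current_version:
        warnings.warn(
            f"Checkpoint was created with torchmd-net {ckpt_version}, "
            f"but you are running {current_version}. Loading will proceed anyway."
        )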

@RaulPPelaez (Collaborator, Author)

@peastman could you please review this one?

@RaulPPelaez (Collaborator, Author)

@stefdoerr I merged #291 here so that it is integrated with the checkpoint version functionality.

@peastman (Collaborator)

I've never used versioneer, but from their documentation it looks like it does basically the same thing your code did? When you're doing an official build by cloning the repository it should work, but if you're doing a local build in a repository you've been using for a long time the recent tags will be missing and it will get the version number wrong.

The only ways I can see to reliably keep the version number in the code in sync with the tag on GitHub are 1) update it by hand, or 2) have something that runs on GitHub to automatically update the code whenever you create a release.

In OpenMM we update the version number by hand, but then check for consistency in the recipe to build conda packages. If we forget to update the version number, the conda build will fail until we fix it.

https://github.com/conda-forge/openmm-feedstock/blob/8dfb41fddc4f2561d1adc29d9ace70656bb3045b/recipe/test_openmm.sh#L48-L56
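
The linked recipe test boils down to comparing the version reported by the installed package against the version the packaging system expects. A rough Python sketch of the same idea (the PKG_VERSION variable and the assumption that the package exposes __version__ are placeholders, not the actual recipe script):

# Rough sketch of the consistency-check idea, not the actual recipe script.
import os
import torchmdnet  # assumed to expose __version__

expected = os.environ["PKG_VERSION"]  # version the packaging recipe is building
actual = torchmdnet.__version__       # version reported by the code itself
if actual != expected:
    raise SystemExit(f"Version mismatch: recipe says {expected}, code says {actual}")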

@stefdoerr (Collaborator) commented Feb 24, 2024

I don't understand why it's an issue. If you git pull it pulls all the tags from remote. I've never had an issue with this. If you do local builds with untagged commits, versioneer also adds a dirty-COMMITHASH to the version to indicate that you are not on a specific version.

(htmd) sdoerr@yoga ~/Work/moleculekit (main) $ ipython
In [1]: from moleculekit import __version__

In [2]: __version__
Out[2]: '1.8.21+3.g9f6320d'

@peastman (Collaborator)

If you git pull it pulls all the tags from remote.

It only pulls tags if you specify the --tags option. Here are the tags in my local repository.

$ git tag
0.0.1
0.1.0
0.1.1
0.1.2
0.1.3
0.1.4
0.1.5
0.2.0
0.2.1
0.2.2
0.2.3
0.2.4

@stefdoerr (Collaborator) commented Feb 26, 2024

Are we on different git versions? For me, git pull works fine pulling tags.

@RaulPPelaez (Collaborator, Author)

This also slipped past me because I use magit, which pulls tags explicitly.
Realistically, versioneer cannot access more information than my initial solution did; in the end, only the git tag contains version information.
To be honest, it is starting to look like a manual version is the best option. I like the idea of a CI test checking the version, but if the check runs post-release and a release goes out with an incorrect version, it stays there forever.

What about this? https://github.com/marketplace/actions/auto-release
This would make it so that a commit updating the version number is what creates a release automatically.

@stefdoerr (Collaborator)

Really, I don't see the issue. Git pull pulls the tags fine as I show above. If the issue only happens on forks, make an alias for git pull --tags if you want. It seems like a very minor issue to end up having to manually write versions into files.

@RaulPPelaez (Collaborator, Author)

Still, if we make the version depend on the git info, local installs could write strings like "v2.5.3+some_hash-dirty" into checkpoints.
At the very least, some additional code is needed to handle that.

@stefdoerr (Collaborator) commented Feb 26, 2024

Why would that cause an issue, though? It is the exact version of that commit. If you prefer not to have the hash/dirty suffix, parse it out in torchmd-net when it does the version check between the checkpoint and the current version.
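
A minimal sketch of that parsing, assuming versioneer-style strings such as "2.5.3+14.g1a2b3c4.dirty" (the helper name is hypothetical):

# Hypothetical helper: drop the versioneer local part ("+N.gHASH[.dirty]") before comparing.
def base_version(version: str) -> str:
    return version.split("+", 1)[0]

assert base_version("2.5.3+14.g1a2b3c4.dirty") == "2.5.3"
assert base_version("2.5.3") == "2.5.3"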

@peastman (Collaborator)

Git pull pulls the tags fine as I show above.

Git does not pull tags as I show above! :) I can't explain why it works one way for you and a different way for me. But we clearly can't assume it will.

I just checked my local torchmd-net repository on a different computer, and that one has no tags at all.

@stefdoerr (Collaborator) commented Feb 26, 2024

Can you check git --version? I'm curious now. I'm on 2.34.1

@peastman (Collaborator)

On my laptop:

$ git --version
git version 2.39.2 (Apple Git-143)

And on a server:

$ git --version
git version 2.34.1

@peastman (Collaborator)

Did you perhaps clone the repository from the master torchmd-net repo rather than a personal fork?

@stefdoerr (Collaborator)

But you are on a fork, correct? So I assume you are also running git fetch upstream?
I just feel like it's a bit of an edge case that users won't normally encounter. I have never had that issue before, and this issue only affects some devs.

@stefdoerr (Collaborator)

Yes, I'm not on a fork. On a fork, I assume it requires a few more steps to get everything in sync.

@peastman (Collaborator)

Whenever I want to pull the latest code I do

git pull upstream main
git push

@peastman (Collaborator)

this issue only affects some devs.

It affects anyone who creates a fork.

Possibly this really indicates a deeper issue with this PR: do we really not intend to provide any backward compatibility at all for models? If I train a model and post it for people to use, is it really expected that my model will only work with the exact version of torchmd-net I trained it with and no other?

@stefdoerr (Collaborator)

git fetch upstream seems to fetch the tags fine for me. Give it a try.

About backwards compatibility, I don't know.

The reason I'm arguing against hard-coding versions is that it's a chore, and people forget to do their chores. If it takes one more command to fetch tags and it only affects developers, I think it's better not to switch to hard-coded versions. Considering it's already two commands to pull from upstream, we might as well make it three and throw them into an alias :)

@RaulPPelaez (Collaborator, Author)

Possibly this really indicates a deeper issue with this PR: do we really not intend to provide any backward compatibility at all for models? If I train a model and post it for people to use, is it really expected that my model will only work with the exact version of torchmd-net I trained it with and no other?

It's not like we do not want to provide backwards compatibility. The code in this PR only prints a warning. Some default behaviors might change, some bugs might be fixed, or names might change...
For instance, last time I checked, this repo https://github.com/torchmd/torchmd-protein-thermodynamics cannot run with the latest version. It was written for something like 0.2.0, where some hparams had slightly different names.
The changes are easy enough, but without version information in the hparams one has to do a bit of archaeology instead of simply running "conda install torchmdnet==some_version".

It is intended to be simply informative.

@RaulPPelaez (Collaborator, Author)

I will release version 2.0.0 soon, at which point we will be using semantic versioning, so we can relax the version comparison then, perhaps not even printing a warning when a 2.1.0 checkpoint is loaded in any 2.x.y version.
I would like to add the version to the checkpoints at that point.
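
Under semantic versioning the check could be relaxed to the major version only; a hypothetical sketch:

# Hypothetical relaxed check: warn only when the major version differs.
import warnings

def check_major_version(ckpt_version: str, current_version: str) -> None:
    ckpt_major = ckpt_version.split("+", 1)[0].split(".", 1)[0]
    current_major = current_version.split("+", 1)[0].split(".", 1)[0]
    if ckpt_major != current_major:
        warnings.warn(
            f"Checkpoint from torchmd-net {ckpt_version} may be incompatible "
            f"with the installed version {current_version}."
        )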

@peastman (Collaborator)

I want to train a model and post it online for other people to use. Is that supposed to be a supported use case? A few years from now, will people who want to use it be expected to install obsolete versions of torchmd-net and all its dependencies? Will that even be possible? (New GPUs will require new versions of CUDA, which will require new versions of PyTorch.)

Labels: None yet
Projects: None yet
3 participants