feat(l2gmodel): store features list metadata as instance attribute #979

Merged
merged 11 commits into from
Jan 30, 2025



@ireneisdoomed ireneisdoomed commented Jan 28, 2025

✨ Context

The list of features that the L2G model was trained on is a crucial part of the model metadata. The order in which those features are passed to the model also matters and must be preserved so that L2G produces accurate results.

To accomplish this, I've added a new instance attribute features_list to the LocusToGeneModel class. Any method that needs this information now fetches it from the model's metadata, not from the step config.

This is to avoid incompatibilities between the list of features provided in the package config and older versions of the model.

This PR was motivated by an issue observed in #939

🛠 What does this PR implement

  • Addition of LocusToGeneModel.features_list as instance attribute.
  • Every time a model is loaded, this attribute has to be populated. There are now 2 ways to load a model:
    1. From the Hub. The feature list is part of the model metadata, so it is loaded from there. This is the preferred method because it ensures that the list of features matches the one used during training.
    2. From a file. Here we lack that metadata, so we populate the attribute with the default list from the config, which could cause backwards compatibility issues.
  • The features_list parameter from the config is then used:
    1. In the training step, to select the features to include
    2. In the prediction step, only if the model is not loaded from the Hub
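The precedence described above can be sketched as follows. This is a minimal illustration, not the gentropy implementation: `SketchModel`, `resolve_features_list`, and the `"features_list"` metadata key are hypothetical names chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchModel:
    """Hypothetical stand-in for LocusToGeneModel (not the real class)."""

    model: Any
    features_list: list[str] = field(default_factory=list)


def resolve_features_list(
    hub_metadata: dict[str, Any] | None,
    config_features: list[str],
) -> list[str]:
    """Pick the feature list, preferring model metadata over the step config.

    The "features_list" key is an assumption about how the metadata is laid out.
    """
    if hub_metadata is not None and hub_metadata.get("features_list"):
        # Loaded from the Hub: trust the list (and order) stored with the model.
        return list(hub_metadata["features_list"])
    # Loaded from a file: fall back to the config default, which may not
    # match what an older model was trained on.
    return list(config_features)


# Hub case: the metadata wins even if the config lists different features.
hub_model = SketchModel(
    model=None,
    features_list=resolve_features_list(
        {"features_list": ["featureA", "featureB"]},
        ["featureB", "featureA", "featureC"],
    ),
)

# File case: no metadata is available, so the config default is used.
file_model = SketchModel(
    model=None,
    features_list=resolve_features_list(None, ["featureB", "featureA", "featureC"]),
)
```

Keeping the resolution in one function means the training and prediction steps never have to decide for themselves which source is authoritative.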

🙈 Missing

🚦 Before submitting

  • Do these changes cover one single feature (one change at a time)?
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes?
  • Did you make sure there is no commented out code in this PR?
  • Did you follow conventional commits standards in PR title and commit messages?
  • Did you make sure the branch is up-to-date with the dev branch?
  • Did you write any new necessary tests?
  • Did you make sure the changes pass local tests (make test)?
  • Did you make sure the changes pass pre-commit rules (e.g. uv run pre-commit run --all-files)?

@ireneisdoomed ireneisdoomed marked this pull request as ready for review January 28, 2025 16:37
@@ -5,7 +5,7 @@ ci:
autofix_commit_msg: "chore: pre-commit auto fixes [...]"
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.9.3

Is this an artifact of the .pre-commit update?

@project-defiant project-defiant left a comment


To sum up what I understood:
The feature_list is derived from the config.json file generated for the model in the repository.

This is the correct approach, from what I can read in the skops package documentation. The only inconvenience (as you have mentioned) is that anyone who wants to use an older model must provide the features list themselves.

Two additional points:

  • This is a great approach. I would also allow the user to supply the config.json in the skops format: since they are responsible for running the model predictions, they should make sure that the feature_list is correct.
  • Not sure if this is already in place, but make sure that all of the features are present in the FM and in the correct order, as provided by the user (or the metadata).
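The check suggested in the last point could look roughly like this. It is a sketch under assumptions: the feature matrix is reduced to a plain dict per row, and `align_columns` and `row_to_vector` are hypothetical helpers, not functions from gentropy or skops.

```python
from __future__ import annotations


def align_columns(available: list[str], features_list: list[str]) -> list[str]:
    """Return the features in training order, failing if any are missing."""
    missing = [name for name in features_list if name not in available]
    if missing:
        raise ValueError(f"Feature matrix is missing features: {missing}")
    # Extra columns are ignored; only the order in features_list matters.
    return list(features_list)


def row_to_vector(row: dict[str, float], features_list: list[str]) -> list[float]:
    """Build the model input for one row, ordered as in features_list."""
    ordered = align_columns(list(row), features_list)
    return [row[name] for name in ordered]


# Columns arrive in a different order and with an extra column.
vector = row_to_vector({"b": 2.0, "a": 1.0, "extra": 9.0}, ["a", "b"])
```

Failing loudly on a missing feature avoids silently scoring with a shifted or truncated input vector.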

I wonder if we could make an update to the skops to allow for referencing model by the commit. Maybe this is a good time to suggest it in an issue?

I can take a look in my free time next week.

src/gentropy/l2g.py (review thread resolved)

ireneisdoomed commented Jan 30, 2025

Ty for the comments!

> I wonder if we could make an update to the skops to allow for referencing model by the commit

That would be nice, but loading a model that was not downloaded from the Hub would still have the issue. I think your suggestion of adding model metadata when we dump the model is a good one!
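One way to picture that follow-up is a metadata file written next to the serialized model. This is only a sketch using the standard library: the file names `model.pkl` and `metadata.json`, the `"features_list"` key, and both helper functions are assumptions, and a real implementation would likely use the project's existing serialization rather than pickle.

```python
from __future__ import annotations

import json
import pickle
from pathlib import Path
from typing import Any


def dump_model(model: Any, features_list: list[str], out_dir: str) -> None:
    """Write the model plus a JSON sidecar recording the feature order."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "model.pkl", "wb") as handle:
        pickle.dump(model, handle)
    metadata = {"features_list": list(features_list)}
    (out / "metadata.json").write_text(json.dumps(metadata))


def load_model(in_dir: str, fallback_features: list[str]) -> tuple[Any, list[str]]:
    """Load the model; prefer the sidecar feature list over the fallback."""
    src = Path(in_dir)
    with open(src / "model.pkl", "rb") as handle:
        model = pickle.load(handle)
    meta_path = src / "metadata.json"
    if meta_path.exists():
        features = json.loads(meta_path.read_text())["features_list"]
    else:
        # Older dumps without a sidecar keep today's behaviour.
        features = list(fallback_features)
    return model, features
```

With a sidecar like this, a model loaded from a local path would carry its own feature order, and the config default would only be needed for dumps created before the metadata existed.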

@ireneisdoomed ireneisdoomed merged commit e7f7945 into dev Jan 30, 2025
7 checks passed
@ireneisdoomed ireneisdoomed deleted the il-l2g-feature-list-injection branch January 30, 2025 17:44