Possible supporting gpn series model? #1

Open
HelloWorldLTY opened this issue Sep 20, 2024 · 4 comments

@HelloWorldLTY

Hi, thanks for your great work. It seems that the GPN-MSA model is not supported (GPN-MSA is the newer version of GPN and supports the human genome). Would you please consider including it? Thanks.

Traceback (most recent call last):
  File "/gpfs/radev/project/ying_rex/tl688/BEND/run_testgpn.py", line 10, in <module>
    embedder = bend.embedders.GPNEmbedder('songlab/gpn-msa-sapiens')
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/radev/project/ying_rex/tl688/BEND/bend/utils/embedders.py", line 63, in __init__
    self.load_model(*args, **kwargs)
  File "/gpfs/radev/project/ying_rex/tl688/BEND/bend/utils/embedders.py", line 128, in load_model
    self.tokenizer = AutoTokenizer.from_pretrained(model_name)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/radev/project/ying_rex/tl688/llm/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 913, in from_pretrained
    tokenizer_class_py, tokenizer_class_fast = TOKENIZER_MAPPING[type(config)]
                                               ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/gpfs/radev/project/ying_rex/tl688/llm/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 732, in __getitem__
    model_type = self._reverse_config_mapping[key.__name__]
KeyError: 'GPNRoFormerConfig'

This is the error message.
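
For readers hitting the same traceback: the last frame looks up the config's class name in a reverse mapping, and `GPNRoFormerConfig` is not a key in it. Below is a minimal sketch of that failure mode. It is a simplified stand-in, not the actual transformers code; the two classes, the dictionary contents, and the helper names are all hypothetical.

```python
# Simplified stand-in for a registry keyed by config class name.
# This is NOT the transformers implementation, only an illustration.


class RoFormerConfig:
    """Hypothetical config class that the registry knows about."""


class GPNRoFormerConfig:
    """Hypothetical custom config class that was never registered."""


# Reverse mapping: config class name -> model type string (toy contents).
REVERSE_CONFIG_MAPPING = {"RoFormerConfig": "roformer"}


def lookup_model_type(config):
    """Look up the model type by the config's class name."""
    return REVERSE_CONFIG_MAPPING[type(config).__name__]


def try_lookup(config):
    """Same lookup, but turn the KeyError into a readable message."""
    try:
        return lookup_model_type(config)
    except KeyError as err:
        return f"unsupported config: {err}"


print(lookup_model_type(RoFormerConfig()))
print(try_lookup(GPNRoFormerConfig()))
```

The second call fails in the same way as the traceback above: the class name is simply missing from the mapping, so the lookup raises `KeyError`.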

@miguelgondu

Tagging the relevant people: @frederikkemarin @fteufel

@fteufel

fteufel commented Sep 22, 2024

Hi,

Yes, GPN-MSA is not supported. As indicated in the README (https://github.com/MachineLearningLifeScience/BEND?tab=readme-ov-file#embedders-overview), the GPN embedder is meant to be used with the A. thaliana/Brassicales models.

Last I checked, it was unclear whether GPN-MSA is useful as an embedding model (https://x.com/gsbenegas/status/1727746984055083075). Supporting it in our embedders would also be somewhat more complicated, since it operates on multiple sequence alignments (MSAs) rather than single sequences.
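
To illustrate the input difference: a single-sequence embedder takes one string per example, whereas an MSA-based model sees, at each position, a column of aligned bases across species. A toy sketch of the two shapes follows. The sequences are made up, and `msa_columns` is a hypothetical helper, not part of BEND or GPN.

```python
# Toy data only; the sequences and the number of species are made up.
single_sequence = "ACGTAC"

# A toy alignment: one row per species, all rows the same length,
# with '-' marking an alignment gap.
msa = [
    "ACGTAC",  # target species
    "ACG-AC",
    "ATGTAC",
]


def msa_columns(alignment):
    """Return the alignment column by column, checking that rows have equal length."""
    lengths = {len(row) for row in alignment}
    if len(lengths) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    return ["".join(column) for column in zip(*alignment)]


columns = msa_columns(msa)
print(len(single_sequence), len(columns))
print(columns)
```

An embedder interface built around single strings would need an extra input path to carry these per-position columns.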

@fteufel

fteufel commented Sep 22, 2024

@frederikkemarin can you replace the repo here with a fork from the main repo in your account, so that it points back correctly and can be synced?

@HelloWorldLTY
Author

Hi, thanks a lot, that makes sense to me. I noticed that one paper custom-trained a GPN for the human genome, and I tried that model with BEND, which worked for me. Thanks a lot.

https://www.biorxiv.org/content/10.1101/2024.02.29.582810v1.full.pdf

@frederikkemarin frederikkemarin pinned this issue Nov 1, 2024
@frederikkemarin frederikkemarin unpinned this issue Nov 1, 2024
@miguelgondu miguelgondu transferred this issue from another repository Nov 4, 2024