(Question) Can I use this for protein homology to UniRef50? #144

jolespin · 2024-03-02T05:56:28Z

I'm looking for a faster alternative to Diamond for aligning proteins to UniRef50 so I can map identifiers to de novo proteins.

Can I use this tool to accomplish this task?

mheinzinger · 2024-03-05T16:51:27Z

Not out of the box, no.
There are approaches that use embedding distances from pLMs for remote homology detection, such as the following (not a full list, just an excerpt that came to mind; disclaimer: the latter is from our group):

You can use the code base provided in the first link to align proteins, and/or you can use the recipe described in the latter link to find remote homologs. My 2 cents: if you really want to align proteins, I do not think that embeddings will give you a speed-up (at least, I am not aware of an implementation that would a) generate embeddings and b) align them to some DB in less time than MMseqs2/Diamond). What embeddings might give you is some fast pre-filter if you have your DB already pre-computed (see second link for details).
But I would probably just use Foldseek, potentially together with predicted 3Di if you care about speed (disclaimer #2: also from us --> https://github.com/mheinzinger/ProstT5/tree/main/scripts ).
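
To make the pre-filter idea concrete, here is a minimal sketch, not the method from either linked resource. It assumes every protein is already represented by one fixed-length embedding vector (for example a pooled per-protein pLM embedding) and that the reference side is the part that is pre-computed. All names (`build_reference_index`, `prefilter`, the `ref_*` identifiers) are hypothetical, and the vectors are random stand-ins rather than real embeddings. The pre-filter only ranks reference entries by cosine similarity; the few candidates it returns would still go to a real aligner.

```python
import heapq
import math
import random


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # guard against all-zero vectors
    return [v / norm for v in vec]


def build_reference_index(ref_embeddings):
    """Pre-compute unit-length reference vectors once (the 'pre-computed DB')."""
    return {ref_id: normalize(vec) for ref_id, vec in ref_embeddings.items()}


def prefilter(query_vec, ref_index, k=5):
    """Return the k reference IDs most similar to the query, best first."""
    q = normalize(query_vec)
    scored = (
        (sum(a * b for a, b in zip(q, r)), ref_id)
        for ref_id, r in ref_index.items()
    )
    return [(ref_id, score) for score, ref_id in heapq.nlargest(k, scored)]


# Toy data: random vectors stand in for real per-protein embeddings (assumption).
random.seed(0)
dim = 32
ref_embeddings = {
    f"ref_{i}": [random.gauss(0, 1) for _ in range(dim)] for i in range(200)
}
ref_index = build_reference_index(ref_embeddings)

# A query that is a slightly perturbed copy of one reference entry.
query_vec = [v + random.gauss(0, 0.01) for v in ref_embeddings["ref_42"]]
candidates = prefilter(query_vec, ref_index, k=5)
print(candidates)
```

The brute-force loop is only for illustration. At UniRef50 scale, a vectorized or approximate nearest-neighbor search would have to replace it, and embedding the queries still has to happen at search time, which is the cost mentioned above.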

jolespin · 2024-03-05T18:38:02Z

Ok this is very useful information.

> What embeddings might give you is some fast pre-filter if you have your DB already pre-computed

I'll look into your paper for more details, but I'm a bit confused. Let's say you have a model that uses protein embeddings for UniRef50. When you say the DB is pre-computed, are you referring to the query proteins, the reference proteins, or both?
