Hello, I am trying to build a database from the NCBI nr FASTA (707,338,897 entries) for more extensive protein search. I have tried splitting the FASTA into smaller chunks (about 250 entries per run) and combining the resulting .npy files; larger chunks cause frequent GPU out-of-memory errors, and I only have access to a 24 GB GPU. However, at this rate the job would take roughly 3 years to finish. Is there any method to speed up this process?
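For reference, here is a minimal sketch of the chunk-and-merge workflow described above. The file-naming scheme (`chunk_000000.fasta` / `chunk_000000.npy`) and the assumption that every chunk yields a 2-D array of shape (sequences, embedding_dim) are hypothetical, not from this thread; sorting each chunk by sequence length before encoding keeps padded batches tighter and makes out-of-memory failures less frequent.

```python
# Sketch of the split / merge steps only; the per-chunk GPU encoding step
# (tm-vec, ProtTrans, etc.) is whatever the user already runs on each chunk.
from pathlib import Path
import numpy as np
from Bio import SeqIO  # pip install biopython


def split_fasta(fasta_path: str, out_dir: str, chunk_size: int = 250) -> list[Path]:
    """Split a large FASTA into fixed-size chunks, sorted by sequence length
    within each chunk so padded GPU batches waste less memory."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths, buf = [], []

    def flush() -> None:
        buf.sort(key=lambda r: len(r.seq))
        p = out / f"chunk_{len(paths):06d}.fasta"
        SeqIO.write(buf, p, "fasta")
        paths.append(p)
        buf.clear()

    for rec in SeqIO.parse(fasta_path, "fasta"):
        buf.append(rec)
        if len(buf) == chunk_size:
            flush()
    if buf:
        flush()
    return paths


def merge_npy(npy_dir: str, out_path: str) -> None:
    """Concatenate per-chunk embedding arrays into one database array.
    The sorted glob order must match the chunk order used at encoding time."""
    chunks = sorted(Path(npy_dir).glob("chunk_*.npy"))
    merged = np.concatenate([np.load(p) for p in chunks], axis=0)
    np.save(out_path, merged)
```

At this database scale, `np.lib.format.open_memmap` can replace the `np.concatenate` call so the merged array is written incrementally instead of being held in RAM all at once.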
Hi, no, that is not feasible. You'd need a much larger GPU cluster to encode that many proteins.
We have considered using ESM2 instead of ProtTrans; that could take advantage of the 700M proteins in MGnify, but it would require retraining both DeepBLAST and TM-vec with the ESM2 model.
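For a sense of what the encoding step would look like with ESM2, here is a rough sketch using the fair-esm package; it covers only per-protein embedding, not the DeepBLAST/TM-vec retraining described above, and the mean-pooling choice is an assumption rather than anything decided in this thread.

```python
# Rough sketch of per-protein embedding with ESM2 via fair-esm
# (pip install fair-esm). Encoding only; no retraining is shown.
import torch
import esm

model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model = model.eval().half().cuda()  # fp16 halves memory; assumes a CUDA GPU

data = [
    ("seq1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
    ("seq2", "KALTARQQEVFDLIRDHISQTGMPPTRAEIA"),
]
_, _, tokens = batch_converter(data)
with torch.no_grad():
    out = model(tokens.cuda(), repr_layers=[33])
reps = out["representations"][33]  # (batch, padded_len, 1280)

# Mean-pool over real residues: skip BOS at position 0 and EOS/padding after.
embeddings = torch.stack([
    reps[i, 1 : len(seq) + 1].mean(dim=0).float().cpu()
    for i, (_, seq) in enumerate(data)
])
```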