Hello, I am trying to build a database from the NCBI nr FASTA (707,338,897 entries) for more extensive protein search. I have tried splitting the FASTA into smaller chunks (about 250 entries per run) and combining the resulting .npy files; larger chunks cause frequent GPU out-of-memory errors, and I only have access to a 24 GB GPU. However, at this rate the job would take roughly 3 years to finish. Is there any method to speed up this process?
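For reference, here is a minimal sketch of the chunk-and-merge workflow described above. The file-naming scheme (`chunk_000000.fasta` / `chunk_000000.npy`) and the assumption that every chunk yields a 2-D array of shape (sequences, embedding_dim) are hypothetical, not from this thread; sorting each chunk by sequence length before encoding keeps padded batches tighter and makes out-of-memory failures less frequent.

```python
# Sketch of the split / merge steps only; the per-chunk GPU encoding step
# (tm-vec, ProtTrans, etc.) is whatever the user already runs on each chunk.
from pathlib import Path
import numpy as np
from Bio import SeqIO  # pip install biopython


def split_fasta(fasta_path: str, out_dir: str, chunk_size: int = 250) -> list[Path]:
    """Split a large FASTA into fixed-size chunks, sorted by sequence length
    within each chunk so padded GPU batches waste less memory."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths, buf = [], []

    def flush() -> None:
        buf.sort(key=lambda r: len(r.seq))
        p = out / f"chunk_{len(paths):06d}.fasta"
        SeqIO.write(buf, p, "fasta")
        paths.append(p)
        buf.clear()

    for rec in SeqIO.parse(fasta_path, "fasta"):
        buf.append(rec)
        if len(buf) == chunk_size:
            flush()
    if buf:
        flush()
    return paths


def merge_npy(npy_dir: str, out_path: str) -> None:
    """Concatenate per-chunk embedding arrays into one database array.
    The sorted glob order must match the chunk order used at encoding time."""
    chunks = sorted(Path(npy_dir).glob("chunk_*.npy"))
    merged = np.concatenate([np.load(p) for p in chunks], axis=0)
    np.save(out_path, merged)
```

At this database scale, `np.lib.format.open_memmap` can replace the `np.concatenate` call so the merged array is written incrementally instead of being held in RAM all at once.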
Hi, no, that is not feasible. You'd need a much larger GPU cluster to encode that many proteins.
We have considered using ESM2 instead of ProtTrans; that could take advantage of the 700M proteins in MGnify, but it would require retraining both DeepBLAST and TM-vec with the ESM2 model.
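For a sense of what the encoding step would look like with ESM2, here is a rough sketch using the fair-esm package; it covers only per-protein embedding, not the DeepBLAST/TM-vec retraining described above, and the mean-pooling choice is an assumption rather than anything decided in this thread.

```python
# Rough sketch of per-protein embedding with ESM2 via fair-esm
# (pip install fair-esm). Encoding only; no retraining is shown.
import torch
import esm

model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model = model.eval().half().cuda()  # fp16 halves memory; assumes a CUDA GPU

data = [
    ("seq1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
    ("seq2", "KALTARQQEVFDLIRDHISQTGMPPTRAEIA"),
]
_, _, tokens = batch_converter(data)
with torch.no_grad():
    out = model(tokens.cuda(), repr_layers=[33])
reps = out["representations"][33]  # (batch, padded_len, 1280)

# Mean-pool over real residues: skip BOS at position 0 and EOS/padding after.
embeddings = torch.stack([
    reps[i, 1 : len(seq) + 1].mean(dim=0).float().cpu()
    for i, (_, seq) in enumerate(data)
])
```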