Encountering an Issue While Building a Queryable Protein Sequence Database #141

pengjk689 · 2023-09-26T05:27:46Z

Dear,

When I attempt to build the tmvec database using the entire human protein data from the UniProt database (208,022 entries), I encounter the following issue:

Traceback (most recent call last):
File "/home/pengjiak/miniconda3/envs/tmvec/bin/tmvec-build-database", line 110, in <module>
encoded_database = encode(flat_seqs, model_deep, model, tokenizer, device)
File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/tm_vec/tm_vec_utils.py", line 61, in encode
protrans_sequence = featurize_prottrans(sequences[i:i+1], model, tokenizer, device)
File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/tm_vec/tm_vec_utils.py", line 24, in featurize_prottrans
embedding = model(input_ids=input_ids, attention_mask=attention_mask)
File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 1964, in forward
encoder_outputs = self.encoder(
File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 1123, in forward
layer_outputs = layer_module(
File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 695, in forward
self_attention_outputs = self.layer[0](
File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 602, in forward
attention_output = self.SelfAttention(
File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/pengjiak/miniconda3/envs/tmvec/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 552, in forward
position_bias = position_bias + mask # (batch_size, n_heads, seq_length, key_length)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 25.09 GiB (GPU 0; 79.18 GiB total capacity; 55.53 GiB already allocated; 22.96 GiB free; 55.73 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

It's worth noting that I encounter this issue even when attempting to build the database with 20,000 protein FASTA files, which is quite perplexing. Looking forward to your response.

pengjk689 · 2023-09-26T05:31:05Z

Here is the command I ran:
tmvec-build-database \
    --input-fasta human_swissProt.fa \
    --tm-vec-model ${path}/tm_vec_cath_model.ckpt \
    --tm-vec-config-path ${path}/tm_vec_cath_model_params.json \
    --output human_swissProt_database \
    --protrans-model ${path}/prot_t5_xl_uniref50 \
    --device 'gpu'

mortonjt · 2023-11-11T03:24:55Z

Hi, we cannot currently do that from the CLI. You'll need to split the sequences into smaller chunks, encode each chunk, and create the database from the combined encodings (otherwise you'll cram too much into GPU memory). We'll try to streamline this in a follow-up release.
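
For anyone landing here, the chunking workaround described above might look roughly like the sketch below. It is not the tm_vec API: `read_fasta`, `chunked`, `encode_in_chunks`, and the `encode_chunk` callback are hypothetical names, and `encode_chunk` stands in for whatever wraps the real encoder in your setup. The optional `max_len` cutoff is also an assumption. It is there because the failing tensor in the traceback has shape (batch_size, n_heads, seq_length, key_length), so its size grows with the square of the sequence length and a few very long proteins may be worth setting aside and handling separately.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def chunked(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def encode_in_chunks(
    records: Iterable[Record],
    encode_chunk: Callable[[List[str]], List[list]],  # hypothetical encoder wrapper
    chunk_size: int = 1000,
    max_len: Optional[int] = None,
):
    """Encode records chunk by chunk; set aside sequences longer than max_len."""
    headers, vectors, skipped = [], [], []
    for chunk in chunked(records, chunk_size):
        kept = []
        for header, seq in chunk:
            if max_len is not None and len(seq) > max_len:
                skipped.append(header)  # handle these separately (e.g. on CPU)
            else:
                kept.append((header, seq))
        if not kept:
            continue
        vectors.extend(encode_chunk([seq for _, seq in kept]))
        headers.extend(header for header, _ in kept)
    return headers, vectors, skipped
```

In practice you would open the FASTA file and pass the file handle to `read_fasta`, write each chunk's encodings to disk as you go rather than holding them all in memory, and build the database from the combined encodings at the end. The chunk size and length cutoff depend on your GPU and would need some experimentation.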

gbrsales mentioned this issue May 14, 2024

Pretraind model download #160

Closed
