Speed up via xformers #120

Open · mortonjt opened this issue Jun 14, 2023 · 6 comments

Comments
@mortonjt

Just in case you weren't familiar with it: there is an xformers library that can give a >4x speed-up on transformer operations
https://github.com/facebookresearch/xformers

Could be low-hanging fruit for speeding up the operations in this library :)
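For concreteness, the core primitive xformers provides is a memory-efficient attention kernel (a minimal sketch, assuming a CUDA GPU and xformers installed; tensor shapes are illustrative):

```python
import torch
import xformers.ops as xops

# Shapes: (batch, seq_len, heads, head_dim)
q = torch.randn(2, 128, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 128, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 128, 8, 64, device="cuda", dtype=torch.float16)

# Drop-in replacement for softmax(q @ k^T / sqrt(d)) @ v that avoids
# materializing the full (seq_len x seq_len) attention matrix.
out = xops.memory_efficient_attention(q, k, v)
print(out.shape)  # torch.Size([2, 128, 8, 64])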

@mheinzinger
Collaborator

mheinzinger commented Aug 28, 2023

Hi Jamie,

thanks for reaching out! I wanted to try this before answering, but it obviously took me way too long. I gave it a shot a few weeks ago but failed to get any significant speed-up; maybe I did something wrong (I used it for translation with the new ProstT5 model).

Have you had positive experiences with this on protein language models?

@mortonjt
Author

Sorry to hear that. I haven't tried it on protein LLMs yet (only on stable-diffusion), but it is on my radar. I'm hoping it could be useful for inference and speed up the embedding calculations, which we're noticing are a bottleneck for protein annotation.
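For reference, on stable-diffusion it was essentially one call (a sketch, assuming diffusers is installed; the checkpoint name is illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()  # swap in xformers kernels
image = pipe("a protein ribbon diagram").images[0]
```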

@mheinzinger
Collaborator

Hm, how many proteins are you trying to label? From my experience, the ProtT5-XL-U50 encoder in half-precision, using batching as described here, reaches around 0.1 s/protein on average for the ~20k proteins of the human proteome (so around 30 min for human).
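In code, the setup is roughly the following (a sketch; the Rostlab/prot_t5_xl_half_uniref50-enc checkpoint is my assumption for the fp16 encoder referenced above, and the sequences are illustrative):

```python
import re
import torch
from transformers import T5EncoderModel, T5Tokenizer

device = "cuda"  # half precision assumes a GPU
ckpt = "Rostlab/prot_t5_xl_half_uniref50-enc"
tokenizer = T5Tokenizer.from_pretrained(ckpt, do_lower_case=False)
model = T5EncoderModel.from_pretrained(ckpt, torch_dtype=torch.float16)
model = model.to(device).eval()

seqs = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "GIVEQCCTSICSLYQLENYCN"]
# ProtT5 expects space-separated residues; map rare amino acids to X.
batch = [" ".join(re.sub(r"[UZOB]", "X", s)) for s in seqs]
enc = tokenizer(batch, padding="longest", return_tensors="pt").to(device)

with torch.no_grad():
    # (batch, max_len, 1024); mask out padding before pooling per protein
    emb = model(**enc).last_hidden_state
```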

@mheinzinger
Collaborator

I had a brief look and stopped once I hit the following error: AttributeError: 'FeatureExtractionPipeline' object has no attribute 'enable_xformers_memory_efficient_attention' (I tried to extract embeddings from the ProtT5-XL-U50-fp16 model from my link in the post above).
So I'm not sure it is as plug-and-play as I had hoped. In case you find an example/tutorial that shows how this should be done for plain Transformers models (no diffusion etc.), please send it my way and I'll give it a try. So far I have only found tutorials on applying this to diffusion models in huggingface (but most likely I just missed the right source).
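Worth noting: enable_xformers_memory_efficient_attention is a diffusers method, which is why a transformers FeatureExtractionPipeline doesn't have it. The kernel itself is easy to call; for example, PyTorch 2.x ships a fused scaled-dot-product attention that dispatches to flash / memory-efficient backends where available (a sketch, assuming a CUDA GPU):

```python
import torch
import torch.nn.functional as F

# Shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
out = F.scaled_dot_product_attention(q, k, v)  # fused kernel under the hood
```

Wiring either kernel into ProtT5 would still mean overriding T5Attention.forward, so it is indeed not plug-and-play for plain Transformers.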

@mortonjt
Author

mortonjt commented Aug 28, 2023 via email

@mheinzinger
Collaborator

mheinzinger commented Aug 28, 2023

Yeah, I see your point. We also ran UniRef50 at one point, but only to make predictions, not for embedding extraction (especially since storing those embeddings becomes expensive quickly).
The only things I can recommend (probably obvious, but still):

  • Sort & process sequences by length to avoid padding (see the sketch after this list).
  • Use batch-processing & half-precision to max out batch sizes.
  • If you write chunks of proteins (after sorting by length), e.g., splitting UniRef50 into 50 chunks of ~1M proteins each, you can parallelize embedding extraction over multiple GPUs.
  • Set an upper length limit. Long proteins are the main reason for slow-downs. If you only remove proteins longer than the AlphaFold-DB length limit (>1280 residues), you can already reduce the average embedding time from 0.1 s/protein to 0.035 s/protein for the human proteome while losing only 5% of the data (19k of 20k human proteins are <1280 residues).
  • Maybe check the TensorRT T5 example (no experience with it, though).
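A minimal sketch of the sort-by-length batching above (the helper and its limits are hypothetical choices, not from any library):

```python
def length_sorted_batches(seqs, max_residues=4000, max_len=1280):
    """Yield batches of length-sorted sequences, capping padded batch size."""
    seqs = [s for s in seqs if len(s) <= max_len]  # drop over-long proteins
    seqs = sorted(seqs, key=len)                   # minimizes padding waste
    batch = []
    for s in seqs:
        # seqs are sorted ascending, so len(s) is this batch's padded length
        if batch and (len(batch) + 1) * len(s) > max_residues:
            yield batch
            batch = []
        batch.append(s)
    if batch:
        yield batch
```

Each yielded batch can then be tokenized and embedded as in the ProtT5 snippet further up.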
