ProTrek is a tri-modal protein language model that jointly models protein sequence, structure and function (SSF).
Guide:
- Obtain the necessary data: Collect the protein sequences, structures, and function annotations you want to analyze with ProTrek. Common sources include the Protein Data Bank (PDB) for structures and UniProt (including its manually reviewed Swiss-Prot section) for sequences and annotations.
- Preprocess the data: Convert the data into a format that ProTrek can read. This may involve removing duplicate entries, dropping malformed records, and putting sequences, structures, and function descriptions into a consistent format.
- Configure ProTrek: Set ProTrek's parameters for your task. The exact steps depend on the platform you run it on. Save the final configuration so that later runs can be reproduced.
- Run ProTrek: After preprocessing the data and configuring ProTrek, run the model on the preprocessed data. ProTrek was trained with contrastive learning using three core alignment strategies: (1) structure as the supervision signal for amino acid (AA) sequences and vice versa, (2) mutual supervision between sequences and functions, and (3) mutual supervision between structures and functions. As a result, it tightly associates sequence, structure, and function.
- Analyze the results: Once ProTrek has finished running, examine its output, for example ranked retrieval hits, to support downstream work such as identifying candidate drug targets or designing therapeutics. ProTrek supports sequence-function and function-sequence retrieval as well as fast, accurate protein-protein search, so researchers can quickly find proteins related to a query.
- Iterate and refine: As with any machine learning workflow, results improve with iteration. Keep evaluating the model's output on new data and adjust the parameters as needed.
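
The preprocessing step can be illustrated with a small cleaning routine. This is only a sketch under stated assumptions: `clean_sequences` and `STANDARD_AA` are hypothetical names, not part of ProTrek, and the input is assumed to be a list of `(id, sequence)` pairs. It normalizes each sequence, drops entries with non-standard residue letters, and removes duplicates.

```python
# Hypothetical helper for illustration; not part of the ProTrek codebase.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def clean_sequences(records):
    """Normalize, validate, and deduplicate (id, sequence) pairs.

    Keeps the first occurrence of each distinct sequence and drops
    entries that are empty or contain non-standard residue letters.
    """
    seen = set()
    cleaned = []
    for rec_id, seq in records:
        # Remove whitespace and line breaks, then uppercase.
        seq = "".join(seq.split()).upper()
        if not seq or not set(seq) <= STANDARD_AA:
            continue  # empty or contains letters outside the standard set
        if seq in seen:
            continue  # duplicate of a sequence we already kept
        seen.add(seq)
        cleaned.append((rec_id, seq))
    return cleaned


records = [
    ("p1", "mktay iakqr"),
    ("p2", "MKTAYIAKQR"),  # duplicate of p1 after normalization
    ("p3", "MKXAYZ"),      # contains non-standard letters X and Z
    ("p4", "GSHMLE"),
]
cleaned = clean_sequences(records)
```

Whether to discard or keep sequences with ambiguous residues depends on the analysis; discarding them here is a simplifying choice.
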
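
To make the three alignment strategies more concrete, the sketch below computes a generic symmetric contrastive (InfoNCE) loss for each pair of modalities and sums them. It is not ProTrek's training code: the function names, the temperature of 0.07, the equal weighting of the three terms, and the random stand-in embeddings are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of paired embeddings.

    Row i of `a` and row i of `b` describe the same protein (the positive
    pair); every other row in the batch serves as a negative.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature  # (batch, batch) similarity matrix

    def diagonal_cross_entropy(z):
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the a->b and b->a directions (mutual supervision).
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(logits.T))


def trimodal_loss(seq, struct, func, temperature=0.07):
    """Sum of the three pairwise alignment terms (equal weights assumed)."""
    return (info_nce(seq, struct, temperature)
            + info_nce(seq, func, temperature)
            + info_nce(struct, func, temperature))


# Stand-in embeddings: 8 proteins, 16 dimensions per modality.
rng = np.random.default_rng(0)
seq = rng.normal(size=(8, 16))
struct_aligned = seq + 0.01 * rng.normal(size=(8, 16))
func_aligned = seq + 0.01 * rng.normal(size=(8, 16))
struct_random = rng.normal(size=(8, 16))
func_random = rng.normal(size=(8, 16))

aligned_loss = trimodal_loss(seq, struct_aligned, func_aligned)
random_loss = trimodal_loss(seq, struct_random, func_random)
```

When the three modalities of each protein already sit close together in embedding space, the loss is small; for unrelated embeddings it is much larger. That gap is what training of this kind tries to close.
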
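
Retrieval in a shared embedding space usually reduces to a nearest-neighbor search. The following sketch ranks database entries by cosine similarity to a query embedding. The function name `top_k_matches` and the toy 3-dimensional vectors are hypothetical; in practice the embeddings would come from the model's encoders, whose API is not shown here.

```python
import numpy as np


def top_k_matches(query, database, k=3):
    """Return (indices, scores) of the k rows of `database` that are most
    similar to `query` under cosine similarity, best match first."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q                    # cosine similarity per database row
    order = np.argsort(-scores)[:k]    # sort descending, keep the top k
    return order, scores[order]


# Toy database of four embeddings and one query embedding.
database = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([0.9, 0.1, 0.0])

indices, scores = top_k_matches(query, database, k=2)
```

Because every modality is embedded in the same space, the same routine would cover sequence-to-function, function-to-sequence, and protein-to-protein queries; only the source of the query and database vectors changes. A full sort is fine for small databases, while large ones typically call for an approximate nearest-neighbor index.
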
https://github.com/westlake-repl/ProTrek
https://github.com/amelie-iska/EnzymeFlow
https://github.com/amelie-iska/NucleusDiff
https://github.com/amelie-iska/boltz
https://github.com/amelie-iska/syntheseus
https://github.com/amelie-iska/ChemLactica
https://github.com/chaidiscovery/chai-lab
https://github.com/amelie-iska/celltraj
https://github.com/amelie-iska/ChatCell
https://github.com/amelie-iska/mdgen
https://github.com/amelie-iska/cellxgene
https://github.com/amelie-iska/RFdiffusion
https://github.com/amelie-iska/ProtRAP-LM
https://github.com/amelie-iska/RGFN
https://github.com/amelie-iska/PepGLAD
https://github.com/amelie-iska/scDiff
https://github.com/amelie-iska/protein_generator
https://github.com/amelie-iska/AnyMolGenCritic
https://github.com/amelie-iska/ReactZyme
https://github.com/amelie-iska/ldmol
https://github.com/amelie-iska/OAReactDiff
https://github.com/amelie-iska/ProteinDT
https://github.com/amelie-iska/LigandMPNN
https://github.com/amelie-iska/DiffPepBuilder/tree/main
https://github.com/amelie-iska/MMseqs2-App
https://github.com/amelie-iska/DrugHIVE
https://github.com/amelie-iska/RetroGFN
https://github.com/amelie-iska/GIT-Mol
https://github.com/amelie-iska/rnaflow