The QueGO pipeline identifies potential homologs to proteins related to specific functionality.

PombertLab/QueGO

QueGO (Query Gene Ontologies)

The QueGO pipeline identifies potential homologs to proteins related to specific functionality. Protein sequences and/or 3D structures are automatically downloaded from the UniProt Knowledgebase (UniProtKB) for proteins whose Gene Ontology (GO) term descriptions contain a desired keyword. The downloaded proteins are then searched against the user-provided protein data using sequence-based (DIAMOND) and/or 3D structure-based (Foldseek or GESAMT) homology searches, producing a list of potential homologs.
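The keyword selection step described above can be illustrated with a minimal Python sketch. This is not QueGO's own code: the function name select_by_go_keyword, the accessions (ACC_A, ACC_B, ACC_C), and the annotation table are all hypothetical, and a real run would take its GO term descriptions from UniProtKB rather than from a hard-coded dictionary.

```python
def select_by_go_keyword(annotations, keyword):
    """Return accessions with at least one GO term description containing the keyword.

    Matching is a case-insensitive substring test (an assumption made for this
    sketch; the actual pipeline may match differently).
    """
    needle = keyword.lower()
    return sorted(
        accession
        for accession, descriptions in annotations.items()
        if any(needle in description.lower() for description in descriptions)
    )


# Hypothetical accession -> GO term descriptions table, for illustration only.
annotations = {
    "ACC_A": ["DNA repair", "nucleus"],
    "ACC_B": ["ATP binding", "cytoplasm"],
    "ACC_C": ["mismatch repair", "DNA binding"],
}

hits = select_by_go_keyword(annotations, "repair")
print(hits)
```

In this toy table, the keyword "repair" selects ACC_A and ACC_C. In the pipeline, proteins selected this way are the ones downloaded and used as queries for the homology searches.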

Why use QueGO?

With the ever-increasing amount of genome data, identifying all proteins relevant to functions of interest can be time-consuming, error-prone, and extremely difficult to reproduce between research groups. QueGO automates these searches: given the same input parameters, it returns and downloads identical results, provided the source database has not changed. It also removes the point-and-click downloads required by manual curation, which speeds up data acquisition.

What are Gene Ontologies?

The Gene Ontology Consortium has created a class-based system of internationally agreed-upon terms that describe the roles and functions a protein may take part in. Each term is a member of a class and narrows the functional description of the class it belongs to. All terms are rooted in one of three base terms: biological process, cellular component, or molecular function.
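The rooted structure described above can be sketched in Python by walking from a specific term up to its base term. The PARENTS table below is a deliberately simplified, hypothetical excerpt written for this example, not data taken from the Gene Ontology itself. In the real ontology a term can have several parents, which is why the sketch collects a set of roots instead of assuming a single path.

```python
# Simplified, illustrative child -> parents table (not real GO data).
PARENTS = {
    "DNA repair": ["DNA metabolic process"],
    "DNA metabolic process": ["metabolic process"],
    "metabolic process": ["biological process"],
    "ATP binding": ["binding"],
    "binding": ["molecular function"],
    "nucleus": ["cellular component"],
}

# The three base terms named in the text above.
ROOTS = {"biological process", "cellular component", "molecular function"}


def roots_of(term):
    """Follow parent links upward and return every base term reachable from term."""
    seen = set()
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited through another path
        seen.add(current)
        if current in ROOTS:
            found.add(current)
        stack.extend(PARENTS.get(current, []))
    return found


print(roots_of("DNA repair"))
print(roots_of("ATP binding"))
```

Each step up the table moves to a broader description, so "DNA repair" ends at "biological process" and "ATP binding" ends at "molecular function".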

References

Sensitive protein alignments at tree-of-life scale using DIAMOND. Buchfink, B., Reuter, K., & Drost, H. Nat Methods 18, 366–368 (2021). DOI: 10.1038/s41592-021-01101-x

Foldseek: fast and accurate protein structure search. Michel van Kempen, Stephanie S. Kim, Charlotte Tumescheit, Milot Mirdita, Johannes Söding, Martin Steinegger. bioRxiv 2022.02.07.479398. DOI: 10.1101/2022.02.07.479398

UniProt: the universal protein knowledgebase in 2021. The UniProt Consortium. Nucleic Acids Research, Volume 49, Issue D1, 8 January 2021, Pages D480–D489. DOI: 10.1093/nar/gkaa1100
