lyons89/awesome-proteomics

awesome-proteomics

An awesome list of proteomics tools and resources.

I've been analyzing large-scale proteomics data sets for over 5 years now. I recently stumbled upon these awesome lists and wanted to make one for proteomics.

Proteomics

Table of Contents

1A. Learning Resources - Proteomics
1B. Learning Resources - Programming
2. Databases
3. Raw data search software/algorithms
4. Assorted Pipeline Tools
5. Raw Data Analysis
6. Statistical Analysis
7. Protein Pathway Enrichment
8. Kinase Motif/Activity Analysis
9. Top down data analysis
10. Multi-Omics data analysis

1A. Learning Resources - Proteomics

Ben Orsburn has hands down the best proteomics blog I've ever seen. Ben is very knowledgeable and has a great sense of humor.

Phil Wilmarth has a lot of good learning resources. He has Python scripts for raw data analysis as well as blog posts detailing TMT data analysis techniques.

Biostars is like Stack Overflow but only for bioinformatics. It has forums for questions, job postings, and tutorials.

Review article about proteomics.

Tutorial videos from NCQBCS, a project led by the Coon lab. Contains lots of information regarding experimental design, ionization, quantitative proteomics, analysis, post-translational modifications and more.

ASMS video mass spec channel contains a lot of videos from leading researchers in the proteomics field.

Videos from Nikolai - most of his videos focus on single-cell proteomics and DIA.

Videos from Matthew Padula - lots of great videos on the basics of mass spectrometry and proteomics.

May Institute - computational proteomics short courses organized by Olga Vitek. They also have a lot of videos on YouTube.

1B. Learning Resources - Programming

R books - a large collection of ebooks for learning R, from basic R to advanced R, Shiny, and more!

Are-we-learning-yet - a resource for learning machine learning in Rust. I've noticed Rust is becoming a more popular language, not only in the wild but also in proteomics (see Sage below).

conda/bioconda - Anaconda is a popular bioinformatics tool used mostly for Python programming and for some R. Useful for creating reproducible environments.

Intro Math - if you're like me and it's been a few years since you've had to use math, check out this repo to brush up on it and learn some R, Python, and Julia.

python data science tips - another resource for learning Python and pandas.

Mass Spec Coding Club - a great resource for learning Python and then applying that knowledge to mass spectrometry.

2. Databases

ProteomeXchange is a global repository for raw MS data that contains links to all major databases, including MassIVE, PRIDE, iProX, and more. Probably the best place to start.

Pastel BioScience has a database that contains such a staggering amount of information that I'm sure the rest of my awesome list will be redundant.

UniProt - has all the information you will ever need about individual proteins and is the go-to source for protein FASTA databases.

Biogrid - protein-protein interaction database

KEGG - biological pathway database

Reactome - nicer looking biological pathway database

2021 - CPTAC - python/R - API interface to publicly available cancer datasets - paper

2021 - ppx - python - Python interface to proteomics data repositories - paper

3. Raw data search software/algorithms

2017 - FragPipe - Java - a very fast search engine with a nice GUI. The software is modular: it consists of MSFragger, the database search algorithm; Philosopher, which analyzes the database search results; and other tools for PTM and TMT integration.

2008 - MaxQuant is probably the most used and best-known DDA software. Developed by Jürgen Cox, this completely free software is user-friendly and regularly updated with new and original features. There is even a YouTube channel with tons of videos on how to use the software. - paper

2015 - PeptideShaker is like the Swiss Army knife of search tools. You can search data with multiple search engines, including Comet, Tide, Andromeda, Mascot, X!Tandem, and more that I've never heard of. - paper

2010 - Skyline - software for targeted proteomics - paper

2012 - Comet - C++ - Free and open-source search engine, lately it's had several - paper

2020 - DIA-NN - C/C++ - free and open source search tool for DIA data that uses neural networks, works using either a library or a FASTA database. - paper

2023 - Sage - Rust - most likely the fastest search engine currently available. It's completely terminal-based, but learning to use it is worth the effort - paper

4. Assorted Pipeline Tools

2019 - MaxQuant Live (not sure where to put this) - for real-time monitoring of MS data and acquisition.

2009 - PAW_pipeline - python - a pretty much stock Python pipeline for raw-file proteomics data. It includes functions to convert files, run Comet, and produce histograms. It can also handle TMT data - paper

2015 - Ursgal - python - combines multiple search engine algorithms and postprocessing algorithms, and performs statistics on the output from multiple search engines - paper1 paper2

2019 - DIAlignR - R - retention time alignment of targeted MS data, including DIA and SWATH-MS - paper

2021 - Monocle - C# - for monoisotopic peak and accurate precursor m/z detection in shotgun proteomics experiments. - paper

2021 - RawBeans is an upgraded version of RawMeat. It's a raw data quality control tool that helps identify instrument issues relating to spray instability, problems with fragmentation, or unequal loading. This program can be used on a standalone PC or included in a pipeline. - paper

2021 - mokapot - python - semi-supervised learning for peptide detection - paper

2021 - QCloud2 - cloud-based quality control pipeline, can be integrated with Nextflow and OpenMS - paper

2021 - DIAproteomics - python - a module that can be added to an OpenMS workflow for the analysis of DIA data - paper

5. Raw Data Analysis

2012/2020 - MSnbase - R - provides MS data structures, allows you to process, quantify, visualize raw data - paper

2015 - MaRaCluster - C++ - clustering technique to identify fragment spectra stemming from the same peptide species - paper

2015 - pyproteome - python - analyzes proteomics data, can filter, normalize, perform motif and pathway enrichment. Currently only supports ProteomeDiscoverer .msf search files - paper

2018 - pyteomics - python - proteomics framework tools - paper

2018 - RawTools - C# - quality control checking of raw files, can assist in method development and instrument quality control - paper

2018 - MSstatsQC - R - provides methods for multiple peptide monitoring using raw MS files, works for DDA and DIA data - paper

2018 - rawDiag - R - package that can be used in conjunction with rawrr - paper

2020 - COSS - java - user-friendly spectral library search tool - paper

2021 - rawrr - R - a great package that can read in Thermo raw files! That's great for me, because I always find it tedious to convert a raw file into an mzML or mzXML file - paper

2021 - PSpecteR - R - user-friendly and interactive tool for visualizing the quality of top-down and bottom-up proteomics data - paper

2022 - RforMassSpectrometry - R - a massive project that contains multiple helpful packages including RforMassSpectrometry, MsExperiment, Spectra, QFeatures, PSMatch, Chromatograms, MsCoreUtils, and MetaboCoreUtils.

2023 - mpwR - R - package that allows you to directly compare the output of search engines such as MaxQuant, DIA-NN, Spectronaut, and I think Proteome Discoverer. It's also helpful if you're testing different settings within your search engine and want to quickly see how each performs. - paper

6. Statistical Analysis

2014 - MSstats - R - DDA/shotgun, bottom-up, SRM, DIA - paper

2018 - PaDuA - python - proteomics and phosphoproteomics data analysis - paper

2020 - MSstatsTMT - R - TMT shotgun proteomics - paper

2020 - proteiNorm - R - TMT and unlabeled, has multiple options for normalization and statistical analysis - paper

2020 - DEqMS - R - developed on top of limma, but takes into account variability in PSMs. Works on both labeled and unlabeled samples - paper

2021 - MSstatsPTM - labeled and unlabeled PTM data analysis - paper

PermFDP - R - package to perform multiple hypothesis correction using permutation-based FDP. One of the better-performing methods for multiple test correction. - paper (the paper isn't about the tool itself; it uses the tool and compares it to other methods)

7. Protein Pathway Enrichment

2019 - fgsea - R - fast gene set enrichment analysis - paper

2019 - pathfindR - R - active-subnetwork-oriented pathway enrichment analysis that uses protein-protein interaction networks to enhance the standard pathway analysis method - paper

2020 - lipidR - R - lipidomics data analysis - paper

2021 - phosphoRWHN - R - pathway enrichment for phosphoproteomics data - paper

2021 - leapR - R - package for multiple pathway analysis - paper

8. Kinase Motif/Activity Analysis

2017 - KSEAapp - R - kinase-substrate enrichment analysis. I would recommend using it with a freshly downloaded kinase-substrate database from PhosphoSitePlus - paper

2015 - rmotifx - R - motif enrichment analysis of PTMs on proteins, probably mostly used for phosphorylation - paper
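
To give a feel for what a kinase-substrate enrichment score looks like, here is a minimal Python sketch. It is not KSEAapp's code. It assumes the commonly cited z-score form, z = (mean of substrate log2 fold changes - mean of all sites) * sqrt(m) / SD of all sites, where m is the number of quantified substrates. The function name, the site IDs, the kinase names, and the fold changes are all made up for illustration.

```python
import math
import statistics


def ksea_z_scores(log2_fc, kinase_substrates):
    """Return {kinase: z-score} from site-level log2 fold changes.

    log2_fc: dict mapping phosphosite ID -> log2 fold change.
    kinase_substrates: dict mapping kinase -> list of phosphosite IDs.
    """
    all_values = list(log2_fc.values())
    mean_all = statistics.mean(all_values)
    # Population SD of all quantified sites (an assumption of this sketch).
    sd_all = statistics.pstdev(all_values)
    scores = {}
    for kinase, sites in kinase_substrates.items():
        # Keep only substrates that were actually quantified.
        observed = [log2_fc[s] for s in sites if s in log2_fc]
        if not observed or sd_all == 0:
            continue
        m = len(observed)
        mean_sub = statistics.mean(observed)
        scores[kinase] = (mean_sub - mean_all) * math.sqrt(m) / sd_all
    return scores


# Hypothetical toy data: four quantified sites and two kinases.
log2_fc = {"A_S10": 2.0, "B_T20": 1.0, "C_S30": -1.0, "D_Y40": -2.0}
kinase_substrates = {
    "KIN1": ["A_S10", "B_T20"],
    "KIN2": ["C_S30", "D_Y40", "E_S50"],  # E_S50 was not quantified
}
scores = ksea_z_scores(log2_fc, kinase_substrates)
print(scores)
```

A positive score suggests the kinase's known substrates are, on average, more phosphorylated than the background, and a negative score suggests the opposite. The quality of the kinase-substrate map matters far more than the arithmetic, which is why a fresh database download is worth the effort.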

9. Top down data analysis

2021 - ClipsMS - python - analysis of terminal and internal fragments in top-down mass spectrometry data - paper

10. Multi-Omics data analysis

2015 - moCluster - R - Integration of multiple omics datasets to identify patterns - paper

2019 - MOGSA - R - Multiple omics data integrative clustering and gene set analysis - paper

Miscellaneous

Cytoscape - visualizing protein-protein interaction networks

2012 - ProteoWizard - great software for converting one MS file type to another. I mostly use it to convert Thermo .raw files to mzML - paper

2019 - IPSA - Interactive Peptide Spectral Annotator, a web-based utility for shotgun mass spectrum annotation - paper

2020 - PeCorA - R - peptide correlation analysis - paper

2021 - ProteaseGuru - C# - tool for in silico database digestion to optimize bottom-up experiments - paper

2021 - DeepLC - python - predicts retention times for peptides that have unseen modifications - paper
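
As a rough illustration of the in silico digestion that tools like ProteaseGuru perform at database scale, here is a small Python sketch of a tryptic digest. It is not ProteaseGuru's implementation. It assumes the textbook trypsin rule (cleave after K or R unless the next residue is P), and the function name, parameters, and example sequence are hypothetical.

```python
def tryptic_digest(sequence, missed_cleavages=0, min_length=1):
    """Return peptides from a simple in silico tryptic digest.

    Cleaves C-terminal to K or R, except when the next residue is P.
    Peptides spanning up to `missed_cleavages` skipped sites are included.
    """
    # Collect the cut positions, then add the two sequence ends as boundaries.
    cuts = [
        i + 1
        for i, residue in enumerate(sequence[:-1])
        if residue in "KR" and sequence[i + 1] != "P"
    ]
    bounds = [0] + cuts + [len(sequence)]

    peptides = []
    for start in range(len(bounds) - 1):
        # Each extra boundary crossed corresponds to one missed cleavage.
        last = min(start + 1 + missed_cleavages, len(bounds) - 1)
        for end in range(start + 1, last + 1):
            peptide = sequence[bounds[start]:bounds[end]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


# Hypothetical toy sequence: the R at position 3 is followed by P, so no cut there.
print(tryptic_digest("MKRPAAKGG"))
# ['MK', 'RPAAK', 'GG']
print(tryptic_digest("MKRPAAKGG", missed_cleavages=1))
# ['MK', 'MKRPAAK', 'RPAAK', 'RPAAKGG', 'GG']
```

Swapping the cleavage rule for another protease and comparing the resulting peptide lengths is the kind of question a dedicated digestion tool answers much more thoroughly.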
