- Start_Here - This repository. Updated January 20, 2024.
-
mouse_lens_development_Khan2018_reanalysis - A thorough and detailed re-analysis of a TMT-labeled bottom-up quantitative proteomics study. The experiment tracks the developing mouse lens proteome at two embryonic ages (E15 and E18, in days) and four postnatal ages (P0, P3, P6, and P9). The salient points are:
- doing quantitative proteomics without using ratios
- combining multi-plex TMT experiments
- understanding samples with a few highly abundant proteins
- understanding how data normalization and statistical testing results are coupled
- preparing results in ways that facilitate data exploration and discovery
-
PXD030990_human-tear_re-analysis - A re-analysis of human tear samples characterized in a single-shot experimental design. Tear has a few highly abundant proteins that make deep proteome profiling without fractionation impossible. Single-shot experimental designs have gained popularity, but they are much more limiting than is generally realized. Proteomic depth is a case of getting out what you put in. A single LC run won't get you much. Short gradient single LC runs will get you even less.
-
Human_rhesus_TMT - Analysis discussion of a multi-sample, multi-fraction, multi-kit, multi-species TMTpro experiment. Details how to analyze an experiment with 21 rhesus samples and 24 human samples (45 samples total) in 3 plexes labeled with TMTpro 18-plex reagents, using 17 channels per plex (15 samples plus 2 pooled standards).
-
quantitative_proteomics_data_cleaning - A discussion of basic data cleaning concepts for quantitative proteomics data and some useful notebook quality control (QC) metrics.
- pwilmart.github.io - Account website repository (mostly blog entries).
- TMT-zero-replacement - Blog about replacing missing data in TMT datasets. (Dec. 2018)
- MaxQuant performance - Blog about how well MaxQuant performs for PSM identification. (March 2019)
- Quant tool survey - Some thoughts about quantitative proteomics tools. (March 2019)
- pipeline performance - Some ideas about how to make pipelines more effective. (Sep. 2019)
- Humpty Dumpty - Blog about how to do quantification from shotgun proteomics data. (Sep. 2019)
- Orthologs and annotations - Blog about using ortholog mapping to annotate proteomics results. (Oct. 2019)
- TMT ratio distortion - Understanding TMT ratio distortions. (Jan. 2020)
- FASTA files and protein inference - Blog about how FASTA files and protein inference affect shotgun proteomics results. (Jan. 2020)
- Proteomics meta data - Thoughts about how to describe proteomic experimental designs. (Jan. 2020)
- Open Search - Some caveats to open search data exploration. (July 2020)
- Soup to nuts - Overview of typical steps in peptide/protein ID pipelines. (July 2020)
- How to Excel - Why I use (and like) Excel. (July 2020)
- Houston, we have a discrepancy - Thoughts on the discrepancy between MSstatsTMT and my analyses. (July 2020)
- Unique peptides and shotgun quantification - How to use unique peptides in quantification. (Sep. 2020)
- Go big or go home? - Are we shooting ourselves in the foot with narrow tolerance database searching? (April 2021)
- Is MaxQuant holding back proteomics? - Should the most heavily used proteomics data analysis tool perform better? (May 2021)
- What makes the PAW pipeline different? - More of the history and design choices behind the PAW pipeline. (June 2021)
- TMT bad practices - Many established TMT data analysis methods are not very good. (Dec. 2021)
GitHub markdown (and the auto rendering of repository README.md files as nice webpages) creates a fast way to do technical blogging. Supporting files and images are easier to add to a repository than to a formal website. Repositories can also be great for sharing presentations (meeting content or training resources).
- talk_to_repo_example - Tutorial on turning talks and posters into GitHub content. (Nov. 2019)
- Installing R kernel in Jupyter notebooks - How to add an R kernel to Jupyter notebooks. (Aug. 2021)
- Gene-set-enrichment_STRING-DB - Short tutorial on doing gene set enrichment with STRING-DB. (Sep. 2021)
- PRIDE_submission_tutorial - A guide to submitting PAW pipeline results to PRIDE. (May 2020)
- precursor_mass_corrections - Is monoisotopic peak picking for MS2 scans a problem that needs solving? (April 2021)
- score_distributions_FDR - Get your annoying tail out of my good scores! (April 2021)
- IRS_validation - Notebooks demonstrating how Internal Reference Scaling (IRS) in multiplex TMT experiments works. (Jan. 2019)
- human_tear_references - A summary of quantitative tear proteomics references up to April 2022. Stimulated tearing confounds (probably) all these studies. (April 2022)
- TMT_PAW_pipeline - Details about how TMT labeling is handled in the PAW pipeline. (Oct. 2022)
- TMT_channel_cross_talk - A deeper dive on adjacent channel cross talk for TMTpro 18-plex. How large is the effect and some pros and cons of correction. (Dec. 2022)
- Human-plasma_DIA-vs-TMT - An apples-to-aardvarks comparison of human plasma proteomes from DIA versus TMT. (Feb. 2023)
- PXD011691_reanalysis - Reanalysis of data from PXD011691 - another DIA versus TMT experiment. (Feb. 2023)
- quantitative_proteomics_data_cleaning - A discussion of basic data cleaning concepts for quantitative proteomics data and some useful notebook quality control (QC) metrics. (April 2023)
- Human_rhesus_TMT - Analysis discussion of a multi-sample, multi-fraction, multi-kit, multi-species TMTpro experiment. Details how to analyze an experiment with 21 rhesus samples and 24 human samples (45 samples total) in 3 plexes labeled with TMTpro 18-plex reagents, using 17 channels per plex (15 samples plus 2 pooled standards). (Oct. 2023)
- PXD030990_human-tear_re-analysis - A re-analysis of human tear samples characterized in a single-shot experimental design. Tear has a few highly abundant proteins that make deep proteome profiling without fractionation impossible. Single-shot experimental designs have gained popularity, but they are much more limiting than is generally realized. Proteomic depth is a case of getting out what you put in. A single LC run won't get you much. Short gradient single LC runs will get you even less. (Nov. 2023)
- mouse_lens_development_Khan2018_reanalysis - A thorough and detailed re-analysis of a TMT-labeled bottom-up quantitative proteomics study. The experiment tracks the developing mouse lens proteome at two embryonic ages (E15 and E18, in days) and four postnatal ages (P0, P3, P6, and P9). The salient points are:
- doing quantitative proteomics without using ratios
- combining multi-plex TMT experiments
- understanding samples with a few highly abundant proteins
- understanding how data normalization and statistical testing results are coupled
- preparing results in ways that facilitate data exploration and discovery
(Jan. 2024)
-
PAW_pipeline - The PAW/Comet proteomics pipeline
-
fasta_utilities - Utilities for downloading and prepping FASTA files
-
utilities - Some miscellaneous utility scripts
-
annotations - Scripts for adding UniProt annotations to results lists
-
PAW_BLAST - Scripts for BLAST ortholog matching
-
Z-score_GUI - Script for sliding-window Z-score analyses
-
IRS_validation - Method overview and validation of Internal Reference Scaling normalization
- IRS_validation - Notebook with more background on the IRS method and experimental validation
- auto_finder_BIND-473 - Notebook showing how to verify that reference channels are correctly assigned
-
IRS_normalization - Example of the IRS method using developing mouse lens data
- understanding_IRS - Notebook covering normalization details
- statistical_testing - Notebook demonstrating edgeR testing
- statistical_testing_ratios - Notebook where ratios to the reference channel are computed and tested using limma
- statistical_testing_take2 - More statistical testing of P0 vs P3 time points
-
Plubell_2017_PAW - Reanalysis of original IRS MCP paper's data
-
PXD017823_Real-Time-Search - Analysis of some real time search SPS-MS3 TMT data from the Gygi Lab
- Schweppe_RTS_by-method - Compares the older SPS MS3 acquisition method to the new real time search (RTS) acquisition method (Note: RTS acquisition used a protein close-out feature to limit acquisition of peptides from abundant proteins)
- PXD017823_RTS_comparisons - Compares cell line expression with edgeR for the RTS data (not the IRS adjusted values)
- PXD017823_Regular_comparisons - Compares cell line expression with edgeR for the regular SPS MS3 data (not the IRS adjusted values)
- PXD017823_RTS_comparisons_IRS - Compares cell line expression with edgeR for the RTS data after IRS
- PXD017823_Regular_comparisons_IRS - Compares cell line expression with edgeR for the regular SPS MS3 data after IRS
-
MaxQuant_and_PAW - Comparison of PAW and MaxQuant with same TMT data (KUR1502 project)
- Comet/PAW - Single-plex TMT analysis using my Comet/PAW pipeline
- Comet/PAW edgeR vs t-test - Comparison of edgeR to t-test statistical testing
- Comet/PAW edgeR vs limma - Comparison of edgeR to limma statistical modeling
- Comet/PAW limma-voom - Comparison of edgeR to limma-voom (TMM normalization and trended variance)
- MaxQuant analysis - KUR1502 project analyzed with MaxQuant and edgeR
-
Multiple_TMT_MQ - Example of IRS method using MaxQuant analysis of the developing mouse lens data
- Multiple-TMT with MQ - Three TMT plexes combined with IRS using MaxQuant results
-
Dilution_series - Comparison of PSM, peptide, and protein level data for a dilution series of TMT-labeled mouse brain membrane proteins
- Dilution series notebook - Dilution series of mouse brain TMT-labeled digest
-
TMT_analysis_examples - Descriptions of various TMT analyses
- Another repository "switchyard"
-
Yeast_CarbonSources - Gygi Lab TMT data from yeast grown with different sugars
- Comet/PAW pipeline - Re-analysis of data using Comet/PAW pipeline
- MaxQuant - Re-analysis of same data with MaxQuant
-
Yeast_triple_KO_TMT - Gygi Lab yeast triple knockout (TKO) data
- Yeast TKO - Comparison of platforms and methods for understanding TMT interference
-
PXD001077_P-furiosus - Work in progress...
-
Metaplastic-BC_PXD014414 - Reanalysis of a 3-plex, 27-sample cancer study demonstrating IRS
- PXD014414_comparisons_major - Notebook for comparisons of normal, triple negative, and metaplastic samples
- PXD014414_comparisons_subtypes - Notebook for comparisons of metaplastic subtypes
- Some other statistical testing methods:
- PXD014414_comparisons_major_edgeR-glm - Uses the glm modeling in edgeR
- PXD014414_comparisons_major_TTEST - Uses a 2-sample t-test
- PXD014414_comparisons_major_limma - Uses limma on log2 intensities
- PXD014414_comparisons_major_voom-limma - Uses voom for variance estimate and limma
-
BCP-ALL_QE-TMT_Nat-Comm-2019 - A 3-plex, 27-sample MS2-based TMT study of cancer data
- Nat-Comm-2019_TMT_QE_pools - Uses single pooled channel for IRS method
- Nat-Comm-2019_TMT_QE_averages - Slightly better IRS results using plex averages
-
PXD013277_E-coli_spike-ins_MS2-TMT - E. coli proteome spiked into a human background
- PXD013277_comparisons_human - Notebook looking at unchanged human background
- PXD013277_comparisons_no-norm - Notebook with manually matched human protein levels between spike-in channels
- PXD013277_comparisons - Notebook with generic edgeR workup (TMM normalization and exact testing)
-
SPS-MS3_vs_MS2_TMT - Comparison of MS2 TMT and SPS MS3 TMT for same MORG-75 data
- MORG-75_combined - Unique use of IRS to combine TMT data between two Orbitrap platforms acquired with different methods
- MORG-75_Fusion - Analysis of the SPS MS3 data
- MORG-75_QE - Analysis of the Q Exactive MS2 TMT data
-
JPR-201712_MS2-MS3 - A comparison of MS2 and SPS MS3 data for an E. coli background (analysis of the unchanged background)
- First analysis - Original notebook of how similar the E. coli background is at PSM, peptide, and protein levels
- Second analysis - Newer notebook with a direct comparison of MS2 and SPS-MS3 data
- JPR-2017_serum - Notebook looking at depleted serum samples
-
PXD004163_Notebooks - More MS2 data to illustrate using notebooks
- PXD004163_comparisons_3-10 - Re-analysis with Comet/PAW for the IPG 3-10 range data (72-fractions)
- PXD004163_comparisons_3.7-4.9 - Re-analysis with Comet/PAW for the IPG 3.7-4.9 range data (72-fractions)
- PXD004163_comparisons_Mascot - Notebook loading in the Mascot Quantitative Summary data file
-
Smith_SpC_2018 - Quantitative analysis of large SpC dataset
- Smith_2018_edgeR - Notebook with spectral counting data in a paired study design using edgeR
-
Sea_lion_urine_SpC - Analysis of sea lion urine samples
- PXD009019_average_missing - Notebook to determine the low SpC cutoff value
- PXD009019_QC_check - Notebook with some quality control steps and outlier checking
- PXD009019_SpC_DE - Notebook with the SpC differential expression analysis
-
ABRF_iPRG_2015_SpC - Analysis of ABRF iPRG data from 2015 study
- ABRF_2015_edgeR - Spectral counting data analyzed with edgeR
-
ABRF_2020 - How to use GitHub to support scientific research and education
-
ASMS-2013_search-engine-comparison - How to compare database search engines
-
ASMS_2018 - The Internal Reference Scaling (IRS) method
-
Cascadia_2013 - Extended parsimony protein grouping
-
Cascadia_2018 - A tour of Jupyter notebooks
-
RECOMB-CP_2012 - How to analyze shotgun proteomics data
-
OHSU-Proteomics - Tutorials, etc. from the OHSU Proteomics core
-
TMT Publications - List of TMT papers (mostly OHSU) that use methods developed here
-
Cloud and desktop computing workflow timing - Some workflow timing data in support of this paper.