Support for Phosphoproteomics datasets. #334

ypriverol · 2024-01-09T09:34:18Z

quantms 1.2 phosphoproteomics

In quantms 1.2, phosphoproteomics experiments can be analyzed using the Luciphor2 tool. The analysis proceeds as follows:

Each search engine reports the phosphosites it identifies.
Luciphor2 is used to assess phosphosite localization by computing the false localization rate (FLR), and a 1% FDR filter is applied at the MS-run level.
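Assuming the 1% FDR filter is the usual PSM-level target-decoy filter applied separately to each MS run, it can be illustrated as below. This is a generic sketch, not the quantms code: the `PSM` fields and the convention that a higher score is better are assumptions, and tied scores are not handled specially.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PSM:
    run: str        # MS run (RAW file) the PSM comes from
    score: float    # search-engine score; higher is better (assumption)
    is_decoy: bool  # True if the PSM matched a decoy sequence


def qvalues(psms):
    """Target-decoy q-values for the PSMs of a single run."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among all PSMs scoring at least this high
        fdrs.append(decoys / max(targets, 1))
    # q-value: the smallest FDR at this threshold or any lower one
    q, running_min = [0.0] * len(fdrs), float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        q[i] = running_min
    return list(zip(ranked, q))


def filter_per_run(psms, fdr=0.01):
    """Keep target PSMs with q-value <= fdr, computed independently per MS run."""
    by_run = defaultdict(list)
    for p in psms:
        by_run[p.run].append(p)
    kept = []
    for run_psms in by_run.values():
        kept.extend(p for p, q in qvalues(run_psms) if q <= fdr and not p.is_decoy)
    return kept
```

Computing q-values per run means a poor run cannot borrow confidence from a good one, which matches the per-MS-run filtering described above.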

The major issue we have found during the first iterations of using the tool:

Luciphor2 splits the PSMs into groups by charge state, and to build its model it needs at least 50 PSMs per group (this parameter is configurable, but the recommended minimum is 50).

Note: This causes a major issue for groups such as charge states 4 and 5, where the algorithm may get only a small number of PSMs and the model fails to produce statistics for each phosphosite.
Even if we reduce the minimum number of PSMs, we will need to test how using a lower value affects the results.
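To see which charge-state groups are affected before running Luciphor2, one could count PSMs per run and charge state. A minimal sketch, assuming PSMs are available as (run, charge) pairs; the function name `underpowered_charge_states` is hypothetical and not part of quantms or Luciphor2.

```python
from collections import Counter, defaultdict

# Recommended Luciphor2 minimum number of PSMs per charge state (configurable).
MIN_PSMS_PER_CHARGE = 50


def underpowered_charge_states(psms, min_psms=MIN_PSMS_PER_CHARGE):
    """Report charge states that fall below the PSM minimum in each MS run.

    psms: iterable of (run, charge) pairs, one per PSM passed to localization.
    Returns {run: {charge: count}} listing only the groups below min_psms.
    """
    counts = defaultdict(Counter)
    for run, charge in psms:
        counts[run][charge] += 1
    report = {}
    for run, by_charge in counts.items():
        low = {z: n for z, n in sorted(by_charge.items()) if n < min_psms}
        if low:
            report[run] = low
    return report
```

For a run with 400, 150, 30 and 4 PSMs at charges 2 to 5, this reports `{4: 30, 5: 4}` for that run: exactly the high-charge groups described in the note above.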

Because of this issue, we may need to explore how to group multiple runs into one to increase the number of PSMs per charge state. We have two options:

Run Luciphor2 at the experiment level, grouping all MS runs from all samples together. This is the most accurate approach statistically because the FLRs are computed over the entire experiment. However, it is also the most computationally expensive, and for large CPTAC datasets it is probably not feasible: Luciphor2 may not be able to handle more than 100 RAW files and millions of PSMs. So far, Luciphor2 has been tested in quantms only with fewer than 10 RAW files.
Group the MS runs by sample, so that only RAW files from the same sample are grouped together. This may be difficult in TMT experiments, but in LFQ it may be a middle ground between single MS runs and the entire experiment.
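The two grouping options differ only in how MS runs are pooled before PSMs are counted per charge state. A minimal sketch, assuming a hypothetical `run_to_sample` mapping (for example taken from the experimental design); it shows how each pooling level changes the smallest per-charge group size.

```python
from collections import Counter, defaultdict


def pool_counts(psms, run_to_sample, level):
    """Count PSMs per charge state for each pool of MS runs.

    psms: iterable of (run, charge) pairs.
    run_to_sample: hypothetical mapping from run name to sample name.
    level: "run", "sample" or "experiment".
    """
    pools = defaultdict(Counter)
    for run, charge in psms:
        if level == "run":
            key = run
        elif level == "sample":
            key = run_to_sample[run]
        elif level == "experiment":
            key = "all"
        else:
            raise ValueError(f"unknown pooling level: {level}")
        pools[key][charge] += 1
    return dict(pools)


def smallest_group(pools, charges):
    """Smallest PSM count across all pools for the given charge states."""
    # Counter returns 0 for charge states absent from a pool
    return min(pool[z] for pool in pools.values() for z in charges)
```

With four runs of 20 charge-4 PSMs each, split over two samples, the smallest group is 20 per run, 40 per sample and 80 for the whole experiment, so with a minimum of 50 only experiment-level pooling would be sufficient in that case.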

Alternative approaches:

We should explore the alanine decoy approach recently published in JPR. By adding decoy phosphosites, we may be able to construct a TDA (target-decoy approach) with any of the probability scoring systems available in OpenMS, such as AScore or PhosphoRS.
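In its simplest form, a decoy-based estimate of the FLR at a localization-score threshold is the ratio of decoy (pA) sites to target (pS/pT/pY) sites scoring at or above it. The sketch below assumes exactly that simple ratio, with each PSM reduced to its best-localized site and a higher score meaning more confident localization. The published method may apply additional corrections (for example for the relative frequency of alanine versus S/T/Y), which are not reproduced here, and tied scores are not handled specially.

```python
TARGET_RESIDUES = {"S", "T", "Y"}
DECOY_RESIDUE = "A"  # alanine phosphosites are treated as decoys (assumption)


def decoy_flr_curve(sites):
    """Estimate the FLR at each score threshold from decoy alanine sites.

    sites: iterable of (localization_score, residue), one best site per PSM.
    Returns a list of (score, estimated_flr), highest score first, where the
    FLR covers every site scoring at or above that score.
    """
    ranked = sorted(sites, key=lambda s: s[0], reverse=True)
    curve, targets, decoys = [], 0, 0
    for score, residue in ranked:
        if residue == DECOY_RESIDUE:
            decoys += 1
        elif residue in TARGET_RESIDUES:
            targets += 1
        else:
            raise ValueError(f"unexpected phosphosite residue: {residue}")
        curve.append((score, decoys / max(targets, 1)))
    return curve


def threshold_at_flr(curve, max_flr=0.01):
    """Lowest score whose accepted site set has an estimated FLR <= max_flr."""
    passing = [score for score, flr in curve if flr <= max_flr]
    return min(passing) if passing else None
```

Because the estimate only needs a ranking of sites by score, the same procedure could in principle sit on top of AScore, PhosphoRS or any other localization score, which is what makes the decoy approach attractive here.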

Review the phospho scoring methods in OpenMS and benchmark them.
Enable alanine as a possible phosphosite. This requires a definition in the OpenMS modification database.
Develop the TDA approach in OpenMS.
Benchmark the approach against Luciphor2 strategy already existing in quantms 1.2.

Additionally, to develop a new TDA approach based on AScore or PhosphoRS, we could adopt the PTMProphet algorithm, which would provide the framework to compute the probabilities plus a model for the FLR approach. The disadvantage is that we may need to work with the PTMProphet team to make it a standalone tool that can be included in bioconda/biocontainers, and we may also need to write the adapters for OpenMS.

Benchmark and results

We have a large collection of well-annotated CPTAC phosphoproteomics datasets that can be used to perform a reanalysis and generate a phospho map by tumor and cancer type. The focus of the benchmark will be purely technical:

Speed and performance: the final strategy in quantms 1.X must be faster and more scalable than the current Luciphor2 approach.
Accuracy: results must be accurate and comparable with the Luciphor2 approach and with other tools such as MaxQuant (MQ).
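For the accuracy comparison against Luciphor2 or MaxQuant, one simple starting point is the overlap between the sets of localized phosphosites reported by each tool. A minimal sketch, assuming sites have already been mapped to (protein accession, position) pairs; the Jaccard index is a summary metric chosen here for illustration, not one fixed by this plan.

```python
def site_overlap(sites_a, sites_b):
    """Compare two collections of localized phosphosites.

    Each site is a (protein_accession, position) pair. Returns the number of
    shared sites, the sites unique to each tool, and the Jaccard index.
    """
    a, b = set(sites_a), set(sites_b)
    shared = a & b
    union = a | b
    # Two empty site sets are treated as identical
    jaccard = len(shared) / len(union) if union else 1.0
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "jaccard": jaccard,
    }
```

Set overlap ignores localization scores and quantities, so it would complement, not replace, a comparison of site-level intensities between tools.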

ypriverol added the documentation, enhancement and high-priority labels on Jan 9, 2024.

ypriverol self-assigned this Jan 9, 2024
