Support for Phosphoproteomics datasets. #334

Open
10 tasks
ypriverol opened this issue Jan 9, 2024 · 0 comments
Assignees
Labels
documentation Improvements or additions to documentation enhancement New feature or request high-priority

quantms 1.2 phosphoproteomics

In version 1.2 of quantms, phosphoproteomics experiments can be analyzed using the Luciphor2 tool. The analysis works as follows:

  • Every search engine reports the phosphosites it identified.
  • Luciphor2 is used to assess phosphosite localization by computing the false localization rate (FLR); filters are then applied at 1% FDR at the msrun level.

The major issue we have found during the first iterations of using the tool:

Luciphor2 splits the PSMs into groups by charge state and, to build its model, needs at least 50 PSMs per group (this parameter is configurable, but the recommended minimum is 50).

  • Note: This is a major issue for groups such as charge states 4 and 5, where the algorithm may get only a small number of PSMs and the model fails to produce the statistics for each phosphosite.
  • Even if we lower the minimum number of PSMs, we will need to test the impact on the results when a lower number is used.
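
The check described above can be sketched as follows. This is only an illustration of the counting problem, not Luciphor2's own code: the function name, the input format (a plain list of precursor charges for one msrun) and the example numbers are assumptions; only the threshold of 50 comes from the text above.

```python
from collections import Counter

MIN_PSMS = 50  # recommended minimum number of PSMs per charge state (from the text above)


def underpopulated_charge_states(psm_charges, min_psms=MIN_PSMS):
    """Return {charge: count} for charge states with fewer than min_psms PSMs.

    psm_charges is a hypothetical input: one precursor charge per PSM of a run.
    """
    counts = Counter(psm_charges)
    return {z: n for z, n in sorted(counts.items()) if n < min_psms}


# Hypothetical single msrun: many 2+/3+ PSMs, few 4+/5+ PSMs.
charges = [2] * 400 + [3] * 250 + [4] * 30 + [5] * 4
print(underpopulated_charge_states(charges))  # {4: 30, 5: 4}
```

Running a check like this per msrun before the Luciphor2 step would tell us which runs are likely to fail to build a model for the higher charge states.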

Because of this issue, we may need to explore how to group multiple runs into one in order to increase the number of PSMs per charge state. We have two options here:

  • Run Luciphor2 at the level of the experiment, grouping the msruns from all samples together. This is the most accurate approach statistically because the FLRs are computed at the level of the entire experiment. However, it is the most computationally expensive, and for large CPTAC datasets it is probably not feasible with Luciphor2, which may not be able to handle more than 100 RAW files and millions of PSMs. Luciphor2 has been tested in quantms only with fewer than 10 RAW files.
  • Group the msruns by sample, so that only RAW files from the same sample are grouped. This may be difficult in TMT experiments, but in LFQ it may be a middle ground between single msruns and the entire experiment.
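
To compare the grouping levels before changing the workflow, we could count PSMs per charge state under each one. A minimal sketch, assuming PSMs are available as records with hypothetical `msrun`, `sample` and `charge` fields (these names and the example counts are not taken from quantms):

```python
from collections import defaultdict


def psm_counts_by_group(psms, key):
    """Count PSMs per (group, charge), where key maps a PSM record to its group."""
    counts = defaultdict(int)
    for psm in psms:
        counts[(key(psm), psm["charge"])] += 1
    return dict(counts)


# Hypothetical charge-4 PSMs from three msruns belonging to two samples.
psms = (
    [{"msrun": "run1", "sample": "S1", "charge": 4}] * 20
    + [{"msrun": "run2", "sample": "S1", "charge": 4}] * 35
    + [{"msrun": "run3", "sample": "S2", "charge": 4}] * 25
)

per_run = psm_counts_by_group(psms, key=lambda p: p["msrun"])
per_sample = psm_counts_by_group(psms, key=lambda p: p["sample"])
per_experiment = psm_counts_by_group(psms, key=lambda p: "experiment")

print(per_run)         # no run reaches 50 charge-4 PSMs
print(per_sample)      # S1 reaches 55, S2 stays at 25
print(per_experiment)  # all 80 PSMs in one group
```

In this made-up example, grouping by sample rescues one sample but not the other, which is the kind of trade-off we would need to measure on real datasets.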

Alternative approaches:

We should explore the alanine decoy approach recently published in JPR. By adding decoy phosphosites, we may be able to construct a target-decoy approach (TDA) with any of the localization scoring systems available in OpenMS, such as AScore or PhosphoRS.

  • Review the phospho scoring methods in OpenMS and benchmark them.
  • Enable alanine as a possible phosphosite. This requires a definition in the OpenMS modification database.
  • Develop the TDA approach in OpenMS.
  • Benchmark the approach against the Luciphor2 strategy that already exists in quantms 1.2.
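
As a starting point for the TDA item, the FLR could be estimated from decoy site assignments along a score-sorted list. This is a simplified sketch under our own assumptions, not the estimator from the published method: the `scale` factor (a hypothetical correction for the different numbers of candidate target and decoy residues), the input format and the example scores are all illustrative.

```python
def decoy_flr_curve(sites, scale=1.0):
    """Estimate the FLR at each score threshold from decoy localizations.

    sites: iterable of (score, is_decoy) pairs, one per localized site;
           is_decoy is True when the site was placed on a decoy residue.
    scale: hypothetical correction factor (assumption, default 1.0).
    Returns a list of (score, flr) sorted by descending score.
    """
    targets = decoys = 0
    curve = []
    for score, is_decoy in sorted(sites, key=lambda s: s[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Ratio of decoy to target sites accepted so far, capped at 1.0.
        flr = scale * decoys / targets if targets else 1.0
        curve.append((score, min(flr, 1.0)))
    return curve


# Hypothetical localization scores; True marks a decoy (e.g. alanine) site.
sites = [(0.99, False), (0.95, False), (0.90, True), (0.80, False), (0.60, True)]
for score, flr in decoy_flr_curve(sites):
    print(f"{score:.2f}\t{flr:.3f}")
```

The same curve could be computed for each scoring system (AScore, PhosphoRS) to compare them at a fixed FLR threshold.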

In addition to developing a new TDA approach based on AScore or PhosphoRS, we could adopt the PTMProphet algorithm, which would provide the framework to compute the probabilities plus a model for the FLR. The disadvantage is that we may need to work with the PTMProphet team to make it a standalone tool that can be included in bioconda/biocontainers; we may also need to write the adapters for OpenMS.

Benchmark and results

We have a large collection of well-annotated CPTAC phosphoproteomics datasets that can be used to perform a reanalysis and generate a phospho map by tumor and cancer type. The focus of the benchmark will be purely technical:

  • Speed and performance: the final strategy in quantms 1.X must be faster and more scalable than the current Luciphor2 approach.
  • Results must be accurate and comparable with the Luciphor2 approach and with other tools such as MaxQuant (MQ).
@ypriverol ypriverol added documentation Improvements or additions to documentation enhancement New feature or request high-priority labels Jan 9, 2024
@ypriverol ypriverol self-assigned this Jan 9, 2024