LFQ MBR FDR algorithm needed. #303

ypriverol · 2023-10-13T08:53:06Z

Description of the Feature

During the benchmark of quantms using LFQ and MBR (issues #300, #301, #287) we developed a new probabilistic, SVM-based algorithm that controls the number of false positives better than the previous ProteomicsLFQ algorithm (which was based on the number of samples in which the feature is found).

However, although the current algorithm produces more reliable results (issues #301, #287), we should aim for a better FDR control algorithm in ProteomicsLFQ that uses only one parameter. In addition, it would be great to improve the algorithm and the feature detection. From my point of view, these are the priorities for that algorithm:

Implement an FDR-based approach for MBR, reducing the number of parameters.
Improve the feature detection, including the possibility of transferring features across any msrun in the experiment. I think OpenMS only transfers features across samples in the same condition, whereas MQ uses all msruns in the experiment, which may be the source of the differences between the tools.
Implement MBR for TMT datasets, similar to the following manuscript: https://pubs.acs.org/doi/10.1021/acs.jproteome.0c00209

We can discuss the details @timosachsenberg @jpfeuffer @daichengxin.

Command used and terminal output

No response

Relevant files

No response

System information

No response

timosachsenberg · 2023-10-13T09:52:41Z

I think it should transfer between all files of the same fraction number already.
I think our settings are a bit conservative to not inflate the transfer FDR too much. A more data driven approach would be great here.

E.g., I could imagine that we could

determine the most similar runs (e.g., as in MapAlignerTreeGuided)
train a classifier on identified target and decoy (mass offset) features to model correct transfer and wrong transfer (to the offset feature)
use the classifier in FeatureLinkerUnlabeledQT to annotate linking p-values
figure out a way to filter those to attain a global transfer FDR
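
The last step of the list above could look roughly like the following. This is only a minimal sketch of standard target-decoy q-value filtering, not an existing OpenMS function: it assumes each transferred feature already carries a classifier score (higher is better) and a flag saying whether it was transferred to a decoy (mass offset) feature. The names `transfer_q_values` and `filter_transfers` and the 5% default are hypothetical.

```python
from typing import List, Tuple

# Each entry: (classifier score, is_decoy). Higher score = more confident transfer.
ScoredTransfer = Tuple[float, bool]


def transfer_q_values(scored: List[ScoredTransfer]) -> List[float]:
    """Return one q-value per transfer, in input order (target-decoy estimate)."""
    order = sorted(range(len(scored)), key=lambda i: scored[i][0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for i in order:
        if scored[i][1]:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among targets accepted at this score threshold.
        fdrs.append(decoys / max(targets, 1))
    # q-value = smallest FDR at this threshold or any more permissive one.
    q = [0.0] * len(scored)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q[order[rank]] = running_min
    return q


def filter_transfers(scored: List[ScoredTransfer], max_fdr: float = 0.05) -> List[int]:
    """Indices of target transfers that pass the global transfer FDR cutoff."""
    q = transfer_q_values(scored)
    return [i for i, (_, is_decoy) in enumerate(scored)
            if not is_decoy and q[i] <= max_fdr]
```

With something like this, the single user-facing parameter would be `max_fdr`, which matches the goal of reducing the number of MBR parameters.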

@jpfeuffer and @cbielow what do you think?

More scalable alternatives would be approaches like IonQuant or Sage.

ypriverol · 2023-10-13T10:39:29Z

> I think it should transfer between all files of the same fraction number already. I think our settings are a bit conservative to not inflate the transfer FDR too much. A more data driven approach would be great here.

I'm probably wrong, but MQ does not care much about fraction identifiers; it also transfers across fractions. My guess is based on the assumption that MQ does not know which raw file belongs to which fraction.

cbielow · 2023-10-13T13:11:14Z

> I'm probably wrong, but MQ does not care much about fraction identifiers; it also transfers across fractions. My guess is based on the assumption that MQ does not know which raw file belongs to which fraction.

Actually, MQ only transfers IDs across fractions that are at most 1 fraction apart. Hence you also have to tell MQ the fraction number in the experimental design.
Of course, if you simply "forget" to annotate fractions in MQ, it will transfer whatever it can across all runs (and incur a massive false positive rate...)
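
To make the rule concrete, here is a small sketch of which run pairs would be eligible for transfer under the behaviour described above. It is not MaxQuant code. The function name `transfer_pairs` is hypothetical, and treating a run without a fraction number (`None`) as compatible with every other run is an assumption that mirrors the "forgot to annotate" case.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def transfer_pairs(fractions: Dict[str, Optional[int]],
                   max_distance: int = 1) -> List[Tuple[str, str]]:
    """List run pairs between which IDs may be transferred.

    fractions maps run name -> fraction number, or None if not annotated.
    """
    pairs = []
    for a, b in combinations(sorted(fractions), 2):
        fa, fb = fractions[a], fractions[b]
        # Assumption: an unannotated run places no restriction on transfer.
        if fa is None or fb is None or abs(fa - fb) <= max_distance:
            pairs.append((a, b))
    return pairs
```

For three runs annotated as fractions 1, 2 and 3, only the neighbouring pairs are kept; with all fractions left unannotated, every pair is returned, which is where the extra false positives would come from.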

ypriverol · 2023-10-13T13:18:30Z

This is interesting, @cbielow. Nice discussion. I have seen a lot of experiments that do not provide fraction information. Does the FDR algorithm of MQ correct for that, or will the FDR be inflated? If the latter, do you have a paper reference or some data showing it?

cbielow · 2023-10-13T13:50:53Z

I only have very old data (and I would need to dig a lot to find it) and anecdotal evidence.

There is https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7346880/ which does not discuss fractions but shows that the MQ FDR is not kept at bay unless you enable the MQ LFQ algorithm.

There is also a discussion on the MQ mailing list on this: https://groups.google.com/g/maxquant-list/c/a9bZMUeSE7Y/m/J6Rw174oCAAJ
Even in newer MQ versions, the XML config still has <matchBetweenRunsFdr>False</matchBetweenRunsFdr> by default, with no way of enabling it in the GUI, and it's hard to find any documentation on the topic. So it seems MQ is not very confident about this and disables it.

There is also https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8131922/ which describes an FDR method that, IMHO, could be improved by augmenting it with more data. The paper also uses MQ 1.6, which is rather old.

jpfeuffer · 2023-10-13T16:39:57Z

Good ideas. The problem with the last approach is that it is very costly with our current data structures.
I think we would need a binned and indexed representation of an experiment to make this viable (see FlashLFQ or Sage).

And I think we might need to dissect the FFID API to be able to extract single features on demand. Currently it is very focused on processing a full set of predefined IDs.
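
As a rough illustration of what a binned and indexed representation could mean here, the sketch below stores the peaks of one run in fixed-width m/z bins, sorted by retention time, so that a single m/z and RT window can be looked up on demand. It is not how OpenMS, FlashLFQ or Sage implement it; the class name `BinnedPeakIndex`, the 0.01 bin width and the tuple layout are all assumptions.

```python
import bisect
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Peak = Tuple[float, float, float]  # (rt in seconds, m/z, intensity)


class BinnedPeakIndex:
    """Peaks of one run, grouped into fixed-width m/z bins and sorted by RT."""

    def __init__(self, peaks: Iterable[Peak], bin_width: float = 0.01) -> None:
        self.bin_width = bin_width
        self.bins: Dict[int, List[Peak]] = defaultdict(list)
        for peak in peaks:
            self.bins[math.floor(peak[1] / bin_width)].append(peak)
        for bucket in self.bins.values():
            bucket.sort()  # tuples sort by RT first

    def query(self, mz: float, tol: float, rt_min: float, rt_max: float) -> List[Peak]:
        """Return peaks with |m/z - mz| <= tol and rt_min <= RT <= rt_max."""
        lo = math.floor((mz - tol) / self.bin_width)
        hi = math.floor((mz + tol) / self.bin_width)
        hits: List[Peak] = []
        for b in range(lo, hi + 1):
            bucket = self.bins.get(b, [])
            # Binary search for the first peak with RT >= rt_min.
            start = bisect.bisect_left(bucket, (rt_min,))
            for peak in bucket[start:]:
                if peak[0] > rt_max:
                    break
                if abs(peak[1] - mz) <= tol:
                    hits.append(peak)
        return hits
```

A query only touches the few bins that overlap the m/z window, which is the property that would make extracting single features on demand cheap compared with scanning the whole experiment.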

jpfeuffer · 2023-10-13T17:00:18Z

Not saying it can't be done. @timosachsenberg and I were just thinking about potentially faster or easier-to-implement ways.

timosachsenberg · 2023-10-14T09:09:43Z

Btw, it is interesting that the LFQ algorithm (I did not look into the details, but I think it is MaxLFQ) seems to correct for some wrong linking. It can probably be seen as a robust summarization method.
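
A toy sketch of why a median-based summarization can absorb a wrong link: the run-to-run ratio is taken as the median of per-peptide log ratios, so one mislinked peptide does not move it, whereas a ratio of summed intensities is dominated by the outlier. This only illustrates the robustness idea and is not the MaxLFQ implementation; the function names and the example intensities are made up.

```python
from math import log2
from statistics import median
from typing import Dict


def median_log2_ratio(run_a: Dict[str, float], run_b: Dict[str, float]) -> float:
    """Median of per-peptide log2(B/A) over peptides quantified in both runs."""
    shared = [p for p in run_a if p in run_b and run_a[p] > 0 and run_b[p] > 0]
    if not shared:
        raise ValueError("no shared peptides with positive intensity")
    return median(log2(run_b[p] / run_a[p]) for p in shared)


def summed_log2_ratio(run_a: Dict[str, float], run_b: Dict[str, float]) -> float:
    """log2 of the ratio of summed intensities over shared peptides."""
    shared = [p for p in run_a if p in run_b]
    return log2(sum(run_b[p] for p in shared) / sum(run_a[p] for p in shared))


# Hypothetical protein with three peptides; PEP3 is wrongly linked in run B.
run_a = {"PEP1": 100.0, "PEP2": 200.0, "PEP3": 400.0}
run_b = {"PEP1": 200.0, "PEP2": 400.0, "PEP3": 80000.0}
robust = median_log2_ratio(run_a, run_b)
naive = summed_log2_ratio(run_a, run_b)
```

Here `robust` stays at the twofold change supported by the two correctly linked peptides, while `naive` is pulled far above it by the single mislinked one.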

ypriverol added the bug label Oct 13, 2023

ypriverol assigned timosachsenberg Oct 13, 2023

ypriverol added the enhancement, help wanted, high-priority, and release 1.3 labels and removed the bug label Oct 13, 2023

This was referenced Oct 13, 2023

proteomicsLFQ with new SVM results in UPS1 dataset #301

Closed

Single cell analysis performance with worse results #287

Closed
