SAGE error in Big dataset TMT #329

Open
ypriverol opened this issue Dec 9, 2023 · 11 comments

@ypriverol
Member

Description of the bug

ERROR ~ Error executing process > 'NFCORE_QUANTMS:QUANTMS:TMT:ID:PSMRESCORING:PERCOLATOR (g00594_Prot_37_11)'

Caused by:
  Process `NFCORE_QUANTMS:QUANTMS:TMT:ID:PSMRESCORING:PERCOLATOR (g00594_Prot_37_11)` terminated with an error exit status (9)

Command executed:

  OMP_NUM_THREADS=48 PercolatorAdapter \
      -in g00594_Prot_37_11_sage.idXML \
      -out g00594_Prot_37_11_sage_perc.idXML \
      -threads 48 \
      -subset_max_train 300000 \
      -decoy_pattern DECOY_ \
      -post_processing_tdc \
      -score_type pep \
      -debug 0 \
      2>&1 | tee g00594_Prot_37_11_sage_percolator.log
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:TMT:ID:PSMRESCORING:PERCOLATOR":
      PercolatorAdapter: $(PercolatorAdapter 2>&1 | grep -E '^Version(.*)' | sed 's/Version: //g' | cut -d ' ' -f 1)
      percolator: $(percolator -h 2>&1 | grep -E '^Percolator version(.*)' | sed 's/Percolator version //g')
  END_VERSIONS

Command exit status:
  9

Command output:
  Loading input file: g00594_Prot_37_11_sage.idXML
  Merging peptide ids.
  Merging protein ids.
  Prepared percolator input.
  Standard output: Running: /usr/local/bin/percolator -U -m /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_target_pout_psms.tab -M /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_decoy_pout_psms.tab --num-threads 48 -N 300000 -Y /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_pin.tab
  
  Standard error: Percolator version 3.05.0, Build Date Aug 31 2020 19:03:04
  Copyright (c) 2006-9 University of Washington. All rights reserved.
  Written by Lukas Käll ([email protected]) in the
  Department of Genome Sciences at the University of Washington.
  Issued command:
  /usr/local/bin/percolator -U -m /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_target_pout_psms.tab -M /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_decoy_pout_psms.tab --num-threads 48 -N 300000 -Y /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_pin.tab
  Started Sat Dec  9 15:05:43 2023
  Hyperparameters: selectionFdr=0.01, Cpos=0, Cneg=0, maxNiter=10
  Reading tab-delimited input from datafile /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_pin.tab
  Features:
  mass peplen charge2 charge3 charge4 charge5 enzN enzC enzInt dm absdm score SAGE:ln(-poisson) SAGE:ln(delta_best) SAGE:ln(delta_next) SAGE:ln(matched_intensity_pct) SAGE:longest_b SAGE:longest_y SAGE:longest_y_pct SAGE:matched_peaks SAGE:scored_candidates 
  Found 33916 PSMs
  Concatenated search input detected and --post-processing-tdc flag set. Applying target-decoy competition on Percolator scores.
  Train/test set contains 16926 positives and 16990 negatives, size ratio=0.996233 and pi0=1
  Selecting Cpos by cross-validation.
  Selecting Cneg by cross-validation.
  Split 1:	Exception caught: Error in the input data: cannot find an initial direction with positive training examples. Consider setting/raising the initial training FDR threshold (--train-initial-fdr).
  Terminating.
  
  Process '/usr/local/bin/percolator' did not finish successfully (exit code: ). Please check the log.
  
  PercolatorAdapter took 2.54 s (wall), 2.28 s (CPU), 0.08 s (system), 2.20 s (user); Peak Memory Usage: 168 MB.

Command wrapper:
  Loading input file: g00594_Prot_37_11_sage.idXML
  Merging peptide ids.
  Merging protein ids.
  Prepared percolator input.
  Standard output: Running: /usr/local/bin/percolator -U -m /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_target_pout_psms.tab -M /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_decoy_pout_psms.tab --num-threads 48 -N 300000 -Y /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_pin.tab
  
  Standard error: Percolator version 3.05.0, Build Date Aug 31 2020 19:03:04
  Copyright (c) 2006-9 University of Washington. All rights reserved.
  Written by Lukas Käll ([email protected]) in the
  Department of Genome Sciences at the University of Washington.
  Issued command:
  /usr/local/bin/percolator -U -m /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_target_pout_psms.tab -M /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_decoy_pout_psms.tab --num-threads 48 -N 300000 -Y /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_pin.tab
  Started Sat Dec  9 15:05:43 2023
  Hyperparameters: selectionFdr=0.01, Cpos=0, Cneg=0, maxNiter=10
  Reading tab-delimited input from datafile /tmp/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_1/20231209_150541_hl-codon-bm-10.ebi.ac.uk_42_2_pin.tab
  Features:
  mass peplen charge2 charge3 charge4 charge5 enzN enzC enzInt dm absdm score SAGE:ln(-poisson) SAGE:ln(delta_best) SAGE:ln(delta_next) SAGE:ln(matched_intensity_pct) SAGE:longest_b SAGE:longest_y SAGE:longest_y_pct SAGE:matched_peaks SAGE:scored_candidates 
  Found 33916 PSMs
  Concatenated search input detected and --post-processing-tdc flag set. Applying target-decoy competition on Percolator scores.
  Train/test set contains 16926 positives and 16990 negatives, size ratio=0.996233 and pi0=1
  Selecting Cpos by cross-validation.
  Selecting Cneg by cross-validation.
  Split 1:	Exception caught: Error in the input data: cannot find an initial direction with positive training examples. Consider setting/raising the initial training FDR threshold (--train-initial-fdr).
  Terminating.
  
  Process '/usr/local/bin/percolator' did not finish successfully (exit code: ). Please check the log.
  
  PercolatorAdapter took 2.54 s (wall), 2.28 s (CPU), 0.08 s (system), 2.20 s (user); Peak Memory Usage: 168 MB.

Work dir:
  /hps/nobackup/juan/pride/reanalysis/absolute-expression/cell-lines/MSV000085836/work/9a/8f8f857d333f8a6b2224af6bac7059

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Command used and terminal output

No response

Relevant files

No response

System information

No response

@ypriverol ypriverol added the bug Something isn't working label Dec 9, 2023
@jpfeuffer
Collaborator

That's mostly a Percolator error though.
Is it Sage-only or multiple engines with ConsensusID?
Did you try the fix suggested in the error?
As a last resort I would plot the score distributions and check what is wrong.
Lastly, someone could implement using the q-values directly from Sage. It also has an LDA for rescoring.
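
For the score-distribution check, a minimal sketch in Python, assuming the pin file that PercolatorAdapter writes to its temp directory is still available. It relies only on the standard Percolator pin layout (tab-delimited, a `Label` column with 1 for targets and -1 for decoys). The function name and the demo rows are hypothetical. It compares the target and decoy medians per feature; if they are nearly identical for every feature, that would fit the `pi0=1` in the log above.

```python
import csv
import io
import statistics

# Columns of a pin file that are metadata rather than features.
NON_FEATURES = ("SpecId", "Label", "ScanNr", "Peptide", "Proteins")


def summarize_pin_features(handle):
    """Return {feature: (target_median, decoy_median)} for a pin table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    label_idx = header.index("Label")
    feature_idx = {n: i for i, n in enumerate(header) if n not in NON_FEATURES}
    values = {n: {1: [], -1: []} for n in feature_idx}
    for row in reader:
        if not row:
            continue
        try:
            label = int(row[label_idx])
        except ValueError:
            # Skips an optional "DefaultDirection" row, which has no numeric label.
            continue
        if label not in (1, -1):
            continue
        for name, i in feature_idx.items():
            values[name][label].append(float(row[i]))
    return {
        name: (statistics.median(v[1]), statistics.median(v[-1]))
        for name, v in values.items()
        if v[1] and v[-1]
    }


# Hypothetical demo rows, not taken from the failing file.
demo = io.StringIO(
    "SpecId\tLabel\tScanNr\tscore\tPeptide\tProteins\n"
    "t1\t1\t1\t30.0\tK.AAA.R\tP1\n"
    "t2\t1\t2\t10.0\tK.BBB.R\tP2\n"
    "t3\t1\t3\t20.0\tK.CCC.R\tP3\tP4\n"
    "d1\t-1\t4\t9.0\tK.DDD.R\tDECOY_P1\n"
    "d2\t-1\t5\t11.0\tK.EEE.R\tDECOY_P2\n"
    "d3\t-1\t6\t10.0\tK.FFF.R\tDECOY_P3\n"
)
summary = summarize_pin_features(demo)
for feature, (target_med, decoy_med) in summary.items():
    print(f"{feature}: target median={target_med}, decoy median={decoy_med}")
```

On the real file, replace the demo handle with `open(path_to_pin, newline="")`.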

@jpfeuffer
Collaborator

I know that @timosachsenberg sees this error a lot in cross-linking experiments.

@ypriverol
Member Author

> That's mostly a Percolator error though. Is it Sage-only or multiple engines with ConsensusID? Did you try the fix suggested in the error?

I'm running the experiment now without SAGE. It is a run with multiple search engines.

> As a last resort I would plot the score distributions and check what is wrong. Lastly, someone could implement using the q-values directly from Sage. It also has an LDA for rescoring.

@jpfeuffer
Collaborator

Or run Sage only, to see whether Sage itself or the combination is the cause.

@ypriverol
Member Author

The interesting thing is that the issue is within Sage, not with the other search engines.

@jpfeuffer
Collaborator

It could also be that another engine is so different from Sage that the combination does not work well (that's why I would try Sage alone). Also, combinations kind of defeat the speed advantage of Sage.

@jpfeuffer
Collaborator

Ah, I see what you mean now. Percolator fails on the Sage-only output, before ConsensusID. Yes, that is a problem.

@ypriverol
Member Author

Yes, it is inside the SAGE adapter. Really nice issue for @timosachsenberg's Christmas.

@jpfeuffer
Collaborator

Maybe it is, maybe not. We mostly just pass whatever was in the pin file.

I would love to skip all the idXML conversion back and forth, but we currently depend on it because a) we need the information about the search settings and b) we had problems with the scan_ids in the pin file (they were sometimes just a number, so a lookup in the mzML was necessary).

@timosachsenberg

> The interesting thing is that the issue is within Sage, not with the other search engines.

This might indicate that Sage is not finding enough true targets. Maybe some configuration is wrong? How many hits are left if the output of the other search engines is filtered by q-value? Many, or just a few hundred? (The latter could mean that they are a bit more sensitive and had just enough true targets to find a score and direction.)
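
To make the "how many hits are left" check concrete, here is a rough sketch of a plain target-decoy q-value count in Python. It is not Percolator's own estimator. The function name and the toy scores are hypothetical, and it assumes higher scores are better and labels are 1 (target) or -1 (decoy).

```python
def count_targets_at_qvalue(scores, labels, threshold=0.01):
    """Count targets whose target-decoy q-value is at or below the threshold."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    targets = decoys = 0
    fdrs = []
    for i in order:
        if labels[i] == 1:
            targets += 1
        else:
            decoys += 1
        # Simple FDR estimate at this rank: decoys / targets seen so far.
        fdrs.append(decoys / max(targets, 1))
    # q-value = lowest FDR at this rank or at any lower-scoring rank.
    running_min = float("inf")
    accepted = 0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        if labels[order[rank]] == 1 and running_min <= threshold:
            accepted += 1
    return accepted


# Toy data: three targets outscore the first decoy.
well_separated = count_targets_at_qvalue(
    [10, 9, 8, 7, 6, 5], [1, 1, 1, -1, 1, -1]
)
# Toy data: a decoy is the top hit, so nothing passes 1%.
no_separation = count_targets_at_qvalue([10, 9, 8], [-1, 1, 1])
print(well_separated, no_separation)
```

A count at or near zero for every Sage feature would fit the "cannot find an initial direction with positive training examples" message, while a count in the thousands for the other engines would point at the Sage configuration.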

@ypriverol
Copy link
Member Author

I need the other search engines to finish; I will let you know what happens for that particular file.
