quantms DIANN 1.9.1dev #380

Open
ypriverol opened this issue Jul 1, 2024 · 8 comments
@ypriverol
Member

ypriverol commented Jul 1, 2024

Description of the bug

@daichengxin, we have a new version of DIA-NN, 1.9.1dev.

Can we migrate from 1.8.1 to 1.9.1dev?

@ypriverol
Member Author

ypriverol commented Jul 2, 2024

A PR is in progress at #381. We now hit the following error on the same data we used in the tests with version 1.8.1:

ERROR ~ Error executing process > 'NFCORE_QUANTMS:QUANTMS:DIA:DIANNSUMMARY (PXD026600.sdrf)'


Caused by:
  Missing output file(s) `empirical_library.tsv.speclib` expected by process `NFCORE_QUANTMS:QUANTMS:DIA:DIANNSUMMARY (PXD026600.sdrf)`


Command executed:

  # Notes: if .quant files are passed, mzml/.d files are not accessed, so the name needs to be passed but files
  # do not need to be present.
  
  diann   --lib empirical_library.tsv \
          --fasta REF_EColi_K12_UPS1_combined.fasta \
          --f RD139_Narrow_UPS1_0_1fmol_inj1.mzML --f RD139_Narrow_UPS1_0_25fmol_inj1.mzML --f RD139_Narrow_UPS1_0_1fmol_inj2.mzML --f RD139_Narrow_UPS1_0_25fmol_inj2.mzML \
          --threads 2 \
          --verbose 3 \
          --individual-windows \
          --quick-mass-acc --individual-mass-acc \
          --temp ./quant/ \
          --relaxed-prot-inf \
          --pg-level 2 \
           \
          --use-quant \
          --matrices \
          --out diann_report.tsv \
          --qvalue 0.01 \
           \
          2>&1 | tee diannsummary.log
  
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:DIA:DIANNSUMMARY":
      DIA-NN: $(diann 2>&1 | grep "DIA-NN" | grep -oP "\d+\.\d+(\.\w+)*(\.[\d]+)?")
  END_VERSIONS

Command exit status:
  0

Command output:
  
  4 files will be processed
  [0:00] Loading spectral library empirical_library.tsv
  [0:00] Spectral library loaded: 1600 protein isoforms, 1600 protein groups and 6896 precursors in 5794 elution groups.
  [0:00] Loading protein annotations from FASTA REF_EColi_K12_UPS1_combined.fasta
  [0:00] Annotating library proteins with information from the FASTA database
  [0:00] Gene names missing for some isoforms
  [0:00] Library contains 1596 proteins, and 1596 genes
  [0:00] Initialising library
  [0:00] Saving the library to empirical_library.tsv.skyline.speclib
  [0:00] Cross-run analysis
  [0:00] Reading quantification information: 4 files
  [0:00] Averaged recommended settings for this experiment: Mass accuracy = 13ppm, MS1 accuracy = 7ppm, Scan window = 8
  [0:00] Quantifying peptides
  WARNING: QuantUMS requires 6 or more runs for the optimisation of its hyperparameters to perform best.
  [0:14] Quantification parameters: 0.253287, 0.00642625, 0.00220468, 0.0116664, 0.0142923, 0.0129503, 0.0751957, 0.0439114, 0.0311259, 0.0130772, 0.0480012, 0.0335494, 0.193477, 0.0500949, 0.0532672, 0.0109079
  [0:15] Assembling protein groups
  [0:15] Quantifying proteins
  [0:15] Calculating q-values for protein and gene groups
  [0:15] Calculating global q-values for protein and gene groups
  [0:15] Protein groups with global q-value <= 0.01: 1531
  [0:15] Compressed report saved to diann_report.parquet. Use R 'arrow' or Python 'PyArrow' package to process
  [0:15] Writing report
  [0:16] Report saved to diann_report.tsv.
  [0:16] Saving precursor levels matrix
  [0:16] Precursor levels matrix (1% precursor and protein group FDR) saved to diann_report.pr_matrix.tsv.
  [0:16] Saving protein group levels matrix
  [0:16] Protein group levels matrix (1% precursor FDR and protein group FDR) saved to diann_report.pg_matrix.tsv.
  [0:16] Saving gene group levels matrix
  [0:16] Gene groups levels matrix (1% precursor FDR and protein group FDR) saved to diann_report.gg_matrix.tsv.
  [0:16] Saving unique genes levels matrix
  [0:16] Unique genes levels matrix (1% precursor FDR and protein group FDR) saved to diann_report.unique_genes_matrix.tsv.
  [0:16] Stats report saved to diann_report.stats.tsv
  
  The following warnings or errors (in alphabetic order) were detected at least the indicated number of times:
  WARNING: QuantUMS requires 6 or more runs for the optimisation of its hyperparameters to perform best. : 1
  WARNING: combining reuse of .quant files with automatic optimisation of mass accuracies or scan window will lead to results that are different from those of the original analysis that produced the .quant files and is therefore not recommended : 1
  Finished
  
  
  How to cite:
  using DIA-NN: Demichev et al, Nature Methods, 2020, https://www.nature.com/articles/s41592-019-0638-x
  analysing Scanning SWATH: Messner et al, Nature Biotechnology, 2021, https://www.nature.com/articles/s41587-021-00860-4
  analysing PTMs: Steger et al, Nature Communications, 2021, https://www.nature.com/articles/s41467-021-25454-1
  analysing dia-PASEF: Demichev et al, Nature Communications, 2022, https://www.nature.com/articles/s41467-022-31492-0
  analysing Slice-PASEF: Szyrwiel et al, biorxiv, 2022, https://doi.org/10.1101/2022.10.31.514544
  plexDIA / multiplexed DIA: Derks et al, Nature Biotechnology, 2023, https://www.nature.com/articles/s41587-022-01389-w
  CysQuant: Huang et al, Redox Biology, 2023, https://doi.org/10.1016/j.redox.2023.102908
  using QuantUMS: Kistner at al, biorxiv, 2023, https://doi.org/10.1101/2023.06.20.545604
  [0:16] Log saved to diann_report.log.txt

Work dir:
  /hps/nobackup/juan/pride/reanalysis/quantms-ypriverol/quantms/work/8a/e46c899f20887370bd3b141d22edd9

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

@vdemichev, can you help us here? Any tips would be welcome.
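One detail visible in the log above: DIA-NN reports saving the library to `empirical_library.tsv.skyline.speclib`, while the process expects `empirical_library.tsv.speclib`. Whether that is the whole cause is not confirmed in this thread. A minimal Python sketch for spotting such a mismatch from a log is shown here; the helper names `saved_libraries` and `missing_outputs` are hypothetical and are not part of quantms or DIA-NN.

```python
import re

# Matches the log line DIA-NN prints when it writes a library,
# in the form seen in the log above.
SAVED_LIBRARY = re.compile(r"Saving the library to (\S+)")


def saved_libraries(log_text):
    """Return every library path that the log reports as saved."""
    return SAVED_LIBRARY.findall(log_text)


def missing_outputs(log_text, expected):
    """Return the expected file names that the log never reports as saved."""
    saved = set(saved_libraries(log_text))
    return [name for name in expected if name not in saved]


# Three lines copied from the log in this comment.
log = """\
[0:00] Initialising library
[0:00] Saving the library to empirical_library.tsv.skyline.speclib
[0:00] Cross-run analysis
"""

print(saved_libraries(log))
print(missing_outputs(log, ["empirical_library.tsv.speclib"]))
```

This only compares names found in the log text; it does not check the work directory itself.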

@ypriverol
Member Author

The previous issue has been solved. I now have another one:

ERROR ~ Error executing process > 'NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD026600.sdrf)'

Caused by:
  Process `NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD026600.sdrf)` terminated with an error exit status (1)


Command executed:

  diann_convert.py convert \
      --folder ./ \
      --exp_design PXD026600.sdrf_openms_design.tsv \
      --diann_version ./version/versions.yml \
      --dia_params "20.0;ppm;10.0;ppm;Trypsin;Carbamidomethyl (C);Oxidation (M)" \
      --charge 3 \
      --missed_cleavages 1 \
      --qvalue_threshold 0.01 \
      2>&1 | tee convert_report.log
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT":
      pyopenms: $(pip show pyopenms | grep "Version" | awk -F ': ' '{print $2}')
  END_VERSIONS

Command exit status:
  1

Command output:
  2024-07-02 11:48:03,797 [convert] -   Fraction_Group Fraction                      Spectra_Filepath Label Sample                              run
  0              1        1   RD139_Narrow_UPS1_0_1fmol_inj1.mzML     1      1   RD139_Narrow_UPS1_0_1fmol_inj1
  1              2        1   RD139_Narrow_UPS1_0_1fmol_inj2.mzML     1      1   RD139_Narrow_UPS1_0_1fmol_inj2
  2              3        1  RD139_Narrow_UPS1_0_25fmol_inj1.mzML     1      2  RD139_Narrow_UPS1_0_25fmol_inj1
  3              4        1  RD139_Narrow_UPS1_0_25fmol_inj2.mzML     1      2  RD139_Narrow_UPS1_0_25fmol_inj2
  2024-07-02 11:48:03,798 [convert] - 
  
  s_DataFrame ((2, 3))>>>
  2024-07-02 11:48:03,798 [convert] -   Sample                MSstats_Condition MSstats_BioReplicate
  0      1   CT=Mixture;CN=UPS1;QY=0.1 fmol                    1
  1      2  CT=Mixture;CN=UPS1;QY=0.25 fmol                    2
  2024-07-02 11:48:03,799 [convert] - Adding Fraction, BioReplicate, Condition columns
  2024-07-02 11:48:03,978 [convert] - MSstats input file is saved as PXD026600.sdrf_openms_design_msstats_in.csv
  2024-07-02 11:48:04,130 [convert] - Triqler input file is saved as PXD026600.sdrf_openms_design_triqler_in.tsv
  2024-07-02 11:48:04,131 [convert_to_mztab] - Converting to mzTab
  2024-07-02 11:48:04,131 [diann_version] - Validating DIANN version
  2024-07-02 11:48:04,133 [diann_version] - Found DIA-NN version:     DIA-NN: 1.9.beta.1
  
  2024-07-02 11:48:04,133 [diann_version] - Validating DIANN version
  2024-07-02 11:48:04,134 [diann_version] - Found DIA-NN version:     DIA-NN: 1.9.beta.1
  
  Warning: OPENMS_DATA_PATH environment variable not found and no share directory was installed. Some functionality might not work as expected.
  Traceback (most recent call last):
    File "/hps/nobackup/juan/pride/reanalysis/quantms-ypriverol/quantms/bin/diann_convert.py", line 1377, in <module>
      cli()
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
           ^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
      return f(get_current_context(), *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/hps/nobackup/juan/pride/reanalysis/quantms-ypriverol/quantms/bin/diann_convert.py", line 143, in convert
      diann_directory.convert_to_mztab(
    File "/hps/nobackup/juan/pride/reanalysis/quantms-ypriverol/quantms/bin/diann_convert.py", line 303, in convert_to_mztab
      self.validate_diann_version()
    File "/hps/nobackup/juan/pride/reanalysis/quantms-ypriverol/quantms/bin/diann_convert.py", line 297, in validate_diann_version
      raise ValueError(f"Unsupported DIANN version {self.diann_version}")
  ValueError: Unsupported DIANN version 1.9.beta.1

Command error:
  2024-07-02 11:48:03,797 [convert] -   Fraction_Group Fraction                      Spectra_Filepath Label Sample                              run
  0              1        1   RD139_Narrow_UPS1_0_1fmol_inj1.mzML     1      1   RD139_Narrow_UPS1_0_1fmol_inj1
  1              2        1   RD139_Narrow_UPS1_0_1fmol_inj2.mzML     1      1   RD139_Narrow_UPS1_0_1fmol_inj2
  2              3        1  RD139_Narrow_UPS1_0_25fmol_inj1.mzML     1      2  RD139_Narrow_UPS1_0_25fmol_inj1
  3              4        1  RD139_Narrow_UPS1_0_25fmol_inj2.mzML     1      2  RD139_Narrow_UPS1_0_25fmol_inj2
  2024-07-02 11:48:03,798 [convert] - 
  
  s_DataFrame ((2, 3))>>>
  2024-07-02 11:48:03,798 [convert] -   Sample                MSstats_Condition MSstats_BioReplicate
  0      1   CT=Mixture;CN=UPS1;QY=0.1 fmol                    1
  1      2  CT=Mixture;CN=UPS1;QY=0.25 fmol                    2
  2024-07-02 11:48:03,799 [convert] - Adding Fraction, BioReplicate, Condition columns
  2024-07-02 11:48:03,978 [convert] - MSstats input file is saved as PXD026600.sdrf_openms_design_msstats_in.csv
  2024-07-02 11:48:04,130 [convert] - Triqler input file is saved as PXD026600.sdrf_openms_design_triqler_in.tsv
  2024-07-02 11:48:04,131 [convert_to_mztab] - Converting to mzTab
  2024-07-02 11:48:04,131 [diann_version] - Validating DIANN version
  2024-07-02 11:48:04,133 [diann_version] - Found DIA-NN version:     DIA-NN: 1.9.beta.1
  
  2024-07-02 11:48:04,133 [diann_version] - Validating DIANN version
  2024-07-02 11:48:04,134 [diann_version] - Found DIA-NN version:     DIA-NN: 1.9.beta.1
  
  Warning: OPENMS_DATA_PATH environment variable not found and no share directory was installed. Some functionality might not work as expected.
  Traceback (most recent call last):
    File "/hps/nobackup/juan/pride/reanalysis/quantms-ypriverol/quantms/bin/diann_convert.py", line 1377, in <module>
      cli()
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
           ^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
      return f(get_current_context(), *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/hps/nobackup/juan/pride/reanalysis/quantms-ypriverol/quantms/bin/diann_convert.py", line 143, in convert
      diann_directory.convert_to_mztab(
    File "/hps/nobackup/juan/pride/reanalysis/quantms-ypriverol/quantms/bin/diann_convert.py", line 303, in convert_to_mztab
      self.validate_diann_version()
    File "/hps/nobackup/juan/pride/reanalysis/quantms-ypriverol/quantms/bin/diann_convert.py", line 297, in validate_diann_version
      raise ValueError(f"Unsupported DIANN version {self.diann_version}")
  ValueError: Unsupported DIANN version 1.9.beta.1

Work dir:
  /hps/nobackup/juan/pride/reanalysis/quantms-ypriverol/quantms/work/16/81256ab1dc8d045cde019a4f7bf712

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

@vdemichev

Commented in #381

@vdemichev

"ValueError: Unsupported DIANN version 1.9.beta.1" - seems the script does not like the DIA-NN version name.
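To illustrate the point about the version name, here is a rough Python sketch of a check that compares only the major and minor numbers, so that a string such as `1.9.beta.1` is not rejected because of its suffix. This is not the code in `diann_convert.py`, whose `validate_diann_version` body is not shown in this thread; the names `SUPPORTED_RELEASES`, `read_version`, `release_of`, and `is_supported` are assumptions made for the example.

```python
import re

# Hypothetical set of supported (major, minor) releases; the real list used
# by diann_convert.py is not shown in this thread.
SUPPORTED_RELEASES = {(1, 8), (1, 9)}

# Matches a "DIA-NN: <version>" entry like the one written to versions.yml.
VERSION_LINE = re.compile(r"DIA-NN:\s*(\S+)")


def read_version(versions_yml_text):
    """Extract the version string from a 'DIA-NN: <version>' line."""
    match = VERSION_LINE.search(versions_yml_text)
    if match is None:
        raise ValueError("No DIA-NN version found")
    return match.group(1)


def release_of(version):
    """Return (major, minor), ignoring suffixes such as 'beta.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Cannot parse DIA-NN version {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(version):
    """Check the release against the hypothetical supported set."""
    return release_of(version) in SUPPORTED_RELEASES


versions_yml = '"NFCORE_QUANTMS:QUANTMS:DIA:DIANNSUMMARY":\n    DIA-NN: 1.9.beta.1\n'
version = read_version(versions_yml)
print(version, release_of(version), is_supported(version))
```

Comparing on the numeric prefix is one design choice among several; an exact allow-list of version strings would be stricter but would need updating for every beta build.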

@ypriverol
Member Author

Yes, I'm working on that. @vdemichev, in the previous step we got the following warnings:

 
  The following warnings or errors (in alphabetic order) were detected at least the indicated number of times:
  WARNING: QuantUMS requires 6 or more runs for the optimisation of its hyperparameters to perform best. : 1
  WARNING: combining reuse of .quant files with automatic optimisation of mass accuracies or scan window will lead to results that are different from those of the original analysis that produced the .quant files and is therefore not recommended : 1
  Finished

Can you help us understand the algorithm and the commands, so we can see whether we need to make any changes?

@vdemichev

The first is fine.
The second is also fine because, I assume, you are just aggregating the .quant files to produce a spectral library here.
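For anyone who wants to track these warnings across runs, the summary lines quoted above follow a `WARNING: <message> : <count>` shape. A small Python sketch that collects them is given here, assuming that shape holds for other DIA-NN logs as well (only the two lines in this thread have been checked); `warning_counts` is a hypothetical helper name.

```python
import re

# Summary lines end with " : <count>"; the in-line warnings printed during
# the run have no trailing count and are therefore skipped.
SUMMARY_LINE = re.compile(r"^\s*WARNING: (?P<message>.+?) : (?P<count>\d+)\s*$")


def warning_counts(log_text):
    """Map each summarised warning message to its reported count."""
    counts = {}
    for line in log_text.splitlines():
        match = SUMMARY_LINE.match(line)
        if match:
            counts[match.group("message")] = int(match.group("count"))
    return counts


# Lines copied from the log quoted in this thread.
log = """\
  WARNING: QuantUMS requires 6 or more runs for the optimisation of its hyperparameters to perform best.
  The following warnings or errors (in alphabetic order) were detected at least the indicated number of times:
  WARNING: QuantUMS requires 6 or more runs for the optimisation of its hyperparameters to perform best. : 1
  WARNING: combining reuse of .quant files with automatic optimisation of mass accuracies or scan window will lead to results that are different from those of the original analysis that produced the .quant files and is therefore not recommended : 1
  Finished
"""

for message, count in warning_counts(log).items():
    print(count, message[:60])
```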

@ypriverol
Member Author

@vdemichev, another small issue:

The previous pg_matrix TSV had the following columns:

Protein.Group	Protein.Ids	Protein.Names	Genes	First.Protein.Description ... mzML files

Now:

Protein.Group	Protein.Names	Genes	First.Protein.Description ... mzML files

Is the Protein.Ids field no longer exported?

@vdemichev

Yes, this column did not make sense in pg_matrix, as the list of Protein.Ids can only be compiled for a peptide, not a protein group.
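Since the column is gone from the newer pg_matrix, downstream readers may want to avoid assuming a fixed header. A minimal Python sketch of a reader that accepts both layouts is shown here. It uses only the column names quoted in this thread; the function names, the `run1.mzML` column, and the row values are made-up placeholders, and this is not how quantms itself reads the file.

```python
import csv
import io

# Annotation columns named in this thread; every other column is assumed
# to be a per-run quantity column.
ANNOTATION_COLUMNS = [
    "Protein.Group",
    "Protein.Ids",
    "Protein.Names",
    "Genes",
    "First.Protein.Description",
]


def split_columns(header):
    """Split a header into the annotation columns present and the run columns."""
    annotation = [name for name in header if name in ANNOTATION_COLUMNS]
    runs = [name for name in header if name not in ANNOTATION_COLUMNS]
    return annotation, runs


def read_pg_matrix(handle):
    """Read a tab-separated pg_matrix without requiring Protein.Ids."""
    reader = csv.DictReader(handle, delimiter="\t")
    annotation, runs = split_columns(reader.fieldnames)
    return annotation, runs, list(reader)


# Placeholder content in the newer layout (no Protein.Ids column).
new_layout = io.StringIO(
    "Protein.Group\tProtein.Names\tGenes\tFirst.Protein.Description\trun1.mzML\n"
    "PG1\tNAME1\tGENE1\tplaceholder description\t1000.0\n"
)

annotation, runs, rows = read_pg_matrix(new_layout)
print(annotation)
print(runs)
print(rows[0].get("Protein.Ids"))  # None when the column is absent
```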

ypriverol self-assigned this Jul 4, 2024