SDRF Parsing Error #376

Open
jackrogan opened this issue May 23, 2024 · 1 comment

Description of the bug

Hi,

I'm trying to run a minimal experiment to test using the SDRF format to develop a TMT pipeline. I've tried copying the format as best I can, but I don't understand what is being flagged as incorrect here:

Command used and terminal output

Command:

nextflow run bigbio/quantms -r dev -profile docker --input 20240521b_JR_TMT_HS_KO_MIN.sdrf.tsv --database /home/jack.rogan/Proteomics/Human_reference_proteome.fasta --add_decoys --search_engines comet --max_precursor_charge 5 --min_peptide_length 7 --FDR_level psm-level-fdrs --max_memory 48.GB --outdir 20240521b_JR_TMT_HS_KO_MIN_comet --acquisition_method dda --labelling_type "tmt10plex" --normalize true --msstats_remove_one_feat_prot false --msstatslfq_removeFewMeasurements false

Output:

ERROR ~ Error executing process > 'NFCORE_QUANTMS:QUANTMS:CREATE_INPUT_CHANNEL:SDRFPARSING (20240521b_JR_TMT_HS_KO_MIN.sdrf.tsv)'

Caused by:
  Process `NFCORE_QUANTMS:QUANTMS:CREATE_INPUT_CHANNEL:SDRFPARSING (20240521b_JR_TMT_HS_KO_MIN.sdrf.tsv)` terminated with an error exit status (1)


Command executed:

  ## -t2 since the one-table format parser is broken in OpenMS2.5
  ## -l for legacy behavior to always add sample columns
  
  parse_sdrf convert-openms \
      -t2 -l \
      --extension_convert raw:mzML,.gz:,.tar.gz:,.tar:,.zip: \
      -s 20240521b_JR_TMT_HS_KO_MIN.sdrf.tsv \
       \
      2>&1 | tee 20240521b_JR_TMT_HS_KO_MIN.sdrf_parsing.log
  
  mv openms.tsv 20240521b_JR_TMT_HS_KO_MIN.sdrf_config.tsv
  mv experimental_design.tsv 20240521b_JR_TMT_HS_KO_MIN.sdrf_openms_design.tsv
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:CREATE_INPUT_CHANNEL:SDRFPARSING":
      sdrf-pipelines: $(parse_sdrf --version 2>&1 | awk -F ' ' '{print $2}')
  END_VERSIONS

Command exit status:
  1

Command output:
  PROCESSING: 20240521b_JR_TMT_HS_KO_MIN.sdrf.tsv"
  Factor columns: ['factor value[treatment]']
  Characteristics columns (those covered by factor columns removed): ['characteristics[organism]', 'characteristics[organism part]', 'characteristics[sex]', 'characteristics[age]', 'characteristics[developmental stage]', 'characteristics[ethnic group]', 'characteristics[disease]', 'characteristics[cell line]', 'characteristics[cell type]', 'characteristics[infect]', 'characteristics[enrichment process]', 'characteristics[biological replicate]']
  Conditions (5): dict_keys(['OE33_0pc_KO', 'OE33_25pc_KO', 'OE33_50pc_KO', 'OE33_75pc_KO', 'OE33_100pc_KO'])
  Files per condition: dict_values([1, 1, 1, 1, 1])
  Traceback (most recent call last):
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/parse_sdrf.py", line 62, in openms_from_sdrf
      OpenMS().openms_convert(sdrf, onetable, legacy, verbose, conditionsfromcolumns, extension_convert)
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/openms/openms.py", line 446, in openms_convert
      self.writeTwoTableExperimentalDesign(
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/openms/openms.py", line 617, in writeTwoTableExperimentalDesign
      label = str(choice[label[label_index[raw]]])
                  ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  KeyError: 'TMT127N'
  
  The above exception was the direct cause of the following exception:
  
  Traceback (most recent call last):
    File "/usr/local/bin/parse_sdrf", line 10, in <module>
      sys.exit(main())
               ^^^^^^
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/parse_sdrf.py", line 239, in main
      cli()
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
           ^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/decorators.py", line 33, in new_func
      return f(get_current_context(), *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/parse_sdrf.py", line 65, in openms_from_sdrf
      raise ValueError(msg) from ex
  ValueError: Error: 'TMT127N'

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  PROCESSING: 20240521b_JR_TMT_HS_KO_MIN.sdrf.tsv"
  Factor columns: ['factor value[treatment]']
  Characteristics columns (those covered by factor columns removed): ['characteristics[organism]', 'characteristics[organism part]', 'characteristics[sex]', 'characteristics[age]', 'characteristics[developmental stage]', 'characteristics[ethnic group]', 'characteristics[disease]', 'characteristics[cell line]', 'characteristics[cell type]', 'characteristics[infect]', 'characteristics[enrichment process]', 'characteristics[biological replicate]']
  Conditions (5): dict_keys(['OE33_0pc_KO', 'OE33_25pc_KO', 'OE33_50pc_KO', 'OE33_75pc_KO', 'OE33_100pc_KO'])
  Files per condition: dict_values([1, 1, 1, 1, 1])
  Traceback (most recent call last):
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/parse_sdrf.py", line 62, in openms_from_sdrf
      OpenMS().openms_convert(sdrf, onetable, legacy, verbose, conditionsfromcolumns, extension_convert)
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/openms/openms.py", line 446, in openms_convert
      self.writeTwoTableExperimentalDesign(
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/openms/openms.py", line 617, in writeTwoTableExperimentalDesign
      label = str(choice[label[label_index[raw]]])
                  ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  KeyError: 'TMT127N'
  
  The above exception was the direct cause of the following exception:
  
  Traceback (most recent call last):
    File "/usr/local/bin/parse_sdrf", line 10, in <module>
      sys.exit(main())
               ^^^^^^
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/parse_sdrf.py", line 239, in main
      cli()
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
           ^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/click/decorators.py", line 33, in new_func
      return f(get_current_context(), *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.12/site-packages/sdrf_pipelines/parse_sdrf.py", line 65, in openms_from_sdrf
      raise ValueError(msg) from ex
  ValueError: Error: 'TMT127N'

Work dir:
  /mnt/bigdata/Jack/20240521b_JR_TMT_HS_KO_MIN/work/89/1cc0e0fc3d4400aff9683b5ca7a053

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

Relevant files

20240521b_JR_TMT_HS_KO_MIN.sdrf.tsv.txt

System information

Nextflow 24.04.1
Docker
Ubuntu
bigbio/quantms dev

@jackrogan jackrogan added the bug Something isn't working label May 23, 2024
@ypriverol ypriverol added help wanted Extra attention is needed high-priority labels May 23, 2024
daichengxin (Collaborator) commented May 24, 2024

Thanks for testing. This is a bug triggered by an incomplete label set, i.e. an SDRF that does not list every label of the plex. We will fix it so that incomplete label sets are accepted.
https://github.com/bigbio/sdrf-pipelines/blob/fe1851e0377a0aefb4434da6904b9187f651c3ac/sdrf_pipelines/openms/openms.py#L849-L873

Planned fix: change the logic to index the label directly, and allow incomplete label sets in the SDRF.
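
To illustrate what "index the label directly" could look like, here is a minimal sketch. It is not the sdrf-pipelines code: `PLEX_CHANNELS`, `channel_index` and `channels_for_labels` are hypothetical names, and the example labels are made up rather than taken from the attached SDRF. The idea is to look each label up in the full channel list of the declared plex, so an SDRF that uses only a subset of the channels still maps to the correct channel numbers.

```python
# Hypothetical channel tables (standard TMT reporter names); not taken from sdrf-pipelines.
PLEX_CHANNELS = {
    "tmt6plex": ["TMT126", "TMT127", "TMT128", "TMT129", "TMT130", "TMT131"],
    "tmt10plex": [
        "TMT126", "TMT127N", "TMT127C", "TMT128N", "TMT128C",
        "TMT129N", "TMT129C", "TMT130N", "TMT130C", "TMT131",
    ],
}


def channel_index(label: str, plex: str) -> int:
    """Return the 1-based channel number of `label` within the declared plex."""
    channels = PLEX_CHANNELS[plex]
    try:
        return channels.index(label) + 1
    except ValueError:
        # Fail with a message that names the offending label and the plex.
        raise ValueError(
            f"Label {label!r} is not a {plex} channel; expected one of {channels}"
        ) from None


def channels_for_labels(labels: list[str], plex: str) -> dict[str, int]:
    """Map each label found in the SDRF to its channel number.

    The labels may be any subset of the plex (an incomplete label set).
    """
    return {label: channel_index(label, plex) for label in labels}


if __name__ == "__main__":
    # Made-up example: only five of the ten tmt10plex channels are used.
    used = ["TMT126", "TMT127N", "TMT128N", "TMT129N", "TMT130N"]
    print(channels_for_labels(used, "tmt10plex"))
```

Because the lookup goes through the declared plex rather than a table inferred from how many labels appear, a label such as `TMT127N` resolves even when other channels are absent, and a label that does not belong to the plex produces an explicit error message instead of a bare `KeyError`.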
