
Bug in converting DIA-NN output #274

Closed
ypriverol opened this issue Jun 10, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@ypriverol
Member

Description of the bug


nf-core/quantms execution completed unsuccessfully!

The exit status of the task that caused the workflow execution to fail was: 1.

The full error message was:

Error executing process > 'NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD037340-DIA.sdrf)'

Caused by:
  Process `NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD037340-DIA.sdrf)` terminated with an error exit status (1)

Command executed:

  diann_convert.py convert \
      --folder ./ \
      --diann_version ./version/versions.yml \
      --dia_params "0.02;Da;20;ppm;Trypsin;;" \
      --charge 4 \
      --missed_cleavages 1 \
      --qvalue_threshold 0.01 \
      2>&1 | tee convert_report.log
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT":
      pyopenms: $(pip show pyopenms | grep "Version" | awk -F ': ' '{print $2}')
  END_VERSIONS

Command exit status:
  1

Command output:
  /hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:370: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
    out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-description"] = "no description given"
  /hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:369: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
    out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-assay_refs"] = ",".join(study_variable)
  [the same two PerformanceWarning messages from lines 369/370 repeat for each additional study_variable column]
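The PerformanceWarning comes from assigning one `study_variable` cell at a time via `.loc`, which pandas treats as repeated column inserts that fragment the frame. A minimal sketch of the de-fragmented alternative the warning itself suggests (column names mirror the log; the values are made up, not the pipeline's real data):

```python
import pandas as pd

# diann_convert.py fills study_variable cells one at a time with `.loc`,
# which pandas flags as fragmenting the frame. Collecting the columns in a
# dict and building the frame once avoids the warning entirely.
n = 3
cols = {}
for i in range(1, n + 1):
    cols[f"study_variable[{i}]-assay_refs"] = f"assay[{i}]"
    cols[f"study_variable[{i}]-description"] = "no description given"

# one construction instead of 2*n single-cell inserts
out_mztab_MTD = pd.DataFrame(cols, index=[1])
print(out_mztab_MTD.shape)  # (1, 6)
```

The warning is only about performance, though; the hard failure comes later in the traceback.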
  Traceback (most recent call last):
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 924, in <module>
      cli()
    File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
      return self.main(*args, **kwargs)
    File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
      rv = self.invoke(ctx)
    File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
      return ctx.invoke(self.callback, **ctx.params)
    File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
      return __callback(*args, **kwargs)
    File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
      return f(get_current_context(), *args, **kwargs)
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 196, in convert
      PRH = mztab_PRH(report, pg, index_ref, database, fasta_df)
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 445, in mztab_PRH
      out_mztab_PRH = pd.concat([out_mztab_PRH, protein_details_df]).reset_index(drop=True)
    File "/usr/local/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
      return func(*args, **kwargs)
    File "/usr/local/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 381, in concat
      return op.get_result()
    File "/usr/local/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 612, in get_result
      indexers[ax] = obj_labels.get_indexer(new_labels)
    File "/usr/local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3905, in get_indexer
      raise InvalidIndexError(self._requires_unique_msg)
  pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
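The `InvalidIndexError` fires because `pd.concat` has to align frames along columns, and alignment requires unique column labels. A minimal reproduction with illustrative frames only (not the pipeline's data): `a` stands in for `out_mztab_PRH` with a duplicated abundance column, `b` for `protein_details_df`:

```python
import pandas as pd
from pandas.errors import InvalidIndexError

# `a` carries a duplicated column label, as happens when an assay column
# such as protein_abundance_assay[91] is emitted more than once.
a = pd.DataFrame([[1, 2, 3]],
                 columns=["accession",
                          "protein_abundance_assay[91]",
                          "protein_abundance_assay[91]"])
# `b` has different columns, so concat must reindex -- and fails.
b = pd.DataFrame([["P12345", "details"]],
                 columns=["accession", "description"])

try:
    pd.concat([a, b])
    error = None
except InvalidIndexError as exc:
    error = str(exc)

print(error)
```

If the two frames had identical column lists, no reindexing would be needed and the duplicate would pass silently, which is why the error only surfaces at this particular `concat`.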

Command wrapper:
      done < <(ps -e -o pid= -o ppid=)
  
      pstat() {
          local x_pid=$1
          local STATUS=$(2> /dev/null < /proc/$1/status egrep 'Vm|ctxt')
  
          if [ $? = 0 ]; then
          local  x_vsz=$(echo "$STATUS" | grep VmSize | awk '{print $2}' || echo -n '0')
          local  x_rss=$(echo "$STATUS" | grep VmRSS | awk '{print $2}' || echo -n '0')
          local x_peak=$(echo "$STATUS" | egrep 'VmPeak|VmHWM' | sed 's/^.*:\s*//' | sed 's/[\sa-zA-Z]*$//' | tr '\n' ' ' || echo -n '0 0')
          local x_pmem=$(awk -v rss=$x_rss -v mem_tot=$mem_tot 'BEGIN {printf "%.0f", rss/mem_tot*100*10}' || echo -n '0')
          local vol_ctxt=$(echo "$STATUS" | grep '\bvoluntary_ctxt_switches' | awk '{print $2}' || echo -n '0')
          local inv_ctxt=$(echo "$STATUS" | grep '\bnonvoluntary_ctxt_switches' | awk '{print $2}' || echo -n '0')
          cpu_stat[x_pid]="$x_pid $x_pmem $x_vsz $x_rss $x_peak $vol_ctxt $inv_ctxt"
          fi
      }
  
      pwalk() {
          pstat $1
          for i in ${ALL_CHILDREN[$1]:=}; do pwalk $i; done
      }
  
      pwalk $1
  }
  
  nxf_stat() {
      cpu_stat=()
      nxf_tree $1
  
  
  (... more ...)
  ------------------------------------------------------------
  
  Exited with exit code 1.
  
  Resource usage summary:
  
      CPU time :                                   1059.00 sec.
      Max Memory :                                 1305 MB
      Average Memory :                             797.00 MB
      Total Requested Memory :                     30720.00 MB
      Delta Memory :                               29415.00 MB
      Max Swap :                                   -
      Max Processes :                              13
      Max Threads :                                116
      Run time :                                   1057 sec.
      Turnaround time :                            1059 sec.
  
  The output (if any) is above this job summary.

Work dir:
  /hps/nobackup/juan/pride/reanalysis/absolute-expression/platelet/PXD037340/work/83/8e8e22e767bff27f39b1f314d61fbd

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Command used and terminal output

No response

Relevant files

No response

System information

No response

@ypriverol ypriverol added the bug Something isn't working label Jun 10, 2023
@ypriverol
Member Author

@WangHong007

@ypriverol Same plasma dataset error in #268.

@ypriverol
Member Author

Yes.

@WangHong007

@ypriverol Can you check the SDRF of PXD037340 again? In the experimental design file I found that sample 31 corresponds to fraction 91, but that fraction is mapped to 21 mass spectrometry files. As a result, protein_abundance_assay[91] appears 21 times in the mzTab.
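The duplicated assay column can be confirmed programmatically before it reaches `pd.concat`. A small hypothetical check (column names illustrative, not code from the pipeline) that lists any label appearing more than once:

```python
import pandas as pd

# Simulate the bad mzTab header: 21 copies of the same assay column,
# as described in the comment above.
cols = pd.Index(["accession"] + ["protein_abundance_assay[91]"] * 21)

# Index.duplicated() marks every occurrence after the first.
dups = cols[cols.duplicated()].unique().tolist()
print(dups)  # ['protein_abundance_assay[91]']
```

A guard like this in the converter would turn the opaque `InvalidIndexError` into a direct message about the broken sample-to-file mapping in the SDRF.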
