
Bug in converting DIA-NN output #274

Closed
ypriverol opened this issue Jun 10, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@ypriverol
Member

Description of the bug


nf-core/quantms execution completed unsuccessfully!

The exit status of the task that caused the workflow execution to fail was: 1.

The full error message was:

Error executing process > 'NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD037340-DIA.sdrf)'

Caused by:
  Process `NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (PXD037340-DIA.sdrf)` terminated with an error exit status (1)

Command executed:

  diann_convert.py convert \
      --folder ./ \
      --diann_version ./version/versions.yml \
      --dia_params "0.02;Da;20;ppm;Trypsin;;" \
      --charge 4 \
      --missed_cleavages 1 \
      --qvalue_threshold 0.01 \
      2>&1 | tee convert_report.log
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT":
      pyopenms: $(pip show pyopenms | grep "Version" | awk -F ': ' '{print $2}')
  END_VERSIONS

Command exit status:
  1

Command output:
  /hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:370: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
    out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-description"] = "no description given"
  /hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py:369: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
    out_mztab_MTD.loc[1, "study_variable[" + str(i) + "]-assay_refs"] = ",".join(study_variable)
  [the same two PerformanceWarning messages from lines 369/370 repeat for each additional study_variable column]
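The PerformanceWarning comes from assigning one `study_variable` cell at a time via `.loc`, which pandas treats as repeated column inserts that fragment the frame. A minimal sketch of the de-fragmented alternative the warning itself suggests (column names mirror the log; the values are made up, not the pipeline's real data):

```python
import pandas as pd

# diann_convert.py fills study_variable cells one at a time with `.loc`,
# which pandas flags as fragmenting the frame. Collecting the columns in a
# dict and building the frame once avoids the warning entirely.
n = 3
cols = {}
for i in range(1, n + 1):
    cols[f"study_variable[{i}]-assay_refs"] = f"assay[{i}]"
    cols[f"study_variable[{i}]-description"] = "no description given"

# one construction instead of 2*n single-cell inserts
out_mztab_MTD = pd.DataFrame(cols, index=[1])
print(out_mztab_MTD.shape)  # (1, 6)
```

The warning is only about performance, though; the hard failure comes later in the traceback.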
  Traceback (most recent call last):
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 924, in <module>
      cli()
    File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
      return self.main(*args, **kwargs)
    File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
      rv = self.invoke(ctx)
    File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
      return ctx.invoke(self.callback, **ctx.params)
    File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
      return __callback(*args, **kwargs)
    File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
      return f(get_current_context(), *args, **kwargs)
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 196, in convert
      PRH = mztab_PRH(report, pg, index_ref, database, fasta_df)
    File "/hps/nobackup/juan/pride/reanalysis/quantms/bin/diann_convert.py", line 445, in mztab_PRH
      out_mztab_PRH = pd.concat([out_mztab_PRH, protein_details_df]).reset_index(drop=True)
    File "/usr/local/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
      return func(*args, **kwargs)
    File "/usr/local/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 381, in concat
      return op.get_result()
    File "/usr/local/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 612, in get_result
      indexers[ax] = obj_labels.get_indexer(new_labels)
    File "/usr/local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3905, in get_indexer
      raise InvalidIndexError(self._requires_unique_msg)
  pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
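The `InvalidIndexError` fires because `pd.concat` has to align frames along columns, and alignment requires unique column labels. A minimal reproduction with illustrative frames only (not the pipeline's data): `a` stands in for `out_mztab_PRH` with a duplicated abundance column, `b` for `protein_details_df`:

```python
import pandas as pd
from pandas.errors import InvalidIndexError

# `a` carries a duplicated column label, as happens when an assay column
# such as protein_abundance_assay[91] is emitted more than once.
a = pd.DataFrame([[1, 2, 3]],
                 columns=["accession",
                          "protein_abundance_assay[91]",
                          "protein_abundance_assay[91]"])
# `b` has different columns, so concat must reindex -- and fails.
b = pd.DataFrame([["P12345", "details"]],
                 columns=["accession", "description"])

try:
    pd.concat([a, b])
    error = None
except InvalidIndexError as exc:
    error = str(exc)

print(error)
```

If the two frames had identical column lists, no reindexing would be needed and the duplicate would pass silently, which is why the error only surfaces at this particular `concat`.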

Command wrapper:
      done < <(ps -e -o pid= -o ppid=)
  
      pstat() {
          local x_pid=$1
          local STATUS=$(2> /dev/null < /proc/$1/status egrep 'Vm|ctxt')
  
          if [ $? = 0 ]; then
          local  x_vsz=$(echo "$STATUS" | grep VmSize | awk '{print $2}' || echo -n '0')
          local  x_rss=$(echo "$STATUS" | grep VmRSS | awk '{print $2}' || echo -n '0')
          local x_peak=$(echo "$STATUS" | egrep 'VmPeak|VmHWM' | sed 's/^.*:\s*//' | sed 's/[\sa-zA-Z]*$//' | tr '\n' ' ' || echo -n '0 0')
          local x_pmem=$(awk -v rss=$x_rss -v mem_tot=$mem_tot 'BEGIN {printf "%.0f", rss/mem_tot*100*10}' || echo -n '0')
          local vol_ctxt=$(echo "$STATUS" | grep '\bvoluntary_ctxt_switches' | awk '{print $2}' || echo -n '0')
          local inv_ctxt=$(echo "$STATUS" | grep '\bnonvoluntary_ctxt_switches' | awk '{print $2}' || echo -n '0')
          cpu_stat[x_pid]="$x_pid $x_pmem $x_vsz $x_rss $x_peak $vol_ctxt $inv_ctxt"
          fi
      }
  
      pwalk() {
          pstat $1
          for i in ${ALL_CHILDREN[$1]:=}; do pwalk $i; done
      }
  
      pwalk $1
  }
  
  nxf_stat() {
      cpu_stat=()
      nxf_tree $1
  
  
  (... more ...)
  ------------------------------------------------------------
  
  Exited with exit code 1.
  
  Resource usage summary:
  
      CPU time :                                   1059.00 sec.
      Max Memory :                                 1305 MB
      Average Memory :                             797.00 MB
      Total Requested Memory :                     30720.00 MB
      Delta Memory :                               29415.00 MB
      Max Swap :                                   -
      Max Processes :                              13
      Max Threads :                                116
      Run time :                                   1057 sec.
      Turnaround time :                            1059 sec.
  
  The output (if any) is above this job summary.

Work dir:
  /hps/nobackup/juan/pride/reanalysis/absolute-expression/platelet/PXD037340/work/83/8e8e22e767bff27f39b1f314d61fbd

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Command used and terminal output

No response

Relevant files

No response

System information

No response

@ypriverol ypriverol added the bug Something isn't working label Jun 10, 2023
@ypriverol
Member Author

@WangHong007

@ypriverol Same plasma dataset error in #268.

@ypriverol
Member Author

Yes.

@WangHong007

@ypriverol Can you check the SDRF of PXD037340 again? In the experimental design file I found that sample 31 corresponds to fraction 91, but that fraction is mapped to 21 mass spectrometry files. As a result, protein_abundance_assay[91] appears 21 times in the mzTab.
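The duplicated assay column can be confirmed programmatically before it reaches `pd.concat`. A small hypothetical check (column names illustrative, not code from the pipeline) that lists any label appearing more than once:

```python
import pandas as pd

# Simulate the bad mzTab header: 21 copies of the same assay column,
# as described in the comment above.
cols = pd.Index(["accession"] + ["protein_abundance_assay[91]"] * 21)

# Index.duplicated() marks every occurrence after the first.
dups = cols[cols.duplicated()].unique().tolist()
print(dups)  # ['protein_abundance_assay[91]']
```

A guard like this in the converter would turn the opaque `InvalidIndexError` into a direct message about the broken sample-to-file mapping in the SDRF.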
