Unknown AA in peptide sequence compute MonoIsotopic Mass #368

Closed
ypriverol opened this issue Apr 16, 2024 · 18 comments · Fixed by #369

@ypriverol
Member

Description of the bug

When running the DIA workflow, the dia_convert step fails with the following error if a UniProt TrEMBL database is used:

Caused by:
  Process `NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT (final_updated_QPLAP_test_metadata.sdrf)` terminated with an error exit status (1)

Command executed:

  diann_convert.py convert \
      --folder ./ \
      --exp_design final_updated_QPLAP_test_metadata.sdrf_openms_design.tsv \
      --diann_version ./version/versions.yml \
      --dia_params "0.5;Da;20.0;ppm;Trypsin;Carbamidomethyl (C);Acetyl (Protein N-term),Oxidation (M)" \
      --charge 4 \
      --missed_cleavages 1 \
      --qvalue_threshold 0.01 \
      2>&1 | tee convert_report.log

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_QUANTMS:QUANTMS:DIA:DIANNCONVERT":
      pyopenms: $(pip show pyopenms | grep "Version" | awk -F ': ' '{print $2}')
  END_VERSIONS

Command exit status:
  1

Command output:
  2024-04-10 00:49:19,684 [convert] - Revision 0.1.1
  2024-04-10 00:49:19,684 [convert] - Reading input files...
  2024-04-10 00:49:23,275 [main_report_df] - Filtering report based on qvalue threshold: 0.01, 330767 rows
  2024-04-10 00:49:23,303 [main_report_df] - Report filtered, 330767 rows remaining
  2024-04-10 00:49:23,303 [main_report_df] - Calculating Precursor.Mz
  Warning: OPENMS_DATA_PATH environment variable not found and no share directory was installed. Some functionality might not work as expected.
  Traceback (most recent call last):
    File "/home-link/zxoqe31/.nextflow/assets/nf-core/quantms/bin/diann_convert.py", line 1331, in <module>
      cli()
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
           ^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/click/decorators.py", line 33, in new_func
      return f(get_current_context(), *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home-link/zxoqe31/.nextflow/assets/nf-core/quantms/bin/diann_convert.py", line 71, in convert
      report = diann_directory.main_report_df(qvalue_threshold=qvalue_threshold)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home-link/zxoqe31/.nextflow/assets/nf-core/quantms/bin/diann_convert.py", line 352, in main_report_df
      uniq_masses = {k: AASequence.fromString(k).getMonoWeight() for k in report["Modified.Sequence"].unique()}
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home-link/zxoqe31/.nextflow/assets/nf-core/quantms/bin/diann_convert.py", line 352, in <dictcomp>
      uniq_masses = {k: AASequence.fromString(k).getMonoWeight() for k in report["Modified.Sequence"].unique()}
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "pyopenms/_pyopenms_6.pyx", line 818, in pyopenms._pyopenms_6.AASequence.getMonoWeight
    File "pyopenms/_pyopenms_6.pyx", line 787, in pyopenms._pyopenms_6.AASequence._getMonoWeight_0
  RuntimeError: the value 'AVQVHQDTLRTMYFAXR' was used but is not valid; Cannot get weight of sequence with unknown AA 'X' with unknown mass.

The best way to solve this would be to check whether pyopenms supports unknown amino acids @timosachsenberg @jpfeuffer. If that is not possible, we need to find a way to handle them in the quantms conversion to mzTab.
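To see which entries would trigger the error before any mass is computed, the sequences can be scanned for letters outside the 20 standard amino acids. The following is a minimal sketch, not part of quantms: the helper name `nonstandard_residues` is hypothetical, and it assumes modifications are written in parentheses or square brackets (as in `C(UniMod:4)`), so those annotations are stripped before the check.

```python
import re

# The 20 standard amino acid one-letter codes.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

# Assumption: modifications appear as "(...)" or "[...]" after a residue.
_ANNOTATION = re.compile(r"\([^)]*\)|\[[^\]]*\]")


def nonstandard_residues(seq):
    """Return the sorted non-standard residue letters found in a sequence."""
    bare = _ANNOTATION.sub("", seq)  # drop modification annotations
    return sorted({c for c in bare if c.isalpha() and c not in STANDARD_AA})


sequences = ["AVQVHQDTLRTMYFAXR", "PEPTC(UniMod:4)IDE"]
flagged = {s: nonstandard_residues(s) for s in sequences if nonstandard_residues(s)}
print(flagged)  # {'AVQVHQDTLRTMYFAXR': ['X']}
```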

Command used and terminal output

No response

Relevant files

No response

System information

No response

@ypriverol ypriverol added the bug Something isn't working label Apr 16, 2024
@timosachsenberg

OpenMS supports unknown amino acids, e.g. 'X', but calculating the precursor m/z for them is not allowed.
We support unknown amino acids with an explicit mass, e.g. X[123.4], but you need to know the mass.
What does DIA-NN output?

@ypriverol
Member Author

ypriverol commented Apr 16, 2024

@timosachsenberg, DIA-NN uses Selenomethionine for X.

The line of code is the following:

AASequence.fromString(k).getMonoWeight()

Do you mean that we have to do k.replace('X', 'X[123.4]')?

For this peptide: AVQVHQDTLRTMYFAXR -> AVQVHQDTLRTMYFAX[123.4]R

@jpfeuffer
Collaborator

No, in this case, replace with the actual mass of a Selenomethionine residue

@jpfeuffer
Collaborator

Or add a new Residue to the ResidueDB and set the onelettercode to something meaningful

@timosachsenberg

timosachsenberg commented Apr 16, 2024 via email

@ypriverol
Member Author

No, in this case, replace with the actual mass of a Selenomethionine residue

What would this solution look like? @timosachsenberg @jpfeuffer

@jpfeuffer
Collaborator

196.995499

Probably -H2O

@ypriverol
Member Author

@jpfeuffer but would the way to replace it be:

k.replace('X', '[196.995499]')

My question is about the notation.

@jpfeuffer
Collaborator

No, your first guess was correct: k.replace('X', 'X[196.995499]'). Probably -18 though. I am not sure.

@ypriverol
Member Author

@timosachsenberg -18?

@timosachsenberg

timosachsenberg commented Apr 16, 2024

Yes, without the water.
https://www.ebi.ac.uk/chebi/searchId.do?chebiId=30019
At least that's the residue mass. If you have the precursor mass from DIA-NN, you can double-check. Or I can double-check tomorrow in the OpenMS code. But I think we work with residue masses …

@timosachsenberg

AASequence aa2 = AASequence::fromString("PEPTC(Carbamidomethyl)IDE");
AASequence aa6 = AASequence::fromString("PEPTX[160.030654]IDE");

have the same mass.
The cysteine residue is 103.01 and Carbamidomethyl adds +57.02, so the value in the bracket is the residue mass.
For Selenomethionine it is thus 178.98494.
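The arithmetic behind these two numbers can be checked in a few lines. This is only a sketch: the monoisotopic values for water, the cysteine residue and Carbamidomethyl are standard reference values filled in here as assumptions, not numbers quoted in this thread, and 196.995499 is the Selenomethionine mass given above.

```python
import math

# Assumed standard monoisotopic masses (not quoted in this thread).
WATER = 18.010565            # H2O
CYS_RESIDUE = 103.009185     # cysteine residue, C3H5NOS
CARBAMIDOMETHYL = 57.021464  # Carbamidomethyl delta mass, C2H3NO

# Residue mass of carbamidomethylated cysteine, i.e. the bracket value in
# "PEPTX[160.030654]IDE".
cam_cys_residue = CYS_RESIDUE + CARBAMIDOMETHYL

# Selenomethionine: full amino acid mass minus one water gives the residue mass.
semet_residue = 196.995499 - WATER

print(round(cam_cys_residue, 4))  # 160.0306
print(round(semet_residue, 4))    # 178.9849
```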

@timosachsenberg

@cbielow pinging here to check if this is properly documented

@ypriverol
Member Author

@timosachsenberg @jpfeuffer :

I will use the following:

X -> X[178.98493453312]   # 196.995499 - 17.003288 - 1.00727646688
U -> U[132.94306553312]   # 150.95363  - 17.003288 - 1.00727646688
O -> O[237.14773053312]   # 255.158295 - 17.003288 - 1.00727646688

@timosachsenberg

What are U and O?
Would U->X be better (you wrote U->U)? Or how does DIA-NN encode it in the sequence?

U -> X[132.94306553312]   # 150.95363  - 17.003288 - 1.00727646688
O -> X[237.14773053312]   # 255.158295 - 17.003288 - 1.00727646688

@ypriverol ypriverol linked a pull request Apr 17, 2024 that will close this issue
@ypriverol
Member Author

This is the message from Vadim:

AA['U'] = 150.95363;
AA['X'] = 196.995499;
AA['O'] = 255.158295;
const double proton = 1.00727646688;
const double OH = 17.003288;
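Putting these constants together with the bracket notation discussed above, the rewrite could look roughly like the sketch below. It is an illustration under assumptions, not the code from the linked pull request: the function name `annotate_nonstandard` is hypothetical, every non-standard letter is mapped to `X[mass]`, and the input is assumed to be a raw DIA-NN sequence whose modifications sit in parentheses or brackets. A plain `k.replace('U', ...)` would also hit the `U` inside `(UniMod:35)`, so annotations are split off first.

```python
import re

# Constants quoted from Vadim's message in this thread.
DIANN_AA = {"U": 150.95363, "X": 196.995499, "O": 255.158295}
PROTON = 1.00727646688
OH = 17.003288

# Bracket value = DIA-NN value - OH - proton (same arithmetic as in the thread).
RESIDUE_MASS = {aa: mass - OH - PROTON for aa, mass in DIANN_AA.items()}

# Capturing group: re.split keeps "(...)" and "[...]" annotations as separate
# parts, so letters inside them are never rewritten.
_TOKEN = re.compile(r"(\([^)]*\)|\[[^\]]*\])")


def annotate_nonstandard(seq):
    """Rewrite U, X and O in a raw DIA-NN sequence as X[residue mass]."""
    out = []
    for part in _TOKEN.split(seq):
        if part.startswith(("(", "[")):
            out.append(part)  # modification annotation, keep untouched
        else:
            out.append("".join(
                f"X[{RESIDUE_MASS[c]:.6f}]" if c in RESIDUE_MASS else c
                for c in part
            ))
    return "".join(out)


print(annotate_nonstandard("AVQVHQDTLRTMYFAXR"))  # AVQVHQDTLRTMYFAX[178.984935]R
print(annotate_nonstandard("M(UniMod:35)UK"))     # M(UniMod:35)X[132.943066]K
```

The function expects sequences that have not been annotated yet; running it twice on the same string would add a second bracket after each X.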

@timosachsenberg

I guess both work in OpenMS, but I think using X for every "unknown" or non-standard residue could be more convenient (and better tested) ;).

@ypriverol
Member Author

I guess both work in OpenMS, but I think using X for every "unknown" or non-standard residue could be more convenient (and better tested) ;).

PR now ready: #369
