Defining more enzymes in OpenMS Comet adaptor #354

Open
ypriverol opened this issue Feb 22, 2024 · 9 comments

Comments

@ypriverol
Member

Description of feature

@timosachsenberg @jpfeuffer Currently, the Comet adapter only supports 'Asp-N,Chymotrypsin,CNBr,no cleavage,unspecific cleavage,Trypsin,Arg-C,Lys-C,Lys-N,PepsinA,Trypsin/P,glutamyl endopeptidase'. However, Comet can take definitions of additional enzymes through its parameter file: https://uwpr.github.io/Comet/parameters/parameters_202301/search_enzyme_number.html. How can we use that to define, for example, Lys-C/P? Currently Lys-C does not work, because the msgf+ processor changes it to Lys-C/P and the Comet adapter does not support that.
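For illustration, here is a minimal sketch of what generating an extra enzyme entry for a Comet parameter file might look like. It assumes the whitespace-separated layout of the `[COMET_ENZYME_INFO]` block (number, name, cleavage sense, cut residues, no-cut residues); that layout should be checked against the Comet documentation linked above. The function name `comet_enzyme_line`, the entry number 12, and the "cut after K with no proline restriction" reading of Lys-C/P are assumptions, not something taken from OpenMS or Comet.

```python
# Sketch only: the column layout is an assumption to verify against the Comet docs.
def comet_enzyme_line(number, name, sense, cut_residues, no_cut_residues="-"):
    """Format one enzyme entry for the [COMET_ENZYME_INFO] block.

    sense: 1 = cleave C-terminal to the cut residues, 0 = N-terminal.
    "-" in the last column means no residue blocks cleavage.
    """
    if sense not in (0, 1):
        raise ValueError("sense must be 0 or 1")
    if " " in name:
        raise ValueError("enzyme name must not contain spaces")
    return f"{number}. {name} {sense} {cut_residues} {no_cut_residues}"


# Hypothetical entry: Lys-C/P read as "cut after K, not blocked by P".
lys_c_p = comet_enzyme_line(12, "Lys-C/P", 1, "K")
print(lys_c_p)
```

The adapter would then have to point `search_enzyme_number` at the number of the new entry.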

@ypriverol ypriverol added the enhancement New feature or request label Feb 22, 2024
@timosachsenberg

@ypriverol this would actually be a good entry-level task for a student who wants to get into OpenMS/C++

@jpfeuffer
Collaborator

jpfeuffer commented Feb 22, 2024

For both Lys-C and multiple enzymes you will need to give up ConsensusID compatibility then.
A fix for Lys-C is just simple if-case logic in the workflow.

Multiple enzymes is a large change in both OpenMS and the workflow.
OpenMS needs to support it both in the data structures and in things like indexing. You need not only support for multiple enzymes but also logic for whether they were applied at the same time or one after another.
It will probably also not be compatible with your own or a workflow-generated decoy database, unless you run multiple searches with different enzymes (and generate one decoy database for each enzyme).
You will need to use Comet's decoy generation.
Therefore it is probably easiest to run Comet without the adapter and convert to idXML later on.
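
A minimal sketch of the "if-case logic" for Lys-C mentioned above, assuming (from the issue description) that the MSGF+ side needs `Lys-C/P` while the Comet adapter accepts `Lys-C`. The function name `enzyme_for_engine` and the engine keys are hypothetical and not taken from the workflow.

```python
# Hypothetical helper; names and engine keys are illustrative only.
def enzyme_for_engine(enzyme: str, engine: str) -> str:
    """Return the enzyme name to pass to a given search engine adapter."""
    # Assumption: only the MSGF+ side needs Lys-C rewritten to Lys-C/P.
    if engine.lower() == "msgf" and enzyme == "Lys-C":
        return "Lys-C/P"
    # Every other engine/enzyme combination is passed through unchanged.
    return enzyme


for engine in ("msgf", "comet"):
    print(engine, enzyme_for_engine("Lys-C", engine))
```

With this, the two engines search with slightly different cleavage rules, which is the compatibility trade-off noted above.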

@timosachsenberg

timosachsenberg commented Feb 22, 2024

I agree with Julianus that properly modelling multi-enzyme digestion adds a lot of complexity.
One note: you often see the Lys-C/Trypsin combination because it improves cutting after K.
From a search engine perspective, the combination can just be treated as Trypsin (or even Trypsin/P) because Lys-C basically cuts at a subset of Trypsin cutting sites.
So maybe such complexity is not needed?
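
To make the subset argument above concrete, here is a small sketch that compares cleavage positions. The two rules (cut after K/R, or after K only, unless the next residue is P) are simplified assumptions for illustration, and the sequence is made up.

```python
import re

# Simplified rules, assumed for illustration: cut after the residue unless P follows.
TRYPSIN = re.compile(r"(?<=[KR])(?!P)")
LYS_C = re.compile(r"(?<=K)(?!P)")


def cleavage_sites(sequence, rule):
    """Return the set of internal positions where `rule` cuts.

    A position i means the cut falls between sequence[i-1] and sequence[i].
    """
    return {
        m.start()
        for m in rule.finditer(sequence)
        if 0 < m.start() < len(sequence)  # ignore the sequence termini
    }


seq = "MKPAKRLEKDR"  # made-up sequence
trypsin_sites = cleavage_sites(seq, TRYPSIN)
lys_c_sites = cleavage_sites(seq, LYS_C)

# Every Lys-C site is also a trypsin site, so a combined digest
# introduces no cleavage position that a trypsin search would miss.
print(lys_c_sites <= trypsin_sites)
```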

@ypriverol
Member Author

I'm trying to tackle the first use case here, which is quite common: the use of another single enzyme, not multiple enzymes. It should then be easy to extend OpenMS with additional enzymes and support them.

@jpfeuffer
Collaborator

jpfeuffer commented Feb 22, 2024

We could implement a workaround for this special case by allowing multiple enzymes at the workflow level only.
Then you would see trypsin/lys-c in the workflow reports and trypsin as far as OpenMS is concerned.

Or we start by adding this special case to OpenMS. (Introducing a new mix enzyme).
This would be mainly for reporting reasons then.

@ypriverol
Member Author

I don't know why you want to do the mix enzyme. The problem is actually much simpler. We have Lys-C/P which in fact is supported by comet but the Adapter in OpenMS doesn't support it. I want to support it in OpenMS in order to be able to process the dataset that used only Lys-C/P with msgf+ and comet. No mix enzymes.

@jpfeuffer
Collaborator

Ah ok I completely misread the issue then haha

@timosachsenberg

timosachsenberg commented Feb 23, 2024

Doing Arg-C and Lys-C before trypsin is not an issue.

<ITEM name="RegExDescription" value="Arg-C cuts after R residue unless the next residue is P." type="string" />
<ITEM name="RegExDescription" value="Lys-C cuts after K if not followed by P." type="string" />

but Glu-C, as listed on https://www.ebi.ac.uk/pride/archive/projects/PXD005200,
is an issue. It cleaves mainly after E but also after D.
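
A rough sketch of why the two cases differ, using a toy sequential digest. The `digest` helper is hypothetical, the proline rule is applied to all enzymes as a simplifying assumption, and the sequence is made up.

```python
def digest(peptides, residues, proline_blocks=True):
    """Cut each peptide after any residue in `residues`.

    With proline_blocks=True a cut is skipped when the next residue is P.
    """
    out = []
    for pep in peptides:
        start = 0
        # The last residue is skipped: there is nothing to cut after it.
        for i, aa in enumerate(pep[:-1]):
            if aa in residues and not (proline_blocks and pep[i + 1] == "P"):
                out.append(pep[start:i + 1])
                start = i + 1
        out.append(pep[start:])
    return out


protein = ["AAKDEPLRGEWK"]  # made-up sequence

trypsin_only = digest(protein, "KR")
lys_c_then_trypsin = digest(digest(protein, "K"), "KR")
glu_c_then_trypsin = digest(digest(protein, "DE"), "KR")

# Lys-C before trypsin gives the same peptides as trypsin alone ...
print(lys_c_then_trypsin == trypsin_only)
# ... while Glu-C adds cuts after D/E that a trypsin-only search cannot explain.
print(glu_c_then_trypsin)
```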
