Generate report for Reactome #24
This report will be generated from the GO-CAM models that are meant to be as semantically correct as possible with respect to Reactome objectives. This means that it will not be based on models that have been adapted specifically to support GO curation objectives. For example, complexes will not be removed or decomposed, since Reactome is interested in these.
Columns for report:
Note that the report should have one row per unique Reactome ID.
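A minimal sketch of the one-row-per-ID requirement, assuming the raw inference output arrives as (Reactome ID, inferred GO class) pairs. The function name, the pipe delimiter, and the placeholder IDs are all hypothetical; the real report columns are not specified here.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def one_row_per_id(rows: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Collapse (reactome_id, go_class) pairs into one row per Reactome ID.

    Duplicate classes for the same ID are dropped, and the remaining
    classes are joined with '|' (an assumed delimiter).
    """
    merged = defaultdict(set)
    for reactome_id, go_class in rows:
        merged[reactome_id].add(go_class)
    return [
        (reactome_id, "|".join(sorted(classes)))
        for reactome_id, classes in sorted(merged.items())
    ]


# Placeholder IDs only; these are not real Reactome or GO identifiers.
example_rows = [
    ("R-HSA-EXAMPLE2", "GO:placeholderA"),
    ("R-HSA-EXAMPLE2", "GO:placeholderB"),
    ("R-HSA-EXAMPLE1", "GO:placeholderA"),
    ("R-HSA-EXAMPLE2", "GO:placeholderA"),
]
report = one_row_per_id(example_rows)
```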
manual_plus_inferred_mapping.txt
Here is the current report on inferred classes for Reactome entities. Ping @deustp01 @ukemi @thomaspd @cmungall
Notes on run:
Adding ping to connect @fabregat to this thread.
Hi @goodb, I was just taking a look at the report you enclosed above. It looks like some of the data is about Reactome disease models. We are filtering those in the loads, aren't we? We certainly don't want annotations from GO-CAM models that represent the disease state.
E.g., Biological Process: Defective ABCD1 causes adrenoleukodystrophy (ALD), R-HSA-5684045
Would the "disease" attribute of Reactome physical entities and events be useful as a filter? For GO, you'd only want instances for which the attribute value is NULL.
We need to learn more from you about the Reactome disease annotations. Are whole pathways labeled as disease pathways? Since GO only annotates 'normal' biology, we wouldn't want pathways that represent a disease state. Mind you, in the long run it would be fascinating to do the semantic comparison of the 'disease' pathways versus the 'healthy' pathways.
@ukemi (first, note that a lot has changed since that run, so the inference report is likely going to be very different now). Right now the code does not do any filtering of the disease models. It looks like the BioPAX for these models isn't really complete enough to develop a proper model anyway. E.g., looking at 'Defective ABCD1 causes adrenoleukodystrophy (ALD)', there isn't any structured data about the disease, the mutant genes, or the relation to the normal pathway in the BioPAX export. If we ever do want to turn this information into GO-CAMs (which I personally think would be very valuable for building analyses), we'd need some work on their end to improve the BioPAX, or we'd need to access the data another way. (Ping @deustp01) For now I'll add the disease filter to the converter. See #58
@deustp01 I don't see any 'disease' information coming through the BioPAX. If there were such a tag, that would be very helpful. My plan was to make use of the Disease pathway hierarchy and ignore any of the subpathways there. Just visually, from your browser, that looks like it ought to work.
The relationship between disease pathways and their normal counterparts is complicated, and doesn't work very well for us. A plan to revise it substantially is in the works, but it will be a really large effort, so it's not clear when it is going to happen. Meanwhile, there are about three versions of disease pathway: loss-of-function, where a variant gene encodes a nonfunctional protein or no protein at all, so any reaction dependent on that protein fails (phenylketonuria, adrenoleukodystrophy); gain-of-function, where a variant gene has a novel function, so a reaction dependent on the protein is altered (constitutively active mutant forms of signaling proteins); and pathogen-mediated, where a pathogen introduces novel alien proteins into a human cell and those proteins mediate novel reactions with no normal human counterpart. In every case, though, the basic unit of disease annotation is a disease pathway containing one or more disease reactions involving abnormal proteins and possibly abnormal molecules of other sorts, e.g., lipopolysaccharide. If a disease reaction has a normal counterpart, that is noted. All loss-of-function annotations point to the reaction that would have happened if the normal protein had been available, for example. But I'm not sure how any of this is represented in the BioPAX export. Try getting a BioPAX download of an individual disease pathway and see what it contains.
I looked at one and basically none of that information comes through - just the reactions involved and their participants. Even the mutant gene in the one I looked at was not there. So if and when we want to go down this road, we will need to think through how to do it. Perhaps another case for working on a BioPAX level 4...
Grasping at another straw here, do the modified-residue attributes of protein (entity with accessioned sequence) instances come through, specifically ones of the genetically modified residue subclass? That's how we annotate the sequence variants that differentiate a mutant disease protein from its canonical UniProt normal counterpart. Any protein with a non-null genetically modified residue attribute is a disease protein, and any reaction involving that protein is a disease reaction.
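The rule described above (a protein with any genetically modified residue is a disease protein, and any reaction involving it is a disease reaction) could be sketched roughly as follows. The `Protein` and `Reaction` classes, their field names, and the example values are hypothetical stand-ins, not the Reactome data model or the BioPAX API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Protein:
    name: str
    # Hypothetical stand-in for the "genetically modified residue" attribute;
    # an empty list plays the role of a null attribute.
    genetically_modified_residues: List[str] = field(default_factory=list)


@dataclass
class Reaction:
    reactome_id: str
    participants: List[Protein]


def is_disease_protein(protein: Protein) -> bool:
    """A protein counts as a disease protein if the attribute is non-empty."""
    return bool(protein.genetically_modified_residues)


def is_disease_reaction(reaction: Reaction) -> bool:
    """A reaction is a disease reaction if any participant is a disease protein."""
    return any(is_disease_protein(p) for p in reaction.participants)


# Illustrative values only; the variant description and IDs are made up.
normal = Protein("ABCD1")
mutant = Protein("ABCD1 variant", ["placeholder sequence variant"])
normal_reaction = Reaction("R-HSA-EXAMPLE1", [normal])
disease_reaction = Reaction("R-HSA-EXAMPLE2", [normal, mutant])
```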
We do get BioPAX "ModificationFeature" annotations on the mutants. These are linked to a SequenceSite and a SequenceModificationVocabulary annotation (e.g. L-arginine removal), which in turn is xrefed to something with db MOD and an id like MOD:01632. Getting to this information is possible but a bit complex. My impression is that it would be easier and perhaps more consistent if we just use the disease subtree to filter these out for now.
The disease subtree should be equally reliable.
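For what it's worth, the subtree-based filter could look something like the sketch below: collect everything under the Disease root of the pathway hierarchy and drop those pathways before conversion. The hierarchy here is a toy parent-to-children mapping with placeholder IDs (only R-HSA-5684045 comes from this thread), and the function names are hypothetical; this is not the converter's actual code.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def subtree(children: Dict[str, List[str]], root: str) -> Set[str]:
    """Return the root plus all of its descendants (breadth-first walk)."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:  # guards against pathways listed twice
                seen.add(child)
                queue.append(child)
    return seen


def drop_disease_pathways(
    pathway_ids: Iterable[str],
    children: Dict[str, List[str]],
    disease_root: str,
) -> List[str]:
    """Keep only pathways that are not in the disease subtree."""
    excluded = subtree(children, disease_root)
    return [p for p in pathway_ids if p not in excluded]


# Toy hierarchy; DISEASE_ROOT, DISEASE_SUB and NORMAL_* are placeholders.
toy_children = {
    "DISEASE_ROOT": ["DISEASE_SUB"],
    "DISEASE_SUB": ["R-HSA-5684045"],
    "NORMAL_ROOT": ["NORMAL_SUB"],
}
all_pathways = ["NORMAL_ROOT", "NORMAL_SUB", "DISEASE_SUB", "R-HSA-5684045"]
kept = drop_disease_pathways(all_pathways, toy_children, "DISEASE_ROOT")
```

Filtering on the hierarchy keeps the check independent of whatever does or does not come through in the BioPAX export for individual reactions.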
Show what GO classes can be inferred - show where they match existing annotations, where they differ, and where they differ if they are 'deeper'.
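One possible way to sort inferred classes into the three buckets above, sketched under the assumption that the ontology is available as a simple child-to-parents mapping. The term IDs are placeholders, and the bucket names ("match", "deeper", "different") and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return all ancestors of a term by walking the child-to-parents mapping."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def compare(inferred: str, existing: Iterable[str], parents: Dict[str, List[str]]) -> str:
    """Classify an inferred class against the existing annotations.

    'match'     : the inferred class is already annotated
    'deeper'    : the inferred class is a descendant of an existing annotation
    'different' : neither of the above
    """
    existing_set = set(existing)
    if inferred in existing_set:
        return "match"
    if ancestors(inferred, parents) & existing_set:
        return "deeper"
    return "different"


# Toy ontology with placeholder IDs: CHILD is_a MID is_a TOP; OTHER is unrelated.
toy_parents = {"GO:CHILD": ["GO:MID"], "GO:MID": ["GO:TOP"]}
```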