generalize physical entity classification #110

Open
goodb opened this issue Oct 14, 2020 · 8 comments

@goodb
Contributor

goodb commented Oct 14, 2020

As of today, the conversion assumes that physical entities have specific, automatically generated classes associated with them as a precondition of building the GO-CAMs. For Reactome, this is the REACTO ontology.

For input resources that a) use gene identifiers present in the neo.owl ontology (coming from the GO Central GPI file), b) do not use constructs like Sets, and c) do not rely heavily on complexes, it would be useful to have the converter use the neo IRIs for the physical entity classes.

If the time comes to make this work, look for the currently hard-coded uses of 'GoCAM.reacto_base_iri' in the main BioPax2GO class to get started.
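
A minimal Python sketch of the classification rule described above, assuming the converter has a precomputed set of gene-product IDs present in neo.owl and knows whether an entity is a Set or a complex. The base IRIs and the function name are hypothetical placeholders, not the converter's actual code; the REACTO branch stands in for the hard-coded GoCAM.reacto_base_iri.

```python
from typing import AbstractSet, Optional

# Hypothetical placeholder prefixes; the real neo and REACTO IRIs may differ.
NEO_BASE_IRI = "http://example.org/neo/"
REACTO_BASE_IRI = "http://example.org/reacto/"  # stands in for GoCAM.reacto_base_iri


def physical_entity_class_iri(
    gene_curie: Optional[str],
    reacto_id: str,
    neo_ids: AbstractSet[str],
    is_set: bool = False,
    is_complex: bool = False,
) -> str:
    """Prefer a neo class for simple gene products known to neo.

    Sets and complexes, and entities whose gene ID is missing or not in neo,
    fall back to a generated REACTO class, as the converter does today.
    """
    if gene_curie is not None and not (is_set or is_complex) and gene_curie in neo_ids:
        return NEO_BASE_IRI + gene_curie
    return REACTO_BASE_IRI + reacto_id
```

For example, physical_entity_class_iri("EX:0001", "R-EX-1", {"EX:0001"}) yields a neo IRI, while the same call with is_complex=True falls back to the REACTO IRI. All IDs here are made-up placeholders.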

@ukemi

ukemi commented Jan 16, 2025

@dustine32

ukemi moved this from "Software In progress" to "Tabled for now" in Reactome2GO: Pipeline for updating Reactome models on Jan 22, 2025
@dustine32
Collaborator

I believe this ticket is basically "if a Reactome ID can be easily mapped to a MOD ID (or a UniProtKB ID in the human case), then use it rather than creating a REACTO class". This is essentially what #326 is for in the human case. For other non-Reactome sources like YeastPathways, there are extra steps (such as mapping a BioPAX ID, e.g., YDR321W-MONOMER, to an SGD ID via the xref field of the SGD GPI) that we may or may not be able to apply consistently to all other sources.

The "generalize" part of the title may allude to centralizing all of this ID conversion code in one place, which I think the methods getPhysicalEntityIRI and getEntityReferenceId attempt to do. However, these two methods are called interchangeably all over the codebase, and it will take a not insignificant amount of work to refactor all of these call sites to use a single method without changing conversion behavior.

In short, I think this ticket (if left open) should just cover the larger generalization refactor, and other more specific requirements (e.g., "FlyBase IDs should be converted like this") should be addressed in separate tickets (e.g., #326).
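
One way the "single method" refactor could be shaped, sketched in Python under assumptions: every call site goes through one entry point, each source-specific rule (the kind of thing tracked in tickets like #326) is a separate function that returns a CURIE or None, and anything unresolved falls back to minting a REACTO class. All names below are hypothetical and do not correspond to getPhysicalEntityIRI or getEntityReferenceId; the yeast rule and its mapping contents are illustrative placeholders, not real SGD data.

```python
from typing import Callable, Iterable, Optional

# A resolver takes a BioPAX entity ID and returns a CURIE, or None when the
# rule does not apply to that ID.
Resolver = Callable[[str], Optional[str]]


def resolve_entity_id(
    biopax_id: str,
    resolvers: Iterable[Resolver],
    fallback: Callable[[str], str],
) -> str:
    """Single entry point: the first resolver that returns a CURIE wins."""
    for resolver in resolvers:
        curie = resolver(biopax_id)
        if curie is not None:
            return curie
    return fallback(biopax_id)


# Hypothetical source-specific rule: strip a "-MONOMER" suffix and look up
# the remainder in a mapping that would be built from a GPI file.
SGD_BY_SYSTEMATIC_NAME = {"YDR321W": "SGD:EXAMPLE"}  # placeholder contents
MONOMER_SUFFIX = "-MONOMER"


def yeast_monomer_rule(biopax_id: str) -> Optional[str]:
    if biopax_id.endswith(MONOMER_SUFFIX):
        return SGD_BY_SYSTEMATIC_NAME.get(biopax_id[: -len(MONOMER_SUFFIX)])
    return None


def reacto_fallback(biopax_id: str) -> str:
    # Placeholder for minting a REACTO class; the prefix is illustrative.
    return "REACTO:" + biopax_id
```

With this shape, resolve_entity_id("YDR321W-MONOMER", [yeast_monomer_rule], reacto_fallback) returns "SGD:EXAMPLE", and an ID no rule recognizes gets a REACTO class. Adding support for a new source then means adding a resolver, without touching the call sites.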

@ukemi

ukemi commented Jan 27, 2025

@dustine32 For some reason I thought we had looked at the Reactome BioPAX and, for things like the mouse projections, the MGI identifiers were included there. So it might be easier?

@dustine32
Collaborator

@ukemi Whether or not fetching MGI IDs is easy, I just think the work to do it should be tracked in a ticket more specific than this one, since I assume (I know this means I'm wrong!) that the method/code may be different for other organisms or sources. Right now, this ticket (#110) seems un-closable to me until we account for all the different cases. What do you think?

Actually, checking now, I don't see any MGI IDs in the Mus_musculus.owl BioPAX from the Reactome site. I do see MGI IDs xref'd on the Reactome website for proteins like Gnpda1, but these are not exported in the BioPAX.

@ukemi

ukemi commented Jan 30, 2025

Bummer. So maybe the best thing to do would be to use the UniProt identifiers and the mouse GPI file for mapping, since those are supposed to be the 'official' cross-references. Perhaps this approach can be used for everyone that provides a GPI file. But yes, I agree about that ticket. In fact, since the GPI file is supposed to have annotatable objects in column 1, there is no need to search for some kind of string match: whatever is in column 1 for the UniProt xref goes into the model. We should double-check that this would work for fly (@sjm41 @rozaru) and worm (@vanaukenk).

@deustp01
Copy link
Collaborator

> I do see MGI IDs xref'd on the Reactome website for proteins like Gnpda1 but these are not exported in the BioPAX.

These xrefs are not in the database itself (gk_central) but are created on the fly as part of our release process. I expect we do not want them in our data structure, because then we would need to maintain them to keep them current with changes in MGI, etc. Possible workarounds may be something to discuss with Adam Wright (sorry, I don't know his GitHub name). Even without Adam, could this be an agenda item for weeds today?

@ukemi

ukemi commented Jan 30, 2025

I think it would be best to use the GPI, which is maintained by MGI, as it should be.

@ukemi

ukemi commented Jan 30, 2025

But thinking more about this, I think my column 1 suggestion is too simplistic, because column 1 also contains PRO identifiers that carry UniProt xrefs. We would want to use the MGI genes, I would think.
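
A minimal Python sketch of the GPI-based mapping discussed above, assuming the GPI 1.2 tab-separated layout (DB in column 1, DB_Object_ID in column 2, pipe-separated DB_Xref(s) in column 9; header lines start with "!"). To address the PRO concern, it keeps only rows whose DB is the MOD itself, so PR rows that also carry UniProtKB xrefs are skipped. The function name is hypothetical, and a real version would need to confirm which GPI version each MOD publishes, since other versions may lay out columns differently.

```python
from typing import Dict, Iterable


def uniprot_to_mod_gene(gpi_lines: Iterable[str], mod_db: str) -> Dict[str, str]:
    """Map UniProtKB CURIEs to MOD gene CURIEs from GPI 1.2-style rows."""
    mapping: Dict[str, str] = {}
    for line in gpi_lines:
        # Skip blank lines and "!" header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Keep only rows for the MOD's own objects; this drops PR rows.
        if len(cols) < 9 or cols[0] != mod_db:
            continue
        # Some MODs (e.g. MGI) already include the prefix in DB_Object_ID,
        # so avoid producing a doubled "MGI:MGI:" CURIE.
        obj_id = cols[1]
        gene_curie = obj_id if obj_id.startswith(mod_db + ":") else f"{mod_db}:{obj_id}"
        for xref in cols[8].split("|"):
            if xref.startswith("UniProtKB:"):
                # If one accession appears on several gene rows, keep the first.
                mapping.setdefault(xref, gene_curie)
    return mapping
```

The choice to filter on the DB column rather than match strings follows the thread's point that GPI rows are already the annotatable objects; the only judgment left is which rows (genes vs. PRO protein forms) to accept.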
