Non-exact equivalences #159
Comments
Interesting. In the case of InterPro mappings, most (all?) of them were automatically generated using the script famplex/import/interpro_mappings.py (line 89 in ca2f885). When the mappings were added, we used the default Jaccard index threshold of 1.
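For reference, here is a minimal sketch of what a Jaccard index threshold of 1 implies. It assumes the comparison is between the set of member genes of a FamPlex family and the set of proteins annotated to an InterPro entry. That is an assumption about the script, which is not reproduced here, and the gene sets below are illustrative only.

```python
def jaccard_index(a, b):
    """Return |a & b| / |a | b| for two collections, or 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Illustrative sets only; not taken from FamPlex or InterPro.
family_members = {"SHH", "IHH", "DHH"}
same_proteins = {"SHH", "IHH", "DHH"}           # entry covering exactly the same proteins
broader_entry = {"SHH", "IHH", "DHH", "OTHER"}  # entry that also covers another protein

exact_score = jaccard_index(family_members, same_proteins)
broader_score = jaccard_index(family_members, broader_entry)

THRESHOLD = 1.0
passes_exact = exact_score >= THRESHOLD
passes_broader = broader_score >= THRESHOLD
```

With a threshold of 1, any entry annotated to exactly the same proteins passes, whether it describes the family itself or a domain those proteins happen to share.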
I see. That makes sense insofar as the same set of Hedgehog proteins could all have a "Hint domain C-terminal"; still, semantically, that probably shouldn't be curated as an equivalence. What if we differentiated family and domain entries in InterPro and only added family equivalences?
That would definitely make sense if we could get that information systematically.
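One possible way to get the entry type systematically, sketched under assumptions: suppose a tab-separated listing of InterPro entries is available with an accession, a type, and a name per row. The column names `ENTRY_AC`, `ENTRY_TYPE`, and `ENTRY_NAME`, the helper names, and the types given to the three entries in the sample are all assumptions for illustration and should be checked against InterPro itself.

```python
import csv
import io

# Inline sample standing in for an InterPro entry listing (assumed layout).
SAMPLE_ENTRY_LIST = (
    "ENTRY_AC\tENTRY_TYPE\tENTRY_NAME\n"
    "IPR001657\tFamily\tHedgehog protein\n"
    "IPR001767\tDomain\tHedgehog protein, Hint domain\n"
    "IPR003586\tDomain\tHint domain C-terminal\n"
)


def load_entry_types(handle):
    """Map each InterPro accession to its entry type."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["ENTRY_AC"]: row["ENTRY_TYPE"] for row in reader}


def keep_family_equivalences(candidates, entry_types):
    """Keep only (interpro_id, fplx_id) pairs whose InterPro entry is a Family."""
    return [
        (ip_id, fplx_id)
        for ip_id, fplx_id in candidates
        if entry_types.get(ip_id) == "Family"
    ]


entry_types = load_entry_types(io.StringIO(SAMPLE_ENTRY_LIST))
candidates = [
    ("IPR001657", "Hedgehog"),
    ("IPR001767", "Hedgehog"),
    ("IPR003586", "Hedgehog"),
]
family_only = keep_family_equivalences(candidates, entry_types)
```

Entries whose type is unknown are dropped rather than kept, which errs on the side of fewer but safer equivalences.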
I found that there is a large number of equivalences in equivalences.csv that are not exact matches, for instance, in the case of InterPro mappings. As an example, take FPLX:Hedgehog, which is mapped to 6 different InterPro entries. One that looks exact is https://www.ebi.ac.uk/interpro/entry/InterPro/IPR001657/ (Hedgehog protein), but the others include, e.g., https://www.ebi.ac.uk/interpro/entry/InterPro/IPR001767/ (Hedgehog protein, Hint domain) and https://www.ebi.ac.uk/interpro/entry/InterPro/IPR003586/ (Hint domain C-terminal), which I don't think should be considered equivalences. I suspect that these might have been added with the goal of adding as many IP->FPLX mappings as possible from sources that produce various groundings in InterPro. Still, they are misleading if interpreted in the opposite direction.
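A rough sketch of how such cases could be listed: group the InterPro rows of equivalences.csv by FamPlex ID and report IDs with more than one InterPro entry. It assumes each row has three columns (namespace, external ID, FamPlex ID) with no header and that InterPro rows use the namespace `IP`. Both are assumptions, and the inline sample, including the `IPR000000` row, is illustrative rather than copied from the file.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows in the assumed layout: namespace, external ID, FamPlex ID.
SAMPLE_EQUIVALENCES = (
    "IP,IPR001657,Hedgehog\n"
    "IP,IPR001767,Hedgehog\n"
    "IP,IPR003586,Hedgehog\n"
    "IP,IPR000000,ExampleFamily\n"  # hypothetical row with a single mapping
)


def one_to_many(handle, namespace="IP"):
    """Return {fplx_id: sorted external IDs} for FamPlex IDs with >1 mapping."""
    grouped = defaultdict(set)
    for row in csv.reader(handle):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        db_ns, db_id, fplx_id = row
        if db_ns == namespace:
            grouped[fplx_id].add(db_id)
    return {
        fplx_id: sorted(ids)
        for fplx_id, ids in grouped.items()
        if len(ids) > 1
    }


ambiguous = one_to_many(io.StringIO(SAMPLE_EQUIVALENCES))
```

To run this against the real file, the `io.StringIO` sample would be replaced with an open handle on equivalences.csv; each reported FamPlex ID is then a candidate for manual review.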