GO_Sim_Util is a software tool designed to calculate the similarity between terms within the Gene Ontology. Developed as part of a computer engineering and informatics diploma project at the University of Patras, this package employs various established methods to analyze the Gene Ontology and quantify term relatedness.

alex-d4v/GO_Sim_Util

A. General

This is a package for comparing Gene Ontology (GO) terms by semantic similarity. It implements multiple methods drawn from different categories of the GO term similarity literature.

B. Set Up

1. Java - 11

  • Install : sudo apt-get install openjdk-11-jdk
  • Set JAVA_HOME : export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

2. OWL Tools :

  1. Get from GitHub : git clone https://github.com/owlcollab/owltools.git
  2. Change into the OWLTools-Parent directory.
  3. Install using Maven : run mvn clean install
  4. Add owltools to path : export PATH=$PATH:directory/to/where/you/downloaded/owltools/OWLTools-Runner/target

3. Python3 Modules

  • ontobio : pip install ontobio
  • networkx : pip install networkx[default,extra]
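
Once the steps above are done, a quick check can confirm that everything is reachable. The following is only a minimal sketch and not part of the package: `check_setup` is a hypothetical helper, and it assumes the OWLTools launcher found on the PATH is named `owltools`.

```python
import importlib.util
import shutil


def check_setup(modules=("ontobio", "networkx"), tools=("java", "owltools")):
    """Return the names of Python modules and executables that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when an executable is not on the PATH.
    # Assumption: the OWLTools launcher is called "owltools".
    missing += [t for t in tools if shutil.which(t) is None]
    return missing


if __name__ == "__main__":
    print("missing:", check_setup() or "nothing")
```

An empty list means every module and tool was found.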

C. Methods

Edge Based Methods

1. Simple Edge Counting Method

1.1 Between Term Similarity

  • Find the minimum-length path between the two terms.
  • Each transition (edge) costs 1, so the distance is the number of edges on that path.

1.2 Between Entities

  • For each pair of terms, one from each entity, find the minimum path between them.
  • Sum all the distances and divide by the product of the number of terms in each set.
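
The two steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the package's own code: the toy ontology `PARENTS` and the function names are hypothetical, edges are assumed to be traversable in both directions, and plain dictionaries are used so the sketch runs without extra dependencies.

```python
from collections import deque
from itertools import product

# Hypothetical toy ontology: each term maps to its parents.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["A"],
    "E": ["B"],
}


def undirected_neighbours(parents):
    """Turn the child -> parents map into an undirected adjacency map."""
    adj = {term: set() for term in parents}
    for child, term_parents in parents.items():
        for parent in term_parents:
            adj[child].add(parent)
            adj[parent].add(child)
    return adj


def term_distance(adj, t1, t2):
    """Number of edges on the shortest path between t1 and t2 (BFS, cost 1 per edge)."""
    seen, queue = {t1}, deque([(t1, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == t2:
            return dist
        for nxt in adj[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    raise ValueError(f"no path between {t1} and {t2}")


def entity_distance(adj, terms_a, terms_b):
    """Sum of all pairwise term distances divided by |terms_a| * |terms_b|."""
    total = sum(term_distance(adj, a, b) for a, b in product(terms_a, terms_b))
    return total / (len(terms_a) * len(terms_b))


ADJ = undirected_neighbours(PARENTS)
print(term_distance(ADJ, "C", "E"))             # C-A-root-B-E -> 4
print(entity_distance(ADJ, ["C"], ["D", "E"]))  # (2 + 4) / (1 * 2) -> 3.0
```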

2. Weight Edges using the maximum distance from LCA to the root

2.1 Between Term Similarity

  • Find the lowest common ancestor (LCA) of the two terms.
  • Find the maximum distance of the LCA from the root.
  • Use the formula from :

$$ W_{path(t_1,t_2)} = \sum_{i=0}^{n}{(w)^i} $$

  • w is the weight hyperparameter (0.815).
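
A small sketch of how the sum can be evaluated. Several things here are assumptions rather than facts from this README: the upper limit n is taken to be the maximum distance of the LCA from the root, the LCA is taken to be the deepest common ancestor, and the toy ontology `PARENTS` and all function names are hypothetical.

```python
# Hypothetical toy ontology: each term maps to its parents.
PARENTS = {
    "root": [],
    "A": ["root"],
    "C": ["A"],
    "D": ["A", "root"],
    "F": ["C", "D"],
}


def weighted_path_sum(n, w=0.815):
    """Evaluate sum_{i=0}^{n} w**i."""
    return sum(w ** i for i in range(n + 1))


def max_depth(parents, term):
    """Longest distance (in edges) from term up to a term without parents."""
    if not parents[term]:
        return 0
    return 1 + max(max_depth(parents, p) for p in parents[term])


def ancestors(parents, term):
    """All ancestors of term, including term itself."""
    found, stack = {term}, [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def deepest_common_ancestor(parents, t1, t2):
    """Common ancestor with the largest maximum depth (assumed meaning of LCA)."""
    common = ancestors(parents, t1) & ancestors(parents, t2)
    # sorted() only makes the choice deterministic if two candidates tie.
    return max(sorted(common), key=lambda t: max_depth(parents, t))


lca = deepest_common_ancestor(PARENTS, "C", "D")  # "A"
n = max_depth(PARENTS, lca)                       # 1 (assumed meaning of n)
print(lca, n, weighted_path_sum(n))               # weight is 1 + 0.815
```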

2.2 Between Entities

3. Semantic Value Method

3.1 Between Term Similarity

  • Construct the subgraph for each term $t$ from all of its ancestors $N_t$ in the ontology graph, where $E_t$ is the set of edges connecting them.

$$ G_t = (t, N_t , E_t) $$

  • Calculate the s-value of every term $t_i$ in each subgraph using the formula :

$$ S_{t}(t_i) = \begin{cases} 1 & \text{if } t_i = t \\ \max\{ w_e \cdot S_{t}(t_i') \mid t_i' \in childrenof(t_i) \cap N_t \} & \text{if } t_i \neq t \end{cases} $$

  • Calculate the term similarity using the formulas :

$$ S_{GO}(t_1, t_2) = \frac{\sum_{t_i \in N_{t_1}\cap N_{t_2}}{\left(S_{t_1}(t_i) + S_{t_2}(t_i)\right)}}{SV(t_1) + SV(t_2)} $$

$$ SV(t) = \sum_{t_i \in N_t}{S_t(t_i)} $$
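
The formulas above can be sketched as follows. This is an illustration, not the package's implementation: it assumes that $N_t$ is taken to include $t$ itself, and it uses a single edge weight `w_e = 0.8` for every edge, a value chosen only for the example. The toy ontology `PARENTS` and the function names are hypothetical.

```python
# Hypothetical toy ontology: each term maps to its parents.
PARENTS = {
    "root": [],
    "A": ["root"],
    "C": ["A"],
    "D": ["A"],
}


def s_values(parents, term, w_e=0.8):
    """S-values of term and all of its ancestors, relative to term."""
    s = {term: 1.0}
    stack = [term]
    while stack:
        child = stack.pop()
        for parent in parents[child]:
            candidate = w_e * s[child]
            # Keep the maximum over all children; re-visit the parent
            # whenever its value improves so the change propagates upwards.
            if candidate > s.get(parent, 0.0):
                s[parent] = candidate
                stack.append(parent)
    return s


def semantic_value(s):
    """SV(t): sum of the s-values of all terms in the subgraph of t."""
    return sum(s.values())


def semantic_similarity(parents, t1, t2, w_e=0.8):
    """S_GO(t1, t2) as defined by the formulas above."""
    s1 = s_values(parents, t1, w_e)
    s2 = s_values(parents, t2, w_e)
    shared = s1.keys() & s2.keys()
    numerator = sum(s1[t] + s2[t] for t in shared)
    return numerator / (semantic_value(s1) + semantic_value(s2))


print(s_values(PARENTS, "C"))            # C -> 1.0, A -> 0.8, root -> about 0.64
print(semantic_similarity(PARENTS, "C", "D"))
```

In this toy example C and D share the ancestors A and root, so the similarity is (0.8 + 0.8 + 0.64 + 0.64) / (2.44 + 2.44), roughly 0.59, while a term compared with itself scores 1.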

4. Shortest Semantic Differentiation Distance - SSDD

4.1 Between Term Similarity


Information Content Methods

1. Resnik Semantic Similarity

2. Jiang & Conrath Semantic Similarity

3. Lin Semantic Similarity

4. Relevance Semantic Similarity

5. SimIC


Hybrid Methods

1. Integrated Semantic Similarity

2. Hybrid Relative Specificity Similarity

2.1 Between Term Similarity

  • Find the Most Informative Common Ancestor (MICA).
  • Calculate its Information Content distance from the root.

$$ \alpha_{IC} = -\ln{(p_{IC}(MICA))} $$

  • Find the Most Informative Leaf (MIL) for each term.
  • Calculate the average Information Content distance between each term and its MIL.

$$ \beta_{IC} = \frac{-\ln{(p_{IC}(t_1))} + \ln{(p_{IC}(MIL_1))} - \ln{(p_{IC}(t_2))} + \ln{(p_{IC}(MIL_2))} }{2} $$

  • Find the minimum path from each term to the MICA.
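
The MICA, MIL, $\alpha_{IC}$ and $\beta_{IC}$ steps can be sketched as below; the minimum-path step is not covered. Everything in the example is assumed for illustration: the toy ontology `PARENTS`, the term probabilities in `PROB`, and the function names are hypothetical, and the MIL is taken to be the most informative leaf among a term's descendants (or the term itself if it is a leaf). Both quantities are evaluated exactly as the formulas are written above.

```python
import math

# Hypothetical toy ontology (term -> parents) and term probabilities p_IC.
PARENTS = {"root": [], "A": ["root"], "C": ["A"], "D": ["A"], "F": ["C"]}
PROB = {"root": 1.0, "A": 0.5, "C": 0.1, "D": 0.2, "F": 0.05}


def ancestors(parents, term):
    """All ancestors of term, including term itself."""
    found, stack = {term}, [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def descendants(parents, term):
    """term plus every term that has it as an ancestor."""
    return {t for t in parents if term in ancestors(parents, t)}


def mica(parents, prob, t1, t2):
    """Common ancestor with the lowest probability, i.e. the highest IC."""
    common = ancestors(parents, t1) & ancestors(parents, t2)
    return min(sorted(common), key=lambda t: prob[t])


def most_informative_leaf(parents, prob, term):
    """Leaf below term (or term itself) with the lowest probability."""
    has_children = {p for term_parents in parents.values() for p in term_parents}
    leaves = [t for t in sorted(descendants(parents, term)) if t not in has_children]
    return min(leaves, key=lambda t: prob[t])


def alpha_ic(prob, mica_term):
    return -math.log(prob[mica_term])


def beta_ic(prob, t1, mil1, t2, mil2):
    # Written term by term as in the formula above; with this sign convention
    # the value is <= 0 whenever each MIL is at least as informative as its term.
    return (-math.log(prob[t1]) + math.log(prob[mil1])
            - math.log(prob[t2]) + math.log(prob[mil2])) / 2


m = mica(PARENTS, PROB, "C", "D")                 # "A"
mil_c = most_informative_leaf(PARENTS, PROB, "C")  # "F"
mil_d = most_informative_leaf(PARENTS, PROB, "D")  # "D"
print(m, alpha_ic(PROB, m), beta_ic(PROB, "C", mil_c, "D", mil_d))
```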

Graph Embedding Methods

1. go2vec
