Develop #311

Merged
merged 26 commits into from
Nov 24, 2023
8517438
Setup.py added
Demirrr Oct 30, 2023
1b19f6b
__repr__ included for EvoLearnerNode to show list of respective objects
Demirrr Oct 30, 2023
a2bf10c
Readme updated via examples and DRILL refactored
Demirrr Oct 30, 2023
4e481dc
gradio included
Demirrr Oct 30, 2023
641b7b6
Text reduced
Demirrr Oct 30, 2023
2b9df00
Merge branch 'develop' into DRILL
Demirrr Oct 30, 2023
04aeacd
Merge pull request #307 from dice-group/DRILL
Demirrr Oct 30, 2023
b4e0182
Added more support for triplestore calls
alkidbaci Nov 2, 2023
99e0a3b
Initialize KB with triplestore address only
alkidbaci Nov 2, 2023
fbaa456
Fixed query when owl:Thing used in UNION, Intersect or as filler
alkidbaci Nov 2, 2023
e32494a
Changed example to run CELOE instead
alkidbaci Nov 2, 2023
dc0972b
Updated converter test to recent changes
alkidbaci Nov 2, 2023
245161d
Refactoring
alkidbaci Nov 3, 2023
af61c2c
Updated documentation
alkidbaci Nov 3, 2023
6bdede0
Refactoring KB
alkidbaci Nov 3, 2023
f7f9e60
Merge pull request #308 from dice-group/retrieval_via_triplestore
Demirrr Nov 7, 2023
34de594
Added 'requests' package
alkidbaci Nov 9, 2023
dd79bcc
Removed 'use_triplestore' argument
alkidbaci Nov 9, 2023
934b725
Fixed an issue
alkidbaci Nov 9, 2023
7235127
Added `verbalize` method for CL models. #292
alkidbaci Nov 21, 2023
cfc96f6
Refactoring: owlapy is now a dependency
alkidbaci Nov 22, 2023
e51e435
Documentation update
alkidbaci Nov 22, 2023
17921fb
Removed unused files
alkidbaci Nov 23, 2023
5133ca2
Changed description for CL execution
alkidbaci Nov 23, 2023
e3379ea
Added verbalization example
alkidbaci Nov 23, 2023
21b818d
Merge pull request #310 from dice-group/general_adjustments
Demirrr Nov 24, 2023
207 changes: 39 additions & 168 deletions README.md
@@ -1,217 +1,88 @@
# Ontolearn

*Ontolearn* is an open-source software library for description logic learning problems.
Find more in the [Documentation](https://ontolearn-docs-dice-group.netlify.app/usage/01_introduction).

Learning algorithms:
- **Drill** → [Neuro-Symbolic Class Expression Learning](https://www.ijcai.org/proceedings/2023/0403.pdf)
- **EvoLearner** → [EvoLearner: Learning Description Logics with Evolutionary Algorithms](https://dl.acm.org/doi/abs/10.1145/3485447.3511925)
- **NCES2** → (soon) [Neural Class Expression Synthesis in ALCHIQ(D)](https://papers.dice-research.org/2023/ECML_NCES2/NCES2_public.pdf)
- **NCES** → [Neural Class Expression Synthesis](https://link.springer.com/chapter/10.1007/978-3-031-33455-9_13)
- **NERO** → [Learning Permutation-Invariant Embeddings for Description Logic Concepts](https://link.springer.com/chapter/10.1007/978-3-031-30047-9_9)
- **NERO** → (soon) [Learning Permutation-Invariant Embeddings for Description Logic Concepts](https://link.springer.com/chapter/10.1007/978-3-031-30047-9_9)
- **CLIP** → (soon) [Learning Concept Lengths Accelerates Concept Learning in ALC](https://link.springer.com/chapter/10.1007/978-3-031-06981-9_14)
- **CELOE** → [Class Expression Learning for Ontology Engineering](https://www.sciencedirect.com/science/article/abs/pii/S1570826811000023)
- **OCEL** → A limited version of CELOE

You can find more details about these algorithms and what *Ontolearn* has to offer in the [documentation](https://ontolearn-docs-dice-group.netlify.app/index.html).

Quick navigation:
- [Installation](#installation)
- [Quick try-out](#quick-try-out)
- [Usage](#usage)
- [Relevant Papers](#relevant-papers)

## Installation
For detailed instructions please refer to the [installation guide](https://ontolearn-docs-dice-group.netlify.app/usage/installation.html) in the documentation.

### Installation from source

Make sure to set up a virtual python environment like [Anaconda](https://www.anaconda.com/) before continuing with the installation.

To successfully pass all the tests you need to download some external resources in advance
(see [_Download external files_](#download-external-files)). You will need
at least to download the datasets. Also, install _java_ and _curl_ if you don't have them in your system already:

```commandline
sudo apt install openjdk-11-jdk
sudo apt install curl
```

A quick start up will be as follows:

```shell
git clone https://github.com/dice-group/Ontolearn.git && conda create --name onto python=3.8 && conda activate onto
pip3 install -r requirements.txt && python -c "import ontolearn"
# wget https://files.dice-research.org/projects/Ontolearn/KGs.zip -O ./KGs.zip && unzip KGs.zip
# python -m pytest tests # Partial test with pytest
pip install ontolearn
```
or
```shell
pip install ontolearn # more on https://pypi.org/project/ontolearn/
```

## Quick try-out

You can execute the script `deploy_cl.py` to deploy the concept learners in a local web server and try
the algorithms using an interactive interface made possible by [Gradio](https://www.gradio.app/). Currently,
you can only deploy the following concept learners: **NCES**, **EvoLearner**, **CELOE** and **OCEL**.

> **NOTE: In case you don't have a dataset, don't worry, you can use
> the datasets we store in our data server. See _[Download external files](#download-external-files)_.**

For example the command below will launch an interface using **EvoLearner** as the model on
the **Family** dataset which is a simple dataset with 202 individuals:

```shell
python deploy_cl.py --model evolearner --path_knowledge_base KGs/Family/family-benchmark_rich_background.owl
git clone https://github.com/dice-group/Ontolearn.git && conda create --name onto python=3.8 && conda activate onto
pip3 install -r requirements.txt && python -c "import ontolearn"
wget https://files.dice-research.org/projects/Ontolearn/KGs.zip -O ./KGs.zip && unzip KGs.zip
python -m pytest tests # Partial test with pytest
```

Once you run this command, a local URL where the model is deployed will be provided to you.


In the interface you need to enter the positive and the negative examples. For a quick run you can
click on the **Random Examples** checkbox, but you may as well enter some real examples for
the learning problem of **Aunt**, **Brother**, **Cousin**, etc. which
you can find in the file `examples/synthetic_problems.json`. Just copy and paste the IRIs of
positive and negative examples for a specific learning problem directly
in their respective fields.

Run the help command to see a description of the script's usage:

```shell
python deploy_cl.py --help
```

## Usage

In the [examples](https://github.com/dice-group/Ontolearn/tree/develop/examples) folder, you can find examples on how to use
the learning algorithms. Also in the [tests](https://github.com/dice-group/Ontolearn/tree/develop/tests) folder you can find the test cases.

For more detailed instructions we suggest to follow the [guides](https://ontolearn-docs-dice-group.netlify.app/usage/06_concept_learners) in the documentation.

Below we give a simple example on using CELOE to learn class expressions for a small dataset.

```python
from ontolearn.concept_learner import CELOE
from ontolearn.model_adapter import ModelAdapter
from ontolearn.owlapy.model import OWLNamedIndividual, IRI
from ontolearn.owlapy.namespaces import Namespaces
from ontolearn.owlapy.render import DLSyntaxObjectRenderer
from ontolearn.owlapy.owlready2.complex_ce_instances import OWLReasoner_Owlready2_ComplexCEInstances

NS = Namespaces('ex', 'http://example.com/father#')

# Defining the learning problem
positive_examples = {OWLNamedIndividual(IRI.create(NS, 'stefan')),
OWLNamedIndividual(IRI.create(NS, 'markus')),
OWLNamedIndividual(IRI.create(NS, 'martin'))}
negative_examples = {OWLNamedIndividual(IRI.create(NS, 'heinz')),
OWLNamedIndividual(IRI.create(NS, 'anna')),
OWLNamedIndividual(IRI.create(NS, 'michelle'))}

# Create a model of CELOE using ModelAdapter
# Only the class of the learning algorithm is specified
model = ModelAdapter(learner_type=CELOE,
reasoner_type=OWLReasoner_Owlready2_ComplexCEInstances,
path="KGs/father.owl")

# Fit the learning problem to the model
model.fit(pos=positive_examples,
neg=negative_examples)

# Used to render to description logics syntax
from ontolearn.knowledge_base import KnowledgeBase
from ontolearn.concept_learner import CELOE, OCEL, EvoLearner, Drill
from ontolearn.learning_problem import PosNegLPStandard
from ontolearn.metrics import F1
from owlapy.model import OWLNamedIndividual, IRI
from ontolearn.utils import setup_logging
from owlapy.render import DLSyntaxObjectRenderer
setup_logging()
renderer = DLSyntaxObjectRenderer()
max_runtime, topk=1, 3
kb = KnowledgeBase(path="../KGs/Family/family-benchmark_rich_background.owl")
lp = PosNegLPStandard(pos={OWLNamedIndividual(IRI.create(p)) for p in
{"http://www.benchmark.org/family#F10F175",
"http://www.benchmark.org/family#F10F177"}},
neg={OWLNamedIndividual(IRI.create("http://www.benchmark.org/family#F9M142"))})

# Print the rendered top best hypothesis
for desc in model.best_hypotheses(1):
print('The result:', renderer.render(desc.concept), 'has quality', desc.quality)
```
The goal in this example is to learn a class expression for the concept "father".
The output is as follows:
```
The result: (¬female) ⊓ (∃ hasChild.⊤) has quality 1.0
```
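
To illustrate what the reported quality means, here is a minimal sketch that assumes the F1 score is the quality function. It is not Ontolearn's implementation: `f1_quality` and the `covered` set are hypothetical names, and the covered individuals are an assumption about which examples the learned expression matches.

```python
def f1_quality(covered, pos, neg):
    """F1 score of a hypothesis, given the set of individuals it covers."""
    tp = len(covered & pos)  # positives the hypothesis covers
    fp = len(covered & neg)  # negatives the hypothesis wrongly covers
    fn = len(pos - covered)  # positives the hypothesis misses
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


pos = {"stefan", "markus", "martin"}
neg = {"heinz", "anna", "michelle"}
covered = {"stefan", "markus", "martin"}  # assumed instances of the expression
print(f1_quality(covered, pos, neg))  # 1.0
```

A hypothesis that covers every positive example and no negative example reaches a quality of 1.0. Missing a positive or covering a negative lowers the score.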

NCES is a powerful algorithm implemented recently.
For a quick start on how to use it, please refer to the notebook [simple usage NCES](examples/simple-usage-NCES.ipynb).

----------------------------------------------------------------------------

#### Download external files

Some resources like pre-calculated embeddings or `pre_trained_agents` and datasets (ontologies)
are not included in the repository directly. Use the command line command `wget`
to download them from our data server.

> **NOTE: Before you run these commands in your terminal, make sure you are
in the root directory of the project!**

To download the datasets:

```shell
wget https://files.dice-research.org/projects/Ontolearn/KGs.zip -O ./KGs.zip
```

Then depending on your operating system, use the appropriate command to unzip the files:

```shell
# Windows
tar -xf KGs.zip

# or
preds_evo = list(EvoLearner(knowledge_base=kb, quality_func=F1(), max_runtime=max_runtime).fit(lp).best_hypotheses(n=topk))
preds_celoe = list(CELOE(knowledge_base=kb, quality_func=F1(), max_runtime=max_runtime).fit(lp).best_hypotheses(n=topk))
preds_ocel = list(OCEL(knowledge_base=kb, quality_func=F1(), max_runtime=max_runtime).fit(lp).best_hypotheses(n=topk))
preds_drill = list(Drill(knowledge_base=kb, quality_func=F1(), max_runtime=max_runtime).fit(lp).best_hypotheses(n=topk))

# macOS and Linux
unzip KGs.zip
for i in range(3):
print(f"{i+1}.Pred:\n"
f"DRILL:{renderer.render(preds_drill[i].concept)}\n"
f"EvoLearner:{renderer.render(preds_evo[i].concept)}\n"
f"CELOE:{renderer.render(preds_celoe[i].concept)}\nOCEL:{renderer.render(preds_ocel[i].concept)}\n")
```

Finally, remove the _.zip_ file:
For more, please refer to the [examples](https://github.com/dice-group/Ontolearn/tree/develop/examples) folder.

```shell
rm KGs.zip
```

And for NCES data:
## Deployment

```shell
wget https://files.dice-research.org/projects/NCES/NCES_Ontolearn_Data/NCESData.zip -O ./NCESData.zip
unzip NCESData.zip
rm NCESData.zip
pip install gradio
```


----------------------------------------------------------------------------

#### Building (sdist and bdist_wheel)

You can use <code>tox</code> to build sdist and bdist_wheel packages for Ontolearn.
- "sdist" is short for "source distribution" and is useful for distribution of packages that will be installed from source.
- "bdist_wheel" is short for "built distribution wheel" and is useful for distributing packages that include large amounts of compiled code, as well as for distributing packages that have complex dependencies.

To build and compile the necessary components of Ontolearn, use:
To deploy **EvoLearner** on the **Family** knowledge graph, run the command below. Available models to deploy: **EvoLearner**, **NCES**, **CELOE** and **OCEL**.
```shell
tox -e build
```

To automatically build and test the documentation of Ontolearn, use:
```shell
tox -e docs
python deploy_cl.py --model evolearner --path_knowledge_base KGs/Family/family-benchmark_rich_background.owl
```
Run the help command to see a description of the script's usage:

----------------------------------------------------------------------------

#### Simple Linting

Using the following command will run the linting tool [flake8](https://flake8.pycqa.org/) on the source code.
```shell
flake8
python deploy_cl.py --help
```
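
Because `gradio` is installed separately here, a script that depends on it can fail early with an actionable message instead of a bare traceback. The following is a minimal sketch of that pattern, not the project's code: the helper name `require` is hypothetical, and a standard-library module stands in for `gradio` so the snippet runs without extra packages.

```python
import importlib


def require(module_name, install_hint):
    """Import an optional module, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        # Chain the original error so the underlying cause stays visible.
        raise ImportError(
            f"{module_name} not found! Install it with `{install_hint}`"
        ) from e


# A standard-library module stands in for gradio so the sketch runs anywhere.
json_module = require("json", "pip install <package>")
print(json_module.dumps({"ok": True}))
```

In a deployment script the call would name the optional package instead, for example `require("gradio", "pip install gradio")`. Chaining with `from e` keeps the original import failure in the traceback.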

----------------------------------------------------------------------------

### Citing
Currently, we are working on our manuscript describing our framework.
If you find our work useful in your research, please consider citing the respective paper:
```
# DRILL
@inproceedings{demir2023drill,
added-at = {2023-08-01T11:08:41.000+0200},
author = {Demir, Caglar and Ngomo, Axel-Cyrille Ngonga},
booktitle = {The 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023},
title = {Neuro-Symbolic Class Expression Learning},
23 changes: 14 additions & 9 deletions deploy_cl.py
@@ -1,6 +1,5 @@
import pandas as pd
import torch
import gradio as gr
from argparse import ArgumentParser
import random
import os
@@ -15,8 +14,14 @@
from ontolearn.learning_problem import PosNegLPStandard
from ontolearn.refinement_operators import ModifiedCELOERefinement
from ontolearn.value_splitter import EntropyValueSplitter, BinningValueSplitter
from ontolearn.owlapy.model import OWLNamedIndividual, IRI
from ontolearn.owlapy.render import DLSyntaxObjectRenderer
from owlapy.model import OWLNamedIndividual, IRI
from owlapy.render import DLSyntaxObjectRenderer

try:
import gradio as gr
except ImportError as e:
raise ImportError("Gradio not found! Please install gradio to use this script --> `pip install gradio`")


metrics = {'F1': F1,
'Accuracy': Accuracy,
@@ -141,15 +146,15 @@ def predict(positive_examples, negative_examples, random_examples, size_of_examp
with gr.Row():
i6 = gr.Checkbox(label="Terminate on goal", value=True)
i8 = gr.Checkbox(label="Use data properties", value=True)
i9 = gr.Checkbox(label="Use card restrictions", value=True)
i9 = gr.Checkbox(label="Use cardinality restrictions for object properties", value=True)
i10 = gr.Checkbox(label="Use inverse", value=False)
with gr.Row():
i7 = gr.Number(label="Maximum runtime", value=600)
i11 = gr.Number(label="Tournament size", value=7)
i13 = gr.Number(label="Population size", value=800)
with gr.Row():
i14 = gr.Number(label="Num generations", value=200)
i12 = gr.Number(label="Card limit", value=10)
i12 = gr.Number(label="Cardinality limit for object properties", value=10)
i15 = gr.Number(label="Height limit", value=17)
gr.Markdown("Set arguments for the fitness function (LinearPressureFitness)")
with gr.Box():
@@ -291,8 +296,8 @@ def predict(positive_examples, negative_examples, random_examples: bool, size_of
info="For the value splitter: BinningValueSplitter")
i18 = gr.Number(label="Maximum child length", value=10, info="\n")
with gr.Row():
i22 = gr.Checkbox(label="Use card restrictions", value=True)
i26 = gr.Number(label="Card limit", value=10)
i22 = gr.Checkbox(label="Use cardinality restrictions for object properties", value=True)
i26 = gr.Number(label="Cardinality limit for object properties", value=10)
with gr.Row():
i19 = gr.Checkbox(label="Use negation", value=True)
i20 = gr.Checkbox(label="Use all constructors", value=True)
@@ -405,8 +410,8 @@ def predict(positive_examples, negative_examples, random_examples: bool, size_of
info="For the value splitter: BinningValueSplitter")
i16 = gr.Number(label="Maximum child length", value=10, info="\n")
with gr.Row():
i20 = gr.Checkbox(label="Use card restrictions", value=True)
i24 = gr.Number(label="Card limit", value=10)
i20 = gr.Checkbox(label="Use cardinality restrictions for object properties", value=True)
i24 = gr.Number(label="Cardinality limit for object properties", value=10)
with gr.Row():
i17 = gr.Checkbox(label="Use negation", value=True)
i18 = gr.Checkbox(label="Use all constructors", value=True)
38 changes: 23 additions & 15 deletions docs/usage/01_introduction.md
@@ -1,6 +1,6 @@
# Ontolearn

**Version:** ontolearn 0.5.4
**Version:** ontolearn 0.6.0

**GitHub repository:** [https://github.com/dice-group/Ontolearn](https://github.com/dice-group/Ontolearn)

@@ -14,28 +14,36 @@

Ontolearn is an open-source software library for explainable structured machine learning in Python.

For the core module [owlapy](ontolearn.owlapy) Ontolearn is based on [Owlready2](https://owlready2.readthedocs.io/en/latest/index.html),
a package for manipulating OWL 2.0 ontologies in Python. In addition, we have implemented
a higher-level layer of code for manipulating OWL 2.0 ontologies, in pursuit of making it
easier, more flexible and of course, having this all in Python. This adaptation of
Owlready2 library made it possible to build more complex algorithms.

Ontolearn started with the goal of using _Explainable Structured Machine Learning_
in OWL 2.0 ontologies and this is
exactly what our library offers. The main contributions are the exclusive concept learning
algorithms that are part of this library. Currently, we have 4 fully functioning algorithms that
learn concepts in description logics. Papers can be found [here](09_further_resources.md).

Ontolearn can do the following:
For the base (core) module of Ontolearn we use [owlapy](https://github.com/dice-group/owlapy)
combined with [Owlready2](https://owlready2.readthedocs.io/en/latest/index.html). _Owlapy_ is a Python package
based on owlapi and implemented by us, the Ontolearn team. For the sake of modularization we
have moved it into a separate repository. _Owlready2_, on the other hand, is a package for manipulating
OWL 2.0 ontologies in Python. On top of these we have implemented
a higher-level layer of code for manipulating OWL 2.0 ontologies, in pursuit of making it
easier to implement and understand, and of course, having this all in Python. This adaptation of
the Owlready2 library made it possible to build more complex algorithms.

---------------------------------------

- Load/save ontologies in RDF/XML, OWL/XML
- Modify ontologies by adding/removing axioms
- Access individuals/classes/properties of an ontology (and a lot more)
- Define learning problems
- Construct class expressions
- Use concept learning algorithms to classify positive examples in a learning problem
**Ontolearn (including owlapy) can do the following:**

- Load/save ontologies in RDF/XML, OWL/XML.
- Modify ontologies by adding/removing axioms.
- Access individuals/classes/properties of an ontology (and a lot more).
- Define learning problems.
- Construct class expressions.
- Use concept learning algorithms to classify positive examples in a learning problem.
- Use local datasets or datasets that are hosted on a triplestore server, for the learning task.
- Reason over an ontology.
- Other convenient functionalities like converting OWL class expressions to SPARQL or DL syntax
- Other convenient functionalities like converting OWL class expressions to SPARQL or DL syntax.

------------------------------------

The rest of the content is built as a top-to-bottom guide, but each part is nevertheless self-contained, so
you can learn more in depth about the capabilities of Ontolearn.