diff --git a/docs/usage/01_introduction.md b/docs/usage/01_introduction.md
index 92ca860c..07e6b3a7 100644
--- a/docs/usage/01_introduction.md
+++ b/docs/usage/01_introduction.md
@@ -20,8 +20,7 @@ exactly what our library offers.
The main contribution are the exclusive concept algorithms that are part of this library. Currently,
we have 6 fully functioning algorithms that learn concept in description logics. Papers can be found [here](09_further_resources.md).
-For the base (core) module of Ontolearn we use [owlapy](https://github.com/dice-group/owlapy)
-which on its end uses [Owlready2](https://owlready2.readthedocs.io/en/latest/index.html). _Owlapy_ is a python package
+For the base (core) module of Ontolearn we use [owlapy](https://github.com/dice-group/owlapy). _Owlapy_ is a Python package
based on owlapi (the java counterpart), and implemented by us, the Ontolearn team. For the sake of
modularization we have moved it in a separate repository. The modularization aspect helps us to increase readability
and reduce complexity.
@@ -46,4 +45,4 @@ for ontology manipulation and reasoning as well.
------------------------------------
The rest of content after "examples" is build as a top-to-bottom guide, but nevertheless self-containing, where
-you can learn more in depth about the capabilities of Ontolearn.
+you can learn more in depth about the components of Ontolearn.
diff --git a/docs/usage/02_installation.md b/docs/usage/02_installation.md
index d388aa09..fce7b010 100644
--- a/docs/usage/02_installation.md
+++ b/docs/usage/02_installation.md
@@ -79,10 +79,9 @@ make use of the replace all functionality to change them.

## Download External Files

Some resources like pre-calculated embeddings or `pre_trained_agents` and datasets (ontologies)
-are not included in the repository directly. Use the command line command `wget`
-to download them from our data server.
+are not included in the repository directly. Use the command `wget` to download them from our data server.

-> **NOTE: Before you run this commands in your terminal, make sure you are
+> **NOTE: Before you run the following commands in your terminal, make sure you are
in the root directory of the project!**

To download the datasets:

@@ -109,6 +108,14 @@ Finally, remove the _.zip_ file:
rm KGs.zip
```

+To download learning problems:
+
+```shell
+wget https://files.dice-research.org/projects/Ontolearn/LPs.zip
+```
+
+Follow the same steps to unzip it as in the KGs case.
+
--------------------------------------------------------

### NCES data:

@@ -130,7 +137,7 @@ rm -f NCESData.zip

### CLIP data:

-```commandline
+```shell
wget https://files.dice-research.org/projects/Ontolearn/CLIP/CLIPData.zip
unzip CLIPData.zip
rm CLIPData.zip

@@ -143,11 +150,18 @@ it is necessary to use the `build` tool. It can be invoked with:

```shell
python -m build
+
+# or
+
+python setup.py bdist_wheel sdist
```

-from the main source code folder. Packages created by `build` can then
-be uploaded as releases to the [Python Package Index (PyPI)](https://pypi.org/) using
-[twine](https://pypi.org/project/twine/).
+The distribution packages that are created can then
+be published to the [Python Package Index (PyPI)](https://pypi.org/) using [twine](https://pypi.org/project/twine/).
+
+```shell
+python -m twine upload --repository pypi dist/*
+```

### Building the docs

@@ -167,12 +181,17 @@ sphinx-build -M latex docs/ docs/_build/

## Simple Linting

-Using the following command will run the linting tool [flake8](https://flake8.pycqa.org/) on the source code.
+You can run lint checks using [flake8](https://flake8.pycqa.org/):

```shell
flake8
```

-Additionally, you can specify the path where you want to flake8 to run.
+or with [ruff](https://docs.astral.sh/ruff/):
+```shell
+ruff check
+```
+
+Additionally, you can specify the path on which you want to run the linter.
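+For example, to lint only the core package (this assumes you run the command from the project root and
+that the package directory is named `ontolearn`):
+
+```shell
+# run flake8 only on the ontolearn package
+flake8 ontolearn
+
+# or the equivalent check with ruff
+ruff check ontolearn
+```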

----------------------------------------------------------------------
diff --git a/docs/usage/03_examples.md b/docs/usage/03_examples.md
index 57b52950..9eaa6927 100644
--- a/docs/usage/03_examples.md
+++ b/docs/usage/03_examples.md
@@ -2,7 +2,7 @@
In this guide we will show some non-trival examples of typical use-cases of Ontolearn which you can also find in the
-[examples](https://github.com/dice-group/Ontolearn/tree/develop/examples) folder.
+[examples](https://github.com/dice-group/Ontolearn/tree/master/examples) folder.

## Ex. 1: Learning Over a Local Ontology

@@ -133,7 +133,7 @@ save_owl_class_expressions(expressions=h, path="owl_prediction")

Here we have used the triplestore endpoint as you see in step _(1)_ which is available only on a private network.
However, you can host your own triplestore server following [this guide](06_concept_learners.md#loading-and-launching-a-triplestore)
-and run TDL using you own local endpoint.
+and run TDL using your own local endpoint. We have a [script](https://github.com/dice-group/Ontolearn/blob/master/examples/concept_learning_via_triplestore_example.py) for that as well.

--------------------------------------------------------------

@@ -263,6 +263,6 @@ if __name__ == '__main__':

-----------------------------------------------------------

-In the next guide we will explore the [KnowledgeBase](ontolearn.knowledge_base.KnowledgeBase) class that is needed to
+In the next guide we will explore the [KnowledgeBase](ontolearn.knowledge_base.KnowledgeBase) class which is needed to
run a concept learner.
diff --git a/docs/usage/04_knowledge_base.md b/docs/usage/04_knowledge_base.md
index 654a0752..21a33a92 100644
--- a/docs/usage/04_knowledge_base.md
+++ b/docs/usage/04_knowledge_base.md
@@ -1,12 +1,19 @@
# Knowledge Bases

-In Ontolearn we represent a knowledge base
-by the class [KnowledgeBase](ontolearn.knowledge_base.KnowledgeBase) which contains two main class attributes,
-an ontology [AbstractOWLOntology](https://dice-group.github.io/owlapy/autoapi/owlapy/owl_ontology/index.html#owlapy.owl_ontology.AbstractOWLOntology)
-and a reasoner [AbstractOWLReasoner](https://dice-group.github.io/owlapy/autoapi/owlapy/owl_reasoner/index.html#owlapy.owl_reasoner.AbstractOWLReasoner).
-It also contains the class and properties hierarchy as well as other
-Ontology-related attributes required for the Structured Machine Learning library.
+In Ontolearn, a knowledge base is represented
+by an implementation of [AbstractKnowledgeBase](ontolearn.abstracts.AbstractKnowledgeBase), which contains two main
+attributes: an ontology of type [AbstractOWLOntology](https://dice-group.github.io/owlapy/autoapi/owlapy/owl_ontology/index.html#owlapy.owl_ontology.AbstractOWLOntology)
+and a reasoner of type [AbstractOWLReasoner](https://dice-group.github.io/owlapy/autoapi/owlapy/owl_reasoner/index.html#owlapy.owl_reasoner.AbstractOWLReasoner). Be careful: different implementations of these abstract classes
+are not compatible with each other. For example, you cannot use a [TripleStore](ontolearn.triple_store.TripleStore)
+knowledge base with a
+[StructuralReasoner](https://dice-group.github.io/owlapy/autoapi/owlapy/owl_reasoner/index.html#owlapy.owl_reasoner.StructuralReasoner),
+but you can use a _TripleStore_ KB with a [TripleStoreReasoner](ontolearn.triple_store.TripleStoreReasoner).
+_AbstractKnowledgeBase_ contains the necessary methods to facilitate _Structured Machine Learning_.
+Currently, there are two implementations of _AbstractKnowledgeBase_ (see the short sketch right after this list):
+
+- [KnowledgeBase](ontolearn.knowledge_base.KnowledgeBase) → used for local datasets.
+- [TripleStore](ontolearn.triple_store.TripleStore) → used for datasets hosted on a server.
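+
+A minimal instantiation sketch for both implementations (the file path and the endpoint URL are placeholders
+reused from the examples later in this guide):
+
+```python
+from ontolearn.knowledge_base import KnowledgeBase
+from ontolearn.triple_store import TripleStore
+
+# a knowledge base over a local ontology file
+kb_local = KnowledgeBase(path="file://KGs/Family/father.owl")
+
+# a knowledge base over a dataset hosted on a SPARQL endpoint
+kb_remote = TripleStore(url="http://localhost:3030/father/sparql")
+```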

## Knowledge Base vs Ontology

@@ -14,20 +21,21 @@ These terms may be used interchangeably sometimes but in Ontolearn they are not
although they share a lot of similarities. An ontology in owlapy, as explained [here](https://dice-group.github.io/owlapy/usage/ontologies.html)
is the object where we load the OWL 2.0 ontologies from a _.owl_ file containing the ontology in an RDF/XML or OWL/XML format.
-On the other side a KnowledgeBase is a class which combines an ontology and a reasoner together.
+On the other side, a knowledge base combines an ontology and a reasoner.
Therefore, differently from the ontology you can use methods that require reasoning. You can check the methods for each in the links below:

-- [KnowledgeBase](ontolearn.knowledge_base.KnowledgeBase)
+- [AbstractKnowledgeBase](ontolearn.abstracts.AbstractKnowledgeBase)
- [AbstractOWLOntology](https://dice-group.github.io/owlapy/autoapi/owlapy/owl_ontology/index.html#owlapy.owl_ontology.AbstractOWLOntology)

In summary:

-- An instance of `KnowledgeBase` contains an ontology and a reasoner and
+- An implementation of `AbstractKnowledgeBase` contains an ontology and a reasoner and
is required to run a learning algorithm.
-- The ontology object can load an OWL 2.0 ontology,
-be modified using the ontology manager and saved.
+- An ontology represents the OWL 2 ontology you have locally or hosted on a triplestore server. Using its class methods you
+can retrieve information from the signature of this ontology. In the case of a local ontology, it can also be modified and
+saved.
- Although they have some similar functionalities, there are a lot of other
distinct functionalities that each of them has.

@@ -49,10 +57,6 @@ kb = KnowledgeBase(path="file://KGs/Family/father.owl")

What happens in the background is that the ontology located in this path will be loaded in the `AbstractOWLOntology`
object of `kb` as done [here](https://dice-group.github.io/owlapy/usage/ontologies.html#loading-an-ontology).

-In our recent version you can also initialize a knowledge base using a dataset hosted in a triplestore.
-Since that knowledge base is mainly used for executing a concept learner, we cover that matter more in depth
-in _[Use Triplestore Knowledge Base](06_concept_learners.md#use-triplestore-knowledge-base)_
-section of _[Concept Learning](06_concept_learners.md)_.

## Ignore Concepts

@@ -120,173 +124,21 @@ all_individuals = kb.individuals()
```
You can as well get all the individuals using:
```python
-all_individuals_set = kb.all_individuals_set()
-```
-The difference is that `individuals()` return type is `Iterable[OWLNamedIndividual]`
-and `all_individuals_set()` return type is `frozenset(OWLNamedIndividual)`.
+from owlapy.class_expression import OWLThing

-In case you need your result as frozenset, `individual_set` method is a better option
-then the `individuals` method:
-
-```python
-male_individuals_set = kb.individuals_set(male_concept)
+all_individuals_set = kb.individuals_set(OWLThing)
```
+The difference is that the return type of `individuals()` is a generator,
+while the return type of `individuals_set()` is a frozenset.
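+
+A frozenset is convenient when you want to count the retrieved individuals or do set arithmetic with them.
+A small sketch, reusing the `male_concept` defined earlier in this guide:
+
+```python
+male_individuals_set = kb.individuals_set(male_concept)
+
+# frozensets support the usual set operations
+print(len(male_individuals_set))  # number of male individuals
+print(kb.individuals_set(OWLThing) - male_individuals_set)  # all non-male individuals
+```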

-
-Or you can even combine both methods:
-
-```python
-male_individuals_set = kb.individuals_set(male_individuals)
-```
-
-
-## Evaluate a Concept
-
-When using a concept learner, the generated concepts (class expressions) for a certain learning problem
-need to be evaluated to see the performance.
-To do that you can use the method `evaluate_concept` of `KnowledgeBase`. It requires the following arguments:
-
-1. a concept to evaluate: [OWLClassExpression](https://dice-group.github.io/owlapy/autoapi/owlapy/class_expression/class_expression/index.html#owlapy.class_expression.class_expression.OWLClassExpression)
-2. a quality metric: [AbstractScorer](ontolearn.abstracts.AbstractScorer)
-3. the encoded learning problem: [EncodedLearningProblem](ontolearn.learning_problem.EncodedPosNegLPStandard)
-
-The evaluation should be done for the learning problem that you used to generate the
-concept. The main result of the evaluation is the quality score describing how well the generated
-concept is doing on the job of classifying the positive individuals. The concept learners do this
-process automatically.
-
-### Construct a learning problem
-
-To evaluate a concept you need a learning problem. Firstly, we create two simple sets containing
-the positive and negative examples for the concept of 'Father'. Our positive examples
-(individuals to describe) are stefan, markus, and martin. And our negative examples
-(individuals to not describe) are heinz, anna, and michelle.
+For large amounts of data, `individuals()` is more computationally efficient:

```python
-from owlapy.owl_individual import OWLNamedIndividual
-
-positive_examples = {OWLNamedIndividual(IRI.create(NS, 'stefan')),
-                     OWLNamedIndividual(IRI.create(NS, 'markus')),
-                     OWLNamedIndividual(IRI.create(NS, 'martin'))}
-
-negative_examples = {OWLNamedIndividual(IRI.create(NS, 'heinz')),
-                     OWLNamedIndividual(IRI.create(NS, 'anna')),
-                     OWLNamedIndividual(IRI.create(NS, 'michelle'))}
-```
-
-Now the learning problem can be captured in its respective object, the
-[positive-negative standard learning problem](ontolearn.learning_problem.PosNegLPStandard) and
-encode it using the method `encode_learning_problem` of `KnowledgeBase`:
-
-
-```python
-from ontolearn.learning_problem import PosNegLPStandard
-
-lp = PosNegLPStandard(pos=positive_examples, neg=negative_examples)
-
-encoded_lp = kb.encode_learning_problem(lp)
-```
-
-Now that we have an encoded learning problem, we need a concept to evaluate.
-
-### Construct a concept
-
-Suppose that the class expression `(¬female) ⊓ (∃ hasChild.⊤)`
-was generated by [CELOE](ontolearn.concept_learner.CELOE)
-for the concept of 'Father'.
We will see how that can happen later -but for now we let's construct this class expression manually: - - -```python -from owlapy.owl_property import OWLObjectProperty -from owlapy.class_expression import OWLObjectSomeValuesFrom , OWLObjectIntersectionOf - -female = OWLClass(IRI(NS,'female')) -not_female = kb.generator.negation(female) -has_child_property = OWLObjectProperty(IRI(NS, "hasChild")) -thing = OWLClass(IRI('http://www.w3.org/2002/07/owl#', 'Thing')) -exist_has_child_T = OWLObjectSomeValuesFrom(property=has_child_property, filler=thing) - -concept_to_test = OWLObjectIntersectionOf([not_female, exist_has_child_T]) -``` - -`kb` has an instance of [ConceptGenerator](ontolearn.concept_generator.ConceptGenerator) -which we use in this case to create the negated concept `¬female`. The other classes -[OWLObjectProperty](https://dice-group.github.io/owlapy/autoapi/owlapy/owl_property/index.html#owlapy.owl_property.OWLObjectProperty), -[OWLObjectSomeValuesFrom](https://dice-group.github.io/owlapy/autoapi/owlapy/class_expression/index.html#owlapy.class_expression.OWLObjectSomeValuesFrom) -and [OWLObjectIntersectionOf](https://dice-group.github.io/owlapy/autoapi/owlapy/class_expression/nary_boolean_expression/index.html#owlapy.class_expression.nary_boolean_expression.OWLObjectIntersectionOf) are classes -that represent different kind of axioms in owlapy and can be found in -[owlapy.class_expression](https://dice-group.github.io/owlapy/autoapi/owlapy/class_expression/index.html) module. There are more kind of axioms there which you -can use to construct class expressions like we did in the example above. - -### Evaluation and results - -You can now evaluate the concept you just constructed as follows: - - -```python -from ontolearn.metrics import F1 +male_individuals = kb.individuals(male_concept) -evaluated_concept = kb.evaluate_concept(concept_to_test, F1(), encoded_lp) +[print(ind) for ind in male_individuals] # print male individuals ``` -In this example we use F1-score to evaluate the concept, but there are more [metrics](ontolearn.metrics) -which you can use including Accuracy, Precision and Recall. - -You can now: - -- Print the quality: - - ```python - print(evaluated_concept.q) # 1.0 - ``` - -- Print the set of individuals covered by the hypothesis: - - ```python - for ind in evaluated_concept.inds: - print(ind) - - # OWLNamedIndividual(http://example.com/father#markus) - # OWLNamedIndividual(http://example.com/father#martin) - # OWLNamedIndividual(http://example.com/father#stefan) - ``` -- Print the amount of them: - - ```python - print(evaluated_concept.ic) # 3 - ``` - -## Obtaining axioms - -You can retrieve Tbox and Abox axioms by using `tbox` and `abox` methods respectively. -Let us take them one at a time. The `tbox` method has 2 parameters, `entities` and `mode`. -`entities` specifies the owl entity from which we want to obtain the Tbox axioms. It can be -a single entity, a `Iterable` of entities, or `None`. - -The allowed types of entities are: -- OWLClass -- OWLObjectProperty -- OWLDataProperty - -Only the Tbox axioms related to the given entit-y/ies will be returned. If no entities are -passed, then it returns all the Tbox axioms. -The second parameter `mode` _(str)_ sets the return format type. It can have the -following values: -1) `'native'` -> triples are represented as tuples of owlapy objects. -2) `'iri'` -> triples are represented as tuples of IRIs as strings. -3) `'axiom'` -> triples are represented as owlapy axioms. - -For the `abox` method the idea is similar. 
Instead of the parameter `entities`, there is the parameter
-`individuals` which accepts an object of type OWLNamedIndividuals or Iterable[OWLNamedIndividuals].
-
-If you want to obtain all the axioms (Tbox + Abox) of the knowledge base, you can use the method `triples`. It
-requires only the `mode` parameter.
-
-> **NOTE**: The results of these methods are limited only to named and direct entities.
-> That means that especially the axioms that contain anonymous owl objects (objects that don't have an IRI)
-> will not be part of the result set. For example, if there is a Tbox T={ C ⊑ (A ⊓ B), C ⊑ D },
-> only the latter subsumption axiom will be returned.
-

## Sampling the Knowledge Base

@@ -385,12 +237,133 @@ folder. You will find descriptive comments in that script that will help you und
For more details about OntoSample you can see [this paper](https://dl.acm.org/doi/10.1145/3583780.3615158).

+## TripleStore Knowledge Base
+
+Instead of traversing the ontology locally with expensive computations, you can make use of the
+more efficient approach of querying a triplestore using SPARQL queries. We have brought this
+functionality to Ontolearn for our learning algorithms, and we take care of the conversion part behind the scenes.
+Let's see what it takes to make use of it.
+
+First of all, you need a server that hosts the triplestore for your ontology. If you don't
+already have one, see [Loading and Launching a Triplestore](#loading-and-launching-a-triplestore) below.
+
+Now you can simply initialize an instance of the `TripleStore` class, which will serve as an input for your desired
+concept learner:
+
+```python
+from ontolearn.triple_store import TripleStore
+
+kb = TripleStore(url="http://your_domain/some_path/sparql")
+```
+
+Notice that the triplestore endpoint URL is enough to initialize an object of `TripleStore`.
+Also keep in mind that this knowledge base can be initialized by using either a
+[TripleStoreOntology](ontolearn.triple_store.TripleStoreOntology) or a [TripleStoreReasoner](ontolearn.triple_store.TripleStoreReasoner). Using the `TripleStore` KB means that
+every querying process taking place during concept learning is now done using SPARQL queries.
+
+> **Important notice:** The performance of a concept learner may differ when using TripleStore instead of
+> KnowledgeBase for the same ontology. This happens because some SPARQL queries may not yield the exact same results
+> as the local querying methods.
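+
+Once created, the `TripleStore` knowledge base is handed to a learner just like a local `KnowledgeBase`.
+A minimal sketch (assuming a local endpoint serving the father ontology, as set up in the next section; the
+learner `TDL` and its `fit`/`best_hypotheses` calls are shown for illustration only and may differ between
+Ontolearn versions):
+
+```python
+from owlapy.iri import IRI
+from owlapy.owl_individual import OWLNamedIndividual
+from ontolearn.learners import TDL
+from ontolearn.learning_problem import PosNegLPStandard
+from ontolearn.triple_store import TripleStore
+
+kb = TripleStore(url="http://localhost:3030/father/sparql")
+
+# positive/negative examples for the 'Father' concept of the father ontology
+NS = "http://example.com/father#"
+pos = {OWLNamedIndividual(IRI.create(NS, name)) for name in ("stefan", "markus", "martin")}
+neg = {OWLNamedIndividual(IRI.create(NS, name)) for name in ("heinz", "anna", "michelle")}
+
+model = TDL(knowledge_base=kb)
+best = model.fit(learning_problem=PosNegLPStandard(pos=pos, neg=neg)).best_hypotheses()
+print(best)  # an OWL class expression describing the positive examples
+```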
+
+## Loading and Launching a Triplestore
+
+We will provide a simple approach to load and launch a triplestore on a local server. For this,
+we will be using _apache-jena_ and _apache-jena-fuseki_. As a prerequisite you need
+JDK 11 or higher, and if you are on Windows, you also need [Cygwin](https://www.cygwin.com/). In case of
+issues or any further reference please visit the official page of [Apache Jena](https://jena.apache.org/index.html)
+and check the documentation under "Triple Store".
+
+Having said that, let us now load and launch a triplestore on the "Father" ontology:
+
+Open a terminal window and make sure you are in the root directory. Create a directory to
+store the files for the Fuseki server:
+
+```shell
+mkdir Fuseki && cd Fuseki
+```
+Install _apache-jena_ and _apache-jena-fuseki_. We will use version 4.7.0.
+
+```shell
+# install Jena
+wget https://archive.apache.org/dist/jena/binaries/apache-jena-4.7.0.tar.gz
+# install Jena-Fuseki
+wget https://archive.apache.org/dist/jena/binaries/apache-jena-fuseki-4.7.0.tar.gz
+```
+
+Extract the downloaded archives:
+
+```shell
+tar -xzf apache-jena-fuseki-4.7.0.tar.gz
+tar -xzf apache-jena-4.7.0.tar.gz
+```
+
+Make a directory for our 'father' database inside jena-fuseki:
+
+```shell
+mkdir -p apache-jena-fuseki-4.7.0/databases/father/
+```
+
+Now just load the 'father' ontology using the following commands:
+
+```shell
+cd ..
+
+Fuseki/apache-jena-4.7.0/bin/tdb2.tdbloader --loader=parallel --loc Fuseki/apache-jena-fuseki-4.7.0/databases/father/ KGs/Family/father.owl
+```
+
+Launch the server, and it will be waiting eagerly for your queries.
+
+```shell
+cd Fuseki/apache-jena-fuseki-4.7.0
+
+java -Xmx4G -jar fuseki-server.jar --tdb2 --loc=databases/father /father
+```
+
+Notice that we serve the database found in `Fuseki/apache-jena-fuseki-4.7.0/databases/father` under the path `/father`.
+By default, jena-fuseki runs on port 3030, so the full URL would be `http://localhost:3030/father`. When
+you pass this URL to the `triplestore_address` argument, you have to add the
+`/sparql` sub-path, indicating to the server that we are querying via SPARQL. The full path should now look like:
+`http://localhost:3030/father/sparql`.
+
+You can now create a triplestore knowledge base, a reasoner or an ontology that use this URL for their
+operations.
+
+## Obtaining axioms
+
+You can retrieve Tbox and Abox axioms by using the `tbox` and `abox` methods respectively.
+Let us take them one at a time. The `tbox` method has 2 parameters, `entities` and `mode`.
+`entities` specifies the OWL entity for which we want to obtain the Tbox axioms. It can be
+a single entity, an `Iterable` of entities, or `None`.
+
+The allowed types of entities are:
+- OWLClass
+- OWLObjectProperty
+- OWLDataProperty
+
+Only the Tbox axioms related to the given entity (or entities) will be returned. If no entities are
+passed, then it returns all the Tbox axioms.
+The second parameter `mode` _(str)_ sets the return format type. It can have the
+following values:
+1) `'native'` -> triples are represented as tuples of owlapy objects.
+2) `'iri'` -> triples are represented as tuples of IRIs as strings.
+3) `'axiom'` -> triples are represented as owlapy axioms.
+
+For the `abox` method the idea is similar. Instead of the parameter `entities`, there is the parameter
+`individuals`, which accepts an object of type OWLNamedIndividual or Iterable[OWLNamedIndividual].
+
+If you want to obtain all the axioms (Tbox + Abox) of the knowledge base, you can use the method `triples`. It
+requires only the `mode` parameter.
+
+> **NOTE**: The results of these methods are limited only to named and direct entities.
+> That means that, in particular, axioms that contain anonymous OWL objects (objects that don't have an IRI)
+> will not be part of the result set. For example, if there is a Tbox T={ C ⊑ (A ⊓ B), C ⊑ D },
+> only the latter subsumption axiom will be returned.
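+
+A small illustration of the three methods (assuming `kb` is the local `KnowledgeBase` over the father ontology
+loaded at the beginning of this guide; the class and individual IRIs below come from that ontology):
+
+```python
+from owlapy.class_expression import OWLClass
+from owlapy.iri import IRI
+from owlapy.owl_individual import OWLNamedIndividual
+
+NS = "http://example.com/father#"
+
+# Tbox axioms that involve the class 'female', returned as owlapy axiom objects
+for axiom in kb.tbox(entities=OWLClass(IRI.create(NS, "female")), mode="axiom"):
+    print(axiom)
+
+# Abox axioms about the individual 'markus', returned as tuples of IRI strings
+for triple in kb.abox(individuals=OWLNamedIndividual(IRI.create(NS, "markus")), mode="iri"):
+    print(triple)
+
+# every axiom (Tbox + Abox), represented with owlapy objects
+all_axioms = list(kb.triples(mode="native"))
+```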
+
-----------------------------------------------------------------------------------------------------
-Since we cannot cover everything here in details, see [KnowledgeBase API documentation](ontolearn.knowledge_base.KnowledgeBase)
-to check all the methods that this class has to offer. You will find convenient methods to
-access the class/property hierarchy, methods that use the reasoner indirectly and
-a lot more.
+Since we cannot cover everything here in detail, check the API docs of the knowledge-base-related classes
+to see all the methods that these classes have to offer.

-In the next guide we will walk through how to use concept learners to learn class expressions in a
-knowledge base for a certain learning problem.
\ No newline at end of file
+In the next guide we will walk through _how to define a learning problem_, _how to use concept learners_ and other
+topics, such as evaluating a class expression.
\ No newline at end of file