diff --git a/fr/lang/fr/README.md b/fr/lang/fr/README.md
new file mode 100644
index 0000000000..484089f3c1
--- /dev/null
+++ b/fr/lang/fr/README.md
@@ -0,0 +1,165 @@
+# graphkit-learn
+[![Build Status](https://travis-ci.org/jajupmochi/graphkit-learn.svg?branch=master)](https://travis-ci.org/jajupmochi/graphkit-learn) [![Build status](https://ci.appveyor.com/api/projects/status/bdxsolk0t1uji9rd?svg=true)](https://ci.appveyor.com/project/jajupmochi/graphkit-learn) [![codecov](https://codecov.io/gh/jajupmochi/graphkit-learn/branch/master/graph/badge.svg)](https://codecov.io/gh/jajupmochi/graphkit-learn) [![Documentation Status](https://readthedocs.org/projects/graphkit-learn/badge/?version=master)](https://graphkit-learn.readthedocs.io/en/master/?badge=master) [![PyPI version](https://badge.fury.io/py/graphkit-learn.svg)](https://badge.fury.io/py/graphkit-learn)
+
+A Python package for graph kernels, graph edit distances and the graph pre-image problem.
+
+## Requirements
+
+* python>=3.6
+* numpy>=1.16.2
+* scipy>=1.1.0
+* matplotlib>=3.1.0
+* networkx>=2.2
+* scikit-learn>=0.20.0
+* tabulate>=0.8.2
+* tqdm>=4.26.0
+* control>=0.8.2 (for generalized random walk kernels only)
+* slycot==0.3.3 (for generalized random walk kernels only; requires a Fortran compiler such as gfortran)
+
+## How to use?
+
+### Install the library
+
+* Install the stable version from PyPI (may not be up to date):
+```
+$ pip install graphkit-learn
+```
+
+* Install the latest version from GitHub:
+```
+$ git clone https://github.com/jajupmochi/graphkit-learn.git
+$ cd graphkit-learn/
+$ python setup.py install
+```
+
+### Run the tests
+
+A series of [tests](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/tests) can be run to check whether the library works correctly:
+```
+$ pip install -U pip pytest codecov coverage pytest-cov
+$ pytest -v --cov-config=.coveragerc --cov-report term --cov=gklearn gklearn/tests/
+```
+
+### Check examples
+
+A series of demos showing how to use the library can be found on [Google Colab](https://drive.google.com/drive/folders/1r2gtPuFzIys2_MZw1wXqE2w3oCoVoQUG?usp=sharing) and in the [`examples`](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/examples) folder.
+
+### Other demos
+
+Check the [`notebooks`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks) directory for more demos:
+* the [`notebooks`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks) directory includes test code for graph kernels based on linear patterns;
+* the [`notebooks/tests`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks/tests) directory includes code that tests some libraries and functions;
+* the [`notebooks/utils`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks/utils) directory includes some useful tools, such as a Gram matrix checker and a function to get properties of datasets;
+* the [`notebooks/else`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks/else) directory includes other code used for our experiments.
+
+### Documentation
+
+The documentation of the library can be found [here](https://graphkit-learn.readthedocs.io/en/master/?badge=master).
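+
+### Quick start
+
+As a first taste, the sketch below computes a Gram matrix on a benchmark dataset. It follows the class-based API used in the [`examples`](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/examples) folder, but the exact class and argument names shown here (`Dataset`, `load_predefined_dataset`, `PathUpToH` and its parameters) are assumptions that may differ between versions; see [`compute_graph_kernel.py`](https://github.com/jajupmochi/graphkit-learn/blob/master/gklearn/examples/compute_graph_kernel.py) for an up-to-date version:
+```
+import multiprocessing
+
+# Assumed import paths; see the examples folder for exact usage.
+from gklearn.utils import Dataset
+from gklearn.kernels import PathUpToH
+
+# Load a predefined benchmark dataset of labeled graphs.
+dataset = Dataset()
+dataset.load_predefined_dataset('MUTAG')
+
+# Compute the Gram matrix of the path kernel up to length h; further
+# arguments may be required depending on the version.
+kernel = PathUpToH(node_labels=dataset.node_labels,
+                   edge_labels=dataset.edge_labels,
+                   depth=3, k_func='MinMax', compute_method='trie')
+gram_matrix, run_time = kernel.compute(dataset.graphs,
+                                       parallel='imap_unordered',
+                                       n_jobs=multiprocessing.cpu_count(),
+                                       verbose=2)
+```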
+
+## Main contents
+
+### 1 List of graph kernels
+
+* Based on walks
+  * [The common walk kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/common_walk.py) [1]
+    * Exponential
+    * Geometric
+  * [The marginalized kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/marginalized.py)
+    * With tottering [2]
+    * Without tottering [7]
+  * [The generalized random walk kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/random_walk.py) [3]
+    * [Sylvester equation](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/sylvester_equation.py)
+    * Conjugate gradient
+    * Fixed-point iterations
+    * [Spectral decomposition](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/spectral_decomposition.py)
+* Based on paths
+  * [The shortest path kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/shortest_path.py) [4]
+  * [The structural shortest path kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/structural_sp.py) [5]
+  * [The path kernel up to length h](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/path_up_to_h.py) [6]
+    * The Tanimoto kernel
+    * The MinMax kernel
+* Non-linear kernels
+  * [The treelet kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/treelet.py) [10]
+  * [The Weisfeiler-Lehman kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/weisfeiler_lehman.py) [11]
+    * [Subtree](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/weisfeiler_lehman.py#L479)
+
+A demo of computing graph kernels can be found on [Google Colab](https://colab.research.google.com/drive/17Q2QCl9CAtDweGF8LiWnWoN2laeJqT0u?usp=sharing) and in the [`examples`](https://github.com/jajupmochi/graphkit-learn/blob/master/gklearn/examples/compute_graph_kernel.py) folder.
+
+### 2 Graph Edit Distances
+
+### 3 Graph preimage methods
+
+A demo of generating graph preimages can be found on [Google Colab](https://colab.research.google.com/drive/1PIDvHOcmiLEQ5Np3bgBDdu0kLOquOMQK?usp=sharing) and in the [`examples`](https://github.com/jajupmochi/graphkit-learn/blob/master/gklearn/examples/median_preimege_generator.py) folder.
+
+### 4 Interface to `GEDLIB`
+
+[`GEDLIB`](https://github.com/dbblumenthal/gedlib) is an easily extensible C++ library for (suboptimally) computing the graph edit distance between attributed graphs. [A Python interface](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/gedlib) for `GEDLIB` is integrated in this library, based on the [`gedlibpy`](https://github.com/Ryurin/gedlibpy) library.
+
+### 5 Computation optimization methods
+
+* Python’s `multiprocessing.Pool` module is used to **parallelize** the computation of all kernels, as well as the model selection.
+* **The Fast Computation of Shortest Path Kernel (FCSP) method** [8] is implemented in *the random walk kernel*, *the shortest path kernel* and *the structural shortest path kernel*, where it is applied to both vertex and edge kernels.
+* **The trie data structure** [9] is employed in *the path kernel up to length h* to store paths in graphs.
+
+## Issues
+
+* This library uses the `multiprocessing.Pool.imap_unordered` function for parallelization, which may not run correctly on Windows. For now, Windows users may need to comment out the parallel code and uncomment the serial code below it. We will consider adding a parameter to switch between serial and parallel computation.
+
+* Some modules (such as `numpy`, `scipy` and `sklearn`) use [`OpenBLAS`](https://www.openblas.net/) to perform parallel computation by default, which conflicts with other parallelization modules such as `multiprocessing.Pool` and greatly increases the computing time. Forcing `OpenBLAS` to use a single thread/CPU avoids these conflicts. For now, this has to be done manually. Under Linux, type this command in a terminal before running the code:
+```
+$ export OPENBLAS_NUM_THREADS=1
+```
+Or add `export OPENBLAS_NUM_THREADS=1` at the end of your `~/.bashrc` file, then run
+```
+$ source ~/.bashrc
+```
+to make the change permanent.
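+
+As a sketch (not a feature of this library), the same variable can also be set from Python itself, as long as this happens before `numpy`, `scipy` or anything that imports them is loaded:
+```
+import os
+
+# Must be set before numpy/scipy are imported; once they are loaded,
+# OpenBLAS has already created its thread pool.
+os.environ['OPENBLAS_NUM_THREADS'] = '1'
+
+import numpy as np  # now limited to a single OpenBLAS thread
+```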
+
+## Results
+
+Check this paper for a detailed description of the graph kernels and the experimental results:
+
+Linlin Jia, Benoit Gaüzère, and Paul Honeine. Graph Kernels Based on Linear Patterns: Theoretical and Experimental Comparisons. Working paper or preprint, March 2019. URL https://hal-normandie-univ.archives-ouvertes.fr/hal-02053946.
+
+A comparison of the performance of graph kernels on benchmark datasets can be found [here](https://graphkit-learn.readthedocs.io/en/master/experiments.html).
+
+## How to contribute
+
+Fork the library and open a pull request! Make your own contribution to the community!
+
+## Authors
+
+* [Linlin Jia](https://jajupmochi.github.io/), LITIS, INSA Rouen Normandie
+* [Benoit Gaüzère](http://pagesperso.litislab.fr/~bgauzere/#contact_en), LITIS, INSA Rouen Normandie
+* [Paul Honeine](http://honeine.fr/paul/Welcome.html), LITIS, Université de Rouen Normandie
+
+## Citation
+
+Still waiting...
+
+## Acknowledgments
+
+This research was supported by CSC (China Scholarship Council) and the French national research agency (ANR) under the grant APi (ANR-18-CE23-0014). The authors would like to thank the CRIANN (Le Centre Régional Informatique et d’Applications Numériques de Normandie) for providing computational resources.
+
+## References
+[1] Thomas Gärtner, Peter Flach, and Stefan Wrobel. On graph kernels: Hardness results and efficient alternatives. Learning Theory and Kernel Machines, pages 129–143, 2003.
+
+[2] H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. In Proceedings of the 20th International Conference on Machine Learning, Washington, DC, United States, 2003.
+
+[3] Vishwanathan, S.V.N., Schraudolph, N.N., Kondor, R., Borgwardt, K.M., 2010. Graph kernels. Journal of Machine Learning Research 11, 1201–1242.
+
+[4] K. M. Borgwardt and H.-P. Kriegel. Shortest-path kernels on graphs. In Proceedings of the International Conference on Data Mining, pages 74–81, 2005.
+
+[5] Liva Ralaivola, Sanjay J Swamidass, Hiroto Saigo, and Pierre Baldi. Graph kernels for chemical informatics. Neural Networks, 18(8):1093–1110, 2005.
+
+[6] Suard F, Rakotomamonjy A, Bensrhair A. Kernel on Bag of Paths For Measuring Similarity of Shapes. In ESANN 2007 Apr 25 (pp. 355–360).
+
+[7] Mahé, P., Ueda, N., Akutsu, T., Perret, J.L., Vert, J.P., 2004. Extensions of marginalized graph kernels, in: Proc. the twenty-first international conference on Machine learning, ACM. p. 70.
+
+[8] Lifan Xu, Wei Wang, M Alvarez, John Cavazos, and Dongping Zhang. Parallelization of shortest path graph kernels on multi-core cpus and gpus. Proceedings of the Programmability Issues for Heterogeneous Multicores (MultiProg), Vienna, Austria, 2014.
+
+[9] Edward Fredkin. Trie memory.
Communications of the ACM, 3(9):490–499, 1960. + +[10] Gaüzere, B., Brun, L., Villemin, D., 2012. Two new graphs kernels in chemoinformatics. Pattern Recognition Letters 33, 2038–2047. + +[11] Shervashidze, N., Schweitzer, P., Leeuwen, E.J.v., Mehlhorn, K., Borgwardt, K.M., 2011. Weisfeiler-lehman graph kernels. Journal of Machine Learning Research 12, 2539–2561. diff --git a/lang/fr/.appveyor.yml b/lang/fr/.appveyor.yml new file mode 100644 index 0000000000..d63af3a00f --- /dev/null +++ b/lang/fr/.appveyor.yml @@ -0,0 +1,29 @@ +--- +environment: + matrix: + - + PYTHON: "C:\\Python36" + - + PYTHON: "C:\\Python36-x64" + - + PYTHON: "C:\\Python37" + - + PYTHON: "C:\\Python37-x64" + - + PYTHON: "C:\\Python38" + - + PYTHON: "C:\\Python38-x64" +#skip_commits: +#files: +#- "*.yml" +#- "*.rst" +#- "LICENSE" +install: + - "%PYTHON%\\python.exe -m pip install -U pip" + - "%PYTHON%\\python.exe -m pip install wheel" + - "%PYTHON%\\python.exe -m pip install -r requirements.txt" + - "%PYTHON%\\python.exe -m pip install -U pytest" +build: false +test_script: + - "%PYTHON%\\python.exe setup.py bdist_wheel" + - "%PYTHON%\\python.exe -m pytest -v gklearn/tests/ --ignore=gklearn/tests/test_median_preimage_generator.py" diff --git a/lang/fr/.coveragerc b/lang/fr/.coveragerc new file mode 100644 index 0000000000..1acf8611f6 --- /dev/null +++ b/lang/fr/.coveragerc @@ -0,0 +1,4 @@ +[run] +omit = + gklearn/tests/* + gklearn/examples/* diff --git a/lang/fr/.gitignore b/lang/fr/.gitignore new file mode 100644 index 0000000000..8954c13d9e --- /dev/null +++ b/lang/fr/.gitignore @@ -0,0 +1,81 @@ +# Jupyter Notebook +.ipynb_checkpoints +datasets/* +!datasets/ds.py +!datasets/Alkane/ +!datasets/acyclic/ +!datasets/Acyclic/ +!datasets/MAO/ +!datasets/PAH/ +!datasets/MUTAG/ +!datasets/Letter-med/ +!datasets/ENZYMES_txt/ +!datasets/DD/ +!datasets/NCI1/ +!datasets/NCI109/ +!datasets/AIDS/ +!datasets/monoterpenoides/ +!datasets/Monoterpenoides/ +!datasets/Fingerprint/*.txt +!datasets/Cuneiform/*.txt +notebooks/results/* +notebooks/check_gm/* +notebooks/test_parallel/* +requirements/* +gklearn/model.py +gklearn/kernels/*_sym.py +*.npy +*.eps +*.dat +*.pyc + +gklearn/preimage/* +!gklearn/preimage/*.py +!gklearn/preimage/experiments/*.py +!gklearn/preimage/experiments/tools/*.py + +__pycache__ +##*# + +docs/build/* +!docs/build/latex/*.pdf +docs/log* + +*.egg-info +dist/ +build/ + +.coverage +htmlcov + +virtualenv + +.vscode/ + +# gedlibpy +gklearn/gedlib/build/ +gklearn/gedlib/build/__pycache__/ +gklearn/gedlib/collections/ +gklearn/gedlib/Median_Example/ +gklearn/gedlib/build/include/gedlib-master/median/collections/ +gklearn/gedlib/include/ +gklearn/gedlib/libgxlgedlib.so + +# misc +notebooks/preimage/ +notebooks/unfinished +gklearn/kernels/else/ +gklearn/kernels/unfinished/ +gklearn/kernels/.tags + +# pyenv +.python-version + +# docker travis debug. +ci.sh + +# outputs. +outputs/ + +# pyCharm. 
+.idea/ diff --git a/lang/fr/.readthedocs.yml b/lang/fr/.readthedocs.yml new file mode 100644 index 0000000000..32329e3116 --- /dev/null +++ b/lang/fr/.readthedocs.yml @@ -0,0 +1,27 @@ +--- +#.readthedocs.yml +#Read the Docs configuration file +#See https://docs.readthedocs.io/en/stable/config-file/v2.html for details +#Required +version: 2 +#Build documentation in the docs/ directory with Sphinx +sphinx: + configuration: docs/source/conf.py +#Build documentation with MkDocs +#mkdocs: +#configuration: mkdocs.yml +#Optionally build your docs in additional formats such as PDF and ePub +formats: all +#Optionally set the version of Python and requirements required to build your docs +python: + version: 3.6 + install: + - + requirements: docs/requirements.txt + - + requirements: requirements.txt + - + method: pip + path: . + extra_requirements: + - docs diff --git a/lang/fr/.travis.yml b/lang/fr/.travis.yml new file mode 100644 index 0000000000..d7786c7a6a --- /dev/null +++ b/lang/fr/.travis.yml @@ -0,0 +1,22 @@ +--- +language: python +python: + - '3.6' + - '3.7' + - '3.8' +before_install: + - python --version + - pip install -U pip + - pip install -U pytest + - pip install codecov + - pip install coverage + - pip install pytest-cov + - sudo apt-get -y install gfortran +install: + - pip install -r requirements.txt + - pip install wheel +script: + - python setup.py bdist_wheel + - if [ $TRAVIS_PYTHON_VERSION == 3.6 ]; then pytest -v --cov-config=.coveragerc --cov-report term --cov=gklearn gklearn/tests/; else pytest -v --cov-config=.coveragerc --cov-report term --cov=gklearn gklearn/tests/ --ignore=gklearn/tests/test_median_preimage_generator.py; fi +after_success: + - codecov diff --git a/lang/fr/LICENSE b/lang/fr/LICENSE new file mode 100644 index 0000000000..94a9ed024d --- /dev/null +++ b/lang/fr/LICENSE @@ -0,0 +1,674 @@ + GNU GENERAL PUBLIC LICENSE + Version 3, 29 June 2007 + + Copyright (C) 2007 Free Software Foundation, Inc. + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The GNU General Public License is a free, copyleft license for +software and other kinds of works. + + The licenses for most software and other practical works are designed +to take away your freedom to share and change the works. By contrast, +the GNU General Public License is intended to guarantee your freedom to +share and change all versions of a program--to make sure it remains free +software for all its users. We, the Free Software Foundation, use the +GNU General Public License for most of our software; it applies also to +any other work released this way by its authors. You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +them if you wish), that you receive source code or can get it if you +want it, that you can change the software or use pieces of it in new +free programs, and that you know you can do these things. + + To protect your rights, we need to prevent others from denying you +these rights or asking you to surrender the rights. Therefore, you have +certain responsibilities if you distribute copies of the software, or if +you modify it: responsibilities to respect the freedom of others. 
+ + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must pass on to the recipients the same +freedoms that you received. You must make sure that they, too, receive +or can get the source code. And you must show them these terms so they +know their rights. + + Developers that use the GNU GPL protect your rights with two steps: +(1) assert copyright on the software, and (2) offer you this License +giving you legal permission to copy, distribute and/or modify it. + + For the developers' and authors' protection, the GPL clearly explains +that there is no warranty for this free software. For both users' and +authors' sake, the GPL requires that modified versions be marked as +changed, so that their problems will not be attributed erroneously to +authors of previous versions. + + Some devices are designed to deny users access to install or run +modified versions of the software inside them, although the manufacturer +can do so. This is fundamentally incompatible with the aim of +protecting users' freedom to change the software. The systematic +pattern of such abuse occurs in the area of products for individuals to +use, which is precisely where it is most unacceptable. Therefore, we +have designed this version of the GPL to prohibit the practice for those +products. If such problems arise substantially in other domains, we +stand ready to extend this provision to those domains in future versions +of the GPL, as needed to protect the freedom of users. + + Finally, every program is threatened constantly by software patents. +States should not allow patents to restrict development and use of +software on general-purpose computers, but in those that do, we wish to +avoid the special danger that patents applied to a free program could +make it effectively proprietary. To prevent this, the GPL assures that +patents cannot be used to render the program non-free. + + The precise terms and conditions for copying, distribution and +modification follow. + + TERMS AND CONDITIONS + + 0. Definitions. + + "This License" refers to version 3 of the GNU General Public License. + + "Copyright" also means copyright-like laws that apply to other kinds of +works, such as semiconductor masks. + + "The Program" refers to any copyrightable work licensed under this +License. Each licensee is addressed as "you". "Licensees" and +"recipients" may be individuals or organizations. + + To "modify" a work means to copy from or adapt all or part of the work +in a fashion requiring copyright permission, other than the making of an +exact copy. The resulting work is called a "modified version" of the +earlier work or a work "based on" the earlier work. + + A "covered work" means either the unmodified Program or a work based +on the Program. + + To "propagate" a work means to do anything with it that, without +permission, would make you directly or secondarily liable for +infringement under applicable copyright law, except executing it on a +computer or modifying a private copy. Propagation includes copying, +distribution (with or without modification), making available to the +public, and in some countries other activities as well. + + To "convey" a work means any kind of propagation that enables other +parties to make or receive copies. Mere interaction with a user through +a computer network, with no transfer of a copy, is not conveying. 
+ + An interactive user interface displays "Appropriate Legal Notices" +to the extent that it includes a convenient and prominently visible +feature that (1) displays an appropriate copyright notice, and (2) +tells the user that there is no warranty for the work (except to the +extent that warranties are provided), that licensees may convey the +work under this License, and how to view a copy of this License. If +the interface presents a list of user commands or options, such as a +menu, a prominent item in the list meets this criterion. + + 1. Source Code. + + The "source code" for a work means the preferred form of the work +for making modifications to it. "Object code" means any non-source +form of a work. + + A "Standard Interface" means an interface that either is an official +standard defined by a recognized standards body, or, in the case of +interfaces specified for a particular programming language, one that +is widely used among developers working in that language. + + The "System Libraries" of an executable work include anything, other +than the work as a whole, that (a) is included in the normal form of +packaging a Major Component, but which is not part of that Major +Component, and (b) serves only to enable use of the work with that +Major Component, or to implement a Standard Interface for which an +implementation is available to the public in source code form. A +"Major Component", in this context, means a major essential component +(kernel, window system, and so on) of the specific operating system +(if any) on which the executable work runs, or a compiler used to +produce the work, or an object code interpreter used to run it. + + The "Corresponding Source" for a work in object code form means all +the source code needed to generate, install, and (for an executable +work) run the object code and to modify the work, including scripts to +control those activities. However, it does not include the work's +System Libraries, or general-purpose tools or generally available free +programs which are used unmodified in performing those activities but +which are not part of the work. For example, Corresponding Source +includes interface definition files associated with source files for +the work, and the source code for shared libraries and dynamically +linked subprograms that the work is specifically designed to require, +such as by intimate data communication or control flow between those +subprograms and other parts of the work. + + The Corresponding Source need not include anything that users +can regenerate automatically from other parts of the Corresponding +Source. + + The Corresponding Source for a work in source code form is that +same work. + + 2. Basic Permissions. + + All rights granted under this License are granted for the term of +copyright on the Program, and are irrevocable provided the stated +conditions are met. This License explicitly affirms your unlimited +permission to run the unmodified Program. The output from running a +covered work is covered by this License only if the output, given its +content, constitutes a covered work. This License acknowledges your +rights of fair use or other equivalent, as provided by copyright law. + + You may make, run and propagate covered works that you do not +convey, without conditions so long as your license otherwise remains +in force. 
You may convey covered works to others for the sole purpose +of having them make modifications exclusively for you, or provide you +with facilities for running those works, provided that you comply with +the terms of this License in conveying all material for which you do +not control copyright. Those thus making or running the covered works +for you must do so exclusively on your behalf, under your direction +and control, on terms that prohibit them from making any copies of +your copyrighted material outside their relationship with you. + + Conveying under any other circumstances is permitted solely under +the conditions stated below. Sublicensing is not allowed; section 10 +makes it unnecessary. + + 3. Protecting Users' Legal Rights From Anti-Circumvention Law. + + No covered work shall be deemed part of an effective technological +measure under any applicable law fulfilling obligations under article +11 of the WIPO copyright treaty adopted on 20 December 1996, or +similar laws prohibiting or restricting circumvention of such +measures. + + When you convey a covered work, you waive any legal power to forbid +circumvention of technological measures to the extent such circumvention +is effected by exercising rights under this License with respect to +the covered work, and you disclaim any intention to limit operation or +modification of the work as a means of enforcing, against the work's +users, your or third parties' legal rights to forbid circumvention of +technological measures. + + 4. Conveying Verbatim Copies. + + You may convey verbatim copies of the Program's source code as you +receive it, in any medium, provided that you conspicuously and +appropriately publish on each copy an appropriate copyright notice; +keep intact all notices stating that this License and any +non-permissive terms added in accord with section 7 apply to the code; +keep intact all notices of the absence of any warranty; and give all +recipients a copy of this License along with the Program. + + You may charge any price or no price for each copy that you convey, +and you may offer support or warranty protection for a fee. + + 5. Conveying Modified Source Versions. + + You may convey a work based on the Program, or the modifications to +produce it from the Program, in the form of source code under the +terms of section 4, provided that you also meet all of these conditions: + + a) The work must carry prominent notices stating that you modified + it, and giving a relevant date. + + b) The work must carry prominent notices stating that it is + released under this License and any conditions added under section + 7. This requirement modifies the requirement in section 4 to + "keep intact all notices". + + c) You must license the entire work, as a whole, under this + License to anyone who comes into possession of a copy. This + License will therefore apply, along with any applicable section 7 + additional terms, to the whole of the work, and all its parts, + regardless of how they are packaged. This License gives no + permission to license the work in any other way, but it does not + invalidate such permission if you have separately received it. + + d) If the work has interactive user interfaces, each must display + Appropriate Legal Notices; however, if the Program has interactive + interfaces that do not display Appropriate Legal Notices, your + work need not make them do so. 
+ + A compilation of a covered work with other separate and independent +works, which are not by their nature extensions of the covered work, +and which are not combined with it such as to form a larger program, +in or on a volume of a storage or distribution medium, is called an +"aggregate" if the compilation and its resulting copyright are not +used to limit the access or legal rights of the compilation's users +beyond what the individual works permit. Inclusion of a covered work +in an aggregate does not cause this License to apply to the other +parts of the aggregate. + + 6. Conveying Non-Source Forms. + + You may convey a covered work in object code form under the terms +of sections 4 and 5, provided that you also convey the +machine-readable Corresponding Source under the terms of this License, +in one of these ways: + + a) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by the + Corresponding Source fixed on a durable physical medium + customarily used for software interchange. + + b) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by a + written offer, valid for at least three years and valid for as + long as you offer spare parts or customer support for that product + model, to give anyone who possesses the object code either (1) a + copy of the Corresponding Source for all the software in the + product that is covered by this License, on a durable physical + medium customarily used for software interchange, for a price no + more than your reasonable cost of physically performing this + conveying of source, or (2) access to copy the + Corresponding Source from a network server at no charge. + + c) Convey individual copies of the object code with a copy of the + written offer to provide the Corresponding Source. This + alternative is allowed only occasionally and noncommercially, and + only if you received the object code with such an offer, in accord + with subsection 6b. + + d) Convey the object code by offering access from a designated + place (gratis or for a charge), and offer equivalent access to the + Corresponding Source in the same way through the same place at no + further charge. You need not require recipients to copy the + Corresponding Source along with the object code. If the place to + copy the object code is a network server, the Corresponding Source + may be on a different server (operated by you or a third party) + that supports equivalent copying facilities, provided you maintain + clear directions next to the object code saying where to find the + Corresponding Source. Regardless of what server hosts the + Corresponding Source, you remain obligated to ensure that it is + available for as long as needed to satisfy these requirements. + + e) Convey the object code using peer-to-peer transmission, provided + you inform other peers where the object code and Corresponding + Source of the work are being offered to the general public at no + charge under subsection 6d. + + A separable portion of the object code, whose source code is excluded +from the Corresponding Source as a System Library, need not be +included in conveying the object code work. + + A "User Product" is either (1) a "consumer product", which means any +tangible personal property which is normally used for personal, family, +or household purposes, or (2) anything designed or sold for incorporation +into a dwelling. 
In determining whether a product is a consumer product, +doubtful cases shall be resolved in favor of coverage. For a particular +product received by a particular user, "normally used" refers to a +typical or common use of that class of product, regardless of the status +of the particular user or of the way in which the particular user +actually uses, or expects or is expected to use, the product. A product +is a consumer product regardless of whether the product has substantial +commercial, industrial or non-consumer uses, unless such uses represent +the only significant mode of use of the product. + + "Installation Information" for a User Product means any methods, +procedures, authorization keys, or other information required to install +and execute modified versions of a covered work in that User Product from +a modified version of its Corresponding Source. The information must +suffice to ensure that the continued functioning of the modified object +code is in no case prevented or interfered with solely because +modification has been made. + + If you convey an object code work under this section in, or with, or +specifically for use in, a User Product, and the conveying occurs as +part of a transaction in which the right of possession and use of the +User Product is transferred to the recipient in perpetuity or for a +fixed term (regardless of how the transaction is characterized), the +Corresponding Source conveyed under this section must be accompanied +by the Installation Information. But this requirement does not apply +if neither you nor any third party retains the ability to install +modified object code on the User Product (for example, the work has +been installed in ROM). + + The requirement to provide Installation Information does not include a +requirement to continue to provide support service, warranty, or updates +for a work that has been modified or installed by the recipient, or for +the User Product in which it has been modified or installed. Access to a +network may be denied when the modification itself materially and +adversely affects the operation of the network or violates the rules and +protocols for communication across the network. + + Corresponding Source conveyed, and Installation Information provided, +in accord with this section must be in a format that is publicly +documented (and with an implementation available to the public in +source code form), and must require no special password or key for +unpacking, reading or copying. + + 7. Additional Terms. + + "Additional permissions" are terms that supplement the terms of this +License by making exceptions from one or more of its conditions. +Additional permissions that are applicable to the entire Program shall +be treated as though they were included in this License, to the extent +that they are valid under applicable law. If additional permissions +apply only to part of the Program, that part may be used separately +under those permissions, but the entire Program remains governed by +this License without regard to the additional permissions. + + When you convey a copy of a covered work, you may at your option +remove any additional permissions from that copy, or from any part of +it. (Additional permissions may be written to require their own +removal in certain cases when you modify the work.) You may place +additional permissions on material, added by you to a covered work, +for which you have or can give appropriate copyright permission. 
+ + Notwithstanding any other provision of this License, for material you +add to a covered work, you may (if authorized by the copyright holders of +that material) supplement the terms of this License with terms: + + a) Disclaiming warranty or limiting liability differently from the + terms of sections 15 and 16 of this License; or + + b) Requiring preservation of specified reasonable legal notices or + author attributions in that material or in the Appropriate Legal + Notices displayed by works containing it; or + + c) Prohibiting misrepresentation of the origin of that material, or + requiring that modified versions of such material be marked in + reasonable ways as different from the original version; or + + d) Limiting the use for publicity purposes of names of licensors or + authors of the material; or + + e) Declining to grant rights under trademark law for use of some + trade names, trademarks, or service marks; or + + f) Requiring indemnification of licensors and authors of that + material by anyone who conveys the material (or modified versions of + it) with contractual assumptions of liability to the recipient, for + any liability that these contractual assumptions directly impose on + those licensors and authors. + + All other non-permissive additional terms are considered "further +restrictions" within the meaning of section 10. If the Program as you +received it, or any part of it, contains a notice stating that it is +governed by this License along with a term that is a further +restriction, you may remove that term. If a license document contains +a further restriction but permits relicensing or conveying under this +License, you may add to a covered work material governed by the terms +of that license document, provided that the further restriction does +not survive such relicensing or conveying. + + If you add terms to a covered work in accord with this section, you +must place, in the relevant source files, a statement of the +additional terms that apply to those files, or a notice indicating +where to find the applicable terms. + + Additional terms, permissive or non-permissive, may be stated in the +form of a separately written license, or stated as exceptions; +the above requirements apply either way. + + 8. Termination. + + You may not propagate or modify a covered work except as expressly +provided under this License. Any attempt otherwise to propagate or +modify it is void, and will automatically terminate your rights under +this License (including any patent licenses granted under the third +paragraph of section 11). + + However, if you cease all violation of this License, then your +license from a particular copyright holder is reinstated (a) +provisionally, unless and until the copyright holder explicitly and +finally terminates your license, and (b) permanently, if the copyright +holder fails to notify you of the violation by some reasonable means +prior to 60 days after the cessation. + + Moreover, your license from a particular copyright holder is +reinstated permanently if the copyright holder notifies you of the +violation by some reasonable means, this is the first time you have +received notice of violation of this License (for any work) from that +copyright holder, and you cure the violation prior to 30 days after +your receipt of the notice. + + Termination of your rights under this section does not terminate the +licenses of parties who have received copies or rights from you under +this License. 
If your rights have been terminated and not permanently +reinstated, you do not qualify to receive new licenses for the same +material under section 10. + + 9. Acceptance Not Required for Having Copies. + + You are not required to accept this License in order to receive or +run a copy of the Program. Ancillary propagation of a covered work +occurring solely as a consequence of using peer-to-peer transmission +to receive a copy likewise does not require acceptance. However, +nothing other than this License grants you permission to propagate or +modify any covered work. These actions infringe copyright if you do +not accept this License. Therefore, by modifying or propagating a +covered work, you indicate your acceptance of this License to do so. + + 10. Automatic Licensing of Downstream Recipients. + + Each time you convey a covered work, the recipient automatically +receives a license from the original licensors, to run, modify and +propagate that work, subject to this License. You are not responsible +for enforcing compliance by third parties with this License. + + An "entity transaction" is a transaction transferring control of an +organization, or substantially all assets of one, or subdividing an +organization, or merging organizations. If propagation of a covered +work results from an entity transaction, each party to that +transaction who receives a copy of the work also receives whatever +licenses to the work the party's predecessor in interest had or could +give under the previous paragraph, plus a right to possession of the +Corresponding Source of the work from the predecessor in interest, if +the predecessor has it or can get it with reasonable efforts. + + You may not impose any further restrictions on the exercise of the +rights granted or affirmed under this License. For example, you may +not impose a license fee, royalty, or other charge for exercise of +rights granted under this License, and you may not initiate litigation +(including a cross-claim or counterclaim in a lawsuit) alleging that +any patent claim is infringed by making, using, selling, offering for +sale, or importing the Program or any portion of it. + + 11. Patents. + + A "contributor" is a copyright holder who authorizes use under this +License of the Program or a work on which the Program is based. The +work thus licensed is called the contributor's "contributor version". + + A contributor's "essential patent claims" are all patent claims +owned or controlled by the contributor, whether already acquired or +hereafter acquired, that would be infringed by some manner, permitted +by this License, of making, using, or selling its contributor version, +but do not include claims that would be infringed only as a +consequence of further modification of the contributor version. For +purposes of this definition, "control" includes the right to grant +patent sublicenses in a manner consistent with the requirements of +this License. + + Each contributor grants you a non-exclusive, worldwide, royalty-free +patent license under the contributor's essential patent claims, to +make, use, sell, offer for sale, import and otherwise run, modify and +propagate the contents of its contributor version. + + In the following three paragraphs, a "patent license" is any express +agreement or commitment, however denominated, not to enforce a patent +(such as an express permission to practice a patent or covenant not to +sue for patent infringement). 
To "grant" such a patent license to a +party means to make such an agreement or commitment not to enforce a +patent against the party. + + If you convey a covered work, knowingly relying on a patent license, +and the Corresponding Source of the work is not available for anyone +to copy, free of charge and under the terms of this License, through a +publicly available network server or other readily accessible means, +then you must either (1) cause the Corresponding Source to be so +available, or (2) arrange to deprive yourself of the benefit of the +patent license for this particular work, or (3) arrange, in a manner +consistent with the requirements of this License, to extend the patent +license to downstream recipients. "Knowingly relying" means you have +actual knowledge that, but for the patent license, your conveying the +covered work in a country, or your recipient's use of the covered work +in a country, would infringe one or more identifiable patents in that +country that you have reason to believe are valid. + + If, pursuant to or in connection with a single transaction or +arrangement, you convey, or propagate by procuring conveyance of, a +covered work, and grant a patent license to some of the parties +receiving the covered work authorizing them to use, propagate, modify +or convey a specific copy of the covered work, then the patent license +you grant is automatically extended to all recipients of the covered +work and works based on it. + + A patent license is "discriminatory" if it does not include within +the scope of its coverage, prohibits the exercise of, or is +conditioned on the non-exercise of one or more of the rights that are +specifically granted under this License. You may not convey a covered +work if you are a party to an arrangement with a third party that is +in the business of distributing software, under which you make payment +to the third party based on the extent of your activity of conveying +the work, and under which the third party grants, to any of the +parties who would receive the covered work from you, a discriminatory +patent license (a) in connection with copies of the covered work +conveyed by you (or copies made from those copies), or (b) primarily +for and in connection with specific products or compilations that +contain the covered work, unless you entered into that arrangement, +or that patent license was granted, prior to 28 March 2007. + + Nothing in this License shall be construed as excluding or limiting +any implied license or other defenses to infringement that may +otherwise be available to you under applicable patent law. + + 12. No Surrender of Others' Freedom. + + If conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot convey a +covered work so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you may +not convey it at all. For example, if you agree to terms that obligate you +to collect a royalty for further conveying from those to whom you convey +the Program, the only way you could satisfy both those terms and this +License would be to refrain entirely from conveying the Program. + + 13. Use with the GNU Affero General Public License. 
+ + Notwithstanding any other provision of this License, you have +permission to link or combine any covered work with a work licensed +under version 3 of the GNU Affero General Public License into a single +combined work, and to convey the resulting work. The terms of this +License will continue to apply to the part which is the covered work, +but the special requirements of the GNU Affero General Public License, +section 13, concerning interaction through a network will apply to the +combination as such. + + 14. Revised Versions of this License. + + The Free Software Foundation may publish revised and/or new versions of +the GNU General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + + Each version is given a distinguishing version number. If the +Program specifies that a certain numbered version of the GNU General +Public License "or any later version" applies to it, you have the +option of following the terms and conditions either of that numbered +version or of any later version published by the Free Software +Foundation. If the Program does not specify a version number of the +GNU General Public License, you may choose any version ever published +by the Free Software Foundation. + + If the Program specifies that a proxy can decide which future +versions of the GNU General Public License can be used, that proxy's +public statement of acceptance of a version permanently authorizes you +to choose that version for the Program. + + Later license versions may give you additional or different +permissions. However, no additional obligations are imposed on any +author or copyright holder as a result of your choosing to follow a +later version. + + 15. Disclaimer of Warranty. + + THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY +APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT +HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY +OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, +THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM +IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF +ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + + 16. Limitation of Liability. + + IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS +THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY +GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE +USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF +DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD +PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), +EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF +SUCH DAMAGES. + + 17. Interpretation of Sections 15 and 16. + + If the disclaimer of warranty and limitation of liability provided +above cannot be given local legal effect according to their terms, +reviewing courts shall apply local law that most closely approximates +an absolute waiver of all civil liability in connection with the +Program, unless a warranty or assumption of liability accompanies a +copy of the Program in return for a fee. 
+
+                     END OF TERMS AND CONDITIONS
+
+            How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+  If the program does terminal interaction, make it output a short
+notice like this when it starts in an interactive mode:
+
+    <program>  Copyright (C) <year>  <name of author>
+    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, your program's commands
+might be different; for a GUI interface, you would use an "about box".
+
+  You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU GPL, see
+<https://www.gnu.org/licenses/>.
+
+  The GNU General Public License does not permit incorporating your program
+into proprietary programs.  If your program is a subroutine library, you
+may consider it more useful to permit linking proprietary applications with
+the library.  If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.  But first, please read
+<https://www.gnu.org/licenses/why-not-lgpl.html>.
diff --git a/lang/fr/Problems.md b/lang/fr/Problems.md
new file mode 100644
index 0000000000..cb7dd1e1b1
--- /dev/null
+++ b/lang/fr/Problems.md
@@ -0,0 +1,23 @@
+# About graph kernels.
+
+## (Random walk) Sylvester equation kernel.
+
+### ImportError: cannot import name 'frange' from 'matplotlib.mlab'
+
+You are using an outdated `control` package with a recent `matplotlib`. `mlab.frange` was removed in `matplotlib-3.1.0`, and `control` removed the call in `control-0.8.2`.
+
+Update your `control` package (e.g. `pip install -U control`).
+
+### Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so.
+
+The Intel Math Kernel Library (MKL) is missing or not properly configured. MKL appears to be required by the `control` module.
+
+Install MKL. Then add the following to your environment:
+
+```
+export PATH=/opt/intel/bin:$PATH
+
+export LD_LIBRARY_PATH=/opt/intel/lib/intel64:/opt/intel/mkl/lib/intel64:$LD_LIBRARY_PATH
+
+export LD_PRELOAD=/opt/intel/mkl/lib/intel64/libmkl_def.so:/opt/intel/mkl/lib/intel64/libmkl_avx2.so:/opt/intel/mkl/lib/intel64/libmkl_core.so:/opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so:/opt/intel/mkl/lib/intel64/libmkl_intel_thread.so:/opt/intel/lib/intel64_lin/libiomp5.so
+```
diff --git a/lang/fr/README.md b/lang/fr/README.md
new file mode 100644
index 0000000000..d980044e31
--- /dev/null
+++ b/lang/fr/README.md
@@ -0,0 +1,165 @@
+# graphkit-learn
+[![Build Status](https://travis-ci.org/jajupmochi/graphkit-learn.svg?branch=master)](https://travis-ci.org/jajupmochi/graphkit-learn) [![Build status](https://ci.appveyor.com/api/projects/status/bdxsolk0t1uji9rd?svg=true)](https://ci.appveyor.com/project/jajupmochi/graphkit-learn) [![codecov](https://codecov.io/gh/jajupmochi/graphkit-learn/branch/master/graph/badge.svg)](https://codecov.io/gh/jajupmochi/graphkit-learn) [![Documentation Status](https://readthedocs.org/projects/graphkit-learn/badge/?version=master)](https://graphkit-learn.readthedocs.io/en/master/?badge=master) [![PyPI version](https://badge.fury.io/py/graphkit-learn.svg)](https://badge.fury.io/py/graphkit-learn)
+
+A Python package for graph kernels, graph edit distances and the graph pre-image problem.
+
+## Requirements
+
+* python>=3.6
+* numpy>=1.16.2
+* scipy>=1.1.0
+* matplotlib>=3.1.0
+* networkx>=2.2
+* scikit-learn>=0.20.0
+* tabulate>=0.8.2
+* tqdm>=4.26.0
+* control>=0.8.2 (for generalized random walk kernels only)
+* slycot>0.4.0 (for generalized random walk kernels only; requires a Fortran compiler such as gfortran)
+
+## How to use?
+
+### Install the library
+
+* Install the stable version from PyPI (may not be up to date):
+```
+$ pip install graphkit-learn
+```
+
+* Install the latest version from GitHub:
+```
+$ git clone https://github.com/jajupmochi/graphkit-learn.git
+$ cd graphkit-learn/
+$ python setup.py install
+```
+
+### Run the tests
+
+A series of [tests](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/tests) can be run to check whether the library works correctly:
+```
+$ pip install -U pip pytest codecov coverage pytest-cov
+$ pytest -v --cov-config=.coveragerc --cov-report term --cov=gklearn gklearn/tests/
+```
+
+### Check examples
+
+A series of demos showing how to use the library can be found on [Google Colab](https://drive.google.com/drive/folders/1r2gtPuFzIys2_MZw1wXqE2w3oCoVoQUG?usp=sharing) and in the [`examples`](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/examples) folder.
+
+### Other demos
+
+Check the [`notebooks`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks) directory for more demos:
+* the [`notebooks`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks) directory includes test code for graph kernels based on linear patterns;
+* the [`notebooks/tests`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks/tests) directory includes code that tests some libraries and functions;
+* the [`notebooks/utils`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks/utils) directory includes some useful tools, such as a Gram matrix checker (a minimal version is sketched after this list) and a function to get properties of datasets;
+* the [`notebooks/else`](https://github.com/jajupmochi/graphkit-learn/tree/master/notebooks/else) directory includes other code used for our experiments.
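+
+As a rough illustration of what such a Gram matrix checker does (a sketch, not the notebook's actual code): a valid kernel must produce a symmetric positive semi-definite Gram matrix, which can be verified numerically with `numpy`:
+```
+import numpy as np
+
+def check_gram_matrix(gram, tol=1e-10):
+    """Check that a Gram matrix is symmetric positive semi-definite."""
+    if not np.allclose(gram, gram.T, atol=tol):
+        return False, 'not symmetric'
+    # A symmetric matrix is PSD iff its smallest eigenvalue is >= 0
+    # (up to a numerical tolerance).
+    min_eig = np.linalg.eigvalsh(gram).min()
+    if min_eig < -tol:
+        return False, 'smallest eigenvalue is %g' % min_eig
+    return True, 'ok'
+```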
+
+### Documentation
+
+The documentation of the library can be found [here](https://graphkit-learn.readthedocs.io/en/master/?badge=master).
+
+## Main contents
+
+### 1 List of graph kernels
+
+* Based on walks
+  * [The common walk kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/common_walk.py) [1]
+    * Exponential
+    * Geometric
+  * [The marginalized kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/marginalized.py)
+    * With tottering [2]
+    * Without tottering [7]
+  * [The generalized random walk kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/random_walk.py) [3]
+    * [Sylvester equation](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/sylvester_equation.py)
+    * Conjugate gradient
+    * Fixed-point iterations
+    * [Spectral decomposition](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/spectral_decomposition.py)
+* Based on paths
+  * [The shortest path kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/shortest_path.py) [4]
+  * [The structural shortest path kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/structural_sp.py) [5]
+  * [The path kernel up to length h](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/path_up_to_h.py) [6]
+    * The Tanimoto kernel
+    * The MinMax kernel
+* Non-linear kernels
+  * [The treelet kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/treelet.py) [10]
+  * [The Weisfeiler-Lehman kernel](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/weisfeiler_lehman.py) [11]
+    * [Subtree](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/kernels/weisfeiler_lehman.py#L479)
+
+A demo of computing graph kernels can be found on [Google Colab](https://colab.research.google.com/drive/17Q2QCl9CAtDweGF8LiWnWoN2laeJqT0u?usp=sharing) and in the [`examples`](https://github.com/jajupmochi/graphkit-learn/blob/master/gklearn/examples/compute_graph_kernel.py) folder.
+
+### 2 Graph Edit Distances
+
+### 3 Graph preimage methods
+
+A demo of generating graph preimages can be found on [Google Colab](https://colab.research.google.com/drive/1PIDvHOcmiLEQ5Np3bgBDdu0kLOquOMQK?usp=sharing) and in the [`examples`](https://github.com/jajupmochi/graphkit-learn/blob/master/gklearn/examples/median_preimege_generator.py) folder.
+
+### 4 Interface to `GEDLIB`
+
+[`GEDLIB`](https://github.com/dbblumenthal/gedlib) is an easily extensible C++ library for (suboptimally) computing the graph edit distance between attributed graphs. [A Python interface](https://github.com/jajupmochi/graphkit-learn/tree/master/gklearn/gedlib) for `GEDLIB` is integrated in this library, based on the [`gedlibpy`](https://github.com/Ryurin/gedlibpy) library.
+
+### 5 Computation optimization methods
+
+* Python’s `multiprocessing.Pool` module is used to **parallelize** the computation of all kernels, as well as the model selection (a schematic sketch of this pattern follows this list).
+* **The Fast Computation of Shortest Path Kernel (FCSP) method** [8] is implemented in *the random walk kernel*, *the shortest path kernel* and *the structural shortest path kernel*, where it is applied to both vertex and edge kernels.
+* **The trie data structure** [9] is employed in *the path kernel up to length h* to store paths in graphs.
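+
+As a rough, self-contained sketch of that parallelization pattern (illustrative only, not the library's actual code; the helper names `base_kernel` and `compute_gram` are made up for this sketch), pairwise kernel values can be farmed out to worker processes with `multiprocessing.Pool.imap_unordered`:
+```
+import multiprocessing
+from itertools import combinations_with_replacement
+
+import numpy as np
+
+def base_kernel(g1, g2):
+    """Placeholder for a kernel between two graphs (trivial stand-in)."""
+    return float(len(g1) == len(g2))
+
+def _entry(args):
+    i, j, g1, g2 = args
+    return i, j, base_kernel(g1, g2)
+
+def compute_gram(graphs, n_jobs=None):
+    n = len(graphs)
+    gram = np.zeros((n, n))
+    tasks = [(i, j, graphs[i], graphs[j])
+             for i, j in combinations_with_replacement(range(n), 2)]
+    # On Windows, call compute_gram() under `if __name__ == '__main__':`.
+    with multiprocessing.Pool(n_jobs) as pool:
+        # imap_unordered yields each entry as soon as a worker finishes it,
+        # so results arrive in no particular order.
+        for i, j, value in pool.imap_unordered(_entry, tasks):
+            gram[i, j] = gram[j, i] = value
+    return gram
+```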
+
+## Issues
+
+* This library uses the `multiprocessing.Pool.imap_unordered` function for parallelization, which may not run correctly on Windows. For now, Windows users may need to comment out the parallel code and uncomment the serial code below it. We will consider adding a parameter to switch between serial and parallel computation.
+
+* Some modules (such as `Numpy`, `Scipy`, `sklearn`) use [`OpenBLAS`](https://www.openblas.net/) to perform parallel computation by default, which conflicts with other parallelization modules such as `multiprocessing.Pool` and greatly increases computing time. Limiting `OpenBLAS` to a single thread/CPU avoids these conflicts. For now, this has to be done manually. Under Linux, type this command in a terminal before running the code:
+```
+$ export OPENBLAS_NUM_THREADS=1
+```
+Or add `export OPENBLAS_NUM_THREADS=1` at the end of your `~/.bashrc` file, then run
+```
+$ source ~/.bashrc
+```
+to make the setting permanent.
+
+## Results
+
+See this paper for a detailed description of the graph kernels and the experimental results:
+
+Linlin Jia, Benoit Gaüzère, and Paul Honeine. Graph Kernels Based on Linear Patterns: Theoretical and Experimental Comparisons. Working paper or preprint, March 2019. URL https://hal-normandie-univ.archives-ouvertes.fr/hal-02053946.
+
+A comparison of the performance of the graph kernels on benchmark datasets can be found [here](https://graphkit-learn.readthedocs.io/en/master/experiments.html).
+
+## How to contribute
+
+Fork the library and open a pull request! Make your own contribution to the community!
+
+## Authors
+
+* [Linlin Jia](https://jajupmochi.github.io/), LITIS, INSA Rouen Normandie
+* [Benoit Gaüzère](http://pagesperso.litislab.fr/~bgauzere/#contact_en), LITIS, INSA Rouen Normandie
+* [Paul Honeine](http://honeine.fr/paul/Welcome.html), LITIS, Université de Rouen Normandie
+
+## Citation
+
+Still waiting...
+
+## Acknowledgments
+
+This research was supported by CSC (China Scholarship Council) and the French national research agency (ANR) under the grant APi (ANR-18-CE23-0014). The authors would like to thank the CRIANN (Le Centre Régional Informatique et d’Applications Numériques de Normandie) for providing computational resources.
+
+## References
+[1] Thomas Gärtner, Peter Flach, and Stefan Wrobel. On graph kernels: Hardness results and efficient alternatives. Learning Theory and Kernel Machines, pages 129–143, 2003.
+
+[2] H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. In Proceedings of the 20th International Conference on Machine Learning, Washington, DC, United States, 2003.
+
+[3] Vishwanathan, S.V.N., Schraudolph, N.N., Kondor, R., Borgwardt, K.M., 2010. Graph kernels. Journal of Machine Learning Research 11, 1201–1242.
+
+[4] K. M. Borgwardt and H.-P. Kriegel. Shortest-path kernels on graphs. In Proceedings of the International Conference on Data Mining, pages 74-81, 2005.
+
+[5] Liva Ralaivola, Sanjay J Swamidass, Hiroto Saigo, and Pierre Baldi. Graph kernels for chemical informatics. Neural Networks, 18(8):1093–1110, 2005.
+
+[6] Suard F, Rakotomamonjy A, Bensrhair A. Kernel on Bag of Paths For Measuring Similarity of Shapes. In ESANN 2007 Apr 25 (pp. 355-360).
+
+[7] Mahé, P., Ueda, N., Akutsu, T., Perret, J.L., Vert, J.P., 2004. Extensions of marginalized graph kernels, in: Proc. the twenty-first international conference on Machine learning, ACM. p. 70.
+ +[8] Lifan Xu, Wei Wang, M Alvarez, John Cavazos, and Dongping Zhang. Parallelization of shortest path graph kernels on multi-core cpus and gpus. Proceedings of the Programmability Issues for Heterogeneous Multicores (MultiProg), Vienna, Austria, 2014. + +[9] Edward Fredkin. Trie memory. Communications of the ACM, 3(9):490–499, 1960. + +[10] Gaüzere, B., Brun, L., Villemin, D., 2012. Two new graphs kernels in chemoinformatics. Pattern Recognition Letters 33, 2038–2047. + +[11] Shervashidze, N., Schweitzer, P., Leeuwen, E.J.v., Mehlhorn, K., Borgwardt, K.M., 2011. Weisfeiler-lehman graph kernels. Journal of Machine Learning Research 12, 2539–2561. diff --git a/lang/fr/docs/Makefile b/lang/fr/docs/Makefile new file mode 100644 index 0000000000..69fe55ecfa --- /dev/null +++ b/lang/fr/docs/Makefile @@ -0,0 +1,19 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line. +SPHINXOPTS = +SPHINXBUILD = sphinx-build +SOURCEDIR = source +BUILDDIR = build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) \ No newline at end of file diff --git a/lang/fr/docs/commands.md b/lang/fr/docs/commands.md new file mode 100644 index 0000000000..ff7cc4cd79 --- /dev/null +++ b/lang/fr/docs/commands.md @@ -0,0 +1,5 @@ +sphinx-apidoc -o docs/ gklearn/ --separate + +sphinx-apidoc -o source/ ../gklearn/ --separate --force --module-first --no-toc + +make html diff --git a/lang/fr/docs/make.bat b/lang/fr/docs/make.bat new file mode 100644 index 0000000000..543c6b13b4 --- /dev/null +++ b/lang/fr/docs/make.bat @@ -0,0 +1,35 @@ +@ECHO OFF + +pushd %~dp0 + +REM Command file for Sphinx documentation + +if "%SPHINXBUILD%" == "" ( + set SPHINXBUILD=sphinx-build +) +set SOURCEDIR=source +set BUILDDIR=build + +if "%1" == "" goto help + +%SPHINXBUILD% >NUL 2>NUL +if errorlevel 9009 ( + echo. + echo.The 'sphinx-build' command was not found. Make sure you have Sphinx + echo.installed, then set the SPHINXBUILD environment variable to point + echo.to the full path of the 'sphinx-build' executable. Alternatively you + echo.may add the Sphinx directory to PATH. + echo. + echo.If you don't have Sphinx installed, grab it from + echo.http://sphinx-doc.org/ + exit /b 1 +) + +%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% +goto end + +:help +%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% + +:end +popd diff --git a/lang/fr/docs/requirements.txt b/lang/fr/docs/requirements.txt new file mode 100644 index 0000000000..52189409e2 --- /dev/null +++ b/lang/fr/docs/requirements.txt @@ -0,0 +1,4 @@ +sphinx +m2r +nbsphinx +ipykernel diff --git a/lang/fr/docs/source/conf.py b/lang/fr/docs/source/conf.py new file mode 100644 index 0000000000..b0fae5a482 --- /dev/null +++ b/lang/fr/docs/source/conf.py @@ -0,0 +1,194 @@ +# -*- coding: utf-8 -*- +# +# Configuration file for the Sphinx documentation builder. +# +# This file does only contain a selection of the most common options. 
For a +# full list see the documentation: +# http://www.sphinx-doc.org/en/master/config + +# -- Path setup -------------------------------------------------------------- + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. +# +import os +import sys +sys.path.insert(0, os.path.abspath('.')) +# sys.path.insert(0, os.path.abspath('..')) +sys.path.insert(0, '../') +sys.path.insert(0, '../../') + +# -- Project information ----------------------------------------------------- + +project = 'graphkit-learn' +copyright = '2020, Linlin Jia' +author = 'Linlin Jia' + +# The short X.Y version +version = '' +# The full version, including alpha/beta/rc tags +release = '1.0.0' + + +# -- General configuration --------------------------------------------------- + +# If your documentation needs a minimal Sphinx version, state it here. +# +# needs_sphinx = '1.0' + +# Add any Sphinx extension module names here, as strings. They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. +extensions = [ + 'sphinx.ext.autodoc', + 'sphinx.ext.doctest', + 'sphinx.ext.todo', + 'sphinx.ext.coverage', + 'sphinx.ext.mathjax', + 'sphinx.ext.ifconfig', + 'sphinx.ext.viewcode', + 'm2r', +] + +# Add any paths that contain templates here, relative to this directory. +templates_path = ['_templates'] + +# The suffix(es) of source filenames. +# You can specify multiple suffix as a list of string: +# +source_suffix = ['.rst', '.md'] +# source_suffix = '.rst' + +# The master toctree document. +master_doc = 'index' + +# The language for content autogenerated by Sphinx. Refer to documentation +# for a list of supported languages. +# +# This is also used if you do content translation via gettext catalogs. +# Usually you set "language" from the command line for these cases. +language = None + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This pattern also affects html_static_path and html_extra_path. +exclude_patterns = [] + +# The name of the Pygments (syntax highlighting) style to use. +pygments_style = None + + +# -- Options for HTML output ------------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +# +# html_theme = 'alabaster' +html_theme = 'sphinx_rtd_theme' + +# Theme options are theme-specific and customize the look and feel of a theme +# further. For a list of options available for each theme, see the +# documentation. +# +# html_theme_options = {} + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". +html_static_path = ['_static'] + +# Custom sidebar templates, must be a dictionary that maps document names +# to template names. +# +# The default sidebars (for documents that don't match any pattern) are +# defined by theme itself. Builtin themes are using these templates by +# default: ``['localtoc.html', 'relations.html', 'sourcelink.html', +# 'searchbox.html']``. +# +# html_sidebars = {} + + +# -- Options for HTMLHelp output --------------------------------------------- + +# Output file base name for HTML help builder. 
+htmlhelp_basename = 'graphkit-learndoc'
+
+
+# -- Options for LaTeX output ------------------------------------------------
+
+latex_elements = {
+    # The paper size ('letterpaper' or 'a4paper').
+    #
+    # 'papersize': 'letterpaper',
+
+    # The font size ('10pt', '11pt' or '12pt').
+    #
+    # 'pointsize': '10pt',
+
+    # Additional stuff for the LaTeX preamble.
+    #
+    # 'preamble': '',
+
+    # Latex figure (float) alignment
+    #
+    # 'figure_align': 'htbp',
+}
+
+# Grouping the document tree into LaTeX files. List of tuples
+# (source start file, target name, title,
+#  author, documentclass [howto, manual, or own class]).
+latex_documents = [
+    (master_doc, 'graphkit-learn.tex', 'graphkit-learn Documentation',
+     'Linlin Jia', 'manual'),
+]
+
+
+# -- Options for manual page output ------------------------------------------
+
+# One entry per manual page. List of tuples
+# (source start file, name, description, authors, manual section).
+man_pages = [
+    (master_doc, 'graphkit-learn', 'graphkit-learn Documentation',
+     [author], 1)
+]
+
+
+# -- Options for Texinfo output ----------------------------------------------
+
+# Grouping the document tree into Texinfo files. List of tuples
+# (source start file, target name, title, author,
+#  dir menu entry, description, category)
+texinfo_documents = [
+    (master_doc, 'graphkit-learn', 'graphkit-learn Documentation',
+     author, 'graphkit-learn', 'One line description of project.',
+     'Miscellaneous'),
+]
+
+
+# -- Options for Epub output -------------------------------------------------
+
+# Bibliographic Dublin Core info.
+epub_title = project
+
+# The unique identifier of the text. This can be a ISBN number
+# or the project homepage.
+#
+# epub_identifier = ''
+
+# A unique identification for the text.
+#
+# epub_uid = ''
+
+# A list of files that should not be packed into the epub file.
+epub_exclude_files = ['search.html']
+
+
+# -- Extension configuration -------------------------------------------------
+
+# -- Options for todo extension ----------------------------------------------
+
+# If true, `todo` and `todoList` produce output, else they produce nothing.
+todo_include_todos = True
+
+add_module_names = False
diff --git a/lang/fr/docs/source/experiments.rst b/lang/fr/docs/source/experiments.rst
new file mode 100644
index 0000000000..7d8d477afd
--- /dev/null
+++ b/lang/fr/docs/source/experiments.rst
@@ -0,0 +1,22 @@
+Experiments
+===========
+
+To exhibit the effectiveness and practicability of the `graphkit-learn` library, we tested it on several benchmark datasets. See `(Kersting et al., 2016) `__ for details on these datasets.
+
+A two-layer nested cross-validation (CV) is applied to select and evaluate models: the outer CV randomly splits the dataset into 10 folds, using 9 as the validation set, and the inner CV then randomly splits the validation set into 10 folds, using 9 as the training set. The whole procedure is performed 30 times, and the average performance is computed over these trials. Possible parameters of a graph kernel are also tuned during this procedure.
+
+The machine used to execute the experiments is a cluster with 28 CPU cores of Intel(R) Xeon(R) E5-2680 v4 @ 2.40GHz, 252GB memory, and the 64-bit operating system CentOS Linux release 7.3.1611. All experiments were run with Python 3.5.2.
+
+The figure below exhibits the accuracies achieved by the graph kernels implemented in the `graphkit-learn` library, in terms of regression error (the upper table) and classification rate (the lower table). Red indicates the worst results and dark green the best ones.
Gray cells with the “inf” marker indicate that the computation of the graph kernel on the dataset is omitted because it consumes far more computational resources than the other kernels.
+
+.. image:: figures/all_test_accuracy.svg
+   :width: 600
+   :alt: accuracies
+
+The figure below displays the computational time needed to compute the Gram matrix of each graph kernel (in :math:`log10` of seconds) on each dataset. Color legends have the same meaning as in the figure above.
+
+.. image:: figures/all_ave_gm_times.svg
+   :width: 600
+   :alt: computational time
+
diff --git a/lang/fr/docs/source/figures/all_ave_gm_times.svg b/lang/fr/docs/source/figures/all_ave_gm_times.svg
new file mode 100644
index 0000000000..037a6a1cd6
--- /dev/null
+++ b/lang/fr/docs/source/figures/all_ave_gm_times.svg
@@ -0,0 +1,2059 @@
+[SVG markup omitted: heatmap figure of average Gram matrix computation times per kernel and dataset]
diff --git a/lang/fr/docs/source/figures/all_test_accuracy.svg b/lang/fr/docs/source/figures/all_test_accuracy.svg
new file mode 100644
index 0000000000..13fa813bb2
--- /dev/null
+++ b/lang/fr/docs/source/figures/all_test_accuracy.svg
@@ -0,0 +1,2131 @@
+[SVG markup omitted: heatmap figure of test accuracies per kernel and dataset]
diff --git a/lang/fr/docs/source/gklearn.kernels.commonWalkKernel.rst b/lang/fr/docs/source/gklearn.kernels.commonWalkKernel.rst
new file mode 100644
index 0000000000..1b4b4d8d9d
--- /dev/null
+++ b/lang/fr/docs/source/gklearn.kernels.commonWalkKernel.rst
@@ -0,0 +1,7 @@
+gklearn.kernels.commonWalkKernel
+================================
+
+.. automodule:: gklearn.kernels.commonWalkKernel
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/lang/fr/docs/source/gklearn.kernels.marginalizedKernel.rst b/lang/fr/docs/source/gklearn.kernels.marginalizedKernel.rst
new file mode 100644
index 0000000000..70141f7a16
--- /dev/null
+++ b/lang/fr/docs/source/gklearn.kernels.marginalizedKernel.rst
@@ -0,0 +1,7 @@
+gklearn.kernels.marginalizedKernel
+==================================
+
+..
automodule:: gklearn.kernels.marginalizedKernel + :members: + :undoc-members: + :show-inheritance: diff --git a/lang/fr/docs/source/gklearn.kernels.randomWalkKernel.rst b/lang/fr/docs/source/gklearn.kernels.randomWalkKernel.rst new file mode 100644 index 0000000000..f6a24d6618 --- /dev/null +++ b/lang/fr/docs/source/gklearn.kernels.randomWalkKernel.rst @@ -0,0 +1,7 @@ +gklearn.kernels.randomWalkKernel +================================ + +.. automodule:: gklearn.kernels.randomWalkKernel + :members: + :undoc-members: + :show-inheritance: diff --git a/lang/fr/docs/source/gklearn.kernels.rst b/lang/fr/docs/source/gklearn.kernels.rst new file mode 100644 index 0000000000..404d2d3641 --- /dev/null +++ b/lang/fr/docs/source/gklearn.kernels.rst @@ -0,0 +1,19 @@ +gklearn.kernels +=============== + +.. automodule:: gklearn.kernels + :members: + :undoc-members: + :show-inheritance: + +.. toctree:: + + gklearn.kernels.commonWalkKernel + gklearn.kernels.marginalizedKernel + gklearn.kernels.randomWalkKernel + gklearn.kernels.spKernel + gklearn.kernels.structuralspKernel + gklearn.kernels.treeletKernel + gklearn.kernels.untilHPathKernel + gklearn.kernels.weisfeilerLehmanKernel + diff --git a/lang/fr/docs/source/gklearn.kernels.spKernel.rst b/lang/fr/docs/source/gklearn.kernels.spKernel.rst new file mode 100644 index 0000000000..d9da9bcdcf --- /dev/null +++ b/lang/fr/docs/source/gklearn.kernels.spKernel.rst @@ -0,0 +1,7 @@ +gklearn.kernels.spKernel +======================== + +.. automodule:: gklearn.kernels.spKernel + :members: + :undoc-members: + :show-inheritance: diff --git a/lang/fr/docs/source/gklearn.kernels.structuralspKernel.rst b/lang/fr/docs/source/gklearn.kernels.structuralspKernel.rst new file mode 100644 index 0000000000..90c0fe3c2d --- /dev/null +++ b/lang/fr/docs/source/gklearn.kernels.structuralspKernel.rst @@ -0,0 +1,7 @@ +gklearn.kernels.structuralspKernel +================================== + +.. automodule:: gklearn.kernels.structuralspKernel + :members: + :undoc-members: + :show-inheritance: diff --git a/lang/fr/docs/source/gklearn.kernels.treeletKernel.rst b/lang/fr/docs/source/gklearn.kernels.treeletKernel.rst new file mode 100644 index 0000000000..c88016dcb8 --- /dev/null +++ b/lang/fr/docs/source/gklearn.kernels.treeletKernel.rst @@ -0,0 +1,7 @@ +gklearn.kernels.treeletKernel +============================= + +.. automodule:: gklearn.kernels.treeletKernel + :members: + :undoc-members: + :show-inheritance: diff --git a/lang/fr/docs/source/gklearn.kernels.untilHPathKernel.rst b/lang/fr/docs/source/gklearn.kernels.untilHPathKernel.rst new file mode 100644 index 0000000000..76f39105bb --- /dev/null +++ b/lang/fr/docs/source/gklearn.kernels.untilHPathKernel.rst @@ -0,0 +1,7 @@ +gklearn.kernels.untilHPathKernel +================================ + +.. automodule:: gklearn.kernels.untilHPathKernel + :members: + :undoc-members: + :show-inheritance: diff --git a/lang/fr/docs/source/gklearn.kernels.weisfeilerLehmanKernel.rst b/lang/fr/docs/source/gklearn.kernels.weisfeilerLehmanKernel.rst new file mode 100644 index 0000000000..f5797a2217 --- /dev/null +++ b/lang/fr/docs/source/gklearn.kernels.weisfeilerLehmanKernel.rst @@ -0,0 +1,7 @@ +gklearn.kernels.weisfeilerLehmanKernel +====================================== + +.. 
automodule:: gklearn.kernels.weisfeilerLehmanKernel + :members: + :undoc-members: + :show-inheritance: diff --git a/lang/fr/docs/source/gklearn.rst b/lang/fr/docs/source/gklearn.rst new file mode 100644 index 0000000000..d7de14a196 --- /dev/null +++ b/lang/fr/docs/source/gklearn.rst @@ -0,0 +1,13 @@ +gklearn +======= + +.. automodule:: gklearn + :members: + :undoc-members: + :show-inheritance: + +.. toctree:: + + gklearn.kernels + gklearn.utils + diff --git a/lang/fr/docs/source/gklearn.utils.graphdataset.rst b/lang/fr/docs/source/gklearn.utils.graphdataset.rst new file mode 100644 index 0000000000..4e2aae17db --- /dev/null +++ b/lang/fr/docs/source/gklearn.utils.graphdataset.rst @@ -0,0 +1,7 @@ +gklearn.utils.graphdataset +========================== + +.. automodule:: gklearn.utils.graphdataset + :members: + :undoc-members: + :show-inheritance: diff --git a/lang/fr/docs/source/gklearn.utils.graphfiles.rst b/lang/fr/docs/source/gklearn.utils.graphfiles.rst new file mode 100644 index 0000000000..48b5e06277 --- /dev/null +++ b/lang/fr/docs/source/gklearn.utils.graphfiles.rst @@ -0,0 +1,7 @@ +gklearn.utils.graphfiles +======================== + +.. automodule:: gklearn.utils.graphfiles + :members: + :undoc-members: + :show-inheritance: diff --git a/lang/fr/docs/source/gklearn.utils.kernels.rst b/lang/fr/docs/source/gklearn.utils.kernels.rst new file mode 100644 index 0000000000..023cb3ec32 --- /dev/null +++ b/lang/fr/docs/source/gklearn.utils.kernels.rst @@ -0,0 +1,7 @@ +gklearn.utils.kernels +===================== + +.. automodule:: gklearn.utils.kernels + :members: + :undoc-members: + :show-inheritance: diff --git a/lang/fr/docs/source/gklearn.utils.model_selection_precomputed.rst b/lang/fr/docs/source/gklearn.utils.model_selection_precomputed.rst new file mode 100644 index 0000000000..b80e8fcc5e --- /dev/null +++ b/lang/fr/docs/source/gklearn.utils.model_selection_precomputed.rst @@ -0,0 +1,7 @@ +gklearn.utils.model\_selection\_precomputed +=========================================== + +.. automodule:: gklearn.utils.model_selection_precomputed + :members: + :undoc-members: + :show-inheritance: diff --git a/lang/fr/docs/source/gklearn.utils.parallel.rst b/lang/fr/docs/source/gklearn.utils.parallel.rst new file mode 100644 index 0000000000..8469b0a87a --- /dev/null +++ b/lang/fr/docs/source/gklearn.utils.parallel.rst @@ -0,0 +1,7 @@ +gklearn.utils.parallel +====================== + +.. automodule:: gklearn.utils.parallel + :members: + :undoc-members: + :show-inheritance: diff --git a/lang/fr/docs/source/gklearn.utils.rst b/lang/fr/docs/source/gklearn.utils.rst new file mode 100644 index 0000000000..3d8a0e6933 --- /dev/null +++ b/lang/fr/docs/source/gklearn.utils.rst @@ -0,0 +1,19 @@ +gklearn.utils +============= + +.. automodule:: gklearn.utils + :members: + :undoc-members: + :show-inheritance: + + +.. toctree:: + + gklearn.utils.graphdataset + gklearn.utils.graphfiles + gklearn.utils.kernels + gklearn.utils.model_selection_precomputed + gklearn.utils.parallel + gklearn.utils.trie + gklearn.utils.utils + diff --git a/lang/fr/docs/source/gklearn.utils.trie.rst b/lang/fr/docs/source/gklearn.utils.trie.rst new file mode 100644 index 0000000000..1310cb13db --- /dev/null +++ b/lang/fr/docs/source/gklearn.utils.trie.rst @@ -0,0 +1,7 @@ +gklearn.utils.trie +================== + +.. 
automodule:: gklearn.utils.trie
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/lang/fr/docs/source/gklearn.utils.utils.rst b/lang/fr/docs/source/gklearn.utils.utils.rst
new file mode 100644
index 0000000000..004db5886f
--- /dev/null
+++ b/lang/fr/docs/source/gklearn.utils.utils.rst
@@ -0,0 +1,7 @@
+gklearn.utils.utils
+===================
+
+.. automodule:: gklearn.utils.utils
+   :members:
+   :undoc-members:
+   :show-inheritance:
diff --git a/lang/fr/docs/source/index.rst b/lang/fr/docs/source/index.rst
new file mode 100644
index 0000000000..b531ba1fc5
--- /dev/null
+++ b/lang/fr/docs/source/index.rst
@@ -0,0 +1,24 @@
+.. graphkit-learn documentation master file, created by
+   sphinx-quickstart on Wed Feb 12 15:06:37 2020.
+   You can adapt this file completely to your liking, but it should at least
+   contain the root `toctree` directive.
+
+.. mdinclude:: ../../README.md
+
+Documentation
+-------------
+
+.. toctree::
+   :maxdepth: 1
+
+   modules.rst
+   experiments.rst
+
+
+
+Indices and tables
+------------------
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
diff --git a/lang/fr/docs/source/modules.rst b/lang/fr/docs/source/modules.rst
new file mode 100644
index 0000000000..536f81ba2b
--- /dev/null
+++ b/lang/fr/docs/source/modules.rst
@@ -0,0 +1,7 @@
+Modules
+=======
+
+.. toctree::
+   :maxdepth: 4
+
+   gklearn
diff --git a/lang/fr/gklearn/__init__.py b/lang/fr/gklearn/__init__.py
new file mode 100644
index 0000000000..08ca4ed6c7
--- /dev/null
+++ b/lang/fr/gklearn/__init__.py
@@ -0,0 +1,21 @@
+# -*-coding:utf-8 -*-
+"""
+gklearn
+
+This package contains the following subpackages:
+	* c_ext : bindings to C++ code
+	* ged : computes graph edit distances between networkX graphs
+	* kernels : computes graph kernels, i.e. graph similarity measures compatible with SVMs
+	* notebooks : examples of code using this library
+	* utils : diverse computations on graphs
+"""
+
+# info
+__version__ = "0.1"
+__author__ = "Benoit Gaüzère"
+__date__ = "November 2017"
+
+# import sub modules
+# from gklearn import c_ext
+# from gklearn import ged
+# from gklearn import utils
diff --git a/lang/fr/gklearn/examples/ged/compute_graph_edit_distance.py b/lang/fr/gklearn/examples/ged/compute_graph_edit_distance.py
new file mode 100644
index 0000000000..027d1e4cd0
--- /dev/null
+++ b/lang/fr/gklearn/examples/ged/compute_graph_edit_distance.py
@@ -0,0 +1,58 @@
+# -*- coding: utf-8 -*-
+"""compute_graph_edit_distance.ipynb
+
+Automatically generated by Colaboratory.
+
+Original file is located at
+	https://colab.research.google.com/drive/1Wfgn7WVuyOQQgwOvdUQBz0BzEVdp0YM3
+
+**This script demonstrates how to compute a graph edit distance.**
+---
+
+**0. Install `graphkit-learn`.**
+"""
+
+"""**1. Get dataset.**"""
+
+from gklearn.utils import Dataset
+
+# Predefined dataset name; use dataset "MUTAG".
+ds_name = 'MUTAG'
+
+# Initialize a Dataset.
+dataset = Dataset()
+# Load predefined dataset "MUTAG".
+dataset.load_predefined_dataset(ds_name)
+graph1 = dataset.graphs[0]
+graph2 = dataset.graphs[1]
+print(graph1, graph2)
+
+"""**2. Compute graph edit distance.**"""
+
+from gklearn.ged.env import GEDEnv
+
+
+ged_env = GEDEnv() # initialize GED environment.
+ged_env.set_edit_cost('CONSTANT', # GED cost type.
+                      edit_cost_constants=[3, 3, 1, 3, 3, 1] # edit costs.
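+                      # Note: the six constants above are assumed to follow GEDLIB's
+                      # 'CONSTANT' cost ordering: node insertion, deletion and
+                      # substitution costs, then edge insertion, deletion and
+                      # substitution costs.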
+                      )
+ged_env.add_nx_graph(graph1, '') # add graph1
+ged_env.add_nx_graph(graph2, '') # add graph2
+listID = ged_env.get_all_graph_ids() # get list IDs of graphs
+ged_env.init(init_type='LAZY_WITHOUT_SHUFFLED_COPIES') # initialize GED environment.
+options = {'initialization_method': 'RANDOM', # or 'NODE', etc.
+           'threads': 1 # parallel threads.
+           }
+ged_env.set_method('BIPARTITE', # GED method.
+                   options # options for GED method.
+                   )
+ged_env.init_method() # initialize GED method.
+
+ged_env.run_method(listID[0], listID[1]) # run.
+
+pi_forward = ged_env.get_forward_map(listID[0], listID[1]) # forward map.
+pi_backward = ged_env.get_backward_map(listID[0], listID[1]) # backward map.
+dis = ged_env.get_upper_bound(listID[0], listID[1]) # GED between the two graphs.
+print(pi_forward)
+print(pi_backward)
+print(dis)
\ No newline at end of file
diff --git a/lang/fr/gklearn/examples/kernels/compute_distance_in_kernel_space.py b/lang/fr/gklearn/examples/kernels/compute_distance_in_kernel_space.py
new file mode 100644
index 0000000000..76c74947ce
--- /dev/null
+++ b/lang/fr/gklearn/examples/kernels/compute_distance_in_kernel_space.py
@@ -0,0 +1,73 @@
+# -*- coding: utf-8 -*-
+"""compute_distance_in_kernel_space.ipynb
+
+Automatically generated by Colaboratory.
+
+Original file is located at
+	https://colab.research.google.com/drive/17tZP6IrineQmzo9sRtfZOnHpHx6HnlMA
+
+**This script demonstrates how to compute the distance in kernel space between the image of a graph and the mean of the images of a group of graphs.**
+---
+
+**0. Install `graphkit-learn`.**
+"""
+
+"""**1. Get dataset.**"""
+
+from gklearn.utils import Dataset
+
+# Predefined dataset name; use dataset "MUTAG".
+ds_name = 'MUTAG'
+
+# Initialize a Dataset.
+dataset = Dataset()
+# Load predefined dataset "MUTAG".
+dataset.load_predefined_dataset(ds_name)
+len(dataset.graphs)
+
+"""**2. Compute graph kernel.**"""
+
+from gklearn.kernels import PathUpToH
+import multiprocessing
+
+# Initialize parameters for graph kernel computation.
+kernel_options = {'depth': 3,
+                  'k_func': 'MinMax',
+                  'compute_method': 'trie'
+                  }
+
+# Initialize graph kernel.
+graph_kernel = PathUpToH(node_labels=dataset.node_labels, # list of node label names.
+                         edge_labels=dataset.edge_labels, # list of edge label names.
+                         ds_infos=dataset.get_dataset_infos(keys=['directed']), # dataset information required for computation.
+                         **kernel_options, # options for computation.
+                         )
+
+# Compute Gram matrix.
+gram_matrix, run_time = graph_kernel.compute(dataset.graphs,
+                                             parallel='imap_unordered', # or None.
+                                             n_jobs=multiprocessing.cpu_count(), # number of parallel jobs.
+                                             normalize=True, # whether to return normalized Gram matrix.
+                                             verbose=2 # whether to print out results.
+                                             )
+
+"""**3. Compute distance in kernel space.**
+
+Given a dataset $\mathcal{G}_N$, compute the distance in kernel space between the image of $G_1 \in \mathcal{G}_N$ and the mean of the images of $\mathcal{G}_k \subset \mathcal{G}_N$.
+"""
+
+from gklearn.preimage.utils import compute_k_dis
+
+# Index of $G_1$.
+idx_1 = 10
+# Indices of graphs in $\mathcal{G}_k$.
+idx_graphs = range(0, 10)
+
+# Compute the distance in kernel space.
+dis_k = compute_k_dis(idx_1,
+                      idx_graphs,
+                      [1 / len(idx_graphs)] * len(idx_graphs), # weights for images of graphs in $\mathcal{G}_k$; all equal when computing the mean.
+                      gram_matrix, # Gram matrix of all graphs.
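+                      # Assumed formula (cf. gklearn.preimage.utils.compute_k_dis): with
+                      # weights w_i, d = sqrt(k(G_1, G_1) - 2 * sum_i w_i * k(G_1, G_i)
+                      # + sum_{i,j} w_i * w_j * k(G_i, G_j)); the third term is the one
+                      # dropped below when withterm3=False.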
+                      withterm3=False
+                      )
+print(dis_k)
\ No newline at end of file
diff --git a/lang/fr/gklearn/examples/kernels/compute_graph_kernel.py b/lang/fr/gklearn/examples/kernels/compute_graph_kernel.py
new file mode 100644
index 0000000000..2fe8d529c9
--- /dev/null
+++ b/lang/fr/gklearn/examples/kernels/compute_graph_kernel.py
@@ -0,0 +1,87 @@
+# -*- coding: utf-8 -*-
+"""compute_graph_kernel.ipynb
+
+Automatically generated by Colaboratory.
+
+Original file is located at
+	https://colab.research.google.com/drive/17Q2QCl9CAtDweGF8LiWnWoN2laeJqT0u
+
+**This script demonstrates how to compute a graph kernel.**
+---
+
+**0. Install `graphkit-learn`.**
+"""
+
+"""**1. Get dataset.**"""
+
+from gklearn.utils import Dataset
+
+# Predefined dataset name; use dataset "MUTAG".
+ds_name = 'MUTAG'
+
+# Initialize a Dataset.
+dataset = Dataset()
+# Load predefined dataset "MUTAG".
+dataset.load_predefined_dataset(ds_name)
+len(dataset.graphs)
+
+"""**2. Compute graph kernel.**"""
+
+from gklearn.kernels import PathUpToH
+
+# Initialize parameters for graph kernel computation.
+kernel_options = {'depth': 3,
+                  'k_func': 'MinMax',
+                  'compute_method': 'trie'
+                  }
+
+# Initialize graph kernel.
+graph_kernel = PathUpToH(node_labels=dataset.node_labels, # list of node label names.
+                         edge_labels=dataset.edge_labels, # list of edge label names.
+                         ds_infos=dataset.get_dataset_infos(keys=['directed']), # dataset information required for computation.
+                         **kernel_options, # options for computation.
+                         )
+
+print('done.')
+
+import multiprocessing
+import matplotlib.pyplot as plt
+
+# Compute Gram matrix.
+gram_matrix, run_time = graph_kernel.compute(dataset.graphs,
+                                             parallel='imap_unordered', # or None.
+                                             n_jobs=multiprocessing.cpu_count(), # number of parallel jobs.
+                                             normalize=True, # whether to return normalized Gram matrix.
+                                             verbose=2 # whether to print out results.
+                                             )
+# Print results.
+print()
+print(gram_matrix)
+print(run_time)
+plt.imshow(gram_matrix)
+
+import multiprocessing
+
+# Compute graph kernels between a graph and a list of graphs.
+kernel_list, run_time = graph_kernel.compute(dataset.graphs, # a list of graphs.
+                                             dataset.graphs[0], # a single graph.
+                                             parallel='imap_unordered', # or None.
+                                             n_jobs=multiprocessing.cpu_count(), # number of parallel jobs.
+                                             verbose=2 # whether to print out results.
+                                             )
+# Print results.
+print()
+print(kernel_list)
+print(run_time)
+
+import multiprocessing
+
+# Compute a graph kernel between two graphs.
+kernel, run_time = graph_kernel.compute(dataset.graphs[0], # a single graph.
+                                        dataset.graphs[1], # another single graph.
+                                        verbose=2 # whether to print out results.
+                                        )
+# Print results.
+print()
+print(kernel)
+print(run_time)
\ No newline at end of file
diff --git a/lang/fr/gklearn/examples/kernels/compute_graph_kernel_old.py b/lang/fr/gklearn/examples/kernels/compute_graph_kernel_old.py
new file mode 100644
index 0000000000..7149c68c97
--- /dev/null
+++ b/lang/fr/gklearn/examples/kernels/compute_graph_kernel_old.py
@@ -0,0 +1,31 @@
+# -*- coding: utf-8 -*-
+"""compute_graph_kernel_v0.1.ipynb
+
+Automatically generated by Colaboratory.
+
+Original file is located at
+	https://colab.research.google.com/drive/10jUz7-ahPiE_T1qvFrh2NvCVs1e47noj
+
+**This script demonstrates how to compute a graph kernel.**
+---
+
+**0. Install `graphkit-learn`.**
+"""
+
+"""**1. Get dataset.**"""
+
+from gklearn.utils.graphfiles import loadDataset
+
+graphs, targets = loadDataset('../../../datasets/MUTAG/MUTAG_A.txt')
+
+"""**2. Compute graph kernel.**"""
+
+from gklearn.kernels import untilhpathkernel
+
+gram_matrix, run_time = untilhpathkernel(
+    graphs,  # The list of input graphs.
+    depth=5,  # The maximum length of paths.
+    k_func='MinMax',  # Or 'tanimoto'.
+    compute_method='trie',  # Or 'naive'.
+    n_jobs=1,  # The number of jobs to run in parallel.
+    verbose=True)
\ No newline at end of file
diff --git a/lang/fr/gklearn/examples/kernels/model_selection_old.py b/lang/fr/gklearn/examples/kernels/model_selection_old.py
new file mode 100644
index 0000000000..ca66be6ea8
--- /dev/null
+++ b/lang/fr/gklearn/examples/kernels/model_selection_old.py
@@ -0,0 +1,38 @@
+# -*- coding: utf-8 -*-
+"""model_selection_old.ipynb
+
+Automatically generated by Colaboratory.
+
+Original file is located at
+	https://colab.research.google.com/drive/1uVkl7scNgEPrimX8ks6iEC5ijuhB8L_D
+
+**This script demonstrates how to perform model selection and classification with a graph kernel.**
+---
+
+**0. Install `graphkit-learn`.**
+"""
+
+"""**1. Perform model selection and classification.**"""
+
+from gklearn.utils import model_selection_for_precomputed_kernel
+from gklearn.kernels import untilhpathkernel
+import numpy as np
+
+# Set parameters.
+datafile = '../../../datasets/MUTAG/MUTAG_A.txt'
+param_grid_precomputed = {'depth': np.linspace(1, 10, 10),
+                          'k_func': ['MinMax', 'tanimoto'],
+                          'compute_method': ['trie']}
+param_grid = {'C': np.logspace(-10, 10, num=41, base=10)}
+
+# Perform model selection and classification.
+model_selection_for_precomputed_kernel(
+    datafile,  # The path of the dataset file.
+    untilhpathkernel,  # The graph kernel used for estimation.
+    param_grid_precomputed,  # The parameters used to compute Gram matrices.
+    param_grid,  # The penalty parameters used for the penalty term.
+    'classification',  # Or 'regression'.
+    NUM_TRIALS=30,  # The number of random trials of the outer CV loop.
+    ds_name='MUTAG',  # The name of the dataset.
+    n_jobs=1,
+    verbose=True)
\ No newline at end of file
diff --git a/lang/fr/gklearn/examples/preimage/median_preimege_generator.py b/lang/fr/gklearn/examples/preimage/median_preimege_generator.py
new file mode 100644
index 0000000000..9afc7bd4d4
--- /dev/null
+++ b/lang/fr/gklearn/examples/preimage/median_preimege_generator.py
@@ -0,0 +1,115 @@
+# -*- coding: utf-8 -*-
+"""example_median_preimege_generator.ipynb
+
+Automatically generated by Colaboratory.
+
+Original file is located at
+	https://colab.research.google.com/drive/1PIDvHOcmiLEQ5Np3bgBDdu0kLOquOMQK
+
+**This script demonstrates how to generate a graph preimage using Boria's method.**
+---
+"""
+
+"""**1. Get dataset.**"""
+
+from gklearn.utils import Dataset, split_dataset_by_target
+
+# Predefined dataset name; use dataset "MAO".
+ds_name = 'MAO'
+# The node/edge labels that will not be used in the computation.
+irrelevant_labels = {'node_attrs': ['x', 'y', 'z'], 'edge_labels': ['bond_stereo']}
+
+# Initialize a Dataset.
+dataset_all = Dataset()
+# Load predefined dataset "MAO".
+dataset_all.load_predefined_dataset(ds_name)
+# Remove irrelevant labels.
+dataset_all.remove_labels(**irrelevant_labels)
+# Split the whole dataset according to the classification targets.
+datasets = split_dataset_by_target(dataset_all)
+# Get the first class of graphs, whose median preimage will be computed.
+dataset = datasets[0]
+len(dataset.graphs)
+
+"""**2. Set parameters.**"""
+
+import multiprocessing
+
+# Parameters for MedianPreimageGenerator (our method).
+mpg_options = {'fit_method': 'k-graphs', # how to fit edit costs. "k-graphs" means use all graphs in the median set when fitting.
+               'init_ecc': [4, 4, 2, 1, 1, 1], # initial edit costs.
+               'ds_name': ds_name, # name of the dataset.
+               'parallel': True, # whether the parallel scheme is to be used.
+               'time_limit_in_sec': 0, # maximum time limit to compute the preimage. If set to 0 then no limit.
+               'max_itrs': 100, # maximum iteration limit to optimize edit costs. If set to 0 then no limit.
+               'max_itrs_without_update': 3, # if edit costs are not updated more times than this number, the optimization stops.
+               'epsilon_residual': 0.01, # in optimization, the residual is only considered changed if the change is bigger than this number.
+               'epsilon_ec': 0.1, # in optimization, the edit costs are only considered changed if the changes are bigger than this number.
+               'verbose': 2 # whether to print out results.
+               }
+# Parameters for graph kernel computation.
+kernel_options = {'name': 'PathUpToH', # use path kernel up to length h.
+                  'depth': 9,
+                  'k_func': 'MinMax',
+                  'compute_method': 'trie',
+                  'parallel': 'imap_unordered', # or None
+                  'n_jobs': multiprocessing.cpu_count(),
+                  'normalize': True, # whether to use normalized Gram matrix to optimize edit costs.
+                  'verbose': 2 # whether to print out results.
+                  }
+# Parameters for GED computation.
+ged_options = {'method': 'IPFP', # use the IPFP heuristic.
+               'initialization_method': 'RANDOM', # or 'NODE', etc.
+               'initial_solutions': 10, # when bigger than 1, the method is considered mIPFP.
+               'edit_cost': 'CONSTANT', # use CONSTANT cost.
+               'attr_distance': 'euclidean', # the distance between non-symbolic node/edge labels is computed by euclidean distance.
+               'ratio_runs_from_initial_solutions': 1,
+               'threads': multiprocessing.cpu_count(), # parallel threads. Has no effect if mpg_options['parallel'] = False.
+               'init_option': 'EAGER_WITHOUT_SHUFFLED_COPIES'
+               }
+# Parameters for MedianGraphEstimator (Boria's method).
+mge_options = {'init_type': 'MEDOID', # how to initialize the median (compute the set-median). "MEDOID" is to use the graph with the smallest SOD.
+               'random_inits': 10, # number of random initializations when 'init_type' = 'RANDOM'.
+               'time_limit': 600, # maximum time limit to compute the generalized median. If set to 0 then no limit.
+               'verbose': 2, # whether to print out results.
+               'refine': False # whether to refine the final SODs or not.
+               }
+print('done.')
+
+"""**3. Run median preimage generator.**"""
+
+from gklearn.preimage import MedianPreimageGenerator
+
+# Create median preimage generator instance.
+mpg = MedianPreimageGenerator()
+# Add dataset.
+mpg.dataset = dataset
+# Set parameters.
+mpg.set_options(**mpg_options.copy())
+mpg.kernel_options = kernel_options.copy()
+mpg.ged_options = ged_options.copy()
+mpg.mge_options = mge_options.copy()
+# Run.
+mpg.run()
+
+"""**4. Get results.**"""
+
+# Get results.
+import pprint
+pp = pprint.PrettyPrinter(indent=4) # pretty print
+results = mpg.get_results()
+pp.pprint(results)
+
+# Draw generated graphs.
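+# (mpg.set_median below is the set-median: the graph of the median set with the
+# smallest sum of distances (SOD) to the others; mpg.gen_median is the
+# generalized median graph estimated by the generator.)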
+def draw_graph(graph):
+	import matplotlib.pyplot as plt
+	import networkx as nx
+	plt.figure()
+	pos = nx.spring_layout(graph)
+	nx.draw(graph, pos, node_size=500, labels=nx.get_node_attributes(graph, 'atom_symbol'), font_color='w', width=3, with_labels=True)
+	plt.show()
+	plt.clf()
+	plt.close()
+
+draw_graph(mpg.set_median)
+draw_graph(mpg.gen_median)
\ No newline at end of file
diff --git a/lang/fr/gklearn/examples/preimage/median_preimege_generator_cml.py b/lang/fr/gklearn/examples/preimage/median_preimege_generator_cml.py
new file mode 100644
index 0000000000..314be9787b
--- /dev/null
+++ b/lang/fr/gklearn/examples/preimage/median_preimege_generator_cml.py
@@ -0,0 +1,113 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Created on Tue Jun 16 15:41:26 2020
+
+@author: ljia
+
+**This script demonstrates how to generate a graph preimage using Boria's method with cost matrix learning.**
+"""
+
+"""**1. Get dataset.**"""
+
+from gklearn.utils import Dataset, split_dataset_by_target
+
+# Predefined dataset name; use dataset "MAO".
+ds_name = 'MAO'
+# The node/edge labels that will not be used in the computation.
+irrelevant_labels = {'node_attrs': ['x', 'y', 'z'], 'edge_labels': ['bond_stereo']}
+
+# Initialize a Dataset.
+dataset_all = Dataset()
+# Load predefined dataset "MAO".
+dataset_all.load_predefined_dataset(ds_name)
+# Remove irrelevant labels.
+dataset_all.remove_labels(**irrelevant_labels)
+# Split the whole dataset according to the classification targets.
+datasets = split_dataset_by_target(dataset_all)
+# Get the first class of graphs, whose median preimage will be computed.
+dataset = datasets[0]
+len(dataset.graphs)
+
+"""**2. Set parameters.**"""
+
+import multiprocessing
+
+# Parameters for MedianPreimageGenerator (our method).
+mpg_options = {'init_method': 'random', # how to initialize the node label cost vector. "random" means to initialize it randomly.
+               'init_ecc': [4, 4, 2, 1, 1, 1], # initial edit costs.
+               'ds_name': ds_name, # name of the dataset.
+               'parallel': True, # @todo: whether the parallel scheme is to be used.
+               'time_limit_in_sec': 0, # maximum time limit to compute the preimage. If set to 0 then no limit.
+               'max_itrs': 3, # maximum iteration limit to optimize edit costs. If set to 0 then no limit.
+               'max_itrs_without_update': 3, # if edit costs are not updated more times than this number, the optimization stops.
+               'epsilon_residual': 0.01, # in optimization, the residual is only considered changed if the change is bigger than this number.
+               'epsilon_ec': 0.1, # in optimization, the edit costs are only considered changed if the changes are bigger than this number.
+               'verbose': 2 # whether to print out results.
+               }
+# Parameters for graph kernel computation.
+kernel_options = {'name': 'PathUpToH', # use path kernel up to length h.
+                  'depth': 9,
+                  'k_func': 'MinMax',
+                  'compute_method': 'trie',
+                  'parallel': 'imap_unordered', # or None
+                  'n_jobs': multiprocessing.cpu_count(),
+                  'normalize': True, # whether to use normalized Gram matrix to optimize edit costs.
+                  'verbose': 2 # whether to print out results.
+                  }
+# Parameters for GED computation.
+ged_options = {'method': 'BIPARTITE', # use the BIPARTITE heuristic.
+               'initialization_method': 'RANDOM', # or 'NODE', etc.
+               'initial_solutions': 10, # when bigger than 1, the method is considered mIPFP.
+               'edit_cost': 'CONSTANT', # @todo: not needed. use CONSTANT cost.
+               'attr_distance': 'euclidean', # @todo: not needed. the distance between non-symbolic node/edge labels is computed by euclidean distance.
+               'ratio_runs_from_initial_solutions': 1,
+               'threads': multiprocessing.cpu_count(), # parallel threads. Has no effect if mpg_options['parallel'] = False.
+               'init_option': 'LAZY_WITHOUT_SHUFFLED_COPIES' # 'EAGER_WITHOUT_SHUFFLED_COPIES'
+               }
+# Parameters for MedianGraphEstimator (Boria's method).
+mge_options = {'init_type': 'MEDOID', # how to initialize the median (compute the set-median). "MEDOID" is to use the graph with the smallest SOD.
+               'random_inits': 10, # number of random initializations when 'init_type' = 'RANDOM'.
+               'time_limit': 600, # maximum time limit to compute the generalized median. If set to 0 then no limit.
+               'verbose': 2, # whether to print out results.
+               'refine': False # whether to refine the final SODs or not.
+               }
+print('done.')
+
+"""**3. Run median preimage generator.**"""
+
+from gklearn.preimage import MedianPreimageGeneratorCML
+
+# Create median preimage generator instance.
+mpg = MedianPreimageGeneratorCML()
+# Add dataset.
+mpg.dataset = dataset
+# Set parameters.
+mpg.set_options(**mpg_options.copy())
+mpg.kernel_options = kernel_options.copy()
+mpg.ged_options = ged_options.copy()
+mpg.mge_options = mge_options.copy()
+# Run.
+mpg.run()
+
+"""**4. Get results.**"""
+
+# Get results.
+import pprint
+pp = pprint.PrettyPrinter(indent=4) # pretty print
+results = mpg.get_results()
+pp.pprint(results)
+
+# Draw generated graphs.
+def draw_graph(graph):
+	import matplotlib.pyplot as plt
+	import networkx as nx
+	plt.figure()
+	pos = nx.spring_layout(graph)
+	nx.draw(graph, pos, node_size=500, labels=nx.get_node_attributes(graph, 'atom_symbol'), font_color='w', width=3, with_labels=True)
+	plt.show()
+	plt.clf()
+	plt.close()
+
+draw_graph(mpg.set_median)
+draw_graph(mpg.gen_median)
\ No newline at end of file
diff --git a/lang/fr/gklearn/examples/preimage/median_preimege_generator_py.py b/lang/fr/gklearn/examples/preimage/median_preimege_generator_py.py
new file mode 100644
index 0000000000..5b8152eb80
--- /dev/null
+++ b/lang/fr/gklearn/examples/preimage/median_preimege_generator_py.py
@@ -0,0 +1,114 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Created on Tue Jun 16 15:41:26 2020
+
+@author: ljia
+
+**This script demonstrates how to generate a graph preimage using Boria's method with cost matrix learning.**
+"""
+
+"""**1. Get dataset.**"""
+
+from gklearn.utils import Dataset, split_dataset_by_target
+
+# Predefined dataset name; use dataset "MAO".
+ds_name = 'MAO'
+# The node/edge labels that will not be used in the computation.
+irrelevant_labels = {'node_attrs': ['x', 'y', 'z'], 'edge_labels': ['bond_stereo']}
+
+# Initialize a Dataset.
+dataset_all = Dataset()
+# Load predefined dataset "MAO".
+dataset_all.load_predefined_dataset(ds_name)
+# Remove irrelevant labels.
+dataset_all.remove_labels(**irrelevant_labels)
+# Split the whole dataset according to the classification targets.
+datasets = split_dataset_by_target(dataset_all)
+# Get the first class of graphs, whose median preimage will be computed.
+dataset = datasets[0]
+# dataset.cut_graphs(range(0, 10))
+len(dataset.graphs)
+
+"""**2. Set parameters.**"""
+
+import multiprocessing
+
+# Parameters for MedianPreimageGenerator (our method).
+mpg_options = {'fit_method': 'k-graphs', # how to fit edit costs. "k-graphs" means use all graphs in the median set when fitting.
+               'init_ecc': [4, 4, 2, 1, 1, 1], # initial edit costs.
+               'ds_name': ds_name, # name of the dataset.
+               'parallel': True, # @todo: whether the parallel scheme is to be used.
+               'time_limit_in_sec': 0, # maximum time limit to compute the preimage. If set to 0 then no limit.
+               'max_itrs': 100, # maximum iteration limit to optimize edit costs. If set to 0 then no limit.
+               'max_itrs_without_update': 3, # if edit costs are not updated more times than this number, the optimization stops.
+               'epsilon_residual': 0.01, # in optimization, the residual is only considered changed if the change is bigger than this number.
+               'epsilon_ec': 0.1, # in optimization, the edit costs are only considered changed if the changes are bigger than this number.
+               'verbose': 2 # whether to print out results.
+               }
+# Parameters for graph kernel computation.
+kernel_options = {'name': 'PathUpToH', # use path kernel up to length h.
+                  'depth': 9,
+                  'k_func': 'MinMax',
+                  'compute_method': 'trie',
+                  'parallel': 'imap_unordered', # or None
+                  'n_jobs': multiprocessing.cpu_count(),
+                  'normalize': True, # whether to use normalized Gram matrix to optimize edit costs.
+                  'verbose': 2 # whether to print out results.
+                  }
+# Parameters for GED computation.
+ged_options = {'method': 'BIPARTITE', # use the BIPARTITE heuristic.
+               'initialization_method': 'RANDOM', # or 'NODE', etc.
+               'initial_solutions': 10, # when bigger than 1, the method is considered mIPFP.
+               'edit_cost': 'CONSTANT', # use CONSTANT cost.
+               'attr_distance': 'euclidean', # the distance between non-symbolic node/edge labels is computed by euclidean distance.
+               'ratio_runs_from_initial_solutions': 1,
+               'threads': multiprocessing.cpu_count(), # parallel threads. Has no effect if mpg_options['parallel'] = False.
+               'init_option': 'LAZY_WITHOUT_SHUFFLED_COPIES' # 'EAGER_WITHOUT_SHUFFLED_COPIES'
+               }
+# Parameters for MedianGraphEstimator (Boria's method).
+mge_options = {'init_type': 'MEDOID', # how to initialize the median (compute the set-median). "MEDOID" is to use the graph with the smallest SOD.
+               'random_inits': 10, # number of random initializations when 'init_type' = 'RANDOM'.
+               'time_limit': 600, # maximum time limit to compute the generalized median. If set to 0 then no limit.
+               'verbose': 2, # whether to print out results.
+               'refine': False # whether to refine the final SODs or not.
+               }
+print('done.')
+
+"""**3. Run median preimage generator.**"""
+
+from gklearn.preimage import MedianPreimageGeneratorPy
+
+# Create median preimage generator instance.
+mpg = MedianPreimageGeneratorPy()
+# Add dataset.
+mpg.dataset = dataset
+# Set parameters.
+mpg.set_options(**mpg_options.copy())
+mpg.kernel_options = kernel_options.copy()
+mpg.ged_options = ged_options.copy()
+mpg.mge_options = mge_options.copy()
+# Run.
+mpg.run()
+
+"""**4. Get results.**"""
+
+# Get results.
+import pprint
+pp = pprint.PrettyPrinter(indent=4) # pretty print
+results = mpg.get_results()
+pp.pprint(results)
+
+# Draw generated graphs.
+def draw_graph(graph):
+    import matplotlib.pyplot as plt
+    import networkx as nx
+    plt.figure()
+    pos = nx.spring_layout(graph)
+    nx.draw(graph, pos, node_size=500, labels=nx.get_node_attributes(graph, 'atom_symbol'), font_color='w', width=3, with_labels=True)
+    plt.show()
+    plt.clf()
+    plt.close()
+
+draw_graph(mpg.set_median)
+draw_graph(mpg.gen_median)
\ No newline at end of file
diff --git a/lang/fr/gklearn/experiments/ged/check_results_of_ged_env.py b/lang/fr/gklearn/experiments/ged/check_results_of_ged_env.py
new file mode 100644
index 0000000000..7c81c5d4af
--- /dev/null
+++ b/lang/fr/gklearn/experiments/ged/check_results_of_ged_env.py
@@ -0,0 +1,126 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Created on Thu Jun 25 11:31:46 2020
+
+@author: ljia
+"""
+
+def xp_check_results_of_GEDEnv():
+    """Compare results of GEDEnv to GEDLIB.
+    """
+    """**1. Get dataset.**"""
+
+    from gklearn.utils import Dataset
+
+    # Predefined dataset name, use dataset "MUTAG".
+    ds_name = 'MUTAG'
+
+    # Initialize a Dataset.
+    dataset = Dataset()
+    # Load predefined dataset "MUTAG".
+    dataset.load_predefined_dataset(ds_name)
+
+    results1 = compute_geds_by_GEDEnv(dataset)
+    results2 = compute_geds_by_GEDLIB(dataset)
+
+    # Show results.
+    import pprint
+    pp = pprint.PrettyPrinter(indent=4) # pretty print
+    print('Results using GEDEnv:')
+    pp.pprint(results1)
+    print()
+    print('Results using GEDLIB:')
+    pp.pprint(results2)
+
+    return results1, results2
+
+
+def compute_geds_by_GEDEnv(dataset):
+    from gklearn.ged.env import GEDEnv
+    import numpy as np
+
+    graph1 = dataset.graphs[0]
+    graph2 = dataset.graphs[1]
+
+    ged_env = GEDEnv() # initialize GED environment.
+    ged_env.set_edit_cost('CONSTANT', # GED cost type.
+                          edit_cost_constants=[3, 3, 1, 3, 3, 1] # edit costs.
+                          )
+    for g in dataset.graphs[0:10]:
+        ged_env.add_nx_graph(g, '')
+#    ged_env.add_nx_graph(graph1, '') # add graph1
+#    ged_env.add_nx_graph(graph2, '') # add graph2
+    listID = ged_env.get_all_graph_ids() # get list IDs of graphs
+    ged_env.init(init_type='LAZY_WITHOUT_SHUFFLED_COPIES') # initialize GED environment.
+    options = {'threads': 1 # parallel threads.
+               }
+    ged_env.set_method('BIPARTITE', # GED method.
+                       options # options for GED method.
+                       )
+    ged_env.init_method() # initialize GED method.
+
+    ged_mat = np.empty((10, 10))
+    for i in range(0, 10):
+        for j in range(i, 10):
+            ged_env.run_method(i, j) # run.
+            ged_mat[i, j] = ged_env.get_upper_bound(i, j)
+            ged_mat[j, i] = ged_mat[i, j]
+
+    results = {}
+    results['pi_forward'] = ged_env.get_forward_map(listID[0], listID[1]) # forward map.
+    results['pi_backward'] = ged_env.get_backward_map(listID[0], listID[1]) # backward map.
+    results['upper_bound'] = ged_env.get_upper_bound(listID[0], listID[1]) # GED between the two graphs.
+    results['runtime'] = ged_env.get_runtime(listID[0], listID[1])
+    results['init_time'] = ged_env.get_init_time()
+    results['ged_mat'] = ged_mat
+
+    return results
+
+
+def compute_geds_by_GEDLIB(dataset):
+    from gklearn.gedlib import librariesImport, gedlibpy
+    from gklearn.ged.util import ged_options_to_string
+    import numpy as np
+
+    graph1 = dataset.graphs[5]
+    graph2 = dataset.graphs[6]
+
+    ged_env = gedlibpy.GEDEnv() # initialize GED environment.
+    ged_env.set_edit_cost('CONSTANT', # GED cost type.
+                          edit_cost_constant=[3, 3, 1, 3, 3, 1] # edit costs.
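+                          # (note: gedlibpy's keyword here is 'edit_cost_constant',
+                          # singular, while GEDEnv's set_edit_cost above takes
+                          # 'edit_cost_constants'.)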
+                          )
+#    ged_env.add_nx_graph(graph1, '') # add graph1
+#    ged_env.add_nx_graph(graph2, '') # add graph2
+    for g in dataset.graphs[0:10]:
+        ged_env.add_nx_graph(g, '')
+    listID = ged_env.get_all_graph_ids() # get list IDs of graphs
+    ged_env.init(init_option='LAZY_WITHOUT_SHUFFLED_COPIES') # initialize GED environment.
+    options = {'initialization-method': 'RANDOM', # or 'NODE', etc.
+               'threads': 1 # parallel threads.
+               }
+    ged_env.set_method('BIPARTITE', # GED method.
+                       ged_options_to_string(options) # options for GED method.
+                       )
+    ged_env.init_method() # initialize GED method.
+
+    ged_mat = np.empty((10, 10))
+    for i in range(0, 10):
+        for j in range(i, 10):
+            ged_env.run_method(i, j) # run.
+            ged_mat[i, j] = ged_env.get_upper_bound(i, j)
+            ged_mat[j, i] = ged_mat[i, j]
+
+    results = {}
+    results['pi_forward'] = ged_env.get_forward_map(listID[0], listID[1]) # forward map.
+    results['pi_backward'] = ged_env.get_backward_map(listID[0], listID[1]) # backward map.
+    results['upper_bound'] = ged_env.get_upper_bound(listID[0], listID[1]) # GED between the two graphs.
+    results['runtime'] = ged_env.get_runtime(listID[0], listID[1])
+    results['init_time'] = ged_env.get_init_time()
+    results['ged_mat'] = ged_mat
+
+    return results
+
+
+if __name__ == '__main__':
+    results1, results2 = xp_check_results_of_GEDEnv()
\ No newline at end of file
diff --git a/lang/fr/gklearn/experiments/papers/PRL_2020/accuracy_diff_entropy.py b/lang/fr/gklearn/experiments/papers/PRL_2020/accuracy_diff_entropy.py
new file mode 100644
index 0000000000..0ababc3fcf
--- /dev/null
+++ b/lang/fr/gklearn/experiments/papers/PRL_2020/accuracy_diff_entropy.py
@@ -0,0 +1,196 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Created on Mon Oct 5 16:08:33 2020
+
+@author: ljia
+
+This script computes the classification accuracy of each graph kernel on datasets
+with different entropies of the degree distribution.
+"""
+from utils import Graph_Kernel_List, cross_validate
+import numpy as np
+import logging
+
+num_nodes = 40
+half_num_graphs = 100
+
+
+def generate_graphs():
+#    from gklearn.utils.graph_synthesizer import GraphSynthesizer
+#    gsyzer = GraphSynthesizer()
+#    graphs = gsyzer.unified_graphs(num_graphs=1000, num_nodes=20, num_edges=40, num_node_labels=0, num_edge_labels=0, seed=None, directed=False)
+#    return graphs
+    import networkx as nx
+
+    degrees11 = [5] * num_nodes
+#    degrees12 = [2] * num_nodes
+    degrees12 = [5] * num_nodes
+    degrees21 = list(range(1, 11)) * 6
+#    degrees22 = [5 * i for i in list(range(1, 11)) * 6]
+    degrees22 = list(range(1, 11)) * 6
+
+    # method 1
+    graphs11 = [nx.configuration_model(degrees11, create_using=nx.Graph) for i in range(half_num_graphs)]
+    graphs12 = [nx.configuration_model(degrees12, create_using=nx.Graph) for i in range(half_num_graphs)]
+
+    for g in graphs11:
+        g.remove_edges_from(nx.selfloop_edges(g))
+    for g in graphs12:
+        g.remove_edges_from(nx.selfloop_edges(g))
+
+    # method 2: can easily generate isomorphic graphs.
+#    graphs11 = [nx.random_regular_graph(2, num_nodes, seed=None) for i in range(half_num_graphs)]
+#    graphs12 = [nx.random_regular_graph(10, num_nodes, seed=None) for i in range(half_num_graphs)]
+
+    # Add node labels.
+    for g in graphs11:
+        for n in g.nodes():
+            g.nodes[n]['atom'] = 0
+    for g in graphs12:
+        for n in g.nodes():
+            g.nodes[n]['atom'] = 1
+
+    graphs1 = graphs11 + graphs12
+
+    # method 1: the entropy of the two classes is not the same.
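+    # A minimal sketch of the quantity this experiment varies (a hypothetical
+    # helper, not part of gklearn, and presumably close to what the
+    # 'all_degree_entropy' info used below reports per graph, up to the log
+    # base): the Shannon entropy of the empirical degree distribution of g.
+    #
+    #   from collections import Counter
+    #   import math
+    #   def degree_entropy(g):
+    #       counts = Counter(d for _, d in g.degree())
+    #       n = g.number_of_nodes()
+    #       return -sum(c / n * math.log2(c / n) for c in counts.values())
+    #
+    # degrees21 spreads over degrees 1..10, so the graphs built from it have a
+    # higher degree-distribution entropy than the constant-degree graphs above.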
+    graphs21 = [nx.configuration_model(degrees21, create_using=nx.Graph) for i in range(half_num_graphs)]
+    graphs22 = [nx.configuration_model(degrees22, create_using=nx.Graph) for i in range(half_num_graphs)]
+
+    for g in graphs21:
+        g.remove_edges_from(nx.selfloop_edges(g))
+    for g in graphs22:
+        g.remove_edges_from(nx.selfloop_edges(g))
+
+#    # method 2: too slow, and may fail.
+#    graphs21 = [nx.random_degree_sequence_graph(degrees21, seed=None, tries=100) for i in range(half_num_graphs)]
+#    graphs22 = [nx.random_degree_sequence_graph(degrees22, seed=None, tries=100) for i in range(half_num_graphs)]
+
+#    # method 3: no randomness.
+#    graphs21 = [nx.havel_hakimi_graph(degrees21, create_using=None) for i in range(half_num_graphs)]
+#    graphs22 = [nx.havel_hakimi_graph(degrees22, create_using=None) for i in range(half_num_graphs)]
+
+#    # method 4:
+#    graphs21 = [nx.configuration_model(degrees21, create_using=nx.Graph) for i in range(half_num_graphs)]
+#    graphs22 = [nx.degree_sequence_tree(degrees21, create_using=nx.Graph) for i in range(half_num_graphs)]
+
+#    # method 5: the entropy of the two classes is not the same.
+#    graphs21 = [nx.expected_degree_graph(degrees21, seed=None, selfloops=False) for i in range(half_num_graphs)]
+#    graphs22 = [nx.expected_degree_graph(degrees22, seed=None, selfloops=False) for i in range(half_num_graphs)]
+
+#    # method 6: seems there is no randomness.
+#    graphs21 = [nx.random_powerlaw_tree(num_nodes, gamma=3, seed=None, tries=10000) for i in range(half_num_graphs)]
+#    graphs22 = [nx.random_powerlaw_tree(num_nodes, gamma=3, seed=None, tries=10000) for i in range(half_num_graphs)]
+
+    # Add node labels.
+    for g in graphs21:
+        for n in g.nodes():
+            g.nodes[n]['atom'] = 0
+    for g in graphs22:
+        for n in g.nodes():
+            g.nodes[n]['atom'] = 1
+
+    graphs2 = graphs21 + graphs22
+
+#    # check for isomorphism.
+#    iso_mat1 = np.zeros((len(graphs1), len(graphs1)))
+#    num1 = 0
+#    num2 = 0
+#    for i in range(len(graphs1)):
+#        for j in range(i + 1, len(graphs1)):
+#            if nx.is_isomorphic(graphs1[i], graphs1[j]):
+#                iso_mat1[i, j] = 1
+#                iso_mat1[j, i] = 1
+#                num1 += 1
+#                print('iso:', num1, ':', i, ',', j)
+#            else:
+#                num2 += 1
+#                print('not iso:', num2, ':', i, ',', j)
+#
+#    iso_mat2 = np.zeros((len(graphs2), len(graphs2)))
+#    num1 = 0
+#    num2 = 0
+#    for i in range(len(graphs2)):
+#        for j in range(i + 1, len(graphs2)):
+#            if nx.is_isomorphic(graphs2[i], graphs2[j]):
+#                iso_mat2[i, j] = 1
+#                iso_mat2[j, i] = 1
+#                num1 += 1
+#                print('iso:', num1, ':', i, ',', j)
+#            else:
+#                num2 += 1
+#                print('not iso:', num2, ':', i, ',', j)
+
+    return graphs1, graphs2
+
+
+def get_infos(graph):
+    from gklearn.utils import Dataset
+    ds = Dataset()
+    ds.load_graphs(graph)
+    infos = ds.get_dataset_infos(keys=['all_degree_entropy', 'ave_node_degree'])
+    infos['ave_degree_entropy'] = np.mean(infos['all_degree_entropy'])
+    print(infos['ave_degree_entropy'], ',', infos['ave_node_degree'])
+    return infos
+
+
+def xp_accuracy_diff_entropy():
+
+    # Generate graphs.
+    graphs1, graphs2 = generate_graphs()
+
+
+    # Compute entropy of degree distribution of the generated graphs.
+    info11 = get_infos(graphs1[0:half_num_graphs])
+    info12 = get_infos(graphs1[half_num_graphs:])
+    info21 = get_infos(graphs2[0:half_num_graphs])
+    info22 = get_infos(graphs2[half_num_graphs:])
+
+    # Run and save.
+ import pickle + import os + save_dir = 'outputs/accuracy_diff_entropy/' + if not os.path.exists(save_dir): + os.makedirs(save_dir) + + accuracies = {} + confidences = {} + + for kernel_name in Graph_Kernel_List: + print() + print('Kernel:', kernel_name) + + accuracies[kernel_name] = [] + confidences[kernel_name] = [] + for set_i, graphs in enumerate([graphs1, graphs2]): + print() + print('Graph set', set_i) + + tmp_graphs = [g.copy() for g in graphs] + targets = [0] * half_num_graphs + [1] * half_num_graphs + + accuracy = 'error' + confidence = 'error' + try: + accuracy, confidence = cross_validate(tmp_graphs, targets, kernel_name, ds_name=str(set_i), output_dir=save_dir) #, n_jobs=1) + except Exception as exp: + print('An exception occured when running this experiment:') + LOG_FILENAME = save_dir + 'error.txt' + logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG) + logging.exception('\n' + kernel_name + ', ' + str(set_i) + ':') + print(repr(exp)) + accuracies[kernel_name].append(accuracy) + confidences[kernel_name].append(confidence) + + pickle.dump(accuracy, open(save_dir + 'accuracy.' + kernel_name + '.' + str(set_i) + '.pkl', 'wb')) + pickle.dump(confidence, open(save_dir + 'confidence.' + kernel_name + '.' + str(set_i) + '.pkl', 'wb')) + + # Save all. + pickle.dump(accuracies, open(save_dir + 'accuracies.pkl', 'wb')) + pickle.dump(confidences, open(save_dir + 'confidences.pkl', 'wb')) + + return + + +if __name__ == '__main__': + xp_accuracy_diff_entropy() \ No newline at end of file diff --git a/lang/fr/gklearn/experiments/papers/PRL_2020/runtimes_28cores.py b/lang/fr/gklearn/experiments/papers/PRL_2020/runtimes_28cores.py new file mode 100644 index 0000000000..0e25f4656e --- /dev/null +++ b/lang/fr/gklearn/experiments/papers/PRL_2020/runtimes_28cores.py @@ -0,0 +1,57 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Mon Sep 21 10:34:26 2020 + +@author: ljia +""" +from utils import Graph_Kernel_List, Dataset_List, compute_graph_kernel +from gklearn.utils.graphdataset import load_predefined_dataset +import logging + + +def xp_runtimes_of_all_28cores(): + + # Run and save. + import pickle + import os + save_dir = 'outputs/runtimes_of_all_28cores/' + if not os.path.exists(save_dir): + os.makedirs(save_dir) + + run_times = {} + + for ds_name in Dataset_List: + print() + print('Dataset:', ds_name) + + run_times[ds_name] = [] + for kernel_name in Graph_Kernel_List: + print() + print('Kernel:', kernel_name) + + # get graphs. + graphs, _ = load_predefined_dataset(ds_name) + + # Compute Gram matrix. + run_time = 'error' + try: + gram_matrix, run_time = compute_graph_kernel(graphs, kernel_name, n_jobs=28) + except Exception as exp: + print('An exception occured when running this experiment:') + LOG_FILENAME = save_dir + 'error.txt' + logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG) + logging.exception('') + print(repr(exp)) + run_times[ds_name].append(run_time) + + pickle.dump(run_time, open(save_dir + 'run_time.' + kernel_name + '.' + ds_name + '.pkl', 'wb')) + + # Save all. 
+ pickle.dump(run_times, open(save_dir + 'run_times.pkl', 'wb')) + + return + + +if __name__ == '__main__': + xp_runtimes_of_all_28cores() diff --git a/lang/fr/gklearn/experiments/papers/PRL_2020/runtimes_diff_chunksizes.py b/lang/fr/gklearn/experiments/papers/PRL_2020/runtimes_diff_chunksizes.py new file mode 100644 index 0000000000..6d118d8b74 --- /dev/null +++ b/lang/fr/gklearn/experiments/papers/PRL_2020/runtimes_diff_chunksizes.py @@ -0,0 +1,62 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Mon Sep 21 10:34:26 2020 + +@author: ljia +""" +from utils import Graph_Kernel_List, Dataset_List, compute_graph_kernel +from gklearn.utils.graphdataset import load_predefined_dataset +import logging + + +def xp_runtimes_diff_chunksizes(): + + # Run and save. + import pickle + import os + save_dir = 'outputs/runtimes_diff_chunksizes/' + if not os.path.exists(save_dir): + os.makedirs(save_dir) + + run_times = {} + + for ds_name in Dataset_List: + print() + print('Dataset:', ds_name) + + run_times[ds_name] = [] + for kernel_name in Graph_Kernel_List: + print() + print('Kernel:', kernel_name) + + run_times[ds_name].append([]) + for chunksize in [1, 5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000]: + print() + print('Chunksize:', chunksize) + + # get graphs. + graphs, _ = load_predefined_dataset(ds_name) + + # Compute Gram matrix. + run_time = 'error' + try: + gram_matrix, run_time = compute_graph_kernel(graphs, kernel_name, chunksize=chunksize) + except Exception as exp: + print('An exception occured when running this experiment:') + LOG_FILENAME = save_dir + 'error.txt' + logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG) + logging.exception('') + print(repr(exp)) + run_times[ds_name][-1].append(run_time) + + pickle.dump(run_time, open(save_dir + 'run_time.' + kernel_name + '.' + ds_name + '.' + str(chunksize) + '.pkl', 'wb')) + + # Save all. + pickle.dump(run_times, open(save_dir + 'run_times.pkl', 'wb')) + + return + + +if __name__ == '__main__': + xp_runtimes_diff_chunksizes() diff --git a/lang/fr/gklearn/experiments/papers/PRL_2020/synthesized_graphs_N.py b/lang/fr/gklearn/experiments/papers/PRL_2020/synthesized_graphs_N.py new file mode 100644 index 0000000000..891ae4c919 --- /dev/null +++ b/lang/fr/gklearn/experiments/papers/PRL_2020/synthesized_graphs_N.py @@ -0,0 +1,64 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Mon Sep 21 10:34:26 2020 + +@author: ljia +""" +from utils import Graph_Kernel_List, compute_graph_kernel +import logging + + +def generate_graphs(): + from gklearn.utils.graph_synthesizer import GraphSynthesizer + gsyzer = GraphSynthesizer() + graphs = gsyzer.unified_graphs(num_graphs=1000, num_nodes=20, num_edges=40, num_node_labels=0, num_edge_labels=0, seed=None, directed=False) + return graphs + + +def xp_synthesized_graphs_dataset_size(): + + # Generate graphs. + graphs = generate_graphs() + + # Run and save. 
+ import pickle + import os + save_dir = 'outputs/synthesized_graphs_N/' + if not os.path.exists(save_dir): + os.makedirs(save_dir) + + run_times = {} + + for kernel_name in Graph_Kernel_List: + print() + print('Kernel:', kernel_name) + + run_times[kernel_name] = [] + for num_graphs in [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]: + print() + print('Number of graphs:', num_graphs) + + sub_graphs = [g.copy() for g in graphs[0:num_graphs]] + + run_time = 'error' + try: + gram_matrix, run_time = compute_graph_kernel(sub_graphs, kernel_name) + except Exception as exp: + print('An exception occured when running this experiment:') + LOG_FILENAME = save_dir + 'error.txt' + logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG) + logging.exception('') + print(repr(exp)) + run_times[kernel_name].append(run_time) + + pickle.dump(run_time, open(save_dir + 'run_time.' + kernel_name + '.' + str(num_graphs) + '.pkl', 'wb')) + + # Save all. + pickle.dump(run_times, open(save_dir + 'run_times.pkl', 'wb')) + + return + + +if __name__ == '__main__': + xp_synthesized_graphs_dataset_size() diff --git a/lang/fr/gklearn/experiments/papers/PRL_2020/synthesized_graphs_degrees.py b/lang/fr/gklearn/experiments/papers/PRL_2020/synthesized_graphs_degrees.py new file mode 100644 index 0000000000..f005172b8f --- /dev/null +++ b/lang/fr/gklearn/experiments/papers/PRL_2020/synthesized_graphs_degrees.py @@ -0,0 +1,63 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Mon Sep 21 10:34:26 2020 + +@author: ljia +""" +from utils import Graph_Kernel_List, compute_graph_kernel +import logging + + +def generate_graphs(degree): + from gklearn.utils.graph_synthesizer import GraphSynthesizer + gsyzer = GraphSynthesizer() + graphs = gsyzer.unified_graphs(num_graphs=100, num_nodes=20, num_edges=int(10*degree), num_node_labels=0, num_edge_labels=0, seed=None, directed=False) + return graphs + + +def xp_synthesized_graphs_degrees(): + + # Run and save. + import pickle + import os + save_dir = 'outputs/synthesized_graphs_degrees/' + if not os.path.exists(save_dir): + os.makedirs(save_dir) + + run_times = {} + + for kernel_name in Graph_Kernel_List: + print() + print('Kernel:', kernel_name) + + run_times[kernel_name] = [] + for degree in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]: + print() + print('Degree:', degree) + + # Generate graphs. + graphs = generate_graphs(degree) + + # Compute Gram matrix. + run_time = 'error' + try: + gram_matrix, run_time = compute_graph_kernel(graphs, kernel_name) + except Exception as exp: + print('An exception occured when running this experiment:') + LOG_FILENAME = save_dir + 'error.txt' + logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG) + logging.exception('') + print(repr(exp)) + run_times[kernel_name].append(run_time) + + pickle.dump(run_time, open(save_dir + 'run_time.' + kernel_name + '.' + str(degree) + '.pkl', 'wb')) + + # Save all. 
+ pickle.dump(run_times, open(save_dir + 'run_times.pkl', 'wb')) + + return + + +if __name__ == '__main__': + xp_synthesized_graphs_degrees() diff --git a/lang/fr/gklearn/experiments/papers/PRL_2020/synthesized_graphs_num_el.py b/lang/fr/gklearn/experiments/papers/PRL_2020/synthesized_graphs_num_el.py new file mode 100644 index 0000000000..8e35c74fbf --- /dev/null +++ b/lang/fr/gklearn/experiments/papers/PRL_2020/synthesized_graphs_num_el.py @@ -0,0 +1,63 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Mon Sep 21 10:34:26 2020 + +@author: ljia +""" +from utils import Graph_Kernel_List_ESym, compute_graph_kernel +import logging + + +def generate_graphs(num_el_alp): + from gklearn.utils.graph_synthesizer import GraphSynthesizer + gsyzer = GraphSynthesizer() + graphs = gsyzer.unified_graphs(num_graphs=100, num_nodes=20, num_edges=40, num_node_labels=0, num_edge_labels=num_el_alp, seed=None, directed=False) + return graphs + + +def xp_synthesized_graphs_num_edge_label_alphabet(): + + # Run and save. + import pickle + import os + save_dir = 'outputs/synthesized_graphs_num_edge_label_alphabet/' + if not os.path.exists(save_dir): + os.makedirs(save_dir) + + run_times = {} + + for kernel_name in Graph_Kernel_List_ESym: + print() + print('Kernel:', kernel_name) + + run_times[kernel_name] = [] + for num_el_alp in [0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40]: + print() + print('Number of edge label alphabet:', num_el_alp) + + # Generate graphs. + graphs = generate_graphs(num_el_alp) + + # Compute Gram matrix. + run_time = 'error' + try: + gram_matrix, run_time = compute_graph_kernel(graphs, kernel_name) + except Exception as exp: + print('An exception occured when running this experiment:') + LOG_FILENAME = save_dir + 'error.txt' + logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG) + logging.exception('') + print(repr(exp)) + run_times[kernel_name].append(run_time) + + pickle.dump(run_time, open(save_dir + 'run_time.' + kernel_name + '.' + str(num_el_alp) + '.pkl', 'wb')) + + # Save all. + pickle.dump(run_times, open(save_dir + 'run_times.pkl', 'wb')) + + return + + +if __name__ == '__main__': + xp_synthesized_graphs_num_edge_label_alphabet() diff --git a/lang/fr/gklearn/experiments/papers/PRL_2020/synthesized_graphs_num_nl.py b/lang/fr/gklearn/experiments/papers/PRL_2020/synthesized_graphs_num_nl.py new file mode 100644 index 0000000000..51e1382ff5 --- /dev/null +++ b/lang/fr/gklearn/experiments/papers/PRL_2020/synthesized_graphs_num_nl.py @@ -0,0 +1,64 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Mon Sep 21 10:34:26 2020 + +@author: ljia +""" +from utils import Graph_Kernel_List_VSym, compute_graph_kernel +import logging + + +def generate_graphs(num_nl_alp): + from gklearn.utils.graph_synthesizer import GraphSynthesizer + gsyzer = GraphSynthesizer() + graphs = gsyzer.unified_graphs(num_graphs=100, num_nodes=20, num_edges=40, num_node_labels=num_nl_alp, num_edge_labels=0, seed=None, directed=False) + return graphs + + +def xp_synthesized_graphs_num_node_label_alphabet(): + + # Run and save. + import pickle + import os + save_dir = 'outputs/synthesized_graphs_num_node_label_alphabet/' + if not os.path.exists(save_dir): + os.makedirs(save_dir) + + run_times = {} + + for kernel_name in Graph_Kernel_List_VSym: + print() + print('Kernel:', kernel_name) + + run_times[kernel_name] = [] + for num_nl_alp in [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]: + print() + print('Number of node label alphabet:', num_nl_alp) + + # Generate graphs. 
+            graphs = generate_graphs(num_nl_alp)
+
+            # Compute Gram matrix.
+            run_time = 'error'
+            try:
+                gram_matrix, run_time = compute_graph_kernel(graphs, kernel_name)
+            except Exception as exp:
+                print('An exception occurred when running this experiment:')
+                LOG_FILENAME = save_dir + 'error.txt'
+                logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG)
+                logging.exception('')
+                print(repr(exp))
+            run_times[kernel_name].append(run_time)
+
+            pickle.dump(run_time, open(save_dir + 'run_time.' + kernel_name + '.' + str(num_nl_alp) + '.pkl', 'wb'))
+
+        # Save all.
+        pickle.dump(run_times, open(save_dir + 'run_times.pkl', 'wb'))
+
+    return
+
+
+if __name__ == '__main__':
+    xp_synthesized_graphs_num_node_label_alphabet()
diff --git a/lang/fr/gklearn/experiments/papers/PRL_2020/synthesized_graphs_num_nodes.py b/lang/fr/gklearn/experiments/papers/PRL_2020/synthesized_graphs_num_nodes.py
new file mode 100644
index 0000000000..f63c404588
--- /dev/null
+++ b/lang/fr/gklearn/experiments/papers/PRL_2020/synthesized_graphs_num_nodes.py
@@ -0,0 +1,64 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Created on Mon Sep 21 10:34:26 2020
+
+@author: ljia
+"""
+from utils import Graph_Kernel_List, compute_graph_kernel
+import logging
+
+
+def generate_graphs(num_nodes):
+    from gklearn.utils.graph_synthesizer import GraphSynthesizer
+    gsyzer = GraphSynthesizer()
+    graphs = gsyzer.unified_graphs(num_graphs=100, num_nodes=num_nodes, num_edges=int(num_nodes*2), num_node_labels=0, num_edge_labels=0, seed=None, directed=False)
+    return graphs
+
+
+def xp_synthesized_graphs_num_nodes():
+
+    # Run and save.
+    import pickle
+    import os
+    save_dir = 'outputs/synthesized_graphs_num_nodes/'
+    if not os.path.exists(save_dir):
+        os.makedirs(save_dir)
+
+    run_times = {}
+
+    for kernel_name in Graph_Kernel_List:
+        print()
+        print('Kernel:', kernel_name)
+
+        run_times[kernel_name] = []
+        for num_nodes in [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]:
+            print()
+            print('Number of nodes:', num_nodes)
+
+            # Generate graphs.
+            graphs = generate_graphs(num_nodes)
+
+            # Compute Gram matrix.
+            run_time = 'error'
+            try:
+                gram_matrix, run_time = compute_graph_kernel(graphs, kernel_name)
+            except Exception as exp:
+                print('An exception occurred when running this experiment:')
+                LOG_FILENAME = save_dir + 'error.txt'
+                logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG)
+                logging.exception('')
+                print(repr(exp))
+            run_times[kernel_name].append(run_time)
+
+            pickle.dump(run_time, open(save_dir + 'run_time.' + kernel_name + '.' + str(num_nodes) + '.pkl', 'wb'))
+
+        # Save all.
+ pickle.dump(run_times, open(save_dir + 'run_times.pkl', 'wb')) + + return + + +if __name__ == '__main__': + xp_synthesized_graphs_num_nodes() diff --git a/lang/fr/gklearn/experiments/papers/PRL_2020/utils.py b/lang/fr/gklearn/experiments/papers/PRL_2020/utils.py new file mode 100644 index 0000000000..b676af0021 --- /dev/null +++ b/lang/fr/gklearn/experiments/papers/PRL_2020/utils.py @@ -0,0 +1,236 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Tue Sep 22 11:33:28 2020 + +@author: ljia +""" +import multiprocessing +import numpy as np +from gklearn.utils import model_selection_for_precomputed_kernel + + +Graph_Kernel_List = ['PathUpToH', 'WLSubtree', 'SylvesterEquation', 'Marginalized', 'ShortestPath', 'Treelet', 'ConjugateGradient', 'FixedPoint', 'SpectralDecomposition', 'StructuralSP', 'CommonWalk'] +# Graph_Kernel_List = ['CommonWalk', 'Marginalized', 'SylvesterEquation', 'ConjugateGradient', 'FixedPoint', 'SpectralDecomposition', 'ShortestPath', 'StructuralSP', 'PathUpToH', 'Treelet', 'WLSubtree'] + + +Graph_Kernel_List_VSym = ['PathUpToH', 'WLSubtree', 'Marginalized', 'ShortestPath', 'Treelet', 'ConjugateGradient', 'FixedPoint', 'StructuralSP', 'CommonWalk'] + + +Graph_Kernel_List_ESym = ['PathUpToH', 'Marginalized', 'Treelet', 'ConjugateGradient', 'FixedPoint', 'StructuralSP', 'CommonWalk'] + + +Graph_Kernel_List_VCon = ['ShortestPath', 'ConjugateGradient', 'FixedPoint', 'StructuralSP'] + + +Graph_Kernel_List_ECon = ['ConjugateGradient', 'FixedPoint', 'StructuralSP'] + + +Dataset_List = ['Alkane', 'Acyclic', 'MAO', 'PAH', 'MUTAG', 'Letter-med', 'ENZYMES', 'AIDS', 'NCI1', 'NCI109', 'DD'] + + +def compute_graph_kernel(graphs, kernel_name, n_jobs=multiprocessing.cpu_count(), chunksize=None): + + if kernel_name == 'CommonWalk': + from gklearn.kernels.commonWalkKernel import commonwalkkernel + estimator = commonwalkkernel + params = {'compute_method': 'geo', 'weight': 0.1} + + elif kernel_name == 'Marginalized': + from gklearn.kernels.marginalizedKernel import marginalizedkernel + estimator = marginalizedkernel + params = {'p_quit': 0.5, 'n_iteration': 5, 'remove_totters': False} + + elif kernel_name == 'SylvesterEquation': + from gklearn.kernels.randomWalkKernel import randomwalkkernel + estimator = randomwalkkernel + params = {'compute_method': 'sylvester', 'weight': 0.1} + + elif kernel_name == 'ConjugateGradient': + from gklearn.kernels.randomWalkKernel import randomwalkkernel + estimator = randomwalkkernel + from gklearn.utils.kernels import deltakernel, gaussiankernel, kernelproduct + import functools + mixkernel = functools.partial(kernelproduct, deltakernel, gaussiankernel) + sub_kernel = {'symb': deltakernel, 'nsymb': gaussiankernel, 'mix': mixkernel} + params = {'compute_method': 'conjugate', 'weight': 0.1, 'node_kernels': sub_kernel, 'edge_kernels': sub_kernel} + + elif kernel_name == 'FixedPoint': + from gklearn.kernels.randomWalkKernel import randomwalkkernel + estimator = randomwalkkernel + from gklearn.utils.kernels import deltakernel, gaussiankernel, kernelproduct + import functools + mixkernel = functools.partial(kernelproduct, deltakernel, gaussiankernel) + sub_kernel = {'symb': deltakernel, 'nsymb': gaussiankernel, 'mix': mixkernel} + params = {'compute_method': 'fp', 'weight': 1e-4, 'node_kernels': sub_kernel, 'edge_kernels': sub_kernel} + + elif kernel_name == 'SpectralDecomposition': + from gklearn.kernels.randomWalkKernel import randomwalkkernel + estimator = randomwalkkernel + params = {'compute_method': 'spectral', 'sub_kernel': 'geo', 
'weight': 0.1} + + elif kernel_name == 'ShortestPath': + from gklearn.kernels.spKernel import spkernel + estimator = spkernel + from gklearn.utils.kernels import deltakernel, gaussiankernel, kernelproduct + import functools + mixkernel = functools.partial(kernelproduct, deltakernel, gaussiankernel) + sub_kernel = {'symb': deltakernel, 'nsymb': gaussiankernel, 'mix': mixkernel} + params = {'node_kernels': sub_kernel} + + elif kernel_name == 'StructuralSP': + from gklearn.kernels.structuralspKernel import structuralspkernel + estimator = structuralspkernel + from gklearn.utils.kernels import deltakernel, gaussiankernel, kernelproduct + import functools + mixkernel = functools.partial(kernelproduct, deltakernel, gaussiankernel) + sub_kernel = {'symb': deltakernel, 'nsymb': gaussiankernel, 'mix': mixkernel} + params = {'node_kernels': sub_kernel, 'edge_kernels': sub_kernel} + + elif kernel_name == 'PathUpToH': + from gklearn.kernels.untilHPathKernel import untilhpathkernel + estimator = untilhpathkernel + params = {'depth': 5, 'k_func': 'MinMax', 'compute_method': 'trie'} + + elif kernel_name == 'Treelet': + from gklearn.kernels.treeletKernel import treeletkernel + estimator = treeletkernel + from gklearn.utils.kernels import polynomialkernel + import functools + sub_kernel = functools.partial(polynomialkernel, d=4, c=1e+8) + params = {'sub_kernel': sub_kernel} + + elif kernel_name == 'WLSubtree': + from gklearn.kernels.weisfeilerLehmanKernel import weisfeilerlehmankernel + estimator = weisfeilerlehmankernel + params = {'base_kernel': 'subtree', 'height': 5} + +# params['parallel'] = None + params['n_jobs'] = n_jobs + params['chunksize'] = chunksize + params['verbose'] = True + results = estimator(graphs, **params) + + return results[0], results[1] + + +def cross_validate(graphs, targets, kernel_name, output_dir='outputs/', ds_name='synthesized', n_jobs=multiprocessing.cpu_count()): + + param_grid = None + + if kernel_name == 'CommonWalk': + from gklearn.kernels.commonWalkKernel import commonwalkkernel + estimator = commonwalkkernel + param_grid_precomputed = [{'compute_method': ['geo'], + 'weight': np.linspace(0.01, 0.15, 15)}] + + elif kernel_name == 'Marginalized': + from gklearn.kernels.marginalizedKernel import marginalizedkernel + estimator = marginalizedkernel + param_grid_precomputed = {'p_quit': np.linspace(0.1, 0.9, 9), + 'n_iteration': np.linspace(1, 19, 7), + 'remove_totters': [False]} + + elif kernel_name == 'SylvesterEquation': + from gklearn.kernels.randomWalkKernel import randomwalkkernel + estimator = randomwalkkernel + param_grid_precomputed = {'compute_method': ['sylvester'], +# 'weight': np.linspace(0.01, 0.10, 10)} + 'weight': np.logspace(-1, -10, num=10, base=10)} + + elif kernel_name == 'ConjugateGradient': + from gklearn.kernels.randomWalkKernel import randomwalkkernel + estimator = randomwalkkernel + from gklearn.utils.kernels import deltakernel, gaussiankernel, kernelproduct + import functools + mixkernel = functools.partial(kernelproduct, deltakernel, gaussiankernel) + sub_kernel = {'symb': deltakernel, 'nsymb': gaussiankernel, 'mix': mixkernel} + param_grid_precomputed = {'compute_method': ['conjugate'], + 'node_kernels': [sub_kernel], 'edge_kernels': [sub_kernel], + 'weight': np.logspace(-1, -10, num=10, base=10)} + + elif kernel_name == 'FixedPoint': + from gklearn.kernels.randomWalkKernel import randomwalkkernel + estimator = randomwalkkernel + from gklearn.utils.kernels import deltakernel, gaussiankernel, kernelproduct + import functools + mixkernel = 
functools.partial(kernelproduct, deltakernel, gaussiankernel) + sub_kernel = {'symb': deltakernel, 'nsymb': gaussiankernel, 'mix': mixkernel} + param_grid_precomputed = {'compute_method': ['fp'], + 'node_kernels': [sub_kernel], 'edge_kernels': [sub_kernel], + 'weight': np.logspace(-4, -10, num=7, base=10)} + + elif kernel_name == 'SpectralDecomposition': + from gklearn.kernels.randomWalkKernel import randomwalkkernel + estimator = randomwalkkernel + param_grid_precomputed = {'compute_method': ['spectral'], + 'weight': np.logspace(-1, -10, num=10, base=10), + 'sub_kernel': ['geo', 'exp']} + + elif kernel_name == 'ShortestPath': + from gklearn.kernels.spKernel import spkernel + estimator = spkernel + from gklearn.utils.kernels import deltakernel, gaussiankernel, kernelproduct + import functools + mixkernel = functools.partial(kernelproduct, deltakernel, gaussiankernel) + sub_kernel = {'symb': deltakernel, 'nsymb': gaussiankernel, 'mix': mixkernel} + param_grid_precomputed = {'node_kernels': [sub_kernel]} + + elif kernel_name == 'StructuralSP': + from gklearn.kernels.structuralspKernel import structuralspkernel + estimator = structuralspkernel + from gklearn.utils.kernels import deltakernel, gaussiankernel, kernelproduct + import functools + mixkernel = functools.partial(kernelproduct, deltakernel, gaussiankernel) + sub_kernel = {'symb': deltakernel, 'nsymb': gaussiankernel, 'mix': mixkernel} + param_grid_precomputed = {'node_kernels': [sub_kernel], 'edge_kernels': [sub_kernel], + 'compute_method': ['naive']} + + elif kernel_name == 'PathUpToH': + from gklearn.kernels.untilHPathKernel import untilhpathkernel + estimator = untilhpathkernel + param_grid_precomputed = {'depth': np.linspace(1, 10, 10), # [2], + 'k_func': ['MinMax', 'tanimoto'], # ['MinMax'], # + 'compute_method': ['trie']} # ['MinMax']} + + elif kernel_name == 'Treelet': + from gklearn.kernels.treeletKernel import treeletkernel + estimator = treeletkernel + from gklearn.utils.kernels import gaussiankernel, polynomialkernel + import functools + gkernels = [functools.partial(gaussiankernel, gamma=1 / ga) + # for ga in np.linspace(1, 10, 10)] + for ga in np.logspace(0, 10, num=11, base=10)] + pkernels = [functools.partial(polynomialkernel, d=d, c=c) for d in range(1, 11) + for c in np.logspace(0, 10, num=11, base=10)] +# pkernels = [functools.partial(polynomialkernel, d=1, c=1)] + + param_grid_precomputed = {'sub_kernel': pkernels + gkernels} +# 'parallel': [None]} + + elif kernel_name == 'WLSubtree': + from gklearn.kernels.weisfeilerLehmanKernel import weisfeilerlehmankernel + estimator = weisfeilerlehmankernel + param_grid_precomputed = {'base_kernel': ['subtree'], + 'height': np.linspace(0, 10, 11)} + param_grid = {'C': np.logspace(-10, 4, num=29, base=10)} + + if param_grid is None: + param_grid = {'C': np.logspace(-10, 10, num=41, base=10)} + + results = model_selection_for_precomputed_kernel( + graphs, + estimator, + param_grid_precomputed, + param_grid, + 'classification', + NUM_TRIALS=28, + datafile_y=targets, + extra_params=None, + ds_name=ds_name, + output_dir=output_dir, + n_jobs=n_jobs, + read_gm_from_file=False, + verbose=True) + + return results[0], results[1] \ No newline at end of file diff --git a/lang/fr/gklearn/ged/edit_costs/__init__.py b/lang/fr/gklearn/ged/edit_costs/__init__.py new file mode 100644 index 0000000000..b2a2b12361 --- /dev/null +++ b/lang/fr/gklearn/ged/edit_costs/__init__.py @@ -0,0 +1,2 @@ +from gklearn.ged.edit_costs.edit_cost import EditCost +from gklearn.ged.edit_costs.constant import 
Constant \ No newline at end of file diff --git a/lang/fr/gklearn/ged/edit_costs/constant.py b/lang/fr/gklearn/ged/edit_costs/constant.py new file mode 100644 index 0000000000..9dca1a214e --- /dev/null +++ b/lang/fr/gklearn/ged/edit_costs/constant.py @@ -0,0 +1,50 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Wed Jun 17 17:52:23 2020 + +@author: ljia +""" +from gklearn.ged.edit_costs import EditCost + + +class Constant(EditCost): + """Implements constant edit cost functions. + """ + + + def __init__(self, node_ins_cost=1, node_del_cost=1, node_rel_cost=1, edge_ins_cost=1, edge_del_cost=1, edge_rel_cost=1): + self._node_ins_cost = node_ins_cost + self._node_del_cost = node_del_cost + self._node_rel_cost = node_rel_cost + self._edge_ins_cost = edge_ins_cost + self._edge_del_cost = edge_del_cost + self._edge_rel_cost = edge_rel_cost + + + def node_ins_cost_fun(self, node_label): + return self._node_ins_cost + + + def node_del_cost_fun(self, node_label): + return self._node_del_cost + + + def node_rel_cost_fun(self, node_label_1, node_label_2): + if node_label_1 != node_label_2: + return self._node_rel_cost + return 0 + + + def edge_ins_cost_fun(self, edge_label): + return self._edge_ins_cost + + + def edge_del_cost_fun(self, edge_label): + return self._edge_del_cost + + + def edge_rel_cost_fun(self, edge_label_1, edge_label_2): + if edge_label_1 != edge_label_2: + return self._edge_rel_cost + return 0 \ No newline at end of file diff --git a/lang/fr/gklearn/ged/edit_costs/edit_cost.py b/lang/fr/gklearn/ged/edit_costs/edit_cost.py new file mode 100644 index 0000000000..5d15827e5d --- /dev/null +++ b/lang/fr/gklearn/ged/edit_costs/edit_cost.py @@ -0,0 +1,88 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Wed Jun 17 17:49:24 2020 + +@author: ljia +""" + + +class EditCost(object): + + + def __init__(self): + pass + + + def node_ins_cost_fun(self, node_label): + """ + /*! + * @brief Node insertions cost function. + * @param[in] node_label A node label. + * @return The cost of inserting a node with label @p node_label. + * @note Must be implemented by derived classes of ged::EditCosts. + */ + """ + return 0 + + + def node_del_cost_fun(self, node_label): + """ + /*! + * @brief Node deletion cost function. + * @param[in] node_label A node label. + * @return The cost of deleting a node with label @p node_label. + * @note Must be implemented by derived classes of ged::EditCosts. + */ + """ + return 0 + + + def node_rel_cost_fun(self, node_label_1, node_label_2): + """ + /*! + * @brief Node relabeling cost function. + * @param[in] node_label_1 A node label. + * @param[in] node_label_2 A node label. + * @return The cost of changing a node's label from @p node_label_1 to @p node_label_2. + * @note Must be implemented by derived classes of ged::EditCosts. + */ + """ + return 0 + + + def edge_ins_cost_fun(self, edge_label): + """ + /*! + * @brief Edge insertion cost function. + * @param[in] edge_label An edge label. + * @return The cost of inserting an edge with label @p edge_label. + * @note Must be implemented by derived classes of ged::EditCosts. + */ + """ + return 0 + + + def edge_del_cost_fun(self, edge_label): + """ + /*! + * @brief Edge deletion cost function. + * @param[in] edge_label An edge label. + * @return The cost of deleting an edge with label @p edge_label. + * @note Must be implemented by derived classes of ged::EditCosts. + */ + """ + return 0 + + + def edge_rel_cost_fun(self, edge_label_1, edge_label_2): + """ + /*! 
+ * @brief Edge relabeling cost function. + * @param[in] edge_label_1 An edge label. + * @param[in] edge_label_2 An edge label. + * @return The cost of changing an edge's label from @p edge_label_1 to @p edge_label_2. + * @note Must be implemented by derived classes of ged::EditCosts. + */ + """ + return 0 \ No newline at end of file diff --git a/lang/fr/gklearn/ged/env/__init__.py b/lang/fr/gklearn/ged/env/__init__.py new file mode 100644 index 0000000000..1a5a0cefec --- /dev/null +++ b/lang/fr/gklearn/ged/env/__init__.py @@ -0,0 +1,4 @@ +from gklearn.ged.env.common_types import Options, OptionsStringMap, AlgorithmState +from gklearn.ged.env.ged_data import GEDData +from gklearn.ged.env.ged_env import GEDEnv +from gklearn.ged.env.node_map import NodeMap \ No newline at end of file diff --git a/lang/fr/gklearn/ged/env/common_types.py b/lang/fr/gklearn/ged/env/common_types.py new file mode 100644 index 0000000000..091d952a44 --- /dev/null +++ b/lang/fr/gklearn/ged/env/common_types.py @@ -0,0 +1,159 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Thu Mar 19 18:17:38 2020 + +@author: ljia +""" + +from enum import Enum, unique + + +class Options(object): + """Contains enums for options employed by ged::GEDEnv. + """ + + + @unique + class GEDMethod(Enum): + """Selects the method. + """ +# @todo: what is this? #ifdef GUROBI + F1 = 1 # Selects ged::F1. + F2 = 2 # Selects ged::F2. + COMPACT_MIP = 3 # Selects ged::CompactMIP. + BLP_NO_EDGE_LABELS = 4 # Selects ged::BLPNoEdgeLabels. +#endif /* GUROBI */ + BRANCH = 5 # Selects ged::Branch. + BRANCH_FAST = 6 # Selects ged::BranchFast. + BRANCH_TIGHT = 7 # Selects ged::BranchTight. + BRANCH_UNIFORM = 8 # Selects ged::BranchUniform. + BRANCH_COMPACT = 9 # Selects ged::BranchCompact. + PARTITION = 10 # Selects ged::Partition. + HYBRID = 11 # Selects ged::Hybrid. + RING = 12 # Selects ged::Ring. + ANCHOR_AWARE_GED = 13 # Selects ged::AnchorAwareGED. + WALKS = 14 # Selects ged::Walks. + IPFP = 15 # Selects ged::IPFP + BIPARTITE = 16 # Selects ged::Bipartite. + SUBGRAPH = 17 # Selects ged::Subgraph. + NODE = 18 # Selects ged::Node. + RING_ML = 19 # Selects ged::RingML. + BIPARTITE_ML = 20 # Selects ged::BipartiteML. + REFINE = 21 # Selects ged::Refine. + BP_BEAM = 22 # Selects ged::BPBeam. + SIMULATED_ANNEALING = 23 # Selects ged::SimulatedAnnealing. + HED = 24 # Selects ged::HED. + STAR = 25 # Selects ged::Star. + + + @unique + class EditCosts(Enum): + """Selects the edit costs. + """ + CHEM_1 = 1 # Selects ged::CHEM1. + CHEM_2 = 2 # Selects ged::CHEM2. + CMU = 3 # Selects ged::CMU. + GREC_1 = 4 # Selects ged::GREC1. + GREC_2 = 5 # Selects ged::GREC2. + PROTEIN = 6 # Selects ged::Protein. + FINGERPRINT = 7 # Selects ged::Fingerprint. + LETTER = 8 # Selects ged::Letter. + LETTER2 = 9 # Selects ged:Letter2. + NON_SYMBOLIC = 10 # Selects ged:NonSymbolic. + CONSTANT = 11 # Selects ged::Constant. + + + @unique + class InitType(Enum): + """@brief Selects the initialization type of the environment. + * @details If eager initialization is selected, all edit costs are pre-computed when initializing the environment. + * Otherwise, they are computed at runtime. If initialization with shuffled copies is selected, shuffled copies of + * all graphs are created. These copies are used when calling ged::GEDEnv::run_method() with two identical graph IDs. + * In this case, one of the IDs is internally replaced by the ID of the shuffled copy and the graph is hence + * compared to an isomorphic but non-identical graph. 
If initialization without shuffled copies is selected, no shuffled copies
+     * are created and calling ged::GEDEnv::run_method() with two identical graph IDs amounts to comparing a graph to itself.
+        """
+        LAZY_WITHOUT_SHUFFLED_COPIES = 1 # Lazy initialization, no shuffled graph copies are constructed.
+        EAGER_WITHOUT_SHUFFLED_COPIES = 2 # Eager initialization, no shuffled graph copies are constructed.
+        LAZY_WITH_SHUFFLED_COPIES = 3 # Lazy initialization, shuffled graph copies are constructed.
+        EAGER_WITH_SHUFFLED_COPIES = 4 # Eager initialization, shuffled graph copies are constructed.
+
+
+    @unique
+    class AlgorithmState(Enum):
+        """Can be used to specify the state of an algorithm.
+        """
+        CALLED = 1 # The algorithm has been called.
+        INITIALIZED = 2 # The algorithm has been initialized.
+        CONVERGED = 3 # The algorithm has converged.
+        TERMINATED = 4 # The algorithm has terminated.
+
+
+class OptionsStringMap(object):
+
+
+    # Map of available computation methods between enum type and string.
+    GEDMethod = {
+        "BRANCH": Options.GEDMethod.BRANCH,
+        "BRANCH_FAST": Options.GEDMethod.BRANCH_FAST,
+        "BRANCH_TIGHT": Options.GEDMethod.BRANCH_TIGHT,
+        "BRANCH_UNIFORM": Options.GEDMethod.BRANCH_UNIFORM,
+        "BRANCH_COMPACT": Options.GEDMethod.BRANCH_COMPACT,
+        "PARTITION": Options.GEDMethod.PARTITION,
+        "HYBRID": Options.GEDMethod.HYBRID,
+        "RING": Options.GEDMethod.RING,
+        "ANCHOR_AWARE_GED": Options.GEDMethod.ANCHOR_AWARE_GED,
+        "WALKS": Options.GEDMethod.WALKS,
+        "IPFP": Options.GEDMethod.IPFP,
+        "BIPARTITE": Options.GEDMethod.BIPARTITE,
+        "SUBGRAPH": Options.GEDMethod.SUBGRAPH,
+        "NODE": Options.GEDMethod.NODE,
+        "RING_ML": Options.GEDMethod.RING_ML,
+        "BIPARTITE_ML": Options.GEDMethod.BIPARTITE_ML,
+        "REFINE": Options.GEDMethod.REFINE,
+        "BP_BEAM": Options.GEDMethod.BP_BEAM,
+        "SIMULATED_ANNEALING": Options.GEDMethod.SIMULATED_ANNEALING,
+        "HED": Options.GEDMethod.HED,
+        "STAR": Options.GEDMethod.STAR,
+        # ifdef GUROBI
+        "F1": Options.GEDMethod.F1,
+        "F2": Options.GEDMethod.F2,
+        "COMPACT_MIP": Options.GEDMethod.COMPACT_MIP,
+        "BLP_NO_EDGE_LABELS": Options.GEDMethod.BLP_NO_EDGE_LABELS
+    }
+
+
+    # Map of available edit cost functions between enum type and string.
+    EditCosts = {
+        "CHEM_1": Options.EditCosts.CHEM_1,
+        "CHEM_2": Options.EditCosts.CHEM_2,
+        "CMU": Options.EditCosts.CMU,
+        "GREC_1": Options.EditCosts.GREC_1,
+        "GREC_2": Options.EditCosts.GREC_2,
+        "LETTER": Options.EditCosts.LETTER,
+        "LETTER2": Options.EditCosts.LETTER2,
+        "NON_SYMBOLIC": Options.EditCosts.NON_SYMBOLIC,
+        "FINGERPRINT": Options.EditCosts.FINGERPRINT,
+        "PROTEIN": Options.EditCosts.PROTEIN,
+        "CONSTANT": Options.EditCosts.CONSTANT
+    }
+
+    # Map of available initialization types of the environment between enum type and string.
+    InitType = {
+        "LAZY_WITHOUT_SHUFFLED_COPIES": Options.InitType.LAZY_WITHOUT_SHUFFLED_COPIES,
+        "EAGER_WITHOUT_SHUFFLED_COPIES": Options.InitType.EAGER_WITHOUT_SHUFFLED_COPIES,
+        "LAZY_WITH_SHUFFLED_COPIES": Options.InitType.LAZY_WITH_SHUFFLED_COPIES,
+        "EAGER_WITH_SHUFFLED_COPIES": Options.InitType.EAGER_WITH_SHUFFLED_COPIES
+    }
+
+
+@unique
+class AlgorithmState(Enum):
+    """Can be used to specify the state of an algorithm.
+    """
+    CALLED = 1 # The algorithm has been called.
+    INITIALIZED = 2 # The algorithm has been initialized.
+    CONVERGED = 3 # The algorithm has converged.
+    TERMINATED = 4 # The algorithm has terminated.
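+# A minimal usage sketch of the string maps above (only names defined in this
+# module are assumed):
+#
+#   from gklearn.ged.env.common_types import Options, OptionsStringMap
+#
+#   method = OptionsStringMap.GEDMethod['BIPARTITE']
+#   assert method == Options.GEDMethod.BIPARTITE
+#   init = OptionsStringMap.InitType['EAGER_WITHOUT_SHUFFLED_COPIES']
+#   assert init == Options.InitType.EAGER_WITHOUT_SHUFFLED_COPIES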
+ diff --git a/lang/fr/gklearn/ged/env/ged_data.py b/lang/fr/gklearn/ged/env/ged_data.py new file mode 100644 index 0000000000..0e6881fa56 --- /dev/null +++ b/lang/fr/gklearn/ged/env/ged_data.py @@ -0,0 +1,249 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Wed Jun 17 15:05:01 2020 + +@author: ljia +""" +from gklearn.ged.env import Options, OptionsStringMap +from gklearn.ged.edit_costs import Constant +from gklearn.utils import SpecialLabel, dummy_node + + +class GEDData(object): + + + def __init__(self): + self._graphs = [] + self._graph_names = [] + self._graph_classes = [] + self._num_graphs_without_shuffled_copies = 0 + self._strings_to_internal_node_ids = [] + self._internal_node_ids_to_strings = [] + self._edit_cost = None + self._node_costs = None + self._edge_costs = None + self._node_label_costs = None + self._edge_label_costs = None + self._node_labels = [] + self._edge_labels = [] + self._init_type = Options.InitType.EAGER_WITHOUT_SHUFFLED_COPIES + self._delete_edit_cost = True + self._max_num_nodes = 0 + self._max_num_edges = 0 + + + def num_graphs(self): + """ + /*! + * @brief Returns the number of graphs. + * @return Number of graphs in the instance. + */ + """ + return len(self._graphs) + + + def graph(self, graph_id): + """ + /*! + * @brief Provides access to a graph. + * @param[in] graph_id The ID of the graph. + * @return Constant reference to the graph with ID @p graph_id. + */ + """ + return self._graphs[graph_id] + + + def shuffled_graph_copies_available(self): + """ + /*! + * @brief Checks if shuffled graph copies are available. + * @return Boolean @p true if shuffled graph copies are available. + */ + """ + return (self._init_type == Options.InitType.EAGER_WITH_SHUFFLED_COPIES or self._init_type == Options.InitType.LAZY_WITH_SHUFFLED_COPIES) + + + def num_graphs_without_shuffled_copies(self): + """ + /*! + * @brief Returns the number of graphs in the instance without the shuffled copies. + * @return Number of graphs without shuffled copies contained in the instance. + */ + """ + return self._num_graphs_without_shuffled_copies + + + def node_cost(self, label1, label2): + """ + /*! + * @brief Returns node relabeling, insertion, or deletion cost. + * @param[in] label1 First node label. + * @param[in] label2 Second node label. + * @return Node relabeling cost if @p label1 and @p label2 are both different from ged::dummy_label(), + * node insertion cost if @p label1 equals ged::dummy_label and @p label2 does not, + * node deletion cost if @p label1 does not equal ged::dummy_label and @p label2 does, + * and 0 otherwise. + */ + """ + if self._node_label_costs is None: + if self._eager_init(): # @todo: check if correct + return self._node_costs[label1, label2] + if label1 == label2: + return 0 + if label1 == SpecialLabel.DUMMY: # @todo: check dummy + return self._edit_cost.node_ins_cost_fun(label2) # self._node_labels[label2 - 1]) # @todo: check + if label2 == SpecialLabel.DUMMY: # @todo: check dummy + return self._edit_cost.node_del_cost_fun(label1) # self._node_labels[label1 - 1]) + return self._edit_cost.node_rel_cost_fun(label1, label2) # self._node_labels[label1 - 1], self._node_labels[label2 - 1]) + # use pre-computed node label costs. + else: + id1 = 0 if label1 == SpecialLabel.DUMMY else self._node_label_to_id(label1) # @todo: this is slow. + id2 = 0 if label2 == SpecialLabel.DUMMY else self._node_label_to_id(label2) + return self._node_label_costs[id1, id2] + + + def edge_cost(self, label1, label2): + """ + /*! 
+     * @brief Returns edge relabeling, insertion, or deletion cost.
+     * @param[in] label1 First edge label.
+     * @param[in] label2 Second edge label.
+     * @return Edge relabeling cost if @p label1 and @p label2 are both different from ged::dummy_label(),
+     * edge insertion cost if @p label1 equals ged::dummy_label and @p label2 does not,
+     * edge deletion cost if @p label1 does not equal ged::dummy_label and @p label2 does,
+     * and 0 otherwise.
+     */
+        """
+        if self._edge_label_costs is None:
+            if self._eager_init(): # @todo: check if correct
+                return self._edge_costs[label1, label2] # use the edge cost matrix here, not the node one.
+            if label1 == label2:
+                return 0
+            if label1 == SpecialLabel.DUMMY:
+                return self._edit_cost.edge_ins_cost_fun(label2) # self._edge_labels[label2 - 1])
+            if label2 == SpecialLabel.DUMMY:
+                return self._edit_cost.edge_del_cost_fun(label1) # self._edge_labels[label1 - 1])
+            return self._edit_cost.edge_rel_cost_fun(label1, label2) # self._edge_labels[label1 - 1], self._edge_labels[label2 - 1])
+
+        # use pre-computed edge label costs.
+        else:
+            id1 = 0 if label1 == SpecialLabel.DUMMY else self._edge_label_to_id(label1) # @todo: this is slow.
+            id2 = 0 if label2 == SpecialLabel.DUMMY else self._edge_label_to_id(label2)
+            return self._edge_label_costs[id1, id2]
+
+
+    def compute_induced_cost(self, g, h, node_map):
+        """
+    /*!
+     * @brief Computes the edit cost between two graphs induced by a node map.
+     * @param[in] g Input graph.
+     * @param[in] h Input graph.
+     * @param[in,out] node_map Node map whose induced edit cost is to be computed.
+     */
+        """
+        cost = 0
+
+        # collect node costs
+        for node in g.nodes():
+            image = node_map.image(node)
+            label2 = (SpecialLabel.DUMMY if image == dummy_node() else h.nodes[image]['label'])
+            cost += self.node_cost(g.nodes[node]['label'], label2)
+        for node in h.nodes():
+            pre_image = node_map.pre_image(node)
+            if pre_image == dummy_node():
+                cost += self.node_cost(SpecialLabel.DUMMY, h.nodes[node]['label'])
+
+        # collect edge costs
+        for (n1, n2) in g.edges():
+            image1 = node_map.image(n1)
+            image2 = node_map.image(n2)
+            label2 = (h.edges[(image2, image1)]['label'] if h.has_edge(image2, image1) else SpecialLabel.DUMMY)
+            cost += self.edge_cost(g.edges[(n1, n2)]['label'], label2)
+        for (n1, n2) in h.edges():
+            if not g.has_edge(node_map.pre_image(n2), node_map.pre_image(n1)):
+                cost += self.edge_cost(SpecialLabel.DUMMY, h.edges[(n1, n2)]['label'])
+
+        node_map.set_induced_cost(cost)
+
+
+    def _set_edit_cost(self, edit_cost, edit_cost_constants):
+        if self._delete_edit_cost:
+            self._edit_cost = None
+
+        if isinstance(edit_cost, str):
+            edit_cost = OptionsStringMap.EditCosts[edit_cost]
+
+        if edit_cost == Options.EditCosts.CHEM_1:
+            if len(edit_cost_constants) == 4:
+                self._edit_cost = CHEM1(edit_cost_constants[0], edit_cost_constants[1], edit_cost_constants[2], edit_cost_constants[3])
+            elif len(edit_cost_constants) == 0:
+                self._edit_cost = CHEM1()
+            else:
+                raise Exception('Wrong number of constants for selected edit costs Options::EditCosts::CHEM_1. Expected: 4 or 0; actual:', len(edit_cost_constants), '.')
+        elif edit_cost == Options.EditCosts.LETTER:
+            if len(edit_cost_constants) == 3:
+                self._edit_cost = Letter(edit_cost_constants[0], edit_cost_constants[1], edit_cost_constants[2])
+            elif len(edit_cost_constants) == 0:
+                self._edit_cost = Letter()
+            else:
+                raise Exception('Wrong number of constants for selected edit costs Options::EditCosts::LETTER.
Expected: 3 or 0; actual:', len(edit_cost_constants), '.') + elif edit_cost == Options.EditCosts.LETTER2: + if len(edit_cost_constants) == 5: + self._edit_cost = Letter2(edit_cost_constants[0], edit_cost_constants[1], edit_cost_constants[2], edit_cost_constants[3], edit_cost_constants[4]) + elif len(edit_cost_constants) == 0: + self._edit_cost = Letter2() + else: + raise Exception('Wrong number of constants for selected edit costs Options::EditCosts::LETTER2. Expected: 5 or 0; actual:', len(edit_cost_constants), '.') + elif edit_cost == Options.EditCosts.NON_SYMBOLIC: + if len(edit_cost_constants) == 6: + self._edit_cost = NonSymbolic(edit_cost_constants[0], edit_cost_constants[1], edit_cost_constants[2], edit_cost_constants[3], edit_cost_constants[4], edit_cost_constants[5]) + elif len(edit_cost_constants) == 0: + self._edit_cost = NonSymbolic() + else: + raise Exception('Wrong number of constants for selected edit costs Options::EditCosts::NON_SYMBOLIC. Expected: 6 or 0; actual:', len(edit_cost_constants), '.') + elif edit_cost == Options.EditCosts.CONSTANT: + if len(edit_cost_constants) == 6: + self._edit_cost = Constant(edit_cost_constants[0], edit_cost_constants[1], edit_cost_constants[2], edit_cost_constants[3], edit_cost_constants[4], edit_cost_constants[5]) + elif len(edit_cost_constants) == 0: + self._edit_cost = Constant() + else: + raise Exception('Wrong number of constants for selected edit costs Options::EditCosts::CONSTANT. Expected: 6 or 0; actual:', len(edit_cost_constants), '.') + + self._delete_edit_cost = True + + + def id_to_node_label(self, label_id): + if label_id > len(self._node_labels) or label_id == 0: + raise Exception('Invalid node label ID', str(label_id), '.') + return self._node_labels[label_id - 1] + + + def _node_label_to_id(self, node_label): + n_id = 0 + for n_l in self._node_labels: + if n_l == node_label: + return n_id + 1 + n_id += 1 + self._node_labels.append(node_label) + return n_id + 1 + + + def id_to_edge_label(self, label_id): + if label_id > len(self._edge_labels) or label_id == 0: + raise Exception('Invalid edge label ID', str(label_id), '.') + return self._edge_labels[label_id - 1] + + + def _edge_label_to_id(self, edge_label): + e_id = 0 + for e_l in self._edge_labels: + if e_l == edge_label: + return e_id + 1 + e_id += 1 + self._edge_labels.append(edge_label) + return e_id + 1 + + + def _eager_init(self): + return (self._init_type == Options.InitType.EAGER_WITHOUT_SHUFFLED_COPIES or self._init_type == Options.InitType.EAGER_WITH_SHUFFLED_COPIES) \ No newline at end of file diff --git a/lang/fr/gklearn/ged/env/ged_env.py b/lang/fr/gklearn/ged/env/ged_env.py new file mode 100644 index 0000000000..3d7644b77f --- /dev/null +++ b/lang/fr/gklearn/ged/env/ged_env.py @@ -0,0 +1,733 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Wed Jun 17 12:02:36 2020 + +@author: ljia +""" +import numpy as np +import networkx as nx +from gklearn.ged.env import Options, OptionsStringMap +from gklearn.ged.env import GEDData + + +class GEDEnv(object): + + + def __init__(self): + self._initialized = False + self._new_graph_ids = [] + self._ged_data = GEDData() + # Variables needed for approximating ged_instance_. + self._lower_bounds = {} + self._upper_bounds = {} + self._runtimes = {} + self._node_maps = {} + self._original_to_internal_node_ids = [] + self._internal_to_original_node_ids = [] + self._ged_method = None + + + def set_edit_cost(self, edit_cost, edit_cost_constants=[]): + """ + /*! 
+ * @brief Sets the edit costs to one of the predefined edit costs. + * @param[in] edit_costs Select one of the predefined edit costs. + * @param[in] edit_cost_constants Constants passed to the constructor of the edit cost class selected by @p edit_costs. + */ + """ + self._ged_data._set_edit_cost(edit_cost, edit_cost_constants) + + + def add_graph(self, graph_name='', graph_class=''): + """ + /*! + * @brief Adds a new uninitialized graph to the environment. Call init() after calling this method. + * @param[in] graph_name The name of the added graph. Empty if not specified. + * @param[in] graph_class The class of the added graph. Empty if not specified. + * @return The ID of the newly added graph. + */ + """ + # @todo: graphs are not uninitialized. + self._initialized = False + graph_id = self._ged_data._num_graphs_without_shuffled_copies + self._ged_data._num_graphs_without_shuffled_copies += 1 + self._new_graph_ids.append(graph_id) + self._ged_data._graphs.append(nx.Graph()) + self._ged_data._graph_names.append(graph_name) + self._ged_data._graph_classes.append(graph_class) + self._original_to_internal_node_ids.append({}) + self._internal_to_original_node_ids.append({}) + self._ged_data._strings_to_internal_node_ids.append({}) + self._ged_data._internal_node_ids_to_strings.append({}) + return graph_id + + + def clear_graph(self, graph_id): + """ + /*! + * @brief Clears and de-initializes a graph that has previously been added to the environment. Call init() after calling this method. + * @param[in] graph_id ID of graph that has to be cleared. + */ + """ + if graph_id > self._ged_data.num_graphs_without_shuffled_copies(): + raise Exception('The graph', self.get_graph_name(graph_id), 'has not been added to the environment.') + self._ged_data._graphs[graph_id].clear() + self._original_to_internal_node_ids[graph_id].clear() + self._internal_to_original_node_ids[graph_id].clear() + self._ged_data._strings_to_internal_node_ids[graph_id].clear() + self._ged_data._internal_node_ids_to_strings[graph_id].clear() + self._initialized = False + + + def add_node(self, graph_id, node_id, node_label): + """ + /*! + * @brief Adds a labeled node. + * @param[in] graph_id ID of graph that has been added to the environment. + * @param[in] node_id The user-specific ID of the vertex that has to be added. + * @param[in] node_label The label of the vertex that has to be added. Set to ged::NoLabel() if template parameter @p UserNodeLabel equals ged::NoLabel. + */ + """ + # @todo: check ids. + self._initialized = False + internal_node_id = nx.number_of_nodes(self._ged_data._graphs[graph_id]) + self._ged_data._graphs[graph_id].add_node(internal_node_id, label=node_label) + self._original_to_internal_node_ids[graph_id][node_id] = internal_node_id + self._internal_to_original_node_ids[graph_id][internal_node_id] = node_id + self._ged_data._strings_to_internal_node_ids[graph_id][str(node_id)] = internal_node_id + self._ged_data._internal_node_ids_to_strings[graph_id][internal_node_id] = str(node_id) + self._ged_data._node_label_to_id(node_label) + label_id = self._ged_data._node_label_to_id(node_label) + # @todo: ged_data_.graphs_[graph_id].set_label + + + def add_edge(self, graph_id, nd_from, nd_to, edge_label, ignore_duplicates=True): + """ + /*! + * @brief Adds a labeled edge. + * @param[in] graph_id ID of graph that has been added to the environment. + * @param[in] tail The user-specific ID of the tail of the edge that has to be added. 
+		 * @param[in] head The user-specific ID of the head of the edge that has to be added.
+		 * @param[in] edge_label The label of the edge that has to be added. Set to ged::NoLabel() if template parameter @p UserEdgeLabel equals ged::NoLabel.
+		 * @param[in] ignore_duplicates If @p true, duplicate edges are ignored. Otherwise, an exception is thrown if an existing edge is added to the graph.
+		 */
+		"""
+		# @todo: check everything.
+		self._initialized = False
+		# @todo: check ignore_duplicates.
+		self._ged_data._graphs[graph_id].add_edge(self._original_to_internal_node_ids[graph_id][nd_from], self._original_to_internal_node_ids[graph_id][nd_to], label=edge_label)
+		label_id = self._ged_data._edge_label_to_id(edge_label)
+		# @todo: ged_data_.graphs_[graph_id].set_label
+
+
+	def add_nx_graph(self, g, classe, ignore_duplicates=True):
+		"""
+		Adds a graph (built with NetworkX) to the environment. Be careful to respect the same format as GXL graphs for labelling nodes and edges.
+
+		:param g: The graph to add (networkx graph)
+		:param ignore_duplicates: If True, duplicate edges are ignored; otherwise an error is raised when an existing edge is added. True by default
+		:type g: networkx.graph
+		:type ignore_duplicates: bool
+		:return: The ID of the newly added graph
+		:rtype: size_t
+
+		.. note:: The NX graph must respect the GXL structure. Please see how a GXL graph is constructed.
+
+		"""
+		graph_id = self.add_graph(g.name, classe)  # check if the graph name already exists.
+		for node in g.nodes:  # @todo: if the keys of labels include int and str at the same time.
+			self.add_node(graph_id, node, tuple(sorted(g.nodes[node].items(), key=lambda kv: kv[0])))
+		for edge in g.edges:
+			self.add_edge(graph_id, edge[0], edge[1], tuple(sorted(g.edges[(edge[0], edge[1])].items(), key=lambda kv: kv[0])), ignore_duplicates)
+		return graph_id
+
+
+	def load_nx_graph(self, nx_graph, graph_id, graph_name='', graph_class=''):
+		"""
+		Loads a NetworkX graph into the GED environment.
+
+		Parameters
+		----------
+		nx_graph : NetworkX Graph object
+			The graph that should be loaded.
+
+		graph_id : int or None
+			The ID of a graph contained in the environment (the existing graph is overwritten), or add a new graph if `None`.
+
+		graph_name : string, optional
+			The name of the newly added graph. The default is ''. Has no effect unless `graph_id` equals `None`.
+
+		graph_class : string, optional
+			The class of the newly added graph. The default is ''. Has no effect unless `graph_id` equals `None`.
+
+		Returns
+		-------
+		int
+			The ID of the newly loaded graph.
+		"""
+		if graph_id is None:  # @todo: undefined.
+			graph_id = self.add_graph(graph_name, graph_class)
+		else:
+			self.clear_graph(graph_id)
+		for node in nx_graph.nodes:
+			self.add_node(graph_id, node, tuple(sorted(nx_graph.nodes[node].items(), key=lambda kv: kv[0])))
+		for edge in nx_graph.edges:
+			self.add_edge(graph_id, edge[0], edge[1], tuple(sorted(nx_graph.edges[(edge[0], edge[1])].items(), key=lambda kv: kv[0])))
+		return graph_id
+
+
+	def init(self, init_type=Options.InitType.EAGER_WITHOUT_SHUFFLED_COPIES, print_to_stdout=False):
+		if isinstance(init_type, str):
+			init_type = OptionsStringMap.InitType[init_type]
+
+		# Throw an exception if no edit costs have been selected.
+		if self._ged_data._edit_cost is None:
+			raise Exception('No edit costs have been selected. Call set_edit_cost() before calling init().')
+
+		# Return if the environment is initialized.
+		if self._initialized:
+			return
+
+		# Set initialization type.
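+		# (What follows: record the chosen init type, refresh the maximum
+		# node/edge counts over all graphs, eagerly precompute cost matrices
+		# when an EAGER init type is selected, and mark the environment as
+		# initialized.)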
+		self._ged_data._init_type = init_type
+
+		# @todo: Construct shuffled graph copies if necessary.
+
+		# Re-initialize adjacency matrices (also previously initialized graphs must be re-initialized because of possible re-allocation).
+		# @todo: setup_adjacency_matrix, don't know if necessary.
+		self._ged_data._max_num_nodes = np.max([nx.number_of_nodes(g) for g in self._ged_data._graphs])
+		self._ged_data._max_num_edges = np.max([nx.number_of_edges(g) for g in self._ged_data._graphs])
+
+		# Initialize cost matrices if necessary.
+		if self._ged_data._eager_init():
+			pass  # @todo: init_cost_matrices_: 1. Update node cost matrix if new node labels have been added to the environment; 2. Update edge cost matrix if new edge labels have been added to the environment.
+
+		# Mark environment as initialized.
+		self._initialized = True
+		self._new_graph_ids.clear()
+
+
+	def is_initialized(self):
+		"""
+		/*!
+		 * @brief Check if the environment is initialized.
+		 * @return True if the environment is initialized.
+		 */
+		"""
+		return self._initialized
+
+
+	def get_init_type(self):
+		"""
+		/*!
+		 * @brief Returns the initialization type of the last initialization.
+		 * @return Initialization type.
+		 */
+		"""
+		return self._ged_data._init_type
+
+
+	def set_label_costs(self, node_label_costs=None, edge_label_costs=None):
+		"""Set the costs between labels.
+		"""
+		if node_label_costs is not None:
+			self._ged_data._node_label_costs = node_label_costs
+		if edge_label_costs is not None:
+			self._ged_data._edge_label_costs = edge_label_costs
+
+
+	def set_method(self, method, options=''):
+		"""
+		/*!
+		 * @brief Sets the GEDMethod to be used by run_method().
+		 * @param[in] method Select the method that is to be used.
+		 * @param[in] options An options string of the form @"[--@<option> @<arg>] [...]@" passed to the selected method.
+ */ + """ + del self._ged_method + + if isinstance(method, str): + method = OptionsStringMap.GEDMethod[method] + + if method == Options.GEDMethod.BRANCH: + self._ged_method = Branch(self._ged_data) + elif method == Options.GEDMethod.BRANCH_FAST: + self._ged_method = BranchFast(self._ged_data) + elif method == Options.GEDMethod.BRANCH_FAST: + self._ged_method = BranchFast(self._ged_data) + elif method == Options.GEDMethod.BRANCH_TIGHT: + self._ged_method = BranchTight(self._ged_data) + elif method == Options.GEDMethod.BRANCH_UNIFORM: + self._ged_method = BranchUniform(self._ged_data) + elif method == Options.GEDMethod.BRANCH_COMPACT: + self._ged_method = BranchCompact(self._ged_data) + elif method == Options.GEDMethod.PARTITION: + self._ged_method = Partition(self._ged_data) + elif method == Options.GEDMethod.HYBRID: + self._ged_method = Hybrid(self._ged_data) + elif method == Options.GEDMethod.RING: + self._ged_method = Ring(self._ged_data) + elif method == Options.GEDMethod.ANCHOR_AWARE_GED: + self._ged_method = AnchorAwareGED(self._ged_data) + elif method == Options.GEDMethod.WALKS: + self._ged_method = Walks(self._ged_data) + elif method == Options.GEDMethod.IPFP: + self._ged_method = IPFP(self._ged_data) + elif method == Options.GEDMethod.BIPARTITE: + from gklearn.ged.methods import Bipartite + self._ged_method = Bipartite(self._ged_data) + elif method == Options.GEDMethod.SUBGRAPH: + self._ged_method = Subgraph(self._ged_data) + elif method == Options.GEDMethod.NODE: + self._ged_method = Node(self._ged_data) + elif method == Options.GEDMethod.RING_ML: + self._ged_method = RingML(self._ged_data) + elif method == Options.GEDMethod.BIPARTITE_ML: + self._ged_method = BipartiteML(self._ged_data) + elif method == Options.GEDMethod.REFINE: + self._ged_method = Refine(self._ged_data) + elif method == Options.GEDMethod.BP_BEAM: + self._ged_method = BPBeam(self._ged_data) + elif method == Options.GEDMethod.SIMULATED_ANNEALING: + self._ged_method = SimulatedAnnealing(self._ged_data) + elif method == Options.GEDMethod.HED: + self._ged_method = HED(self._ged_data) + elif method == Options.GEDMethod.STAR: + self._ged_method = STAR(self._ged_data) + # #ifdef GUROBI + elif method == Options.GEDMethod.F1: + self._ged_method = F1(self._ged_data) + elif method == Options.GEDMethod.F2: + self._ged_method = F2(self._ged_data) + elif method == Options.GEDMethod.COMPACT_MIP: + self._ged_method = CompactMIP(self._ged_data) + elif method == Options.GEDMethod.BLP_NO_EDGE_LABELS: + self._ged_method = BLPNoEdgeLabels(self._ged_data) + + self._ged_method.set_options(options) + + + def run_method(self, g_id, h_id): + """ + /*! + * @brief Runs the GED method specified by call to set_method() between the graphs with IDs @p g_id and @p h_id. + * @param[in] g_id ID of an input graph that has been added to the environment. + * @param[in] h_id ID of an input graph that has been added to the environment. + */ + """ + if g_id >= self._ged_data.num_graphs(): + raise Exception('The graph with ID', str(g_id), 'has not been added to the environment.') + if h_id >= self._ged_data.num_graphs(): + raise Exception('The graph with ID', str(h_id), 'has not been added to the environment.') + if not self._initialized: + raise Exception('The environment is uninitialized. Call init() after adding all graphs to the environment.') + if self._ged_method is None: + raise Exception('No method has been set. Call set_method() before calling run().') + + # Call selected GEDMethod and store results. 
+ if self._ged_data.shuffled_graph_copies_available() and (g_id == h_id): + self._ged_method.run(g_id, self._ged_data.id_shuffled_graph_copy(h_id)) # @todo: why shuffle? + else: + self._ged_method.run(g_id, h_id) + self._lower_bounds[(g_id, h_id)] = self._ged_method.get_lower_bound() + self._upper_bounds[(g_id, h_id)] = self._ged_method.get_upper_bound() + self._runtimes[(g_id, h_id)] = self._ged_method.get_runtime() + self._node_maps[(g_id, h_id)] = self._ged_method.get_node_map() + + + def init_method(self): + """Initializes the method specified by call to set_method(). + """ + if not self._initialized: + raise Exception('The environment is uninitialized. Call init() before calling init_method().') + if self._ged_method is None: + raise Exception('No method has been set. Call set_method() before calling init_method().') + self._ged_method.init() + + + def get_num_node_labels(self): + """ + /*! + * @brief Returns the number of node labels. + * @return Number of pairwise different node labels contained in the environment. + * @note If @p 1 is returned, the nodes are unlabeled. + */ + """ + return len(self._ged_data._node_labels) + + + def get_all_node_labels(self): + """ + /*! + * @brief Returns the list of all node labels. + * @return List of pairwise different node labels contained in the environment. + * @note If @p 1 is returned, the nodes are unlabeled. + */ + """ + return self._ged_data._node_labels + + + def get_node_label(self, label_id, to_dict=True): + """ + /*! + * @brief Returns node label. + * @param[in] label_id ID of node label that should be returned. Must be between 1 and num_node_labels(). + * @return Node label for selected label ID. + */ + """ + if label_id < 1 or label_id > self.get_num_node_labels(): + raise Exception('The environment does not contain a node label with ID', str(label_id), '.') + if to_dict: + return dict(self._ged_data._node_labels[label_id - 1]) + return self._ged_data._node_labels[label_id - 1] + + + def get_num_edge_labels(self): + """ + /*! + * @brief Returns the number of edge labels. + * @return Number of pairwise different edge labels contained in the environment. + * @note If @p 1 is returned, the edges are unlabeled. + */ + """ + return len(self._ged_data._edge_labels) + + + def get_all_edge_labels(self): + """ + /*! + * @brief Returns the list of all edge labels. + * @return List of pairwise different edge labels contained in the environment. + * @note If @p 1 is returned, the edges are unlabeled. + */ + """ + return self._ged_data._edge_labels + + + def get_edge_label(self, label_id, to_dict=True): + """ + /*! + * @brief Returns edge label. + * @param[in] label_id ID of edge label that should be returned. Must be between 1 and num_node_labels(). + * @return Edge label for selected label ID. + */ + """ + if label_id < 1 or label_id > self.get_num_edge_labels(): + raise Exception('The environment does not contain an edge label with ID', str(label_id), '.') + if to_dict: + return dict(self._ged_data._edge_labels[label_id - 1]) + return self._ged_data._edge_labels[label_id - 1] + + + def get_upper_bound(self, g_id, h_id): + """ + /*! + * @brief Returns upper bound for edit distance between the input graphs. + * @param[in] g_id ID of an input graph that has been added to the environment. + * @param[in] h_id ID of an input graph that has been added to the environment. + * @return Upper bound computed by the last call to run_method() with arguments @p g_id and @p h_id. 
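+		 * @note Hedged example: after run_method(0, 1), get_upper_bound(0, 1) returns the cost of the best node map found (an upper bound on the exact GED), and get_lower_bound(0, 1) the corresponding lower bound.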
+ */ + """ + if (g_id, h_id) not in self._upper_bounds: + raise Exception('Call run(' + str(g_id) + ',' + str(h_id) + ') before calling get_upper_bound(' + str(g_id) + ',' + str(h_id) + ').') + return self._upper_bounds[(g_id, h_id)] + + + def get_lower_bound(self, g_id, h_id): + """ + /*! + * @brief Returns lower bound for edit distance between the input graphs. + * @param[in] g_id ID of an input graph that has been added to the environment. + * @param[in] h_id ID of an input graph that has been added to the environment. + * @return Lower bound computed by the last call to run_method() with arguments @p g_id and @p h_id. + */ + """ + if (g_id, h_id) not in self._lower_bounds: + raise Exception('Call run(' + str(g_id) + ',' + str(h_id) + ') before calling get_lower_bound(' + str(g_id) + ',' + str(h_id) + ').') + return self._lower_bounds[(g_id, h_id)] + + + def get_runtime(self, g_id, h_id): + """ + /*! + * @brief Returns runtime. + * @param[in] g_id ID of an input graph that has been added to the environment. + * @param[in] h_id ID of an input graph that has been added to the environment. + * @return Runtime of last call to run_method() with arguments @p g_id and @p h_id. + */ + """ + if (g_id, h_id) not in self._runtimes: + raise Exception('Call run(' + str(g_id) + ',' + str(h_id) + ') before calling get_runtime(' + str(g_id) + ',' + str(h_id) + ').') + return self._runtimes[(g_id, h_id)] + + + def get_init_time(self): + """ + /*! + * @brief Returns initialization time. + * @return Runtime of the last call to init_method(). + */ + """ + return self._ged_method.get_init_time() + + + def get_node_map(self, g_id, h_id): + """ + /*! + * @brief Returns node map between the input graphs. + * @param[in] g_id ID of an input graph that has been added to the environment. + * @param[in] h_id ID of an input graph that has been added to the environment. + * @return Node map computed by the last call to run_method() with arguments @p g_id and @p h_id. + */ + """ + if (g_id, h_id) not in self._node_maps: + raise Exception('Call run(' + str(g_id) + ',' + str(h_id) + ') before calling get_node_map(' + str(g_id) + ',' + str(h_id) + ').') + return self._node_maps[(g_id, h_id)] + + + def get_forward_map(self, g_id, h_id) : + """ + Returns the forward map (or the half of the adjacence matrix) between nodes of the two indicated graphs. + + :param g: The Id of the first compared graph + :param h: The Id of the second compared graph + :type g: size_t + :type h: size_t + :return: The forward map to the adjacence matrix between nodes of the two graphs + :rtype: list[npy_uint32] + + .. seealso:: run_method(), get_upper_bound(), get_lower_bound(), get_backward_map(), get_runtime(), quasimetric_cost(), get_node_map(), get_assignment_matrix() + .. warning:: run_method() between the same two graph must be called before this function. + .. note:: I don't know how to connect the two map to reconstruct the adjacence matrix. Please come back when I know how it's work ! + """ + return self.get_node_map(g_id, h_id).forward_map + + + def get_backward_map(self, g_id, h_id) : + """ + Returns the backward map (or the half of the adjacence matrix) between nodes of the two indicated graphs. + + :param g: The Id of the first compared graph + :param h: The Id of the second compared graph + :type g: size_t + :type h: size_t + :return: The backward map to the adjacence matrix between nodes of the two graphs + :rtype: list[npy_uint32] + + .. 
seealso:: run_method(), get_upper_bound(), get_lower_bound(), get_forward_map(), get_runtime(), quasimetric_cost(), get_node_map(), get_assignment_matrix() + .. warning:: run_method() between the same two graph must be called before this function. + .. note:: I don't know how to connect the two map to reconstruct the adjacence matrix. Please come back when I know how it's work ! + """ + return self.get_node_map(g_id, h_id).backward_map + + + def compute_induced_cost(self, g_id, h_id, node_map): + """ + /*! + * @brief Computes the edit cost between two graphs induced by a node map. + * @param[in] g_id ID of input graph. + * @param[in] h_id ID of input graph. + * @param[in,out] node_map Node map whose induced edit cost is to be computed. + */ + """ + self._ged_data.compute_induced_cost(self._ged_data._graphs[g_id], self._ged_data._graphs[h_id], node_map) + + + def get_nx_graph(self, graph_id): + """ + * @brief Returns NetworkX.Graph() representation. + * @param[in] graph_id ID of the selected graph. + """ + graph = nx.Graph() # @todo: add graph attributes. + graph.graph['id'] = graph_id + + nb_nodes = self.get_graph_num_nodes(graph_id) + original_node_ids = self.get_original_node_ids(graph_id) + node_labels = self.get_graph_node_labels(graph_id, to_dict=True) + graph.graph['original_node_ids'] = original_node_ids + + for node_id in range(0, nb_nodes): + graph.add_node(node_id, **node_labels[node_id]) + + edges = self.get_graph_edges(graph_id, to_dict=True) + for (head, tail), labels in edges.items(): + graph.add_edge(head, tail, **labels) + + return graph + + + def get_graph_node_labels(self, graph_id, to_dict=True): + """ + Searchs and returns all the labels of nodes on a graph, selected by its ID. + + :param graph_id: The ID of the wanted graph + :type graph_id: size_t + :return: The list of nodes' labels on the selected graph + :rtype: list[dict{string : string}] + + .. seealso:: get_graph_internal_id(), get_graph_num_nodes(), get_graph_num_edges(), get_original_node_ids(), get_graph_edges(), get_graph_adjacence_matrix() + .. note:: These functions allow to collect all the graph's informations. + """ + graph = self._ged_data.graph(graph_id) + node_labels = [] + for n in graph.nodes(): + node_labels.append(graph.nodes[n]['label']) + if to_dict: + return [dict(i) for i in node_labels] + return node_labels + + + def get_graph_edges(self, graph_id, to_dict=True): + """ + Searchs and returns all the edges on a graph, selected by its ID. + + :param graph_id: The ID of the wanted graph + :type graph_id: size_t + :return: The list of edges on the selected graph + :rtype: dict{tuple(size_t, size_t) : dict{string : string}} + + .. seealso::get_graph_internal_id(), get_graph_num_nodes(), get_graph_num_edges(), get_original_node_ids(), get_graph_node_labels(), get_graph_adjacence_matrix() + .. note:: These functions allow to collect all the graph's informations. + """ + graph = self._ged_data.graph(graph_id) + if to_dict: + edges = {} + for n1, n2, attr in graph.edges(data=True): + edges[(n1, n2)] = dict(attr['label']) + return edges + return {(n1, n2): attr['label'] for n1, n2, attr in graph.edges(data=True)} + + + + def get_graph_name(self, graph_id): + """ + /*! + * @brief Returns the graph name. + * @param[in] graph_id ID of an input graph that has been added to the environment. + * @return Name of the input graph. + */ + """ + return self._ged_data._graph_names[graph_id] + + + def get_graph_num_nodes(self, graph_id): + """ + /*! + * @brief Returns the number of nodes. 
+ * @param[in] graph_id ID of an input graph that has been added to the environment. + * @return Number of nodes in the graph. + */ + """ + return nx.number_of_nodes(self._ged_data.graph(graph_id)) + + + def get_original_node_ids(self, graph_id): + """ + Searchs and returns all th Ids of nodes on a graph, selected by its ID. + + :param graph_id: The ID of the wanted graph + :type graph_id: size_t + :return: The list of IDs's nodes on the selected graph + :rtype: list[string] + + .. seealso::get_graph_internal_id(), get_graph_num_nodes(), get_graph_num_edges(), get_graph_node_labels(), get_graph_edges(), get_graph_adjacence_matrix() + .. note:: These functions allow to collect all the graph's informations. + """ + return [i for i in self._internal_to_original_node_ids[graph_id].values()] + + + def get_node_cost(self, node_label_1, node_label_2): + return self._ged_data.node_cost(node_label_1, node_label_2) + + + def get_node_rel_cost(self, node_label_1, node_label_2): + """ + /*! + * @brief Returns node relabeling cost. + * @param[in] node_label_1 First node label. + * @param[in] node_label_2 Second node label. + * @return Node relabeling cost for the given node labels. + */ + """ + if isinstance(node_label_1, dict): + node_label_1 = tuple(sorted(node_label_1.items(), key=lambda kv: kv[0])) + if isinstance(node_label_2, dict): + node_label_2 = tuple(sorted(node_label_2.items(), key=lambda kv: kv[0])) + return self._ged_data._edit_cost.node_rel_cost_fun(node_label_1, node_label_2) # @todo: may need to use node_cost() instead (or change node_cost() and modify ged_method for pre-defined cost matrices.) + + + def get_node_del_cost(self, node_label): + """ + /*! + * @brief Returns node deletion cost. + * @param[in] node_label Node label. + * @return Cost of deleting node with given label. + */ + """ + if isinstance(node_label, dict): + node_label = tuple(sorted(node_label.items(), key=lambda kv: kv[0])) + return self._ged_data._edit_cost.node_del_cost_fun(node_label) + + + def get_node_ins_cost(self, node_label): + """ + /*! + * @brief Returns node insertion cost. + * @param[in] node_label Node label. + * @return Cost of inserting node with given label. + */ + """ + if isinstance(node_label, dict): + node_label = tuple(sorted(node_label.items(), key=lambda kv: kv[0])) + return self._ged_data._edit_cost.node_ins_cost_fun(node_label) + + + def get_edge_cost(self, edge_label_1, edge_label_2): + return self._ged_data.edge_cost(edge_label_1, edge_label_2) + + + def get_edge_rel_cost(self, edge_label_1, edge_label_2): + """ + /*! + * @brief Returns edge relabeling cost. + * @param[in] edge_label_1 First edge label. + * @param[in] edge_label_2 Second edge label. + * @return Edge relabeling cost for the given edge labels. + */ + """ + if isinstance(edge_label_1, dict): + edge_label_1 = tuple(sorted(edge_label_1.items(), key=lambda kv: kv[0])) + if isinstance(edge_label_2, dict): + edge_label_2 = tuple(sorted(edge_label_2.items(), key=lambda kv: kv[0])) + return self._ged_data._edit_cost.edge_rel_cost_fun(edge_label_1, edge_label_2) + + + def get_edge_del_cost(self, edge_label): + """ + /*! + * @brief Returns edge deletion cost. + * @param[in] edge_label Edge label. + * @return Cost of deleting edge with given label. + */ + """ + if isinstance(edge_label, dict): + edge_label = tuple(sorted(edge_label.items(), key=lambda kv: kv[0])) + return self._ged_data._edit_cost.edge_del_cost_fun(edge_label) + + + def get_edge_ins_cost(self, edge_label): + """ + /*! + * @brief Returns edge insertion cost. 
+ * @param[in] edge_label Edge label. + * @return Cost of inserting edge with given label. + */ + """ + if isinstance(edge_label, dict): + edge_label = tuple(sorted(edge_label.items(), key=lambda kv: kv[0])) + return self._ged_data._edit_cost.edge_ins_cost_fun(edge_label) + + + def get_all_graph_ids(self): + return [i for i in range(0, self._ged_data._num_graphs_without_shuffled_copies)] \ No newline at end of file diff --git a/lang/fr/gklearn/ged/env/node_map.py b/lang/fr/gklearn/ged/env/node_map.py new file mode 100644 index 0000000000..71b68d8502 --- /dev/null +++ b/lang/fr/gklearn/ged/env/node_map.py @@ -0,0 +1,102 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Wed Apr 22 11:31:26 2020 + +@author: ljia +""" +import numpy as np +from gklearn.utils import dummy_node, undefined_node + + +class NodeMap(object): + + def __init__(self, num_nodes_g, num_nodes_h): + self._forward_map = [undefined_node()] * num_nodes_g + self._backward_map = [undefined_node()] * num_nodes_h + self._induced_cost = np.inf + + + def clear(self): + """ + /*! + * @brief Clears the node map. + */ + """ + self._forward_map = [undefined_node() for i in range(len(self._forward_map))] + self._backward_map = [undefined_node() for i in range(len(self._backward_map))] + + + def num_source_nodes(self): + return len(self._forward_map) + + + def num_target_nodes(self): + return len(self._backward_map) + + + def image(self, node): + if node < len(self._forward_map): + return self._forward_map[node] + else: + raise Exception('The node with ID ', str(node), ' is not contained in the source nodes of the node map.') + return undefined_node() + + + def pre_image(self, node): + if node < len(self._backward_map): + return self._backward_map[node] + else: + raise Exception('The node with ID ', str(node), ' is not contained in the target nodes of the node map.') + return undefined_node() + + + def as_relation(self, relation): + relation.clear() + for i in range(0, len(self._forward_map)): + k = self._forward_map[i] + if k != undefined_node(): + relation.append(tuple((i, k))) + for k in range(0, len(self._backward_map)): + i = self._backward_map[k] + if i == dummy_node(): + relation.append(tuple((i, k))) + + + def add_assignment(self, i, k): + if i != dummy_node(): + if i < len(self._forward_map): + self._forward_map[i] = k + else: + raise Exception('The node with ID ', str(i), ' is not contained in the source nodes of the node map.') + if k != dummy_node(): + if k < len(self._backward_map): + self._backward_map[k] = i + else: + raise Exception('The node with ID ', str(k), ' is not contained in the target nodes of the node map.') + + + def set_induced_cost(self, induced_cost): + self._induced_cost = induced_cost + + + def induced_cost(self): + return self._induced_cost + + + @property + def forward_map(self): + return self._forward_map + + @forward_map.setter + def forward_map(self, value): + self._forward_map = value + + + @property + def backward_map(self): + return self._backward_map + + @backward_map.setter + def backward_map(self, value): + self._backward_map = value \ No newline at end of file diff --git a/lang/fr/gklearn/ged/learning/__init__.py b/lang/fr/gklearn/ged/learning/__init__.py new file mode 100644 index 0000000000..f867ab3987 --- /dev/null +++ b/lang/fr/gklearn/ged/learning/__init__.py @@ -0,0 +1,9 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Tue Jul 7 16:07:25 2020 + +@author: ljia +""" + +from gklearn.ged.learning.cost_matrices_learner import CostMatricesLearner \ No 
newline at end of file
diff --git a/lang/fr/gklearn/ged/learning/cost_matrices_learner.py b/lang/fr/gklearn/ged/learning/cost_matrices_learner.py
new file mode 100644
index 0000000000..d2c39c22d5
--- /dev/null
+++ b/lang/fr/gklearn/ged/learning/cost_matrices_learner.py
@@ -0,0 +1,148 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Created on Tue Jul 7 11:42:48 2020
+
+@author: ljia
+"""
+import numpy as np
+import cvxpy as cp
+import time
+from gklearn.ged.learning.costs_learner import CostsLearner
+from gklearn.ged.util import compute_geds_cml
+
+
+class CostMatricesLearner(CostsLearner):
+
+
+	def __init__(self, edit_cost='CONSTANT', triangle_rule=False, allow_zeros=True, parallel=False, verbose=2):
+		super().__init__(parallel, verbose)
+		self._edit_cost = edit_cost
+		self._triangle_rule = triangle_rule
+		self._allow_zeros = allow_zeros
+
+
+	def fit(self, X, y):
+		if self._edit_cost == 'LETTER':
+			raise Exception('Cannot compute for cost "LETTER".')
+		elif self._edit_cost == 'LETTER2':
+			raise Exception('Cannot compute for cost "LETTER2".')
+		elif self._edit_cost == 'NON_SYMBOLIC':
+			raise Exception('Cannot compute for cost "NON_SYMBOLIC".')
+		elif self._edit_cost == 'CONSTANT':  # @todo: node/edge may not be labeled.
+			if not self._triangle_rule and self._allow_zeros:
+				w = cp.Variable(X.shape[1])
+				cost_fun = cp.sum_squares(X @ w - y)
+				constraints = [w >= [0.0 for i in range(X.shape[1])]]
+				prob = cp.Problem(cp.Minimize(cost_fun), constraints)
+				self.execute_cvx(prob)
+				edit_costs_new = w.value
+				residual = np.sqrt(prob.value)
+			elif self._triangle_rule and self._allow_zeros:  # @todo
+				x = cp.Variable(X.shape[1])
+				cost_fun = cp.sum_squares(X @ x - y)
+				constraints = [x >= [0.0 for i in range(X.shape[1])],
+							   np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0]).T@x >= 0.01,
+							   np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0]).T@x >= 0.01,
+							   np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0]).T@x >= 0.01,
+							   np.array([0.0, 0.0, 0.0, 0.0, 1.0, 0.0]).T@x >= 0.01,
+							   np.array([1.0, 1.0, -1.0, 0.0, 0.0, 0.0]).T@x >= 0.0,
+							   np.array([0.0, 0.0, 0.0, 1.0, 1.0, -1.0]).T@x >= 0.0]
+				prob = cp.Problem(cp.Minimize(cost_fun), constraints)
+				self.execute_cvx(prob)
+				edit_costs_new = x.value
+				residual = np.sqrt(prob.value)
+			elif not self._triangle_rule and not self._allow_zeros:  # @todo
+				x = cp.Variable(X.shape[1])
+				cost_fun = cp.sum_squares(X @ x - y)
+				constraints = [x >= [0.01 for i in range(X.shape[1])]]
+				prob = cp.Problem(cp.Minimize(cost_fun), constraints)
+				self.execute_cvx(prob)
+				edit_costs_new = x.value
+				residual = np.sqrt(prob.value)
+			elif self._triangle_rule and not self._allow_zeros:  # @todo
+				x = cp.Variable(X.shape[1])
+				cost_fun = cp.sum_squares(X @ x - y)
+				constraints = [x >= [0.01 for i in range(X.shape[1])],
+							   np.array([1.0, 1.0, -1.0, 0.0, 0.0, 0.0]).T@x >= 0.0,
+							   np.array([0.0, 0.0, 0.0, 1.0, 1.0, -1.0]).T@x >= 0.0]
+				prob = cp.Problem(cp.Minimize(cost_fun), constraints)
+				self.execute_cvx(prob)
+				edit_costs_new = x.value
+				residual = np.sqrt(prob.value)
+		else:
+			raise Exception('The edit cost "', self._edit_cost, '" is not supported for update progress.')
+
+		self._cost_list.append(edit_costs_new)
+
+
+	def init_geds_and_nb_eo(self, y, graphs):
+		time0 = time.time()
+		self._cost_list.append(np.concatenate((self._ged_options['node_label_costs'],
+											   self._ged_options['edge_label_costs'])))
+		ged_vec, self._nb_eo = self.compute_geds_and_nb_eo(graphs)
+
self._residual_list.append(np.sqrt(np.sum(np.square(np.array(ged_vec) - y)))) + self._runtime_list.append(time.time() - time0) + + if self._verbose >= 2: + print('Current node label costs:', self._cost_list[-1][0:len(self._ged_options['node_label_costs'])]) + print('Current edge label costs:', self._cost_list[-1][len(self._ged_options['node_label_costs']):]) + print('Residual list:', self._residual_list) + + + def update_geds_and_nb_eo(self, y, graphs, time0): + self._ged_options['node_label_costs'] = self._cost_list[-1][0:len(self._ged_options['node_label_costs'])] + self._ged_options['edge_label_costs'] = self._cost_list[-1][len(self._ged_options['node_label_costs']):] + ged_vec, self._nb_eo = self.compute_geds_and_nb_eo(graphs) + self._residual_list.append(np.sqrt(np.sum(np.square(np.array(ged_vec) - y)))) + self._runtime_list.append(time.time() - time0) + + + def compute_geds_and_nb_eo(self, graphs): + ged_vec, ged_mat, n_edit_operations = compute_geds_cml(graphs, options=self._ged_options, parallel=self._parallel, verbose=(self._verbose > 1)) + return ged_vec, np.array(n_edit_operations) + + + def check_convergency(self): + self._ec_changed = False + for i, cost in enumerate(self._cost_list[-1]): + if cost == 0: + if self._cost_list[-2][i] > self._epsilon_ec: + self._ec_changed = True + break + elif abs(cost - self._cost_list[-2][i]) / cost > self._epsilon_ec: + self._ec_changed = True + break +# if abs(cost - edit_cost_list[-2][i]) > self._epsilon_ec: +# ec_changed = True +# break + self._residual_changed = False + if self._residual_list[-1] == 0: + if self._residual_list[-2] > self._epsilon_residual: + self._residual_changed = True + elif abs(self._residual_list[-1] - self._residual_list[-2]) / self._residual_list[-1] > self._epsilon_residual: + self._residual_changed = True + self._converged = not (self._ec_changed or self._residual_changed) + if self._converged: + self._itrs_without_update += 1 + else: + self._itrs_without_update = 0 + self._num_updates_ecs += 1 + + + def print_current_states(self): + print() + print('-------------------------------------------------------------------------') + print('States of iteration', self._itrs + 1) + print('-------------------------------------------------------------------------') +# print('Time spend:', self._runtime_optimize_ec) + print('Total number of iterations for optimizing:', self._itrs + 1) + print('Total number of updating edit costs:', self._num_updates_ecs) + print('Was optimization of edit costs converged:', self._converged) + print('Did edit costs change:', self._ec_changed) + print('Did residual change:', self._residual_changed) + print('Iterations without update:', self._itrs_without_update) + print('Current node label costs:', self._cost_list[-1][0:len(self._ged_options['node_label_costs'])]) + print('Current edge label costs:', self._cost_list[-1][len(self._ged_options['node_label_costs']):]) + print('Residual list:', self._residual_list) + print('-------------------------------------------------------------------------') \ No newline at end of file diff --git a/lang/fr/gklearn/ged/learning/costs_learner.py b/lang/fr/gklearn/ged/learning/costs_learner.py new file mode 100644 index 0000000000..844a1f5706 --- /dev/null +++ b/lang/fr/gklearn/ged/learning/costs_learner.py @@ -0,0 +1,175 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Tue Jul 7 11:30:31 2020 + +@author: ljia +""" +import numpy as np +import cvxpy as cp +import time +from gklearn.utils import Timer + + +class CostsLearner(object): + + + def 
__init__(self, parallel, verbose): + ### To set. + self._parallel = parallel + self._verbose = verbose + # For update(). + self._time_limit_in_sec = 0 + self._max_itrs = 100 + self._max_itrs_without_update = 3 + self._epsilon_residual = 0.01 + self._epsilon_ec = 0.1 + ### To compute. + self._residual_list = [] + self._runtime_list = [] + self._cost_list = [] + self._nb_eo = None + # For update(). + self._itrs = 0 + self._converged = False + self._num_updates_ecs = 0 + self._ec_changed = None + self._residual_changed = None + self._itrs_without_update = 0 + ### Both set and get. + self._ged_options = None + + + def fit(self, X, y): + pass + + + def preprocess(self): + pass # @todo: remove the zero numbers of edit costs. + + + def postprocess(self): + for i in range(len(self._cost_list[-1])): + if -1e-9 <= self._cost_list[-1][i] <= 1e-9: + self._cost_list[-1][i] = 0 + if self._cost_list[-1][i] < 0: + raise ValueError('The edit cost is negative.') + + + def set_update_params(self, **kwargs): + self._time_limit_in_sec = kwargs.get('time_limit_in_sec', self._time_limit_in_sec) + self._max_itrs = kwargs.get('max_itrs', self._max_itrs) + self._max_itrs_without_update = kwargs.get('max_itrs_without_update', self._max_itrs_without_update) + self._epsilon_residual = kwargs.get('epsilon_residual', self._epsilon_residual) + self._epsilon_ec = kwargs.get('epsilon_ec', self._epsilon_ec) + + + def update(self, y, graphs, ged_options, **kwargs): + # Set parameters. + self._ged_options = ged_options + if kwargs != {}: + self.set_update_params(**kwargs) + + # The initial iteration. + if self._verbose >= 2: + print('\ninitial:') + self.init_geds_and_nb_eo(y, graphs) + + self._converged = False + self._itrs_without_update = 0 + self._itrs = 0 + self._num_updates_ecs = 0 + timer = Timer(self._time_limit_in_sec) + # Run iterations from initial edit costs. + while not self.termination_criterion_met(self._converged, timer, self._itrs, self._itrs_without_update): + if self._verbose >= 2: + print('\niteration', self._itrs + 1) + time0 = time.time() + + # Fit GED space to the target space. + self.preprocess() + self.fit(self._nb_eo, y) + self.postprocess() + + # Compute new GEDs and numbers of edit operations. + self.update_geds_and_nb_eo(y, graphs, time0) + + # Check convergency. + self.check_convergency() + + # Print current states. 
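+			# (Each iteration re-fits the edit costs to the target distances,
+			# recomputes the GEDs and edit-operation counts under the new
+			# costs, and then tests the convergence criteria.)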
+ if self._verbose >= 2: + self.print_current_states() + + self._itrs += 1 + + + def init_geds_and_nb_eo(self, y, graphs): + pass + + + def update_geds_and_nb_eo(self, y, graphs, time0): + pass + + + def compute_geds_and_nb_eo(self, graphs): + pass + + + def check_convergency(self): + pass + + + def print_current_states(self): + pass + + + def termination_criterion_met(self, converged, timer, itr, itrs_without_update): + if timer.expired() or (itr >= self._max_itrs if self._max_itrs >= 0 else False): +# if self._state == AlgorithmState.TERMINATED: +# self._state = AlgorithmState.INITIALIZED + return True + return converged or (itrs_without_update > self._max_itrs_without_update if self._max_itrs_without_update >= 0 else False) + + + def execute_cvx(self, prob): + try: + prob.solve(verbose=(self._verbose>=2)) + except MemoryError as error0: + if self._verbose >= 2: + print('\nUsing solver "OSQP" caused a memory error.') + print('the original error message is\n', error0) + print('solver status: ', prob.status) + print('trying solver "CVXOPT" instead...\n') + try: + prob.solve(solver=cp.CVXOPT, verbose=(self._verbose>=2)) + except Exception as error1: + if self._verbose >= 2: + print('\nAn error occured when using solver "CVXOPT".') + print('the original error message is\n', error1) + print('solver status: ', prob.status) + print('trying solver "MOSEK" instead. Notice this solver is commercial and a lisence is required.\n') + prob.solve(solver=cp.MOSEK, verbose=(self._verbose>=2)) + else: + if self._verbose >= 2: + print('solver status: ', prob.status) + else: + if self._verbose >= 2: + print('solver status: ', prob.status) + if self._verbose >= 2: + print() + + + def get_results(self): + results = {} + results['residual_list'] = self._residual_list + results['runtime_list'] = self._runtime_list + results['cost_list'] = self._cost_list + results['nb_eo'] = self._nb_eo + results['itrs'] = self._itrs + results['converged'] = self._converged + results['num_updates_ecs'] = self._num_updates_ecs + results['ec_changed'] = self._ec_changed + results['residual_changed'] = self._residual_changed + results['itrs_without_update'] = self._itrs_without_update + return results \ No newline at end of file diff --git a/lang/fr/gklearn/ged/median/__init__.py b/lang/fr/gklearn/ged/median/__init__.py new file mode 100644 index 0000000000..9eb4384706 --- /dev/null +++ b/lang/fr/gklearn/ged/median/__init__.py @@ -0,0 +1,4 @@ +from gklearn.ged.median.median_graph_estimator import MedianGraphEstimator +from gklearn.ged.median.median_graph_estimator_py import MedianGraphEstimatorPy +from gklearn.ged.median.median_graph_estimator_cml import MedianGraphEstimatorCML +from gklearn.ged.median.utils import constant_node_costs, mge_options_to_string diff --git a/lang/fr/gklearn/ged/median/median_graph_estimator.py b/lang/fr/gklearn/ged/median/median_graph_estimator.py new file mode 100644 index 0000000000..03c789290c --- /dev/null +++ b/lang/fr/gklearn/ged/median/median_graph_estimator.py @@ -0,0 +1,1709 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Mon Mar 16 18:04:55 2020 + +@author: ljia +""" +import numpy as np +from gklearn.ged.env import AlgorithmState, NodeMap +from gklearn.ged.util import misc +from gklearn.utils import Timer +import time +from tqdm import tqdm +import sys +import networkx as nx +import multiprocessing +from multiprocessing import Pool +from functools import partial + + +class MedianGraphEstimator(object): # @todo: differ dummy_node from undifined node? 
+ + def __init__(self, ged_env, constant_node_costs): + """Constructor. + + Parameters + ---------- + ged_env : gklearn.gedlib.gedlibpy.GEDEnv + Initialized GED environment. The edit costs must be set by the user. + + constant_node_costs : Boolean + Set to True if the node relabeling costs are constant. + """ + self.__ged_env = ged_env + self.__init_method = 'BRANCH_FAST' + self.__init_options = '' + self.__descent_method = 'BRANCH_FAST' + self.__descent_options = '' + self.__refine_method = 'IPFP' + self.__refine_options = '' + self.__constant_node_costs = constant_node_costs + self.__labeled_nodes = (ged_env.get_num_node_labels() > 1) + self.__node_del_cost = ged_env.get_node_del_cost(ged_env.get_node_label(1)) + self.__node_ins_cost = ged_env.get_node_ins_cost(ged_env.get_node_label(1)) + self.__labeled_edges = (ged_env.get_num_edge_labels() > 1) + self.__edge_del_cost = ged_env.get_edge_del_cost(ged_env.get_edge_label(1)) + self.__edge_ins_cost = ged_env.get_edge_ins_cost(ged_env.get_edge_label(1)) + self.__init_type = 'RANDOM' + self.__num_random_inits = 10 + self.__desired_num_random_inits = 10 + self.__use_real_randomness = True + self.__seed = 0 + self.__parallel = True + self.__update_order = True + self.__sort_graphs = True # sort graphs by size when computing GEDs. + self.__refine = True + self.__time_limit_in_sec = 0 + self.__epsilon = 0.0001 + self.__max_itrs = 100 + self.__max_itrs_without_update = 3 + self.__num_inits_increase_order = 10 + self.__init_type_increase_order = 'K-MEANS++' + self.__max_itrs_increase_order = 10 + self.__print_to_stdout = 2 + self.__median_id = np.inf # @todo: check + self.__node_maps_from_median = {} + self.__sum_of_distances = 0 + self.__best_init_sum_of_distances = np.inf + self.__converged_sum_of_distances = np.inf + self.__runtime = None + self.__runtime_initialized = None + self.__runtime_converged = None + self.__itrs = [] # @todo: check: {} ? + self.__num_decrease_order = 0 + self.__num_increase_order = 0 + self.__num_converged_descents = 0 + self.__state = AlgorithmState.TERMINATED + self.__label_names = {} + + if ged_env is None: + raise Exception('The GED environment pointer passed to the constructor of MedianGraphEstimator is null.') + elif not ged_env.is_initialized(): + raise Exception('The GED environment is uninitialized. Call gedlibpy.GEDEnv.init() before passing it to the constructor of MedianGraphEstimator.') + + + def set_options(self, options): + """Sets the options of the estimator. + + Parameters + ---------- + options : string + String that specifies with which options to run the estimator. + """ + self.__set_default_options() + options_map = misc.options_string_to_options_map(options) + for opt_name, opt_val in options_map.items(): + if opt_name == 'init-type': + self.__init_type = opt_val + if opt_val != 'MEDOID' and opt_val != 'RANDOM' and opt_val != 'MIN' and opt_val != 'MAX' and opt_val != 'MEAN': + raise Exception('Invalid argument ' + opt_val + ' for option init-type. Usage: options = "[--init-type RANDOM|MEDOID|EMPTY|MIN|MAX|MEAN] [...]"') + elif opt_name == 'random-inits': + try: + self.__num_random_inits = int(opt_val) + self.__desired_num_random_inits = self.__num_random_inits + except: + raise Exception('Invalid argument "' + opt_val + '" for option random-inits. Usage: options = "[--random-inits ]"') + + if self.__num_random_inits <= 0: + raise Exception('Invalid argument "' + opt_val + '" for option random-inits. 
Usage: options = "[--random-inits ]"') + + elif opt_name == 'randomness': + if opt_val == 'PSEUDO': + self.__use_real_randomness = False + + elif opt_val == 'REAL': + self.__use_real_randomness = True + + else: + raise Exception('Invalid argument "' + opt_val + '" for option randomness. Usage: options = "[--randomness REAL|PSEUDO] [...]"') + + elif opt_name == 'stdout': + if opt_val == '0': + self.__print_to_stdout = 0 + + elif opt_val == '1': + self.__print_to_stdout = 1 + + elif opt_val == '2': + self.__print_to_stdout = 2 + + else: + raise Exception('Invalid argument "' + opt_val + '" for option stdout. Usage: options = "[--stdout 0|1|2] [...]"') + + elif opt_name == 'parallel': + if opt_val == 'TRUE': + self.__parallel = True + + elif opt_val == 'FALSE': + self.__parallel = False + + else: + raise Exception('Invalid argument "' + opt_val + '" for option parallel. Usage: options = "[--parallel TRUE|FALSE] [...]"') + + elif opt_name == 'update-order': + if opt_val == 'TRUE': + self.__update_order = True + + elif opt_val == 'FALSE': + self.__update_order = False + + else: + raise Exception('Invalid argument "' + opt_val + '" for option update-order. Usage: options = "[--update-order TRUE|FALSE] [...]"') + + elif opt_name == 'sort-graphs': + if opt_val == 'TRUE': + self.__sort_graphs = True + + elif opt_val == 'FALSE': + self.__sort_graphs = False + + else: + raise Exception('Invalid argument "' + opt_val + '" for option sort-graphs. Usage: options = "[--sort-graphs TRUE|FALSE] [...]"') + + elif opt_name == 'refine': + if opt_val == 'TRUE': + self.__refine = True + + elif opt_val == 'FALSE': + self.__refine = False + + else: + raise Exception('Invalid argument "' + opt_val + '" for option refine. Usage: options = "[--refine TRUE|FALSE] [...]"') + + elif opt_name == 'time-limit': + try: + self.__time_limit_in_sec = float(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option time-limit. Usage: options = "[--time-limit ] [...]') + + elif opt_name == 'max-itrs': + try: + self.__max_itrs = int(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option max-itrs. Usage: options = "[--max-itrs ] [...]') + + elif opt_name == 'max-itrs-without-update': + try: + self.__max_itrs_without_update = int(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option max-itrs-without-update. Usage: options = "[--max-itrs-without-update ] [...]') + + elif opt_name == 'seed': + try: + self.__seed = int(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option seed. Usage: options = "[--seed ] [...]') + + elif opt_name == 'epsilon': + try: + self.__epsilon = float(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option epsilon. Usage: options = "[--epsilon ] [...]') + + if self.__epsilon <= 0: + raise Exception('Invalid argument "' + opt_val + '" for option epsilon. Usage: options = "[--epsilon ] [...]') + + elif opt_name == 'inits-increase-order': + try: + self.__num_inits_increase_order = int(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option inits-increase-order. Usage: options = "[--inits-increase-order ]"') + + if self.__num_inits_increase_order <= 0: + raise Exception('Invalid argument "' + opt_val + '" for option inits-increase-order. 
Usage: options = "[--inits-increase-order ]"') + + elif opt_name == 'init-type-increase-order': + self.__init_type_increase_order = opt_val + if opt_val != 'CLUSTERS' and opt_val != 'K-MEANS++': + raise Exception('Invalid argument ' + opt_val + ' for option init-type-increase-order. Usage: options = "[--init-type-increase-order CLUSTERS|K-MEANS++] [...]"') + + elif opt_name == 'max-itrs-increase-order': + try: + self.__max_itrs_increase_order = int(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option max-itrs-increase-order. Usage: options = "[--max-itrs-increase-order ] [...]') + + else: + valid_options = '[--init-type ] [--random-inits ] [--randomness ] [--seed ] [--stdout ] ' + valid_options += '[--time-limit ] [--max-itrs ] [--epsilon ] ' + valid_options += '[--inits-increase-order ] [--init-type-increase-order ] [--max-itrs-increase-order ]' + raise Exception('Invalid option "' + opt_name + '". Usage: options = "' + valid_options + '"') + + + def set_init_method(self, init_method, init_options=''): + """Selects method to be used for computing the initial medoid graph. + + Parameters + ---------- + init_method : string + The selected method. Default: ged::Options::GEDMethod::BRANCH_UNIFORM. + + init_options : string + The options for the selected method. Default: "". + + Notes + ----- + Has no effect unless "--init-type MEDOID" is passed to set_options(). + """ + self.__init_method = init_method; + self.__init_options = init_options; + + + def set_descent_method(self, descent_method, descent_options=''): + """Selects method to be used for block gradient descent.. + + Parameters + ---------- + descent_method : string + The selected method. Default: ged::Options::GEDMethod::BRANCH_FAST. + + descent_options : string + The options for the selected method. Default: "". + + Notes + ----- + Has no effect unless "--init-type MEDOID" is passed to set_options(). + """ + self.__descent_method = descent_method; + self.__descent_options = descent_options; + + + def set_refine_method(self, refine_method, refine_options): + """Selects method to be used for improving the sum of distances and the node maps for the converged median. + + Parameters + ---------- + refine_method : string + The selected method. Default: "IPFP". + + refine_options : string + The options for the selected method. Default: "". + + Notes + ----- + Has no effect if "--refine FALSE" is passed to set_options(). + """ + self.__refine_method = refine_method + self.__refine_options = refine_options + + + def run(self, graph_ids, set_median_id, gen_median_id): + """Computes a generalized median graph. + + Parameters + ---------- + graph_ids : list[integer] + The IDs of the graphs for which the median should be computed. Must have been added to the environment passed to the constructor. + + set_median_id : integer + The ID of the computed set-median. A dummy graph with this ID must have been added to the environment passed to the constructor. Upon termination, the computed median can be obtained via gklearn.gedlib.gedlibpy.GEDEnv.get_graph(). + + + gen_median_id : integer + The ID of the computed generalized median. Upon termination, the computed median can be obtained via gklearn.gedlib.gedlibpy.GEDEnv.get_graph(). + """ + # Sanity checks. 
+ if len(graph_ids) == 0: + raise Exception('Empty vector of graph IDs, unable to compute median.') + all_graphs_empty = True + for graph_id in graph_ids: + if self.__ged_env.get_graph_num_nodes(graph_id) > 0: + all_graphs_empty = False + break + if all_graphs_empty: + raise Exception('All graphs in the collection are empty.') + + # Start timer and record start time. + start = time.time() + timer = Timer(self.__time_limit_in_sec) + self.__median_id = gen_median_id + self.__state = AlgorithmState.TERMINATED + + # Get NetworkX graph representations of the input graphs. + graphs = {} + for graph_id in graph_ids: + # @todo: get_nx_graph() function may need to be modified according to the coming code. + graphs[graph_id] = self.__ged_env.get_nx_graph(graph_id, True, True, False) +# print(self.__ged_env.get_graph_internal_id(0)) +# print(graphs[0].graph) +# print(graphs[0].nodes(data=True)) +# print(graphs[0].edges(data=True)) +# print(nx.adjacency_matrix(graphs[0])) + + # Construct initial medians. + medians = [] + self.__construct_initial_medians(graph_ids, timer, medians) + end_init = time.time() + self.__runtime_initialized = end_init - start +# print(medians[0].graph) +# print(medians[0].nodes(data=True)) +# print(medians[0].edges(data=True)) +# print(nx.adjacency_matrix(medians[0])) + + # Reset information about iterations and number of times the median decreases and increases. + self.__itrs = [0] * len(medians) + self.__num_decrease_order = 0 + self.__num_increase_order = 0 + self.__num_converged_descents = 0 + + # Initialize the best median. + best_sum_of_distances = np.inf + self.__best_init_sum_of_distances = np.inf + node_maps_from_best_median = {} + + # Run block gradient descent from all initial medians. + self.__ged_env.set_method(self.__descent_method, self.__descent_options) + for median_pos in range(0, len(medians)): + + # Terminate if the timer has expired and at least one SOD has been computed. + if timer.expired() and median_pos > 0: + break + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('\n===========================================================') + print('Block gradient descent for initial median', str(median_pos + 1), 'of', str(len(medians)), '.') + print('-----------------------------------------------------------') + + # Get reference to the median. + median = medians[median_pos] + + # Load initial median into the environment. + self.__ged_env.load_nx_graph(median, gen_median_id) + self.__ged_env.init(self.__ged_env.get_init_type()) + + # Compute node maps and sum of distances for initial median. +# xxx = self.__node_maps_from_median + self.__compute_init_node_maps(graph_ids, gen_median_id) +# yyy = self.__node_maps_from_median + + self.__best_init_sum_of_distances = min(self.__best_init_sum_of_distances, self.__sum_of_distances) + self.__ged_env.load_nx_graph(median, set_median_id) +# print(self.__best_init_sum_of_distances) + + # Run block gradient descent from initial median. + converged = False + itrs_without_update = 0 + while not self.__termination_criterion_met(converged, timer, self.__itrs[median_pos], itrs_without_update): + + # Print information about current iteration. 
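+				# (A descent iteration first updates the median for the fixed
+				# node maps, then updates the node maps for the fixed median;
+				# the descent converges when neither changes any more.)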
+ if self.__print_to_stdout == 2: + print('\n===========================================================') + print('Iteration', str(self.__itrs[median_pos] + 1), 'for initial median', str(median_pos + 1), 'of', str(len(medians)), '.') + print('-----------------------------------------------------------') + + # Initialize flags that tell us what happened in the iteration. + median_modified = False + node_maps_modified = False + decreased_order = False + increased_order = False + + # Update the median. + median_modified = self.__update_median(graphs, median) + if self.__update_order: + if not median_modified or self.__itrs[median_pos] == 0: + decreased_order = self.__decrease_order(graphs, median) + if not decreased_order or self.__itrs[median_pos] == 0: + increased_order = self.__increase_order(graphs, median) + + # Update the number of iterations without update of the median. + if median_modified or decreased_order or increased_order: + itrs_without_update = 0 + else: + itrs_without_update += 1 + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('Loading median to environment: ... ', end='') + + # Load the median into the environment. + # @todo: should this function use the original node label? + self.__ged_env.load_nx_graph(median, gen_median_id) + self.__ged_env.init(self.__ged_env.get_init_type()) + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('done.') + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('Updating induced costs: ... ', end='') + + # Compute induced costs of the old node maps w.r.t. the updated median. + for graph_id in graph_ids: +# print(self.__node_maps_from_median[graph_id].induced_cost()) +# xxx = self.__node_maps_from_median[graph_id] + self.__ged_env.compute_induced_cost(gen_median_id, graph_id, self.__node_maps_from_median[graph_id]) +# print('---------------------------------------') +# print(self.__node_maps_from_median[graph_id].induced_cost()) + # @todo:!!!!!!!!!!!!!!!!!!!!!!!!!!!!This value is a slight different from the c++ program, which might be a bug! Use it very carefully! + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('done.') + + # Update the node maps. + node_maps_modified = self.__update_node_maps() + + # Update the order of the median if no improvement can be found with the current order. + + # Update the sum of distances. + old_sum_of_distances = self.__sum_of_distances + self.__sum_of_distances = 0 + for graph_id, node_map in self.__node_maps_from_median.items(): + self.__sum_of_distances += node_map.induced_cost() +# print(self.__sum_of_distances) + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('Old local SOD: ', old_sum_of_distances) + print('New local SOD: ', self.__sum_of_distances) + print('Best converged SOD: ', best_sum_of_distances) + print('Modified median: ', median_modified) + print('Modified node maps: ', node_maps_modified) + print('Decreased order: ', decreased_order) + print('Increased order: ', increased_order) + print('===========================================================\n') + + converged = not (median_modified or node_maps_modified or decreased_order or increased_order) + + self.__itrs[median_pos] += 1 + + # Update the best median. 
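+			# (The smallest SOD over all restarts wins; its node maps and
+			# median are kept and reloaded into the environment below.)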
+ if self.__sum_of_distances < best_sum_of_distances: + best_sum_of_distances = self.__sum_of_distances + node_maps_from_best_median = self.__node_maps_from_median.copy() # @todo: this is a shallow copy, not sure if it is enough. + best_median = median + + # Update the number of converged descents. + if converged: + self.__num_converged_descents += 1 + + # Store the best encountered median. + self.__sum_of_distances = best_sum_of_distances + self.__node_maps_from_median = node_maps_from_best_median + self.__ged_env.load_nx_graph(best_median, gen_median_id) + self.__ged_env.init(self.__ged_env.get_init_type()) + end_descent = time.time() + self.__runtime_converged = end_descent - start + + # Refine the sum of distances and the node maps for the converged median. + self.__converged_sum_of_distances = self.__sum_of_distances + if self.__refine: + self.__improve_sum_of_distances(timer) + + # Record end time, set runtime and reset the number of initial medians. + end = time.time() + self.__runtime = end - start + self.__num_random_inits = self.__desired_num_random_inits + + # Print global information. + if self.__print_to_stdout != 0: + print('\n===========================================================') + print('Finished computation of generalized median graph.') + print('-----------------------------------------------------------') + print('Best SOD after initialization: ', self.__best_init_sum_of_distances) + print('Converged SOD: ', self.__converged_sum_of_distances) + if self.__refine: + print('Refined SOD: ', self.__sum_of_distances) + print('Overall runtime: ', self.__runtime) + print('Runtime of initialization: ', self.__runtime_initialized) + print('Runtime of block gradient descent: ', self.__runtime_converged - self.__runtime_initialized) + if self.__refine: + print('Runtime of refinement: ', self.__runtime - self.__runtime_converged) + print('Number of initial medians: ', len(medians)) + total_itr = 0 + num_started_descents = 0 + for itr in self.__itrs: + total_itr += itr + if itr > 0: + num_started_descents += 1 + print('Size of graph collection: ', len(graph_ids)) + print('Number of started descents: ', num_started_descents) + print('Number of converged descents: ', self.__num_converged_descents) + print('Overall number of iterations: ', total_itr) + print('Overall number of times the order decreased: ', self.__num_decrease_order) + print('Overall number of times the order increased: ', self.__num_increase_order) + print('===========================================================\n') + + + def __improve_sum_of_distances(self, timer): # @todo: go through and test + # Use method selected for refinement phase. + self.__ged_env.set_method(self.__refine_method, self.__refine_options) + + # Print information about current iteration. + if self.__print_to_stdout == 2: + progress = tqdm(desc='Improving node maps', total=len(self.__node_maps_from_median), file=sys.stdout) + print('\n===========================================================') + print('Improving node maps and SOD for converged median.') + print('-----------------------------------------------------------') + progress.update(1) + + # Improving the node maps. 
+		nb_nodes_median = self.__ged_env.get_graph_num_nodes(self.__median_id)
+		for graph_id, node_map in self.__node_maps_from_median.items():
+			if timer.expired():
+				if self.__state == AlgorithmState.TERMINATED:
+					self.__state = AlgorithmState.CONVERGED
+				break
+
+			nb_nodes_g = self.__ged_env.get_graph_num_nodes(graph_id)
+			if nb_nodes_median <= nb_nodes_g or not self.__sort_graphs:
+				self.__ged_env.run_method(self.__median_id, graph_id)
+				if self.__ged_env.get_upper_bound(self.__median_id, graph_id) < node_map.induced_cost():
+					self.__node_maps_from_median[graph_id] = self.__ged_env.get_node_map(self.__median_id, graph_id)
+			else:
+				self.__ged_env.run_method(graph_id, self.__median_id)
+				if self.__ged_env.get_upper_bound(graph_id, self.__median_id) < node_map.induced_cost():
+					node_map_tmp = self.__ged_env.get_node_map(graph_id, self.__median_id)
+					node_map_tmp.forward_map, node_map_tmp.backward_map = node_map_tmp.backward_map, node_map_tmp.forward_map
+					self.__node_maps_from_median[graph_id] = node_map_tmp
+
+			# Print information.
+			if self.__print_to_stdout == 2:
+				progress.update(1)
+
+		self.__sum_of_distances = 0.0
+		for key, val in self.__node_maps_from_median.items():
+			self.__sum_of_distances += val.induced_cost()
+
+		# Print information.
+		if self.__print_to_stdout == 2:
+			print('===========================================================\n')
+
+
+	def __median_available(self):
+		return self.__median_id != np.inf
+
+
+	def get_state(self):
+		if not self.__median_available():
+			raise Exception('No median has been computed. Call run() before calling get_state().')
+		return self.__state
+
+
+	def get_sum_of_distances(self, state=''):
+		"""Returns the sum of distances.
+
+		Parameters
+		----------
+		state : string
+			The state of the estimator. Can be 'initialized' or 'converged'. Default: ""
+
+		Returns
+		-------
+		float
+			The sum of distances (SOD) of the median when the estimator was in the state `state` during the last call to run(). If `state` is not given, the converged SOD (without refinement) or refined SOD (with refinement) is returned.
+		"""
+		if not self.__median_available():
+			raise Exception('No median has been computed. Call run() before calling get_sum_of_distances().')
+		if state == 'initialized':
+			return self.__best_init_sum_of_distances
+		if state == 'converged':
+			return self.__converged_sum_of_distances
+		return self.__sum_of_distances
+
+
+	def get_runtime(self, state):
+		if not self.__median_available():
+			raise Exception('No median has been computed. Call run() before calling get_runtime().')
+		if state == AlgorithmState.INITIALIZED:
+			return self.__runtime_initialized
+		if state == AlgorithmState.CONVERGED:
+			return self.__runtime_converged
+		return self.__runtime
+
+
+	def get_num_itrs(self):
+		if not self.__median_available():
+			raise Exception('No median has been computed. Call run() before calling get_num_itrs().')
+		return self.__itrs
+
+
+	def get_num_times_order_decreased(self):
+		if not self.__median_available():
+			raise Exception('No median has been computed. Call run() before calling get_num_times_order_decreased().')
+		return self.__num_decrease_order
+
+
+	def get_num_times_order_increased(self):
+		if not self.__median_available():
+			raise Exception('No median has been computed. 
Call run() before calling get_num_times_order_increased().') + return self.__num_increase_order + + + def get_num_converged_descents(self): + if not self.__median_available(): + raise Exception('No median has been computed. Call run() before calling get_num_converged_descents().') + return self.__num_converged_descents + + + def get_ged_env(self): + return self.__ged_env + + + def __set_default_options(self): + self.__init_type = 'RANDOM' + self.__num_random_inits = 10 + self.__desired_num_random_inits = 10 + self.__use_real_randomness = True + self.__seed = 0 + self.__parallel = True + self.__update_order = True + self.__sort_graphs = True + self.__refine = True + self.__time_limit_in_sec = 0 + self.__epsilon = 0.0001 + self.__max_itrs = 100 + self.__max_itrs_without_update = 3 + self.__num_inits_increase_order = 10 + self.__init_type_increase_order = 'K-MEANS++' + self.__max_itrs_increase_order = 10 + self.__print_to_stdout = 2 + self.__label_names = {} + + + def __construct_initial_medians(self, graph_ids, timer, initial_medians): + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('\n===========================================================') + print('Constructing initial median(s).') + print('-----------------------------------------------------------') + + # Compute or sample the initial median(s). + initial_medians.clear() + if self.__init_type == 'MEDOID': + self.__compute_medoid(graph_ids, timer, initial_medians) + elif self.__init_type == 'MAX': + pass # @todo +# compute_max_order_graph_(graph_ids, initial_medians) + elif self.__init_type == 'MIN': + pass # @todo +# compute_min_order_graph_(graph_ids, initial_medians) + elif self.__init_type == 'MEAN': + pass # @todo +# compute_mean_order_graph_(graph_ids, initial_medians) + else: + pass # @todo +# sample_initial_medians_(graph_ids, initial_medians) + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('===========================================================') + + + def __compute_medoid(self, graph_ids, timer, initial_medians): + # Use method selected for initialization phase. + self.__ged_env.set_method(self.__init_method, self.__init_options) + + # Compute the medoid. + if self.__parallel: + # @todo: notice when parallel self.__ged_env is not modified. + sum_of_distances_list = [np.inf] * len(graph_ids) + len_itr = len(graph_ids) + itr = zip(graph_ids, range(0, len(graph_ids))) + n_jobs = multiprocessing.cpu_count() + if len_itr < 100 * n_jobs: + chunksize = int(len_itr / n_jobs) + 1 + else: + chunksize = 100 + def init_worker(ged_env_toshare): + global G_ged_env + G_ged_env = ged_env_toshare + do_fun = partial(_compute_medoid_parallel, graph_ids, self.__sort_graphs) + pool = Pool(processes=n_jobs, initializer=init_worker, initargs=(self.__ged_env,)) + if self.__print_to_stdout == 2: + iterator = tqdm(pool.imap_unordered(do_fun, itr, chunksize), + desc='Computing medoid', file=sys.stdout) + else: + iterator = pool.imap_unordered(do_fun, itr, chunksize) + for i, dis in iterator: + sum_of_distances_list[i] = dis + pool.close() + pool.join() + + medoid_id = np.argmin(sum_of_distances_list) + best_sum_of_distances = sum_of_distances_list[medoid_id] + + initial_medians.append(self.__ged_env.get_nx_graph(medoid_id, True, True, False)) # @todo + + else: + # Print information about current iteration. 
+ if self.__print_to_stdout == 2: + progress = tqdm(desc='Computing medoid', total=len(graph_ids), file=sys.stdout) + + medoid_id = graph_ids[0] + best_sum_of_distances = np.inf + for g_id in graph_ids: + if timer.expired(): + self.__state = AlgorithmState.CALLED + break + nb_nodes_g = self.__ged_env.get_graph_num_nodes(g_id) + sum_of_distances = 0 + for h_id in graph_ids: + nb_nodes_h = self.__ged_env.get_graph_num_nodes(h_id) + if nb_nodes_g <= nb_nodes_h or not self.__sort_graphs: + self.__ged_env.run_method(g_id, h_id) + sum_of_distances += self.__ged_env.get_upper_bound(g_id, h_id) + else: + self.__ged_env.run_method(h_id, g_id) + sum_of_distances += self.__ged_env.get_upper_bound(h_id, g_id) + if sum_of_distances < best_sum_of_distances: + best_sum_of_distances = sum_of_distances + medoid_id = g_id + + # Print information about current iteration. + if self.__print_to_stdout == 2: + progress.update(1) + + initial_medians.append(self.__ged_env.get_nx_graph(medoid_id, True, True, False)) # @todo + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('\n') + + + def __compute_init_node_maps(self, graph_ids, gen_median_id): + # Compute node maps and sum of distances for initial median. + if self.__parallel: + # @todo: notice when parallel self.__ged_env is not modified. + self.__sum_of_distances = 0 + self.__node_maps_from_median.clear() + sum_of_distances_list = [0] * len(graph_ids) + + len_itr = len(graph_ids) + itr = graph_ids + n_jobs = multiprocessing.cpu_count() + if len_itr < 100 * n_jobs: + chunksize = int(len_itr / n_jobs) + 1 + else: + chunksize = 100 + def init_worker(ged_env_toshare): + global G_ged_env + G_ged_env = ged_env_toshare + nb_nodes_median = self.__ged_env.get_graph_num_nodes(gen_median_id) + do_fun = partial(_compute_init_node_maps_parallel, gen_median_id, self.__sort_graphs, nb_nodes_median) + pool = Pool(processes=n_jobs, initializer=init_worker, initargs=(self.__ged_env,)) + if self.__print_to_stdout == 2: + iterator = tqdm(pool.imap_unordered(do_fun, itr, chunksize), + desc='Computing initial node maps', file=sys.stdout) + else: + iterator = pool.imap_unordered(do_fun, itr, chunksize) + for g_id, sod, node_maps in iterator: + sum_of_distances_list[g_id] = sod + self.__node_maps_from_median[g_id] = node_maps + pool.close() + pool.join() + + self.__sum_of_distances = np.sum(sum_of_distances_list) +# xxx = self.__node_maps_from_median + + else: + # Print information about current iteration. 
+ if self.__print_to_stdout == 2: + progress = tqdm(desc='Computing initial node maps', total=len(graph_ids), file=sys.stdout) + + self.__sum_of_distances = 0 + self.__node_maps_from_median.clear() + nb_nodes_median = self.__ged_env.get_graph_num_nodes(gen_median_id) + for graph_id in graph_ids: + nb_nodes_g = self.__ged_env.get_graph_num_nodes(graph_id) + if nb_nodes_median <= nb_nodes_g or not self.__sort_graphs: + self.__ged_env.run_method(gen_median_id, graph_id) + self.__node_maps_from_median[graph_id] = self.__ged_env.get_node_map(gen_median_id, graph_id) + else: + self.__ged_env.run_method(graph_id, gen_median_id) + node_map_tmp = self.__ged_env.get_node_map(graph_id, gen_median_id) + node_map_tmp.forward_map, node_map_tmp.backward_map = node_map_tmp.backward_map, node_map_tmp.forward_map + self.__node_maps_from_median[graph_id] = node_map_tmp + # print(self.__node_maps_from_median[graph_id]) + self.__sum_of_distances += self.__node_maps_from_median[graph_id].induced_cost() + # print(self.__sum_of_distances) + # Print information about current iteration. + if self.__print_to_stdout == 2: + progress.update(1) + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('\n') + + + def __termination_criterion_met(self, converged, timer, itr, itrs_without_update): + if timer.expired() or (itr >= self.__max_itrs if self.__max_itrs >= 0 else False): + if self.__state == AlgorithmState.TERMINATED: + self.__state = AlgorithmState.INITIALIZED + return True + return converged or (itrs_without_update > self.__max_itrs_without_update if self.__max_itrs_without_update >= 0 else False) + + + def __update_median(self, graphs, median): + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('Updating median: ', end='') + + # Store copy of the old median. + old_median = median.copy() # @todo: this is just a shallow copy. + + # Update the node labels. + if self.__labeled_nodes: + self.__update_node_labels(graphs, median) + + # Update the edges and their labels. + self.__update_edges(graphs, median) + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('done.') + + return not self.__are_graphs_equal(median, old_median) + + + def __update_node_labels(self, graphs, median): +# print('----------------------------') + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('nodes ... ', end='') + + # Iterate through all nodes of the median. + for i in range(0, nx.number_of_nodes(median)): +# print('i: ', i) + # Collect the labels of the substituted nodes. + node_labels = [] + for graph_id, graph in graphs.items(): +# print('graph_id: ', graph_id) +# print(self.__node_maps_from_median[graph_id]) +# print(self.__node_maps_from_median[graph_id].forward_map, self.__node_maps_from_median[graph_id].backward_map) + k = self.__node_maps_from_median[graph_id].image(i) +# print('k: ', k) + if k != np.inf: + node_labels.append(graph.nodes[k]) + + # Compute the median label and update the median. + if len(node_labels) > 0: +# median_label = self.__ged_env.get_median_node_label(node_labels) + median_label = self.__get_median_node_label(node_labels) + if self.__ged_env.get_node_rel_cost(median.nodes[i], median_label) > self.__epsilon: + nx.set_node_attributes(median, {i: median_label}) + + + def __update_edges(self, graphs, median): + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('edges ... 
', end='') + +# # Clear the adjacency lists of the median and reset number of edges to 0. +# median_edges = list(median.edges) +# for (head, tail) in median_edges: +# median.remove_edge(head, tail) + + # @todo: what if edge is not labeled? + # Iterate through all possible edges (i,j) of the median. + for i in range(0, nx.number_of_nodes(median)): + for j in range(i + 1, nx.number_of_nodes(median)): + + # Collect the labels of the edges to which (i,j) is mapped by the node maps. + edge_labels = [] + for graph_id, graph in graphs.items(): + k = self.__node_maps_from_median[graph_id].image(i) + l = self.__node_maps_from_median[graph_id].image(j) + if k != np.inf and l != np.inf: + if graph.has_edge(k, l): + edge_labels.append(graph.edges[(k, l)]) + + # Compute the median edge label and the overall edge relabeling cost. + rel_cost = 0 + median_label = self.__ged_env.get_edge_label(1) + if median.has_edge(i, j): + median_label = median.edges[(i, j)] + if self.__labeled_edges and len(edge_labels) > 0: + new_median_label = self.__get_median_edge_label(edge_labels) + if self.__ged_env.get_edge_rel_cost(median_label, new_median_label) > self.__epsilon: + median_label = new_median_label + for edge_label in edge_labels: + rel_cost += self.__ged_env.get_edge_rel_cost(median_label, edge_label) + + # Update the median. + if median.has_edge(i, j): + median.remove_edge(i, j) + if rel_cost < (self.__edge_ins_cost + self.__edge_del_cost) * len(edge_labels) - self.__edge_del_cost * len(graphs): + median.add_edge(i, j, **median_label) +# else: +# if median.has_edge(i, j): +# median.remove_edge(i, j) + + + def __update_node_maps(self): + # Update the node maps. + if self.__parallel: + # @todo: notice when parallel self.__ged_env is not modified. + node_maps_were_modified = False +# xxx = self.__node_maps_from_median.copy() + + len_itr = len(self.__node_maps_from_median) + itr = [item for item in self.__node_maps_from_median.items()] + n_jobs = multiprocessing.cpu_count() + if len_itr < 100 * n_jobs: + chunksize = int(len_itr / n_jobs) + 1 + else: + chunksize = 100 + def init_worker(ged_env_toshare): + global G_ged_env + G_ged_env = ged_env_toshare + nb_nodes_median = self.__ged_env.get_graph_num_nodes(self.__median_id) + do_fun = partial(_update_node_maps_parallel, self.__median_id, self.__epsilon, self.__sort_graphs, nb_nodes_median) + pool = Pool(processes=n_jobs, initializer=init_worker, initargs=(self.__ged_env,)) + if self.__print_to_stdout == 2: + iterator = tqdm(pool.imap_unordered(do_fun, itr, chunksize), + desc='Updating node maps', file=sys.stdout) + else: + iterator = pool.imap_unordered(do_fun, itr, chunksize) + for g_id, node_map, nm_modified in iterator: + self.__node_maps_from_median[g_id] = node_map + if nm_modified: + node_maps_were_modified = True + pool.close() + pool.join() +# yyy = self.__node_maps_from_median.copy() + + else: + # Print information about current iteration. 
+ if self.__print_to_stdout == 2: + progress = tqdm(desc='Updating node maps', total=len(self.__node_maps_from_median), file=sys.stdout) + + node_maps_were_modified = False + nb_nodes_median = self.__ged_env.get_graph_num_nodes(self.__median_id) + for graph_id, node_map in self.__node_maps_from_median.items(): + nb_nodes_g = self.__ged_env.get_graph_num_nodes(graph_id) + + if nb_nodes_median <= nb_nodes_g or not self.__sort_graphs: + self.__ged_env.run_method(self.__median_id, graph_id) + if self.__ged_env.get_upper_bound(self.__median_id, graph_id) < node_map.induced_cost() - self.__epsilon: + # xxx = self.__node_maps_from_median[graph_id] + self.__node_maps_from_median[graph_id] = self.__ged_env.get_node_map(self.__median_id, graph_id) + node_maps_were_modified = True + + else: + self.__ged_env.run_method(graph_id, self.__median_id) + if self.__ged_env.get_upper_bound(graph_id, self.__median_id) < node_map.induced_cost() - self.__epsilon: + node_map_tmp = self.__ged_env.get_node_map(graph_id, self.__median_id) + node_map_tmp.forward_map, node_map_tmp.backward_map = node_map_tmp.backward_map, node_map_tmp.forward_map + self.__node_maps_from_median[graph_id] = node_map_tmp + node_maps_were_modified = True + + # Print information about current iteration. + if self.__print_to_stdout == 2: + progress.update(1) + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('\n') + + # Return true if the node maps were modified. + return node_maps_were_modified + + + def __decrease_order(self, graphs, median): + # Print information about current iteration + if self.__print_to_stdout == 2: + print('Trying to decrease order: ... ', end='') + + if nx.number_of_nodes(median) <= 1: + if self.__print_to_stdout == 2: + print('median graph has only 1 node, skip decrease.') + return False + + # Initialize ID of the node that is to be deleted. + id_deleted_node = [None] # @todo: or np.inf + decreased_order = False + + # Decrease the order as long as the best deletion delta is negative. + while self.__compute_best_deletion_delta(graphs, median, id_deleted_node) < -self.__epsilon: + decreased_order = True + self.__delete_node_from_median(id_deleted_node[0], median) + if nx.number_of_nodes(median) <= 1: + if self.__print_to_stdout == 2: + print('decrease stopped because median graph remains only 1 node. ', end='') + break + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('done.') + + # Return true iff the order was decreased. + return decreased_order + + + def __compute_best_deletion_delta(self, graphs, median, id_deleted_node): + best_delta = 0.0 + + # Determine node that should be deleted (if any). + for i in range(0, nx.number_of_nodes(median)): + # Compute cost delta. + delta = 0.0 + for graph_id, graph in graphs.items(): + k = self.__node_maps_from_median[graph_id].image(i) + if k == np.inf: + delta -= self.__node_del_cost + else: + delta += self.__node_ins_cost - self.__ged_env.get_node_rel_cost(median.nodes[i], graph.nodes[k]) + for j, j_label in median[i].items(): + l = self.__node_maps_from_median[graph_id].image(j) + if k == np.inf or l == np.inf: + delta -= self.__edge_del_cost + elif not graph.has_edge(k, l): + delta -= self.__edge_del_cost + else: + delta += self.__edge_ins_cost - self.__ged_env.get_edge_rel_cost(j_label, graph.edges[(k, l)]) + + # Update best deletion delta. 
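+			# Accounting above, per graph G with node map m: if median node i is
+			# unmatched in G (k == inf), deleting i saves one node deletion;
+			# otherwise the relabeling cost of (i, k) is traded against one node
+			# insertion, and analogously for every median edge incident to i.
+			# A negative delta means deleting node i is expected to lower the SOD.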
+			if delta < best_delta - self.__epsilon:
+				best_delta = delta
+				id_deleted_node[0] = i
+#				id_deleted_node[0] = 3 # @todo:
+
+		return best_delta
+
+
+	def __delete_node_from_median(self, id_deleted_node, median):
+		# Update the median.
+		mapping = {}
+		for i in range(0, nx.number_of_nodes(median)):
+			if i != id_deleted_node:
+				new_i = (i if i < id_deleted_node else (i - 1))
+				mapping[i] = new_i
+		median.remove_node(id_deleted_node)
+		nx.relabel_nodes(median, mapping, copy=False)
+
+		# Update the node maps.
+#		xxx = self.__node_maps_from_median
+		for key, node_map in self.__node_maps_from_median.items():
+			new_node_map = NodeMap(nx.number_of_nodes(median), node_map.num_target_nodes())
+			is_unassigned_target_node = [True] * node_map.num_target_nodes()
+			for i in range(0, nx.number_of_nodes(median) + 1):
+				if i != id_deleted_node:
+					new_i = (i if i < id_deleted_node else (i - 1))
+					k = node_map.image(i)
+					new_node_map.add_assignment(new_i, k)
+					if k != np.inf:
+						is_unassigned_target_node[k] = False
+			for k in range(0, node_map.num_target_nodes()):
+				if is_unassigned_target_node[k]:
+					new_node_map.add_assignment(np.inf, k)
+#			print(self.__node_maps_from_median[key].forward_map, self.__node_maps_from_median[key].backward_map)
+#			print(new_node_map.forward_map, new_node_map.backward_map)
+			self.__node_maps_from_median[key] = new_node_map
+
+		# Increase overall number of decreases.
+		self.__num_decrease_order += 1
+
+
+	def __increase_order(self, graphs, median):
+		# Print information about current iteration.
+		if self.__print_to_stdout == 2:
+			print('Trying to increase order: ... ', end='')
+
+		# Initialize the best configuration and the best label of the node that is to be inserted.
+		best_config = {}
+		best_label = self.__ged_env.get_node_label(1)
+		increased_order = False
+
+		# Increase the order as long as the best insertion delta is negative.
+		while self.__compute_best_insertion_delta(graphs, best_config, best_label) < -self.__epsilon:
+			increased_order = True
+			self.__add_node_to_median(best_config, best_label, median)
+
+		# Print information about current iteration.
+		if self.__print_to_stdout == 2:
+			print('done.')
+
+		# Return true iff the order was increased.
+		return increased_order
+
+
+	def __compute_best_insertion_delta(self, graphs, best_config, best_label):
+		# Construct sets of inserted nodes.
+		no_inserted_node = True
+		inserted_nodes = {}
+		for graph_id, graph in graphs.items():
+			inserted_nodes[graph_id] = []
+			best_config[graph_id] = np.inf
+			for k in range(nx.number_of_nodes(graph)):
+				if self.__node_maps_from_median[graph_id].pre_image(k) == np.inf:
+					no_inserted_node = False
+					inserted_nodes[graph_id].append((k, tuple(item for item in graph.nodes[k].items()))) # @todo: can the order of label names be guaranteed?
+
+		# Return 0.0 if no node is inserted in any of the graphs.
+		if no_inserted_node:
+			return 0.0
+
+		# Compute insertion configuration, label, and delta.
+		best_delta = 0.0 # @todo
+		if len(self.__label_names['node_labels']) == 0 and len(self.__label_names['node_attrs']) == 0: # @todo
+			best_delta = self.__compute_insertion_delta_unlabeled(inserted_nodes, best_config, best_label)
+		elif len(self.__label_names['node_labels']) > 0: # self.__constant_node_costs:
+			best_delta = self.__compute_insertion_delta_constant(inserted_nodes, best_config, best_label)
+		else:
+			best_delta = self.__compute_insertion_delta_generic(inserted_nodes, best_config, best_label)
+
+		# Return the best delta.
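+		# The dispatch above distinguishes three regimes: unlabeled nodes (only
+		# insertion and deletion costs matter), symbolic node labels (the median
+		# label is the most frequent inserted label), and the generic case,
+		# which runs a block gradient descent in label space.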
+		return best_delta
+
+
+	def __compute_insertion_delta_unlabeled(self, inserted_nodes, best_config, best_label): # @todo: go through and test.
+		# Construct the best configuration and compute its insertion delta.
+		best_delta = 0.0
+		best_config.clear()
+		for graph_id, node_set in inserted_nodes.items():
+			if len(node_set) == 0:
+				best_config[graph_id] = np.inf
+				best_delta += self.__node_del_cost
+			else:
+				best_config[graph_id] = node_set[0][0]
+				best_delta -= self.__node_ins_cost
+
+		# Return the best insertion delta.
+		return best_delta
+
+
+	def __compute_insertion_delta_constant(self, inserted_nodes, best_config, best_label):
+		# Construct histogram and inverse label maps.
+		hist = {}
+		inverse_label_maps = {}
+		for graph_id, node_set in inserted_nodes.items():
+			inverse_label_maps[graph_id] = {}
+			for node in node_set:
+				k = node[0]
+				label = node[1]
+				if label not in inverse_label_maps[graph_id]:
+					inverse_label_maps[graph_id][label] = k
+					if label not in hist:
+						hist[label] = 1
+					else:
+						hist[label] += 1
+
+		# Determine the best label.
+		best_count = 0
+		for key, val in hist.items():
+			if val > best_count:
+				best_count = val
+				best_label_tuple = key
+
+		# Get the best label.
+		best_label.clear()
+		for key, val in best_label_tuple:
+			best_label[key] = val
+
+		# Construct the best configuration and compute its insertion delta.
+		best_config.clear()
+		best_delta = 0.0
+		node_rel_cost = self.__ged_env.get_node_rel_cost(self.__ged_env.get_node_label(1), self.__ged_env.get_node_label(2))
+		triangle_ineq_holds = (node_rel_cost <= self.__node_del_cost + self.__node_ins_cost)
+		for graph_id, _ in inserted_nodes.items():
+			if best_label_tuple in inverse_label_maps[graph_id]:
+				best_config[graph_id] = inverse_label_maps[graph_id][best_label_tuple]
+				best_delta -= self.__node_ins_cost
+			elif triangle_ineq_holds and not len(inserted_nodes[graph_id]) == 0:
+				best_config[graph_id] = inserted_nodes[graph_id][0][0]
+				best_delta += node_rel_cost - self.__node_ins_cost
+			else:
+				best_config[graph_id] = np.inf
+				best_delta += self.__node_del_cost
+
+		# Return the best insertion delta.
+		return best_delta
+
+
+	def __compute_insertion_delta_generic(self, inserted_nodes, best_config, best_label):
+		# Collect all node labels of inserted nodes.
+		node_labels = []
+		for _, node_set in inserted_nodes.items():
+			for node in node_set:
+				node_labels.append(node[1])
+
+		# Compute node label medians that serve as initial solutions for block gradient descent.
+		initial_node_labels = []
+		self.__compute_initial_node_labels(node_labels, initial_node_labels)
+
+		# Determine the best insertion configuration, label, and delta via parallel block gradient descent from all initial node labels.
+		best_delta = 0.0
+		for node_label in initial_node_labels:
+			# Construct local configuration.
+			config = {}
+			for graph_id, _ in inserted_nodes.items():
+				config[graph_id] = tuple((np.inf, tuple(item for item in self.__ged_env.get_node_label(1).items())))
+
+			# Run block gradient descent.
+			converged = False
+			itr = 0
+			while not self.__insertion_termination_criterion_met(converged, itr):
+				converged = not self.__update_config(node_label, inserted_nodes, config, node_labels)
+				node_label_dict = dict(node_label)
+				converged = converged and (not self.__update_node_label([dict(item) for item in node_labels], node_label_dict)) # @todo: the dict is tupled again in the function, can be better.
+				node_label = tuple(item for item in node_label_dict.items()) # @todo: watch out: initial_node_labels[i] is not modified here.
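+				# The two updates above form one block gradient descent step:
+				# __update_config re-optimizes the node assignment for a fixed
+				# label, then __update_node_label re-optimizes the label for the
+				# fixed assignment; the loop stops once neither block changes.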
+
+				itr += 1
+
+			# Compute insertion delta of converged solution.
+			delta = 0.0
+			for _, node in config.items():
+				if node[0] == np.inf:
+					delta += self.__node_del_cost
+				else:
+					delta += self.__ged_env.get_node_rel_cost(dict(node_label), dict(node[1])) - self.__node_ins_cost
+
+			# Update best delta and global configuration if improvement has been found.
+			if delta < best_delta - self.__epsilon:
+				best_delta = delta
+				best_label.clear()
+				for key, val in node_label:
+					best_label[key] = val
+				best_config.clear()
+				for graph_id, val in config.items():
+					best_config[graph_id] = val[0]
+
+		# Return the best delta.
+		return best_delta
+
+
+	def __compute_initial_node_labels(self, node_labels, median_labels):
+		median_labels.clear()
+		if self.__use_real_randomness: # @todo: may not work if parallelized.
+			rng = np.random.randint(0, high=2**32 - 1, size=1)
+			urng = np.random.RandomState(seed=rng[0])
+		else:
+			urng = np.random.RandomState(seed=self.__seed)
+
+		# Generate the initial node label medians.
+		if self.__init_type_increase_order == 'K-MEANS++':
+			# Use the k-means++ heuristic to generate the initial node label medians.
+			already_selected = [False] * len(node_labels)
+			selected_label_id = urng.randint(low=0, high=len(node_labels), size=1)[0] # c++ test: 23
+			median_labels.append(node_labels[selected_label_id])
+			already_selected[selected_label_id] = True
+#			xxx = [41, 0, 18, 9, 6, 14, 21, 25, 33] for c++ test
+#			iii = 0 for c++ test
+			while len(median_labels) < self.__num_inits_increase_order:
+				weights = [np.inf] * len(node_labels)
+				for label_id in range(0, len(node_labels)):
+					if already_selected[label_id]:
+						weights[label_id] = 0
+						continue
+					for label in median_labels:
+						weights[label_id] = min(weights[label_id], self.__ged_env.get_node_rel_cost(dict(label), dict(node_labels[label_id])))
+
+				# Get the non-zero weights.
+				weights_p, idx_p = [], []
+				for i, w in enumerate(weights):
+					if w != 0:
+						weights_p.append(w)
+						idx_p.append(i)
+				if len(weights_p) > 0:
+					p = np.array(weights_p) / np.sum(weights_p)
+					selected_label_id = urng.choice(range(0, len(weights_p)), size=1, p=p)[0] # for c++ test: xxx[iii]
+					selected_label_id = idx_p[selected_label_id]
+#					iii += 1 for c++ test
+					median_labels.append(node_labels[selected_label_id])
+					already_selected[selected_label_id] = True
+				else: # Skip the loop when all node labels have been selected. This happens when len(node_labels) <= self.__num_inits_increase_order.
+					break
+		else:
+			# Compute the initial node label medians as the medians of randomly generated clusters of (roughly) equal size.
+			# Shuffle the node labels with the chosen RNG (the Python counterpart of std::shuffle in the C++ version).
+			shuffled_node_labels = [node_labels[i] for i in urng.permutation(len(node_labels))]
+			cluster_size = len(node_labels) / self.__num_inits_increase_order
+			pos = 0
+			cluster = []
+			while len(median_labels) < self.__num_inits_increase_order - 1:
+				while pos < (len(median_labels) + 1) * cluster_size:
+					cluster.append(shuffled_node_labels[pos])
+					pos += 1
+				median_labels.append(self.__get_median_node_label(cluster))
+				cluster.clear()
+			while pos < len(shuffled_node_labels):
+				cluster.append(shuffled_node_labels[pos])
+				pos += 1
+			median_labels.append(self.__get_median_node_label(cluster))
+			cluster.clear()
+
+		# Run Lloyd's Algorithm.
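+		# Lloyd's algorithm then alternates two steps until assignments stop
+		# changing: (1) assign every node label to its closest current median
+		# (__update_clusters) and (2) recompute each cluster's median label
+		# (__update_node_label).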
+ converged = False + closest_median_ids = [np.inf] * len(node_labels) + clusters = [[] for _ in range(len(median_labels))] + itr = 1 + while not self.__insertion_termination_criterion_met(converged, itr): + converged = not self.__update_clusters(node_labels, median_labels, closest_median_ids) + if not converged: + for cluster in clusters: + cluster.clear() + for label_id in range(0, len(node_labels)): + clusters[closest_median_ids[label_id]].append(node_labels[label_id]) + for cluster_id in range(0, len(clusters)): + node_label = dict(median_labels[cluster_id]) + self.__update_node_label([dict(item) for item in clusters[cluster_id]], node_label) # @todo: the dict is tupled again in the function, can be better. + median_labels[cluster_id] = tuple(item for item in node_label.items()) + itr += 1 + + + def __insertion_termination_criterion_met(self, converged, itr): + return converged or (itr >= self.__max_itrs_increase_order if self.__max_itrs_increase_order > 0 else False) + + + def __update_config(self, node_label, inserted_nodes, config, node_labels): + # Determine the best configuration. + config_modified = False + for graph_id, node_set in inserted_nodes.items(): + best_assignment = config[graph_id] + best_cost = 0.0 + if best_assignment[0] == np.inf: + best_cost = self.__node_del_cost + else: + best_cost = self.__ged_env.get_node_rel_cost(dict(node_label), dict(best_assignment[1])) - self.__node_ins_cost + for node in node_set: + cost = self.__ged_env.get_node_rel_cost(dict(node_label), dict(node[1])) - self.__node_ins_cost + if cost < best_cost - self.__epsilon: + best_cost = cost + best_assignment = node + config_modified = True + if self.__node_del_cost < best_cost - self.__epsilon: + best_cost = self.__node_del_cost + best_assignment = tuple((np.inf, best_assignment[1])) + config_modified = True + config[graph_id] = best_assignment + + # Collect the node labels contained in the best configuration. + node_labels.clear() + for key, val in config.items(): + if val[0] != np.inf: + node_labels.append(val[1]) + + # Return true if the configuration was modified. + return config_modified + + + def __update_node_label(self, node_labels, node_label): + if len(node_labels) == 0: # @todo: check if this is the correct solution. Especially after calling __update_config(). + return False + new_node_label = self.__get_median_node_label(node_labels) + if self.__ged_env.get_node_rel_cost(new_node_label, node_label) > self.__epsilon: + node_label.clear() + for key, val in new_node_label.items(): + node_label[key] = val + return True + return False + + + def __update_clusters(self, node_labels, median_labels, closest_median_ids): + # Determine the closest median for each node label. + clusters_modified = False + for label_id in range(0, len(node_labels)): + closest_median_id = np.inf + dist_to_closest_median = np.inf + for median_id in range(0, len(median_labels)): + dist_to_median = self.__ged_env.get_node_rel_cost(dict(median_labels[median_id]), dict(node_labels[label_id])) + if dist_to_median < dist_to_closest_median - self.__epsilon: + dist_to_closest_median = dist_to_median + closest_median_id = median_id + if closest_median_id != closest_median_ids[label_id]: + closest_median_ids[label_id] = closest_median_id + clusters_modified = True + + # Return true if the clusters were modified. + return clusters_modified + + + def __add_node_to_median(self, best_config, best_label, median): + # Update the median. 
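+		# The inserted node receives the next free index and the best label
+		# found; every node map is then extended by the assignment chosen for
+		# its graph in best_config (np.inf meaning the node is deleted there).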
+ nb_nodes_median = nx.number_of_nodes(median) + median.add_node(nb_nodes_median, **best_label) + + # Update the node maps. + for graph_id, node_map in self.__node_maps_from_median.items(): + node_map_as_rel = [] + node_map.as_relation(node_map_as_rel) + new_node_map = NodeMap(nx.number_of_nodes(median), node_map.num_target_nodes()) + for assignment in node_map_as_rel: + new_node_map.add_assignment(assignment[0], assignment[1]) + new_node_map.add_assignment(nx.number_of_nodes(median) - 1, best_config[graph_id]) + self.__node_maps_from_median[graph_id] = new_node_map + + # Increase overall number of increases. + self.__num_increase_order += 1 + + + def __are_graphs_equal(self, g1, g2): + """ + Check if the two graphs are equal. + + Parameters + ---------- + g1 : NetworkX graph object + Graph 1 to be compared. + + g2 : NetworkX graph object + Graph 2 to be compared. + + Returns + ------- + bool + True if the two graph are equal. + + Notes + ----- + This is not an identical check. Here the two graphs are equal if and only if their original_node_ids, nodes, all node labels, edges and all edge labels are equal. This function is specifically designed for class `MedianGraphEstimator` and should not be used elsewhere. + """ + # check original node ids. + if not g1.graph['original_node_ids'] == g2.graph['original_node_ids']: + return False + # check nodes. + nlist1 = [n for n in g1.nodes(data=True)] + nlist2 = [n for n in g2.nodes(data=True)] + if not nlist1 == nlist2: + return False + # check edges. + elist1 = [n for n in g1.edges(data=True)] + elist2 = [n for n in g2.edges(data=True)] + if not elist1 == elist2: + return False + + return True + + + def compute_my_cost(g, h, node_map): + cost = 0.0 + for node in g.nodes: + cost += 0 + + + def set_label_names(self, node_labels=[], edge_labels=[], node_attrs=[], edge_attrs=[]): + self.__label_names = {'node_labels': node_labels, 'edge_labels': edge_labels, + 'node_attrs': node_attrs, 'edge_attrs': edge_attrs} + + + def __get_median_node_label(self, node_labels): + if len(self.__label_names['node_labels']) > 0: + return self.__get_median_label_symbolic(node_labels) + elif len(self.__label_names['node_attrs']) > 0: + return self.__get_median_label_nonsymbolic(node_labels) + else: + raise Exception('Node label names are not given.') + + + def __get_median_edge_label(self, edge_labels): + if len(self.__label_names['edge_labels']) > 0: + return self.__get_median_label_symbolic(edge_labels) + elif len(self.__label_names['edge_attrs']) > 0: + return self.__get_median_label_nonsymbolic(edge_labels) + else: + raise Exception('Edge label names are not given.') + + + def __get_median_label_symbolic(self, labels): + # Construct histogram. + hist = {} + for label in labels: + label = tuple([kv for kv in label.items()]) # @todo: this may be slow. + if label not in hist: + hist[label] = 1 + else: + hist[label] += 1 + + # Return the label that appears most frequently. + best_count = 0 + median_label = {} + for label, count in hist.items(): + if count > best_count: + best_count = count + median_label = {kv[0]: kv[1] for kv in label} + + return median_label + + + def __get_median_label_nonsymbolic(self, labels): + if len(labels) == 0: + return {} # @todo + else: + # Transform the labels into coordinates and compute mean label as initial solution. 
+			labels_as_coords = []
+			sums = {}
+			for key, val in labels[0].items():
+				sums[key] = 0
+			for label in labels:
+				coords = {}
+				for key, val in label.items():
+					label_f = float(val)
+					sums[key] += label_f
+					coords[key] = label_f
+				labels_as_coords.append(coords)
+			median = {}
+			for key, val in sums.items():
+				median[key] = val / len(labels)
+
+			# Run main loop of Weiszfeld's Algorithm.
+			epsilon = 0.0001
+			delta = 1.0
+			num_itrs = 0
+			all_equal = False
+			while ((delta > epsilon) and (num_itrs < 100) and (not all_equal)):
+				numerator = {}
+				for key, val in sums.items():
+					numerator[key] = 0
+				denominator = 0
+				for label_as_coord in labels_as_coords:
+					norm = 0
+					for key, val in label_as_coord.items():
+						norm += (val - median[key]) ** 2
+					norm = np.sqrt(norm)
+					if norm > 0:
+						for key, val in label_as_coord.items():
+							numerator[key] += val / norm
+						denominator += 1.0 / norm
+				if denominator == 0:
+					all_equal = True
+				else:
+					new_median = {}
+					delta = 0.0
+					for key, val in numerator.items():
+						this_median = val / denominator
+						new_median[key] = this_median
+						delta += np.abs(median[key] - this_median)
+					median = new_median
+
+				num_itrs += 1
+
+			# Transform the solution to strings and return it.
+			median_label = {}
+			for key, val in median.items():
+				median_label[key] = str(val)
+			return median_label
+
+
+def _compute_medoid_parallel(graph_ids, sort, itr):
+	g_id = itr[0]
+	i = itr[1]
+	# @todo: timer not considered here.
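+	# This helper runs in a worker process: G_ged_env is a module-level global
+	# set by the pool initializer (init_worker) in the estimator, so every
+	# worker reuses one shared GED environment instead of pickling it per task.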
+# if timer.expired(): +# self.__state = AlgorithmState.CALLED +# break + nb_nodes_g = G_ged_env.get_graph_num_nodes(g_id) + sum_of_distances = 0 + for h_id in graph_ids: + nb_nodes_h = G_ged_env.get_graph_num_nodes(h_id) + if nb_nodes_g <= nb_nodes_h or not sort: + G_ged_env.run_method(g_id, h_id) + sum_of_distances += G_ged_env.get_upper_bound(g_id, h_id) + else: + G_ged_env.run_method(h_id, g_id) + sum_of_distances += G_ged_env.get_upper_bound(h_id, g_id) + return i, sum_of_distances + + +def _compute_init_node_maps_parallel(gen_median_id, sort, nb_nodes_median, itr): + graph_id = itr + nb_nodes_g = G_ged_env.get_graph_num_nodes(graph_id) + if nb_nodes_median <= nb_nodes_g or not sort: + G_ged_env.run_method(gen_median_id, graph_id) + node_map = G_ged_env.get_node_map(gen_median_id, graph_id) +# print(self.__node_maps_from_median[graph_id]) + else: + G_ged_env.run_method(graph_id, gen_median_id) + node_map = G_ged_env.get_node_map(graph_id, gen_median_id) + node_map.forward_map, node_map.backward_map = node_map.backward_map, node_map.forward_map + sum_of_distance = node_map.induced_cost() +# print(self.__sum_of_distances) + return graph_id, sum_of_distance, node_map + + +def _update_node_maps_parallel(median_id, epsilon, sort, nb_nodes_median, itr): + graph_id = itr[0] + node_map = itr[1] + + node_maps_were_modified = False + nb_nodes_g = G_ged_env.get_graph_num_nodes(graph_id) + if nb_nodes_median <= nb_nodes_g or not sort: + G_ged_env.run_method(median_id, graph_id) + if G_ged_env.get_upper_bound(median_id, graph_id) < node_map.induced_cost() - epsilon: + node_map = G_ged_env.get_node_map(median_id, graph_id) + node_maps_were_modified = True + else: + G_ged_env.run_method(graph_id, median_id) + if G_ged_env.get_upper_bound(graph_id, median_id) < node_map.induced_cost() - epsilon: + node_map = G_ged_env.get_node_map(graph_id, median_id) + node_map.forward_map, node_map.backward_map = node_map.backward_map, node_map.forward_map + node_maps_were_modified = True + + return graph_id, node_map, node_maps_were_modified \ No newline at end of file diff --git a/lang/fr/gklearn/ged/median/median_graph_estimator_cml.py b/lang/fr/gklearn/ged/median/median_graph_estimator_cml.py new file mode 100644 index 0000000000..2d5b110868 --- /dev/null +++ b/lang/fr/gklearn/ged/median/median_graph_estimator_cml.py @@ -0,0 +1,1676 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Mon Mar 16 18:04:55 2020 + +@author: ljia +""" +import numpy as np +import time +from tqdm import tqdm +import sys +import networkx as nx +import multiprocessing +from multiprocessing import Pool +from functools import partial +from gklearn.ged.env import AlgorithmState, NodeMap +from gklearn.ged.util import misc +from gklearn.utils import Timer, SpecialLabel + + +class MedianGraphEstimatorCML(object): # @todo: differ dummy_node from undifined node? + """Estimate median graphs using the pure Python version of GEDEnv. + """ + + def __init__(self, ged_env, constant_node_costs): + """Constructor. + + Parameters + ---------- + ged_env : gklearn.gedlib.gedlibpy.GEDEnv + Initialized GED environment. The edit costs must be set by the user. + + constant_node_costs : Boolean + Set to True if the node relabeling costs are constant. 
+ """ + self.__ged_env = ged_env + self.__init_method = 'BRANCH_FAST' + self.__init_options = '' + self.__descent_method = 'BRANCH_FAST' + self.__descent_options = '' + self.__refine_method = 'IPFP' + self.__refine_options = '' + self.__constant_node_costs = constant_node_costs + self.__labeled_nodes = (ged_env.get_num_node_labels() > 1) + self.__node_del_cost = ged_env.get_node_del_cost(ged_env.get_node_label(1, to_dict=False)) + self.__node_ins_cost = ged_env.get_node_ins_cost(ged_env.get_node_label(1, to_dict=False)) + self.__labeled_edges = (ged_env.get_num_edge_labels() > 1) + self.__edge_del_cost = ged_env.get_edge_del_cost(ged_env.get_edge_label(1, to_dict=False)) + self.__edge_ins_cost = ged_env.get_edge_ins_cost(ged_env.get_edge_label(1, to_dict=False)) + self.__init_type = 'RANDOM' + self.__num_random_inits = 10 + self.__desired_num_random_inits = 10 + self.__use_real_randomness = True + self.__seed = 0 + self.__parallel = True + self.__update_order = True + self.__sort_graphs = True # sort graphs by size when computing GEDs. + self.__refine = True + self.__time_limit_in_sec = 0 + self.__epsilon = 0.0001 + self.__max_itrs = 100 + self.__max_itrs_without_update = 3 + self.__num_inits_increase_order = 10 + self.__init_type_increase_order = 'K-MEANS++' + self.__max_itrs_increase_order = 10 + self.__print_to_stdout = 2 + self.__median_id = np.inf # @todo: check + self.__node_maps_from_median = {} + self.__sum_of_distances = 0 + self.__best_init_sum_of_distances = np.inf + self.__converged_sum_of_distances = np.inf + self.__runtime = None + self.__runtime_initialized = None + self.__runtime_converged = None + self.__itrs = [] # @todo: check: {} ? + self.__num_decrease_order = 0 + self.__num_increase_order = 0 + self.__num_converged_descents = 0 + self.__state = AlgorithmState.TERMINATED + self.__label_names = {} + + if ged_env is None: + raise Exception('The GED environment pointer passed to the constructor of MedianGraphEstimator is null.') + elif not ged_env.is_initialized(): + raise Exception('The GED environment is uninitialized. Call gedlibpy.GEDEnv.init() before passing it to the constructor of MedianGraphEstimator.') + + + def set_options(self, options): + """Sets the options of the estimator. + + Parameters + ---------- + options : string + String that specifies with which options to run the estimator. + """ + self.__set_default_options() + options_map = misc.options_string_to_options_map(options) + for opt_name, opt_val in options_map.items(): + if opt_name == 'init-type': + self.__init_type = opt_val + if opt_val != 'MEDOID' and opt_val != 'RANDOM' and opt_val != 'MIN' and opt_val != 'MAX' and opt_val != 'MEAN': + raise Exception('Invalid argument ' + opt_val + ' for option init-type. Usage: options = "[--init-type RANDOM|MEDOID|EMPTY|MIN|MAX|MEAN] [...]"') + elif opt_name == 'random-inits': + try: + self.__num_random_inits = int(opt_val) + self.__desired_num_random_inits = self.__num_random_inits + except: + raise Exception('Invalid argument "' + opt_val + '" for option random-inits. Usage: options = "[--random-inits ]"') + + if self.__num_random_inits <= 0: + raise Exception('Invalid argument "' + opt_val + '" for option random-inits. Usage: options = "[--random-inits ]"') + + elif opt_name == 'randomness': + if opt_val == 'PSEUDO': + self.__use_real_randomness = False + + elif opt_val == 'REAL': + self.__use_real_randomness = True + + else: + raise Exception('Invalid argument "' + opt_val + '" for option randomness. 
Usage: options = "[--randomness REAL|PSEUDO] [...]"') + + elif opt_name == 'stdout': + if opt_val == '0': + self.__print_to_stdout = 0 + + elif opt_val == '1': + self.__print_to_stdout = 1 + + elif opt_val == '2': + self.__print_to_stdout = 2 + + else: + raise Exception('Invalid argument "' + opt_val + '" for option stdout. Usage: options = "[--stdout 0|1|2] [...]"') + + elif opt_name == 'parallel': + if opt_val == 'TRUE': + self.__parallel = True + + elif opt_val == 'FALSE': + self.__parallel = False + + else: + raise Exception('Invalid argument "' + opt_val + '" for option parallel. Usage: options = "[--parallel TRUE|FALSE] [...]"') + + elif opt_name == 'update-order': + if opt_val == 'TRUE': + self.__update_order = True + + elif opt_val == 'FALSE': + self.__update_order = False + + else: + raise Exception('Invalid argument "' + opt_val + '" for option update-order. Usage: options = "[--update-order TRUE|FALSE] [...]"') + + elif opt_name == 'sort-graphs': + if opt_val == 'TRUE': + self.__sort_graphs = True + + elif opt_val == 'FALSE': + self.__sort_graphs = False + + else: + raise Exception('Invalid argument "' + opt_val + '" for option sort-graphs. Usage: options = "[--sort-graphs TRUE|FALSE] [...]"') + + elif opt_name == 'refine': + if opt_val == 'TRUE': + self.__refine = True + + elif opt_val == 'FALSE': + self.__refine = False + + else: + raise Exception('Invalid argument "' + opt_val + '" for option refine. Usage: options = "[--refine TRUE|FALSE] [...]"') + + elif opt_name == 'time-limit': + try: + self.__time_limit_in_sec = float(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option time-limit. Usage: options = "[--time-limit ] [...]') + + elif opt_name == 'max-itrs': + try: + self.__max_itrs = int(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option max-itrs. Usage: options = "[--max-itrs ] [...]') + + elif opt_name == 'max-itrs-without-update': + try: + self.__max_itrs_without_update = int(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option max-itrs-without-update. Usage: options = "[--max-itrs-without-update ] [...]') + + elif opt_name == 'seed': + try: + self.__seed = int(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option seed. Usage: options = "[--seed ] [...]') + + elif opt_name == 'epsilon': + try: + self.__epsilon = float(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option epsilon. Usage: options = "[--epsilon ] [...]') + + if self.__epsilon <= 0: + raise Exception('Invalid argument "' + opt_val + '" for option epsilon. Usage: options = "[--epsilon ] [...]') + + elif opt_name == 'inits-increase-order': + try: + self.__num_inits_increase_order = int(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option inits-increase-order. Usage: options = "[--inits-increase-order ]"') + + if self.__num_inits_increase_order <= 0: + raise Exception('Invalid argument "' + opt_val + '" for option inits-increase-order. Usage: options = "[--inits-increase-order ]"') + + elif opt_name == 'init-type-increase-order': + self.__init_type_increase_order = opt_val + if opt_val != 'CLUSTERS' and opt_val != 'K-MEANS++': + raise Exception('Invalid argument ' + opt_val + ' for option init-type-increase-order. 
Usage: options = "[--init-type-increase-order CLUSTERS|K-MEANS++] [...]"') + + elif opt_name == 'max-itrs-increase-order': + try: + self.__max_itrs_increase_order = int(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option max-itrs-increase-order. Usage: options = "[--max-itrs-increase-order ] [...]') + + else: + valid_options = '[--init-type ] [--random-inits ] [--randomness ] [--seed ] [--stdout ] ' + valid_options += '[--time-limit ] [--max-itrs ] [--epsilon ] ' + valid_options += '[--inits-increase-order ] [--init-type-increase-order ] [--max-itrs-increase-order ]' + raise Exception('Invalid option "' + opt_name + '". Usage: options = "' + valid_options + '"') + + + def set_init_method(self, init_method, init_options={}): + """Selects method to be used for computing the initial medoid graph. + + Parameters + ---------- + init_method : string + The selected method. Default: ged::Options::GEDMethod::BRANCH_UNIFORM. + + init_options : string + The options for the selected method. Default: "". + + Notes + ----- + Has no effect unless "--init-type MEDOID" is passed to set_options(). + """ + self.__init_method = init_method; + self.__init_options = init_options; + + + def set_descent_method(self, descent_method, descent_options=''): + """Selects method to be used for block gradient descent.. + + Parameters + ---------- + descent_method : string + The selected method. Default: ged::Options::GEDMethod::BRANCH_FAST. + + descent_options : string + The options for the selected method. Default: "". + + Notes + ----- + Has no effect unless "--init-type MEDOID" is passed to set_options(). + """ + self.__descent_method = descent_method; + self.__descent_options = descent_options; + + + def set_refine_method(self, refine_method, refine_options): + """Selects method to be used for improving the sum of distances and the node maps for the converged median. + + Parameters + ---------- + refine_method : string + The selected method. Default: "IPFP". + + refine_options : string + The options for the selected method. Default: "". + + Notes + ----- + Has no effect if "--refine FALSE" is passed to set_options(). + """ + self.__refine_method = refine_method + self.__refine_options = refine_options + + + def run(self, graph_ids, set_median_id, gen_median_id): + """Computes a generalized median graph. + + Parameters + ---------- + graph_ids : list[integer] + The IDs of the graphs for which the median should be computed. Must have been added to the environment passed to the constructor. + + set_median_id : integer + The ID of the computed set-median. A dummy graph with this ID must have been added to the environment passed to the constructor. Upon termination, the computed median can be obtained via gklearn.gedlib.gedlibpy.GEDEnv.get_graph(). + + + gen_median_id : integer + The ID of the computed generalized median. Upon termination, the computed median can be obtained via gklearn.gedlib.gedlibpy.GEDEnv.get_graph(). + """ + # Sanity checks. + if len(graph_ids) == 0: + raise Exception('Empty vector of graph IDs, unable to compute median.') + all_graphs_empty = True + for graph_id in graph_ids: + if self.__ged_env.get_graph_num_nodes(graph_id) > 0: + all_graphs_empty = False + break + if all_graphs_empty: + raise Exception('All graphs in the collection are empty.') + + # Start timer and record start time. 
+ start = time.time() + timer = Timer(self.__time_limit_in_sec) + self.__median_id = gen_median_id + self.__state = AlgorithmState.TERMINATED + + # Get NetworkX graph representations of the input graphs. + graphs = {} + for graph_id in graph_ids: + # @todo: get_nx_graph() function may need to be modified according to the coming code. + graphs[graph_id] = self.__ged_env.get_nx_graph(graph_id) +# print(self.__ged_env.get_graph_internal_id(0)) +# print(graphs[0].graph) +# print(graphs[0].nodes(data=True)) +# print(graphs[0].edges(data=True)) +# print(nx.adjacency_matrix(graphs[0])) + + # Construct initial medians. + medians = [] + self.__construct_initial_medians(graph_ids, timer, medians) + end_init = time.time() + self.__runtime_initialized = end_init - start +# print(medians[0].graph) +# print(medians[0].nodes(data=True)) +# print(medians[0].edges(data=True)) +# print(nx.adjacency_matrix(medians[0])) + + # Reset information about iterations and number of times the median decreases and increases. + self.__itrs = [0] * len(medians) + self.__num_decrease_order = 0 + self.__num_increase_order = 0 + self.__num_converged_descents = 0 + + # Initialize the best median. + best_sum_of_distances = np.inf + self.__best_init_sum_of_distances = np.inf + node_maps_from_best_median = {} + + # Run block gradient descent from all initial medians. + self.__ged_env.set_method(self.__descent_method, self.__descent_options) + for median_pos in range(0, len(medians)): + + # Terminate if the timer has expired and at least one SOD has been computed. + if timer.expired() and median_pos > 0: + break + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('\n===========================================================') + print('Block gradient descent for initial median', str(median_pos + 1), 'of', str(len(medians)), '.') + print('-----------------------------------------------------------') + + # Get reference to the median. + median = medians[median_pos] + + # Load initial median into the environment. + self.__ged_env.load_nx_graph(median, gen_median_id) + self.__ged_env.init(self.__ged_env.get_init_type()) + + # Compute node maps and sum of distances for initial median. +# xxx = self.__node_maps_from_median + self.__compute_init_node_maps(graph_ids, gen_median_id) +# yyy = self.__node_maps_from_median + + self.__best_init_sum_of_distances = min(self.__best_init_sum_of_distances, self.__sum_of_distances) + self.__ged_env.load_nx_graph(median, set_median_id) +# print(self.__best_init_sum_of_distances) + + # Run block gradient descent from initial median. + converged = False + itrs_without_update = 0 + while not self.__termination_criterion_met(converged, timer, self.__itrs[median_pos], itrs_without_update): + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('\n===========================================================') + print('Iteration', str(self.__itrs[median_pos] + 1), 'for initial median', str(median_pos + 1), 'of', str(len(medians)), '.') + print('-----------------------------------------------------------') + + # Initialize flags that tell us what happened in the iteration. + median_modified = False + node_maps_modified = False + decreased_order = False + increased_order = False + + # Update the median. 
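+				# One iteration: first update the median for the fixed node maps,
+				# then update the node maps for the fixed median; order updates
+				# are disabled in this version (see the @todo below).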
+				median_modified = self.__update_median(graphs, median)
+				if self.__update_order:
+					pass # @todo:
+#					if not median_modified or self.__itrs[median_pos] == 0:
+#						decreased_order = self.__decrease_order(graphs, median)
+#					if not decreased_order or self.__itrs[median_pos] == 0:
+#						increased_order = self.__increase_order(graphs, median)
+
+				# Update the number of iterations without update of the median.
+				if median_modified or decreased_order or increased_order:
+					itrs_without_update = 0
+				else:
+					itrs_without_update += 1
+
+				# Print information about current iteration.
+				if self.__print_to_stdout == 2:
+					print('Loading median to environment: ... ', end='')
+
+				# Load the median into the environment.
+				# @todo: should this function use the original node label?
+				self.__ged_env.load_nx_graph(median, gen_median_id)
+				self.__ged_env.init(self.__ged_env.get_init_type())
+
+				# Print information about current iteration.
+				if self.__print_to_stdout == 2:
+					print('done.')
+
+				# Print information about current iteration.
+				if self.__print_to_stdout == 2:
+					print('Updating induced costs: ... ', end='')
+
+				# Compute induced costs of the old node maps w.r.t. the updated median.
+				for graph_id in graph_ids:
+#					print(self.__node_maps_from_median[graph_id].induced_cost())
+#					xxx = self.__node_maps_from_median[graph_id]
+					self.__ged_env.compute_induced_cost(gen_median_id, graph_id, self.__node_maps_from_median[graph_id])
+#					print('---------------------------------------')
+#					print(self.__node_maps_from_median[graph_id].induced_cost())
+					# @todo: this value is slightly different from the one computed by the C++ program, which might be a bug! Use it very carefully!
+
+				# Print information about current iteration.
+				if self.__print_to_stdout == 2:
+					print('done.')
+
+				# Update the node maps.
+				node_maps_modified = self.__update_node_maps()
+
+				# Update the sum of distances.
+				old_sum_of_distances = self.__sum_of_distances
+				self.__sum_of_distances = 0
+				for graph_id, node_map in self.__node_maps_from_median.items():
+					self.__sum_of_distances += node_map.induced_cost()
+#					print(self.__sum_of_distances)
+
+				# Print information about current iteration.
+				if self.__print_to_stdout == 2:
+					print('Old local SOD: ', old_sum_of_distances)
+					print('New local SOD: ', self.__sum_of_distances)
+					print('Best converged SOD: ', best_sum_of_distances)
+					print('Modified median: ', median_modified)
+					print('Modified node maps: ', node_maps_modified)
+					print('Decreased order: ', decreased_order)
+					print('Increased order: ', increased_order)
+					print('===========================================================\n')
+
+				converged = not (median_modified or node_maps_modified or decreased_order or increased_order)
+
+				self.__itrs[median_pos] += 1
+
+				# Update the best median.
+				if self.__sum_of_distances < best_sum_of_distances:
+					best_sum_of_distances = self.__sum_of_distances
+					node_maps_from_best_median = self.__node_maps_from_median.copy() # @todo: this is a shallow copy, not sure if it is enough.
+					best_median = median
+
+				# Update the number of converged descents.
+				if converged:
+					self.__num_converged_descents += 1
+
+		# Store the best encountered median.
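+		# The globally best median (lowest SOD over all initializations) is
+		# reloaded into the environment under gen_median_id before refinement.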
+ self.__sum_of_distances = best_sum_of_distances + self.__node_maps_from_median = node_maps_from_best_median + self.__ged_env.load_nx_graph(best_median, gen_median_id) + self.__ged_env.init(self.__ged_env.get_init_type()) + end_descent = time.time() + self.__runtime_converged = end_descent - start + + # Refine the sum of distances and the node maps for the converged median. + self.__converged_sum_of_distances = self.__sum_of_distances + if self.__refine: + self.__improve_sum_of_distances(timer) + + # Record end time, set runtime and reset the number of initial medians. + end = time.time() + self.__runtime = end - start + self.__num_random_inits = self.__desired_num_random_inits + + # Print global information. + if self.__print_to_stdout != 0: + print('\n===========================================================') + print('Finished computation of generalized median graph.') + print('-----------------------------------------------------------') + print('Best SOD after initialization: ', self.__best_init_sum_of_distances) + print('Converged SOD: ', self.__converged_sum_of_distances) + if self.__refine: + print('Refined SOD: ', self.__sum_of_distances) + print('Overall runtime: ', self.__runtime) + print('Runtime of initialization: ', self.__runtime_initialized) + print('Runtime of block gradient descent: ', self.__runtime_converged - self.__runtime_initialized) + if self.__refine: + print('Runtime of refinement: ', self.__runtime - self.__runtime_converged) + print('Number of initial medians: ', len(medians)) + total_itr = 0 + num_started_descents = 0 + for itr in self.__itrs: + total_itr += itr + if itr > 0: + num_started_descents += 1 + print('Size of graph collection: ', len(graph_ids)) + print('Number of started descents: ', num_started_descents) + print('Number of converged descents: ', self.__num_converged_descents) + print('Overall number of iterations: ', total_itr) + print('Overall number of times the order decreased: ', self.__num_decrease_order) + print('Overall number of times the order increased: ', self.__num_increase_order) + print('===========================================================\n') + + + def __improve_sum_of_distances(self, timer): # @todo: go through and test + # Use method selected for refinement phase. + self.__ged_env.set_method(self.__refine_method, self.__refine_options) + + # Print information about current iteration. + if self.__print_to_stdout == 2: + progress = tqdm(desc='Improving node maps', total=len(self.__node_maps_from_median), file=sys.stdout) + print('\n===========================================================') + print('Improving node maps and SOD for converged median.') + print('-----------------------------------------------------------') + progress.update(1) + + # Improving the node maps. 
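+		# Refinement re-runs the refine method (default 'IPFP') between the converged
+		# median and every input graph; a recomputed node map replaces the stored one
+		# only if its upper bound is strictly lower than the current induced cost.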
+		nb_nodes_median = self.__ged_env.get_graph_num_nodes(self.__gen_median_id)
+		for graph_id, node_map in self.__node_maps_from_median.items():
+			# Stop refining as soon as the time limit is reached.
+			if timer.expired():
+				if self.__state == AlgorithmState.TERMINATED:
+					self.__state = AlgorithmState.CONVERGED
+				break
+
+			nb_nodes_g = self.__ged_env.get_graph_num_nodes(graph_id)
+			if nb_nodes_median <= nb_nodes_g or not self.__sort_graphs:
+				self.__ged_env.run_method(self.__gen_median_id, graph_id)
+				if self.__ged_env.get_upper_bound(self.__gen_median_id, graph_id) < node_map.induced_cost():
+					self.__node_maps_from_median[graph_id] = self.__ged_env.get_node_map(self.__gen_median_id, graph_id)
+			else:
+				self.__ged_env.run_method(graph_id, self.__gen_median_id)
+				if self.__ged_env.get_upper_bound(graph_id, self.__gen_median_id) < node_map.induced_cost():
+					node_map_tmp = self.__ged_env.get_node_map(graph_id, self.__gen_median_id)
+					node_map_tmp.forward_map, node_map_tmp.backward_map = node_map_tmp.backward_map, node_map_tmp.forward_map
+					self.__node_maps_from_median[graph_id] = node_map_tmp
+
+			# Print information.
+			if self.__print_to_stdout == 2:
+				progress.update(1)
+
+		# Recompute the SOD from scratch from the (possibly improved) node maps.
+		self.__sum_of_distances = 0.0
+		for key, val in self.__node_maps_from_median.items():
+			self.__sum_of_distances += val.induced_cost()
+
+		# Print information.
+		if self.__print_to_stdout == 2:
+			print('===========================================================\n')
+
+
+	def __median_available(self):
+		return self.__median_id != np.inf
+
+
+	def get_state(self):
+		if not self.__median_available():
+			raise Exception('No median has been computed. Call run() before calling get_state().')
+		return self.__state
+
+
+	def get_sum_of_distances(self, state=''):
+		"""Returns the sum of distances.
+
+		Parameters
+		----------
+		state : string
+			The state of the estimator. Can be 'initialized' or 'converged'. Default: ""
+
+		Returns
+		-------
+		float
+			The sum of distances (SOD) of the median when the estimator was in the state `state` during the last call to run(). If `state` is not given, the converged SOD (without refinement) or refined SOD (with refinement) is returned.
+		"""
+		if not self.__median_available():
+			raise Exception('No median has been computed. Call run() before calling get_sum_of_distances().')
+		if state == 'initialized':
+			return self.__best_init_sum_of_distances
+		if state == 'converged':
+			return self.__converged_sum_of_distances
+		return self.__sum_of_distances
+
+
+	def get_runtime(self, state):
+		if not self.__median_available():
+			raise Exception('No median has been computed. Call run() before calling get_runtime().')
+		if state == AlgorithmState.INITIALIZED:
+			return self.__runtime_initialized
+		if state == AlgorithmState.CONVERGED:
+			return self.__runtime_converged
+		return self.__runtime
+
+
+	def get_num_itrs(self):
+		if not self.__median_available():
+			raise Exception('No median has been computed. Call run() before calling get_num_itrs().')
+		return self.__itrs
+
+
+	def get_num_times_order_decreased(self):
+		if not self.__median_available():
+			raise Exception('No median has been computed. Call run() before calling get_num_times_order_decreased().')
+		return self.__num_decrease_order
+
+
+	def get_num_times_order_increased(self):
+		if not self.__median_available():
+			raise Exception('No median has been computed. 
Call run() before calling get_num_times_order_increased().') + return self.__num_increase_order + + + def get_num_converged_descents(self): + if not self.__median_available(): + raise Exception('No median has been computed. Call run() before calling get_num_converged_descents().') + return self.__num_converged_descents + + + def get_ged_env(self): + return self.__ged_env + + + def __set_default_options(self): + self.__init_type = 'RANDOM' + self.__num_random_inits = 10 + self.__desired_num_random_inits = 10 + self.__use_real_randomness = True + self.__seed = 0 + self.__parallel = True + self.__update_order = True + self.__sort_graphs = True + self.__refine = True + self.__time_limit_in_sec = 0 + self.__epsilon = 0.0001 + self.__max_itrs = 100 + self.__max_itrs_without_update = 3 + self.__num_inits_increase_order = 10 + self.__init_type_increase_order = 'K-MEANS++' + self.__max_itrs_increase_order = 10 + self.__print_to_stdout = 2 + self.__label_names = {} + + + def __construct_initial_medians(self, graph_ids, timer, initial_medians): + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('\n===========================================================') + print('Constructing initial median(s).') + print('-----------------------------------------------------------') + + # Compute or sample the initial median(s). + initial_medians.clear() + if self.__init_type == 'MEDOID': + self.__compute_medoid(graph_ids, timer, initial_medians) + elif self.__init_type == 'MAX': + pass # @todo +# compute_max_order_graph_(graph_ids, initial_medians) + elif self.__init_type == 'MIN': + pass # @todo +# compute_min_order_graph_(graph_ids, initial_medians) + elif self.__init_type == 'MEAN': + pass # @todo +# compute_mean_order_graph_(graph_ids, initial_medians) + else: + pass # @todo +# sample_initial_medians_(graph_ids, initial_medians) + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('===========================================================') + + + def __compute_medoid(self, graph_ids, timer, initial_medians): + # Use method selected for initialization phase. + self.__ged_env.set_method(self.__init_method, self.__init_options) + + # Compute the medoid. + if self.__parallel: + # @todo: notice when parallel self.__ged_env is not modified. + sum_of_distances_list = [np.inf] * len(graph_ids) + len_itr = len(graph_ids) + itr = zip(graph_ids, range(0, len(graph_ids))) + n_jobs = multiprocessing.cpu_count() + if len_itr < 100 * n_jobs: + chunksize = int(len_itr / n_jobs) + 1 + else: + chunksize = 100 + def init_worker(ged_env_toshare): + global G_ged_env + G_ged_env = ged_env_toshare + do_fun = partial(_compute_medoid_parallel, graph_ids, self.__sort_graphs) + pool = Pool(processes=n_jobs, initializer=init_worker, initargs=(self.__ged_env,)) + if self.__print_to_stdout == 2: + iterator = tqdm(pool.imap_unordered(do_fun, itr, chunksize), + desc='Computing medoid', file=sys.stdout) + else: + iterator = pool.imap_unordered(do_fun, itr, chunksize) + for i, dis in iterator: + sum_of_distances_list[i] = dis + pool.close() + pool.join() + + medoid_id = np.argmin(sum_of_distances_list) + best_sum_of_distances = sum_of_distances_list[medoid_id] + + initial_medians.append(self.__ged_env.get_nx_graph(medoid_id)) # @todo + + else: + # Print information about current iteration. 
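+			# Serial fallback: the medoid is the input graph minimizing the sum of
+			# GED upper bounds to all other graphs. With sort-graphs enabled, GED is
+			# always run from the smaller graph to the larger one.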
+ if self.__print_to_stdout == 2: + progress = tqdm(desc='Computing medoid', total=len(graph_ids), file=sys.stdout) + + medoid_id = graph_ids[0] + best_sum_of_distances = np.inf + for g_id in graph_ids: + if timer.expired(): + self.__state = AlgorithmState.CALLED + break + nb_nodes_g = self.__ged_env.get_graph_num_nodes(g_id) + sum_of_distances = 0 + for h_id in graph_ids: # @todo: this can be faster, only a half is needed. + nb_nodes_h = self.__ged_env.get_graph_num_nodes(h_id) + if nb_nodes_g <= nb_nodes_h or not self.__sort_graphs: + self.__ged_env.run_method(g_id, h_id) # @todo + sum_of_distances += self.__ged_env.get_upper_bound(g_id, h_id) + else: + self.__ged_env.run_method(h_id, g_id) + sum_of_distances += self.__ged_env.get_upper_bound(h_id, g_id) + if sum_of_distances < best_sum_of_distances: + best_sum_of_distances = sum_of_distances + medoid_id = g_id + + # Print information about current iteration. + if self.__print_to_stdout == 2: + progress.update(1) + + initial_medians.append(self.__ged_env.get_nx_graph(medoid_id)) # @todo + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('\n') + + + def __compute_init_node_maps(self, graph_ids, gen_median_id): + # Compute node maps and sum of distances for initial median. + if self.__parallel: + # @todo: notice when parallel self.__ged_env is not modified. + self.__sum_of_distances = 0 + self.__node_maps_from_median.clear() + sum_of_distances_list = [0] * len(graph_ids) + + len_itr = len(graph_ids) + itr = graph_ids + n_jobs = multiprocessing.cpu_count() + if len_itr < 100 * n_jobs: + chunksize = int(len_itr / n_jobs) + 1 + else: + chunksize = 100 + def init_worker(ged_env_toshare): + global G_ged_env + G_ged_env = ged_env_toshare + nb_nodes_median = self.__ged_env.get_graph_num_nodes(gen_median_id) + do_fun = partial(_compute_init_node_maps_parallel, gen_median_id, self.__sort_graphs, nb_nodes_median) + pool = Pool(processes=n_jobs, initializer=init_worker, initargs=(self.__ged_env,)) + if self.__print_to_stdout == 2: + iterator = tqdm(pool.imap_unordered(do_fun, itr, chunksize), + desc='Computing initial node maps', file=sys.stdout) + else: + iterator = pool.imap_unordered(do_fun, itr, chunksize) + for g_id, sod, node_maps in iterator: + sum_of_distances_list[g_id] = sod + self.__node_maps_from_median[g_id] = node_maps + pool.close() + pool.join() + + self.__sum_of_distances = np.sum(sum_of_distances_list) +# xxx = self.__node_maps_from_median + + else: + # Print information about current iteration. 
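+			# Serial fallback: compute one node map per input graph, running GED from
+			# the median to the graph, or in the reverse direction (and transposing
+			# the resulting node map) if the graph is smaller and sorting is enabled.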
+ if self.__print_to_stdout == 2: + progress = tqdm(desc='Computing initial node maps', total=len(graph_ids), file=sys.stdout) + + self.__sum_of_distances = 0 + self.__node_maps_from_median.clear() + nb_nodes_median = self.__ged_env.get_graph_num_nodes(gen_median_id) + for graph_id in graph_ids: + nb_nodes_g = self.__ged_env.get_graph_num_nodes(graph_id) + if nb_nodes_median <= nb_nodes_g or not self.__sort_graphs: + self.__ged_env.run_method(gen_median_id, graph_id) + self.__node_maps_from_median[graph_id] = self.__ged_env.get_node_map(gen_median_id, graph_id) + else: + self.__ged_env.run_method(graph_id, gen_median_id) + node_map_tmp = self.__ged_env.get_node_map(graph_id, gen_median_id) + node_map_tmp.forward_map, node_map_tmp.backward_map = node_map_tmp.backward_map, node_map_tmp.forward_map + self.__node_maps_from_median[graph_id] = node_map_tmp + # print(self.__node_maps_from_median[graph_id]) + self.__sum_of_distances += self.__node_maps_from_median[graph_id].induced_cost() + # print(self.__sum_of_distances) + # Print information about current iteration. + if self.__print_to_stdout == 2: + progress.update(1) + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('\n') + + + def __termination_criterion_met(self, converged, timer, itr, itrs_without_update): + if timer.expired() or (itr >= self.__max_itrs if self.__max_itrs >= 0 else False): + if self.__state == AlgorithmState.TERMINATED: + self.__state = AlgorithmState.INITIALIZED + return True + return converged or (itrs_without_update > self.__max_itrs_without_update if self.__max_itrs_without_update >= 0 else False) + + + def __update_median(self, graphs, median): + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('Updating median: ', end='') + + # Store copy of the old median. + old_median = median.copy() # @todo: this is just a shallow copy. + + # Update the node labels. + if self.__labeled_nodes: + self.__update_node_labels(graphs, median) + + # Update the edges and their labels. + self.__update_edges(graphs, median) + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('done.') + + return not self.__are_graphs_equal(median, old_median) + + + def __update_node_labels(self, graphs, median): +# print('----------------------------') + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('nodes ... ', end='') + + # Collect all possible node labels. + all_labels = self.__ged_env.get_all_node_labels() + + # Iterate through all nodes of the median. + for i in range(0, nx.number_of_nodes(median)): +# print('i: ', i) + + # Collect the labels of the substituted nodes. + node_labels = [] + for graph_id, graph in graphs.items(): + k = self.__node_maps_from_median[graph_id].image(i) + if k != np.inf: + node_labels.append(tuple(graph.nodes[k].items())) # @todo: sort + else: + node_labels.append(SpecialLabel.DUMMY) + + # Compute the median label and update the median. + if len(node_labels) > 0: + fi_min = np.inf + median_label = tuple() + + for label1 in all_labels: + fi = 0 + for label2 in node_labels: + fi += self.__ged_env.get_node_cost(label1, label2) # @todo: check inside, this might be slow + if fi < fi_min: # @todo: fi is too easy to be zero. use <= or consider multiple optimal labels. 
+ fi_min = fi + median_label = label1 + + median_label = {kv[0]: kv[1] for kv in median_label} + nx.set_node_attributes(median, {i: median_label}) + +# median_label = self.__get_median_node_label(node_labels) +# if self.__ged_env.get_node_rel_cost(median.nodes[i], median_label) > self.__epsilon: +# nx.set_node_attributes(median, {i: median_label}) + + + def __update_edges(self, graphs, median): + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('edges ... ', end='') + + # Collect all possible edge labels. + all_labels = self.__ged_env.get_all_edge_labels() + + # @todo: what if edge is not labeled? + # Iterate through all possible edges (i,j) of the median. + for i in range(0, nx.number_of_nodes(median)): + for j in range(i + 1, nx.number_of_nodes(median)): + + # Collect the labels of the edges to which (i,j) is mapped by the node maps. + edge_labels = [] + for graph_id, graph in graphs.items(): + k = self.__node_maps_from_median[graph_id].image(i) + l = self.__node_maps_from_median[graph_id].image(j) + if k != np.inf and l != np.inf and graph.has_edge(k, l): + edge_labels.append(tuple(graph.edges[(k, l)].items())) # @todo: sort + else: + edge_labels.append(SpecialLabel.DUMMY) + + # Compute the median edge label and the overall edge relabeling cost. + if self.__labeled_edges and len(edge_labels) > 0: + fij1_min = np.inf + median_label = tuple() + + # Compute f_ij^0. + fij0 = 0 + for label2 in edge_labels: + fij0 += self.__ged_env.get_edge_cost(SpecialLabel.DUMMY, label2) + + for label1 in all_labels: + fij1 = 0 + for label2 in edge_labels: + fij1 += self.__ged_env.get_edge_cost(label1, label2) + + if fij1 < fij1_min: + fij1_min = fij1 + median_label = label1 + + # Update the median. + if median.has_edge(i, j): + median.remove_edge(i, j) + if fij1_min < fij0: # @todo: this never happens. + median_label = {kv[0]: kv[1] for kv in median_label} + median.add_edge(i, j, **median_label) + +# if self.__ged_env.get_edge_rel_cost(median_label, new_median_label) > self.__epsilon: +# median_label = new_median_label + + + def __update_node_maps(self): + # Update the node maps. + if self.__parallel: + # @todo: notice when parallel self.__ged_env is not modified. + node_maps_were_modified = False +# xxx = self.__node_maps_from_median.copy() + + len_itr = len(self.__node_maps_from_median) + itr = [item for item in self.__node_maps_from_median.items()] + n_jobs = multiprocessing.cpu_count() + if len_itr < 100 * n_jobs: + chunksize = int(len_itr / n_jobs) + 1 + else: + chunksize = 100 + def init_worker(ged_env_toshare): + global G_ged_env + G_ged_env = ged_env_toshare + nb_nodes_median = self.__ged_env.get_graph_num_nodes(self.__median_id) + do_fun = partial(_update_node_maps_parallel, self.__median_id, self.__epsilon, self.__sort_graphs, nb_nodes_median) + pool = Pool(processes=n_jobs, initializer=init_worker, initargs=(self.__ged_env,)) + if self.__print_to_stdout == 2: + iterator = tqdm(pool.imap_unordered(do_fun, itr, chunksize), + desc='Updating node maps', file=sys.stdout) + else: + iterator = pool.imap_unordered(do_fun, itr, chunksize) + for g_id, node_map, nm_modified in iterator: + self.__node_maps_from_median[g_id] = node_map + if nm_modified: + node_maps_were_modified = True + pool.close() + pool.join() +# yyy = self.__node_maps_from_median.copy() + + else: + # Print information about current iteration. 
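+			# Node-map update is the second block of the descent: for each input
+			# graph, a recomputed node map is accepted only if it lowers the induced
+			# cost by more than epsilon.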
+ if self.__print_to_stdout == 2: + progress = tqdm(desc='Updating node maps', total=len(self.__node_maps_from_median), file=sys.stdout) + + node_maps_were_modified = False + nb_nodes_median = self.__ged_env.get_graph_num_nodes(self.__median_id) + for graph_id, node_map in self.__node_maps_from_median.items(): + nb_nodes_g = self.__ged_env.get_graph_num_nodes(graph_id) + + if nb_nodes_median <= nb_nodes_g or not self.__sort_graphs: + self.__ged_env.run_method(self.__median_id, graph_id) + if self.__ged_env.get_upper_bound(self.__median_id, graph_id) < node_map.induced_cost() - self.__epsilon: + # xxx = self.__node_maps_from_median[graph_id] + self.__node_maps_from_median[graph_id] = self.__ged_env.get_node_map(self.__median_id, graph_id) + node_maps_were_modified = True + + else: + self.__ged_env.run_method(graph_id, self.__median_id) + if self.__ged_env.get_upper_bound(graph_id, self.__median_id) < node_map.induced_cost() - self.__epsilon: + node_map_tmp = self.__ged_env.get_node_map(graph_id, self.__median_id) + node_map_tmp.forward_map, node_map_tmp.backward_map = node_map_tmp.backward_map, node_map_tmp.forward_map + self.__node_maps_from_median[graph_id] = node_map_tmp + node_maps_were_modified = True + + # Print information about current iteration. + if self.__print_to_stdout == 2: + progress.update(1) + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('\n') + + # Return true if the node maps were modified. + return node_maps_were_modified + + + def __decrease_order(self, graphs, median): + # Print information about current iteration + if self.__print_to_stdout == 2: + print('Trying to decrease order: ... ', end='') + + if nx.number_of_nodes(median) <= 1: + if self.__print_to_stdout == 2: + print('median graph has only 1 node, skip decrease.') + return False + + # Initialize ID of the node that is to be deleted. + id_deleted_node = [None] # @todo: or np.inf + decreased_order = False + + # Decrease the order as long as the best deletion delta is negative. + while self.__compute_best_deletion_delta(graphs, median, id_deleted_node) < -self.__epsilon: + decreased_order = True + self.__delete_node_from_median(id_deleted_node[0], median) + if nx.number_of_nodes(median) <= 1: + if self.__print_to_stdout == 2: + print('decrease stopped because median graph remains only 1 node. ', end='') + break + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('done.') + + # Return true iff the order was decreased. + return decreased_order + + + def __compute_best_deletion_delta(self, graphs, median, id_deleted_node): + best_delta = 0.0 + + # Determine node that should be deleted (if any). + for i in range(0, nx.number_of_nodes(median)): + # Compute cost delta. + delta = 0.0 + for graph_id, graph in graphs.items(): + k = self.__node_maps_from_median[graph_id].image(i) + if k == np.inf: + delta -= self.__node_del_cost + else: + delta += self.__node_ins_cost - self.__ged_env.get_node_rel_cost(median.nodes[i], graph.nodes[k]) + for j, j_label in median[i].items(): + l = self.__node_maps_from_median[graph_id].image(j) + if k == np.inf or l == np.inf: + delta -= self.__edge_del_cost + elif not graph.has_edge(k, l): + delta -= self.__edge_del_cost + else: + delta += self.__edge_ins_cost - self.__ged_env.get_edge_rel_cost(j_label, graph.edges[(k, l)]) + + # Update best deletion delta. 
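+				# A negative delta means that deleting node i would lower the SOD;
+				# epsilon guards the comparison against numerical noise.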
+ if delta < best_delta - self.__epsilon: + best_delta = delta + id_deleted_node[0] = i +# id_deleted_node[0] = 3 # @todo: + + return best_delta + + + def __delete_node_from_median(self, id_deleted_node, median): + # Update the median. + mapping = {} + for i in range(0, nx.number_of_nodes(median)): + if i != id_deleted_node: + new_i = (i if i < id_deleted_node else (i - 1)) + mapping[i] = new_i + median.remove_node(id_deleted_node) + nx.relabel_nodes(median, mapping, copy=False) + + # Update the node maps. +# xxx = self.__node_maps_from_median + for key, node_map in self.__node_maps_from_median.items(): + new_node_map = NodeMap(nx.number_of_nodes(median), node_map.num_target_nodes()) + is_unassigned_target_node = [True] * node_map.num_target_nodes() + for i in range(0, nx.number_of_nodes(median) + 1): + if i != id_deleted_node: + new_i = (i if i < id_deleted_node else (i - 1)) + k = node_map.image(i) + new_node_map.add_assignment(new_i, k) + if k != np.inf: + is_unassigned_target_node[k] = False + for k in range(0, node_map.num_target_nodes()): + if is_unassigned_target_node[k]: + new_node_map.add_assignment(np.inf, k) +# print(self.__node_maps_from_median[key].forward_map, self.__node_maps_from_median[key].backward_map) +# print(new_node_map.forward_map, new_node_map.backward_map + self.__node_maps_from_median[key] = new_node_map + + # Increase overall number of decreases. + self.__num_decrease_order += 1 + + + def __increase_order(self, graphs, median): + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('Trying to increase order: ... ', end='') + + # Initialize the best configuration and the best label of the node that is to be inserted. + best_config = {} + best_label = self.__ged_env.get_node_label(1, to_dict=True) + increased_order = False + + # Increase the order as long as the best insertion delta is negative. + while self.__compute_best_insertion_delta(graphs, best_config, best_label) < - self.__epsilon: + increased_order = True + self.__add_node_to_median(best_config, best_label, median) + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('done.') + + # Return true iff the order was increased. + return increased_order + + + def __compute_best_insertion_delta(self, graphs, best_config, best_label): + # Construct sets of inserted nodes. + no_inserted_node = True + inserted_nodes = {} + for graph_id, graph in graphs.items(): + inserted_nodes[graph_id] = [] + best_config[graph_id] = np.inf + for k in range(nx.number_of_nodes(graph)): + if self.__node_maps_from_median[graph_id].pre_image(k) == np.inf: + no_inserted_node = False + inserted_nodes[graph_id].append((k, tuple(item for item in graph.nodes[k].items()))) # @todo: can order of label names be garantteed? + + # Return 0.0 if no node is inserted in any of the graphs. + if no_inserted_node: + return 0.0 + + # Compute insertion configuration, label, and delta. + best_delta = 0.0 # @todo + if len(self.__label_names['node_labels']) == 0 and len(self.__label_names['node_attrs']) == 0: # @todo + best_delta = self.__compute_insertion_delta_unlabeled(inserted_nodes, best_config, best_label) + elif len(self.__label_names['node_labels']) > 0: # self.__constant_node_costs: + best_delta = self.__compute_insertion_delta_constant(inserted_nodes, best_config, best_label) + else: + best_delta = self.__compute_insertion_delta_generic(inserted_nodes, best_config, best_label) + + # Return the best delta. 
+ return best_delta + + + def __compute_insertion_delta_unlabeled(self, inserted_nodes, best_config, best_label): # @todo: go through and test. + # Construct the nest configuration and compute its insertion delta. + best_delta = 0.0 + best_config.clear() + for graph_id, node_set in inserted_nodes.items(): + if len(node_set) == 0: + best_config[graph_id] = np.inf + best_delta += self.__node_del_cost + else: + best_config[graph_id] = node_set[0][0] + best_delta -= self.__node_ins_cost + + # Return the best insertion delta. + return best_delta + + + def __compute_insertion_delta_constant(self, inserted_nodes, best_config, best_label): + # Construct histogram and inverse label maps. + hist = {} + inverse_label_maps = {} + for graph_id, node_set in inserted_nodes.items(): + inverse_label_maps[graph_id] = {} + for node in node_set: + k = node[0] + label = node[1] + if label not in inverse_label_maps[graph_id]: + inverse_label_maps[graph_id][label] = k + if label not in hist: + hist[label] = 1 + else: + hist[label] += 1 + + # Determine the best label. + best_count = 0 + for key, val in hist.items(): + if val > best_count: + best_count = val + best_label_tuple = key + + # get best label. + best_label.clear() + for key, val in best_label_tuple: + best_label[key] = val + + # Construct the best configuration and compute its insertion delta. + best_config.clear() + best_delta = 0.0 + node_rel_cost = self.__ged_env.get_node_rel_cost(self.__ged_env.get_node_label(1, to_dict=False), self.__ged_env.get_node_label(2, to_dict=False)) + triangle_ineq_holds = (node_rel_cost <= self.__node_del_cost + self.__node_ins_cost) + for graph_id, _ in inserted_nodes.items(): + if best_label_tuple in inverse_label_maps[graph_id]: + best_config[graph_id] = inverse_label_maps[graph_id][best_label_tuple] + best_delta -= self.__node_ins_cost + elif triangle_ineq_holds and not len(inserted_nodes[graph_id]) == 0: + best_config[graph_id] = inserted_nodes[graph_id][0][0] + best_delta += node_rel_cost - self.__node_ins_cost + else: + best_config[graph_id] = np.inf + best_delta += self.__node_del_cost + + # Return the best insertion delta. + return best_delta + + + def __compute_insertion_delta_generic(self, inserted_nodes, best_config, best_label): + # Collect all node labels of inserted nodes. + node_labels = [] + for _, node_set in inserted_nodes.items(): + for node in node_set: + node_labels.append(node[1]) + + # Compute node label medians that serve as initial solutions for block gradient descent. + initial_node_labels = [] + self.__compute_initial_node_labels(node_labels, initial_node_labels) + + # Determine best insertion configuration, label, and delta via parallel block gradient descent from all initial node labels. + best_delta = 0.0 + for node_label in initial_node_labels: + # Construct local configuration. + config = {} + for graph_id, _ in inserted_nodes.items(): + config[graph_id] = tuple((np.inf, self.__ged_env.get_node_label(1, to_dict=False))) + + # Run block gradient descent. + converged = False + itr = 0 + while not self.__insertion_termination_criterion_met(converged, itr): + converged = not self.__update_config(node_label, inserted_nodes, config, node_labels) + node_label_dict = dict(node_label) + converged = converged and (not self.__update_node_label([dict(item) for item in node_labels], node_label_dict)) # @todo: the dict is tupled again in the function, can be better. + node_label = tuple(item for item in node_label_dict.items()) # @todo: watch out: initial_node_labels[i] is not modified here. 
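+				# Each descent iteration alternates two blocks: __update_config() picks
+				# the best assignment per graph for the current label, and
+				# __update_node_label() recomputes the median label for those assignments.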
+
+				itr += 1
+
+			# Compute insertion delta of converged solution.
+			delta = 0.0
+			for _, node in config.items():
+				if node[0] == np.inf:
+					delta += self.__node_del_cost
+				else:
+					delta += self.__ged_env.get_node_rel_cost(dict(node_label), dict(node[1])) - self.__node_ins_cost
+
+			# Update best delta and global configuration if improvement has been found.
+			if delta < best_delta - self.__epsilon:
+				best_delta = delta
+				best_label.clear()
+				for key, val in node_label:
+					best_label[key] = val
+				best_config.clear()
+				for graph_id, val in config.items():
+					best_config[graph_id] = val[0]
+
+		# Return the best delta.
+		return best_delta
+
+
+	def __compute_initial_node_labels(self, node_labels, median_labels):
+		median_labels.clear()
+		if self.__use_real_randomness: # @todo: may not work if parallelized.
+			rng = np.random.randint(0, high=2**32 - 1, size=1)
+			urng = np.random.RandomState(seed=rng[0])
+		else:
+			urng = np.random.RandomState(seed=self.__seed)
+
+		# Generate the initial node label medians.
+		if self.__init_type_increase_order == 'K-MEANS++':
+			# Use the k-means++ heuristic to generate the initial node label medians.
+			already_selected = [False] * len(node_labels)
+			selected_label_id = urng.randint(low=0, high=len(node_labels), size=1)[0] # c++ test: 23
+			median_labels.append(node_labels[selected_label_id])
+			already_selected[selected_label_id] = True
+#			xxx = [41, 0, 18, 9, 6, 14, 21, 25, 33] for c++ test
+#			iii = 0 for c++ test
+			while len(median_labels) < self.__num_inits_increase_order:
+				weights = [np.inf] * len(node_labels)
+				for label_id in range(0, len(node_labels)):
+					if already_selected[label_id]:
+						weights[label_id] = 0
+						continue
+					for label in median_labels:
+						weights[label_id] = min(weights[label_id], self.__ged_env.get_node_rel_cost(dict(label), dict(node_labels[label_id])))
+
+				# Get the non-zero weights.
+				weights_p, idx_p = [], []
+				for i, w in enumerate(weights):
+					if w != 0:
+						weights_p.append(w)
+						idx_p.append(i)
+				if len(weights_p) > 0:
+					p = np.array(weights_p) / np.sum(weights_p)
+					selected_label_id = urng.choice(range(0, len(weights_p)), size=1, p=p)[0] # for c++ test: xxx[iii]
+					selected_label_id = idx_p[selected_label_id]
+#					iii += 1 for c++ test
+					median_labels.append(node_labels[selected_label_id])
+					already_selected[selected_label_id] = True
+				else: # Skip the loop when all node_labels are selected. This happens when len(node_labels) <= self.__num_inits_increase_order.
+					break
+		else:
+			# Compute the initial node medians as the medians of randomly generated clusters of (roughly) equal size.
+			# @todo: go through and test.
+			# Shuffle a copy of the labels with the chosen RNG (the counterpart of
+			# std::shuffle in the C++ original).
+			shuffled_node_labels = list(node_labels)
+			urng.shuffle(shuffled_node_labels)
+			cluster_size = len(node_labels) / self.__num_inits_increase_order
+			pos = 0
+			cluster = []
+			while len(median_labels) < self.__num_inits_increase_order - 1:
+				while pos < (len(median_labels) + 1) * cluster_size:
+					cluster.append(shuffled_node_labels[pos])
+					pos += 1
+				median_labels.append(self.__get_median_node_label(cluster))
+				cluster.clear()
+			while pos < len(shuffled_node_labels):
+				cluster.append(shuffled_node_labels[pos])
+				pos += 1
+			median_labels.append(self.__get_median_node_label(cluster))
+			cluster.clear()
+
+		# Run Lloyd's Algorithm.
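+		# Lloyd's algorithm alternates assigning every node label to its closest
+		# median label (__update_clusters) and recomputing the median label of each
+		# cluster (__update_node_label) until the clusters stabilize.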
+ converged = False + closest_median_ids = [np.inf] * len(node_labels) + clusters = [[] for _ in range(len(median_labels))] + itr = 1 + while not self.__insertion_termination_criterion_met(converged, itr): + converged = not self.__update_clusters(node_labels, median_labels, closest_median_ids) + if not converged: + for cluster in clusters: + cluster.clear() + for label_id in range(0, len(node_labels)): + clusters[closest_median_ids[label_id]].append(node_labels[label_id]) + for cluster_id in range(0, len(clusters)): + node_label = dict(median_labels[cluster_id]) + self.__update_node_label([dict(item) for item in clusters[cluster_id]], node_label) # @todo: the dict is tupled again in the function, can be better. + median_labels[cluster_id] = tuple(item for item in node_label.items()) + itr += 1 + + + def __insertion_termination_criterion_met(self, converged, itr): + return converged or (itr >= self.__max_itrs_increase_order if self.__max_itrs_increase_order > 0 else False) + + + def __update_config(self, node_label, inserted_nodes, config, node_labels): + # Determine the best configuration. + config_modified = False + for graph_id, node_set in inserted_nodes.items(): + best_assignment = config[graph_id] + best_cost = 0.0 + if best_assignment[0] == np.inf: + best_cost = self.__node_del_cost + else: + best_cost = self.__ged_env.get_node_rel_cost(dict(node_label), dict(best_assignment[1])) - self.__node_ins_cost + for node in node_set: + cost = self.__ged_env.get_node_rel_cost(dict(node_label), dict(node[1])) - self.__node_ins_cost + if cost < best_cost - self.__epsilon: + best_cost = cost + best_assignment = node + config_modified = True + if self.__node_del_cost < best_cost - self.__epsilon: + best_cost = self.__node_del_cost + best_assignment = tuple((np.inf, best_assignment[1])) + config_modified = True + config[graph_id] = best_assignment + + # Collect the node labels contained in the best configuration. + node_labels.clear() + for key, val in config.items(): + if val[0] != np.inf: + node_labels.append(val[1]) + + # Return true if the configuration was modified. + return config_modified + + + def __update_node_label(self, node_labels, node_label): + if len(node_labels) == 0: # @todo: check if this is the correct solution. Especially after calling __update_config(). + return False + new_node_label = self.__get_median_node_label(node_labels) + if self.__ged_env.get_node_rel_cost(new_node_label, node_label) > self.__epsilon: + node_label.clear() + for key, val in new_node_label.items(): + node_label[key] = val + return True + return False + + + def __update_clusters(self, node_labels, median_labels, closest_median_ids): + # Determine the closest median for each node label. + clusters_modified = False + for label_id in range(0, len(node_labels)): + closest_median_id = np.inf + dist_to_closest_median = np.inf + for median_id in range(0, len(median_labels)): + dist_to_median = self.__ged_env.get_node_rel_cost(dict(median_labels[median_id]), dict(node_labels[label_id])) + if dist_to_median < dist_to_closest_median - self.__epsilon: + dist_to_closest_median = dist_to_median + closest_median_id = median_id + if closest_median_id != closest_median_ids[label_id]: + closest_median_ids[label_id] = closest_median_id + clusters_modified = True + + # Return true if the clusters were modified. + return clusters_modified + + + def __add_node_to_median(self, best_config, best_label, median): + # Update the median. 
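+		# The inserted node gets the next free index in the median; every node map is
+		# then extended by an assignment from the new node to the configured node in
+		# each input graph (np.inf encodes the dummy node, i.e. an insertion).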
+ nb_nodes_median = nx.number_of_nodes(median) + median.add_node(nb_nodes_median, **best_label) + + # Update the node maps. + for graph_id, node_map in self.__node_maps_from_median.items(): + node_map_as_rel = [] + node_map.as_relation(node_map_as_rel) + new_node_map = NodeMap(nx.number_of_nodes(median), node_map.num_target_nodes()) + for assignment in node_map_as_rel: + new_node_map.add_assignment(assignment[0], assignment[1]) + new_node_map.add_assignment(nx.number_of_nodes(median) - 1, best_config[graph_id]) + self.__node_maps_from_median[graph_id] = new_node_map + + # Increase overall number of increases. + self.__num_increase_order += 1 + + + def __are_graphs_equal(self, g1, g2): + """ + Check if the two graphs are equal. + + Parameters + ---------- + g1 : NetworkX graph object + Graph 1 to be compared. + + g2 : NetworkX graph object + Graph 2 to be compared. + + Returns + ------- + bool + True if the two graph are equal. + + Notes + ----- + This is not an identical check. Here the two graphs are equal if and only if their original_node_ids, nodes, all node labels, edges and all edge labels are equal. This function is specifically designed for class `MedianGraphEstimator` and should not be used elsewhere. + """ + # check original node ids. + if not g1.graph['original_node_ids'] == g2.graph['original_node_ids']: + return False # @todo: why check this? + # check nodes. + nlist1 = [n for n in g1.nodes(data=True)] # @todo: shallow? + nlist2 = [n for n in g2.nodes(data=True)] + if not nlist1 == nlist2: + return False + # check edges. + elist1 = [n for n in g1.edges(data=True)] + elist2 = [n for n in g2.edges(data=True)] + if not elist1 == elist2: + return False + + return True + + + def compute_my_cost(g, h, node_map): + cost = 0.0 + for node in g.nodes: + cost += 0 + + + def set_label_names(self, node_labels=[], edge_labels=[], node_attrs=[], edge_attrs=[]): + self.__label_names = {'node_labels': node_labels, 'edge_labels': edge_labels, + 'node_attrs': node_attrs, 'edge_attrs': edge_attrs} + + +# def __get_median_node_label(self, node_labels): +# if len(self.__label_names['node_labels']) > 0: +# return self.__get_median_label_symbolic(node_labels) +# elif len(self.__label_names['node_attrs']) > 0: +# return self.__get_median_label_nonsymbolic(node_labels) +# else: +# raise Exception('Node label names are not given.') +# +# +# def __get_median_edge_label(self, edge_labels): +# if len(self.__label_names['edge_labels']) > 0: +# return self.__get_median_label_symbolic(edge_labels) +# elif len(self.__label_names['edge_attrs']) > 0: +# return self.__get_median_label_nonsymbolic(edge_labels) +# else: +# raise Exception('Edge label names are not given.') +# +# +# def __get_median_label_symbolic(self, labels): +# f_i = np.inf +# +# for label in labels: +# pass +# +# # Construct histogram. +# hist = {} +# for label in labels: +# label = tuple([kv for kv in label.items()]) # @todo: this may be slow. +# if label not in hist: +# hist[label] = 1 +# else: +# hist[label] += 1 +# +# # Return the label that appears most frequently. +# best_count = 0 +# median_label = {} +# for label, count in hist.items(): +# if count > best_count: +# best_count = count +# median_label = {kv[0]: kv[1] for kv in label} +# +# return median_label +# +# +# def __get_median_label_nonsymbolic(self, labels): +# if len(labels) == 0: +# return {} # @todo +# else: +# # Transform the labels into coordinates and compute mean label as initial solution. 
+# labels_as_coords = [] +# sums = {} +# for key, val in labels[0].items(): +# sums[key] = 0 +# for label in labels: +# coords = {} +# for key, val in label.items(): +# label_f = float(val) +# sums[key] += label_f +# coords[key] = label_f +# labels_as_coords.append(coords) +# median = {} +# for key, val in sums.items(): +# median[key] = val / len(labels) +# +# # Run main loop of Weiszfeld's Algorithm. +# epsilon = 0.0001 +# delta = 1.0 +# num_itrs = 0 +# all_equal = False +# while ((delta > epsilon) and (num_itrs < 100) and (not all_equal)): +# numerator = {} +# for key, val in sums.items(): +# numerator[key] = 0 +# denominator = 0 +# for label_as_coord in labels_as_coords: +# norm = 0 +# for key, val in label_as_coord.items(): +# norm += (val - median[key]) ** 2 +# norm = np.sqrt(norm) +# if norm > 0: +# for key, val in label_as_coord.items(): +# numerator[key] += val / norm +# denominator += 1.0 / norm +# if denominator == 0: +# all_equal = True +# else: +# new_median = {} +# delta = 0.0 +# for key, val in numerator.items(): +# this_median = val / denominator +# new_median[key] = this_median +# delta += np.abs(median[key] - this_median) +# median = new_median +# +# num_itrs += 1 +# +# # Transform the solution to strings and return it. +# median_label = {} +# for key, val in median.items(): +# median_label[key] = str(val) +# return median_label + + +def _compute_medoid_parallel(graph_ids, sort, itr): + g_id = itr[0] + i = itr[1] + # @todo: timer not considered here. +# if timer.expired(): +# self.__state = AlgorithmState.CALLED +# break + nb_nodes_g = G_ged_env.get_graph_num_nodes(g_id) + sum_of_distances = 0 + for h_id in graph_ids: + nb_nodes_h = G_ged_env.get_graph_num_nodes(h_id) + if nb_nodes_g <= nb_nodes_h or not sort: + G_ged_env.run_method(g_id, h_id) + sum_of_distances += G_ged_env.get_upper_bound(g_id, h_id) + else: + G_ged_env.run_method(h_id, g_id) + sum_of_distances += G_ged_env.get_upper_bound(h_id, g_id) + return i, sum_of_distances + + +def _compute_init_node_maps_parallel(gen_median_id, sort, nb_nodes_median, itr): + graph_id = itr + nb_nodes_g = G_ged_env.get_graph_num_nodes(graph_id) + if nb_nodes_median <= nb_nodes_g or not sort: + G_ged_env.run_method(gen_median_id, graph_id) + node_map = G_ged_env.get_node_map(gen_median_id, graph_id) +# print(self.__node_maps_from_median[graph_id]) + else: + G_ged_env.run_method(graph_id, gen_median_id) + node_map = G_ged_env.get_node_map(graph_id, gen_median_id) + node_map.forward_map, node_map.backward_map = node_map.backward_map, node_map.forward_map + sum_of_distance = node_map.induced_cost() +# print(self.__sum_of_distances) + return graph_id, sum_of_distance, node_map + + +def _update_node_maps_parallel(median_id, epsilon, sort, nb_nodes_median, itr): + graph_id = itr[0] + node_map = itr[1] + + node_maps_were_modified = False + nb_nodes_g = G_ged_env.get_graph_num_nodes(graph_id) + if nb_nodes_median <= nb_nodes_g or not sort: + G_ged_env.run_method(median_id, graph_id) + if G_ged_env.get_upper_bound(median_id, graph_id) < node_map.induced_cost() - epsilon: + node_map = G_ged_env.get_node_map(median_id, graph_id) + node_maps_were_modified = True + else: + G_ged_env.run_method(graph_id, median_id) + if G_ged_env.get_upper_bound(graph_id, median_id) < node_map.induced_cost() - epsilon: + node_map = G_ged_env.get_node_map(graph_id, median_id) + node_map.forward_map, node_map.backward_map = node_map.backward_map, node_map.forward_map + node_maps_were_modified = True + + return graph_id, node_map, node_maps_were_modified \ No 
newline at end of file diff --git a/lang/fr/gklearn/ged/median/median_graph_estimator_py.py b/lang/fr/gklearn/ged/median/median_graph_estimator_py.py new file mode 100644 index 0000000000..41dc3c91e3 --- /dev/null +++ b/lang/fr/gklearn/ged/median/median_graph_estimator_py.py @@ -0,0 +1,1711 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Mon Mar 16 18:04:55 2020 + +@author: ljia +""" +import numpy as np +from gklearn.ged.env import AlgorithmState, NodeMap +from gklearn.ged.util import misc +from gklearn.utils import Timer +import time +from tqdm import tqdm +import sys +import networkx as nx +import multiprocessing +from multiprocessing import Pool +from functools import partial + + +class MedianGraphEstimatorPy(object): # @todo: differ dummy_node from undifined node? + """Estimate median graphs using the pure Python version of GEDEnv. + """ + + def __init__(self, ged_env, constant_node_costs): + """Constructor. + + Parameters + ---------- + ged_env : gklearn.gedlib.gedlibpy.GEDEnv + Initialized GED environment. The edit costs must be set by the user. + + constant_node_costs : Boolean + Set to True if the node relabeling costs are constant. + """ + self.__ged_env = ged_env + self.__init_method = 'BRANCH_FAST' + self.__init_options = '' + self.__descent_method = 'BRANCH_FAST' + self.__descent_options = '' + self.__refine_method = 'IPFP' + self.__refine_options = '' + self.__constant_node_costs = constant_node_costs + self.__labeled_nodes = (ged_env.get_num_node_labels() > 1) + self.__node_del_cost = ged_env.get_node_del_cost(ged_env.get_node_label(1, to_dict=False)) + self.__node_ins_cost = ged_env.get_node_ins_cost(ged_env.get_node_label(1, to_dict=False)) + self.__labeled_edges = (ged_env.get_num_edge_labels() > 1) + self.__edge_del_cost = ged_env.get_edge_del_cost(ged_env.get_edge_label(1, to_dict=False)) + self.__edge_ins_cost = ged_env.get_edge_ins_cost(ged_env.get_edge_label(1, to_dict=False)) + self.__init_type = 'RANDOM' + self.__num_random_inits = 10 + self.__desired_num_random_inits = 10 + self.__use_real_randomness = True + self.__seed = 0 + self.__parallel = True + self.__update_order = True + self.__sort_graphs = True # sort graphs by size when computing GEDs. + self.__refine = True + self.__time_limit_in_sec = 0 + self.__epsilon = 0.0001 + self.__max_itrs = 100 + self.__max_itrs_without_update = 3 + self.__num_inits_increase_order = 10 + self.__init_type_increase_order = 'K-MEANS++' + self.__max_itrs_increase_order = 10 + self.__print_to_stdout = 2 + self.__median_id = np.inf # @todo: check + self.__node_maps_from_median = {} + self.__sum_of_distances = 0 + self.__best_init_sum_of_distances = np.inf + self.__converged_sum_of_distances = np.inf + self.__runtime = None + self.__runtime_initialized = None + self.__runtime_converged = None + self.__itrs = [] # @todo: check: {} ? + self.__num_decrease_order = 0 + self.__num_increase_order = 0 + self.__num_converged_descents = 0 + self.__state = AlgorithmState.TERMINATED + self.__label_names = {} + + if ged_env is None: + raise Exception('The GED environment pointer passed to the constructor of MedianGraphEstimator is null.') + elif not ged_env.is_initialized(): + raise Exception('The GED environment is uninitialized. Call gedlibpy.GEDEnv.init() before passing it to the constructor of MedianGraphEstimator.') + + + def set_options(self, options): + """Sets the options of the estimator. + + Parameters + ---------- + options : string + String that specifies with which options to run the estimator. 
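+			An illustrative value (option names as parsed below) is
+			'--init-type MEDOID --max-itrs 50 --refine FALSE'.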
+ """ + self.__set_default_options() + options_map = misc.options_string_to_options_map(options) + for opt_name, opt_val in options_map.items(): + if opt_name == 'init-type': + self.__init_type = opt_val + if opt_val != 'MEDOID' and opt_val != 'RANDOM' and opt_val != 'MIN' and opt_val != 'MAX' and opt_val != 'MEAN': + raise Exception('Invalid argument ' + opt_val + ' for option init-type. Usage: options = "[--init-type RANDOM|MEDOID|EMPTY|MIN|MAX|MEAN] [...]"') + elif opt_name == 'random-inits': + try: + self.__num_random_inits = int(opt_val) + self.__desired_num_random_inits = self.__num_random_inits + except: + raise Exception('Invalid argument "' + opt_val + '" for option random-inits. Usage: options = "[--random-inits ]"') + + if self.__num_random_inits <= 0: + raise Exception('Invalid argument "' + opt_val + '" for option random-inits. Usage: options = "[--random-inits ]"') + + elif opt_name == 'randomness': + if opt_val == 'PSEUDO': + self.__use_real_randomness = False + + elif opt_val == 'REAL': + self.__use_real_randomness = True + + else: + raise Exception('Invalid argument "' + opt_val + '" for option randomness. Usage: options = "[--randomness REAL|PSEUDO] [...]"') + + elif opt_name == 'stdout': + if opt_val == '0': + self.__print_to_stdout = 0 + + elif opt_val == '1': + self.__print_to_stdout = 1 + + elif opt_val == '2': + self.__print_to_stdout = 2 + + else: + raise Exception('Invalid argument "' + opt_val + '" for option stdout. Usage: options = "[--stdout 0|1|2] [...]"') + + elif opt_name == 'parallel': + if opt_val == 'TRUE': + self.__parallel = True + + elif opt_val == 'FALSE': + self.__parallel = False + + else: + raise Exception('Invalid argument "' + opt_val + '" for option parallel. Usage: options = "[--parallel TRUE|FALSE] [...]"') + + elif opt_name == 'update-order': + if opt_val == 'TRUE': + self.__update_order = True + + elif opt_val == 'FALSE': + self.__update_order = False + + else: + raise Exception('Invalid argument "' + opt_val + '" for option update-order. Usage: options = "[--update-order TRUE|FALSE] [...]"') + + elif opt_name == 'sort-graphs': + if opt_val == 'TRUE': + self.__sort_graphs = True + + elif opt_val == 'FALSE': + self.__sort_graphs = False + + else: + raise Exception('Invalid argument "' + opt_val + '" for option sort-graphs. Usage: options = "[--sort-graphs TRUE|FALSE] [...]"') + + elif opt_name == 'refine': + if opt_val == 'TRUE': + self.__refine = True + + elif opt_val == 'FALSE': + self.__refine = False + + else: + raise Exception('Invalid argument "' + opt_val + '" for option refine. Usage: options = "[--refine TRUE|FALSE] [...]"') + + elif opt_name == 'time-limit': + try: + self.__time_limit_in_sec = float(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option time-limit. Usage: options = "[--time-limit ] [...]') + + elif opt_name == 'max-itrs': + try: + self.__max_itrs = int(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option max-itrs. Usage: options = "[--max-itrs ] [...]') + + elif opt_name == 'max-itrs-without-update': + try: + self.__max_itrs_without_update = int(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option max-itrs-without-update. Usage: options = "[--max-itrs-without-update ] [...]') + + elif opt_name == 'seed': + try: + self.__seed = int(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option seed. 
Usage: options = "[--seed ] [...]') + + elif opt_name == 'epsilon': + try: + self.__epsilon = float(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option epsilon. Usage: options = "[--epsilon ] [...]') + + if self.__epsilon <= 0: + raise Exception('Invalid argument "' + opt_val + '" for option epsilon. Usage: options = "[--epsilon ] [...]') + + elif opt_name == 'inits-increase-order': + try: + self.__num_inits_increase_order = int(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option inits-increase-order. Usage: options = "[--inits-increase-order ]"') + + if self.__num_inits_increase_order <= 0: + raise Exception('Invalid argument "' + opt_val + '" for option inits-increase-order. Usage: options = "[--inits-increase-order ]"') + + elif opt_name == 'init-type-increase-order': + self.__init_type_increase_order = opt_val + if opt_val != 'CLUSTERS' and opt_val != 'K-MEANS++': + raise Exception('Invalid argument ' + opt_val + ' for option init-type-increase-order. Usage: options = "[--init-type-increase-order CLUSTERS|K-MEANS++] [...]"') + + elif opt_name == 'max-itrs-increase-order': + try: + self.__max_itrs_increase_order = int(opt_val) + + except: + raise Exception('Invalid argument "' + opt_val + '" for option max-itrs-increase-order. Usage: options = "[--max-itrs-increase-order ] [...]') + + else: + valid_options = '[--init-type ] [--random-inits ] [--randomness ] [--seed ] [--stdout ] ' + valid_options += '[--time-limit ] [--max-itrs ] [--epsilon ] ' + valid_options += '[--inits-increase-order ] [--init-type-increase-order ] [--max-itrs-increase-order ]' + raise Exception('Invalid option "' + opt_name + '". Usage: options = "' + valid_options + '"') + + + def set_init_method(self, init_method, init_options={}): + """Selects method to be used for computing the initial medoid graph. + + Parameters + ---------- + init_method : string + The selected method. Default: ged::Options::GEDMethod::BRANCH_UNIFORM. + + init_options : string + The options for the selected method. Default: "". + + Notes + ----- + Has no effect unless "--init-type MEDOID" is passed to set_options(). + """ + self.__init_method = init_method; + self.__init_options = init_options; + + + def set_descent_method(self, descent_method, descent_options=''): + """Selects method to be used for block gradient descent.. + + Parameters + ---------- + descent_method : string + The selected method. Default: ged::Options::GEDMethod::BRANCH_FAST. + + descent_options : string + The options for the selected method. Default: "". + + Notes + ----- + Has no effect unless "--init-type MEDOID" is passed to set_options(). + """ + self.__descent_method = descent_method; + self.__descent_options = descent_options; + + + def set_refine_method(self, refine_method, refine_options): + """Selects method to be used for improving the sum of distances and the node maps for the converged median. + + Parameters + ---------- + refine_method : string + The selected method. Default: "IPFP". + + refine_options : string + The options for the selected method. Default: "". + + Notes + ----- + Has no effect if "--refine FALSE" is passed to set_options(). + """ + self.__refine_method = refine_method + self.__refine_options = refine_options + + + def run(self, graph_ids, set_median_id, gen_median_id): + """Computes a generalized median graph. + + Parameters + ---------- + graph_ids : list[integer] + The IDs of the graphs for which the median should be computed. 
Must have been added to the environment passed to the constructor. + + set_median_id : integer + The ID of the computed set-median. A dummy graph with this ID must have been added to the environment passed to the constructor. Upon termination, the computed median can be obtained via gklearn.gedlib.gedlibpy.GEDEnv.get_graph(). + + + gen_median_id : integer + The ID of the computed generalized median. Upon termination, the computed median can be obtained via gklearn.gedlib.gedlibpy.GEDEnv.get_graph(). + """ + # Sanity checks. + if len(graph_ids) == 0: + raise Exception('Empty vector of graph IDs, unable to compute median.') + all_graphs_empty = True + for graph_id in graph_ids: + if self.__ged_env.get_graph_num_nodes(graph_id) > 0: + all_graphs_empty = False + break + if all_graphs_empty: + raise Exception('All graphs in the collection are empty.') + + # Start timer and record start time. + start = time.time() + timer = Timer(self.__time_limit_in_sec) + self.__median_id = gen_median_id + self.__state = AlgorithmState.TERMINATED + + # Get NetworkX graph representations of the input graphs. + graphs = {} + for graph_id in graph_ids: + # @todo: get_nx_graph() function may need to be modified according to the coming code. + graphs[graph_id] = self.__ged_env.get_nx_graph(graph_id) +# print(self.__ged_env.get_graph_internal_id(0)) +# print(graphs[0].graph) +# print(graphs[0].nodes(data=True)) +# print(graphs[0].edges(data=True)) +# print(nx.adjacency_matrix(graphs[0])) + + # Construct initial medians. + medians = [] + self.__construct_initial_medians(graph_ids, timer, medians) + end_init = time.time() + self.__runtime_initialized = end_init - start +# print(medians[0].graph) +# print(medians[0].nodes(data=True)) +# print(medians[0].edges(data=True)) +# print(nx.adjacency_matrix(medians[0])) + + # Reset information about iterations and number of times the median decreases and increases. + self.__itrs = [0] * len(medians) + self.__num_decrease_order = 0 + self.__num_increase_order = 0 + self.__num_converged_descents = 0 + + # Initialize the best median. + best_sum_of_distances = np.inf + self.__best_init_sum_of_distances = np.inf + node_maps_from_best_median = {} + + # Run block gradient descent from all initial medians. + self.__ged_env.set_method(self.__descent_method, self.__descent_options) + for median_pos in range(0, len(medians)): + + # Terminate if the timer has expired and at least one SOD has been computed. + if timer.expired() and median_pos > 0: + break + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('\n===========================================================') + print('Block gradient descent for initial median', str(median_pos + 1), 'of', str(len(medians)), '.') + print('-----------------------------------------------------------') + + # Get reference to the median. + median = medians[median_pos] + + # Load initial median into the environment. + self.__ged_env.load_nx_graph(median, gen_median_id) + self.__ged_env.init(self.__ged_env.get_init_type()) + + # Compute node maps and sum of distances for initial median. +# xxx = self.__node_maps_from_median + self.__compute_init_node_maps(graph_ids, gen_median_id) +# yyy = self.__node_maps_from_median + + self.__best_init_sum_of_distances = min(self.__best_init_sum_of_distances, self.__sum_of_distances) + self.__ged_env.load_nx_graph(median, set_median_id) +# print(self.__best_init_sum_of_distances) + + # Run block gradient descent from initial median. 
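+			# Each descent iteration alternates a median update and a node-map update;
+			# the descent stops when neither block changes, or when the iteration or
+			# time limits are reached (see __termination_criterion_met()).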
+ converged = False + itrs_without_update = 0 + while not self.__termination_criterion_met(converged, timer, self.__itrs[median_pos], itrs_without_update): + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('\n===========================================================') + print('Iteration', str(self.__itrs[median_pos] + 1), 'for initial median', str(median_pos + 1), 'of', str(len(medians)), '.') + print('-----------------------------------------------------------') + + # Initialize flags that tell us what happened in the iteration. + median_modified = False + node_maps_modified = False + decreased_order = False + increased_order = False + + # Update the median. + median_modified = self.__update_median(graphs, median) + if self.__update_order: + if not median_modified or self.__itrs[median_pos] == 0: + decreased_order = self.__decrease_order(graphs, median) + if not decreased_order or self.__itrs[median_pos] == 0: + increased_order = self.__increase_order(graphs, median) + + # Update the number of iterations without update of the median. + if median_modified or decreased_order or increased_order: + itrs_without_update = 0 + else: + itrs_without_update += 1 + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('Loading median to environment: ... ', end='') + + # Load the median into the environment. + # @todo: should this function use the original node label? + self.__ged_env.load_nx_graph(median, gen_median_id) + self.__ged_env.init(self.__ged_env.get_init_type()) + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('done.') + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('Updating induced costs: ... ', end='') + + # Compute induced costs of the old node maps w.r.t. the updated median. + for graph_id in graph_ids: +# print(self.__node_maps_from_median[graph_id].induced_cost()) +# xxx = self.__node_maps_from_median[graph_id] + self.__ged_env.compute_induced_cost(gen_median_id, graph_id, self.__node_maps_from_median[graph_id]) +# print('---------------------------------------') +# print(self.__node_maps_from_median[graph_id].induced_cost()) + # @todo:!!!!!!!!!!!!!!!!!!!!!!!!!!!!This value is a slight different from the c++ program, which might be a bug! Use it very carefully! + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('done.') + + # Update the node maps. + node_maps_modified = self.__update_node_maps() + + # Update the order of the median if no improvement can be found with the current order. + + # Update the sum of distances. + old_sum_of_distances = self.__sum_of_distances + self.__sum_of_distances = 0 + for graph_id, node_map in self.__node_maps_from_median.items(): + self.__sum_of_distances += node_map.induced_cost() +# print(self.__sum_of_distances) + + # Print information about current iteration. 
+ if self.__print_to_stdout == 2: + print('Old local SOD: ', old_sum_of_distances) + print('New local SOD: ', self.__sum_of_distances) + print('Best converged SOD: ', best_sum_of_distances) + print('Modified median: ', median_modified) + print('Modified node maps: ', node_maps_modified) + print('Decreased order: ', decreased_order) + print('Increased order: ', increased_order) + print('===========================================================\n') + + converged = not (median_modified or node_maps_modified or decreased_order or increased_order) + + self.__itrs[median_pos] += 1 + + # Update the best median. + if self.__sum_of_distances < best_sum_of_distances: + best_sum_of_distances = self.__sum_of_distances + node_maps_from_best_median = self.__node_maps_from_median.copy() # @todo: this is a shallow copy, not sure if it is enough. + best_median = median + + # Update the number of converged descents. + if converged: + self.__num_converged_descents += 1 + + # Store the best encountered median. + self.__sum_of_distances = best_sum_of_distances + self.__node_maps_from_median = node_maps_from_best_median + self.__ged_env.load_nx_graph(best_median, gen_median_id) + self.__ged_env.init(self.__ged_env.get_init_type()) + end_descent = time.time() + self.__runtime_converged = end_descent - start + + # Refine the sum of distances and the node maps for the converged median. + self.__converged_sum_of_distances = self.__sum_of_distances + if self.__refine: + self.__improve_sum_of_distances(timer) + + # Record end time, set runtime and reset the number of initial medians. + end = time.time() + self.__runtime = end - start + self.__num_random_inits = self.__desired_num_random_inits + + # Print global information. + if self.__print_to_stdout != 0: + print('\n===========================================================') + print('Finished computation of generalized median graph.') + print('-----------------------------------------------------------') + print('Best SOD after initialization: ', self.__best_init_sum_of_distances) + print('Converged SOD: ', self.__converged_sum_of_distances) + if self.__refine: + print('Refined SOD: ', self.__sum_of_distances) + print('Overall runtime: ', self.__runtime) + print('Runtime of initialization: ', self.__runtime_initialized) + print('Runtime of block gradient descent: ', self.__runtime_converged - self.__runtime_initialized) + if self.__refine: + print('Runtime of refinement: ', self.__runtime - self.__runtime_converged) + print('Number of initial medians: ', len(medians)) + total_itr = 0 + num_started_descents = 0 + for itr in self.__itrs: + total_itr += itr + if itr > 0: + num_started_descents += 1 + print('Size of graph collection: ', len(graph_ids)) + print('Number of started descents: ', num_started_descents) + print('Number of converged descents: ', self.__num_converged_descents) + print('Overall number of iterations: ', total_itr) + print('Overall number of times the order decreased: ', self.__num_decrease_order) + print('Overall number of times the order increased: ', self.__num_increase_order) + print('===========================================================\n') + + + def __improve_sum_of_distances(self, timer): # @todo: go through and test + # Use method selected for refinement phase. + self.__ged_env.set_method(self.__refine_method, self.__refine_options) + + # Print information about current iteration. 
+        if self.__print_to_stdout == 2:
+            progress = tqdm(desc='Improving node maps', total=len(self.__node_maps_from_median), file=sys.stdout)
+            print('\n===========================================================')
+            print('Improving node maps and SOD for converged median.')
+            print('-----------------------------------------------------------')
+            progress.update(1)
+
+        # Improve the node maps.
+        nb_nodes_median = self.__ged_env.get_graph_num_nodes(self.__median_id)
+        for graph_id, node_map in self.__node_maps_from_median.items():
+            if timer.expired():
+                if self.__state == AlgorithmState.TERMINATED:
+                    self.__state = AlgorithmState.CONVERGED
+                break
+
+            nb_nodes_g = self.__ged_env.get_graph_num_nodes(graph_id)
+            if nb_nodes_median <= nb_nodes_g or not self.__sort_graphs:
+                self.__ged_env.run_method(self.__median_id, graph_id)
+                if self.__ged_env.get_upper_bound(self.__median_id, graph_id) < node_map.induced_cost():
+                    self.__node_maps_from_median[graph_id] = self.__ged_env.get_node_map(self.__median_id, graph_id)
+            else:
+                self.__ged_env.run_method(graph_id, self.__median_id)
+                if self.__ged_env.get_upper_bound(graph_id, self.__median_id) < node_map.induced_cost():
+                    node_map_tmp = self.__ged_env.get_node_map(graph_id, self.__median_id)
+                    node_map_tmp.forward_map, node_map_tmp.backward_map = node_map_tmp.backward_map, node_map_tmp.forward_map
+                    self.__node_maps_from_median[graph_id] = node_map_tmp
+
+            # Print information.
+            if self.__print_to_stdout == 2:
+                progress.update(1)
+
+        # Recompute the sum of distances from the (possibly improved) node maps.
+        self.__sum_of_distances = 0.0
+        for key, val in self.__node_maps_from_median.items():
+            self.__sum_of_distances += val.induced_cost()
+
+        # Print information.
+        if self.__print_to_stdout == 2:
+            print('===========================================================\n')
+
+
+    def __median_available(self):
+        return self.__median_id != np.inf
+
+
+    def get_state(self):
+        if not self.__median_available():
+            raise Exception('No median has been computed. Call run() before calling get_state().')
+        return self.__state
+
+
+    def get_sum_of_distances(self, state=''):
+        """Returns the sum of distances.
+
+        Parameters
+        ----------
+        state : string
+            The state of the estimator. Can be 'initialized' or 'converged'. Default: "".
+
+        Returns
+        -------
+        float
+            The sum of distances (SOD) of the median when the estimator was in the state `state` during the last call to run(). If `state` is not given, the converged SOD (without refinement) or the refined SOD (with refinement) is returned.
+        """
+        if not self.__median_available():
+            raise Exception('No median has been computed. Call run() before calling get_sum_of_distances().')
+        if state == 'initialized':
+            return self.__best_init_sum_of_distances
+        if state == 'converged':
+            return self.__converged_sum_of_distances
+        return self.__sum_of_distances
+
+
+    def get_runtime(self, state):
+        if not self.__median_available():
+            raise Exception('No median has been computed. Call run() before calling get_runtime().')
+        if state == AlgorithmState.INITIALIZED:
+            return self.__runtime_initialized
+        if state == AlgorithmState.CONVERGED:
+            return self.__runtime_converged
+        return self.__runtime
+
+
+    def get_num_itrs(self):
+        if not self.__median_available():
+            raise Exception('No median has been computed. Call run() before calling get_num_itrs().')
+        return self.__itrs
+
+
+    def get_num_times_order_decreased(self):
+        if not self.__median_available():
+            raise Exception('No median has been computed.
Call run() before calling get_num_times_order_decreased().') + return self.__num_decrease_order + + + def get_num_times_order_increased(self): + if not self.__median_available(): + raise Exception('No median has been computed. Call run() before calling get_num_times_order_increased().') + return self.__num_increase_order + + + def get_num_converged_descents(self): + if not self.__median_available(): + raise Exception('No median has been computed. Call run() before calling get_num_converged_descents().') + return self.__num_converged_descents + + + def get_ged_env(self): + return self.__ged_env + + + def __set_default_options(self): + self.__init_type = 'RANDOM' + self.__num_random_inits = 10 + self.__desired_num_random_inits = 10 + self.__use_real_randomness = True + self.__seed = 0 + self.__parallel = True + self.__update_order = True + self.__sort_graphs = True + self.__refine = True + self.__time_limit_in_sec = 0 + self.__epsilon = 0.0001 + self.__max_itrs = 100 + self.__max_itrs_without_update = 3 + self.__num_inits_increase_order = 10 + self.__init_type_increase_order = 'K-MEANS++' + self.__max_itrs_increase_order = 10 + self.__print_to_stdout = 2 + self.__label_names = {} + + + def __construct_initial_medians(self, graph_ids, timer, initial_medians): + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('\n===========================================================') + print('Constructing initial median(s).') + print('-----------------------------------------------------------') + + # Compute or sample the initial median(s). + initial_medians.clear() + if self.__init_type == 'MEDOID': + self.__compute_medoid(graph_ids, timer, initial_medians) + elif self.__init_type == 'MAX': + pass # @todo +# compute_max_order_graph_(graph_ids, initial_medians) + elif self.__init_type == 'MIN': + pass # @todo +# compute_min_order_graph_(graph_ids, initial_medians) + elif self.__init_type == 'MEAN': + pass # @todo +# compute_mean_order_graph_(graph_ids, initial_medians) + else: + pass # @todo +# sample_initial_medians_(graph_ids, initial_medians) + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('===========================================================') + + + def __compute_medoid(self, graph_ids, timer, initial_medians): + # Use method selected for initialization phase. + self.__ged_env.set_method(self.__init_method, self.__init_options) + + # Compute the medoid. + if self.__parallel: + # @todo: notice when parallel self.__ged_env is not modified. 
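+            # The medoid computation is embarrassingly parallel: each worker
+            # receives one candidate graph ID, computes the sum of GED upper
+            # bounds from that candidate to every graph in the collection (see
+            # _compute_medoid_parallel at the bottom of this module), and the
+            # candidate with the smallest sum becomes the medoid. The GED
+            # environment is shared with the workers through the pool
+            # initializer instead of being pickled with every task.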
+ sum_of_distances_list = [np.inf] * len(graph_ids) + len_itr = len(graph_ids) + itr = zip(graph_ids, range(0, len(graph_ids))) + n_jobs = multiprocessing.cpu_count() + if len_itr < 100 * n_jobs: + chunksize = int(len_itr / n_jobs) + 1 + else: + chunksize = 100 + def init_worker(ged_env_toshare): + global G_ged_env + G_ged_env = ged_env_toshare + do_fun = partial(_compute_medoid_parallel, graph_ids, self.__sort_graphs) + pool = Pool(processes=n_jobs, initializer=init_worker, initargs=(self.__ged_env,)) + if self.__print_to_stdout == 2: + iterator = tqdm(pool.imap_unordered(do_fun, itr, chunksize), + desc='Computing medoid', file=sys.stdout) + else: + iterator = pool.imap_unordered(do_fun, itr, chunksize) + for i, dis in iterator: + sum_of_distances_list[i] = dis + pool.close() + pool.join() + + medoid_id = np.argmin(sum_of_distances_list) + best_sum_of_distances = sum_of_distances_list[medoid_id] + + initial_medians.append(self.__ged_env.get_nx_graph(medoid_id)) # @todo + + else: + # Print information about current iteration. + if self.__print_to_stdout == 2: + progress = tqdm(desc='Computing medoid', total=len(graph_ids), file=sys.stdout) + + medoid_id = graph_ids[0] + best_sum_of_distances = np.inf + for g_id in graph_ids: + if timer.expired(): + self.__state = AlgorithmState.CALLED + break + nb_nodes_g = self.__ged_env.get_graph_num_nodes(g_id) + sum_of_distances = 0 + for h_id in graph_ids: # @todo: this can be faster, only a half is needed. + nb_nodes_h = self.__ged_env.get_graph_num_nodes(h_id) + if nb_nodes_g <= nb_nodes_h or not self.__sort_graphs: + self.__ged_env.run_method(g_id, h_id) # @todo + sum_of_distances += self.__ged_env.get_upper_bound(g_id, h_id) + else: + self.__ged_env.run_method(h_id, g_id) + sum_of_distances += self.__ged_env.get_upper_bound(h_id, g_id) + if sum_of_distances < best_sum_of_distances: + best_sum_of_distances = sum_of_distances + medoid_id = g_id + + # Print information about current iteration. + if self.__print_to_stdout == 2: + progress.update(1) + + initial_medians.append(self.__ged_env.get_nx_graph(medoid_id)) # @todo + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('\n') + + + def __compute_init_node_maps(self, graph_ids, gen_median_id): + # Compute node maps and sum of distances for initial median. + if self.__parallel: + # @todo: notice when parallel self.__ged_env is not modified. 
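+            # Same pool pattern as in __compute_medoid: each worker computes one
+            # node map between the current median and one input graph (see
+            # _compute_init_node_maps_parallel below). If graph sorting is
+            # enabled and the median has more nodes than the input graph, the
+            # GED is run in the cheaper direction (graph -> median) and the
+            # forward and backward maps are swapped afterwards, so that stored
+            # node maps always go from the median to the input graph.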
+ self.__sum_of_distances = 0 + self.__node_maps_from_median.clear() + sum_of_distances_list = [0] * len(graph_ids) + + len_itr = len(graph_ids) + itr = graph_ids + n_jobs = multiprocessing.cpu_count() + if len_itr < 100 * n_jobs: + chunksize = int(len_itr / n_jobs) + 1 + else: + chunksize = 100 + def init_worker(ged_env_toshare): + global G_ged_env + G_ged_env = ged_env_toshare + nb_nodes_median = self.__ged_env.get_graph_num_nodes(gen_median_id) + do_fun = partial(_compute_init_node_maps_parallel, gen_median_id, self.__sort_graphs, nb_nodes_median) + pool = Pool(processes=n_jobs, initializer=init_worker, initargs=(self.__ged_env,)) + if self.__print_to_stdout == 2: + iterator = tqdm(pool.imap_unordered(do_fun, itr, chunksize), + desc='Computing initial node maps', file=sys.stdout) + else: + iterator = pool.imap_unordered(do_fun, itr, chunksize) + for g_id, sod, node_maps in iterator: + sum_of_distances_list[g_id] = sod + self.__node_maps_from_median[g_id] = node_maps + pool.close() + pool.join() + + self.__sum_of_distances = np.sum(sum_of_distances_list) +# xxx = self.__node_maps_from_median + + else: + # Print information about current iteration. + if self.__print_to_stdout == 2: + progress = tqdm(desc='Computing initial node maps', total=len(graph_ids), file=sys.stdout) + + self.__sum_of_distances = 0 + self.__node_maps_from_median.clear() + nb_nodes_median = self.__ged_env.get_graph_num_nodes(gen_median_id) + for graph_id in graph_ids: + nb_nodes_g = self.__ged_env.get_graph_num_nodes(graph_id) + if nb_nodes_median <= nb_nodes_g or not self.__sort_graphs: + self.__ged_env.run_method(gen_median_id, graph_id) + self.__node_maps_from_median[graph_id] = self.__ged_env.get_node_map(gen_median_id, graph_id) + else: + self.__ged_env.run_method(graph_id, gen_median_id) + node_map_tmp = self.__ged_env.get_node_map(graph_id, gen_median_id) + node_map_tmp.forward_map, node_map_tmp.backward_map = node_map_tmp.backward_map, node_map_tmp.forward_map + self.__node_maps_from_median[graph_id] = node_map_tmp + # print(self.__node_maps_from_median[graph_id]) + self.__sum_of_distances += self.__node_maps_from_median[graph_id].induced_cost() + # print(self.__sum_of_distances) + # Print information about current iteration. + if self.__print_to_stdout == 2: + progress.update(1) + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('\n') + + + def __termination_criterion_met(self, converged, timer, itr, itrs_without_update): + if timer.expired() or (itr >= self.__max_itrs if self.__max_itrs >= 0 else False): + if self.__state == AlgorithmState.TERMINATED: + self.__state = AlgorithmState.INITIALIZED + return True + return converged or (itrs_without_update > self.__max_itrs_without_update if self.__max_itrs_without_update >= 0 else False) + + + def __update_median(self, graphs, median): + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('Updating median: ', end='') + + # Store copy of the old median. + old_median = median.copy() # @todo: this is just a shallow copy. + + # Update the node labels. + if self.__labeled_nodes: + self.__update_node_labels(graphs, median) + + # Update the edges and their labels. + self.__update_edges(graphs, median) + + # Print information about current iteration. 
+ if self.__print_to_stdout == 2: + print('done.') + + return not self.__are_graphs_equal(median, old_median) + + + def __update_node_labels(self, graphs, median): +# print('----------------------------') + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('nodes ... ', end='') + + # Iterate through all nodes of the median. + for i in range(0, nx.number_of_nodes(median)): +# print('i: ', i) + # Collect the labels of the substituted nodes. + node_labels = [] + for graph_id, graph in graphs.items(): +# print('graph_id: ', graph_id) +# print(self.__node_maps_from_median[graph_id]) +# print(self.__node_maps_from_median[graph_id].forward_map, self.__node_maps_from_median[graph_id].backward_map) + k = self.__node_maps_from_median[graph_id].image(i) +# print('k: ', k) + if k != np.inf: + node_labels.append(graph.nodes[k]) + + # Compute the median label and update the median. + if len(node_labels) > 0: +# median_label = self.__ged_env.get_median_node_label(node_labels) + median_label = self.__get_median_node_label(node_labels) + if self.__ged_env.get_node_rel_cost(median.nodes[i], median_label) > self.__epsilon: + nx.set_node_attributes(median, {i: median_label}) + + + def __update_edges(self, graphs, median): + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('edges ... ', end='') + +# # Clear the adjacency lists of the median and reset number of edges to 0. +# median_edges = list(median.edges) +# for (head, tail) in median_edges: +# median.remove_edge(head, tail) + + # @todo: what if edge is not labeled? + # Iterate through all possible edges (i,j) of the median. + for i in range(0, nx.number_of_nodes(median)): + for j in range(i + 1, nx.number_of_nodes(median)): + + # Collect the labels of the edges to which (i,j) is mapped by the node maps. + edge_labels = [] + for graph_id, graph in graphs.items(): + k = self.__node_maps_from_median[graph_id].image(i) + l = self.__node_maps_from_median[graph_id].image(j) + if k != np.inf and l != np.inf: + if graph.has_edge(k, l): + edge_labels.append(graph.edges[(k, l)]) + + # Compute the median edge label and the overall edge relabeling cost. + rel_cost = 0 + median_label = self.__ged_env.get_edge_label(1, to_dict=True) + if median.has_edge(i, j): + median_label = median.edges[(i, j)] + if self.__labeled_edges and len(edge_labels) > 0: + new_median_label = self.__get_median_edge_label(edge_labels) + if self.__ged_env.get_edge_rel_cost(median_label, new_median_label) > self.__epsilon: + median_label = new_median_label + for edge_label in edge_labels: + rel_cost += self.__ged_env.get_edge_rel_cost(median_label, edge_label) + + # Update the median. + if median.has_edge(i, j): + median.remove_edge(i, j) + if rel_cost < (self.__edge_ins_cost + self.__edge_del_cost) * len(edge_labels) - self.__edge_del_cost * len(graphs): + median.add_edge(i, j, **median_label) +# else: +# if median.has_edge(i, j): +# median.remove_edge(i, j) + + + def __update_node_maps(self): + # Update the node maps. + if self.__parallel: + # @todo: notice when parallel self.__ged_env is not modified. 
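+            # The node map update is distributed over a pool as well (see
+            # _update_node_maps_parallel below). A recomputed node map replaces
+            # the stored one only if it improves the induced cost by more than
+            # self.__epsilon; this tolerance keeps the descent from oscillating
+            # between node maps of numerically equal cost:
+            #
+            #   if upper_bound < node_map.induced_cost() - epsilon:
+            #       # accept the new node map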
+ node_maps_were_modified = False +# xxx = self.__node_maps_from_median.copy() + + len_itr = len(self.__node_maps_from_median) + itr = [item for item in self.__node_maps_from_median.items()] + n_jobs = multiprocessing.cpu_count() + if len_itr < 100 * n_jobs: + chunksize = int(len_itr / n_jobs) + 1 + else: + chunksize = 100 + def init_worker(ged_env_toshare): + global G_ged_env + G_ged_env = ged_env_toshare + nb_nodes_median = self.__ged_env.get_graph_num_nodes(self.__median_id) + do_fun = partial(_update_node_maps_parallel, self.__median_id, self.__epsilon, self.__sort_graphs, nb_nodes_median) + pool = Pool(processes=n_jobs, initializer=init_worker, initargs=(self.__ged_env,)) + if self.__print_to_stdout == 2: + iterator = tqdm(pool.imap_unordered(do_fun, itr, chunksize), + desc='Updating node maps', file=sys.stdout) + else: + iterator = pool.imap_unordered(do_fun, itr, chunksize) + for g_id, node_map, nm_modified in iterator: + self.__node_maps_from_median[g_id] = node_map + if nm_modified: + node_maps_were_modified = True + pool.close() + pool.join() +# yyy = self.__node_maps_from_median.copy() + + else: + # Print information about current iteration. + if self.__print_to_stdout == 2: + progress = tqdm(desc='Updating node maps', total=len(self.__node_maps_from_median), file=sys.stdout) + + node_maps_were_modified = False + nb_nodes_median = self.__ged_env.get_graph_num_nodes(self.__median_id) + for graph_id, node_map in self.__node_maps_from_median.items(): + nb_nodes_g = self.__ged_env.get_graph_num_nodes(graph_id) + + if nb_nodes_median <= nb_nodes_g or not self.__sort_graphs: + self.__ged_env.run_method(self.__median_id, graph_id) + if self.__ged_env.get_upper_bound(self.__median_id, graph_id) < node_map.induced_cost() - self.__epsilon: + # xxx = self.__node_maps_from_median[graph_id] + self.__node_maps_from_median[graph_id] = self.__ged_env.get_node_map(self.__median_id, graph_id) + node_maps_were_modified = True + + else: + self.__ged_env.run_method(graph_id, self.__median_id) + if self.__ged_env.get_upper_bound(graph_id, self.__median_id) < node_map.induced_cost() - self.__epsilon: + node_map_tmp = self.__ged_env.get_node_map(graph_id, self.__median_id) + node_map_tmp.forward_map, node_map_tmp.backward_map = node_map_tmp.backward_map, node_map_tmp.forward_map + self.__node_maps_from_median[graph_id] = node_map_tmp + node_maps_were_modified = True + + # Print information about current iteration. + if self.__print_to_stdout == 2: + progress.update(1) + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('\n') + + # Return true if the node maps were modified. + return node_maps_were_modified + + + def __decrease_order(self, graphs, median): + # Print information about current iteration + if self.__print_to_stdout == 2: + print('Trying to decrease order: ... ', end='') + + if nx.number_of_nodes(median) <= 1: + if self.__print_to_stdout == 2: + print('median graph has only 1 node, skip decrease.') + return False + + # Initialize ID of the node that is to be deleted. + id_deleted_node = [None] # @todo: or np.inf + decreased_order = False + + # Decrease the order as long as the best deletion delta is negative. + while self.__compute_best_deletion_delta(graphs, median, id_deleted_node) < -self.__epsilon: + decreased_order = True + self.__delete_node_from_median(id_deleted_node[0], median) + if nx.number_of_nodes(median) <= 1: + if self.__print_to_stdout == 2: + print('decrease stopped because median graph remains only 1 node. 
', end='') + break + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('done.') + + # Return true iff the order was decreased. + return decreased_order + + + def __compute_best_deletion_delta(self, graphs, median, id_deleted_node): + best_delta = 0.0 + + # Determine node that should be deleted (if any). + for i in range(0, nx.number_of_nodes(median)): + # Compute cost delta. + delta = 0.0 + for graph_id, graph in graphs.items(): + k = self.__node_maps_from_median[graph_id].image(i) + if k == np.inf: + delta -= self.__node_del_cost + else: + delta += self.__node_ins_cost - self.__ged_env.get_node_rel_cost(median.nodes[i], graph.nodes[k]) + for j, j_label in median[i].items(): + l = self.__node_maps_from_median[graph_id].image(j) + if k == np.inf or l == np.inf: + delta -= self.__edge_del_cost + elif not graph.has_edge(k, l): + delta -= self.__edge_del_cost + else: + delta += self.__edge_ins_cost - self.__ged_env.get_edge_rel_cost(j_label, graph.edges[(k, l)]) + + # Update best deletion delta. + if delta < best_delta - self.__epsilon: + best_delta = delta + id_deleted_node[0] = i +# id_deleted_node[0] = 3 # @todo: + + return best_delta + + + def __delete_node_from_median(self, id_deleted_node, median): + # Update the median. + mapping = {} + for i in range(0, nx.number_of_nodes(median)): + if i != id_deleted_node: + new_i = (i if i < id_deleted_node else (i - 1)) + mapping[i] = new_i + median.remove_node(id_deleted_node) + nx.relabel_nodes(median, mapping, copy=False) + + # Update the node maps. +# xxx = self.__node_maps_from_median + for key, node_map in self.__node_maps_from_median.items(): + new_node_map = NodeMap(nx.number_of_nodes(median), node_map.num_target_nodes()) + is_unassigned_target_node = [True] * node_map.num_target_nodes() + for i in range(0, nx.number_of_nodes(median) + 1): + if i != id_deleted_node: + new_i = (i if i < id_deleted_node else (i - 1)) + k = node_map.image(i) + new_node_map.add_assignment(new_i, k) + if k != np.inf: + is_unassigned_target_node[k] = False + for k in range(0, node_map.num_target_nodes()): + if is_unassigned_target_node[k]: + new_node_map.add_assignment(np.inf, k) +# print(self.__node_maps_from_median[key].forward_map, self.__node_maps_from_median[key].backward_map) +# print(new_node_map.forward_map, new_node_map.backward_map + self.__node_maps_from_median[key] = new_node_map + + # Increase overall number of decreases. + self.__num_decrease_order += 1 + + + def __increase_order(self, graphs, median): + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('Trying to increase order: ... ', end='') + + # Initialize the best configuration and the best label of the node that is to be inserted. + best_config = {} + best_label = self.__ged_env.get_node_label(1, to_dict=True) + increased_order = False + + # Increase the order as long as the best insertion delta is negative. + while self.__compute_best_insertion_delta(graphs, best_config, best_label) < - self.__epsilon: + increased_order = True + self.__add_node_to_median(best_config, best_label, median) + + # Print information about current iteration. + if self.__print_to_stdout == 2: + print('done.') + + # Return true iff the order was increased. + return increased_order + + + def __compute_best_insertion_delta(self, graphs, best_config, best_label): + # Construct sets of inserted nodes. 
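+        # An "inserted node" of a graph G is a node of G that the current node
+        # map leaves unmatched, i.e. whose pre-image in the median is the dummy
+        # node. The candidate configurations built below assign a potential new
+        # median node to at most one inserted node per graph (or to no node,
+        # which costs a deletion); the returned delta is negative when adding
+        # such a node would decrease the overall sum of distances.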
+ no_inserted_node = True + inserted_nodes = {} + for graph_id, graph in graphs.items(): + inserted_nodes[graph_id] = [] + best_config[graph_id] = np.inf + for k in range(nx.number_of_nodes(graph)): + if self.__node_maps_from_median[graph_id].pre_image(k) == np.inf: + no_inserted_node = False + inserted_nodes[graph_id].append((k, tuple(item for item in graph.nodes[k].items()))) # @todo: can order of label names be garantteed? + + # Return 0.0 if no node is inserted in any of the graphs. + if no_inserted_node: + return 0.0 + + # Compute insertion configuration, label, and delta. + best_delta = 0.0 # @todo + if len(self.__label_names['node_labels']) == 0 and len(self.__label_names['node_attrs']) == 0: # @todo + best_delta = self.__compute_insertion_delta_unlabeled(inserted_nodes, best_config, best_label) + elif len(self.__label_names['node_labels']) > 0: # self.__constant_node_costs: + best_delta = self.__compute_insertion_delta_constant(inserted_nodes, best_config, best_label) + else: + best_delta = self.__compute_insertion_delta_generic(inserted_nodes, best_config, best_label) + + # Return the best delta. + return best_delta + + + def __compute_insertion_delta_unlabeled(self, inserted_nodes, best_config, best_label): # @todo: go through and test. + # Construct the nest configuration and compute its insertion delta. + best_delta = 0.0 + best_config.clear() + for graph_id, node_set in inserted_nodes.items(): + if len(node_set) == 0: + best_config[graph_id] = np.inf + best_delta += self.__node_del_cost + else: + best_config[graph_id] = node_set[0][0] + best_delta -= self.__node_ins_cost + + # Return the best insertion delta. + return best_delta + + + def __compute_insertion_delta_constant(self, inserted_nodes, best_config, best_label): + # Construct histogram and inverse label maps. + hist = {} + inverse_label_maps = {} + for graph_id, node_set in inserted_nodes.items(): + inverse_label_maps[graph_id] = {} + for node in node_set: + k = node[0] + label = node[1] + if label not in inverse_label_maps[graph_id]: + inverse_label_maps[graph_id][label] = k + if label not in hist: + hist[label] = 1 + else: + hist[label] += 1 + + # Determine the best label. + best_count = 0 + for key, val in hist.items(): + if val > best_count: + best_count = val + best_label_tuple = key + + # get best label. + best_label.clear() + for key, val in best_label_tuple: + best_label[key] = val + + # Construct the best configuration and compute its insertion delta. + best_config.clear() + best_delta = 0.0 + node_rel_cost = self.__ged_env.get_node_rel_cost(self.__ged_env.get_node_label(1, to_dict=False), self.__ged_env.get_node_label(2, to_dict=False)) + triangle_ineq_holds = (node_rel_cost <= self.__node_del_cost + self.__node_ins_cost) + for graph_id, _ in inserted_nodes.items(): + if best_label_tuple in inverse_label_maps[graph_id]: + best_config[graph_id] = inverse_label_maps[graph_id][best_label_tuple] + best_delta -= self.__node_ins_cost + elif triangle_ineq_holds and not len(inserted_nodes[graph_id]) == 0: + best_config[graph_id] = inserted_nodes[graph_id][0][0] + best_delta += node_rel_cost - self.__node_ins_cost + else: + best_config[graph_id] = np.inf + best_delta += self.__node_del_cost + + # Return the best insertion delta. + return best_delta + + + def __compute_insertion_delta_generic(self, inserted_nodes, best_config, best_label): + # Collect all node labels of inserted nodes. 
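+        # Generic (non-symbolic) labels admit no histogram-based median, so the
+        # label of the candidate node is optimized as well: several initial
+        # label medians are generated (K-MEANS++ or CLUSTERS, see
+        # __compute_initial_node_labels), and each is refined by a small block
+        # gradient descent that alternates between __update_config (pick the
+        # cheapest inserted node per graph for the current label) and
+        # __update_node_label (recompute the label median of the picked nodes).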
+ node_labels = [] + for _, node_set in inserted_nodes.items(): + for node in node_set: + node_labels.append(node[1]) + + # Compute node label medians that serve as initial solutions for block gradient descent. + initial_node_labels = [] + self.__compute_initial_node_labels(node_labels, initial_node_labels) + + # Determine best insertion configuration, label, and delta via parallel block gradient descent from all initial node labels. + best_delta = 0.0 + for node_label in initial_node_labels: + # Construct local configuration. + config = {} + for graph_id, _ in inserted_nodes.items(): + config[graph_id] = tuple((np.inf, self.__ged_env.get_node_label(1, to_dict=False))) + + # Run block gradient descent. + converged = False + itr = 0 + while not self.__insertion_termination_criterion_met(converged, itr): + converged = not self.__update_config(node_label, inserted_nodes, config, node_labels) + node_label_dict = dict(node_label) + converged = converged and (not self.__update_node_label([dict(item) for item in node_labels], node_label_dict)) # @todo: the dict is tupled again in the function, can be better. + node_label = tuple(item for item in node_label_dict.items()) # @todo: watch out: initial_node_labels[i] is not modified here. + + itr += 1 + + # Compute insertion delta of converged solution. + delta = 0.0 + for _, node in config.items(): + if node[0] == np.inf: + delta += self.__node_del_cost + else: + delta += self.__ged_env.get_node_rel_cost(dict(node_label), dict(node[1])) - self.__node_ins_cost + + # Update best delta and global configuration if improvement has been found. + if delta < best_delta - self.__epsilon: + best_delta = delta + best_label.clear() + for key, val in node_label: + best_label[key] = val + best_config.clear() + for graph_id, val in config.items(): + best_config[graph_id] = val[0] + + # Return the best delta. + return best_delta + + + def __compute_initial_node_labels(self, node_labels, median_labels): + median_labels.clear() + if self.__use_real_randomness: # @todo: may not work if parallelized. + rng = np.random.randint(0, high=2**32 - 1, size=1) + urng = np.random.RandomState(seed=rng[0]) + else: + urng = np.random.RandomState(seed=self.__seed) + + # Generate the initial node label medians. + if self.__init_type_increase_order == 'K-MEANS++': + # Use k-means++ heuristic to generate the initial node label medians. + already_selected = [False] * len(node_labels) + selected_label_id = urng.randint(low=0, high=len(node_labels), size=1)[0] # c++ test: 23 + median_labels.append(node_labels[selected_label_id]) + already_selected[selected_label_id] = True +# xxx = [41, 0, 18, 9, 6, 14, 21, 25, 33] for c++ test +# iii = 0 for c++ test + while len(median_labels) < self.__num_inits_increase_order: + weights = [np.inf] * len(node_labels) + for label_id in range(0, len(node_labels)): + if already_selected[label_id]: + weights[label_id] = 0 + continue + for label in median_labels: + weights[label_id] = min(weights[label_id], self.__ged_env.get_node_rel_cost(dict(label), dict(node_labels[label_id]))) + + # get non-zero weights. 
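+                # k-means++ style seeding: each not-yet-selected label is drawn
+                # with probability proportional to its relabeling cost to the
+                # closest median chosen so far. Zero-weight entries (already
+                # selected labels) are filtered out first, so that sampling is
+                # restricted to new labels and the loop can stop early once
+                # every label has been selected.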
+                weights_p, idx_p = [], []
+                for i, w in enumerate(weights):
+                    if w != 0:
+                        weights_p.append(w)
+                        idx_p.append(i)
+                if len(weights_p) > 0:
+                    p = np.array(weights_p) / np.sum(weights_p)
+                    selected_label_id = urng.choice(range(0, len(weights_p)), size=1, p=p)[0] # for c++ test: xxx[iii]
+                    selected_label_id = idx_p[selected_label_id]
+#                    iii += 1 # for c++ test
+                    median_labels.append(node_labels[selected_label_id])
+                    already_selected[selected_label_id] = True
+                else:
+                    # Skip the loop when all node labels have been selected; this happens when len(node_labels) <= self.__num_inits_increase_order.
+                    break
+        else:
+            # Compute the initial node label medians as the medians of randomly generated clusters of (roughly) equal size. # @todo: go through and test.
+            shuffled_node_labels = node_labels[:]
+            urng.shuffle(shuffled_node_labels)
+            cluster_size = len(node_labels) / self.__num_inits_increase_order
+            pos = 0
+            cluster = []
+            while len(median_labels) < self.__num_inits_increase_order - 1:
+                while pos < (len(median_labels) + 1) * cluster_size:
+                    cluster.append(shuffled_node_labels[pos])
+                    pos += 1
+                # Labels are stored as tuples of (key, value) pairs; convert them to dicts for the median computation and back, as in the K-MEANS++ branch.
+                median_labels.append(tuple(self.__get_median_node_label([dict(label) for label in cluster]).items()))
+                cluster.clear()
+            while pos < len(shuffled_node_labels):
+                cluster.append(shuffled_node_labels[pos])
+                pos += 1
+            median_labels.append(tuple(self.__get_median_node_label([dict(label) for label in cluster]).items()))
+            cluster.clear()
+
+        # Run Lloyd's algorithm.
+        converged = False
+        closest_median_ids = [np.inf] * len(node_labels)
+        clusters = [[] for _ in range(len(median_labels))]
+        itr = 1
+        while not self.__insertion_termination_criterion_met(converged, itr):
+            converged = not self.__update_clusters(node_labels, median_labels, closest_median_ids)
+            if not converged:
+                for cluster in clusters:
+                    cluster.clear()
+                for label_id in range(0, len(node_labels)):
+                    clusters[closest_median_ids[label_id]].append(node_labels[label_id])
+                for cluster_id in range(0, len(clusters)):
+                    node_label = dict(median_labels[cluster_id])
+                    self.__update_node_label([dict(item) for item in clusters[cluster_id]], node_label) # @todo: the dict is tupled again in the function, can be better.
+                    median_labels[cluster_id] = tuple(item for item in node_label.items())
+            itr += 1
+
+
+    def __insertion_termination_criterion_met(self, converged, itr):
+        return converged or (itr >= self.__max_itrs_increase_order if self.__max_itrs_increase_order > 0 else False)
+
+
+    def __update_config(self, node_label, inserted_nodes, config, node_labels):
+        # Determine the best configuration.
+        config_modified = False
+        for graph_id, node_set in inserted_nodes.items():
+            best_assignment = config[graph_id]
+            best_cost = 0.0
+            if best_assignment[0] == np.inf:
+                best_cost = self.__node_del_cost
+            else:
+                best_cost = self.__ged_env.get_node_rel_cost(dict(node_label), dict(best_assignment[1])) - self.__node_ins_cost
+            for node in node_set:
+                cost = self.__ged_env.get_node_rel_cost(dict(node_label), dict(node[1])) - self.__node_ins_cost
+                if cost < best_cost - self.__epsilon:
+                    best_cost = cost
+                    best_assignment = node
+                    config_modified = True
+            if self.__node_del_cost < best_cost - self.__epsilon:
+                best_cost = self.__node_del_cost
+                best_assignment = tuple((np.inf, best_assignment[1]))
+                config_modified = True
+            config[graph_id] = best_assignment
+
+        # Collect the node labels contained in the best configuration.
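+        # Only labels that remain assigned (best_assignment[0] != np.inf)
+        # survive into node_labels; they form the multiset whose median is
+        # recomputed by __update_node_label in the next half-step of the
+        # descent.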
+ node_labels.clear() + for key, val in config.items(): + if val[0] != np.inf: + node_labels.append(val[1]) + + # Return true if the configuration was modified. + return config_modified + + + def __update_node_label(self, node_labels, node_label): + if len(node_labels) == 0: # @todo: check if this is the correct solution. Especially after calling __update_config(). + return False + new_node_label = self.__get_median_node_label(node_labels) + if self.__ged_env.get_node_rel_cost(new_node_label, node_label) > self.__epsilon: + node_label.clear() + for key, val in new_node_label.items(): + node_label[key] = val + return True + return False + + + def __update_clusters(self, node_labels, median_labels, closest_median_ids): + # Determine the closest median for each node label. + clusters_modified = False + for label_id in range(0, len(node_labels)): + closest_median_id = np.inf + dist_to_closest_median = np.inf + for median_id in range(0, len(median_labels)): + dist_to_median = self.__ged_env.get_node_rel_cost(dict(median_labels[median_id]), dict(node_labels[label_id])) + if dist_to_median < dist_to_closest_median - self.__epsilon: + dist_to_closest_median = dist_to_median + closest_median_id = median_id + if closest_median_id != closest_median_ids[label_id]: + closest_median_ids[label_id] = closest_median_id + clusters_modified = True + + # Return true if the clusters were modified. + return clusters_modified + + + def __add_node_to_median(self, best_config, best_label, median): + # Update the median. + nb_nodes_median = nx.number_of_nodes(median) + median.add_node(nb_nodes_median, **best_label) + + # Update the node maps. + for graph_id, node_map in self.__node_maps_from_median.items(): + node_map_as_rel = [] + node_map.as_relation(node_map_as_rel) + new_node_map = NodeMap(nx.number_of_nodes(median), node_map.num_target_nodes()) + for assignment in node_map_as_rel: + new_node_map.add_assignment(assignment[0], assignment[1]) + new_node_map.add_assignment(nx.number_of_nodes(median) - 1, best_config[graph_id]) + self.__node_maps_from_median[graph_id] = new_node_map + + # Increase overall number of increases. + self.__num_increase_order += 1 + + + def __are_graphs_equal(self, g1, g2): + """ + Check if the two graphs are equal. + + Parameters + ---------- + g1 : NetworkX graph object + Graph 1 to be compared. + + g2 : NetworkX graph object + Graph 2 to be compared. + + Returns + ------- + bool + True if the two graph are equal. + + Notes + ----- + This is not an identical check. Here the two graphs are equal if and only if their original_node_ids, nodes, all node labels, edges and all edge labels are equal. This function is specifically designed for class `MedianGraphEstimator` and should not be used elsewhere. + """ + # check original node ids. + if not g1.graph['original_node_ids'] == g2.graph['original_node_ids']: + return False # @todo: why check this? + # check nodes. + nlist1 = [n for n in g1.nodes(data=True)] # @todo: shallow? + nlist2 = [n for n in g2.nodes(data=True)] + if not nlist1 == nlist2: + return False + # check edges. 
+ elist1 = [n for n in g1.edges(data=True)] + elist2 = [n for n in g2.edges(data=True)] + if not elist1 == elist2: + return False + + return True + + + def compute_my_cost(g, h, node_map): + cost = 0.0 + for node in g.nodes: + cost += 0 + + + def set_label_names(self, node_labels=[], edge_labels=[], node_attrs=[], edge_attrs=[]): + self.__label_names = {'node_labels': node_labels, 'edge_labels': edge_labels, + 'node_attrs': node_attrs, 'edge_attrs': edge_attrs} + + + def __get_median_node_label(self, node_labels): + if len(self.__label_names['node_labels']) > 0: + return self.__get_median_label_symbolic(node_labels) + elif len(self.__label_names['node_attrs']) > 0: + return self.__get_median_label_nonsymbolic(node_labels) + else: + raise Exception('Node label names are not given.') + + + def __get_median_edge_label(self, edge_labels): + if len(self.__label_names['edge_labels']) > 0: + return self.__get_median_label_symbolic(edge_labels) + elif len(self.__label_names['edge_attrs']) > 0: + return self.__get_median_label_nonsymbolic(edge_labels) + else: + raise Exception('Edge label names are not given.') + + + def __get_median_label_symbolic(self, labels): + # Construct histogram. + hist = {} + for label in labels: + label = tuple([kv for kv in label.items()]) # @todo: this may be slow. + if label not in hist: + hist[label] = 1 + else: + hist[label] += 1 + + # Return the label that appears most frequently. + best_count = 0 + median_label = {} + for label, count in hist.items(): + if count > best_count: + best_count = count + median_label = {kv[0]: kv[1] for kv in label} + + return median_label + + + def __get_median_label_nonsymbolic(self, labels): + if len(labels) == 0: + return {} # @todo + else: + # Transform the labels into coordinates and compute mean label as initial solution. + labels_as_coords = [] + sums = {} + for key, val in labels[0].items(): + sums[key] = 0 + for label in labels: + coords = {} + for key, val in label.items(): + label_f = float(val) + sums[key] += label_f + coords[key] = label_f + labels_as_coords.append(coords) + median = {} + for key, val in sums.items(): + median[key] = val / len(labels) + + # Run main loop of Weiszfeld's Algorithm. + epsilon = 0.0001 + delta = 1.0 + num_itrs = 0 + all_equal = False + while ((delta > epsilon) and (num_itrs < 100) and (not all_equal)): + numerator = {} + for key, val in sums.items(): + numerator[key] = 0 + denominator = 0 + for label_as_coord in labels_as_coords: + norm = 0 + for key, val in label_as_coord.items(): + norm += (val - median[key]) ** 2 + norm = np.sqrt(norm) + if norm > 0: + for key, val in label_as_coord.items(): + numerator[key] += val / norm + denominator += 1.0 / norm + if denominator == 0: + all_equal = True + else: + new_median = {} + delta = 0.0 + for key, val in numerator.items(): + this_median = val / denominator + new_median[key] = this_median + delta += np.abs(median[key] - this_median) + median = new_median + + num_itrs += 1 + + # Transform the solution to strings and return it. + median_label = {} + for key, val in median.items(): + median_label[key] = str(val) + return median_label + + +# def __get_median_edge_label_symbolic(self, edge_labels): +# pass + + +# def __get_median_edge_label_nonsymbolic(self, edge_labels): +# if len(edge_labels) == 0: +# return {} +# else: +# # Transform the labels into coordinates and compute mean label as initial solution. 
+# edge_labels_as_coords = [] +# sums = {} +# for key, val in edge_labels[0].items(): +# sums[key] = 0 +# for edge_label in edge_labels: +# coords = {} +# for key, val in edge_label.items(): +# label = float(val) +# sums[key] += label +# coords[key] = label +# edge_labels_as_coords.append(coords) +# median = {} +# for key, val in sums.items(): +# median[key] = val / len(edge_labels) +# +# # Run main loop of Weiszfeld's Algorithm. +# epsilon = 0.0001 +# delta = 1.0 +# num_itrs = 0 +# all_equal = False +# while ((delta > epsilon) and (num_itrs < 100) and (not all_equal)): +# numerator = {} +# for key, val in sums.items(): +# numerator[key] = 0 +# denominator = 0 +# for edge_label_as_coord in edge_labels_as_coords: +# norm = 0 +# for key, val in edge_label_as_coord.items(): +# norm += (val - median[key]) ** 2 +# norm += np.sqrt(norm) +# if norm > 0: +# for key, val in edge_label_as_coord.items(): +# numerator[key] += val / norm +# denominator += 1.0 / norm +# if denominator == 0: +# all_equal = True +# else: +# new_median = {} +# delta = 0.0 +# for key, val in numerator.items(): +# this_median = val / denominator +# new_median[key] = this_median +# delta += np.abs(median[key] - this_median) +# median = new_median +# +# num_itrs += 1 +# +# # Transform the solution to ged::GXLLabel and return it. +# median_label = {} +# for key, val in median.items(): +# median_label[key] = str(val) +# return median_label + + +def _compute_medoid_parallel(graph_ids, sort, itr): + g_id = itr[0] + i = itr[1] + # @todo: timer not considered here. +# if timer.expired(): +# self.__state = AlgorithmState.CALLED +# break + nb_nodes_g = G_ged_env.get_graph_num_nodes(g_id) + sum_of_distances = 0 + for h_id in graph_ids: + nb_nodes_h = G_ged_env.get_graph_num_nodes(h_id) + if nb_nodes_g <= nb_nodes_h or not sort: + G_ged_env.run_method(g_id, h_id) + sum_of_distances += G_ged_env.get_upper_bound(g_id, h_id) + else: + G_ged_env.run_method(h_id, g_id) + sum_of_distances += G_ged_env.get_upper_bound(h_id, g_id) + return i, sum_of_distances + + +def _compute_init_node_maps_parallel(gen_median_id, sort, nb_nodes_median, itr): + graph_id = itr + nb_nodes_g = G_ged_env.get_graph_num_nodes(graph_id) + if nb_nodes_median <= nb_nodes_g or not sort: + G_ged_env.run_method(gen_median_id, graph_id) + node_map = G_ged_env.get_node_map(gen_median_id, graph_id) +# print(self.__node_maps_from_median[graph_id]) + else: + G_ged_env.run_method(graph_id, gen_median_id) + node_map = G_ged_env.get_node_map(graph_id, gen_median_id) + node_map.forward_map, node_map.backward_map = node_map.backward_map, node_map.forward_map + sum_of_distance = node_map.induced_cost() +# print(self.__sum_of_distances) + return graph_id, sum_of_distance, node_map + + +def _update_node_maps_parallel(median_id, epsilon, sort, nb_nodes_median, itr): + graph_id = itr[0] + node_map = itr[1] + + node_maps_were_modified = False + nb_nodes_g = G_ged_env.get_graph_num_nodes(graph_id) + if nb_nodes_median <= nb_nodes_g or not sort: + G_ged_env.run_method(median_id, graph_id) + if G_ged_env.get_upper_bound(median_id, graph_id) < node_map.induced_cost() - epsilon: + node_map = G_ged_env.get_node_map(median_id, graph_id) + node_maps_were_modified = True + else: + G_ged_env.run_method(graph_id, median_id) + if G_ged_env.get_upper_bound(graph_id, median_id) < node_map.induced_cost() - epsilon: + node_map = G_ged_env.get_node_map(graph_id, median_id) + node_map.forward_map, node_map.backward_map = node_map.backward_map, node_map.forward_map + node_maps_were_modified = True + + 
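+    # Return the triple consumed by MedianGraphEstimator.__update_node_maps:
+    # the graph ID, the (possibly recomputed) node map oriented from the median
+    # to the graph, and a flag telling the parent whether the stored map was
+    # replaced.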
return graph_id, node_map, node_maps_were_modified \ No newline at end of file diff --git a/lang/fr/gklearn/ged/median/test_median_graph_estimator.py b/lang/fr/gklearn/ged/median/test_median_graph_estimator.py new file mode 100644 index 0000000000..60bce83260 --- /dev/null +++ b/lang/fr/gklearn/ged/median/test_median_graph_estimator.py @@ -0,0 +1,159 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Mon Mar 16 17:26:40 2020 + +@author: ljia +""" + +def test_median_graph_estimator(): + from gklearn.utils import load_dataset + from gklearn.ged.median import MedianGraphEstimator, constant_node_costs + from gklearn.gedlib import librariesImport, gedlibpy + from gklearn.preimage.utils import get_same_item_indices + import multiprocessing + + # estimator parameters. + init_type = 'MEDOID' + num_inits = 1 + threads = multiprocessing.cpu_count() + time_limit = 60000 + + # algorithm parameters. + algo = 'IPFP' + initial_solutions = 1 + algo_options_suffix = ' --initial-solutions ' + str(initial_solutions) + ' --ratio-runs-from-initial-solutions 1 --initialization-method NODE ' + + edit_cost_name = 'LETTER2' + edit_cost_constants = [0.02987291, 0.0178211, 0.01431966, 0.001, 0.001] + ds_name = 'Letter_high' + + # Load dataset. + # dataset = '../../datasets/COIL-DEL/COIL-DEL_A.txt' + dataset = '../../../datasets/Letter-high/Letter-high_A.txt' + Gn, y_all, label_names = load_dataset(dataset) + y_idx = get_same_item_indices(y_all) + for i, (y, values) in enumerate(y_idx.items()): + Gn_i = [Gn[val] for val in values] + break + + # Set up the environment. + ged_env = gedlibpy.GEDEnv() + # gedlibpy.restart_env() + ged_env.set_edit_cost(edit_cost_name, edit_cost_constant=edit_cost_constants) + for G in Gn_i: + ged_env.add_nx_graph(G, '') + graph_ids = ged_env.get_all_graph_ids() + set_median_id = ged_env.add_graph('set_median') + gen_median_id = ged_env.add_graph('gen_median') + ged_env.init(init_option='EAGER_WITHOUT_SHUFFLED_COPIES') + + # Set up the estimator. + mge = MedianGraphEstimator(ged_env, constant_node_costs(edit_cost_name)) + mge.set_refine_method(algo, '--threads ' + str(threads) + ' --initial-solutions ' + str(initial_solutions) + ' --ratio-runs-from-initial-solutions 1') + + mge_options = '--time-limit ' + str(time_limit) + ' --stdout 2 --init-type ' + init_type + mge_options += ' --random-inits ' + str(num_inits) + ' --seed ' + '1' + ' --update-order TRUE --refine FALSE --randomness PSEUDO --parallel TRUE '# @todo: std::to_string(rng()) + + # Select the GED algorithm. + algo_options = '--threads ' + str(threads) + algo_options_suffix + mge.set_options(mge_options) + mge.set_label_names(node_labels=label_names['node_labels'], + edge_labels=label_names['edge_labels'], + node_attrs=label_names['node_attrs'], + edge_attrs=label_names['edge_attrs']) + mge.set_init_method(algo, algo_options) + mge.set_descent_method(algo, algo_options) + + # Run the estimator. + mge.run(graph_ids, set_median_id, gen_median_id) + + # Get SODs. + sod_sm = mge.get_sum_of_distances('initialized') + sod_gm = mge.get_sum_of_distances('converged') + print('sod_sm, sod_gm: ', sod_sm, sod_gm) + + # Get median graphs. 
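+    # The medians are returned as NetworkX graphs, so they can be inspected or
+    # drawn directly; a minimal sketch (assuming matplotlib is installed):
+    #
+    #   import matplotlib.pyplot as plt
+    #   import networkx as nx
+    #   nx.draw(ged_env.get_nx_graph(gen_median_id), with_labels=True)
+    #   plt.show()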
+ set_median = ged_env.get_nx_graph(set_median_id) + gen_median = ged_env.get_nx_graph(gen_median_id) + + return set_median, gen_median + + +def test_median_graph_estimator_symb(): + from gklearn.utils import load_dataset + from gklearn.ged.median import MedianGraphEstimator, constant_node_costs + from gklearn.gedlib import librariesImport, gedlibpy + from gklearn.preimage.utils import get_same_item_indices + import multiprocessing + + # estimator parameters. + init_type = 'MEDOID' + num_inits = 1 + threads = multiprocessing.cpu_count() + time_limit = 60000 + + # algorithm parameters. + algo = 'IPFP' + initial_solutions = 1 + algo_options_suffix = ' --initial-solutions ' + str(initial_solutions) + ' --ratio-runs-from-initial-solutions 1 --initialization-method NODE ' + + edit_cost_name = 'CONSTANT' + edit_cost_constants = [4, 4, 2, 1, 1, 1] + ds_name = 'MUTAG' + + # Load dataset. + dataset = '../../../datasets/MUTAG/MUTAG_A.txt' + Gn, y_all, label_names = load_dataset(dataset) + y_idx = get_same_item_indices(y_all) + for i, (y, values) in enumerate(y_idx.items()): + Gn_i = [Gn[val] for val in values] + break + Gn_i = Gn_i[0:10] + + # Set up the environment. + ged_env = gedlibpy.GEDEnv() + # gedlibpy.restart_env() + ged_env.set_edit_cost(edit_cost_name, edit_cost_constant=edit_cost_constants) + for G in Gn_i: + ged_env.add_nx_graph(G, '') + graph_ids = ged_env.get_all_graph_ids() + set_median_id = ged_env.add_graph('set_median') + gen_median_id = ged_env.add_graph('gen_median') + ged_env.init(init_option='EAGER_WITHOUT_SHUFFLED_COPIES') + + # Set up the estimator. + mge = MedianGraphEstimator(ged_env, constant_node_costs(edit_cost_name)) + mge.set_refine_method(algo, '--threads ' + str(threads) + ' --initial-solutions ' + str(initial_solutions) + ' --ratio-runs-from-initial-solutions 1') + + mge_options = '--time-limit ' + str(time_limit) + ' --stdout 2 --init-type ' + init_type + mge_options += ' --random-inits ' + str(num_inits) + ' --seed ' + '1' + ' --update-order TRUE --refine FALSE --randomness PSEUDO --parallel TRUE '# @todo: std::to_string(rng()) + + # Select the GED algorithm. + algo_options = '--threads ' + str(threads) + algo_options_suffix + mge.set_options(mge_options) + mge.set_label_names(node_labels=label_names['node_labels'], + edge_labels=label_names['edge_labels'], + node_attrs=label_names['node_attrs'], + edge_attrs=label_names['edge_attrs']) + mge.set_init_method(algo, algo_options) + mge.set_descent_method(algo, algo_options) + + # Run the estimator. + mge.run(graph_ids, set_median_id, gen_median_id) + + # Get SODs. + sod_sm = mge.get_sum_of_distances('initialized') + sod_gm = mge.get_sum_of_distances('converged') + print('sod_sm, sod_gm: ', sod_sm, sod_gm) + + # Get median graphs. 
+ set_median = ged_env.get_nx_graph(set_median_id) + gen_median = ged_env.get_nx_graph(gen_median_id) + + return set_median, gen_median + + +if __name__ == '__main__': + # set_median, gen_median = test_median_graph_estimator() + set_median, gen_median = test_median_graph_estimator_symb() \ No newline at end of file diff --git a/lang/fr/gklearn/ged/median/utils.py b/lang/fr/gklearn/ged/median/utils.py new file mode 100644 index 0000000000..d27c86da51 --- /dev/null +++ b/lang/fr/gklearn/ged/median/utils.py @@ -0,0 +1,63 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Wed Apr 1 15:12:31 2020 + +@author: ljia +""" + +def constant_node_costs(edit_cost_name): + if edit_cost_name == 'NON_SYMBOLIC' or edit_cost_name == 'LETTER2' or edit_cost_name == 'LETTER': + return False + elif edit_cost_name == 'CONSTANT': + return True + else: + raise Exception('Can not recognize the given edit cost. Possible edit costs include: "NON_SYMBOLIC", "LETTER", "LETTER2", "CONSTANT".') +# elif edit_cost_name != '': +# # throw ged::Error("Invalid dataset " + dataset + ". Usage: ./median_tests "); +# return False + # return True + + +def mge_options_to_string(options): + opt_str = ' ' + for key, val in options.items(): + if key == 'init_type': + opt_str += '--init-type ' + str(val) + ' ' + elif key == 'random_inits': + opt_str += '--random-inits ' + str(val) + ' ' + elif key == 'randomness': + opt_str += '--randomness ' + str(val) + ' ' + elif key == 'verbose': + opt_str += '--stdout ' + str(val) + ' ' + elif key == 'parallel': + opt_str += '--parallel ' + ('TRUE' if val else 'FALSE') + ' ' + elif key == 'update_order': + opt_str += '--update-order ' + ('TRUE' if val else 'FALSE') + ' ' + elif key == 'sort_graphs': + opt_str += '--sort-graphs ' + ('TRUE' if val else 'FALSE') + ' ' + elif key == 'refine': + opt_str += '--refine ' + ('TRUE' if val else 'FALSE') + ' ' + elif key == 'time_limit': + opt_str += '--time-limit ' + str(val) + ' ' + elif key == 'max_itrs': + opt_str += '--max-itrs ' + str(val) + ' ' + elif key == 'max_itrs_without_update': + opt_str += '--max-itrs-without-update ' + str(val) + ' ' + elif key == 'seed': + opt_str += '--seed ' + str(val) + ' ' + elif key == 'epsilon': + opt_str += '--epsilon ' + str(val) + ' ' + elif key == 'inits_increase_order': + opt_str += '--inits-increase-order ' + str(val) + ' ' + elif key == 'init_type_increase_order': + opt_str += '--init-type-increase-order ' + str(val) + ' ' + elif key == 'max_itrs_increase_order': + opt_str += '--max-itrs-increase-order ' + str(val) + ' ' +# else: +# valid_options = '[--init-type ] [--random_inits ] [--randomness ] [--seed ] [--verbose ] ' +# valid_options += '[--time_limit ] [--max_itrs ] [--epsilon ] ' +# valid_options += '[--inits_increase_order ] [--init_type_increase_order ] [--max_itrs_increase_order ]' +# raise Exception('Invalid option "' + key + '". 
Options available = "' + valid_options + '"') + + return opt_str \ No newline at end of file diff --git a/lang/fr/gklearn/ged/methods/__init__.py b/lang/fr/gklearn/ged/methods/__init__.py new file mode 100644 index 0000000000..5879b9c54e --- /dev/null +++ b/lang/fr/gklearn/ged/methods/__init__.py @@ -0,0 +1,3 @@ +from gklearn.ged.methods.ged_method import GEDMethod +from gklearn.ged.methods.lsape_based_method import LSAPEBasedMethod +from gklearn.ged.methods.bipartite import Bipartite diff --git a/lang/fr/gklearn/ged/methods/bipartite.py b/lang/fr/gklearn/ged/methods/bipartite.py new file mode 100644 index 0000000000..aa295c4cba --- /dev/null +++ b/lang/fr/gklearn/ged/methods/bipartite.py @@ -0,0 +1,117 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Thu Jun 18 16:09:29 2020 + +@author: ljia +""" +import numpy as np +import networkx as nx +from gklearn.ged.methods import LSAPEBasedMethod +from gklearn.ged.util import LSAPESolver +from gklearn.utils import SpecialLabel + + +class Bipartite(LSAPEBasedMethod): + + + def __init__(self, ged_data): + super().__init__(ged_data) + self._compute_lower_bound = False + + + ########################################################################### + # Inherited member functions from LSAPEBasedMethod. + ########################################################################### + + + def _lsape_populate_instance(self, g, h, master_problem): + # #ifdef _OPENMP + for row_in_master in range(0, nx.number_of_nodes(g)): + for col_in_master in range(0, nx.number_of_nodes(h)): + master_problem[row_in_master, col_in_master] = self._compute_substitution_cost(g, h, row_in_master, col_in_master) + for row_in_master in range(0, nx.number_of_nodes(g)): + master_problem[row_in_master, nx.number_of_nodes(h) + row_in_master] = self._compute_deletion_cost(g, row_in_master) + for col_in_master in range(0, nx.number_of_nodes(h)): + master_problem[nx.number_of_nodes(g) + col_in_master, col_in_master] = self._compute_insertion_cost(h, col_in_master) + +# for row_in_master in range(0, master_problem.shape[0]): +# for col_in_master in range(0, master_problem.shape[1]): +# if row_in_master < nx.number_of_nodes(g) and col_in_master < nx.number_of_nodes(h): +# master_problem[row_in_master, col_in_master] = self._compute_substitution_cost(g, h, row_in_master, col_in_master) +# elif row_in_master < nx.number_of_nodes(g): +# master_problem[row_in_master, nx.number_of_nodes(h)] = self._compute_deletion_cost(g, row_in_master) +# elif col_in_master < nx.number_of_nodes(h): +# master_problem[nx.number_of_nodes(g), col_in_master] = self._compute_insertion_cost(h, col_in_master) + + + ########################################################################### + # Helper member functions. + ########################################################################### + + + def _compute_substitution_cost(self, g, h, u, v): + # Collect node substitution costs. + cost = self._ged_data.node_cost(g.nodes[u]['label'], h.nodes[v]['label']) + + # Initialize subproblem. + d1, d2 = g.degree[u], h.degree[v] + subproblem = np.ones((d1 + d2, d1 + d2)) * np.inf + subproblem[d1:, d2:] = 0 +# subproblem = np.empty((g.degree[u] + 1, h.degree[v] + 1)) + + # Collect edge deletion costs. + i = 0 # @todo: should directed graphs be considered? 
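+ # Layout of the (d1 + d2) x (d1 + d2) subproblem: the top-left d1 x d2 + # block holds edge substitution costs, entry (i, d2 + i) the deletion cost + # of the i-th edge at u, entry (d1 + i, i) the insertion cost of the i-th + # edge at v; the remaining d2 x d1 block is zero (set above).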
+ for label in g[u].values(): # iterate over all edges incident to u + subproblem[i, d2 + i] = self._ged_data.edge_cost(label['label'], SpecialLabel.DUMMY) +# subproblem[i, h.degree[v]] = self._ged_data.edge_cost(label['label'], SpecialLabel.DUMMY) + i += 1 + + # Collect edge insertion costs. + i = 0 # @todo: should directed graphs be considered? + for label in h[v].values(): # iterate over all edges incident to v + subproblem[d1 + i, i] = self._ged_data.edge_cost(SpecialLabel.DUMMY, label['label']) +# subproblem[g.degree[u], i] = self._ged_data.edge_cost(SpecialLabel.DUMMY, label['label']) + i += 1 + + # Collect edge relabelling costs. + i = 0 + for label1 in g[u].values(): + j = 0 + for label2 in h[v].values(): + subproblem[i, j] = self._ged_data.edge_cost(label1['label'], label2['label']) + j += 1 + i += 1 + + # Solve subproblem. + subproblem_solver = LSAPESolver(subproblem) + subproblem_solver.set_model(self._lsape_model) + subproblem_solver.solve() + + # Update and return overall substitution cost. + cost += subproblem_solver.minimal_cost() + return cost + + + def _compute_deletion_cost(self, g, v): + # Collect node deletion cost. + cost = self._ged_data.node_cost(g.nodes[v]['label'], SpecialLabel.DUMMY) + + # Collect edge deletion costs. + for label in g[v].values(): + cost += self._ged_data.edge_cost(label['label'], SpecialLabel.DUMMY) + + # Return overall deletion cost. + return cost + + + def _compute_insertion_cost(self, g, v): + # Collect node insertion cost. + cost = self._ged_data.node_cost(SpecialLabel.DUMMY, g.nodes[v]['label']) + + # Collect edge insertion costs. + for label in g[v].values(): + cost += self._ged_data.edge_cost(SpecialLabel.DUMMY, label['label']) + + # Return overall insertion cost. + return cost \ No newline at end of file diff --git a/lang/fr/gklearn/ged/methods/ged_method.py b/lang/fr/gklearn/ged/methods/ged_method.py new file mode 100644 index 0000000000..aecd16b5e2 --- /dev/null +++ b/lang/fr/gklearn/ged/methods/ged_method.py @@ -0,0 +1,195 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Thu Jun 18 15:52:35 2020 + +@author: ljia +""" +import numpy as np +import time +import networkx as nx + + +class GEDMethod(object): + + + def __init__(self, ged_data): + self._initialized = False + self._ged_data = ged_data + self._options = None + self._lower_bound = 0 + self._upper_bound = np.inf + self._node_map = [0, 0] # @todo + self._runtime = None + self._init_time = None + + + def init(self): + """Initializes the method with options specified by set_options(). + """ + start = time.time() + self._ged_init() + end = time.time() + self._init_time = end - start + self._initialized = True + + + def set_options(self, options): + """ + /*! + * @brief Sets the options of the method. + * @param[in] options String of the form [--<option> <arg>] [...], where @p option contains neither spaces nor single quotes, + * and @p arg contains neither spaces nor single quotes or is of the form '[--<sub-option> <sub-arg>] [...]', + * where both @p sub-option and @p sub-arg contain neither spaces nor single quotes. + */ + """ + self._ged_set_default_options() + for key, val in options.items(): + if not self._ged_parse_option(key, val): + raise Exception('Invalid option "' + key + '". Usage: options = "' + self._ged_valid_options_string() + '".') # @todo: not implemented. + self._initialized = False + + + def run(self, g_id, h_id): + """ + /*! + * @brief Runs the method with options specified by set_options(). + * @param[in] g_id ID of input graph. + * @param[in] h_id ID of input graph.
+ */ + """ + start = time.time() + result = self.run_as_util(self._ged_data._graphs[g_id], self._ged_data._graphs[h_id]) + end = time.time() + self._lower_bound = result['lower_bound'] + self._upper_bound = result['upper_bound'] + if len(result['node_maps']) > 0: + self._node_map = result['node_maps'][0] + self._runtime = end - start + + + def run_as_util(self, g, h): + """ + /*! + * @brief Runs the method with options specified by set_options(). + * @param[in] g Input graph. + * @param[in] h Input graph. + * @param[out] result Result variable. + */ + """ + # Compute optimal solution and return if at least one of the two graphs is empty. + if nx.number_of_nodes(g) == 0 or nx.number_of_nodes(h) == 0: + print('This is not implemented.') + pass # @todo: + + # Run the method. + return self._ged_run(g, h) + + + def get_upper_bound(self): + """ + /*! + * @brief Returns an upper bound. + * @return Upper bound for graph edit distance provided by last call to run() or -1 if the method does not yield an upper bound. + */ + """ + return self._upper_bound + + + def get_lower_bound(self): + """ + /*! + * @brief Returns a lower bound. + * @return Lower bound for graph edit distance provided by last call to run() or -1 if the method does not yield a lower bound. + */ + """ + return self._lower_bound + + + def get_runtime(self): + """ + /*! + * @brief Returns the runtime. + * @return Runtime of last call to run() in seconds. + */ + """ + return self._runtime + + + def get_init_time(self): + """ + /*! + * @brief Returns the initialization time. + * @return Runtime of last call to init() in seconds. + */ + """ + return self._init_time + + + def get_node_map(self): + """ + /*! + * @brief Returns a graph matching. + * @return Constant reference to graph matching provided by last call to run() or to an empty matching if the method does not yield a matching. + */ + """ + return self._node_map + + + def _ged_init(self): + """ + /*! + * @brief Initializes the method. + * @note Must be overridden by derived classes that require initialization. + */ + """ + pass + + + def _ged_parse_option(self, option, arg): + """ + /*! + * @brief Parses one option. + * @param[in] option The name of the option. + * @param[in] arg The argument of the option. + * @return Boolean @p true if @p option is a valid option name for the method and @p false otherwise. + * @note Must be overridden by derived classes that have options. + */ + """ + return False + + + def _ged_run(self, g, h): + """ + /*! + * @brief Runs the method with options specified by set_options(). + * @param[in] g Input graph. + * @param[in] h Input graph. + * @param[out] result Result variable. + * @note Must be overridden by derived classes. + */ + """ + return {} + + + + def _ged_valid_options_string(self): + """ + /*! + * @brief Returns string of all valid options. + * @return String of the form [--@ @] [...]. + * @note Must be overridden by derived classes that have options. + */ + """ + return '' + + + def _ged_set_default_options(self): + """ + /*! + * @brief Sets all options to default values. + * @note Must be overridden by derived classes that have options. 
+ */ + """ + pass + \ No newline at end of file diff --git a/lang/fr/gklearn/ged/methods/lsape_based_method.py b/lang/fr/gklearn/ged/methods/lsape_based_method.py new file mode 100644 index 0000000000..79f7b9c662 --- /dev/null +++ b/lang/fr/gklearn/ged/methods/lsape_based_method.py @@ -0,0 +1,254 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Thu Jun 18 16:01:24 2020 + +@author: ljia +""" +import numpy as np +import networkx as nx +from gklearn.ged.methods import GEDMethod +from gklearn.ged.util import LSAPESolver, misc +from gklearn.ged.env import NodeMap + + +class LSAPEBasedMethod(GEDMethod): + + + def __init__(self, ged_data): + super().__init__(ged_data) + self._lsape_model = None # @todo: LSAPESolver::ECBP + self._greedy_method = None # @todo: LSAPESolver::BASIC + self._compute_lower_bound = True + self._solve_optimally = True + self._num_threads = 1 + self._centrality_method = 'NODE' # @todo + self._centrality_weight = 0.7 + self._centralities = {} + self._max_num_solutions = 1 + + + def populate_instance_and_run_as_util(self, g, h): #, lsape_instance): + """ + /*! + * @brief Runs the method with options specified by set_options() and provides access to constructed LSAPE instance. + * @param[in] g Input graph. + * @param[in] h Input graph. + * @param[out] result Result variable. + * @param[out] lsape_instance LSAPE instance. + */ + """ + result = {'node_maps': [], 'lower_bound': 0, 'upper_bound': np.inf} + + # Populate the LSAPE instance and set up the solver. + nb1, nb2 = nx.number_of_nodes(g), nx.number_of_nodes(h) + lsape_instance = np.ones((nb1 + nb2, nb1 + nb2)) * np.inf +# lsape_instance = np.empty((nx.number_of_nodes(g) + 1, nx.number_of_nodes(h) + 1)) + self.populate_instance(g, h, lsape_instance) + +# nb1, nb2 = nx.number_of_nodes(g), nx.number_of_nodes(h) +# lsape_instance_new = np.empty((nb1 + nb2, nb1 + nb2)) * np.inf +# lsape_instance_new[nb1:, nb2:] = 0 +# lsape_instance_new[0:nb1, 0:nb2] = lsape_instance[0:nb1, 0:nb2] +# for i in range(nb1): # all u's neighbor +# lsape_instance_new[i, nb2 + i] = lsape_instance[i, nb2] +# for i in range(nb2): # all u's neighbor +# lsape_instance_new[nb1 + i, i] = lsape_instance[nb2, i] +# lsape_solver = LSAPESolver(lsape_instance_new) + + lsape_solver = LSAPESolver(lsape_instance) + + # Solve the LSAPE instance. + if self._solve_optimally: + lsape_solver.set_model(self._lsape_model) + else: + lsape_solver.set_greedy_method(self._greedy_method) + lsape_solver.solve(self._max_num_solutions) + + # Compute and store lower and upper bound. + if self._compute_lower_bound and self._solve_optimally: + result['lower_bound'] = lsape_solver.minimal_cost() * self._lsape_lower_bound_scaling_factor(g, h) # @todo: test + + for solution_id in range(0, lsape_solver.num_solutions()): + result['node_maps'].append(NodeMap(nx.number_of_nodes(g), nx.number_of_nodes(h))) + misc.construct_node_map_from_solver(lsape_solver, result['node_maps'][-1], solution_id) + self._ged_data.compute_induced_cost(g, h, result['node_maps'][-1]) + + # Add centralities and reoptimize. + if self._centrality_weight > 0 and self._centrality_method != 'NODE': + print('This is not implemented.') + pass # @todo + + # Sort the node maps and set the upper bound. 
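+ # Sorting and pruning of multiple node maps is left unimplemented here; + # the upper bound is the induced edit cost of the first node map.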
+ if len(result['node_maps']) > 1 or len(result['node_maps']) > self._max_num_solutions: + print('This is not implemented.') # @todo: + pass + if len(result['node_maps']) == 0: + result['upper_bound'] = np.inf + else: + result['upper_bound'] = result['node_maps'][0].induced_cost() + + return result + + + + def populate_instance(self, g, h, lsape_instance): + """ + /*! + * @brief Populates the LSAPE instance. + * @param[in] g Input graph. + * @param[in] h Input graph. + * @param[out] lsape_instance LSAPE instance. + */ + """ + if not self._initialized: + pass + # @todo: if (not this->initialized_) { + self._lsape_populate_instance(g, h, lsape_instance) + lsape_instance[nx.number_of_nodes(g):, nx.number_of_nodes(h):] = 0 +# lsape_instance[nx.number_of_nodes(g), nx.number_of_nodes(h)] = 0 + + + ########################################################################### + # Member functions inherited from GEDMethod. + ########################################################################### + + + def _ged_init(self): + self._lsape_pre_graph_init(False) + for graph in self._ged_data._graphs: + self._init_graph(graph) + self._lsape_init() + + + def _ged_run(self, g, h): +# lsape_instance = np.empty((0, 0)) + result = self.populate_instance_and_run_as_util(g, h) # , lsape_instance) + return result + + + def _ged_parse_option(self, option, arg): + is_valid_option = False + + if option == 'threads': # @todo: try.. catch... + self._num_threads = arg + is_valid_option = True + elif option == 'lsape_model': + self._lsape_model = arg # @todo + is_valid_option = True + elif option == 'greedy_method': + self._greedy_method = arg # @todo + is_valid_option = True + elif option == 'optimal': + self._solve_optimally = arg # @todo + is_valid_option = True + elif option == 'centrality_method': + self._centrality_method = arg # @todo + is_valid_option = True + elif option == 'centrality_weight': + self._centrality_weight = arg # @todo + is_valid_option = True + elif option == 'max_num_solutions': + if arg == 'ALL': + self._max_num_solutions = -1 + else: + self._max_num_solutions = arg # @todo + is_valid_option = True + + is_valid_option = is_valid_option or self._lsape_parse_option(option, arg) + is_valid_option = True # @todo: this is not in the C++ code. + return is_valid_option + + + def _ged_set_default_options(self): + self._lsape_model = None # @todo: LSAPESolver::ECBP + self._greedy_method = None # @todo: LSAPESolver::BASIC + self._solve_optimally = True + self._num_threads = 1 + self._centrality_method = 'NODE' # @todo + self._centrality_weight = 0.7 + self._max_num_solutions = 1 + + + ########################################################################### + # Private helper member functions. + ########################################################################### + + + def _init_graph(self, graph): + if self._centrality_method != 'NODE': + self._init_centralities(graph) # @todo + self._lsape_init_graph(graph) + + + ########################################################################### + # Virtual member functions to be overridden by derived classes. + ########################################################################### + + + def _lsape_init(self): + """ + /*! + * @brief Initializes the method after initializing the global variables for the graphs. + * @note Must be overridden by derived classes of ged::LSAPEBasedMethod that require custom initialization. + */ + """ + pass + + + def _lsape_parse_option(self, option, arg): + """ + /*! 
+ * @brief Parses one option that is not among the ones shared by all derived classes of ged::LSAPEBasedMethod. + * @param[in] option The name of the option. + * @param[in] arg The argument of the option. + * @return Returns true if @p option is a valid option name for the method and false otherwise. + * @note Must be overridden by derived classes of ged::LSAPEBasedMethod that have options that are not among the ones shared by all derived classes of ged::LSAPEBasedMethod. + */ + """ + return False + + + def _lsape_set_default_options(self): + """ + /*! + * @brief Sets all options that are not among the ones shared by all derived classes of ged::LSAPEBasedMethod to default values. + * @note Must be overridden by derived classes of ged::LSAPEBasedMethod that have options that are not among the ones shared by all derived classes of ged::LSAPEBasedMethod. + */ + """ + pass + + + def _lsape_populate_instance(self, g, h, lsape_instance): + """ + /*! + * @brief Populates the LSAPE instance. + * @param[in] g Input graph. + * @param[in] h Input graph. + * @param[out] lsape_instance LSAPE instance of size (n + 1) x (m + 1), where n and m are the number of nodes in @p g and @p h. The last row and the last column represent insertion and deletion. + * @note Must be overridden by derived classes of ged::LSAPEBasedMethod. + */ + """ + pass + + + def _lsape_init_graph(self, graph): + """ + /*! + * @brief Initializes global variables for one graph. + * @param[in] graph Graph for which the global variables have to be initialized. + * @note Must be overridden by derived classes of ged::LSAPEBasedMethod that require to initialize custom global variables. + */ + """ + pass + + + def _lsape_pre_graph_init(self, called_at_runtime): + """ + /*! + * @brief Initializes the method at runtime or during initialization before initializing the global variables for the graphs. + * @param[in] called_at_runtime Equals @p true if called at runtime and @p false if called during initialization. + * @brief Must be overridden by derived classes of ged::LSAPEBasedMethod that require default initialization at runtime before initializing the global variables for the graphs. 
+ */ + """ + pass \ No newline at end of file diff --git a/lang/fr/gklearn/ged/util/__init__.py b/lang/fr/gklearn/ged/util/__init__.py new file mode 100644 index 0000000000..f885b181a7 --- /dev/null +++ b/lang/fr/gklearn/ged/util/__init__.py @@ -0,0 +1,3 @@ +from gklearn.ged.util.lsape_solver import LSAPESolver +from gklearn.ged.util.util import compute_geds, ged_options_to_string +from gklearn.ged.util.util import compute_geds_cml, label_costs_to_matrix diff --git a/lang/fr/gklearn/ged/util/cpp2python.py b/lang/fr/gklearn/ged/util/cpp2python.py new file mode 100644 index 0000000000..9d63026dec --- /dev/null +++ b/lang/fr/gklearn/ged/util/cpp2python.py @@ -0,0 +1,134 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Fri Mar 20 11:09:04 2020 + +@author: ljia +""" +import re + +def convert_function(cpp_code): +# f_cpp = open('cpp_code.cpp', 'r') +# # f_cpp = open('cpp_ext/src/median_graph_estimator.ipp', 'r') +# cpp_code = f_cpp.read() + python_code = cpp_code.replace('else if (', 'elif ') + python_code = python_code.replace('if (', 'if ') + python_code = python_code.replace('else {', 'else:') + python_code = python_code.replace(') {', ':') + python_code = python_code.replace(';\n', '\n') + python_code = re.sub('\n(.*)}\n', '\n\n', python_code) + # python_code = python_code.replace('}\n', '') + python_code = python_code.replace('throw', 'raise') + python_code = python_code.replace('error', 'Exception') + python_code = python_code.replace('"', '\'') + python_code = python_code.replace('\\\'', '"') + python_code = python_code.replace('try {', 'try:') + python_code = python_code.replace('true', 'True') + python_code = python_code.replace('false', 'False') + python_code = python_code.replace('catch (...', 'except') + # python_code = re.sub('std::string\(\'(.*)\'\)', '$1', python_code) + + return python_code + + + +# # python_code = python_code.replace('}\n', '') + + + + +# python_code = python_code.replace('option.first', 'opt_name') +# python_code = python_code.replace('option.second', 'opt_val') +# python_code = python_code.replace('ged::Error', 'Exception') +# python_code = python_code.replace('std::string(\'Invalid argument "\')', '\'Invalid argument "\'') + + +# f_cpp.close() +# f_python = open('python_code.py', 'w') +# f_python.write(python_code) +# f_python.close() + + +def convert_function_comment(cpp_fun_cmt, param_types): + cpp_fun_cmt = cpp_fun_cmt.replace('\t', '') + cpp_fun_cmt = cpp_fun_cmt.replace('\n * ', ' ') + # split the input comment according to key words. + param_split = None + note = None + cmt_split = cpp_fun_cmt.split('@brief')[1] + brief = cmt_split + if '@param' in cmt_split: + cmt_split = cmt_split.split('@param') + brief = cmt_split[0] + param_split = cmt_split[1:] + if '@note' in cmt_split[-1]: + note_split = cmt_split[-1].split('@note') + if param_split is not None: + param_split.pop() + param_split.append(note_split[0]) + else: + brief = note_split[0] + note = note_split[1] + + # get parameters. + if param_split is not None: + for idx, param in enumerate(param_split): + _, param_name, param_desc = param.split(' ', 2) + param_name = function_comment_strip(param_name, ' *\n\t/') + param_desc = function_comment_strip(param_desc, ' *\n\t/') + param_split[idx] = (param_name, param_desc) + + # strip comments. + brief = function_comment_strip(brief, ' *\n\t/') + if note is not None: + note = function_comment_strip(note, ' *\n\t/') + + # construct the Python function comment. 
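+ # The target layout is a numpydoc-style docstring: the brief first, then + # a 'Parameters' section with one 'name : type' entry per parameter, and + # an optional 'Note' section.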
+ python_fun_cmt = '"""' + python_fun_cmt += brief + '\n' + if param_split is not None and len(param_split) > 0: + python_fun_cmt += '\nParameters\n----------' + for idx, param in enumerate(param_split): + python_fun_cmt += '\n' + param[0] + ' : ' + param_types[idx] + python_fun_cmt += '\n\t' + param[1] + '\n' + if note is not None: + python_fun_cmt += '\nNote\n----\n' + note + '\n' + python_fun_cmt += '"""' + + return python_fun_cmt + + +def function_comment_strip(comment, bad_chars): + head_removed, tail_removed = False, False + while not head_removed or not tail_removed: + if comment[0] in bad_chars: + comment = comment[1:] + head_removed = False + else: + head_removed = True + if comment[-1] in bad_chars: + comment = comment[:-1] + tail_removed = False + else: + tail_removed = True + + return comment + + +if __name__ == '__main__': +# python_code = convert_function(""" +# if (print_to_stdout_ == 2) { +# std::cout << "\n===========================================================\n"; +# std::cout << "Block gradient descent for initial median " << median_pos + 1 << " of " << medians.size() << ".\n"; +# std::cout << "-----------------------------------------------------------\n"; +# } +# """) + + + python_fun_cmt = convert_function_comment(""" + /*! + * @brief Returns the sum of distances. + * @param[in] state The state of the estimator. + * @return The sum of distances of the median when the estimator was in the state @p state during the last call to run(). + */ + """, ['string', 'string']) \ No newline at end of file diff --git a/lang/fr/gklearn/ged/util/cpp_code.cpp b/lang/fr/gklearn/ged/util/cpp_code.cpp new file mode 100644 index 0000000000..acbe22a1f6 --- /dev/null +++ b/lang/fr/gklearn/ged/util/cpp_code.cpp @@ -0,0 +1,122 @@ + else if (option.first == "random-inits") { + try { + num_random_inits_ = std::stoul(option.second); + desired_num_random_inits_ = num_random_inits_; + } + catch (...) { + throw Error(std::string("Invalid argument \"") + option.second + "\" for option random-inits. Usage: options = \"[--random-inits ]\""); + } + if (num_random_inits_ <= 0) { + throw Error(std::string("Invalid argument \"") + option.second + "\" for option random-inits. Usage: options = \"[--random-inits ]\""); + } + } + else if (option.first == "randomness") { + if (option.second == "PSEUDO") { + use_real_randomness_ = false; + } + else if (option.second == "REAL") { + use_real_randomness_ = true; + } + else { + throw Error(std::string("Invalid argument \"") + option.second + "\" for option randomness. Usage: options = \"[--randomness REAL|PSEUDO] [...]\""); + } + } + else if (option.first == "stdout") { + if (option.second == "0") { + print_to_stdout_ = 0; + } + else if (option.second == "1") { + print_to_stdout_ = 1; + } + else if (option.second == "2") { + print_to_stdout_ = 2; + } + else { + throw Error(std::string("Invalid argument \"") + option.second + "\" for option stdout. Usage: options = \"[--stdout 0|1|2] [...]\""); + } + } + else if (option.first == "refine") { + if (option.second == "TRUE") { + refine_ = true; + } + else if (option.second == "FALSE") { + refine_ = false; + } + else { + throw Error(std::string("Invalid argument \"") + option.second + "\" for option refine. Usage: options = \"[--refine TRUE|FALSE] [...]\""); + } + } + else if (option.first == "time-limit") { + try { + time_limit_in_sec_ = std::stod(option.second); + } + catch (...) { + throw Error(std::string("Invalid argument \"") + option.second + "\" for option time-limit. 
Usage: options = \"[--time-limit ] [...]"); + } + } + else if (option.first == "max-itrs") { + try { + max_itrs_ = std::stoi(option.second); + } + catch (...) { + throw Error(std::string("Invalid argument \"") + option.second + "\" for option max-itrs. Usage: options = \"[--max-itrs ] [...]"); + } + } + else if (option.first == "max-itrs-without-update") { + try { + max_itrs_without_update_ = std::stoi(option.second); + } + catch (...) { + throw Error(std::string("Invalid argument \"") + option.second + "\" for option max-itrs-without-update. Usage: options = \"[--max-itrs-without-update ] [...]"); + } + } + else if (option.first == "seed") { + try { + seed_ = std::stoul(option.second); + } + catch (...) { + throw Error(std::string("Invalid argument \"") + option.second + "\" for option seed. Usage: options = \"[--seed ] [...]"); + } + } + else if (option.first == "epsilon") { + try { + epsilon_ = std::stod(option.second); + } + catch (...) { + throw Error(std::string("Invalid argument \"") + option.second + "\" for option epsilon. Usage: options = \"[--epsilon ] [...]"); + } + if (epsilon_ <= 0) { + throw Error(std::string("Invalid argument \"") + option.second + "\" for option epsilon. Usage: options = \"[--epsilon ] [...]"); + } + } + else if (option.first == "inits-increase-order") { + try { + num_inits_increase_order_ = std::stoul(option.second); + } + catch (...) { + throw Error(std::string("Invalid argument \"") + option.second + "\" for option inits-increase-order. Usage: options = \"[--inits-increase-order ]\""); + } + if (num_inits_increase_order_ <= 0) { + throw Error(std::string("Invalid argument \"") + option.second + "\" for option inits-increase-order. Usage: options = \"[--inits-increase-order ]\""); + } + } + else if (option.first == "init-type-increase-order") { + init_type_increase_order_ = option.second; + if (option.second != "CLUSTERS" and option.second != "K-MEANS++") { + throw ged::Error(std::string("Invalid argument ") + option.second + " for option init-type-increase-order. Usage: options = \"[--init-type-increase-order CLUSTERS|K-MEANS++] [...]\""); + } + } + else if (option.first == "max-itrs-increase-order") { + try { + max_itrs_increase_order_ = std::stoi(option.second); + } + catch (...) { + throw Error(std::string("Invalid argument \"") + option.second + "\" for option max-itrs-increase-order. Usage: options = \"[--max-itrs-increase-order ] [...]"); + } + } + else { + std::string valid_options("[--init-type ] [--random-inits ] [--randomness ] [--seed ] [--stdout ] "); + valid_options += "[--time-limit ] [--max-itrs ] [--epsilon ] "; + valid_options += "[--inits-increase-order ] [--init-type-increase-order ] [--max-itrs-increase-order ]"; + throw Error(std::string("Invalid option \"") + option.first + "\". Usage: options = \"" + valid_options + "\""); + } diff --git a/lang/fr/gklearn/ged/util/lsape_solver.py b/lang/fr/gklearn/ged/util/lsape_solver.py new file mode 100644 index 0000000000..71739e7ef5 --- /dev/null +++ b/lang/fr/gklearn/ged/util/lsape_solver.py @@ -0,0 +1,122 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Mon Jun 22 15:37:36 2020 + +@author: ljia +""" +import numpy as np +from scipy.optimize import linear_sum_assignment + + +class LSAPESolver(object): + + + def __init__(self, cost_matrix=None): + """ + /*! + * @brief Constructs solver for LSAPE problem instance. + * @param[in] cost_matrix Pointer to the LSAPE problem instance that should be solved. 
+ */ + """ + self._cost_matrix = cost_matrix + self._model = 'ECBP' + self._greedy_method = 'BASIC' + self._solve_optimally = True + self._minimal_cost = 0 + self._row_to_col_assignments = [] + self._col_to_row_assignments = [] + self._dual_var_rows = [] # @todo + self._dual_var_cols = [] # @todo + + + def clear_solution(self): + """Clears a previously computed solution. + """ + self._minimal_cost = 0 + self._row_to_col_assignments.clear() + self._col_to_row_assignments.clear() + self._row_to_col_assignments.append([]) # @todo + self._col_to_row_assignments.append([]) + self._dual_var_rows = [] # @todo + self._dual_var_cols = [] # @todo + + + def set_model(self, model): + """ + /*! + * @brief Makes the solver use a specific model for optimal solving. + * @param[in] model The model that should be used. + */ + """ + self._solve_optimally = True + self._model = model + + + def solve(self, num_solutions=1): + """ + /*! + * @brief Solves the LSAPE problem instance. + * @param[in] num_solutions The maximal number of solutions that should be computed. + */ + """ + self.clear_solution() + if self._solve_optimally: + row_ind, col_ind = linear_sum_assignment(self._cost_matrix) # @todo: only hungarianLSAPE ('ECBP') can be used. + self._row_to_col_assignments[0] = col_ind + self._col_to_row_assignments[0] = np.argsort(col_ind) # @todo: might be slow, can use row_ind + self._compute_cost_from_assignments() + if num_solutions > 1: + pass # @todo: + else: + # Greedy (non-optimal) solving is not implemented yet. + pass # @todo: greedy. +# self._ + + + def minimal_cost(self): + """ + /*! + * @brief Returns the cost of the computed solutions. + * @return Cost of computed solutions. + */ + """ + return self._minimal_cost + + + def get_assigned_col(self, row, solution_id=0): + """ + /*! + * @brief Returns the assigned column. + * @param[in] row Row whose assigned column should be returned. + * @param[in] solution_id ID of the solution where the assignment should be looked up. + * @returns Column to which @p row is assigned in solution with ID @p solution_id or ged::undefined() if @p row is not assigned to any column. + */ + """ + return self._row_to_col_assignments[solution_id][row] + + + def get_assigned_row(self, col, solution_id=0): + """ + /*! + * @brief Returns the assigned row. + * @param[in] col Column whose assigned row should be returned. + * @param[in] solution_id ID of the solution where the assignment should be looked up. + * @returns Row to which @p col is assigned in solution with ID @p solution_id or ged::undefined() if @p col is not assigned to any row. + */ + """ + return self._col_to_row_assignments[solution_id][col] + + + def num_solutions(self): + """ + /*! + * @brief Returns the number of solutions. + * @returns Actual number of solutions computed by solve(). Might be smaller than @p num_solutions.
+ */ + """ + return len(self._row_to_col_assignments) + + + def _compute_cost_from_assignments(self): # @todo + self._minimal_cost = np.sum(self._cost_matrix[range(0, len(self._row_to_col_assignments[0])), self._row_to_col_assignments[0]]) \ No newline at end of file diff --git a/lang/fr/gklearn/ged/util/misc.py b/lang/fr/gklearn/ged/util/misc.py new file mode 100644 index 0000000000..457d2766a8 --- /dev/null +++ b/lang/fr/gklearn/ged/util/misc.py @@ -0,0 +1,129 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Thu Mar 19 18:13:56 2020 + +@author: ljia +""" +from gklearn.utils import dummy_node + + +def construct_node_map_from_solver(solver, node_map, solution_id): + node_map.clear() + num_nodes_g = node_map.num_source_nodes() + num_nodes_h = node_map.num_target_nodes() + + # add deletions and substitutions + for row in range(0, num_nodes_g): + col = solver.get_assigned_col(row, solution_id) + if col >= num_nodes_h: + node_map.add_assignment(row, dummy_node()) + else: + node_map.add_assignment(row, col) + + # insertions. + for col in range(0, num_nodes_h): + if solver.get_assigned_row(col, solution_id) >= num_nodes_g: + node_map.add_assignment(dummy_node(), col) + + +def options_string_to_options_map(options_string): + """Transforms an options string into an options map. + + Parameters + ---------- + options_string : string + Options string of the form "[--