Merge pull request #134 from bluescarni/pr/updates

heyoka updates
bluescarni · Sep 6, 2023 · a7c87d5 · a7c87d5
2 parents d29c7b5 + 5bd8725
commit a7c87d5
Show file tree

Hide file tree

Showing 23 changed files with 557 additions and 72 deletions.
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -11,7 +11,7 @@ if(NOT CMAKE_BUILD_TYPE)
 	FORCE)
 endif()
 
-project(heyoka.py VERSION 1.0.0 LANGUAGES CXX C)
+project(heyoka.py VERSION 2.0.0 LANGUAGES CXX C)
 
 list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake" "${CMAKE_CURRENT_SOURCE_DIR}/cmake/yacma")
 
@@ -118,14 +118,7 @@ find_package(fmt REQUIRED CONFIG)
 message(STATUS "fmt version: ${fmt_VERSION}")
 
 # heyoka.
-# NOTE: put the minimum version in a variable
-# so that we can re-use it below.
-set(_HEYOKA_PY_MIN_HEYOKA_VERSION 1.0.0)
-find_package(heyoka REQUIRED CONFIG)
-if(${heyoka_VERSION} VERSION_LESS ${_HEYOKA_PY_MIN_HEYOKA_VERSION})
-    message(FATAL_ERROR "The minimum heyoka version required by heyoka.py is ${_HEYOKA_PY_MIN_HEYOKA_VERSION}, but version ${heyoka_VERSION} was found instead.")
-endif()
-unset(_HEYOKA_PY_MIN_HEYOKA_VERSION)
+find_package(heyoka 2.0.0 REQUIRED CONFIG)
 
 # Python.
 

diff --git a/doc/advanced_tutorials.rst b/doc/advanced_tutorials.rst
@@ -35,3 +35,4 @@ Advanced tutorials
   notebooks/compiled_functions.ipynb
   notebooks/ex_system_revisited.ipynb
   notebooks/pickling.ipynb
+  notebooks/jit_caching.ipynb
diff --git a/doc/changelog.rst b/doc/changelog.rst
@@ -3,6 +3,24 @@
 Changelog
 =========
 
+2.0.0 (unreleased)
+------------------
+
+New
+~~~
+
+- The LLVM SLP vectorizer can now be enabled
+  (`#134 <https://github.com/bluescarni/heyoka.py/pull/134>`__).
+  This feature is opt-in due to the fact that enabling it
+  can considerably increase JIT compilation times.
+- Implement an in-memory cache for ``llvm_state``. The cache is used
+  to avoid re-optimising and re-compiling LLVM code which has
+  already been optimised and compiled during the program execution
+  (`#132 <https://github.com/bluescarni/heyoka.py/pull/132>`__).
+- It is now possible to get the LLVM bitcode of
+  an ``llvm_state``
+  (`#132 <https://github.com/bluescarni/heyoka.py/pull/132>`__).
+
 1.0.0 (2023-08-11)
 ------------------
 

diff --git a/doc/install.rst b/doc/install.rst
@@ -9,7 +9,7 @@ Dependencies
 heyoka.py has several Python and C++ dependencies. On the C++ side, heyoka.py depends on:
 
 * the `heyoka C++ library <https://github.com/bluescarni/heyoka>`__,
-  version 1.0.0 or later (**mandatory**),
+  version 2.0.x (**mandatory**),
 * the `Boost <https://www.boost.org/>`__ C++ libraries (**mandatory**),
 * the `{fmt} <https://fmt.dev/latest/index.html>`__ library (**mandatory**),
 * the `TBB <https://github.com/oneapi-src/oneTBB>`__ library (**mandatory**),

diff --git a/doc/notebooks/ex_system_revisited.ipynb b/doc/notebooks/ex_system_revisited.ipynb
@@ -1304,7 +1304,7 @@
    "id": "61038a9b-a0de-4e0f-87fe-d33abf0e98ef",
    "metadata": {},
    "source": [
-    "When computing the gradient of a multivariate scalar function, ``diff_tensors()`` automatically selects reverse-mode symbolic differentiation. We can see how the reverse-mode AD algorithm collects the terms in nested binary multiplications which occur multiple times in the gradient's components. As a consequence, when creating a compiled function for the evaluation of ``grad_diff_tensors``, the repeated subexpressions are recognised by heyoka.py and evaluated only once. Let us check:"
+    "When computing the gradient of a multivariate scalar function, ``diff_tensors()`` automatically selects reverse-mode symbolic differentiation. We can see how the reverse-mode AD algorithm collects the terms in nested binary multiplications which occur multiple times in the gradient's components. The flattening of these binary multiplications is prevented by the use of the ``fix()`` function. As a consequence, when creating a compiled function for the evaluation of ``grad_diff_tensors``, the repeated subexpressions are recognised by heyoka.py (via a process of [common subexpression elimination](https://en.wikipedia.org/wiki/Common_subexpression_elimination)) and evaluated only once. Let us check:"
    ]
   },
   {

diff --git a/doc/notebooks/jit_caching.ipynb b/doc/notebooks/jit_caching.ipynb
@@ -0,0 +1,317 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "0ef9348f-9f03-4fb7-ae82-4d8fa6fc0b6b",
+   "metadata": {},
+   "source": [
+    "# JIT compilation and caching\n",
+    "\n",
+    "heyoka.py makes extensive use of [just-in-time (JIT)](https://en.wikipedia.org/wiki/Just-in-time_compilation) compilation techniques, implemented via the [LLVM](https://llvm.org/) compiler infrastructure. JIT compilation is used not only in the implementation of the [adaptive integrator](<./The adaptive integrator.ipynb>), but also in [compiled functions](<./compiled_functions.ipynb>) and in the implementation of [dense/continuous output](<./Dense output.ipynb>).\n",
+    "\n",
+    "JIT compilation can provide a noticeable performance boost with respect to the usual [ahead-of-time (AOT)](https://en.wikipedia.org/wiki/Ahead-of-time_compilation) compilation, because it takes advantage of all the features available on the target CPU. The downside is that JIT compilation is computationally expensive, and thus in some cases the compilation overhead can end up dominating the total runtime of the program.\n",
+    "\n",
+    "Starting from version 2.0.0, heyoka.py implements an in-memory cache that alleviates the JIT compilation overhead by avoiding re-compilation of code that has already been compiled during the program execution.\n",
+    "\n",
+    "Let us see the cache in action. We start off by timing the construction of an adaptive integrator:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "2e75df48-41e5-40b4-abb5-20ee1e09b93b",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "CPU times: user 36.2 ms, sys: 1.41 ms, total: 37.6 ms\n",
+      "Wall time: 37.7 ms\n"
+     ]
+    }
+   ],
+   "source": [
+    "import heyoka as hy\n",
+    "\n",
+    "%time ta = hy.taylor_adaptive(hy.model.pendulum(), [0., 1.])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "43aad996-41be-4001-acf3-9847fdae3138",
+   "metadata": {},
+   "source": [
+    "Now we construct again the **same** integrator, again with timing:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "64d9f312-51a6-4dcb-8c7c-f6756dba5f26",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "CPU times: user 883 µs, sys: 786 µs, total: 1.67 ms\n",
+      "Wall time: 1.15 ms\n"
+     ]
+    }
+   ],
+   "source": [
+    "%time ta = hy.taylor_adaptive(hy.model.pendulum(), [0., 1.])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "de18befd-bfd0-45a1-9987-7b6af963be72",
+   "metadata": {},
+   "source": [
+    "We can see how the construction runtime has drastically decreased because heyoka.py cached the result of the compilation of the first integrator.\n",
+    "\n",
+    "Let us see another example, this time involving [continuous output](<./Dense output.ipynb>). We propagate the system for a very short timespan, and we ask for the continuous output function object via the ``c_output=True`` flag:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "868a518a-3f13-4921-b022-f473320a9e78",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "CPU times: user 9.2 ms, sys: 0 ns, total: 9.2 ms\n",
+      "Wall time: 9.04 ms\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "(<taylor_outcome.time_limit: -4294967299>,\n",
+       " inf,\n",
+       " 0.0,\n",
+       " 1,\n",
+       " Direction : forward\n",
+       " Time range: [0, 0.01)\n",
+       " N of steps: 1)"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "%time ta.propagate_until(0.01, c_output=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "37b5e258-d129-47dd-bb02-d80983c73421",
+   "metadata": {},
+   "source": [
+    "We can see how such a short integration took several milliseconds. Indeed, most of the time has been spent in the compilation of the function for the evaluation of the continuous output, rather than in the numerical integration.\n",
+    "\n",
+    "Let us now repeat the same computation:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "003e6ad5-aece-4e37-b426-a5642e3bd203",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "CPU times: user 819 µs, sys: 0 ns, total: 819 µs\n",
+      "Wall time: 374 µs\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "(<taylor_outcome.time_limit: -4294967299>,\n",
+       " inf,\n",
+       " 0.0,\n",
+       " 1,\n",
+       " Direction : forward\n",
+       " Time range: [0, 0.01)\n",
+       " N of steps: 1)"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Reset time and state.\n",
+    "ta.time = 0.\n",
+    "ta.state[:] = [0., 1.]\n",
+    "\n",
+    "%time ta.propagate_until(0.01, c_output=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b1bda9be-cfcf-451b-b3c4-578ef0a09851",
+   "metadata": {},
+   "source": [
+    "We can see how the runtime has again drastically decreased thanks to the fact that the code for the evaluation of the continuous output had already been compiled earlier.\n",
+    "\n",
+    "Functions to query and interact with the cache are available as static methods of the ``llvm_state`` class. For instance, we can fetch the current cache size:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "42934aea-a979-4a26-b565-692b20f0936e",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'Current cache size: 115877 bytes'"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "f\"Current cache size: {hy.llvm_state.memcache_size} bytes\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ea5783f5-b1c0-4779-a8c3-bf4fdcba87cb",
+   "metadata": {},
+   "source": [
+    "By default, the maximum cache size is set to 2GB:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "0afb70e5-92da-4ad6-a0d0-5f6531b34c22",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'Current cache limit: 2147483648 bytes'"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "f\"Current cache limit: {hy.llvm_state.memcache_limit} bytes\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1c659791-bdf1-416e-9707-97ec4a91e113",
+   "metadata": {},
+   "source": [
+    "If the cache size exceeds the limit, items in the cache are removed following a [least-recently-used (LRU)](https://en.wikipedia.org/wiki/Cache_replacement_policies) policy. The cache limit can be changed at will:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "7764bb83-a4ea-41eb-8d1d-2da8698cca3d",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'New cache limit: 1048576 bytes'"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Set the maximum cache size to 1MB.\n",
+    "hy.llvm_state.memcache_limit = 1024*1024\n",
+    "\n",
+    "f\"New cache limit: {hy.llvm_state.memcache_limit} bytes\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a8833f88-c5f6-4b68-a677-c3c7099af3b8",
+   "metadata": {},
+   "source": [
+    "The cache can be cleared:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "685b17aa-ab1d-4c26-9d8e-f0b8a7479542",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "'Current cache size: 0 bytes'"
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Clear the cache.\n",
+    "hy.llvm_state.clear_memcache()\n",
+    "\n",
+    "f\"Current cache size: {hy.llvm_state.memcache_size} bytes\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e5ee78a9-4e8c-4b84-b8b6-60794dfaa0c3",
+   "metadata": {},
+   "source": [
+    "All the methods and attributes to query and interact with the cache are thread-safe.\n",
+    "\n",
+    "Note that in multi-processing scenarios (e.g., in process-based [ensemble propagations](<./ensemble_mode.ipynb>)) each process gets its own cache, and thus any custom cache setup (e.g., changing the default cache limit) needs to be performed in each and every process."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/heyoka/CMakeLists.txt b/heyoka/CMakeLists.txt
@@ -24,6 +24,7 @@ set(HEYOKA_PY_PYTHON_FILES
     _test_scalar_integrator.py
     _test_batch_integrator.py
     _test_ensemble.py
+    _test_memcache.py
     model/__init__.py
 )