Merge pull request #134 from bluescarni/pr/updates
heyoka updates
bluescarni authored Sep 6, 2023
2 parents d29c7b5 + 5bd8725 commit a7c87d5
Showing 23 changed files with 557 additions and 72 deletions.
11 changes: 2 additions & 9 deletions CMakeLists.txt
@@ -11,7 +11,7 @@ if(NOT CMAKE_BUILD_TYPE)
FORCE)
endif()

-project(heyoka.py VERSION 1.0.0 LANGUAGES CXX C)
+project(heyoka.py VERSION 2.0.0 LANGUAGES CXX C)

list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake" "${CMAKE_CURRENT_SOURCE_DIR}/cmake/yacma")

@@ -118,14 +118,7 @@ find_package(fmt REQUIRED CONFIG)
message(STATUS "fmt version: ${fmt_VERSION}")

# heyoka.
-# NOTE: put the minimum version in a variable
-# so that we can re-use it below.
-set(_HEYOKA_PY_MIN_HEYOKA_VERSION 1.0.0)
-find_package(heyoka REQUIRED CONFIG)
-if(${heyoka_VERSION} VERSION_LESS ${_HEYOKA_PY_MIN_HEYOKA_VERSION})
-message(FATAL_ERROR "The minimum heyoka version required by heyoka.py is ${_HEYOKA_PY_MIN_HEYOKA_VERSION}, but version ${heyoka_VERSION} was found instead.")
-endif()
-unset(_HEYOKA_PY_MIN_HEYOKA_VERSION)
+find_package(heyoka 2.0.0 REQUIRED CONFIG)

# Python.

1 change: 1 addition & 0 deletions doc/advanced_tutorials.rst
@@ -35,3 +35,4 @@ Advanced tutorials
notebooks/compiled_functions.ipynb
notebooks/ex_system_revisited.ipynb
notebooks/pickling.ipynb
notebooks/jit_caching.ipynb
18 changes: 18 additions & 0 deletions doc/changelog.rst
@@ -3,6 +3,24 @@
Changelog
=========

2.0.0 (unreleased)
------------------

New
~~~

- The LLVM SLP vectorizer can now be enabled
(`#134 <https://github.com/bluescarni/heyoka.py/pull/134>`__).
This feature is opt-in due to the fact that enabling it
can considerably increase JIT compilation times.
- Implement an in-memory cache for ``llvm_state``. The cache is used
to avoid re-optimising and re-compiling LLVM code which has
already been optimised and compiled during the program execution
(`#132 <https://github.com/bluescarni/heyoka.py/pull/132>`__).
- It is now possible to get the LLVM bitcode of
an ``llvm_state``
(`#132 <https://github.com/bluescarni/heyoka.py/pull/132>`__).

1.0.0 (2023-08-11)
------------------

2 changes: 1 addition & 1 deletion doc/install.rst
@@ -9,7 +9,7 @@ Dependencies
heyoka.py has several Python and C++ dependencies. On the C++ side, heyoka.py depends on:

* the `heyoka C++ library <https://github.com/bluescarni/heyoka>`__,
-version 1.0.0 or later (**mandatory**),
+version 2.0.x (**mandatory**),
* the `Boost <https://www.boost.org/>`__ C++ libraries (**mandatory**),
* the `{fmt} <https://fmt.dev/latest/index.html>`__ library (**mandatory**),
* the `TBB <https://github.com/oneapi-src/oneTBB>`__ library (**mandatory**),
2 changes: 1 addition & 1 deletion doc/notebooks/ex_system_revisited.ipynb
@@ -1304,7 +1304,7 @@
"id": "61038a9b-a0de-4e0f-87fe-d33abf0e98ef",
"metadata": {},
"source": [
-"When computing the gradient of a multivariate scalar function, ``diff_tensors()`` automatically selects reverse-mode symbolic differentiation. We can see how the reverse-mode AD algorithm collects the terms in nested binary multiplications which occur multiple times in the gradient's components. As a consequence, when creating a compiled function for the evaluation of ``grad_diff_tensors``, the repeated subexpressions are recognised by heyoka.py and evaluated only once. Let us check:"
+"When computing the gradient of a multivariate scalar function, ``diff_tensors()`` automatically selects reverse-mode symbolic differentiation. We can see how the reverse-mode AD algorithm collects the terms in nested binary multiplications which occur multiple times in the gradient's components. The flattening of these binary multiplications is prevented by the use of the ``fix()`` function. As a consequence, when creating a compiled function for the evaluation of ``grad_diff_tensors``, the repeated subexpressions are recognised by heyoka.py (via a process of [common subexpression elimination](https://en.wikipedia.org/wiki/Common_subexpression_elimination)) and evaluated only once. Let us check:"
]
},
{
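Note on the notebook text changed above: the common subexpression elimination it mentions can be illustrated with a small, generic sketch. This is not heyoka.py's implementation. The function name `eliminate_common_subexpressions` and the tuple-based expression encoding are assumptions made purely for illustration. The idea is that each distinct subexpression is emitted once, and later occurrences refer back to it by index.

```python
def eliminate_common_subexpressions(expr):
    """Flatten a nested expression into a list of unique operations.

    `expr` is either a string (a variable name) or a tuple of the form
    (op, arg1, arg2, ...). The returned `ops` list contains each distinct
    subexpression exactly once; its arguments are variable names (str)
    or integer indices of earlier entries in `ops`.
    """
    ops = []
    seen = {}  # already-emitted operation -> its index in `ops`

    def visit(node):
        if isinstance(node, str):
            return node
        # Rewrite the arguments first, so that identical subtrees
        # produce identical keys.
        key = (node[0],) + tuple(visit(arg) for arg in node[1:])
        if key not in seen:
            seen[key] = len(ops)
            ops.append(key)
        return seen[key]

    root = visit(expr)
    return ops, root


# x*y occurs twice in x*y + sin(x*y), but is emitted only once.
xy = ("mul", "x", "y")
ops, root = eliminate_common_subexpressions(("add", xy, ("sin", xy)))
for idx, op in enumerate(ops):
    print(idx, op)
```

In this toy encoding the product `x*y` ends up as a single entry that both the sine and the sum refer to, which is the effect the notebook text describes for the repeated terms in the gradient.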
317 changes: 317 additions & 0 deletions doc/notebooks/jit_caching.ipynb
@@ -0,0 +1,317 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "0ef9348f-9f03-4fb7-ae82-4d8fa6fc0b6b",
"metadata": {},
"source": [
"# JIT compilation and caching\n",
"\n",
"heyoka.py makes extensive use of [just-in-time (JIT)](https://en.wikipedia.org/wiki/Just-in-time_compilation) compilation techniques, implemented via the [LLVM](https://llvm.org/) compiler infrastructure. JIT compilation is used not only in the implementation of the [adaptive integrator](<./The adaptive integrator.ipynb>), but also in [compiled functions](<./compiled_functions.ipynb>) and in the implementation of [dense/continuous output](<./Dense output.ipynb>).\n",
"\n",
"JIT compilation can provide a noticeable performance boost with respect to the usual [ahead-of-time (AOT)](https://en.wikipedia.org/wiki/Ahead-of-time_compilation) compilation, because it takes advantage of all the features available on the target CPU. The downside is that JIT compilation is computationally expensive, and thus in some cases the compilation overhead can end up dominating the total runtime of the program.\n",
"\n",
"Starting from version 2.0.0, heyoka.py implements an in-memory cache that alleviates the JIT compilation overhead by avoiding re-compilation of code that has already been compiled during the program execution.\n",
"\n",
"Let us see the cache in action. We start off by timing the construction of an adaptive integrator:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "2e75df48-41e5-40b4-abb5-20ee1e09b93b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 36.2 ms, sys: 1.41 ms, total: 37.6 ms\n",
"Wall time: 37.7 ms\n"
]
}
],
"source": [
"import heyoka as hy\n",
"\n",
"%time ta = hy.taylor_adaptive(hy.model.pendulum(), [0., 1.])"
]
},
{
"cell_type": "markdown",
"id": "43aad996-41be-4001-acf3-9847fdae3138",
"metadata": {},
"source": [
"Now we construct again the **same** integrator, again with timing:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "64d9f312-51a6-4dcb-8c7c-f6756dba5f26",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 883 µs, sys: 786 µs, total: 1.67 ms\n",
"Wall time: 1.15 ms\n"
]
}
],
"source": [
"%time ta = hy.taylor_adaptive(hy.model.pendulum(), [0., 1.])"
]
},
{
"cell_type": "markdown",
"id": "de18befd-bfd0-45a1-9987-7b6af963be72",
"metadata": {},
"source": [
"We can see how the construction runtime has drastically decreased because heyoka.py cached the result of the compilation of the first integrator.\n",
"\n",
"Let us see another example, this time involving [continuous output](<./Dense output.ipynb>). We propagate the system for a very short timespan, and we ask for the continuous output function object via the ``c_output=True`` flag:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "868a518a-3f13-4921-b022-f473320a9e78",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 9.2 ms, sys: 0 ns, total: 9.2 ms\n",
"Wall time: 9.04 ms\n"
]
},
{
"data": {
"text/plain": [
"(<taylor_outcome.time_limit: -4294967299>,\n",
" inf,\n",
" 0.0,\n",
" 1,\n",
" Direction : forward\n",
" Time range: [0, 0.01)\n",
" N of steps: 1)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time ta.propagate_until(0.01, c_output=True)"
]
},
{
"cell_type": "markdown",
"id": "37b5e258-d129-47dd-bb02-d80983c73421",
"metadata": {},
"source": [
"We can see how such a short integration took several milliseconds. Indeed, most of the time has been spent in the compilation of the function for the evaluation of the continuous output, rather than in the numerical integration.\n",
"\n",
"Let us now repeat the same computation:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "003e6ad5-aece-4e37-b426-a5642e3bd203",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 819 µs, sys: 0 ns, total: 819 µs\n",
"Wall time: 374 µs\n"
]
},
{
"data": {
"text/plain": [
"(<taylor_outcome.time_limit: -4294967299>,\n",
" inf,\n",
" 0.0,\n",
" 1,\n",
" Direction : forward\n",
" Time range: [0, 0.01)\n",
" N of steps: 1)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Reset time and state.\n",
"ta.time = 0.\n",
"ta.state[:] = [0., 1.]\n",
"\n",
"%time ta.propagate_until(0.01, c_output=True)"
]
},
{
"cell_type": "markdown",
"id": "b1bda9be-cfcf-451b-b3c4-578ef0a09851",
"metadata": {},
"source": [
"We can see how the runtime has again drastically decreased thanks to the fact that the code for the evaluation of the continuous output had already been compiled earlier.\n",
"\n",
"Functions to query and interact with the cache are available as static methods of the ``llvm_state`` class. For instance, we can fetch the current cache size:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "42934aea-a979-4a26-b565-692b20f0936e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Current cache size: 115877 bytes'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f\"Current cache size: {hy.llvm_state.memcache_size} bytes\""
]
},
{
"cell_type": "markdown",
"id": "ea5783f5-b1c0-4779-a8c3-bf4fdcba87cb",
"metadata": {},
"source": [
"By default, the maximum cache size is set to 2GB:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "0afb70e5-92da-4ad6-a0d0-5f6531b34c22",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Current cache limit: 2147483648 bytes'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f\"Current cache limit: {hy.llvm_state.memcache_limit} bytes\""
]
},
{
"cell_type": "markdown",
"id": "1c659791-bdf1-416e-9707-97ec4a91e113",
"metadata": {},
"source": [
"If the cache size exceeds the limit, items in the cache are removed following a [least-recently-used (LRU)](https://en.wikipedia.org/wiki/Cache_replacement_policies) policy. The cache limit can be changed at will:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "7764bb83-a4ea-41eb-8d1d-2da8698cca3d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'New cache limit: 1048576 bytes'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Set the maximum cache size to 1MB.\n",
"hy.llvm_state.memcache_limit = 1024*1024\n",
"\n",
"f\"New cache limit: {hy.llvm_state.memcache_limit} bytes\""
]
},
{
"cell_type": "markdown",
"id": "a8833f88-c5f6-4b68-a677-c3c7099af3b8",
"metadata": {},
"source": [
"The cache can be cleared:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "685b17aa-ab1d-4c26-9d8e-f0b8a7479542",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Current cache size: 0 bytes'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Clear the cache.\n",
"hy.llvm_state.clear_memcache()\n",
"\n",
"f\"Current cache size: {hy.llvm_state.memcache_size} bytes\""
]
},
{
"cell_type": "markdown",
"id": "e5ee78a9-4e8c-4b84-b8b6-60794dfaa0c3",
"metadata": {},
"source": [
"All the methods and attributes to query and interact with the cache are thread-safe.\n",
"\n",
"Note that in multi-processing scenarios (e.g., in process-based [ensemble propagations](<./ensemble_mode.ipynb>)) each process gets its own cache, and thus any custom cache setup (e.g., changing the default cache limit) needs to be performed in each and every process."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
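Note on the notebook added above: the eviction behaviour it describes (a size limit in bytes, with least-recently-used removal once the limit is exceeded) can be sketched in pure Python. This is only an illustration under stated assumptions, not heyoka's actual cache. The class `ByteLimitedLRUCache` and its methods are hypothetical, loosely mirroring the `memcache_size`, `memcache_limit` and `clear_memcache()` names used in the notebook.

```python
from collections import OrderedDict


class ByteLimitedLRUCache:
    """Toy cache that evicts least-recently-used entries past a byte limit."""

    def __init__(self, limit):
        self.limit = limit           # maximum total size in bytes
        self.size = 0                # current total size in bytes
        self._items = OrderedDict()  # key -> bytes, least recently used first

    def get(self, key):
        if key not in self._items:
            return None
        # A successful lookup marks the entry as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, blob):
        if key in self._items:
            self.size -= len(self._items.pop(key))
        self._items[key] = blob
        self.size += len(blob)
        # Evict from the least-recently-used end until within the limit.
        while self.size > self.limit and self._items:
            _, evicted = self._items.popitem(last=False)
            self.size -= len(evicted)

    def clear(self):
        self._items.clear()
        self.size = 0


cache = ByteLimitedLRUCache(limit=10)
cache.put("a", b"12345")
cache.put("b", b"12345")
cache.get("a")          # "a" becomes the most recently used entry
cache.put("c", b"123")  # 13 bytes would exceed the limit, so "b" is evicted
print(cache.size)
```

The `OrderedDict` keeps entries in usage order, so eviction only needs to pop from the front. In this sketch an entry larger than the limit is itself evicted immediately; whether heyoka behaves the same way is not stated in the notebook.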
1 change: 1 addition & 0 deletions heyoka/CMakeLists.txt
@@ -24,6 +24,7 @@ set(HEYOKA_PY_PYTHON_FILES
_test_scalar_integrator.py
_test_batch_integrator.py
_test_ensemble.py
_test_memcache.py
model/__init__.py
)
