Merge pull request #4178 from STEllAR-GROUP/module_checkpoint
Move checkpointing support to its own module
msimberg authored Nov 12, 2019
2 parents 3b0408f + 90be121 commit 96c989b
Showing 26 changed files with 437 additions and 155 deletions.
3 changes: 2 additions & 1 deletion docs/CMakeLists.txt
@@ -58,7 +58,6 @@ set(doxygen_dependencies
"${PROJECT_SOURCE_DIR}/hpx/lcos/when_some.hpp"
"${PROJECT_SOURCE_DIR}/hpx/lcos/wait_each.hpp"
"${PROJECT_SOURCE_DIR}/hpx/lcos/when_each.hpp"
"${PROJECT_SOURCE_DIR}/hpx/util/checkpoint.hpp"
"${PROJECT_SOURCE_DIR}/hpx/util/debugging.hpp"
"${PROJECT_SOURCE_DIR}/hpx/util/pack_traversal.hpp"
"${PROJECT_SOURCE_DIR}/hpx/util/pack_traversal_async.hpp"
@@ -260,6 +259,8 @@ create_symbolic_link("${PROJECT_SOURCE_DIR}/examples"
"${CMAKE_CURRENT_BINARY_DIR}/examples")
create_symbolic_link("${PROJECT_SOURCE_DIR}/tests"
"${CMAKE_CURRENT_BINARY_DIR}/tests")
create_symbolic_link("${PROJECT_SOURCE_DIR}/libs"
"${CMAKE_CURRENT_BINARY_DIR}/libs")

hpx_source_to_doxygen(hpx_autodoc
DEPENDENCIES ${doxygen_dependencies})
127 changes: 0 additions & 127 deletions docs/sphinx/manual/miscellaneous.rst
@@ -150,133 +150,6 @@ Utilities in |hpx|
In order to ease the burden of programming in |hpx| we have provided several
utilities to users. The following section documents those facilities.

.. _checkpoint:

Checkpoint
----------

A common need of users is to periodically back up an application. This practice
provides resiliency and potential restart points in code. We have developed the
concept of a ``checkpoint`` to support this use case.

Found in ``hpx/util/checkpoint.hpp``, ``checkpoint``\ s are defined as objects
which hold a serialized version of an object or set of objects at a particular
moment in time. This representation can be stored in memory for later use or it
can be written to disk for storage and/or recovery at a later point. In order to
create and fill this object with data we use a function called
``save_checkpoint``. In code the function looks like this::

hpx::future<hpx::util::checkpoint> hpx::util::save_checkpoint(a, b, c, ...);

``save_checkpoint`` takes arbitrary data containers such as int, double, float,
vector, and future and serializes them into a newly created ``checkpoint``
object. This function returns a ``future`` to a ``checkpoint`` containing the
data. Let us look at a simple use case below::

using hpx::util::checkpoint;
using hpx::util::save_checkpoint;

std::vector<int> vec{1,2,3,4,5};
hpx::future<checkpoint> f = save_checkpoint(vec);

Once the future is ready the checkpoint object will contain the ``vector``
``vec`` and its five elements.

It is also possible to modify the launch policy used by ``save_checkpoint``.
This is accomplished by passing a launch policy as the first argument. It is
important to note that passing ``hpx::launch::sync`` will cause
``save_checkpoint`` to return a ``checkpoint`` instead of a ``future`` to a
``checkpoint``. All other policies passed to ``save_checkpoint`` will return a
``future`` to a ``checkpoint``.

Sometimes ``checkpoint``\ s must be declared before they are used.
``save_checkpoint`` allows users to move pre-created ``checkpoint``\ s into the
function as long as they are the first container passed into the function (in
the case where a launch policy is used, the ``checkpoint`` must immediately
follow the launch policy). An example of these features can be found below:

.. literalinclude:: ../../tests/unit/util/checkpoint.cpp
:language: c++
:lines: 27-38

Now that we can create ``checkpoint``\ s we must be able to restore the
objects they contain into memory. This is accomplished by the function
``restore_checkpoint``. This function takes a ``checkpoint`` and fills its data
into the containers it is provided. It is important to remember that the
containers must be ordered in the same way they were placed into the
``checkpoint``. For clarity see the example below:

.. literalinclude:: ../../tests/unit/util/checkpoint.cpp
:language: c++
:lines: 41-49

The core utility of ``checkpoint`` is in its ability to make certain data
persistent. Often this means that the data needs to be stored in an object,
such as a file, for later use. For these cases we have provided two solutions:
stream operator overloads and access iterators.

We have created the two stream overloads
``operator<<`` and ``operator>>`` to stream data
out of and into ``checkpoint``. You can see an
example of the overloads in use below:

.. literalinclude:: ../../tests/unit/util/checkpoint.cpp
:language: c++
:lines: 176-186

This is the primary way to move data into and out of a ``checkpoint``. It is
important to note, however, that users should be cautious when using a stream
operator to load data and another function to remove it (or vice versa). Both
``operator<<`` and ``operator>>`` rely on a ``.write()`` and a ``.read()``
function respectively. In order to know how much data to read from the
``std::istream``, the ``operator<<`` will write the size of the ``checkpoint``
before writing the ``checkpoint`` data. Correspondingly, the ``operator>>`` will
read the size of the stored data before reading the data into a new instance of
``checkpoint``. As long as the user employs the ``operator<<`` and
``operator>>`` to stream the data this detail can be ignored.

.. important::

Be careful when mixing ``operator<<`` and ``operator>>`` with other
facilities to read and write to a ``checkpoint``. ``operator<<`` writes an
extra variable and ``operator>>`` reads this variable back separately. Used
together the user will not encounter any issues and can safely ignore this
detail.

Users may also move the data into and out of a ``checkpoint`` using the exposed
``.begin()`` and ``.end()`` iterators. An example of this use case is
illustrated below.

.. literalinclude:: ../../tests/unit/util/checkpoint.cpp
:language: c++
:lines: 129-150

Checkpointing Components
------------------------

``save_checkpoint`` and ``restore_checkpoint`` are also able to store components
inside ``checkpoint``\ s. This can be done in one of two ways. First, a client of
the component can be passed to ``save_checkpoint``. When the user wishes to
resurrect the component she can pass a client instance to ``restore_checkpoint``.

This technique is demonstrated below:

.. literalinclude:: ../../tests/unit/util/checkpoint.cpp
:language: c++
:lines: 143-144

The second way a user can save a component is by passing a ``shared_ptr`` to the
component to ``save_checkpoint``. This component can be resurrected by creating
a new instance of the component type and passing a ``shared_ptr`` to the new
instance to ``restore_checkpoint``. An example can be found below:

.. literalinclude:: ../../tests/unit/util/checkpoint.cpp
:language: c++
:lines: 113-126


.. _iostreams:

The |hpx| I/O-streams component
2 changes: 0 additions & 2 deletions examples/1d_stencil/CMakeLists.txt
@@ -10,7 +10,6 @@ set(example_programs
1d_stencil_2
1d_stencil_3
1d_stencil_4
1d_stencil_4_checkpoint
1d_stencil_4_parallel
1d_stencil_5
1d_stencil_6
@@ -42,7 +41,6 @@ set(1d_stencil_1_PARAMETERS THREADS_PER_LOCALITY 4)
set(1d_stencil_2_PARAMETERS THREADS_PER_LOCALITY 4)
set(1d_stencil_3_PARAMETERS THREADS_PER_LOCALITY 4)
set(1d_stencil_4_PARAMETERS THREADS_PER_LOCALITY 4)
set(1d_stencil_4_checkpoint_PARAMETERS THREADS_PER_LOCALITY 4)
set(1d_stencil_4_parallel_PARAMETERS THREADS_PER_LOCALITY 4)
set(1d_stencil_5_PARAMETERS THREADS_PER_LOCALITY 4)
set(1d_stencil_6_PARAMETERS THREADS_PER_LOCALITY 4)
8 changes: 5 additions & 3 deletions libs/CMakeLists.txt
@@ -16,6 +16,7 @@ set(HPX_LIBS
assertion
basic_execution
cache
checkpoint
collectives
compute
compute_cuda
@@ -104,6 +105,7 @@ foreach(lib ${HPX_LIBS})

set(MODULE_FORCE_LINKING_INCLUDES
"${MODULE_FORCE_LINKING_INCLUDES}#include <hpx/${lib}/force_linking.hpp>\n")

set(MODULE_FORCE_LINKING_CALLS
"${MODULE_FORCE_LINKING_CALLS}\n ${lib}::force_linking();")

@@ -114,9 +116,9 @@ foreach(lib ${HPX_LIBS})
endforeach()

configure_file(
"${PROJECT_SOURCE_DIR}/cmake/templates/modules.cpp.in"
"${CMAKE_BINARY_DIR}/libs/modules.cpp"
@ONLY)
"${PROJECT_SOURCE_DIR}/cmake/templates/modules.cpp.in"
"${CMAKE_BINARY_DIR}/libs/modules.cpp"
@ONLY)

configure_file(
"${PROJECT_SOURCE_DIR}/cmake/templates/config_defines_strings_modules.hpp.in"
1 change: 1 addition & 0 deletions libs/all_modules.rst
@@ -19,6 +19,7 @@ All modules
/libs/assertion/docs/index.rst
/libs/basic_execution/docs/index.rst
/libs/cache/docs/index.rst
/libs/checkpoint/docs/index.rst
/libs/collectives/docs/index.rst
/libs/compute/docs/index.rst
/libs/compute_cuda/docs/index.rst
34 changes: 34 additions & 0 deletions libs/checkpoint/CMakeLists.txt
@@ -0,0 +1,34 @@
# Copyright (c) 2019 The STE||AR-Group
#
# SPDX-License-Identifier: BSL-1.0
# Distributed under the Boost Software License, Version 1.0. (See accompanying
# file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

cmake_minimum_required(VERSION 3.3.2 FATAL_ERROR)

list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake")

# Default location is $HPX_ROOT/libs/checkpoint/include
set(checkpoint_headers
hpx/checkpoint/checkpoint.hpp
)

# Default location is $HPX_ROOT/libs/checkpoint/include_compatibility
set(checkpoint_compat_headers
hpx/util/checkpoint.hpp
)

set(checkpoint_sources)

include(HPX_AddModule)
add_hpx_module(checkpoint
COMPATIBILITY_HEADERS ON
DEPRECATION_WARNINGS
FORCE_LINKING_GEN
GLOBAL_HEADER_GEN ON
SOURCES ${checkpoint_sources}
HEADERS ${checkpoint_headers}
COMPAT_HEADERS ${checkpoint_compat_headers}
DEPENDENCIES hpx_serialization
CMAKE_SUBDIRS examples tests
)
16 changes: 16 additions & 0 deletions libs/checkpoint/README.rst
@@ -0,0 +1,16 @@

..
Copyright (c) 2019 The STE||AR-Group
SPDX-License-Identifier: BSL-1.0
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

==========
checkpoint
==========

This library is part of HPX.

Documentation can be found `here
<https://stellar-group.github.io/hpx/docs/sphinx/latest/html/libs/checkpoint/docs/index.html>`__.
136 changes: 136 additions & 0 deletions libs/checkpoint/docs/index.rst
@@ -0,0 +1,136 @@
..
Copyright (c) 2019 The STE||AR-Group
SPDX-License-Identifier: BSL-1.0
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

.. _libs_checkpoint:

==========
checkpoint
==========

A common need of users is to periodically back up an application. This practice
provides resiliency and potential restart points in code. We have developed the
concept of a ``checkpoint`` to support this use case.

Found in ``hpx/checkpoint/checkpoint.hpp`` (``hpx/util/checkpoint.hpp`` remains
available as a compatibility header), ``checkpoint``\ s are defined as objects
which hold a serialized version of an object or set of objects at a particular
moment in time. This representation can be stored in memory for later use or it
can be written to disk for storage and/or recovery at a later point. In order to
create and fill this object with data we use a function called
``save_checkpoint``. In code the function looks like this::

hpx::future<hpx::util::checkpoint> hpx::util::save_checkpoint(a, b, c, ...);

``save_checkpoint`` takes arbitrary data containers such as int, double, float,
vector, and future and serializes them into a newly created ``checkpoint``
object. This function returns a ``future`` to a ``checkpoint`` containing the
data. Let us look at a simple use case below::

using hpx::util::checkpoint;
using hpx::util::save_checkpoint;

std::vector<int> vec{1,2,3,4,5};
hpx::future<checkpoint> f = save_checkpoint(vec);

Once the future is ready the checkpoint object will contain the ``vector``
``vec`` and its five elements.

It is also possible to modify the launch policy used by ``save_checkpoint``.
This is accomplished by passing a launch policy as the first argument. It is
important to note that passing ``hpx::launch::sync`` will cause
``save_checkpoint`` to return a ``checkpoint`` instead of a ``future`` to a
``checkpoint``. All other policies passed to ``save_checkpoint`` will return a
``future`` to a ``checkpoint``.

Sometimes ``checkpoint``\ s must be declared before they are used.
``save_checkpoint`` allows users to move pre-created ``checkpoint``\ s into the
function as long as they are the first container passed into the function (in
the case where a launch policy is used, the ``checkpoint`` must immediately
follow the launch policy). An example of these features can be found below:

.. literalinclude:: ../../../../libs/tests/unit/checkpoint.cpp
:language: c++
:lines: 27-38

Now that we can create ``checkpoint``\ s we must be able to restore the
objects they contain into memory. This is accomplished by the function
``restore_checkpoint``. This function takes a ``checkpoint`` and fills its data
into the containers it is provided. It is important to remember that the
containers must be ordered in the same way they were placed into the
``checkpoint``. For clarity see the example below:

.. literalinclude:: ../../../../libs/tests/unit/checkpoint.cpp
:language: c++
:lines: 41-49
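
To see why the order matters, here is a minimal, self-contained sketch that
does not use HPX at all. The names ``toy_checkpoint``, ``toy_save`` and
``toy_restore`` are hypothetical stand-ins, and the sketch assumes trivially
copyable values appended to a flat byte buffer; the real ``checkpoint`` relies
on the HPX serialization layer instead. Because the buffer carries no type or
name information, values can only be recovered by reading them back in the
order in which they were written.

.. code-block:: c++

   #include <cassert>
   #include <cstddef>
   #include <cstring>
   #include <type_traits>
   #include <vector>

   // Hypothetical stand-in for a checkpoint: a flat byte buffer.
   struct toy_checkpoint
   {
       std::vector<char> bytes;
   };

   // Base case: nothing left to append.
   inline void toy_append(toy_checkpoint&) {}

   // Append the raw bytes of each value, in argument order.
   template <typename T, typename... Ts>
   void toy_append(toy_checkpoint& c, T const& first, Ts const&... rest)
   {
       static_assert(std::is_trivially_copyable<T>::value,
           "this sketch only handles trivially copyable types");
       char const* p = reinterpret_cast<char const*>(&first);
       c.bytes.insert(c.bytes.end(), p, p + sizeof(T));
       toy_append(c, rest...);
   }

   // Save: build a new buffer from the given values.
   template <typename... Ts>
   toy_checkpoint toy_save(Ts const&... values)
   {
       toy_checkpoint c;
       toy_append(c, values...);
       return c;
   }

   // Base case: nothing left to extract.
   inline void toy_extract(toy_checkpoint const&, std::size_t) {}

   // Copy bytes back into each target, advancing the offset as we go.
   template <typename T, typename... Ts>
   void toy_extract(toy_checkpoint const& c, std::size_t offset, T& first,
       Ts&... rest)
   {
       assert(offset + sizeof(T) <= c.bytes.size());
       std::memcpy(&first, c.bytes.data() + offset, sizeof(T));
       toy_extract(c, offset + sizeof(T), rest...);
   }

   // Restore: targets must be listed in the same order used for saving.
   template <typename... Ts>
   void toy_restore(toy_checkpoint const& c, Ts&... targets)
   {
       toy_extract(c, 0, targets...);
   }

Saving an ``int`` followed by a ``double`` and restoring into an ``int``
followed by a ``double`` recovers both values. Swapping the targets would
reinterpret the wrong bytes for each of them.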

The core utility of ``checkpoint`` is in its ability to make certain data
persistent. Often this means that the data needs to be stored in an object,
such as a file, for later use. For these cases we have provided two solutions:
stream operator overloads and access iterators.

We have created the two stream overloads
``operator<<`` and ``operator>>`` to stream data
out of and into ``checkpoint``. You can see an
example of the overloads in use below:

.. literalinclude:: ../../../../libs/tests/unit/checkpoint.cpp
:language: c++
:lines: 176-186

This is the primary way to move data into and out of a ``checkpoint``. It is
important to note, however, that users should be cautious when using a stream
operator to load data and another function to remove it (or vice versa). Both
``operator<<`` and ``operator>>`` rely on a ``.write()`` and a ``.read()``
function respectively. In order to know how much data to read from the
``std::istream``, the ``operator<<`` will write the size of the ``checkpoint``
before writing the ``checkpoint`` data. Correspondingly, the ``operator>>`` will
read the size of the stored data before reading the data into a new instance of
``checkpoint``. As long as the user employs the ``operator<<`` and
``operator>>`` to stream the data this detail can be ignored.

.. important::

Be careful when mixing ``operator<<`` and ``operator>>`` with other
facilities to read and write to a ``checkpoint``. ``operator<<`` writes an
extra variable and ``operator>>`` reads this variable back separately. Used
together the user will not encounter any issues and can safely ignore this
detail.
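
The size prefix described above can be sketched with the standard library
alone. ``write_framed`` and ``read_framed`` are hypothetical helpers, and the
8-byte ``std::int64_t`` header is an assumption made for illustration rather
than a statement about the exact format HPX writes.

.. code-block:: c++

   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <istream>
   #include <ostream>
   #include <sstream>
   #include <vector>

   // Write a fixed-width size header, then the payload bytes.
   inline void write_framed(std::ostream& os, std::vector<char> const& data)
   {
       std::int64_t const size = static_cast<std::int64_t>(data.size());
       os.write(reinterpret_cast<char const*>(&size), sizeof(size));
       os.write(data.data(), static_cast<std::streamsize>(data.size()));
   }

   // Read the size header first, then exactly that many payload bytes.
   inline std::vector<char> read_framed(std::istream& is)
   {
       std::int64_t size = 0;
       is.read(reinterpret_cast<char*>(&size), sizeof(size));
       assert(size >= 0);
       std::vector<char> data(static_cast<std::size_t>(size));
       is.read(data.data(), static_cast<std::streamsize>(size));
       return data;
   }

A reader that skips the header, or a writer that omits it, shifts every byte
that follows, which is why the two operators should be used as a pair.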

Users may also move the data into and out of a ``checkpoint`` using the exposed
``.begin()`` and ``.end()`` iterators. An example of this use case is
illustrated below.

.. literalinclude:: ../../../../libs/tests/unit/checkpoint.cpp
:language: c++
:lines: 129-150
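
For comparison, an iterator-based copy moves only the payload bytes. In the
sketch below a ``std::vector<char>`` stands in for the range exposed by
``.begin()`` and ``.end()``. ``dump_raw`` and ``load_raw`` are hypothetical
names, and reading until the end of the stream is one possible way (an
assumption here) for the reader to find the length when no size header is
present.

.. code-block:: c++

   #include <algorithm>
   #include <cassert>
   #include <istream>
   #include <iterator>
   #include <ostream>
   #include <sstream>
   #include <vector>

   // Copy the bytes of the range to the stream; no size header is written.
   inline void dump_raw(std::ostream& os, std::vector<char> const& data)
   {
       assert(os.good());    // precondition: the stream is usable
       std::copy(data.begin(), data.end(), std::ostreambuf_iterator<char>(os));
   }

   // Read every remaining byte of the stream into a new buffer.
   inline std::vector<char> load_raw(std::istream& is)
   {
       return std::vector<char>(std::istreambuf_iterator<char>(is),
           std::istreambuf_iterator<char>());
   }

Data written this way contains no length information, so it should be read
back with a matching iterator-based routine rather than with a reader that
expects a size header.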

Checkpointing Components
------------------------

``save_checkpoint`` and ``restore_checkpoint`` are also able to store components
inside ``checkpoint``\ s. This can be done in one of two ways. First, a client of
the component can be passed to ``save_checkpoint``. When the user wishes to
resurrect the component she can pass a client instance to ``restore_checkpoint``.

This technique is demonstrated below:

.. literalinclude:: ../../../../libs/tests/unit/checkpoint.cpp
:language: c++
:lines: 143-144

The second way a user can save a component is by passing a ``shared_ptr`` to the
component to ``save_checkpoint``. This component can be resurrected by creating
a new instance of the component type and passing a ``shared_ptr`` to the new
instance to ``restore_checkpoint``. An example can be found below:

.. literalinclude:: ../../../../libs/tests/unit/checkpoint.cpp
:language: c++
:lines: 113-126


