Performance docs (#878)
* perf docs part 1

* performance docs

* make persistent sharrow cache the default

* performance tuning checklist

* skim-data-format

* recommend explicit chunking

* multithread defaults

* note on string columns in preprocessors

* update dev install docs

* address review comments

* adding memory profile plotting

* favicon for docs

* change atol for sharrow tests to 1e-6

* troubleshooting docs

* blacken

* add link to notebook

---------

Co-authored-by: David Hensle <[email protected]>
jpn-- and dhensle authored Jul 26, 2024
1 parent 67820ad commit d81f5f2
Showing 18 changed files with 190,454 additions and 24 deletions.
7 changes: 4 additions & 3 deletions activitysim/core/configuration/filesystem.py
@@ -105,8 +105,8 @@ def data_model_dirs_must_exist(cls, data_model_dir, values):
"""
Name of the output directory for sharrow cache files.
If not given, a directory named "__sharrowcache__" will be created inside
the general cache directory.
If not given, the sharrow cache is stored in a run-independent persistent
location, according to `platformdirs.user_cache_dir`. See `persist_sharrow_cache`.
"""

settings_file_name: str = "settings.yaml"
@@ -395,7 +395,8 @@ def get_sharrow_cache_dir(self) -> Path:
Path
"""
if self.sharrow_cache_dir is None:
out = self.get_cache_dir("__sharrowcache__")
self.persist_sharrow_cache()
out = self.sharrow_cache_dir
else:
out = self.get_working_subdir(self.sharrow_cache_dir)
if not out.exists():
4 changes: 2 additions & 2 deletions activitysim/core/interaction_sample.py
@@ -359,7 +359,7 @@ def _interaction_sample(
),
interaction_utilities.values,
rtol=1e-2,
atol=0,
atol=1e-6,
err_msg="utility not aligned",
verbose=True,
)
@@ -370,7 +370,7 @@ def _interaction_sample(
interaction_utilities_sh.values,
interaction_utilities.values,
rtol=1e-2,
atol=0,
atol=1e-6,
)
)
_sh_util_miss1 = interaction_utilities_sh.values[
4 changes: 2 additions & 2 deletions activitysim/core/interaction_simulate.py
@@ -504,14 +504,14 @@ def to_series(x):
sh_util.reshape(utilities.values.shape),
utilities.values,
rtol=1e-2,
atol=0,
atol=1e-6,
err_msg="utility not aligned",
verbose=True,
)
except AssertionError as err:
print(err)
misses = np.where(
~np.isclose(sh_util, utilities.values, rtol=1e-2, atol=0)
~np.isclose(sh_util, utilities.values, rtol=1e-2, atol=1e-6)
)
_sh_util_miss1 = sh_util[tuple(m[0] for m in misses)]
_u_miss1 = utilities.values[tuple(m[0] for m in misses)]
6 changes: 4 additions & 2 deletions activitysim/core/simulate.py
@@ -787,13 +787,15 @@ def eval_utilities(
sh_util,
utilities.values,
rtol=1e-2,
atol=0,
atol=1e-6,
err_msg="utility not aligned",
verbose=True,
)
except AssertionError as err:
print(err)
misses = np.where(~np.isclose(sh_util, utilities.values, rtol=1e-2, atol=0))
misses = np.where(
~np.isclose(sh_util, utilities.values, rtol=1e-2, atol=1e-6)
)
_sh_util_miss1 = sh_util[tuple(m[0] for m in misses)]
_u_miss1 = utilities.values[tuple(m[0] for m in misses)]
_sh_util_miss1 - _u_miss1
Binary file added docs/_static/favicon.ico
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -175,7 +175,7 @@
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
# html_favicon = None
html_favicon = "favicon.ico"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
9 changes: 0 additions & 9 deletions docs/dev-guide/install.md
@@ -49,18 +49,9 @@ conda activate ./ASIM-ENV
git clone https://github.com/ActivitySim/sharrow.git
python -m pip install -e ./sharrow
git clone https://github.com/ActivitySim/activitysim.git
cd activitysim
git switch develop
cd ..
python -m pip install -e ./activitysim
```

```{note}
If the environment create step above fails due to a 404 missing error,
the main repository may not be up to date with these docs, try this instead:
https://raw.githubusercontent.com/camsys/activitysim/sharrow-black/conda-environments/activitysim-dev-base.yml
```

Note the above commands will create an environment with all the
necessary dependencies, clone both ActivitySim and sharrow from GitHub,
and `pip install` each of these libraries in editable mode, which
153 changes: 152 additions & 1 deletion docs/dev-guide/using-sharrow.md
@@ -17,6 +17,55 @@ multiprocessing mode after all the compilation for all model components is
complete.
```

### Top-Level Activation Options

Activating sharrow is done at the top level of the model settings file, typically
`settings.yaml`, by setting the `sharrow` configuration setting to `True`:

```yaml
sharrow: True
```
The default operation for sharrow is to attempt to use the sharrow compiler for
all model specifications, and to revert to the legacy pandas-based evaluation
if the sharrow compiler encounters a problem. Alternatively, the `sharrow`
setting can also be set to `require` or `test`. The `require` setting
will cause the model to simply fail if sharrow encounters a problem, which is
useful if the user is interested in ensuring maximum performance.
The `test` setting will run the model in a mode where both sharrow and the
legacy pandas-based evaluation are run on each model specification, and the
results are compared to ensure they are substantially identical. This is
useful for debugging and testing, but is not recommended for production runs
as it is much slower than running only one evaluation path or the other.

Testing is strongly recommended during model development, as it is possible
to write expressions that are valid in one evaluation mode but not the other.
This can happen if model data includes `NaN` values
(see [Performance Considerations](#performance-considerations)), or when
using arithmetic on logical values
(see [Arithmetic on Logical Values](#arithmetic-on-logical-values)).
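
The three activation modes described above can be illustrated with a small
sketch. This is not ActivitySim's actual dispatch code: `evaluate_spec` and its
callable arguments are hypothetical names used only to show the described
behavior, and the choice to return the sharrow result in `test` mode is an
assumption.

```python
def evaluate_spec(mode, sharrow_eval, legacy_eval, is_close):
    """Hypothetical dispatcher illustrating the `sharrow` setting values.

    mode is False, True, "require", or "test"; the other arguments are
    stand-in callables, not ActivitySim functions.
    """
    if mode is False:
        # sharrow disabled: only the legacy pandas-based path runs
        return legacy_eval()
    if mode == "test":
        # run both paths and insist that the results substantially agree
        fast, slow = sharrow_eval(), legacy_eval()
        if not is_close(fast, slow):
            raise AssertionError("utility not aligned")
        return fast  # assumption: which result is kept is not stated above
    try:
        return sharrow_eval()
    except Exception:
        if mode == "require":
            raise  # fail outright rather than run slowly
        return legacy_eval()  # mode True: fall back to the legacy evaluator


def broken_sharrow():
    raise RuntimeError("sharrow could not compile this expression")


def legacy():
    return 1.5


# With `sharrow: True`, a compiler problem falls back to the legacy result.
fallback_result = evaluate_spec(True, broken_sharrow, legacy, lambda a, b: a == b)
```

With `mode="require"`, the same call would re-raise the `RuntimeError` instead
of falling back.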

### Caching of Precompiled Functions

The first time you run a model with sharrow enabled, the compiler will run
and create a cache of compiled functions. This can take a long time, especially
for models with many components or complex utility specifications. However,
once the cache is created, subsequent runs of the model will be much faster.
By default, the cached functions are stored in a subdirectory of the
`platformdirs.user_cache_dir` directory, which is located in a platform-specific
location:

- Windows: `%USERPROFILE%\AppData\Local\ActivitySim\ActivitySim\Cache\...`
- MacOS: `~/Library/Caches/ActivitySim/...`
- Linux: `~/.cache/ActivitySim/...` or `$XDG_CACHE_HOME/ActivitySim/...`

The cache directory can be changed from this default location by setting the
[`sharrow_cache_dir`](activitysim.core.configuration.FileSystem.sharrow_cache_dir)
setting in the `settings.yaml` file. Note that if you change this setting and provide
a relative path, it will be interpreted as relative to the model working directory,
and cached functions may not carry over to other model runs unless copied there
by the user.
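
As a rough sketch of the resolution rule just described (not the actual
`FileSystem` implementation), the helper below treats an unset value as "use
the persistent default" and anchors a relative path at the working directory.
The function name and all example paths are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_sharrow_cache_dir(setting, working_dir, persistent_default):
    """Hypothetical helper: pick the directory for compiled-function caches."""
    if setting is None:
        # no setting given: use the run-independent persistent location
        return PurePosixPath(persistent_default)
    # a relative setting is interpreted relative to the model working directory
    return PurePosixPath(working_dir) / setting


# Illustrative paths only; real locations depend on the platform and user.
default_dir = resolve_sharrow_cache_dir(
    None, "/models/run1", "/home/user/.cache/ActivitySim"
)
relative_dir = resolve_sharrow_cache_dir(
    "sharrow_cache", "/models/run1", "/home/user/.cache/ActivitySim"
)
```

Here `relative_dir` lives inside `/models/run1`, so a second run in a different
working directory would start with an empty cache unless the files are copied.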

## Model Design Requirements

Activating the `sharrow` optimizations also requires using the new
@@ -231,6 +280,35 @@ such string operations won't appear in utility specifications at all, or if they
do appear, they are executed only once and stored in a temporary value for re-use
as needed.

A good approach to reduce string operations in model spec files is to convert
string columns to integer or categorical columns in preprocessors. This can
be done using the `map` method, which can be used to convert strings to integers,
for example:

`df['fuel_type'].map({'Gas': 1, 'Diesel': 2, 'Hybrid': 3}).fillna(-1).astype(int)`

Alternatively, data columns can be converted to categorical columns with well-defined
structures. Recent versions of sharrow have made significant improvements in
handling of unordered categorical values, allowing for the use of possibly
more intuitive categorical columns. For example, the fuel type column above
could instead be redefined as a categorical column with the following code:

`df['fuel_type'].astype(pd.CategoricalDtype(categories=['Gas', 'Diesel', 'Hybrid'], ordered=False))`

It is important that the categories are defined with the same set of values
in the same order, as any deviation from this will void the compiler cache
and cause the model specification to be recompiled. This means that using
`x.astype('category')` is not recommended, as the categories will be inferred
from the data and may not be consistent across multiple calls to the model
specification evaluator.

```{note}
Beginning with ActivitySim version 1.3, string-valued
columns created in preprocessors are converted to categorical columns automatically,
which means that ignoring encoding for string-valued outputs is equivalent to
using the `astype('category')` method, and is not recommended.
```
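
The difference between inferred and explicitly defined categories can be seen
in a short pandas sketch. The two series below are made-up stand-ins for the
same preprocessor column seen in two different batches of data; `FUEL_DTYPE` is
a hypothetical name.

```python
import pandas as pd

# One fixed definition, shared by every call that encodes this column.
FUEL_DTYPE = pd.CategoricalDtype(
    categories=["Gas", "Diesel", "Hybrid"], ordered=False
)

# Made-up data: the same column as seen in two different batches.
batch_a = pd.Series(["Gas", "Diesel", "Gas"])
batch_b = pd.Series(["Hybrid", "Gas"])

# Inferred categories depend on whichever values happen to be present.
inferred_a = batch_a.astype("category")
inferred_b = batch_b.astype("category")
inferred_match = list(inferred_a.cat.categories) == list(inferred_b.cat.categories)

# Explicit categories are identical, in the same order, for both batches.
fixed_a = batch_a.astype(FUEL_DTYPE)
fixed_b = batch_b.astype(FUEL_DTYPE)
fixed_match = list(fixed_a.cat.categories) == list(fixed_b.cat.categories)

# The integer-mapping alternative, with -1 for any unlisted value.
fuel_codes = (
    pd.Series(["Gas", "Electric"])
    .map({"Gas": 1, "Diesel": 2, "Hybrid": 3})
    .fillna(-1)
    .astype(int)
)
```

In this sketch `inferred_match` is `False` while `fixed_match` is `True`, which
is the property that keeps the compiler cache valid from one call to the next.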

For models with utility expressions that include a lot of string comparisons
(e.g. because they are built for the legacy `pandas.eval` interpreter and have not
been updated), sharrow can be disabled by setting
@@ -410,7 +488,7 @@ taz_skims:
```

If groups of similarly named variables should have the same encoding applied,
they can be identifed by regular expressions ("regex") instead of explicitly
they can be identified by regular expressions ("regex") instead of explicitly
giving each name. For example:

```yaml
@@ -485,3 +563,76 @@ taz_skims:

For more details on all the settings available for digital encoding, see
[DigitalEncoding](activitysim.core.configuration.network.DigitalEncoding).

## Troubleshooting

If you encounter errors when running the model with sharrow enabled, it is
important to address them before using the model for analysis. This is
especially important when errors are found running in "test" mode (activated
by `sharrow: test` in the top-level `settings.yaml`). Errors may
indicate that either sharrow or the legacy evaluator is not correctly processing
the mathematical expressions in the utility specifications.

### "utility not aligned" Error

One common error that can occur when running the model with sharrow in "test"
mode is the "utility not aligned" error. This error occurs when a sharrow
compiled utility calculation does not sufficiently match the legacy utility
calculation. We say "sufficiently" here because the two calculations may have
slight differences due to numerical precision optimizations applied by sharrow.
These optimizations can result in minor differences in the final utility values,
which are typically inconsequential for model results. However, if the differences
are too large, the "utility not aligned" error will be raised. This error does
not indicate whether the incorrect result is from the sharrow or legacy calculation
(or both), and it is up to the user to determine how to align the calculations
so they are reflective of the model developer's intent.
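
To make "sufficiently match" concrete, the sketch below restates in plain
Python the element-wise closeness test documented for `numpy.isclose`, using
the `rtol=1e-2` and `atol=1e-6` values that the checks in this commit use. The
helper name `is_close` and the sample utility values are illustrative
assumptions, not ActivitySim code.

```python
def is_close(actual, desired, rtol=1e-2, atol=1e-6):
    """Closeness test as documented for numpy.isclose:
    |actual - desired| <= atol + rtol * |desired|
    """
    return abs(actual - desired) <= atol + rtol * abs(desired)


# A utility that is zero in one path and a rounding-error away in the other.
# With atol=0 the tolerance collapses to zero whenever the reference is zero.
near_zero_strict = is_close(1e-9, 0.0, atol=0.0)
near_zero_relaxed = is_close(1e-9, 0.0)

# Away from zero, the 1% relative tolerance dominates.
small_gap = is_close(-12.30, -12.35)
large_gap = is_close(-12.30, -13.00)
```

Here `near_zero_strict` is `False` while `near_zero_relaxed` is `True`; the
`small_gap` pair passes and the `large_gap` pair is the kind of difference that
would be reported as "utility not aligned".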

To troubleshoot the "utility not aligned" error, the user can use a Python debugger
to compare the utility values calculated by sharrow and the legacy evaluator.
ActivitySim also includes error handler code that will attempt to find the
problematic utility expression and print it to the console or log file, under the
heading "possible problematic expressions". This can be helpful in quickly narrowing
down which lines of a specification file are causing the error.

Common causes of the "utility not aligned" error include:

- model data includes `NaN` values but the component settings do not
disable `fastmath` (see [Performance Considerations](#performance-considerations))
- incorrect use of arithmetic on logical values (see
[Arithmetic on Logical Values](#arithmetic-on-logical-values))

### Insufficient system resources

For large models run on large servers, it is possible to overwhelm the system
with too many processes and threads, which can result in the following error:

```
OSError: Insufficient system resources exist to complete the requested service
```

This error can be resolved by reducing the number of processes and/or threads per
process. See [Multiprocessing](../users-guide/performance/multiprocessing.md) and
[Multithreading](../users-guide/performance/multithreading.md) in the User's Guide
for more information on how to adjust these settings.

### Permission Error

If running a model using multiprocessing with sharrow enabled, it is necessary
to have pre-compiled all the utility specifications to prevent the multiple
processes from competing to write to the same cache location on disk. Failure
to do this can result in a permission error, as some processes may be unable to
write to the cache location.

```
PermissionError: The process cannot access the file because it is being used by another process
```
To resolve this error, run the model with sharrow enabled in single-process mode
to pre-compile all the utility specifications. If that does not resolve the error,
it is possible that some compiling is being triggered in multiprocess steps that
is not being handled in the single process mode. This is likely due to the presence
of string or categorical columns created in a preprocessor that are not being
stored in a stable data format. To resolve this error, ensure that all expressions
in pre-processors are written in a manner that results in stable data types (e.g.
integers, floats, or categorical columns with a fixed set of categories). See
[Performance Considerations](#performance-considerations) for examples.
2 changes: 1 addition & 1 deletion docs/users-guide/example_models.rst
@@ -2756,7 +2756,7 @@ Skims are named <PATH TYPE>_<MEASURE>__<TIME PERIOD>:
Configuration
_____________

This section has been moved to :ref:`configuration`.
This section has been moved to :ref:`user_configuration`.

.. _sub-model-spec-files:

5 changes: 2 additions & 3 deletions docs/users-guide/index.rst
@@ -33,9 +33,10 @@ Contents

.. toctree::
:maxdepth: 2

modelsetup
ways_to_run
performance/index
run_primary_example
model_anatomy
../howitworks
@@ -45,5 +46,3 @@ Contents
.. toctree::
:maxdepth: 1
other_examples


3 changes: 3 additions & 0 deletions docs/users-guide/model_anatomy.rst
@@ -356,6 +356,9 @@ number of TAZs based on impedance and size, the model selects a microzone for ea
on the microzone share of TAZ size. Presampling significantly reduces runtime while producing
similar results.


.. _user_configuration :

Configuration
-------------

71 changes: 71 additions & 0 deletions docs/users-guide/performance/chunking.md
@@ -0,0 +1,71 @@
# Chunking

The default operation of ActivitySim is to attempt to run simulations in each
component for that component's entire pool of choosers in a single operation.
This allows for efficient use of vectorization to speed up computations, but can
also lead to memory issues if the pool of choosers is too large. This is particularly
a problem in interaction-type models, where a large pool of choosers is faced
with a large set of alternatives.

ActivitySim includes the ability to "chunk" these model components into more
manageable sized groups of choosers, which can be processed one chunk at a time.
There is a small overhead associated with chunking, but if the total number of
chunks is relatively small, the overhead is usually outweighed by the benefits
in reduced memory usage.

Chunking can be used in two ways in ActivitySim: dynamic and explicit. Dynamic
chunking is the original chunking system in ActivitySim, and it remains available
to support users already familiar with it. It is designed to strive for optimal
chunk sizes, but is complicated to use. Explicit chunking is simpler to use
and understand, and is recommended for most users.

## Dynamic Chunking

This is the original chunking system in ActivitySim, where model components are
chunked into pieces that are selected to be approximately optimal for targeting
a particular memory usage threshold. The chunk size is determined by "training"
the model so that it can estimate the memory usage to simulate each chooser handled
in each component, and then running in "production" mode where the chunk size is
set to keep the memory usage below the selected threshold, based on the results from
the training.

To configure chunking behavior, ActivitySim must first be trained with the model
setup and machine. To do so, first run the model with ``chunk_training_mode: training``.
This tracks the amount of memory used by each table by submodel and writes the results
to a cache file that is then re-used for production runs. This training mode is
*significantly* slower than production mode since it does a lot of memory inspection.
For a training mode run, set ``num_processors`` to about 80% of the available logical
processors and ``chunk_size`` to about 80% of the available RAM. This will run the
model and create the ``chunk_cache.csv`` file in the cache directory for reuse. After
creating the chunk cache file, the model can be run with ``chunk_training_mode: production``
and the desired ``num_processors`` and ``chunk_size``. The model will read the chunk
cache file from the cache folder, similar to how it reads cached skims if specified.
The software trains on the size of the problem, so the cache file can be re-used and
only needs to be updated after significant revisions to the input files or changes in
machine specs. If run in production mode and no cache file is found, ActivitySim falls
back to training mode.

For more detail on running with dynamic chunking, see [Chunking](chunk_in_detail).
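
The 80% rule of thumb for a training run can be worked through with a small
sketch. The helper name `training_settings` is hypothetical, and treating
`chunk_size` as a number of bytes is an assumption that should be checked
against the settings reference before use.

```python
def training_settings(logical_processors, available_ram_bytes, share_percent=80):
    """Hypothetical helper: suggest settings for a chunk-training run."""
    return {
        "chunk_training_mode": "training",
        # about 80% of the logical processors, but never fewer than one
        "num_processors": max(1, logical_processors * share_percent // 100),
        # about 80% of the available RAM (assumed here to be given in bytes)
        "chunk_size": available_ram_bytes * share_percent // 100,
    }


# Example: a machine with 16 logical processors and 64 GiB of available RAM.
suggested = training_settings(16, 64 * 1024**3)
```

For this example machine the sketch suggests 12 processors and a chunk size of
roughly 51 GiB; a later production run would switch `chunk_training_mode` to
`production` and could use different values for the other two settings.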

## Explicit Chunking

This is a simpler system that allows the user to specify the number of choosers
in each chunk explicitly, either as an integer number of choosers per chunk, or
as a fraction of the overall number of choosers. Although the total amount of
memory engaged for processing any particular chunk is ignored and there is no
effort to find an "optimal" chunk size, this system is easier to use
and understand than dynamic chunking, and in practice has been found to be more
robust and reliable. It requires no "training" and is activated by setting the
`chunk_training_mode` configuration setting to `explicit`.

This method for chunking does rely upon model developers to have identified the
memory-hungry components and to have set reasonable explicit chunk sizes for them.
See [this notebook](https://github.com/ActivitySim/activitysim/blob/main/other_resources/scripts/plot_memory_profile.ipynb)
for an example of how to review component memory usage.
Individual model components are configured to use chunking explicitly by
setting the `explicit_chunk` configuration setting in the model component's
settings, when available. (Refer to each model component's documentation for
details on whether explicit chunking is available with that component.) The
chunk setting can be set to an integer number of choosers to process in each
chunk, or to a fractional value to make chunks approximately that fraction of
the overall number of choosers (e.g. set to 0.25 to get four chunks).
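
How an integer or fractional chunk setting translates into chunks can be
sketched as follows. This is an illustration of the arithmetic only, not
ActivitySim's chunking code: `iter_chunks` is a hypothetical helper, and
treating values below 1 as fractions is an assumption about how the two forms
are told apart.

```python
import math


def iter_chunks(n_choosers, explicit_chunk):
    """Hypothetical helper: yield (start, stop) row ranges for each chunk."""
    if explicit_chunk <= 0:
        raise ValueError("explicit_chunk must be positive")
    if explicit_chunk < 1:
        # fractional value: each chunk is about that share of all choosers
        rows_per_chunk = max(1, math.ceil(n_choosers * explicit_chunk))
    else:
        # integer value: a fixed number of choosers per chunk
        rows_per_chunk = int(explicit_chunk)
    for start in range(0, n_choosers, rows_per_chunk):
        yield start, min(start + rows_per_chunk, n_choosers)


# 0.25 of 1000 choosers gives four chunks of 250 choosers each.
fractional_chunks = list(iter_chunks(1000, 0.25))

# An integer setting leaves a smaller final chunk when it does not divide evenly.
integer_chunks = list(iter_chunks(10, 4))
```

The final chunk may be smaller than the others, which is why the fractional
form is described as producing chunks of "approximately" the requested share.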