Merge pull request #32 from aced-differentiate/develop
Add learning modules and documentation
hegdevinayi authored May 23, 2022
2 parents 9162eb8 + c977ed6 commit 7a0c0fc
Showing 70 changed files with 8,039 additions and 247 deletions.
17 changes: 17 additions & 0 deletions .github/workflows/deploy-pages.yml
@@ -0,0 +1,17 @@
name: deploy-pages
on:
  push:
    branches:
      - master

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: 3.x
      - run: pip install mkdocs-material
      - run: pip install mkdocstrings
      - run: mkdocs gh-deploy --force
17 changes: 9 additions & 8 deletions CONTRIBUTING.md
@@ -68,8 +68,8 @@ A pre-commit hook is available to auto-format code with
3. Install pre-commit: ``$ pip install pre-commit``
4. Install git hooks in your ``.git`` directory: ``$ pre-commit install``

Names for functions, arguments, classes, and methods should be as descriptive as possible,
even if it means making them a little longer. For example, `generate_surface_structures` is
a preferred function name to `gen_surfs`.
All class names should adhere to [upper CamelCase](https://en.wikipedia.org/wiki/Camel_case).

@@ -86,16 +86,16 @@ A passing build requires the following:
* Every line of code is executed by a test (100% coverage)
* Documentation has been updated or extended (as needed) and builds

PR descriptions should describe the motivation and context of the code changes in the PR,
both for the reviewer and also for future developers. If there's a Github issue, the PR should
be linked to the issue to provide that context.

## Documentation<a name="documentation"></a>
`AutoCat` documentation is built using `mkdocs` via
[`mkdocs-material`](https://squidfunk.github.io/mkdocs-material/)
and
[`mkdocstrings`](https://mkdocstrings.github.io/).
All custom documentation should be written as `.md` files, appropriately placed within
`docs/`, and referenced within the `mkdocs.yml` file.

With `mkdocs` the docs webpage can be hosted locally with the command:
@@ -106,3 +106,4 @@ which will give an `html` link that can be pasted in a web-browser.

API documentation is automatically generated with `mkdocstrings` which parses the docstrings.
Please ensure that all docstrings follow the Numpy style.
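
As a reference point, a Numpy-style docstring could look like the sketch below. The function
`scale_lattice_constant` is hypothetical and is not part of `AutoCat`; it only illustrates
the section layout (`Parameters`, `Returns`) that `mkdocstrings` is expected to parse.

```python
def scale_lattice_constant(lattice_constant, scale_factor=1.0):
    """
    Scale a lattice constant by a multiplicative factor.

    Hypothetical helper used only to illustrate the docstring layout.

    Parameters
    ----------
    lattice_constant : float
        Lattice constant in Angstrom.
    scale_factor : float, optional
        Multiplicative factor applied to the lattice constant.
        Defaults to 1.0 (no scaling).

    Returns
    -------
    float
        The scaled lattice constant in Angstrom.
    """
    return lattice_constant * scale_factor
```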

3 changes: 2 additions & 1 deletion MANIFEST.in
@@ -1,3 +1,4 @@
include src/autocat/data/**/*.json
include src/autocat/VERSION.txt
include bin/autocat
include CONTRIBUTING.md
17 changes: 8 additions & 9 deletions README.md
@@ -1,9 +1,9 @@
# AutoCat

AutoCat is a suite of python tools for **sequential learning for materials applications**
and **automating structure generation for DFT catalysis studies.**

Development of this package stems from [ACED](https://www.cmu.edu/aced/), as part of the
ARPA-E DIFFERENTIATE program.

## Installation
@@ -13,10 +13,9 @@ There are two options for installation, either via `pip` or from the repo direct
### `pip` (recommended)

If you are planning on strictly using AutoCat rather than contributing to development,
we recommend using `pip` within a virtual environment (e.g.
[`conda`](https://docs.conda.io/en/latest/)
). This can be done as follows:

```
pip install autocat
@@ -29,10 +28,10 @@ AutoCat can be installed via a clone from Github. First, you'll need to clone th
github repo to your local machine (or wherever you'd like to use AutoCat) using
`git clone`. Once the repo has been cloned, you can install AutoCat as an editable
package by changing into the created directory (the one with `setup.py`) and installing
via:
```
pip install -e .
```
## Contributing
Contributions through issues, feature requests, and pull requests are welcome.
Guidelines are provided [here](CONTRIBUTING.md).
1 change: 1 addition & 0 deletions docs/API/Learning/featurizers.md
@@ -0,0 +1 @@
::: autocat.learning.featurizers
1 change: 1 addition & 0 deletions docs/API/Learning/predictors.md
@@ -0,0 +1 @@
::: autocat.learning.predictors
1 change: 1 addition & 0 deletions docs/API/Learning/sequential.md
@@ -0,0 +1 @@
::: autocat.learning.sequential
1 change: 1 addition & 0 deletions docs/API/Structure_Generation/adsorption.md
@@ -0,0 +1 @@
::: autocat.adsorption
1 change: 1 addition & 0 deletions docs/API/Structure_Generation/bulk.md
@@ -0,0 +1 @@
::: autocat.bulk
3 changes: 3 additions & 0 deletions docs/API/Structure_Generation/saa.md
@@ -0,0 +1,3 @@
# Single Atom Alloys

::: autocat.saa
1 change: 1 addition & 0 deletions docs/API/Structure_Generation/surface.md
@@ -0,0 +1 @@
::: autocat.surface
20 changes: 0 additions & 20 deletions docs/Makefile

This file was deleted.

91 changes: 91 additions & 0 deletions docs/README.md
@@ -0,0 +1,91 @@
# AutoCat Documentation

![AutoCat Logo](img/autocat_logo.png){ align=right }

AutoCat is a suite of python tools for **sequential learning for materials applications**
and **automating structure generation for DFT catalysis studies.**

Development of this package stems from [ACED](https://www.cmu.edu/aced/), as part of the
ARPA-E DIFFERENTIATE program.

Below we provide an overview of the key functionalities of AutoCat.
For additional details please see the User Guide, Tutorials, and API sections.

## Sequential Learning

One of the core philosophies of AutoCat is to provide modular and extensible tooling to
facilitate closed-loop computational materials discovery workflows. Within this submodule
are classes for defining a design space, featurization,
regression, and defining a closed-loop sequential learning iterator. The
key classes intended for each of these purposes are:

- [**`DesignSpace`**](User_Guide/Learning/sequential#designspace): define a design space to explore

- [**`Featurizer`**](User_Guide/Learning/featurizers): featurize the systems for regression

- [**`Predictor`**](User_Guide/Learning/predictors): a regressor for predicting materials properties

- [**`SequentialLearner`**](User_Guide/Learning/sequential#sequentiallearner): define a closed-loop iterator
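
To make the closed-loop idea concrete, here is a toy sketch in plain Python. It does not use
AutoCat's classes: the function names, the distance-based acquisition rule, and the `oracle`
callable (a stand-in for a DFT calculation) are all hypothetical and chosen only for
illustration. Unknown labels are marked with NaN.

```python
import math


def distance_to_nearest_label(x, labeled_xs):
    """Distance from x to the closest labeled point (a crude uncertainty proxy)."""
    return min(abs(x - known) for known in labeled_xs)


def run_sequential_learning(candidates, labels, oracle, n_iterations):
    """Toy closed loop: pick the least-explored candidate, label it, repeat."""
    labels = list(labels)  # copy so the caller's list is left untouched
    for _ in range(n_iterations):
        labeled_xs = [x for x, y in zip(candidates, labels) if not math.isnan(y)]
        unlabeled = [i for i, y in enumerate(labels) if math.isnan(y)]
        if not unlabeled:
            break  # the design space is fully explored
        if labeled_xs:
            # Acquisition step: choose the candidate farthest from any known label
            pick = max(
                unlabeled,
                key=lambda i: distance_to_nearest_label(candidates[i], labeled_xs),
            )
        else:
            pick = unlabeled[0]  # nothing labeled yet, so start anywhere
        # "Experiment" step: in a real workflow this would be an expensive calculation
        labels[pick] = oracle(candidates[pick])
    return labels


candidates = [0.0, 1.0, 2.0, 3.0, 4.0]
initial_labels = [0.0, math.nan, math.nan, math.nan, math.nan]
final_labels = run_sequential_learning(
    candidates, initial_labels, oracle=lambda x: x ** 2, n_iterations=2
)
```

Each pass through the loop mirrors the roles listed above: the candidates and labels play the
part of a design space, the distance function stands in for a featurizer plus regressor, and
the loop itself is the sequential learning iterator.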


## Structure Generation

![Adsorption Figure](img/struct_gen_figs/adsorption.png){ align=right }

This submodule contains functions for automating atomic structure generation
within the context of a catalysis study using density functional theory.
Specifically, this includes generating bulk structures, surfaces, and
placing adsorbates. In addition, functions for generating the single-atom alloys
material class are also included. These functions are organized within AutoCat as follows:

- [**`autocat.bulk`**](User_Guide/Structure_Generation/bulk): generation of periodic
mono-elemental bulk structures

- [**`autocat.surface`**](User_Guide/Structure_Generation/surface): mono-elemental surface slab generation

- [**`autocat.adsorption`**](User_Guide/Structure_Generation/adsorption): placement of adsorbates onto surfaces

- [**`autocat.saa`**](User_Guide/Structure_Generation/saa): generation of single-atom alloy surfaces

Structures generated or read with this package are typically in the form of
[`ase.Atoms`](https://wiki.fysik.dtu.dk/ase/ase/atoms.html#module-ase.atoms)
objects.

When opting to write structures to
disk using these functions, they are automatically organized into a clean, scalable directory structure.
All structures are written in the
[`ase.io.Trajectory`](https://wiki.fysik.dtu.dk/ase/ase/io/trajectory.html#trajectory)
file format.
For further details on the directory structure, see the User Guide.

## Installation

There are two options for installation, either via `pip` or from the repo directly.

### `pip` (recommended)

If you are planning on strictly using AutoCat rather than contributing to development,
we recommend using `pip` within a virtual environment (e.g.
[`conda`](https://docs.conda.io/en/latest/)
). This can be done
as follows:

```
pip install autocat
```

### Github (for developers)

Alternatively, if you would like to contribute to the development of this software,
AutoCat can be installed via a clone from Github. First, you'll need to clone the
github repo to your local machine (or wherever you'd like to use AutoCat) using
`git clone`. Once the repo has been cloned, you can install AutoCat as an editable
package by changing into the created directory (the one with `setup.py`) and installing
via:
```
pip install -e .
```
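
After either installation route, one way to confirm that the package is visible to your
Python environment is to query its installed version. The sketch below uses only the standard
library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is our own
and is not part of AutoCat.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


autocat_version = installed_version("autocat")
print(f"autocat version: {autocat_version or 'not installed'}")
```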

## Contributing
Contributions through issues, feature requests, and pull requests are welcome.
Guidelines are provided here.
131 changes: 131 additions & 0 deletions docs/Tutorials/pred_h.md
@@ -0,0 +1,131 @@
In this tutorial we are going to show how to use the learning tools within
AutoCat to train a regressor that can predict adsorption energies of hydrogen
on a set of single-atom alloys.

## Creating a `DesignSpace`

Let's start by creating a `DesignSpace`. Normally each of these
structures would be optimized via DFT, but for demo purposes
we'll use the generated structures directly. First we need to generate the single-atom
alloys. Here, we can use AutoCat's
[`generate_saa_structures`](../API/Structure_Generation/saa.md#autocat.saa.generate_saa_structures)
function.

```py
>>> # Generate the clean single-atom alloy structures
>>> from autocat.saa import generate_saa_structures
>>> from autocat.utils import extract_structures
>>> saa_struct_dict = generate_saa_structures(
...     ["Fe", "Cu", "Au"],
...     ["Pt", "Pd", "Ni"],
...     facets={"Fe": ["110"], "Cu": ["111"], "Au": ["111"]},
...     n_fixed_layers=2,
... )
>>> saa_structs = extract_structures(saa_struct_dict)
```

Now that we have the clean structures, let's adsorb hydrogen on the surface.
For convenience let's place H at the origin instead of considering all symmetry sites.
To accomplish this we can make use of AutoCat's
[`place_adsorbate`](../API/Structure_Generation/adsorption.md#autocat.adsorption.place_adsorbate)
function.

```py
>>> # Adsorb hydrogen onto each of the generated SAA surfaces
>>> from autocat.adsorption import place_adsorbate
>>> ads_structs = []
>>> for clean_struct in saa_structs:
...     ads_dict = place_adsorbate(
...         clean_struct,
...         "H",
...         (0., 0.)
...     )
...     ads_struct = extract_structures(ads_dict)[0]
...     ads_structs.append(ads_struct)
```

This has collected all of the single-atom alloys with hydrogen adsorbed into
a single list of `ase.Atoms` objects, `ads_structs`. Ideally at this stage we'd have
adsorption energies for each of the generated structures after relaxation. As a proxy
in this demo we'll create random labels, but these should be real adsorption energies if you
want to train a meaningful `Predictor`!

```py
>>> # Generate the labels for each structure
>>> import numpy as np
>>> labels = np.random.uniform(-1.5,1.5,size=len(ads_structs))
```

Finally, using both our structures and labels we can define a `DesignSpace`. In practice,
if the label for a structure is unknown, it can be included as `numpy.nan`.

```py
>>> from autocat.learning.sequential import DesignSpace
>>> design_space = DesignSpace(ads_structs, labels)
```
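
When some labels are unknown, only the labeled subset can be used for training. The sketch
below shows one way to separate the two groups in plain Python. The structure names are
placeholder strings standing in for `ase.Atoms` objects, and `math.nan` is used in place of
`numpy.nan` (both are the same floating-point NaN value); how `DesignSpace` itself handles
this bookkeeping is not shown here.

```python
import math

# Placeholder names standing in for ase.Atoms objects (hypothetical)
structures = ["saa_0", "saa_1", "saa_2", "saa_3"]
# NaN marks structures whose adsorption energy has not been calculated yet
labels = [-0.42, math.nan, 0.17, math.nan]

# Keep only the (structure, label) pairs with a known label for training
labeled_pairs = [(s, y) for s, y in zip(structures, labels) if not math.isnan(y)]
train_structures = [s for s, _ in labeled_pairs]
train_labels = [y for _, y in labeled_pairs]

# Everything else is a candidate that still needs a label
unlabeled_structures = [s for s, y in zip(structures, labels) if math.isnan(y)]
```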

## Setting up a `Predictor`

When setting up our `Predictor` we now have two choices to make:

1. The technique to be used for featurizing the systems
2. The regression model to be used for training and predictions

Internally, the `Predictor` will contain a `Featurizer` object which contains all of
our choices for how to featurize the systems. Our choice of featurizer class and
the associated kwargs are specified via the `featurizer_class` and
`featurization_kwargs` arguments, respectively. By providing the design space structures
some of the kwargs related to the featurization (e.g. maximum structure size) can be
automatically obtained.

Similarly, we can specify the regressor to be used within the `model_class` and
`model_kwargs` arguments. The class should be "`sklearn`-like" with `fit` and
`predict` methods.
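
To illustrate what "`sklearn`-like" means, here is a minimal sketch of a class exposing `fit`
and `predict`. `MeanRegressor` is a hypothetical toy model, not something shipped with AutoCat
or `sklearn`, and whether `Predictor` needs anything beyond these two methods (for example,
uncertainty estimates) is not covered by this sketch.

```python
class MeanRegressor:
    """Hypothetical toy regressor with an sklearn-style fit/predict interface."""

    def fit(self, X, y):
        # "Training" here just stores the mean of the training labels
        self.mean_ = sum(y) / len(y)
        return self  # sklearn estimators conventionally return self from fit

    def predict(self, X):
        # Predict the stored mean for every row of X
        return [self.mean_ for _ in X]


model = MeanRegressor()
model.fit([[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0])
predictions = model.predict([[5.0], [6.0]])
```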

Let's featurize the hydrogen environment via `dscribe`'s `SOAP` class with
`sklearn`'s `GaussianProcessRegressor` for regression.

```py
>>> from sklearn.gaussian_process import GaussianProcessRegressor
>>> from sklearn.gaussian_process.kernels import RBF
>>> from dscribe.descriptors import SOAP
>>> from autocat.learning.predictors import Predictor
>>> kernel = RBF(1.5)
>>> model_kwargs = {"kernel": kernel}
>>> featurization_kwargs = {
...     "design_space_structures": design_space.design_space_structures,
...     "kwargs": {"rcut": 7.0, "nmax": 8, "lmax": 8}
... }
>>> predictor = Predictor(
...     model_class=GaussianProcessRegressor,
...     model_kwargs=model_kwargs,
...     featurizer_class=SOAP,
...     featurization_kwargs=featurization_kwargs,
... )
```
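
For intuition about the kernel chosen above, `RBF(1.5)` sets the length scale of a
squared-exponential kernel to 1.5. The sketch below evaluates that kernel,
k(x, y) = exp(-|x - y|^2 / (2 l^2)), by hand in plain Python. The function `rbf_kernel` is
our own illustration rather than `sklearn`'s implementation, and the two-component vectors
are toy stand-ins for feature vectors.

```python
import math


def rbf_kernel(x, y, length_scale=1.5):
    """Squared-exponential similarity between two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * length_scale ** 2))


# Identical feature vectors are maximally similar
same = rbf_kernel([0.0, 0.0], [0.0, 0.0])
# Vectors one length scale apart have similarity exp(-0.5)
one_length_scale_apart = rbf_kernel([0.0, 0.0], [1.5, 0.0])
# Similarity decays quickly for vectors several length scales apart
far = rbf_kernel([0.0, 0.0], [6.0, 0.0])
```

A larger length scale makes distant systems look more alike to the regressor, so this value
controls how far information from one labeled structure spreads to its neighbors in feature space.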

## Training and making predictions

With our newly defined `Predictor`, we can train it on data from our
`DesignSpace` using the `fit` method.

```py
>>> train_structures = design_space.design_space_structures[:5]
>>> train_labels = design_space.design_space_labels[:5]
>>> predictor.fit(train_structures, train_labels)
```

Making predictions is a similar process except using the `predict` method.

```py
>>> test_structures = design_space.design_space_structures[5:]
>>> predicted_labels = predictor.predict(test_structures)
```

In this example, since we already have the labels for the test structures, we can
also use the `score` method to calculate a prediction score.

```py
>>> test_labels = design_space.design_space_labels[5:]
>>> mae = predictor.score(test_structures, test_labels)
```
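
The variable name `mae` suggests a mean absolute error, although which metric `score` uses by
default is not stated here. As a sanity check, the same quantity can be computed by hand from
the true and predicted labels. The helper below is a plain-Python sketch with made-up numbers,
not AutoCat's implementation.

```python
def mean_absolute_error(true_labels, predicted_labels):
    """Average absolute difference between true and predicted labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true and predicted labels must have the same length")
    total = sum(abs(t - p) for t, p in zip(true_labels, predicted_labels))
    return total / len(true_labels)


# Made-up values for illustration only
example_true = [0.5, -1.0, 1.5]
example_pred = [0.0, -0.5, 1.0]
example_mae = mean_absolute_error(example_true, example_pred)
```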