Merge pull request #32 from aced-differentiate/develop

Add learning modules and documentation
aced-differentiate · May 23, 2022 · 7a0c0fc · 7a0c0fc
2 parents 9162eb8 + c977ed6
commit 7a0c0fc

Showing 70 changed files with 8,039 additions and 247 deletions.
diff --git a/.github/workflows/deploy-pages.yml b/.github/workflows/deploy-pages.yml
@@ -0,0 +1,17 @@
+name: deploy-pages
+on:
+  push:
+    branches:
+      - master
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - uses: actions/setup-python@v2
+        with:
+          python-version: 3.x
+      - run: pip install mkdocs-material
+      - run: pip install mkdocstrings
+      - run: mkdocs gh-deploy --force
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -68,8 +68,8 @@ A pre-commit hook is available to auto-format code with
 3. Install pre-commit: ``$ pip install pre-commit``
 4. Install git hooks in your ``.git`` directory: ``$ pre-commit install``
 
-Names for functions, arguments, classes, and methods should be as descriptive as possible, 
-even if it means making them a little longer. For example, `generate_surface_structures` is 
+Names for functions, arguments, classes, and methods should be as descriptive as possible,
+even if it means making them a little longer. For example, `generate_surface_structures` is
 a preferred function name to `gen_surfs`.
 All class names should adhere to [upper CamelCase](https://en.wikipedia.org/wiki/Camel_case).
 
@@ -86,16 +86,16 @@ A passing build requires the following:
 * Every line of code is executed by a test (100% coverage)
 * Documentation has been updated or extended (as needed) and builds
 
-PR descriptions should describe the motivation and context of the code changes in the PR, 
-both for the reviewer and also for future developers. If there's a Github issue, the PR should 
+PR descriptions should describe the motivation and context of the code changes in the PR,
+both for the reviewer and also for future developers. If there's a Github issue, the PR should
 be linked to the issue to provide that context.
 
 ## Documentation<a name="documentation"></a>
-`AutoCat` documentation is built using `mkdocs` via 
-[`mkdocs-material`](https://squidfunk.github.io/mkdocs-material/) 
-and 
+`AutoCat` documentation is built using `mkdocs` via
+[`mkdocs-material`](https://squidfunk.github.io/mkdocs-material/)
+and
 [`mkdocstrings`](https://mkdocstrings.github.io/).
-All custom documentation should be written as `.md` files, appropriately placed within 
+All custom documentation should be written as `.md` files, appropriately placed within
 `docs/`, and referenced within the `mkdocs.yml` file.
 
 With `mkdocs` the docs webpage can be hosted locally with the command:
@@ -106,3 +106,4 @@ which will give an `html` link that can be pasted in a web-browser.
 
 API documentation is automatically generated with `mkdocstrings` which parses the docstrings.
 Please ensure that all docstrings follow the Numpy style.
+
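As an illustration of the Numpy docstring style requested above, here is a minimal sketch. The function `scale_lattice_constant` and its parameters are hypothetical and are not part of AutoCat; only the section layout (`Parameters`, `Returns`, `Raises`) is the point.

```python
def scale_lattice_constant(lattice_constant, strain=0.0):
    """Apply an isotropic strain to a lattice constant.

    Hypothetical helper used only to illustrate the docstring layout.

    Parameters
    ----------
    lattice_constant : float
        Unstrained lattice constant in Angstrom.
    strain : float, optional
        Fractional strain to apply, e.g. 0.02 for a 2% expansion.
        Defaults to 0.0 (no strain).

    Returns
    -------
    float
        The strained lattice constant in Angstrom.

    Raises
    ------
    ValueError
        If `lattice_constant` is not positive.
    """
    if lattice_constant <= 0:
        raise ValueError("lattice_constant must be positive")
    return lattice_constant * (1.0 + strain)
```

Because `mkdocstrings` parses these sections, each parameter and return value should be rendered as a structured entry in the generated API page.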
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -1,3 +1,4 @@
 include src/autocat/data/**/*.json
 include src/autocat/VERSION.txt
-include bin/autocat
+include bin/autocat
+include CONTRIBUTING.md
diff --git a/README.md b/README.md
@@ -1,9 +1,9 @@
-# AutoCat 
+# AutoCat
 
-AutoCat is a suite of python tools for **sequential learning for materials applications** 
+AutoCat is a suite of python tools for **sequential learning for materials applications**
 and **automating structure generation for DFT catalysis studies.**
 
-Development of this package stems from [ACED](https://www.cmu.edu/aced/), as part of the 
+Development of this package stems from [ACED](https://www.cmu.edu/aced/), as part of the
 ARPA-E DIFFERENTIATE program.
 
 ## Installation
@@ -13,10 +13,9 @@ There are two options for installation, either via `pip` or from the repo direct
 ### `pip` (recommended)
 
 If you are planning on strictly using AutoCat rather than contributing to development,
- we recommend using `pip` within a virtual environment (e.g. 
+ we recommend using `pip` within a virtual environment (e.g.
  [`conda`](https://docs.conda.io/en/latest/)
- ). This can be done
-as follows:
+ ). This can be done as follows:
 
 ```
 pip install autocat
@@ -29,10 +28,10 @@ AutoCat can be installed via a clone from Github. First, you'll need to clone th
 github repo to your local machine (or wherever you'd like to use AutoCat) using
 `git clone`. Once the repo has been cloned, you can install AutoCat as an editable
 package by changing into the created directory (the one with `setup.py`) and installing
-via: 
+via:
 ```
 pip install -e .
 ```
 ## Contributing
-Contributions through issues, feature requests, and pull requests are welcome. 
-Guidelines are provided [here](CONTRIBUTING.md).
+Contributions through issues, feature requests, and pull requests are welcome.
+Guidelines are provided [here](CONTRIBUTING.md).
diff --git a/docs/API/Learning/featurizers.md b/docs/API/Learning/featurizers.md
@@ -0,0 +1 @@
+::: autocat.learning.featurizers
diff --git a/docs/API/Learning/predictors.md b/docs/API/Learning/predictors.md
@@ -0,0 +1 @@
+::: autocat.learning.predictors
diff --git a/docs/API/Learning/sequential.md b/docs/API/Learning/sequential.md
@@ -0,0 +1 @@
+::: autocat.learning.sequential
diff --git a/docs/API/Structure_Generation/adsorption.md b/docs/API/Structure_Generation/adsorption.md
@@ -0,0 +1 @@
+::: autocat.adsorption
diff --git a/docs/API/Structure_Generation/bulk.md b/docs/API/Structure_Generation/bulk.md
@@ -0,0 +1 @@
+::: autocat.bulk
diff --git a/docs/API/Structure_Generation/saa.md b/docs/API/Structure_Generation/saa.md
@@ -0,0 +1,3 @@
+# Single Atom Alloys
+
+::: autocat.saa
diff --git a/docs/API/Structure_Generation/surface.md b/docs/API/Structure_Generation/surface.md
@@ -0,0 +1 @@
+::: autocat.surface
diff --git a/docs/Makefile b/docs/Makefile
diff --git a/docs/README.md b/docs/README.md
@@ -0,0 +1,91 @@
+# AutoCat Documentation
+
+![AutoCat Logo](img/autocat_logo.png){ align=right }
+
+AutoCat is a suite of python tools for **sequential learning for materials applications** 
+and **automating structure generation for DFT catalysis studies.**
+
+Development of this package stems from [ACED](https://www.cmu.edu/aced/), as part of the 
+ARPA-E DIFFERENTIATE program.
+
+Below we provide an overview of the key functionalities of AutoCat. 
+For additional details please see the User Guide, Tutorials, and API sections.
+
+## Sequential Learning
+
+One of the core philosophies of AutoCat is to provide modular and extensible tooling to
+facilitate closed-loop computational materials discovery workflows. Within this submodule 
+are classes for defining a design space, featurization, 
+regression, and defining a closed-loop sequential learning iterator. The 
+key classes intended for each of these purposes are:
+
+- [**`DesignSpace`**](User_Guide/Learning/sequential#designspace): define a design space to explore
+
+- [**`Featurizer`**](User_Guide/Learning/featurizers): featurize the systems for regression
+
+- [**`Predictor`**](User_Guide/Learning/predictors): a regressor for predicting materials properties
+
+- [**`SequentialLearner`**](User_Guide/Learning/sequential#sequentiallearner): define a closed-loop iterator 
+
+
+## Structure Generation
+
+![Adsorption Figure](img/struct_gen_figs/adsorption.png){ align=right }
+
+This submodule contains functions for automating atomic structure generation 
+within the context of a catalysis study using density functional theory. 
+Specifically, this includes generating bulk structures, surfaces, and 
+placing adsorbates. In addition, functions for generating the single-atom alloys 
+material class are also included. These functions are organized within AutoCat as follows:
+
+- [**`autocat.bulk`**](User_Guide/Structure_Generation/bulk): generation of periodic 
+mono-elemental bulk structures
+
+- [**`autocat.surface`**](User_Guide/Structure_Generation/surface): mono-elemental surface slab generation
+
+- [**`autocat.adsorption`**](User_Guide/Structure_Generation/adsorption): placement of adsorbates onto surfaces
+
+- [**`autocat.saa`**](User_Guide/Structure_Generation/saa): generation of single-atom alloy surfaces
+
+Structures generated or read with this package are typically in the form of
+[`ase.Atoms`](https://wiki.fysik.dtu.dk/ase/ase/atoms.html#module-ase.atoms) 
+objects.
+
+When these functions write structures to disk, the files are automatically
+organized into a clean, scalable directory structure.
+All structures are written in the 
+[`ase.io.Trajectory`](https://wiki.fysik.dtu.dk/ase/ase/io/trajectory.html#trajectory) 
+file format. 
+For further details on the directory structure, see the User Guide.
+
+## Installation
+
+There are two options for installation, either via `pip` or from the repo directly.
+
+### `pip` (recommended)
+
+If you are planning on strictly using AutoCat rather than contributing to development,
+ we recommend using `pip` within a virtual environment (e.g. 
+ [`conda`](https://docs.conda.io/en/latest/)
+ ). This can be done
+as follows:
+
+```
+pip install autocat
+```
+
+### Github (for developers)
+
+Alternatively, if you would like to contribute to the development of this software,
+AutoCat can be installed via a clone from Github. First, you'll need to clone the
+github repo to your local machine (or wherever you'd like to use AutoCat) using
+`git clone`. Once the repo has been cloned, you can install AutoCat as an editable
+package by changing into the created directory (the one with `setup.py`) and installing
+via: 
+```
+pip install -e .
+```
+
+## Contributing
+Contributions through issues, feature requests, and pull requests are welcome. 
+Guidelines are provided here.
diff --git a/docs/Tutorials/pred_h.md b/docs/Tutorials/pred_h.md
@@ -0,0 +1,131 @@
+In this tutorial we are going to show how to use the learning tools within 
+AutoCat to train a regressor that can predict adsorption energies of hydrogen 
+on a set of single-atom alloys.
+
+## Creating a `DesignSpace`
+
+Let's start by creating a `DesignSpace`. Normally each of these 
+structures would be optimized via DFT, but for demo purposes 
+we'll use the generated structures directly. First we need to generate the single-atom 
+alloys. Here, we can use AutoCat's 
+[`generate_saa_structures`](../API/Structure_Generation/saa.md#autocat.saa.generate_saa_structures) 
+function. 
+
+```py
+>>> # Generate the clean single-atom alloy structures
+>>> from autocat.saa import generate_saa_structures
+>>> from autocat.utils import extract_structures
+>>> saa_struct_dict = generate_saa_structures(
+...     ["Fe", "Cu", "Au"],
+...     ["Pt", "Pd", "Ni"],
+...     facets={"Fe":["110"], "Cu":["111"], "Au":["111"]},
+...     n_fixed_layers=2,
+... )
+>>> saa_structs = extract_structures(saa_struct_dict)
+```
+
+Now that we have the clean structures, let's adsorb hydrogen on the surface. 
+For convenience let's place H at the origin instead of considering all symmetry sites. 
+To accomplish this we can make use of AutoCat's 
+[`place_adsorbate`](../API/Structure_Generation/adsorption.md#autocat.adsorption.place_adsorbate)
+function.
+
+```py
+>>> # Adsorb hydrogen onto each of the generated SAA surfaces
+>>> from autocat.adsorption import place_adsorbate
+>>> ads_structs = []
+>>> for clean_struct in saa_structs:
+...     ads_dict = place_adsorbate(
+...        clean_struct,
+...        "H",
+...        (0.,0.)
+...     )
+...     ads_struct = extract_structures(ads_dict)[0]
+...     ads_structs.append(ads_struct)
+```
+
+This has collected all of the single-atom alloys with hydrogen adsorbed into 
+a single list of `ase.Atoms` objects, `ads_structs`. Ideally at this stage we'd have 
+adsorption energies for each of the generated structures after relaxation. As a proxy
+in this demo we'll create random labels, but these should be real adsorption energies if you
+want to train a meaningful `Predictor`!
+
+```py
+>>> # Generate the labels for each structure
+>>> import numpy as np
+>>> labels = np.random.uniform(-1.5,1.5,size=len(ads_structs))
+```
+
+Finally, using both our structures and labels we can define a `DesignSpace`. In practice,
+if the label for a structure is unknown, it can be included as `numpy.nan`.
+
+```py
+>>> from autocat.learning.sequential import DesignSpace
+>>> design_space = DesignSpace(ads_structs, labels)
+```
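The `numpy.nan` convention for unknown labels can be sketched with plain NumPy. The label values below are made up, and this does not show how `DesignSpace` handles missing labels internally; it only illustrates how labelled and unlabelled entries can be told apart.

```python
import numpy as np

# Hypothetical adsorption energies (eV); np.nan marks structures
# that have not been calculated yet.
labels = np.array([-0.42, np.nan, 0.13, np.nan, -1.05])

# Boolean mask of the entries that already have a label
is_known = ~np.isnan(labels)

train_indices = np.flatnonzero(is_known)       # usable for training
candidate_indices = np.flatnonzero(~is_known)  # still to be evaluated

print(train_indices)      # [0 2 4]
print(candidate_indices)  # [1 3]
```

Note that `labels == np.nan` is always `False`, so `np.isnan` is the reliable way to find the missing entries.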
+
+## Setting up a `Predictor`
+
+When setting up our `Predictor` we now have two choices to make:
+
+1. The technique to be used for featurizing the systems
+2. The regression model to be used for training and predictions
+
+Internally, the `Predictor` will contain a `Featurizer` object which contains all of 
+our choices for how to featurize the systems. Our choice of featurizer class and 
+the associated kwargs are specified via the `featurizer_class` and 
+`featurization_kwargs` arguments, respectively. By providing the design space structures,
+some of the kwargs related to the featurization (e.g. maximum structure size) can be
+automatically obtained.
+
+Similarly, we can specify the regressor to be used within the `model_class` and 
+`model_kwargs` arguments. The class should be "`sklearn`-like" with `fit` and 
+`predict` methods.
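To make "`sklearn`-like" concrete, here is a minimal sketch of a class exposing `fit` and `predict`. `MeanRegressor` is a hypothetical baseline written for illustration, not an AutoCat or `sklearn` class. The text above only states that `fit` and `predict` are needed; whether `Predictor` expects anything further from the model is not shown here.

```python
import numpy as np


class MeanRegressor:
    """Hypothetical baseline that always predicts the training mean."""

    def __init__(self, offset=0.0):
        # Constructor keyword arguments play the role of `model_kwargs`
        self.offset = offset

    def fit(self, X, y):
        # Store the mean of the training labels; X is only checked for length
        X = np.asarray(X)
        y = np.asarray(y, dtype=float)
        if len(X) != len(y):
            raise ValueError("X and y must have the same number of samples")
        self.mean_ = y.mean()
        return self  # returning self mirrors the sklearn convention

    def predict(self, X):
        # One prediction per row of X
        X = np.asarray(X)
        return np.full(len(X), self.mean_ + self.offset)


model = MeanRegressor().fit([[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0])
predictions = model.predict([[5.0], [6.0]])
```

Here `predictions` holds the training mean (2.0) repeated once per input row.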
+
+Let's featurize the hydrogen environment via `dscribe`'s `SOAP` class with 
+`sklearn`'s `GaussianProcessRegressor` for regression.
+
+```py
+>>> from sklearn.gaussian_process import GaussianProcessRegressor
+>>> from sklearn.gaussian_process.kernels import RBF
+>>> from dscribe.descriptors import SOAP
+>>> from autocat.learning.predictors import Predictor
+>>> kernel = RBF(1.5)
+>>> model_kwargs={"kernel": kernel}
+>>> featurization_kwargs={
+...     "design_space_structures": design_space.design_space_structures,
+...     "kwargs": {"rcut": 7.0, "nmax": 8, "lmax": 8}
+... }
+>>> predictor = Predictor(
+...     model_class=GaussianProcessRegressor,
+...     model_kwargs=model_kwargs,
+...     featurizer_class=SOAP,
+...     featurization_kwargs=featurization_kwargs,
+... )
+```
+
+## Training and making predictions
+
+With our newly defined `Predictor` we can train it using data from our 
+`DesignSpace` and the `fit` method.
+
+```py
+>>> train_structures = design_space.design_space_structures[:5]
+>>> train_labels = design_space.design_space_labels[:5]
+>>> predictor.fit(train_structures, train_labels)
+```
+
+Making predictions is a similar process except using the `predict` method.
+
+```py
+>>> test_structures = design_space.design_space_structures[5:]
+>>> predicted_labels = predictor.predict(test_structures)
+```
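The same fit-then-predict split can be sketched with `sklearn` alone, which may help when checking the regressor independently of the featurization. The feature matrix below is random stand-in data (an assumption made for illustration), not SOAP output, and the sample count of 9 is arbitrary.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Stand-in data: 9 samples with 4 made-up features each
features = rng.normal(size=(9, 4))
labels = rng.uniform(-1.5, 1.5, size=9)

# Same slicing as the tutorial: first five for training, the rest for testing
train_X, test_X = features[:5], features[5:]
train_y = labels[:5]

regressor = GaussianProcessRegressor(kernel=RBF(1.5))
regressor.fit(train_X, train_y)

# return_std=True also returns the predictive standard deviation
predicted, predicted_std = regressor.predict(test_X, return_std=True)
```

With random labels the predictions carry no physical meaning; the sketch only shows the shape of the workflow.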
+
+In this example, since we already have the labels for the test structures, we can 
+also use the `score` method to calculate a prediction score.
+
+```py
+>>> test_labels = design_space.design_space_labels[5:]
+>>> mae = predictor.score(test_structures, test_labels)
+```
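The variable name `mae` suggests that `score` reports a mean absolute error. Assuming that is the case, the metric itself can be sketched as follows. `mean_absolute_error` here is a hypothetical stand-alone helper with made-up inputs, not the AutoCat implementation.

```python
import numpy as np


def mean_absolute_error(true_labels, predicted_labels):
    """Average absolute difference between true and predicted labels."""
    true_labels = np.asarray(true_labels, dtype=float)
    predicted_labels = np.asarray(predicted_labels, dtype=float)
    return float(np.mean(np.abs(true_labels - predicted_labels)))


# Absolute errors are 0.5, 0.0 and 1.0, so the mean is 0.5
mae = mean_absolute_error([0.5, -1.0, 1.5], [1.0, -1.0, 0.5])
print(mae)  # 0.5
```

A lower value means the predictions sit closer to the reference labels, in the same units as the labels (eV for adsorption energies).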