assembly-theory is an open-source, high-performance library for computing assembly indices of molecular structures (see, e.g., Sharma et al., 2023; Walker et al., 2024).
It is implemented in Rust and is available as a Rust crate, Python package, and standalone executable.
If you want to use the Rust crate in a Rust project, refer to the docs.rs documentation for installation and usage examples. If you want to use the Python library (e.g., to take advantage of RDKit-compatible molecule loaders), refer to the documentation on PyPI.
Otherwise, clone/download this repository if you want to:
- Build and run the standalone executable
- Build and run tests and benchmarks on our reference datasets
- Build the Python package locally
Currently, this project natively supports only Unix-like systems (macOS and Linux). On Windows, it is supported through WSL.
You need Rust, installed either using rustup or via your system package manager of choice.
This provides the cargo build system and dependency manager for compilation, testing, benchmarking, documentation, and packaging.
You will also need the clang toolkit for C/C++.
Build an optimized (release) version of the standalone executable with:
cargo build --release
Simply pass this executable a .mol file path to compute that molecule's assembly index:
./target/release/assembly-theory data/checks/anthracene.mol # 6
A full list of options for returning more information or customizing the assembly index calculation procedure can be found by running:
./target/release/assembly-theory --help
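To compute indices for several molecules, the executable can be driven from a script. The following is a minimal sketch, not part of this repository: the helper names build_command, index_from_cli, and index_directory are hypothetical, and it assumes the release binary sits at target/release/assembly-theory and is invoked with a single .mol path as in the command above. It returns the program's standard output verbatim rather than parsing it, since the exact output depends on the options chosen.

```python
import subprocess
from pathlib import Path

# Assumed location of the release build produced by `cargo build --release`.
EXECUTABLE = Path("target/release/assembly-theory")


def build_command(mol_path, executable=EXECUTABLE):
    """Return the argument list for one invocation on a single .mol file."""
    return [str(executable), str(mol_path)]


def index_from_cli(mol_path, executable=EXECUTABLE):
    """Run the executable on one .mol file and return its stdout, stripped."""
    result = subprocess.run(
        build_command(mol_path, executable),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.strip()


def index_directory(directory, executable=EXECUTABLE):
    """Map each .mol file name in a directory to the executable's output."""
    return {
        path.name: index_from_cli(path, executable)
        for path in sorted(Path(directory).glob("*.mol"))
    }
```

Run from the repository root, a call such as index_directory("data/checks") would invoke the executable once per .mol file in that directory.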
assembly-theory comes with a variety of unit, integration, and documentation example tests ensuring the correct calculation of assembly indices.
To run all tests, use:
cargo test
We actively encourage the use of assembly-theory as a framework within which new algorithmic improvements can be implemented and tested.
To measure the performance of a potential improvement, we've implemented benchmarks using the criterion crate.
These benchmarks run assembly index calculation against reference datasets of molecules, timing only the calculation part (skipping molecule parsing, etc.).
To run all benchmarks, use:
cargo bench
See the criterion command line options for details on how to run only specific benchmarks or save baselines for comparison.
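The same discipline (prepare inputs first, time only the calculation) also applies to quick ad hoc measurements from Python. The sketch below is a rough illustration and not a substitute for the criterion benchmarks: time_calculation is a hypothetical helper, and index_fn stands for whatever function is being measured, called on inputs that have already been parsed.

```python
import time


def time_calculation(index_fn, inputs, repeats=5):
    """Return the best wall-clock time (seconds) over `repeats` runs.

    `inputs` must already be parsed/prepared so that only calls to
    `index_fn` fall inside the timed region.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for item in inputs:
            index_fn(item)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best
```

Taking the minimum over several repeats reduces the influence of unrelated system noise, though it offers none of the statistical analysis that criterion performs.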
We use pyo3 to package functionality from our Rust crate as a Python package called assembly_theory.
To build this package locally, first create a virtual environment for this project using a manager of your choice.
Then install maturin:
pip install maturin # using pip
pipx install maturin # using pipx
uv tool install maturin # using uv
Within the virtual environment, build and install this project as a Python package:
maturin develop --release
Once installed, this Python package can be combined with standard cheminformatic packages like RDKit to flexibly manipulate molecular representations and compute their assembly indices.
import assembly_theory as at
from rdkit import Chem
# Parse the molecule from its SMILES representation.
anthracene = Chem.MolFromSmiles("c1ccc2cc3ccccc3cc2c1")
# Convert the RDKit molecule to a mol block (a string).
mol_block = Chem.MolToMolBlock(anthracene)
# Calculate the molecule's assembly index.
at.index(mol_block) # 6
See the assembly_theory::python documentation for a complete list of functions exposed to the Python package along with usage examples.
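When processing many molecules, it helps to guard against SMILES strings that RDKit cannot parse, since Chem.MolFromSmiles returns None in that case. The following is a minimal sketch under stated assumptions: indices_from_smiles, rdkit_mol_block, and demo are hypothetical helper names, and the only assembly_theory function used is at.index as in the example above.

```python
def indices_from_smiles(smiles_list, to_mol_block, index_fn):
    """Map each SMILES string to its assembly index (None if unparseable)."""
    results = {}
    for smiles in smiles_list:
        mol_block = to_mol_block(smiles)
        results[smiles] = None if mol_block is None else index_fn(mol_block)
    return results


def rdkit_mol_block(smiles):
    """Convert SMILES to a mol block with RDKit; return None if parsing fails."""
    from rdkit import Chem  # imported lazily so the helper above stays generic

    mol = Chem.MolFromSmiles(smiles)  # RDKit returns None for invalid SMILES
    return None if mol is None else Chem.MolToMolBlock(mol)


def demo():
    """Requires both rdkit and assembly_theory to be installed."""
    import assembly_theory as at

    smiles = ["c1ccc2cc3ccccc3cc2c1", "not-a-smiles"]
    return indices_from_smiles(smiles, rdkit_mol_block, at.index)
```

Passing the conversion and index functions as arguments keeps the loop independent of either library, which also makes it easy to exercise with stand-in functions.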
To run the Python test suite, install pytest in your virtual environment and then simply run pytest.
- The current implementation tallies the number of states searched during recursive assembly index calculation. If the number of states searched exceeds the (very large) limit of a usize, the code panics. This is unlikely to occur when various kernelization, memoization, and bounding strategies are enabled to prune the search space, but is theoretically always possible given a large enough molecule and sufficiently long search time. See #49 for details.
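To give a sense of scale for this limit, here is a back-of-the-envelope calculation. It rests on two assumptions not taken from this project: that usize is 64 bits wide (as on typical 64-bit targets; on 32-bit targets it is 32 bits), and a hypothetical search rate of one billion states per second.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def seconds_to_overflow(bits, states_per_second):
    """Seconds for a `bits`-wide unsigned counter to reach its maximum value."""
    max_value = 2**bits - 1
    return max_value / states_per_second


RATE = 1e9  # hypothetical: one billion states searched per second

years_64 = seconds_to_overflow(64, RATE) / SECONDS_PER_YEAR
seconds_32 = seconds_to_overflow(32, RATE)

print(f"64-bit usize: about {years_64:.0f} years")      # about 585 years
print(f"32-bit usize: about {seconds_32:.1f} seconds")  # about 4.3 seconds
```

Under these assumptions the limit is far out of reach on a 64-bit target, while a 32-bit counter would be exhausted within seconds at the same rate.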
This project is under active development!
Any code on the main branch is considered usable, but not necessarily stable or feature-complete.
See our releases for more reliable snapshots of the project.
Have a suggestion for new features or a bug you need fixed? Open a new issue.
Want to contribute your own code?
- Familiarize yourself with the Rust API Guidelines and overall architecture of assembly-theory.
- Development team members should work in individual feature branches. External contributors should work in repository forks.
- Commit messages should follow conventional commits.
- Before opening a pull request onto main, make sure you rebase onto main, run cargo fmt, and resolve any issues raised by cargo clippy.
- Open a new pull request, provide a descriptive list of your changes (with references to any issues your PR resolves), and assign one of @AgentElement, @jdaymude, or @colemathis as a reviewer. Your PR will not be reviewed unless it passes all GitHub Actions (compilation, formatting, tests, etc.).
assembly-theory is maintained by Devansh Vimal (@AgentElement), Joshua J. Daymude (@jdaymude), and Cole Mathis (@colemathis) with support from other members of the Biodesign Center for Biocomputing, Security and Society at Arizona State University including Garrett Parzych (@Garrett-Pz), Olivia M. Smith (@omsmith161), Devendra Parkar (@devrz45), and Sean Bergen (@ARandomCl0wn).
The maintainers govern the project using the committee model: high-level decisions about the project's direction require maintainer consensus, major code changes require majority approval, hotfixes and patches require just one maintainer approval, new maintainers can be added by unanimous decision of the existing maintainers, and existing maintainers can step down with advance notice.
If you use this crate in your own scientific work, please consider citing us:
Coming soon!
assembly-theory is licensed under the Apache License, Version 2.0 or the MIT License, at your option.
Unless you explicitly state otherwise, any contribution you intentionally submit for inclusion in this repository (as defined by Apache-2.0) shall be dual-licensed as above, without any additional terms or conditions.