SolvIT is a machine learning model designed to predict protein solubility in Escherichia coli and to aid enzyme design by prioritizing high-solubility candidates from large design sets. It uses a small graph neural network (GNN) to achieve state-of-the-art performance that rivals much larger models.
- Deep Learning Integration: Uses SolvIT, a GNN-based solubility classifier trained on E. coli expression data.
- Comprehensive Pipeline: Automates feature extraction, solubility prediction, and result formatting.
- Ease of Use: Minimal setup requirements for running pre-defined protein designs.
Before running the SolvIT pipeline, ensure the following tools are installed:
- Apptainer/Singularity: Used for managing the containerized environments.
- Python environment specified in the environment.yaml file provided in this repository.
- Clone the repository:
    git clone https://github.com/Enzymit/SolvIT.git
    cd SolvIT
- Create and activate the Python environment:
    conda env create -f environment.yaml
    conda activate solvit_snakemake
- Download the necessary Singularity container:
    ./singularity/download_sif.sh
- Modify the config.yaml file as needed. Key parameters include:
  - OUTDIR: Output directory for results.
  - INPUTDIR: Directory containing input .pdb files.
  - SINGULARITY_PATH: Path to the directory containing the downloaded .sif file (usually singularity).
  - OUTFILENAME: Name of the final output file.
- Execute the Snakemake workflow:
    snakemake --cores <number_of_cores> --use-singularity
  Replace <number_of_cores> with the number of CPU cores to use.
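When the workflow is launched from a script rather than typed by hand, the invocation above can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper name build_snakemake_command is hypothetical, and the only flags used are the two shown in the command above.

```python
import os
import shlex


def build_snakemake_command(cores=None):
    """Return the argument list for the Snakemake invocation shown above.

    Hypothetical helper: if `cores` is omitted, all CPUs reported by the
    operating system are used.
    """
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() may return None
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return ["snakemake", "--cores", str(cores), "--use-singularity"]


# Print the command as a single shell-quoted string for inspection.
print(shlex.join(build_snakemake_command(4)))
```

The returned list can be passed to subprocess.run(..., check=True) from the repository root, with the solvit_snakemake environment active.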
The final results are saved in the output directory specified in config.yaml, under the name provided in OUTFILENAME.
The pipeline consists of the following steps:
- Feature Extraction: Extracts features from input .pdb files using Rosetta and saves them in a compressed format.
- SolvIT Prediction: Runs the solubility prediction model on the extracted features.
- Result Formatting: Processes raw predictions into a user-friendly .csv file.
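Since the stated purpose is to prioritize high-solubility candidates, the formatted .csv can be ranked by its prediction column. The sketch below is illustrative only: the column layout of the output file is not documented here, so the column names "design" and "solubility" are assumptions, the rows are made up, and top_candidates is a hypothetical helper.

```python
import csv
import io


def top_candidates(csv_file, score_column, n=10):
    """Return the n rows with the highest value in score_column, best first."""
    rows = list(csv.DictReader(csv_file))
    if rows and score_column not in rows[0]:
        raise KeyError(
            f"column {score_column!r} not found; available: {list(rows[0])}"
        )
    rows.sort(key=lambda row: float(row[score_column]), reverse=True)
    return rows[:n]


# Made-up rows for illustration; the real column names may differ.
example = io.StringIO("design,solubility\nd1,0.42\nd2,0.91\nd3,0.67\n")
for row in top_candidates(example, "solubility", n=2):
    print(row["design"], row["solubility"])
```

For a real run, open the file named by OUTFILENAME inside OUTDIR (with newline="") and pass the file object together with the actual score column name.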
OUTDIR: "output"
INPUTDIR: "example"
SINGULARITY_PATH: "singularity"
OUTFILENAME: "solvit_out.csv"
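Before launching the workflow, a configuration like the one above can be sanity-checked. The following is a minimal sketch under stated assumptions: parse_flat_config and check_config are hypothetical helpers, not part of SolvIT, and the parser only handles flat KEY: "value" lines such as those shown above rather than general YAML.

```python
from pathlib import Path

REQUIRED_KEYS = ("OUTDIR", "INPUTDIR", "SINGULARITY_PATH", "OUTFILENAME")


def parse_flat_config(text):
    """Parse flat `KEY: "value"` lines into a dict (not a general YAML parser)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip().strip('"').strip("'")
    return config


def check_config(config, base_dir="."):
    """Return a list of problems; an empty list means none were found."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in config]
    if problems:
        return problems
    base = Path(base_dir)
    # INPUTDIR should hold at least one input .pdb file.
    input_dir = base / config["INPUTDIR"]
    if not input_dir.is_dir() or not any(input_dir.glob("*.pdb")):
        problems.append(f"no .pdb files found in {input_dir}")
    # SINGULARITY_PATH should hold the downloaded .sif file.
    sif_dir = base / config["SINGULARITY_PATH"]
    if not sif_dir.is_dir() or not any(sif_dir.glob("*.sif")):
        problems.append(f"no .sif file found in {sif_dir}")
    return problems
```

Calling check_config(parse_flat_config(Path("config.yaml").read_text())) from the repository root returns an empty list when the input directory contains .pdb files and the container has been downloaded.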
If you use SolvIT in your research, please cite:
Zimmerman et al., Context-dependent design of induced-fit enzymes using deep learning generates well-expressed, thermally stable, and active enzymes. PNAS, 2024. DOI:10.1073/pnas.2313809121
This project is licensed under the GNU General Public License v3.0 (GPL-3.0).
For any questions or issues, create a new issue in the repository.