SolvIT - Protein Solubility Prediction Deep Neural Network

SolvIT is a machine learning model that predicts protein solubility in Escherichia coli. It aids enzyme design by prioritizing high-solubility candidates from large design sets. The model is a small graph neural network (GNN) that achieves state-of-the-art performance, rivaling much larger models.

Key Features

Deep Learning Integration: Uses SolvIT, a GNN-based solubility classifier trained on E. coli expression data.
Comprehensive Pipeline: Automates feature extraction, solubility prediction, and result formatting.
Ease of Use: Minimal setup requirements for running pre-defined protein designs.

Prerequisites

Before running the SolvIT pipeline, ensure the following tools are installed:

Apptainer/Singularity: Used to run the pipeline's containerized environments. See the Apptainer or Singularity documentation for installation instructions.

Additional Requirements

Conda: Used to create the Python environment specified in the environment.yaml file provided in this repository.

Installation

Clone the repository:

```
git clone https://github.com/Enzymit/SolvIT.git
cd SolvIT
```

Create and activate the Python environment:

```
conda env create -f environment.yaml
conda activate solvit_snakemake
```

Download the necessary Singularity container:
```
./singularity/download_sif.sh
```

Usage

Preliminary Configuration

Modify the config.yaml file as needed. Key parameters include:

OUTDIR: Output directory for results.
INPUTDIR: Directory containing input .pdb files.
SINGULARITY_PATH: Path to the directory containing the downloaded .sif file (usually singularity).
OUTFILENAME: Name of the final output file.
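
Before launching a long run, it can help to sanity-check these four parameters. The following is a minimal sketch, not part of the SolvIT repository: `check_config` and `REQUIRED_KEYS` are hypothetical names, and the function assumes the config has already been loaded into a plain dictionary (for example with a YAML parser of your choice).

```python
from pathlib import Path

# The four keys described above; the tuple name itself is a hypothetical helper.
REQUIRED_KEYS = ("OUTDIR", "INPUTDIR", "SINGULARITY_PATH", "OUTFILENAME")


def check_config(cfg: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = [f"missing or empty key: {key}" for key in REQUIRED_KEYS if not cfg.get(key)]
    if problems:
        return problems

    # INPUTDIR must exist and contain at least one .pdb file.
    input_dir = Path(cfg["INPUTDIR"])
    if not input_dir.is_dir():
        problems.append(f"INPUTDIR is not a directory: {input_dir}")
    elif not any(input_dir.glob("*.pdb")):
        problems.append(f"INPUTDIR contains no .pdb files: {input_dir}")

    # SINGULARITY_PATH must be a directory holding the downloaded .sif file.
    sif_dir = Path(cfg["SINGULARITY_PATH"])
    if not sif_dir.is_dir() or not any(sif_dir.glob("*.sif")):
        problems.append(f"no .sif file found in SINGULARITY_PATH: {sif_dir}")

    return problems
```

Calling `check_config(cfg)` and printing the returned list gives a quick summary of what to fix before running Snakemake. OUTDIR is not checked for existence here, on the assumption that the workflow creates it.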

Running the Pipeline

Execute the Snakemake workflow:
```
snakemake --cores <number_of_cores> --use-singularity
```
Replace <number_of_cores> with the number of CPU cores to use.
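
If the pipeline is launched from a Python script rather than a shell, the same command can be assembled programmatically. This is a sketch under stated assumptions: `build_snakemake_command` and `run_pipeline` are hypothetical helpers, not part of SolvIT, and the only Snakemake flags used are the two shown above.

```python
import shutil
import subprocess


def build_snakemake_command(cores: int) -> list[str]:
    """Assemble the command shown above for a given number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return ["snakemake", "--cores", str(cores), "--use-singularity"]


def run_pipeline(cores: int, workdir: str = ".") -> int:
    """Run the workflow from workdir (the repository root) and return its exit code."""
    if shutil.which("snakemake") is None:
        # Most likely the solvit_snakemake environment has not been activated.
        raise RuntimeError("snakemake not found on PATH; activate the conda environment first")
    completed = subprocess.run(build_snakemake_command(cores), cwd=workdir, check=False)
    return completed.returncode
```

For example, `run_pipeline(8)` would correspond to `snakemake --cores 8 --use-singularity` executed in the current directory.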

Output

The final results are saved in the directory given by OUTDIR in config.yaml, under the file name given by OUTFILENAME.
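
To shortlist candidates from the results file, something like the following sketch may be useful. The column layout of the output .csv is not documented here, so the score column name is passed in by the caller rather than assumed. The helper names `output_path` and `top_candidates` are hypothetical, and the sketch assumes that a higher score means higher predicted solubility.

```python
import csv
from pathlib import Path


def output_path(cfg: dict) -> Path:
    """Location of the results file: OUTDIR joined with OUTFILENAME."""
    return Path(cfg["OUTDIR"]) / cfg["OUTFILENAME"]


def top_candidates(csv_path, score_column: str, n: int = 10) -> list[dict]:
    """Return the n rows with the highest value in score_column.

    Assumption: score_column holds a numeric value where larger is better.
    Inspect the header of your own output file to find the right column name.
    """
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    if rows and score_column not in rows[0]:
        raise KeyError(f"column not found in {csv_path}: {score_column}")
    rows.sort(key=lambda row: float(row[score_column]), reverse=True)
    return rows[:n]
```

With the example configuration, `output_path(cfg)` resolves to `output/solvit_out.csv`, which can then be passed to `top_candidates` together with the score column name found in the file header.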

Pipeline Overview

The pipeline consists of the following steps:

Feature Extraction: Extracts features from input .pdb files using Rosetta and saves them in a compressed format.
SolvIT Prediction: Runs the solubility prediction model on the extracted features.
Result Formatting: Processes raw predictions into a user-friendly .csv file.

Example Configuration (`config.yaml`)

```
OUTDIR: "output"
INPUTDIR: "example"
SINGULARITY_PATH: "singularity"
OUTFILENAME: "solvit_out.csv"
```

Citation

If you use SolvIT in your research, please cite:

Zimmerman et al., Context-dependent design of induced-fit enzymes using deep learning generates well-expressed, thermally stable, and active enzymes. PNAS, 2024. DOI:10.1073/pnas.2313809121

License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0).

For questions or problems, open a new issue in the repository.
