GitHub - Jianxinnn/af3_batch

# AlphaFold3 Local Batch Processing Script

A flexible script for running AlphaFold3 predictions locally, with support for single-GPU and multi-GPU execution and for batch jobs.

## Features

- Single GPU, single task execution
- Single GPU, batch processing
- Multi-GPU parallel batch processing
- Checkpoint support (auto-skip completed predictions)
- Support for protein, RNA, DNA, and ligand inputs
- YAML-based configuration

## Prerequisites

- Local AlphaFold3 conda environment
- Downloaded AlphaFold3 model parameters
- Installed AlphaFold3 Python package
- Required binary tools (jackhmmer, hmmbuild, hmmsearch, nhmmer)
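
As a quick sanity check before a run, the sketch below looks for the four binary tools listed above on the current `PATH`. It is only an illustration: `find_missing_tools` is a hypothetical helper that is not part of this repository, and a tool that is missing from `PATH` can still be usable if its full path is given in `config.yaml`.

```python
import shutil

# Tool names come from the prerequisites list above.
REQUIRED_TOOLS = ["jackhmmer", "hmmbuild", "hmmsearch", "nhmmer"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH.

    Hypothetical helper for illustration; not part of this repository.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required binary tools were found.")
```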

## Configuration

All paths and settings are configured in `config.yaml`:
- Model weights path
- Database path
- Input/Output directories
- Binary tool paths
- Environment settings

## Input Data Format

The script accepts a list of dictionaries (`List[Dict]`), one dictionary per prediction job:
```python
[
    {
        "name": "test1",
        "protein": "MALWMRLLPLLALLALWGPDPAAA",
        "rna": "GCAGAGCCCUCCAGCAUCGCGAGC",
        "dna": "GCTCGCGATGCTAGAGGGCTCTGC",
        "ligand": "CC(=O)NC1=CC=C(O)C=C1"  # Optional ligand in SMILES format
    }
]
```

Supported input keys:
- `protein`: Protein sequence in one-letter amino acid code
- `rna`: RNA sequence (A, C, G, U)
- `dna`: DNA sequence (A, C, G, T)
- `ligand`: Small molecule in SMILES format (optional)
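
Because a wrong alphabet (for example, `T` in an RNA sequence) is easy to miss, it can help to validate entries before preparing inputs. The following is a minimal sketch, not code from this repository: `validate_entry` and `ALPHABETS` are hypothetical names, the protein alphabet is assumed to be the 20 standard amino acids only, and SMILES strings are not checked.

```python
# Allowed residue alphabets; the key names match the input format above.
# Assumption: only the 20 standard amino acids are accepted for proteins.
ALPHABETS = {
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
    "rna": set("ACGU"),
    "dna": set("ACGT"),
}


def validate_entry(entry):
    """Return a list of human-readable problems found in one input dict."""
    problems = []
    if not entry.get("name"):
        problems.append("missing 'name'")
    for key, alphabet in ALPHABETS.items():
        sequence = entry.get(key)
        if sequence is None:
            continue  # this key is absent from the entry, nothing to check
        bad = sorted(set(sequence.upper()) - alphabet)
        if bad:
            problems.append(f"{key}: unexpected characters {bad}")
    return problems


# The T's are flagged because RNA uses U instead of T.
entry = {"name": "test1", "rna": "GCTCGCGATGCTAGAGGGCTCTGC"}
print(validate_entry(entry))
```

An empty list means that no problems were found for that entry.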

## Usage Examples

### Single Task on Single GPU

```python
from alphafold3_localbase import AlphaFoldModel

model = AlphaFoldModel()
sequences = [
    {
        "name": "test1", 
        "protein": "MALWMRLLPLLALLALWGPDPAAA",
        "rna": "GCAGAGCCCUCCAGCAUCGCGAGC",
        "ligand": "CC(=O)NC1=CC=C(O)C=C1"
    }
]
batch_mode = False
input_data = model.single_prepare_sequences(sequences, "234321")
gpu_ids = "0"
gpu_num = len(gpu_ids.split(","))
input_data = model.prepare_input(
    input_data, 
    batch_mode=batch_mode, 
    num_gpus=gpu_num, 
    name_prefix="single_task"
)
model.run_prediction(input_data, device=f"cuda:{gpu_ids}")
```

### Batch Processing on Multiple GPUs

```python
from alphafold3_localbase import AlphaFoldModel

model = AlphaFoldModel()
sequences = [
    {
        "name": "test1",
        "protein": "MALWMRLLPLLALLALWGPDPAAA",
        "rna": "GCAGAGCCCUCCAGCAUCGCGAGC"
    },
    {
        "name": "test2",
        "protein": "MALWMRLLPLLALLALWGPDPAAA",
        "dna": "GCTCGCGATGCTAGAGGGCTCTGC",
        "rna": "GCAGAGCCCUCCAGCAUCGCGAGC",
        "ligand": "CC(=O)NC1=CC=C(O)C=C1"
    }
]
batch_mode = True
gpu_ids = "0,1,2,3"  # Multi-GPU setup
gpu_num = len(gpu_ids.split(","))
input_data = model.batch_prepare_sequences(sequences, "234321")
input_data = model.prepare_input(
    input_data, 
    batch_mode=batch_mode, 
    num_gpus=gpu_num, 
    name_prefix="batch_task"
)
model.run_prediction(input_data, device=f"cuda:{gpu_ids}")
```

## Advanced Features

1. Checkpoint Support
   - Automatically skips completed predictions
   - Validates output file integrity

2. Multi-GPU Load Balancing
   - Evenly distributes jobs across available GPUs
   - Handles remainder jobs efficiently

3. Flexible Input Formats
   - Supports both AlphaFold server and local formats
   - Automatic format conversion

4. Error Handling
   - Robust error checking
   - Detailed logging
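
To make the load-balancing idea in item 2 concrete, here is one common way to split a job list evenly and hand the remainder to the first few GPUs. This is a sketch under assumptions, not necessarily how the script does it internally; `split_jobs` is a hypothetical function name.

```python
def split_jobs(jobs, num_gpus):
    """Split jobs into num_gpus contiguous chunks whose sizes differ by at most one."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    base, remainder = divmod(len(jobs), num_gpus)
    chunks, start = [], 0
    for gpu_index in range(num_gpus):
        # The first `remainder` GPUs each take one extra job.
        size = base + (1 if gpu_index < remainder else 0)
        chunks.append(jobs[start:start + size])
        start += size
    return chunks


jobs = [f"job{i}" for i in range(10)]
for gpu_index, chunk in enumerate(split_jobs(jobs, 4)):
    print(gpu_index, chunk)
```

With 10 jobs and 4 GPUs, the chunk sizes are 3, 3, 2, and 2, so no GPU receives more than one job beyond any other.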

## Notes

- For multi-GPU processing, jobs are distributed evenly across GPUs
- The script automatically checks for existing predictions to avoid redundant processing
- All paths and settings can be customized in `config.yaml`
- Supports both protein-only predictions and complexes of protein, nucleic acids, and ligands
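
The check for existing predictions can be pictured with the sketch below. It assumes, purely for illustration, that each job writes a structure file to `<output_dir>/<name>/<name>_model.cif`; the layout, the `_model.cif` suffix, and the helper names `is_completed` and `pending_jobs` are assumptions rather than the script's confirmed behavior.

```python
from pathlib import Path


def is_completed(output_dir, job_name, marker_suffix="_model.cif"):
    """Treat a job as done if its marker file exists and is non-empty.

    Assumed layout (hypothetical): <output_dir>/<job_name>/<job_name>_model.cif
    """
    marker = Path(output_dir) / job_name / f"{job_name}{marker_suffix}"
    return marker.is_file() and marker.stat().st_size > 0


def pending_jobs(sequences, output_dir):
    """Keep only the input dicts whose prediction is not finished yet."""
    return [s for s in sequences if not is_completed(output_dir, s["name"])]
```

Requiring a non-empty file is a cheap guard against runs that were interrupted right after the output file was created; a stricter integrity check would parse the file.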