Jianxinnn/af3_batch_process
# AlphaFold3 Local Batch Processing Script

A flexible script for running AlphaFold3 predictions locally, with support for single- and multi-GPU processing and batch operations.

## Features

- Single GPU, single task execution
- Single GPU, batch processing
- Multi-GPU parallel batch processing
- Checkpoint support (completed predictions are skipped automatically)
- Support for protein, RNA, DNA, and ligand inputs
- YAML-based configuration

## Prerequisites

- A local AlphaFold3 conda environment
- Downloaded AlphaFold3 model parameters
- The AlphaFold3 Python package, installed
- Required binary tools (jackhmmer, hmmbuild, hmmsearch, nhmmer)

## Configuration

All paths and settings are configured in `config.yaml`:

- Model weights path
- Database path
- Input/output directories
- Binary tool paths
- Environment settings

## Input Data Format

The script accepts input as a list of dictionaries (`List[Dict]`), one dictionary per prediction:

```python
[
    {
        "name": "test1",
        "protein": "MALWMRLLPLLALLALWGPDPAAA",
        "rna": "GCAGAGCCCUCCAGCAUCGCGAGC",
        "dna": "GCTCGCGATGCTAGAGGGCTCTGC",
        "ligand": "CC(=O)NC1=CC=C(O)C=C1"  # Optional ligand in SMILES format
    }
]
```

Supported sequence types:

- `protein`: protein sequence in one-letter code
- `rna`: RNA sequence (A, C, G, U)
- `dna`: DNA sequence (A, C, G, T)
- `ligand`: small molecule in SMILES format (optional)

## Usage Examples

### Single Task on Single GPU

```python
from alphafold3_localbase import AlphaFoldModel

model = AlphaFoldModel()

sequences = [
    {
        "name": "test1",
        "protein": "MALWMRLLPLLALLALWGPDPAAA",
        "rna": "GCAGAGCCCUCCAGCAUCGCGAGC",  # RNA uses U, not T
        "ligand": "CC(=O)NC1=CC=C(O)C=C1"
    }
]

batch_mode = False
input_data = model.single_prepare_sequences(sequences, "234321")

gpu_ids = "0"
gpu_num = len(gpu_ids.split(","))

input_data = model.prepare_input(
    input_data,
    batch_mode=batch_mode,
    num_gpus=gpu_num,
    name_prefix="single_task"
)
model.run_prediction(input_data, device=f"cuda:{gpu_ids}")
```

### Batch Processing on Multiple GPUs

```python
from alphafold3_localbase import AlphaFoldModel

model = AlphaFoldModel()

sequences = [
    {
        "name": "test1",
        "protein": "MALWMRLLPLLALLALWGPDPAAA",
        "rna": "GCAGAGCCCUCCAGCAUCGCGAGC"
    },
    {
        "name": "test2",
        "protein": "MALWMRLLPLLALLALWGPDPAAA",
        "dna": "GCTCGCGATGCTAGAGGGCTCTGC",
        "rna": "GCAGAGCCCUCCAGCAUCGCGAGC",
        "ligand": "CC(=O)NC1=CC=C(O)C=C1"
    }
]

batch_mode = True
gpu_ids = "0,1,2,3"  # Multi-GPU setup
gpu_num = len(gpu_ids.split(","))  # 4 GPUs

input_data = model.batch_prepare_sequences(sequences, "234321")
input_data = model.prepare_input(
    input_data,
    batch_mode=batch_mode,
    num_gpus=gpu_num,
    name_prefix="batch_task"
)
model.run_prediction(input_data, device=f"cuda:{gpu_ids}")
```

## Advanced Features

1. Checkpoint support
   - Automatically skips completed predictions
   - Validates the integrity of output files
2. Multi-GPU load balancing
   - Distributes jobs evenly across the available GPUs
   - Handles leftover jobs when the job count is not a multiple of the GPU count
3. Flexible input formats
   - Supports both the AlphaFold Server format and the local format
   - Converts between formats automatically
4. Error handling
   - Robust error checking
   - Detailed logging

## Notes

- For multi-GPU processing, jobs are distributed evenly across GPUs.
- The script automatically checks for existing predictions to avoid redundant processing.
- All paths and configurations can be customized in `config.yaml`.
- Both protein-only predictions and protein/nucleic-acid/ligand complex predictions are supported.
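The checkpoint skipping and even multi-GPU distribution described above could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not this script's actual implementation: `is_completed`, `split_jobs`, and `plan_jobs` are hypothetical helpers, and the completion marker (a non-empty `<output_dir>/<name>/<name>_model.cif` file) is an assumed output layout.

```python
from pathlib import Path
from typing import Dict, List


def is_completed(output_dir: Path, name: str, suffix: str = "_model.cif") -> bool:
    # Hypothetical checkpoint test: treat a job as finished when
    # <output_dir>/<name>/<name><suffix> exists and is non-empty.
    # This file layout is an assumption, not the script's documented behavior.
    result = output_dir / name / f"{name}{suffix}"
    return result.is_file() and result.stat().st_size > 0


def split_jobs(jobs: List[Dict], num_gpus: int) -> List[List[Dict]]:
    # Split jobs into num_gpus contiguous chunks whose sizes differ by at most one;
    # the first (len(jobs) % num_gpus) chunks each take one leftover job.
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    base, remainder = divmod(len(jobs), num_gpus)
    chunks: List[List[Dict]] = []
    start = 0
    for gpu in range(num_gpus):
        size = base + (1 if gpu < remainder else 0)
        chunks.append(jobs[start:start + size])
        start += size
    return chunks


def plan_jobs(jobs: List[Dict], output_dir: Path, gpu_ids: str) -> Dict[str, List[Dict]]:
    # Drop jobs that already have results, then map each GPU id to its share.
    pending = [job for job in jobs if not is_completed(output_dir, job["name"])]
    ids = [gpu.strip() for gpu in gpu_ids.split(",")]
    return {f"cuda:{gpu}": chunk for gpu, chunk in zip(ids, split_jobs(pending, len(ids)))}


if __name__ == "__main__":
    jobs = [{"name": f"test{i}"} for i in range(1, 11)]
    for device, chunk in plan_jobs(jobs, Path("af3_output"), "0,1,2,3").items():
        print(device, [job["name"] for job in chunk])
```

With ten pending jobs and `gpu_ids = "0,1,2,3"`, `split_jobs` produces chunks of 3, 3, 2, and 2 jobs: the remainder goes to the first GPUs, so no GPU receives more than one job above any other. Running the checkpoint filter before splitting means a restarted run only rebalances the work that is still outstanding.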