forked from AngxiaoYue/ReQFlow

ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation

SDS-Lab/ReQFlow

⚡️ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation

arXiv Hugging Face Checkpoint Project Page Try Demo

🔥 News

  • 2025/02/24 💥 Our model weights are hosted on Hugging Face and Google Drive now 😊.
  • 2025/02/20 💥 We release our work ReQFlow for efficient and high-quality protein backbone generation!

🧩 Introduction

Our ReQFlow achieves state-of-the-art (SOTA) performance in protein backbone generation while requiring significantly fewer sampling steps and substantially reducing inference time. For example, it is 37× faster than RFDiffusion and 62× faster than Genie2 when generating a backbone of length 300, demonstrating both its effectiveness and efficiency.

⚒️ Installation

We recommend using mamba. If you do, substitute mamba for conda in the commands below.

conda env create -f reqflow-env.yml

conda activate reqflow-env

pip install torch-scatter -f https://data.pyg.org/whl/torch-2.0.0+cu117.html

# Install local package.
# Current directory should be ReQFlow/

pip install -e .

Warning

You will likely run into the following error from DeepSpeed:

ModuleNotFoundError: No module named 'torch._six'

If so, replace from torch._six import inf with from torch import inf or from math import inf.

  • /path/to/envs/site-packages/deepspeed/runtime/utils.py
  • /path/to/envs/site-packages/deepspeed/runtime/zero/stage_1_and_2.py
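If you prefer not to edit the files by hand, the replacement can be scripted. The following is a minimal sketch, not part of the repository: the helper names `patch_source` and `patch_file` are hypothetical, and you still have to supply the real paths of the two files listed above for your environment.

```python
from pathlib import Path

# The import removed from recent PyTorch versions, and its replacement.
OLD_IMPORT = "from torch._six import inf"
NEW_IMPORT = "from torch import inf"


def patch_source(text: str) -> str:
    """Return the source text with the torch._six import replaced."""
    return text.replace(OLD_IMPORT, NEW_IMPORT)


def patch_file(path: Path) -> bool:
    """Rewrite one file in place; return True if the file was changed."""
    original = path.read_text()
    patched = patch_source(original)
    if patched == original:
        return False
    path.write_text(patched)
    return True
```

Call `patch_file(Path(...))` once for each of the two DeepSpeed files. Running it a second time is harmless, because an already patched file is left unchanged.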

🚀 Quick Inference

Our model weights are available for download on Hugging Face or Google Drive. You can also use your own weights. If using ours, please organize the directory as follows:

ReQFlow
├── ckpts
│   ├── qflow_pdb
│   │   ├── config.yaml
│   │   └── qflow_pdb.ckpt
│   ├── qflow_scope
│   │   ├── config.yaml
│   │   └── qflow_scope.ckpt
│   ├── reqflow_pdb_rectify
│   │   ├── config.yaml
│   │   └── reqflow_pdb_rectify.ckpt
│   └── reqflow_scope_rectify
│       ├── config.yaml
│       └── reqflow_scope_rectify.ckpt
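Before running inference, it can help to confirm that the downloaded files match this layout. Below is a small sketch that checks for the files shown in the tree above; the function name `missing_checkpoint_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Model directory names taken from the directory tree above.
EXPECTED_MODELS = (
    "qflow_pdb",
    "qflow_scope",
    "reqflow_pdb_rectify",
    "reqflow_scope_rectify",
)


def missing_checkpoint_files(ckpt_root: Path, models=EXPECTED_MODELS) -> list[Path]:
    """List the expected config and checkpoint files absent under ckpt_root."""
    missing = []
    for name in models:
        for filename in ("config.yaml", f"{name}.ckpt"):
            path = ckpt_root / name / filename
            if not path.is_file():
                missing.append(path)
    return missing
```

For example, `missing_checkpoint_files(Path("ckpts"))` run from the ReQFlow directory returns an empty list when all four models are in place. If you only downloaded one model, pass `models=("reqflow_pdb_rectify",)` to check just that one.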

The inference configurations are available in configs/inference_unconditional.yaml, where you can conveniently specify the inference settings.

inference:
  task: unconditional
  ckpt_path: ./ckpts/reqflow_pdb_rectify/reqflow_pdb_rectify.ckpt # path to ckpts
  inference_subdir: ./inference_outputs/run_${now:%Y-%m-%d}_${now:%H-%M-%S} # path to inference outputs
  pmpnn_dir: ./ProteinMPNN
  pt_hub_dir: ./.cache/torch/ # path to ESMFold
  num_gpus: 4

  samples:
    min_length: 100
    max_length: 300
    length_step: 50 # sampling on length (100,150,200,250,300)
    samples_per_length: 50
    seq_per_sample: 8 # num. of seq. generated by ProteinMPNN

  interpolant:
    sampling:
      num_timesteps: 500
      do_sde: False
    rots:
      sample_schedule: exp

Once you have specified the configurations, you can run inference using the following command:

python -W ignore experiments/inference_se3_flows.py -cn inference_unconditional

During inference, we evaluate the results with ProteinMPNN and ESMFold, following FrameDiff. The outputs are saved as follows:

inference_outputs
└── experiment_name                     # Default is the date and time of inference
    ├── config.yaml                     # Config used during inference
    └── length_100                      # Sampled length 
        ├── sample_0                    # Sample ID for length
        │   ├── noise.pdb               # First sample, i.e., noise
        │   ├── sample.pdb              # Final sample
        │   ├── self_consistency        # Self consistency results        
        │   │   ├── esmf                # ESMFold predictions using ProteinMPNN sequences
        │   │   │   ├── sample_0.pdb
        │   │   │   ├── ...
        │   │   │   └── sample_8.pdb
        │   │   ├── parsed_pdbs.jsonl   # Parsed chains for ProteinMPNN
        │   │   ├── sample.pdb
        │   │   ├── sc_results.csv      # Summary metrics CSV 
        │   │   └── seqs                
        │   │       └── sample.fa       # ProteinMPNN sequences
        │   └── x0_traj_1.pdb           # x_0 model prediction trajectory
        └── sample_1                    # Next sample

From the contents of inference_outputs, we can compute Designability, Diversity, and Novelty. More evaluation details for reproducing the paper's results are here.
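As a rough illustration of how a designability number could be derived from this directory tree, the sketch below reads every sc_results.csv and counts the backbones whose best self-consistency RMSD falls under a threshold. Several things here are assumptions rather than facts about this repository: the column name `bb_rmsd`, the 2 Å threshold (a common convention in this line of work), and the function names. Check the header of your own sc_results.csv and the linked evaluation details before relying on it.

```python
import csv
from pathlib import Path


def best_rmsd(csv_path: Path, rmsd_column: str) -> float:
    """Lowest RMSD among the rows of one sc_results.csv."""
    with csv_path.open(newline="") as handle:
        return min(float(row[rmsd_column]) for row in csv.DictReader(handle))


def designability(run_dir: Path, rmsd_column: str = "bb_rmsd",
                  threshold: float = 2.0) -> float:
    """Fraction of samples whose best RMSD is below the threshold.

    rmsd_column and threshold are assumed defaults, not repository values.
    """
    pattern = "length_*/sample_*/self_consistency/sc_results.csv"
    best = [best_rmsd(path, rmsd_column) for path in sorted(run_dir.glob(pattern))]
    if not best:
        raise ValueError(f"no sc_results.csv files found under {run_dir}")
    return sum(value < threshold for value in best) / len(best)
```

Pass the run directory (the folder that contains config.yaml and the length_* subfolders) as `run_dir`.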

📖 Train from Scratch

Data preparation

We train our models on the Protein Data Bank (PDB) and SCOPe datasets separately. For PDB, we reprocessed the data following the steps described in FrameDiff; the detailed procedure is also available here. We also provide a demo PDB dataset in the data folder to help you test or debug. For SCOPe, we downloaded the dataset directly from the link provided by FrameFlow. The dataset path is set in configs/_datasets.yaml.

QFlow

As with inference, you control the training settings through the yaml files in configs. Taking QFlow training on the PDB dataset as an example, we specify the configuration in configs/train_pdb_base.yaml:

data:
  dataset: pdb
  rectify: False 

  sampler:
    # Setting for 80GB GPUs
    max_batch_size: 128
    max_num_res_squared: 1000000
  
experiment:
  is_training: True
  debug: False
  num_devices: 4
  warm_start: null # keep it null on first stage
  warm_start_cfg_override: True
  training:
    aux_loss_t_pass: 0.50
  wandb:
    name: reqflow_train_pdb_base
    project: reqflow
  checkpointer: # where to save checkpoints
    dirpath: ./ckpts/${experiment.wandb.project}/${experiment.wandb.name}/${now:%Y-%m-%d}_${now:%H-%M-%S}
    save_last: True
    save_top_k: -1
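The two sampler settings interact: `max_batch_size` caps the number of proteins per batch, while `max_num_res_squared` caps memory use, which grows roughly with the square of the sequence length. The sketch below shows one plausible reading of how they combine. It is an assumption for illustration, and the repository's exact rule may differ (for example by an off-by-one).

```python
def batch_size_for_length(seq_len: int, max_batch_size: int = 128,
                          max_num_res_squared: int = 1_000_000) -> int:
    """Batch size for proteins of one length under both caps (assumed rule)."""
    by_memory = max_num_res_squared // (seq_len ** 2)
    # Always allow at least one protein per batch.
    return max(1, min(max_batch_size, by_memory))


for seq_len in (60, 100, 300, 512):
    print(seq_len, batch_size_for_length(seq_len))
```

Under this reading, short proteins are limited by `max_batch_size` and long proteins by `max_num_res_squared`, so on GPUs with less than 80GB of memory lowering `max_num_res_squared` is the setting that matters for long sequences.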

Also make sure the settings in _datasets.yaml follow the instructions here.

The corresponding training command is:

python -W ignore experiments/train_se3_flows.py -cn train_pdb_base

ReQFlow

One of our key contributions is rectifying the SE(3) generation trajectories in Euclidean/Quaternion space to accelerate inference and enhance the designability of the generated protein backbones. We rectify the QFlow model with the generated noise-sample pairs (see noise.pdb and sample.pdb in inference_outputs).
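To make the idea of a straight path between a paired noise and sample concrete, here is a generic sketch: linear interpolation for a translation in Euclidean space and spherical linear interpolation (SLERP) for a rotation stored as a unit quaternion. This illustrates the general technique only. It is not the repository's interpolant, and the function names and the (w, x, y, z) ordering are assumptions.

```python
import math


def lerp(x0, x1, t):
    """Straight-line interpolation between two points in Euclidean space."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def slerp(q0, q1, t):
    """Constant-speed interpolation between two unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to follow the shorter arc.
        q1 = [-b for b in q1]
        dot = -dot
    theta = math.acos(min(1.0, dot))
    if theta < 1e-8:
        # The endpoints coincide numerically, so there is nothing to interpolate.
        return list(q0)
    s = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    return [w0 * a + w1 * b for a, b in zip(q0, q1)]
```

For instance, interpolating halfway between the identity rotation and a 90 degree rotation about the z axis gives a 45 degree rotation about the same axis.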

We construct the rectify dataset by converting the generated .pdb files into a compatible format. You can follow instructions here to do it.
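As a first step toward that conversion, the noise-sample pairs have to be gathered from an inference run. The sketch below only collects the file paths, using the directory layout shown in the inference section; the function name is hypothetical, and the actual format conversion is the one described in the linked instructions.

```python
from pathlib import Path


def collect_noise_sample_pairs(run_dir: Path) -> list[tuple[Path, Path]]:
    """Return (noise.pdb, sample.pdb) path pairs found under one inference run."""
    pairs = []
    for sample_dir in sorted(run_dir.glob("length_*/sample_*")):
        noise = sample_dir / "noise.pdb"
        sample = sample_dir / "sample.pdb"
        # Skip incomplete samples, e.g. from an interrupted run.
        if noise.is_file() and sample.is_file():
            pairs.append((noise, sample))
    return pairs
```

Each returned pair holds the starting noise and the final sample of one generation trajectory.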

Once the rectify dataset is obtained, the training pipeline is the same as for QFlow. The configuration is in configs/train_pdb_rectify.yaml; make sure experiment.warm_start is set to the checkpoint obtained from the first training stage. The command to run it is:

python -W ignore experiments/train_se3_flows.py -cn train_pdb_rectify

Training on the SCOPe dataset follows the same procedure as training on the PDB dataset.

📌 Citation

If you find this work useful for your research, please consider citing it.

@misc{yue2025reqflowrectifiedquaternionflow,
      title={ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation}, 
      author={Angxiao Yue and Zichong Wang and Hongteng Xu},
      year={2025},
      eprint={2502.14637},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.14637}, 
}

👍 Acknowledgments

Thanks to FrameFlow, FrameDiff, and FoldFlow for their great work and codebases, which served as the foundation for developing ReQFlow.

📧 Contact Us

If you have any questions, please feel free to contact us via [email protected] or [email protected].
