RAD (Retrieval Augmented Docking)

RAD is a scalable virtual screening library using HNSW graphs and distributed computing. The architecture supports deployment from single machines to HPC clusters using a central coordination service.

Requirements

Redis
Python >= 3.11
GCC >= 9.3

Installation

git clone --recursive https://github.com/keiserlab/rad.git
cd rad
pip install .

We also provide a Dockerfile containing all required software.

Architecture Overview

RAD uses a service-oriented design with three main components:

HNSW Service: Handles HNSW neighbor searches and SMILES lookup
Coordination Service: Manages work distribution, acts as HNSW proxy, and maintains state via Redis
Distributed Workers: Lightweight scoring processes that can run anywhere with only Redis access
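The coordinator/worker split can be pictured with a small in-process analogy. This is a hypothetical sketch, not RAD's code: Python's `queue.Queue` stands in for the Redis-backed state, `toy_score` is a made-up placeholder score, and RAD's actual Redis protocol is not shown in this README.

```python
import queue
import threading


def toy_score(smiles: str) -> float:
    # Hypothetical placeholder score (lower is better); not a docking score
    return -float(smiles.count("c"))


def worker(tasks: queue.Queue, results: queue.Queue) -> None:
    # A worker only touches the shared queues (the role Redis plays in RAD)
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this worker
            break
        node_key, smiles = item
        results.put((node_key, toy_score(smiles)))


def run(molecules: dict[int, str], n_workers: int = 4) -> dict[int, float]:
    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    # The "coordinator" hands out work items...
    for key, smi in molecules.items():
        tasks.put((key, smi))
    # ...then one stop sentinel per worker
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    # Collect everything the workers reported back
    scores: dict[int, float] = {}
    while not results.empty():
        key, score = results.get()
        scores[key] = score
    return scores
```

Because workers only need the shared queue, they can be added or removed without touching the index, which mirrors the claim that RAD workers need nothing but Redis access.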

Running RAD

Basic Workflow

Build HNSW graph from molecular fingerprints
Create SQLite database mapping node keys to SMILES
Define a SMILES-based scoring function
Initialize RAD services and run traversal

Constructing the HNSW

Constructing the HNSW graph takes two steps: setting the construction parameters expansion_add and connectivity, then adding each molecule by providing a numerical key and its fingerprint.

expansion_add controls the number of candidates considered as potential neighbors during element insertion, while connectivity controls how many of these candidates are actually connected to the inserted element.
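To make the two parameters concrete, here is a simplified, single-layer sketch of HNSW-style insertion under stated assumptions: it omits the layer hierarchy, the neighbor-selection heuristic, and pruning of existing nodes' neighbor lists, and it is not how usearch is implemented. `search_layer`, `insert`, and the dict-based graph are hypothetical names. A beam search of width `expansion_add` collects candidates, and only the `connectivity` closest become neighbors.

```python
import heapq


def search_layer(graph, vectors, query, entry, ef, dist):
    """Greedy beam search that keeps the ef closest elements found so far."""
    d0 = dist(query, vectors[entry])
    visited = {entry}
    frontier = [(d0, entry)]  # min-heap: closest unexplored element first
    best = [(-d0, entry)]     # max-heap via negation: current ef best
    while frontier:
        d, node = heapq.heappop(frontier)
        if d > -best[0][0]:   # closest unexplored is worse than worst kept
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(query, vectors[nb])
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst
    return sorted((-nd, n) for nd, n in best)  # ascending distance


def insert(graph, vectors, key, vec, expansion_add, connectivity, dist):
    vectors[key] = vec
    graph[key] = set()
    if len(vectors) == 1:
        return []
    entry = next(iter(graph))  # assumption: first inserted element is the entry point
    # expansion_add = beam width: how many candidates the search considers
    candidates = search_layer(graph, vectors, vec, entry, expansion_add, dist)
    # connectivity = how many of those candidates actually get linked
    neighbors = [n for _, n in candidates[:connectivity]]
    for n in neighbors:
        graph[key].add(n)
        graph[n].add(key)  # bidirectional edge (real HNSW would also prune here)
    return neighbors
```

A larger `expansion_add` makes the search less likely to miss true near neighbors, at the cost of slower insertion; `connectivity` bounds the number of edges each new element receives.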

import numpy as np
from usearch.index import Index

hnsw = Index(
    ndim=1024,          # 1024-bit fingerprint
    dtype='b1',         # packed binary fingerprints
    metric='tanimoto',
    connectivity=8,
    expansion_add=400,
)

fingerprints = ...  # (n x 128) uint8 array of packed 1024-bit fingerprints
keys = np.arange(len(fingerprints))

hnsw.add(keys, fingerprints, log="Building HNSW")

The fingerprints are expected to be an (n x d/8) uint8 NumPy array in which each row is a packed binary fingerprint. For example, np.packbits() turns a 1024-bit binary fingerprint into 128 uint8 values. See the example notebook for more details.

Creating SQLite Database for SMILES mapping

RAD integrates with SQLite to provide SMILES directly to scoring functions:

import sqlite3

smiles_list = ...  # SMILES strings in the same order as the HNSW keys above

# Create database mapping HNSW keys to SMILES
conn = sqlite3.connect('molecules.db')
cursor = conn.cursor()

cursor.execute("""
    CREATE TABLE nodes (
        node_key INTEGER PRIMARY KEY,
        smi TEXT NOT NULL
    )
""")

# Insert SMILES data; int() converts NumPy integer keys, which sqlite3 cannot bind directly
cursor.executemany(
    "INSERT INTO nodes (node_key, smi) VALUES (?, ?)",
    ((int(key), smi) for key, smi in zip(keys, smiles_list)),
)

cursor.execute("CREATE INDEX idx_nodes_node_key ON nodes(node_key)")
conn.commit()
conn.close()
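To check that the database answers key-to-SMILES lookups, a batch query like the one below can be run against the nodes table. This is a hypothetical sketch: the README does not show which query RAD's HNSW service issues, so `build_db` and `lookup_smiles` are illustrative names, and an in-memory database is used so the example is self-contained.

```python
import sqlite3


def build_db(conn: sqlite3.Connection, smiles_list: list[str]) -> None:
    # Same schema as above; keys are simply 0..n-1
    conn.execute("CREATE TABLE nodes (node_key INTEGER PRIMARY KEY, smi TEXT NOT NULL)")
    conn.executemany("INSERT INTO nodes (node_key, smi) VALUES (?, ?)", enumerate(smiles_list))
    conn.commit()


def lookup_smiles(conn: sqlite3.Connection, keys) -> list:
    """Fetch SMILES for a batch of node keys, in the requested order (None if missing)."""
    keys = [int(k) for k in keys]
    if not keys:
        return []
    placeholders = ",".join("?" * len(keys))
    rows = conn.execute(
        f"SELECT node_key, smi FROM nodes WHERE node_key IN ({placeholders})", keys
    ).fetchall()
    found = dict(rows)
    return [found.get(k) for k in keys]
```

Fetching a batch in one query avoids a round trip per molecule when many neighbors need SMILES at once.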

Defining a SMILES-Based Scoring Function

Scoring functions receive SMILES strings and return a score. Numerically smaller scores are considered better. Here is a mock example:

def score_fn(smiles: str) -> float:
    # calculate_docking_score is a placeholder for your own docking/scoring routine
    score = calculate_docking_score(smiles)
    return score  # Lower scores are better
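For testing a pipeline before a real docking backend is available, any function with the same `str -> float` contract can stand in. The score below is hypothetical and has no chemical meaning; it only illustrates that ranking molecules by ascending score puts the "best" first under the lower-is-better convention.

```python
def toy_score_fn(smiles: str) -> float:
    """Hypothetical stand-in score (lower is better); no chemical meaning.

    Penalizes string length and rewards aromatic carbons ('c').
    """
    aromatic = smiles.count("c")
    return float(len(smiles) - 2 * aromatic)


molecules = ["CCCCCC", "CCO", "c1ccccc1"]
# Ascending sort: numerically smallest (best) score first
ranked = sorted(molecules, key=toy_score_fn)
```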

Initializing RAD Services

With the HNSW index, SMILES database, and scoring function ready, initialize the RAD traverser:

from rad.hnsw_service import create_local_hnsw_service
from rad.traverser import RADTraverser

# Create HNSW service with database integration
hnsw_service = create_local_hnsw_service(hnsw, database_path='molecules.db')

# Create traverser with SMILES-based scoring
traverser = RADTraverser(hnsw_service=hnsw_service, scoring_fn=score_fn)

Deployment Modes

Local Deployment (single machine):

traverser = RADTraverser(hnsw_service=hnsw_service, scoring_fn=score_fn)

Distributed Deployment (HPC):

traverser = RADTraverser(
    hnsw_service=hnsw_service, 
    scoring_fn=score_fn,
    redis_host='head-node.cluster',
    redis_port=6379,
    namespace='job_12345'
)

Remote HNSW Service:

from rad.hnsw_service import create_remote_hnsw_service

# Start HNSW server elsewhere OR use the publicly provided server
# python scripts/start_hnsw_server.py --database-path molecules.db --hnsw-path index.usearch --port 8000

hnsw_service = create_remote_hnsw_service("https://rad.docking.org")
traverser = RADTraverser(hnsw_service=hnsw_service, scoring_fn=score_fn)

Priming the RAD Traverser

The traverser is 'primed' by finding and scoring the nodes on the top layer of the HNSW graph and initializing the priority queue. This should only be run once.
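Conceptually, priming amounts to scoring the top-layer nodes and heapifying them so the best score sits at the front of the queue. The sketch below is hypothetical (`prime`, `smiles_of`, and the plain-list heap are illustrative); in RAD this state is managed by the coordination service rather than a local list.

```python
import heapq


def prime(top_layer_nodes, smiles_of, score_fn):
    """Score every top-layer node and seed a min-heap priority queue with them."""
    scores = {}
    pq = []
    for node in top_layer_nodes:
        s = score_fn(smiles_of[node])
        scores[node] = s
        pq.append((s, node))
    heapq.heapify(pq)  # smallest (best) score ends up at pq[0]
    return pq, scores
```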

traverser.prime()

Performing the traversal

The traversal proceeds until a maximum number of molecules is scored or a timeout is reached:

# Run traversal until 100k molecules are scored
traverser.traverse(n_workers=4, n_to_score=100_000)

# Or run traversal for a specific time
traverser.traverse(n_workers=4, timeout=3600)  # 1 hour
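The stopping rules can be illustrated with a single-process, best-first loop. This is a simplified sketch under assumptions: it expands the best-scoring node, scores its unscored neighbors, and stops on a molecule budget or a deadline, but it ignores RAD's parallel workers. `traverse`, `neighbors`, and `seed_scores` are hypothetical names, and `score_fn` here takes a node key instead of a SMILES string to keep the example short.

```python
import heapq
import time


def traverse(neighbors, seed_scores, score_fn, n_to_score=None, timeout=None):
    """Best-first traversal; returns scores in the order they were computed."""
    scored = dict(seed_scores)  # seeds come from priming
    pq = [(s, n) for n, s in scored.items()]
    heapq.heapify(pq)
    deadline = None if timeout is None else time.monotonic() + timeout
    while pq:
        if n_to_score is not None and len(scored) >= n_to_score:
            break  # molecule budget reached
        if deadline is not None and time.monotonic() >= deadline:
            break  # time budget reached
        _, node = heapq.heappop(pq)  # expand the current best node
        for nb in neighbors.get(node, ()):
            if nb in scored:
                continue
            s = score_fn(nb)
            scored[nb] = s
            heapq.heappush(pq, (s, nb))
            if n_to_score is not None and len(scored) >= n_to_score:
                break
    return scored
```

Because expansion always follows the best score found so far, a tight budget can stop before reaching a strong hit that sits behind a poorly scoring neighbor, which is why the budget (n_to_score or timeout) matters.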

Accessing the results

RAD provides two methods for accessing results:

Traversal Order:

# Get molecules in the order they were discovered
molecules = traverser.get_molecules()  # All molecules
first_100 = traverser.get_molecules(100)  # First 100 molecules

for node_id, score, smiles in molecules:
    print(f"Node {node_id}: {smiles} (score: {score})")

Best Molecules:

# Get top-scoring molecules regardless of discovery order
best_molecules = traverser.get_best_molecules(10)  # Top 10 by score

for node_id, score, smiles in best_molecules:
    print(f"Top hit: {smiles} (score: {score})")
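The difference between the two access patterns can be shown on a plain list of (node_id, score, smiles) tuples kept in discovery order. The helpers `first_n` and `best_n` are hypothetical illustrations of the semantics, not RAD's implementation of get_molecules or get_best_molecules.

```python
import heapq


def first_n(results, n=None):
    """Results in discovery order; all of them when n is None."""
    return list(results) if n is None else list(results[:n])


def best_n(results, n):
    """The n lowest (best) scores, regardless of when they were discovered."""
    return heapq.nsmallest(n, results, key=lambda r: r[1])


results = [
    (0, 5.0, "CCO"),
    (1, 1.0, "CCN"),
    (2, 9.0, "CCC"),
    (3, -2.0, "c1ccccc1"),
]
```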

Service Management and Cleanup

Gracefully shut down all services:

traverser.shutdown()

Advanced Usage

Starting HNSW Server Independently:

# Start dedicated HNSW server with database
python scripts/start_hnsw_server.py \
  --hnsw-path /data/index.usearch \
  --database-path /data/molecules.db \
  --host 0.0.0.0 \
  --port 8000

Example Usage

The examples/ folder contains a Jupyter notebook demonstrating the construction and traversal of the DUDE-Z DOCK HNSW investigated in the original RAD paper.

For a larger, billion-scale application and integration with Chemprop, see the repository at https://github.com/bwhall61/lsd

References

The original HNSW paper by Yury Malkov and Dmitry Yashunin.

The original RAD paper by Brendan Hall and Michael Keiser.

The lsd.docking.org paper by Brendan Hall, Tia Tummino, et al. shows a billion-scale application and integration with Chemprop ML models.

Most importantly, the HNSW graph code is built on the usearch library, so a big thanks to Ash Vardanian for his awesome HNSW library!
