RDEvidence - Rare Disease Knowledge Platform

A rare disease knowledge base that integrates multiple authoritative databases with AI-powered semantic literature search.

🎯 Overview

RDEvidence integrates data from:

🧬 Orphanet - Expert rare disease database
🧪 MONDO - Disease ontology
📊 HPO - Human Phenotype Ontology
💊 MAxO - Medical Action Ontology
🔬 ClinVar - Genetic variant database
🏥 ClinicalTrials.gov - Clinical trials
📚 PubMed/PubTator3 - Biomedical literature (85,000+ papers)

✨ Features

RAG-Powered Search: Semantic search across 85,000+ papers using BioBERT embeddings
Entity Annotation: Automatic highlighting of genes, diseases, chemicals, and variants via PubTator3
Multi-Database Integration: Unified search across rare disease resources
Clinical Trials: Find relevant trials by disease, intervention, location
Medical Actions: Evidence-based treatment recommendations
Variant Analysis: ClinVar variant lookup and interpretation
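The semantic search feature can be pictured as ranking papers by how close their embedding vectors are to the query's vector. The sketch below is illustrative only: it assumes cosine similarity as the metric (the README does not say which one the backend uses), and the function names and toy 3-dimensional vectors are hypothetical stand-ins for real BioBERT embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)

def rank_papers(query_vec, paper_vecs, top_k=3):
    """Return (paper_id, score) pairs sorted by descending similarity."""
    scored = [(pid, cosine_similarity(query_vec, vec))
              for pid, vec in paper_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy vectors standing in for real embeddings (hypothetical data).
papers = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.7, 0.7, 0.0],
    "paper_c": [0.0, 0.0, 1.0],
}
results = rank_papers([1.0, 0.1, 0.0], papers, top_k=2)
for paper_id, score in results:
    print(paper_id, round(score, 3))
```

In a real deployment the vector store would handle this ranking; the sketch only shows the idea behind it.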

📁 Repository Structure

RDEvidence/
├── frontend/                  # Web interface
│   └── index.html            # Main HTML file (PubTator3 integrated)
├── backend/                   # Python Flask API
│   ├── complete_backend_pubtator.py  # Main backend server
│   └── requirements.txt      # Python dependencies
├── scripts/                   # Database building utilities
│   ├── merge_clinvar_orpha_mondo_hpo_literature.py
│   ├── build_vectordb_from_merged.py
│   ├── test_vectordb.py
│   └── check_rag.py
└── docs/                      # Documentation
    └── setup.md

🚀 Quick Start

Prerequisites

Python 3.11+
8GB+ RAM (for vector database)
2GB+ disk space
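A quick way to confirm two of these prerequisites before installing is a small check script. This is a minimal sketch, not part of the repository: the function name and thresholds are assumptions taken from the list above, and the RAM check is omitted because the Python standard library has no portable way to query it.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)      # from the prerequisites list
MIN_FREE_DISK_GB = 2      # from the prerequisites list

def check_prerequisites(path="."):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:2])
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}")
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < MIN_FREE_DISK_GB:
        problems.append(f"{MIN_FREE_DISK_GB} GB free disk required, found {free_gb:.1f} GB")
    return problems

for problem in check_prerequisites():
    print("WARNING:", problem)
```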

Installation

Clone the repository:

git clone https://github.com/wangjl99/RDEvidence.git
cd RDEvidence

Install Python dependencies:

cd backend
pip install -r requirements.txt

Set up the database:

⚠️ Note: The vector database (~500MB-1GB) is not included due to GitHub size limits.

Option A: Contact the author (see Contact below) for a pre-built database

Option B: Build from source data

# From the repository root:
cd scripts

# Step 1: Merge data sources
python merge_clinvar_orpha_mondo_hpo_literature.py

# Step 2: Build vector database (4-6 hours)
python build_vectordb_from_merged.py

# Step 3: Verify
python test_vectordb.py
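Because the build step runs for hours, it can help to chain the three steps so that a failure in one stops the rest. The following is a sketch of such a wrapper under the assumption that each script runs without arguments from the scripts directory, as the commands above suggest; `build_commands` and `run_pipeline` are hypothetical helpers, not files in the repository.

```python
import subprocess
import sys

# Script names as listed in the build steps above, in execution order.
PIPELINE = [
    "merge_clinvar_orpha_mondo_hpo_literature.py",
    "build_vectordb_from_merged.py",
    "test_vectordb.py",
]

def build_commands(python=sys.executable):
    """Return one argv list per pipeline step."""
    return [[python, script] for script in PIPELINE]

def run_pipeline(scripts_dir="scripts"):
    """Run each step in order; check=True raises CalledProcessError on a non-zero exit."""
    for command in build_commands():
        print("Running:", " ".join(command))
        subprocess.run(command, cwd=scripts_dir, check=True)
```

Calling `run_pipeline()` from the repository root would start the full build; it is left uncalled here so the sketch can be imported without side effects.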

Start the backend:

# From the repository root:
cd backend
python complete_backend_pubtator.py

Backend will run at: http://localhost:5000

Open the frontend:

# In a second terminal, from the repository root:
cd frontend
# Open index.html in your browser,
# or serve it with: python -m http.server 8000

🔧 Configuration

Backend Configuration

Update paths in backend/complete_backend_pubtator.py if needed:

# Database paths
CHROMA_DB_PATH = "./literature_vectordb"
DATA_DIR = "./data"
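One alternative to editing the file is to read these paths from environment variables and fall back to the defaults shown above. This is only a sketch of that pattern: the variable names `RDE_CHROMA_DB_PATH` and `RDE_DATA_DIR` and both helper functions are hypothetical, and the backend as published may not support them.

```python
import os
from pathlib import Path

def load_paths(env=os.environ):
    """Resolve database paths, preferring environment overrides over the defaults."""
    return {
        # Hypothetical variable names; defaults mirror the values in the backend file.
        "CHROMA_DB_PATH": Path(env.get("RDE_CHROMA_DB_PATH", "./literature_vectordb")),
        "DATA_DIR": Path(env.get("RDE_DATA_DIR", "./data")),
    }

def missing_paths(paths):
    """Return the names of configured paths that are not existing directories."""
    return [name for name, path in paths.items() if not path.is_dir()]

paths = load_paths()
for name in missing_paths(paths):
    print(f"{name} does not exist yet: {paths[name]}")
```

Checking the directories at startup gives a clear message when the vector database has not been built or downloaded yet.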

Frontend Configuration

Update API endpoint in frontend/index.html (line ~1380):

// For local development:
const API_BASE = 'http://localhost:5000';

// For production deployment, use this line instead of the one above:
// const API_BASE = 'https://your-api-url.com';

📊 Database Information

The current database contains:

85,762 papers with BioBERT embeddings
Integrated data from Orphanet, MONDO, HPO, ClinVar
Full PubMed abstracts with PubTator3 annotations

The database is built from:

master_literature_results_mysql_input.tsv (source data, not in repo)
Merged with ClinVar, Orphanet, MONDO, and HPO data
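Before running the merge it can be useful to inspect the source TSV. The sketch below counts rows and lists the header using only the standard library. The column names `pmid` and `title` in the sample are hypothetical, since the actual schema of the source file is not documented here.

```python
import csv
import io

def summarize_tsv(handle):
    """Return (column_names, row_count) for a tab-separated file with a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    row_count = sum(1 for _ in reader)
    return reader.fieldnames, row_count

# In-memory sample with made-up columns; replace with open(path, newline="") for a real file.
sample = io.StringIO("pmid\ttitle\n111\tExample A\n222\tExample B\n")
columns, n_rows = summarize_tsv(sample)
print(columns, n_rows)
```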

🧪 Testing

# Check if the vector database is working (run from the repository root)
python scripts/check_rag.py

# Test a backend endpoint (the URL is quoted so the shell does not interpret "?")
curl "http://localhost:5000/diseases?query=Bardet-Biedl"
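The same endpoint check can be scripted in Python. This sketch assumes the `/diseases` endpoint returns JSON, which the README does not state, and the helper names are hypothetical. Building the URL with `urlencode` keeps queries containing spaces or special characters safe.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "http://localhost:5000"  # local backend address from the Quick Start

def build_disease_url(query, base=API_BASE):
    """Build the /diseases search URL with a properly encoded query string."""
    return f"{base}/diseases?{urllib.parse.urlencode({'query': query})}"

def search_diseases(query, base=API_BASE, timeout=10):
    """Call the backend and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_disease_url(query, base), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

print(build_disease_url("Bardet-Biedl"))
```

With the backend running, `search_diseases("Bardet-Biedl")` would perform the request; it raises `urllib.error.URLError` if the server is not reachable.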

📝 Citation

If you use RDEvidence in your research, please cite:

@software{rdevidence2024,
  title = {RDEvidence: AI-Powered Rare Disease Knowledge Platform},
  author = {Wang, Jing},
  year = {2024},
  url = {https://github.com/wangjl99/RDEvidence}
}

📄 License

MIT License - see LICENSE file for details

🤝 Contributing

Contributions welcome! Please open an issue first to discuss proposed changes.

📧 Contact

GitHub: @wangjl99
Repository: https://github.com/wangjl99/RDEvidence

🙏 Acknowledgments

Orphanet for rare disease data
NCBI for PubMed and PubTator3
ClinVar for variant data
BioBERT team for embedding model
MONDO, HPO, MAxO ontology teams

Last Updated: December 2024
