An intelligent gene pathway annotation system that leverages Large Language Models and multi-agent frameworks to assist Reactome curators in annotating genes and their pathway involvement based on scientific literature.
- Intelligent Literature Mining: Automated PubMed abstract retrieval and analysis
- Multi-Agent Architecture: CrewAI-powered specialist agents for different annotation tasks
- Reactome Integration: Direct integration with Reactome Neo4j database and data models
- Evidence-Based Annotation: Literature-supported pathway predictions with confidence scoring
- Full-Text Analysis: PDF paper processing for deeper information extraction
- REST API: Complete API for programmatic access and integration
- Interactive Chat Interface: Chainlit-powered conversational interface
- GenePathwayAnnotator: Core annotation engine with PubMed integration
- Literature Processing: Automated abstract retrieval, embedding, and similarity scoring
- Pathway Enrichment: Statistical analysis of protein-protein interactions
- Reactome Modeling: Direct pathway instance generation
- ReactomeCurator: Converts structured data to Reactome instances
- LiteratureExtractor: Processes papers and extracts molecular information
- Reviewer: Domain expert validation and quality assessment
- QualityChecker: Technical compliance and consistency validation
- Python 3.10+
- Neo4j database (Reactome instance)
- MongoDB (for PubMed caching)
- OpenAI API access
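The prerequisites above can be checked before the first run. The following is a minimal sketch, not part of the project: `check_prerequisites` and `REQUIRED_ENV_VARS` are hypothetical names, the variable names are taken from the .env example in this README, and treating exactly these five as required is an assumption.

```python
import os
import sys

# Variable names come from the .env example in this README.
# Which of them are strictly required is an assumption of this sketch.
REQUIRED_ENV_VARS = [
    "OPENAI_API_KEY",
    "REACTOME_NEO4J_URI",
    "REACTOME_NEO4J_USER",
    "REACTOME_NEO4J_PWD",
    "PUBMED_MONGO_URI",
]


def check_prerequisites(env=None, version=None):
    """Return a list of problems; an empty list means every check passed."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)
    problems = []
    # The README asks for Python 3.10 or newer.
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    # Report every variable that is unset or empty.
    for name in REQUIRED_ENV_VARS:
        if not env.get(name):
            problems.append(f"Missing environment variable: {name}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

The check only confirms that the settings are present; it does not verify that Neo4j, MongoDB, or the OpenAI API are actually reachable.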
# Install base dependencies
pip install -r requirements.txt
# Install CrewAI for multi-agent framework
pip install crewai crewai-tools

Note: On the local Mac, use the paperqa env, which already has all dependencies installed.
Create a .env file:
OPENAI_API_KEY=your_openai_api_key
PUBMED_API_KEY=your_ncbi_api_key
REACTOME_NEO4J_URI=bolt://localhost:7687
REACTOME_NEO4J_USER=neo4j
REACTOME_NEO4J_PWD=your_password
REACTOME_NEO4J_DATABASE=reactome
PUBMED_MONGO_URI=mongodb://localhost:27017
PUBMED_MONGO_DB=pubmed_cache
PUBMED_MONGO_COLLECTION=abstracts
PDF_PAPERS_FOLDER=./data/papers

from reactome_llm.CrewAILiteratureAnnotator import CrewAILiteratureAnnotator, AnnotationRequest
from reactome_llm.GenePathwayAnnotator import GenePathwayAnnotator

# Initialize
annotator = GenePathwayAnnotator()
crewai = CrewAILiteratureAnnotator(annotator)

# Create annotation request
request = AnnotationRequest(
    gene="NTN1",
    papers=["25391454", "22982992", "23467207"],
    quality_threshold=0.7,
    enable_full_text=False,
)

# Run multi-agent annotation.
# Top-level await works in a notebook or under `python -m asyncio`;
# in a plain script, wrap the call with asyncio.run().
result = await crewai.annotate_literature(request)
print(f"Quality scores: {result.quality_scores}")

# Annotation with the core engine only (no CrewAI agents)
from reactome_llm.GenePathwayAnnotator import GenePathwayAnnotator

annotator = GenePathwayAnnotator()
result = await annotator.write_summary_for_gene_annotation("NTN1")

Start the server:
flask --app reactome_llm/ReactomeLLMRestAPI run --debug

Multi-Agent Annotation:
curl -X POST http://localhost:5000/crewai/annotate \
  -H "Content-Type: application/json" \
  -d '{
    "queryGene": "NTN1",
    "numberOfPubmed": 8,
    "qualityThreshold": 0.7,
    "targetPathways": ["Axon guidance"],
    "enableFullText": false
  }'

Traditional Annotation:
curl -X POST http://localhost:5000/annotate \
  -H "Content-Type: application/json" \
  -d '{
    "queryGene": "NTN1",
    "numberOfPubmed": 8,
    "cosineSimilarityCutoff": 0.38,
    "llmScoreCutoff": 3
  }'

System Status:
curl http://localhost:5000/crewai/status

The multi-agent framework reports the following quality metrics, each scored from 0 to 1:
- Biological Accuracy (0-1): Correctness of molecular mechanisms
- Evidence Support (0-1): Strength of literature backing
- Mechanistic Consistency (0-1): Alignment with known biology
- Integration Quality (0-1): Compatibility with existing data
- Approve: Score ≥ 0.7, no critical issues
- Requires Revision: Score ≥ 0.5 and < 0.7, minor issues
- Reject: Score < 0.5, major inaccuracies
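The decision thresholds above can be expressed as a small function. This is a minimal sketch rather than the framework's own logic: `overall_score` and `review_decision` are hypothetical names, combining the four metrics with an unweighted mean is an assumption, and so is sending a score of 0.7 or higher with critical issues to "Requires Revision" (the thresholds above do not cover that case).

```python
def overall_score(metrics: dict[str, float]) -> float:
    """Combine per-metric scores into one value (assumption: unweighted mean)."""
    if not metrics:
        raise ValueError("at least one metric is required")
    return sum(metrics.values()) / len(metrics)


def review_decision(score: float, has_critical_issues: bool = False) -> str:
    """Map an overall score in [0, 1] to a review outcome."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score < 0.5:
        return "Reject"
    if score >= 0.7 and not has_critical_issues:
        return "Approve"
    # Scores from 0.5 up to 0.7, and (by assumption) high scores
    # that still carry critical issues, go back for revision.
    return "Requires Revision"


metrics = {
    "biological_accuracy": 0.9,
    "evidence_support": 0.8,
    "mechanistic_consistency": 0.7,
    "integration_quality": 0.8,
}
print(review_decision(overall_score(metrics)))  # → Approve
```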
Run the validation suite:
python test_crewai_framework.py

Run example workflows:
python examples/crewai_annotation_examples.py

reactome_llm/
├── CrewAILiteratureAnnotator.py   # Main multi-agent orchestrator
├── ReactomeAgents.py              # Specialized agent definitions
├── ReactomeTasks.py               # Task definitions for each phase
├── ReactomeTools.py               # Agent-specific tools
├── GenePathwayAnnotator.py        # Core annotation engine
├── ReactomeLLMRestAPI.py          # REST API with both approaches
├── ReactomeNeo4jUtils.py          # Neo4j database utilities
├── ReactomePubMed.py              # PubMed integration
└── README_CrewAI.md               # Detailed CrewAI documentation
examples/
└── crewai_annotation_examples.py  # Usage examples
test/
└── test_crewai_framework.py       # Validation tests
- Reactome Database: Neo4j graph database with pathway knowledge
- PubMed: Literature abstracts via NCBI E-utilities API
- IntAct: Protein-protein interaction data
- BioGRID: Molecular interaction database
- MongoDB: Local caching of PubMed abstracts
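To illustrate the PubMed entry above, the sketch below builds an NCBI E-utilities `efetch` request URL for a list of PubMed IDs. It is an illustration only and does not describe how ReactomePubMed.py works: `build_efetch_url` is a hypothetical helper, and the choice of XML output is an assumption. The `db`, `id`, `rettype`, `retmode`, and `api_key` parameters are standard E-utilities parameters.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmids, api_key=None):
    """Build an efetch URL that requests abstracts for the given PubMed IDs."""
    if not pmids:
        raise ValueError("at least one PubMed ID is required")
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),   # efetch accepts a comma-separated ID list
        "rettype": "abstract",
        "retmode": "xml",        # assumption: XML is parsed downstream
    }
    if api_key:
        # Corresponds to PUBMED_API_KEY in the .env example.
        params["api_key"] = api_key
    return f"{EFETCH_URL}?{urlencode(params)}"


url = build_efetch_url(["25391454", "22982992", "23467207"])
print(url)
```

The resulting URL can be fetched with `urllib.request.urlopen`. Caching the response in MongoDB, as the list above describes, would avoid repeated requests for the same IDs.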
To deploy to the production server (curator.reactome.org):
- Zip the reactome_llm folder
- Transfer and unzip on server
- Configure .env with production settings
- Run using the shell script:
./run_llm.sh

To stop the application:
ps aux | grep llm
kill <process_id>

MongoDB databases are generated locally and then migrated to the server:
# Export from local
mongodump --db your_database_name --out /path/to/backup
# Import to server
mongorestore --db your_database_name /path/to/backup/your_database_name

- CrewAI Framework Guide - Detailed multi-agent documentation
- API Reference - Complete REST API documentation
- Examples - Usage examples and tutorials
This tool is part of the Reactome project. For contributions:
- Follow existing code patterns
- Add tests for new functionality
- Update documentation
- Ensure compatibility with both single and multi-agent approaches
This project is part of the Reactome curation tools and follows the same licensing terms.